Tools, algorithms and methods in the context of
Model-Driven Engineering (MDE) have to be assessed,
evaluated and tested with regard to different aspects
such as correctness, quality, scalability and efficiency.
Unfortunately, appropriate test models are scarcely
available and those which are accessible often lack
desired properties.
Therefore, one needs to resort to artificially generated
test models in practice.
Many services and features of model versioning systems are
motivated by the collaborative development paradigm.
Testing such services does not require single models, but
rather pairs of models, one being derived from the other
one by applying a known sequence of edit steps.
The edit operations used to modify the models should be
the same as in usual development
environments, e.g. adding, deleting and changing
model elements in visual model editors.
Existing model generators are motivated by the
testing of model transformation engines.
They do not consider the true nature of evolution,
in which models evolve through iterative editing steps.
They provide no or very little control over the
generation process and they can generate only single models
rather than model histories.
Moreover, existing approaches do not support generating models
with prescribed stochastic and other properties of interest.
Furthermore, blindly generating models through random
application of edit operations does not yield useful models,
since the generated models are not (stochastically) realistic
and do not reflect true properties of evolution in real software
systems.
Unfortunately, little is known about how models of
real software systems evolve over time, what the properties
and characteristics of this evolution are, and how the evolution
can be formulated mathematically and simulated.
To address these problems, we introduce a new general
approach which facilitates generating (stochastically) realistic
test models for model differencing tools and tools for analyzing
model histories.
We propose a model generator which addresses
the above deficiencies and generates or modifies models by
applying proper edit operations.
Fine control mechanisms for the generation process
are devised and the generator supports stochastic and other
properties of interest in the generated models.
It can also generate histories, i.e. related sequences
of models.
Moreover, in our approach we provide a methodological framework for
capturing, mathematically representing and simulating the evolution
of real design models.
The proposed framework is able to capture the evolution in terms of
edit operations applied between revisions.
Mathematically, the representation of evolution is based on
different statistical distributions as well as different
time series models.
Forecasting, simulation and generation of stochastically realistic
test models are discussed in detail.
As an application, the framework is applied to the evolution of design
models obtained from a set of carefully selected Java systems.
In order to study the evolution of design models,
we analyzed 9 major Java projects which have at least 100 revisions.
We reverse engineered the design models from the Java source code
and compared consecutive revisions of the design models.
The observed changes were expressed in terms of two
sets of edit operations.
The first set consists of
75 low-level graph edit operations, e.g. add, delete, etc.
of nodes and edges of the abstract syntax graph of the models.
The second set consists of
188 high-level (user-level) edit operations which are
more meaningful from a developer’s point of view and
are frequently found in visual model editors.
A high-level operation typically comprises several
low-level operations and is considered as one user action.
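
The relation between the two sets of operations can be illustrated with a minimal sketch. The graph class, the operation names and the edge labels below are hypothetical and are not taken from the 75 low-level or 188 high-level operations of the study; the sketch only shows how one user action may expand into several graph edits.

```python
from dataclasses import dataclass, field


@dataclass
class ModelGraph:
    """Tiny abstract syntax graph: typed nodes plus labeled edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source id, label, target id)
    log: list = field(default_factory=list)    # low-level operations applied so far


def add_node(graph, node_id, node_type):
    """Low-level operation: insert one typed node."""
    graph.nodes[node_id] = node_type
    graph.log.append(("add_node", node_id))


def add_edge(graph, source, label, target):
    """Low-level operation: insert one labeled edge between existing nodes."""
    if source not in graph.nodes or target not in graph.nodes:
        raise KeyError("both endpoints must exist before adding an edge")
    graph.edges.add((source, label, target))
    graph.log.append(("add_edge", source, label, target))


def add_attribute(graph, class_id, attr_id, type_id):
    """High-level operation (hypothetical): one user action, three low-level steps."""
    add_node(graph, attr_id, "Attribute")
    add_edge(graph, class_id, "ownedAttribute", attr_id)
    add_edge(graph, attr_id, "type", type_id)


graph = ModelGraph()
add_node(graph, "Person", "Class")
add_node(graph, "String", "DataType")
steps_before = len(graph.log)
add_attribute(graph, "Person", "name", "String")
low_level_steps = len(graph.log) - steps_before  # low-level steps behind one user action
```

Logging each low-level step makes it possible to express the same change in either vocabulary, which is the distinction the two operation sets draw.
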
In our approach, we mathematically formulated the pairwise evolution,
i.e. the changes between every two consecutive revisions,
using statistical models (distributions).
In this regard, we initially considered many distributions
which could be promising in modeling the frequencies of the
observed low-level and high-level changes.
Six of these distributions modeled the observed changes
with high rates of success.
To simulate the pairwise evolution, we studied random variate
generation algorithms for the successful distributions
in detail.
For four of these distributions, for which no tailored algorithms
existed, we generated random variates indirectly.
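
One common way to generate variates indirectly is inverse transform sampling over a tabulated cumulative distribution. The sketch below assumes a discrete distribution on a finite support and uses a truncated power law purely as a stand-in; neither the technique nor the distribution is confirmed by the text above as the one actually used.

```python
import bisect
import random


def make_inverse_cdf_sampler(pmf, support):
    """Build a sampler for a discrete distribution from its (unnormalized) pmf."""
    weights = [pmf(k) for k in support]
    total = sum(weights)
    cdf = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point rounding in the last entry

    def sample(rng):
        u = rng.random()  # uniform on [0, 1)
        # first index whose cumulative probability exceeds u
        return support[bisect.bisect_right(cdf, u)]

    return sample


# Illustrative assumption: change counts follow a power law truncated to 1..50.
support = list(range(1, 51))
sample = make_inverse_cdf_sampler(lambda k: k ** -2.0, support)
rng = random.Random(42)
draws = [sample(rng) for _ in range(10_000)]
```

The table lookup needs only the probability mass function, so it applies to any distribution that lacks a dedicated sampling algorithm, at the cost of truncating the support.
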
The chronological (historical) evolution of design models
was modeled using three kinds of time series models,
namely ARMA, GARCH and mixed ARMA-GARCH.
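
For reference, a mixed ARMA(p,q)-GARCH(r,s) model can be written in the standard textbook form below. The orders and the parametrization are generic assumptions; the text above does not state which orders were fitted.

```latex
\begin{aligned}
x_t &= \mu + \sum_{i=1}^{p} \phi_i \,(x_{t-i} - \mu)
          + \sum_{j=1}^{q} \theta_j \,\varepsilon_{t-j} + \varepsilon_t, \\
\varepsilon_t &= \sigma_t z_t, \qquad z_t \ \text{i.i.d. with mean } 0 \text{ and variance } 1, \\
\sigma_t^2 &= \omega + \sum_{k=1}^{r} \alpha_k \,\varepsilon_{t-k}^2
          + \sum_{l=1}^{s} \beta_l \,\sigma_{t-l}^2 .
\end{aligned}
```

A pure ARMA model keeps the variance \sigma_t^2 constant, while a pure GARCH model drops the autoregressive and moving-average terms from the mean equation.
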
The comparative performance of the time series models in
handling the dynamics of evolution, as well as the accuracy of their
forecasts, was studied in depth.
Roughly speaking, our studies show that mixed ARMA-GARCH
models outperform pure ARMA and pure GARCH models.
Moreover, we discuss the simulation aspects of
our proposed time series models in detail.
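
As a rough illustration of such a simulation, the following sketch draws a path from an ARMA(1,1)-GARCH(1,1) process with Gaussian innovations. The function name, the fixed orders, the Gaussian assumption and all parameter values are illustrative choices, not the fitted models of the study.

```python
import math
import random


def simulate_arma_garch(n, mu, phi, theta, omega, alpha, beta, seed=None):
    """Simulate n steps of an ARMA(1,1) mean with GARCH(1,1) variance."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    if abs(phi) >= 1:
        raise ValueError("need |phi| < 1 for a stationary mean equation")
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    eps_prev = 0.0                      # previous innovation
    x_prev = mu                         # previous observation
    series = []
    for _ in range(n):
        # GARCH(1,1): conditional variance from last shock and last variance
        var = omega + alpha * eps_prev ** 2 + beta * var
        eps = math.sqrt(var) * rng.gauss(0.0, 1.0)
        # ARMA(1,1): mean reversion plus moving-average term plus new shock
        x = mu + phi * (x_prev - mu) + theta * eps_prev + eps
        series.append(x)
        x_prev, eps_prev = x, eps
    return series


# Hypothetical parameters, e.g. for a series of per-revision change counts.
path = simulate_arma_garch(200, mu=10.0, phi=0.5, theta=0.2,
                           omega=0.1, alpha=0.1, beta=0.8, seed=1)
```

A simulated path is real-valued, so using it to drive a generator would require a further mapping, for example rounding and clipping to non-negative change counts; that step is left out here.
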
The knowledge gained through statistical analysis
of the evolution was then used in our test model generator
in order to generate more realistic test models for
model differencing, model versioning, history analysis tools,
etc.