©Copyright JASSS
Oliver Mannion, Roy Lay-Yee, Wendy Wrapson, Peter Davis and Janet Pearson
(2012)
JAMSIM: a Microsimulation Modelling Policy Tool
Journal of Artificial Societies and Social Simulation 15 (1) 8
<http://jasss.soc.surrey.ac.uk/15/1/8.html>
Received: 15-Mar-2011
Accepted: 13-Oct-2011
Published: 31-Jan-2012
Abstract
JAMSIM (JAva MicroSIMulation) is an innovative synthesis of open source packages that provides an environment and set of
features for the creation of dynamic discrete-time microsimulation models that are to be executed, manipulated and interrogated
by non-technical, policy-oriented users. Combining the leading open source statistical package R and one of the foremost
agent-based modelling (ABM) graphical tools Ascape, JAMSIM is available as an open source tool, for public reuse and
modification. Here we describe microsimulation, our functional requirements, a review of tools used by other micro-simulators
and an evaluation of existing software, followed by the architecture, features and use of JAMSIM.
Keywords:
Microsimulation, Software Frameworks, Policy Tool, Java, R
Introduction
1.1
Microsimulation is an empirically based data modelling methodology traditionally used in the areas of taxation, pensions, and
other types of economic activity, but is increasingly being applied to the social sciences (Brown and Harding 2002).
Microsimulation can be distinguished from other types of computer modelling in that it simulates the state and behaviour of
individual units such as people or households and relies on empirical data for initial conditions and rules (Gilbert and Troitzsch
2005). Investigation of point estimates and distributions of outcomes can then be performed for a range of heterogeneous
subgroups of interest. In empirical terms, each unit is represented by a data record containing a unique identifier and a set of
associated attributes, such as age, gender, marital status and education. A set of rules (for example transition probabilities)
intended to represent individual preferences and tendencies, is then applied to each unit, leading to simulated changes in state
and behaviour. The end result is an individual life history and its projection into a future time period.
1.2
Microsimulation can be a useful tool for policymakers because it allows an evaluation to be conducted of the effects of proposed
government policy interventions before they are implemented in the real world. We have undertaken two policy-oriented
microsimulation projects that have required us to model characteristics of certain sub-populations (children in one and elderly
people in another), forecast outcomes, and develop and run scenarios of interest to policymakers. The ultimate goal of our
research is to assist policymakers to provide sound advice and thus, as with many other microsimulation models, these models
are designed for use by government agencies.
1.3
Microsimulation models used by policymakers ideally need to be encapsulated in self-contained software applications that can be
run on a desktop computer and easily operated by non-technical users. While there is a large pool of software toolkits for agent-based modelling (ABM), there is a dearth of tools for microsimulation. We first describe the functional requirements of policy-oriented microsimulation models. We then evaluate existing software tools for this purpose, before describing the microsimulation
tool JAMSIM (JAva MicroSIMulation) that we have developed.
Requirements
2.1
The functional requirements of the microsimulation toolkit were developed from the nature of the particular modelling problems we
were addressing. The first was modelling primary care in an ageing society (PCASO) (Davis et al. 2010; Pearson et al. 2011)
and was originally programmed for a technical audience in SAS. We wanted to convert this into a tool usable by policymakers.
The second was a life course approach modelling health outcomes for children using a longitudinal data set. The requirements
we developed for an end-user tool implementing these models are listed in Table 1.
Table 1: Functional requirements of our microsimulation models
| Requirement | Description |
|---|---|
| Base file input | Create individual micro-level population units, or agents, from multiple data sets, which may require some data manipulation and merging. |
| Parameter input | Load multiple tables of parameters from CSV/Excel files, which can be modified in the user interface to test different scenarios. |
| Scheduling mechanism | Simple discrete-time. State changes occur at every time period. |
| Simulation techniques | Stochastic equations involving Monte Carlo draws from discretely specified probability distributions. Requires random number generation. Prediction from statistical models. Transition matrices. Multiple runs (with same parameters but different random seeds) to obtain a robust mean estimate and confidence interval. Reweighted inputs or outputs to adjust population distributions. |
| Scenario testing | The ability to directly change and/or reweight any continuous or categorical variable used in the model. The model can then be rerun with the changed variables and the outputs compared with a baseline run. |
| Output | The ability to export simulated results for analysis in external software as well as GUI features for the display of: i) Aggregates: charts and tables of aggregate information per year, e.g. frequency distribution, mean, standard deviation and other descriptive statistics; a facility to weight and reweight the sample to obtain representativeness and for scenario testing. ii) Individuals: the complete data set (i.e. state of each individual) after each year; ability to track an individual history (state changes over time). |
| User interface | To encourage a wide user-base, the user interface should be intuitive. It should be "easy" for an end-user to operate functions including simulation control, scenario testing, and output display. |
| Performance | External end-users should be able to perform multiple runs of a multi-year simulation on a desktop computer in a reasonable amount of time (maximum of 5 minutes). |
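The multiple-run requirement in Table 1 can be sketched as follows. This is a minimal illustration in plain Java rather than JAMSIM code: the class `MultiRunSketch`, its toy event model and the normal-approximation interval (1.96 standard errors) are assumptions made for the example. With only a few runs, a t-quantile would give a more appropriate interval.

```java
import java.util.Random;

/** Hypothetical sketch: repeat a stochastic run with different seeds and summarise. */
final class MultiRunSketch {

    /** One toy run: the proportion of n units that experience an event with probability p. */
    static double runOnce(long seed, int n, double p) {
        Random rng = new Random(seed);
        int events = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < p) {
                events++;
            }
        }
        return (double) events / n;
    }

    /** Returns {mean, lower, upper} using a normal-approximation 95% interval across runs. */
    static double[] meanWithInterval(double[] results) {
        int k = results.length;
        double sum = 0.0;
        for (double r : results) {
            sum += r;
        }
        double mean = sum / k;
        double squares = 0.0;
        for (double r : results) {
            squares += (r - mean) * (r - mean);
        }
        // Sample standard deviation across runs (zero when only one run is available).
        double sd = k > 1 ? Math.sqrt(squares / (k - 1)) : 0.0;
        double half = 1.96 * sd / Math.sqrt(k);
        return new double[] {mean, mean - half, mean + half};
    }

    public static void main(String[] args) {
        int runs = 20;
        double[] results = new double[runs];
        for (int run = 0; run < runs; run++) {
            // Same parameters for every run; only the seed changes.
            results[run] = runOnce(1000L + run, 10_000, 0.3);
        }
        double[] s = meanWithInterval(results);
        System.out.printf("mean=%.4f, 95%% CI=[%.4f, %.4f]%n", s[0], s[1], s[2]);
    }
}
```

Fixing the seed per run keeps each run reproducible while still letting the set of runs capture Monte Carlo variability.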
Software evaluation—Microsimulation models
3.1
In determining an appropriate software platform for implementing our models, we began by researching software implementation
details and application features of existing microsimulation models. In addition we consulted articles providing general software
selection and development advice for microsimulation (Percival 2007; Scott 2003). We also communicated with existing
microsimulation model users and developers on the SIMSOC mailing list, and our own contacts, to seek their feedback and
advice. A summary of our microsimulation model survey is shown in Table 2. Rather than a comprehensive survey of all available
microsimulation models, this table shows models that were either well known and widely used, similar enough in domain content
and/or features to the type of models we sought to build, or distinct and unique in implementation and approach from other
models in the field.
Table 2: Relevant Microsimulation Models and Software Application Details
| Name / Yr | Type, Domain | Language | GUI Features | Availability |
|---|---|---|---|---|
| APPSIM, 2005- | Dynamic life-course | C# (Percival 2007) | Simulation control, alignment, scenario testing, sensitivity analysis | Partner organisations only |
| CARESIM, 1998-? | Dynamic, distributional effects of care charging regimes | SAS (Wittenberg et al. 2007) | No GUI | Unknown |
| EUROMOD, 1996- | Static, tax-benefit system | C++ (Immervoll et al. 1999; Sutherland 2007; Sutherland et al. 2008) | Simulation control, scenario testing | On request for not-for-profit use |
| LABORsim, 2006-? | Labour supply modelling | Java and JAS (Leombruni and Richiardi 2006) | Simulation control, parameter changes, output tables and graphs | JAS framework downloadable |
| LifePaths, 1994- | Dynamic life-course | C++ and Modgen (Rowe and Gribble 2007) | Simulation control, output tables, BioBrowser | Downloadable |
| MicMac, 2005-2009 | Dynamic life-course | Java, R and JAMES II (Gampe et al. 2009; Zinn et al. 2009) | Simulation control, (non-integrated pre- and post-simulation processing in R) | Downloadable |
| MIDAS | Dynamic life-course | C++ and LIAM | Provides GUI but features unknown | On request |
| PENSIM, 1997- | Pension plans | C++ with custom model specification language (Holmer et al. 2011) | Simulation control | Downloadable |
| SAGE, 1999-2005 | Dynamic life-course | C++ with custom model specification language (Evandrou et al. 2007; Scott 2003) | Simulation control | Unavailable |
| SVERIGE-3, ? | Dynamic, spatial life-course | C# with custom model and equation specification (Holm et al. 2007) | Provides GUI but features unknown | Unavailable due to data restrictions |
3.2
Several of these models (LABORsim, LifePaths, MicMac, MIDAS, PENSIM, SAGE, SVERIGE-3) were built within a generalised
framework that caters to more than a single specific model. However, because of the complexity of different specialist needs,
there is currently no single predominant or generic gold standard microsimulation framework. Using a generalised framework
allows the model developer to enjoy the benefits of modularisation and reuse. This includes reducing the amount of code and
documentation that needs to be written from the ground up, which shortens development time and decreases costs. In addition,
reusing software components that have already been extensively tested reduces error potential. Some level of reuse, if possible,
is therefore desirable, and we agree with Ropella et al. (2002: 10) that in only very rare situations would it be advantageous to
develop a system without utilising any existing software packages.
3.3
The generalised frameworks used by the above models are of two types. The first is a model specification framework (MSF)
that consists of: i) a bespoke declarative language for model features such as agents, events/state transitions, parameters and
output tables, and ii) an application environment that provides an event queuing/simulation scheduling mechanism, a user
interface and input and output formats. Typically these frameworks do not require a model developer to be a computer
programmer as well, but they do require learning the specifics of the model specification language and operation environment
provided. Model specification frameworks are used by LifePaths, PENSIM, SAGE and SVERIGE-3.
3.4
Of the MSFs employed by these models only the Modgen framework (Statistics Canada 2011a) used by LifePaths is readily
available and well supported. Modgen is a dynamic microsimulation-specific toolkit, as opposed to an ABM one, with a complete
set of features including base file loading (from a custom .DAT file format), simulation, aggregate outputs (in tables) and individual
agent life history output (via the included BioBrowser tool). Its custom model specification language introduces a learning curve
for the model developer, but once learnt it can be used to implement a wide range of dynamic models. It supports both discrete
and continuous time event frameworks (for details on the distinction, see Scott 2003:4). Although time-based processing is
possible, in which all individuals are simulated at time t before moving to time t+1, Modgen primarily uses case-based processing
of individuals. In other words, each case is simulated from the beginning to the end before the simulation of the next case begins
(Statistics Canada 2011b). This limits the modelling of interactions to only between individuals in each case rather than the whole
population. The advantage of case-based processing is that cases can be simulated in parallel. Modgen also allows open
populations, or ones in which new individuals can be generated in response to certain events such as partnership. Overall,
Modgen came very close to meeting all our requirements. However, although the support options were significant, we felt
uncomfortable relying on the continued availability of that support within a closed source model. The closed source model also
limited the ability of Modgen to be extended and extra functionality added if and when we required it.
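The difference between time-based and case-based processing described above comes down to the nesting order of two loops. The following is a minimal sketch in plain Java; it does not use Modgen or its language, and the class and method names are hypothetical. It records only the order in which (unit, period) pairs are visited.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting the visiting order of the two processing styles. */
final class ProcessingOrderSketch {

    /** Time-based: every unit is advanced through period t before any unit enters t+1. */
    static List<String> timeBased(int units, int periods) {
        List<String> visits = new ArrayList<>();
        for (int t = 0; t < periods; t++) {
            for (int u = 0; u < units; u++) {
                visits.add("u" + u + "@t" + t);
            }
        }
        return visits;
    }

    /** Case-based: each unit is simulated from start to end before the next unit begins. */
    static List<String> caseBased(int units, int periods) {
        List<String> visits = new ArrayList<>();
        for (int u = 0; u < units; u++) {
            for (int t = 0; t < periods; t++) {
                visits.add("u" + u + "@t" + t);
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println("time-based: " + timeBased(2, 3));
        System.out.println("case-based: " + caseBased(2, 3));
    }
}
```

In the time-based order the whole population's state at period t is known before period t+1 starts, which is what makes population-wide interactions possible. In the case-based order each pass of the outer loop is independent of the others, which is why cases can be simulated in parallel.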
3.5
Two other specification-style frameworks deserve mention. While SAS is a general purpose data analysis tool rather than a
model specification framework per se, it is similar in that it provides a custom language and application environment. In part
because of prior experience with the SAS language in other domains, SAS has been adopted by some microsimulation
developers. CARESIM is an example of a SAS microsimulation model, as is our earlier work with the PCASO model (Pearson et
al. 2011). However, because SAS lacks a user-friendly GUI, it was ruled unsuitable for a tool aimed at policy end-users. Secondly, UMDBS (Sauerbier 2002) is a dynamic time-based MSF written in Smalltalk that often receives mention in
software reviews. We also considered it for our modelling purposes but decided against it because of its limited user interface, its
closed source and non-extensible nature, and its lack of support and recent releases.
3.6
The second type of generalised framework is a set of libraries, or packages, that together provide functionality to implement a
microsimulation model. In comparison to an MSF like Modgen these frameworks are typically more modular and extensible,
especially so if they are open source, but less complete in functionality. The functionality that is typically provided includes a
simulation event queue, a user interface for simulation control, and varying levels of output functionality. Implementing a specific
model in a library framework requires additional programming in a general purpose programming language like C++ or Java,
rather than a model-specific language.
3.7
Of the microsimulation models sampled above, LABORsim, MicMac and MIDAS use a library framework. MIDAS uses the
comprehensive and flexible C++ Life-cycle Income Analysis Model (LIAM) framework (O'Donoghue et al. 2009). LIAM offers
single and multi-cohort dynamic life-course microsimulation with a graphical user interface and has been used to model the
redistributive impacts of tax-benefit and pension systems. LIAM is used for problems that have closed populations, i.e. in which
no new individuals are generated, and which use discrete-time state changes. It supports transition matrices, regression and
arithmetic transformations, and includes a marriage market module and functionality to handle migration. Behavioural feedback
loops can be incorporated and alignment with external data sources is possible. However, after consideration, the availability of
this framework on a request only basis and a lack of documentation and structured support limited its appeal to us.
3.8
LABORsim uses the open source Java Agent-based Simulation (JAS) library (Sonnessa 2003). It is primarily an ABM framework
with an event list that can fire events at specified discrete time intervals. It has not been explicitly designed with microsimulation
in mind, although microsimulation can be accommodated within its framework. The advantage of JAS over other ABM toolkits is
that it provides support for reading from CSV and Excel files. However, its last release (v1.2.1) was in 2006 (Sonnessa 2011) and
so without recent activity or a viable support base it was ruled out.
3.9
MicMac is a Java based dynamic microsimulation model that uses the JAMES II (Himmelspach and Uhrmacher 2007) library
framework, which had a v0.8.3 alpha release in 2009 (Uhrmacher et al. 2011) and has planned future releases. JAMES II is
readily available, open source, and based on an extensible plug-in framework with a large repository of existing plug-ins. Its
focus is on biological simulation problems, although its plug-in design means it need not be restricted to any one particular
simulation paradigm. JAMES II supports both discrete and continuous time simulations of open and closed populations. There
has been recent activity on the JAMES II project, a range of publications that cite the Himmelspach and Uhrmacher (2007)
conference paper, good documentation, and several projects that branch off JAMES II. However, its status as an "alpha" release
led to our perception that it had not yet reached the maturity of other offerings.
Software evaluation—ABM toolkits
4.1
Outside of microsimulation, library frameworks have been very popular, and they are well developed in the ABM world. The main
difference between ABM and microsimulation is that microsimulation models rely on empirical micro data sets where agent-based
models do not. ABM toolkits typically lack the ability to easily read data sets of multiple variables and convert them into agents for
simulation. Microsimulation models make use of statistical modelling techniques, and probabilities derived from empirical data, to
transform inputs into outputs. ABM toolkits do not generally cater for these sorts of transitions but instead provide rule-based
transformations which are more useful for the theoretically-derived models of the ABM world. Although ABM and microsimulation
differ in intent, purpose and technique, there are enough similarities in software functionality that we decided to include general
purpose ABM library toolkits in our selection process.
4.2
In August 2009 we searched relevant computer simulation journals and conducted internet searches to determine what was
available. In particular we found and consulted a number of ABM software reviews (Gilbert and Bankes 2002; Lawson 2008;
Nikolai and Madey 2009; Railsback et al. 2006; Tobias and Hofmann 2004). In total we identified 30 potential microsimulation/ABM
toolkits or environments, of which most were discounted because their focus was on a different problem domain, or they lacked
functionality, were not extensible, or were outdated and no longer offered support.
4.3
Of those we considered in depth, Anylogic Professional Edition v6.5 (XJ Technologies 2011) came closest to meeting our
requirements. It provided the most comprehensive and visually appealing user interface, including graphs and advanced
visualisations, of any of the tools we examined. It is, however, closed source, and it requires a license fee for each run-time
instance of a deployed model. We felt that the inability to modify the source code meant we would not be able to fully tailor it to our
needs, particularly in the area of statistical analysis. The remaining contenders were Repast Simphony v1.2 (North 2011a),
Repast v3 (North 2011b), and Ascape v5.6 (Parker 2011). Like Anylogic they are all well developed, with substantial histories and
ongoing development, and existing user and support bases. Unlike Anylogic they are all open source and so provide complete
extensibility. However, they are ABM-specific and so lack data input, output, and analysis functionality.
4.4
The advantage of an open source library framework like Repast or Ascape is that a developer can select and reuse only those
library functionalities that they require. Because no one tool provided all the functionality we required, we finally decided to
combine elements of multiple open source libraries in order to meet our requirements. Our solution was to use Ascape as the
user interface and for simulation control, and combine this with R (R Development Core Team 2011) for statistical analysis and
output. This left us to write our own microsimulation transformation code in Java. This hybrid solution, "JAMSIM", was a
compromise between relying completely on a single software package (none of which offered total flexibility or met all our
requirements) and writing our own from scratch (a time-consuming process).
4.5
The existing packages we sought to utilise were either written in Java or provided interfaces to Java. We realise, however, that,
as the above survey shows, C++ has clearly been the language of choice of micro-simulators. This is probably largely influenced
by their age, as at the time of model inception, alternative languages such as C# or Java were unavailable or lacked popularity. In
addition C++, as a compiled language, offers the greatest performance potential. However, given the low computational
requirements of our particular microsimulation models, we decided the potential performance gains from a machine-compiled
language would not be necessary. Overall, we reasoned that the availability of higher level programming components in a
managed, bytecode-compiled language such as Java outweighed the advantages of speed provided by a direct machine-compiled language like
C++. This was the same conclusion reached by Percival (2007) in selecting C# over C++. Finally, Java matched the skill set
available in our organisation.
JAMSIM overview
5.1
JAMSIM is a unique synthesis of open source packages that provides an environment and set of features for the creation of
microsimulation models that are to be executed, manipulated and interrogated by non-technical, policy-oriented users. JAMSIM is
less a framework and more a loose coupling of a set of open source packages to provide a base set of functionalities for
microsimulation. Table 3 lists the packages JAMSIM combines and the functionalities they provide. These functionalities include a
user interface, file input, in-memory data set storage and transformation, access to statistical functions and graphing and table
output capabilities. In particular, JAMSIM combines the leading open source statistical package R and the Java-based Ascape,
one of the foremost agent-based modelling graphical tools. JAMSIM's loose coupling and open source heritage and development
make it an accessible, extensible toolkit for developers who want complete control over the code base. The source code and
binaries are freely available at http://code.google.com/p/jamsim/. This paper discusses features currently available in JAMSIM
v1.3.4.
Table 3: Third party Java packages used in JAMSIM
| Package | Description |
|---|---|
| Ascape | The Ascape agent-based modelling toolkit is used by JAMSIM as the GUI, for the main discrete-time based simulation loop, and for simulation control (stop, start, pause, etc.). It has been adapted to provide access to tabular and graphical outputs in a fashion modelled loosely on Modgen. |
| R | JAMSIM embeds the R statistical programming language which makes available the extensive statistical and graphing capabilities of R. Simulation objects, such as the agents, parameter files, and simulation outputs are available in R for analysis such as producing descriptive statistics. An R console is also embedded in the Ascape GUI for direct user interaction, and the Ascape GUI has been extended to display R graphics. |
| Casper datasets | A microsimulation model requires the manipulation of micro-data sets and additional parameter files. These data sets need to be modifiable and to have input and output interfaces. Data sets in JAMSIM are processed using Casper datasets, a generic, in-memory data set manipulation library. |
| Read My Tables | For the loading of tabular data stored in CSV or Excel files, such as the base file and parameter files. |
| Colt | JAMSIM provides conversion of microdata sets to matrices for manipulation in Colt. Colt is a high performance scientific and technical computing library for Java that provides multidimensional matrices, linear algebra, histogram and basic statistical capabilities. |
JAMSIM Models
6.1
This section describes the types of microsimulation models JAMSIM can be and has been used for. In particular, JAMSIM is
currently being used to implement policy-oriented health dynamic microsimulation models. Table 4 details the essential
components of a JAMSIM model (JSM), describing what they do and in which location (language) they may be implemented.
Table 4: Components making up a JAMSIM model (JSM)
| Component | Location | Description |
|---|---|---|
| RootScape | Java | Root component that defines, loads and sets up all other components. Responsible for the loading and initialisation processes. |
| ScapeData | Java | Defines all inputs used by the JSM, including the base file, data sets, data dictionary, parameter sets (including weighting calculators) and analysis menu commands. |
| MicroSimCell | Java | Defines the population unit, or agent, that makes up the base file and is the micro unit of analysis, e.g. patient, child, household, etc. |
| Iteration steps | Java | Steps that transform the base file. These implement statistical techniques and models that move the base file from time step t to time step t+1. |
| Outputs | Java / R | The output tables and graphs that display simulation results. |
6.2
JAMSIM supports both static—providing a cross-sectional snapshot—and dynamic—incorporating change over time—models.
To project a static model forward in time, final simulated results can be reweighted to produce a distribution that matches that of a
future population. In a more realistic way, JAMSIM can dynamically model a series of state changes, or outcomes, for a set of
individuals. For example, if we were modelling the early life course of children these outcomes may be a change in a parental
characteristic (e.g. mother starts or stops smoking), a change in family status (e.g. parents break up or partner) or a health
status outcome (e.g. a visit to the doctor). JAMSIM is dynamic in that these changes are generated as part of a time sequence in
which the current state at time t is dependent on the previous state at time t-1.
6.3
JAMSIM was designed for modelling problems that involve discrete time state changes, rather than continuous time or case
based state changes. JAMSIM's core simulation process is a sequential loop in which state changes can be generated at fixed
intervals (iterations) simultaneously for all individuals. State changes may occur at each iteration for each individual according to
a set of transition probabilities and the individual's unique demographic attributes. Transition probabilities for individuals may be
specified via external tables.
6.4
After generating a unique transition probability for each individual, a Monte Carlo simulation is typically used to determine if a state
change will actually occur for an individual or not. This is done by drawing a random number on the interval [0,1] from a specified
distribution. The drawn random number is compared to the transition probability, and if the random number is less than the
transition probability then the transition occurs. While JAMSIM is aimed at simulation models that involve probabilistic state changes, its Ascape heritage allows
deterministic rule models to be used in much the same way as they are with ABMs.
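The Monte Carlo step can be sketched in a few lines of plain Java. The class and method names below are hypothetical and are not part of JAMSIM; the draw is assumed to be uniform, as produced by `java.util.Random.nextDouble()`, which returns values in [0, 1).

```java
import java.util.Random;

/** Hypothetical sketch of a Monte Carlo state-change decision. */
final class MonteCarloTransitionSketch {

    /** A transition occurs when the uniform draw falls below the transition probability. */
    static boolean transitionOccurs(double probability, double draw) {
        return draw < probability;
    }

    /** Counts how many of n units change state when all share one transition probability. */
    static int countTransitions(int n, double probability, Random rng) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (transitionOccurs(probability, rng.nextDouble())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Random rng = new Random(2012L);
        int changed = countTransitions(10_000, 0.25, rng);
        System.out.println("units changing state: " + changed + " of 10000");
    }
}
```

Because the draw lies in [0, 1), a probability of 0 never triggers a transition and a probability of 1 always does.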
6.5
The Ascape component of JAMSIM provides the ability to create not only closed but also open models, in which new population
members can be generated during the simulation. As in Ascape, JAMSIM models can be composed of a hierarchy of 'scapes',
each of which is a collection of agents that can represent basic units such as a child, parent, household, etc. (see Parker 2001). This
allows for models that have multiple levels of simulation and analysis, such as individual, family, household or subpopulation.
Such models may also accommodate specific linkages between individuals and/or different units, such as links between parents
and children, or individuals that are partnered.
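A multi-level structure of this kind might look like the sketch below. It is written in plain Java and deliberately does not use the Ascape API; the classes `Person`, `Household` and `Population` and their fields are hypothetical stand-ins for nested scapes and linked agents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a population / household / person hierarchy with links. */
final class ScapeHierarchySketch {

    /** Lowest-level unit, with optional links to a partner and to children. */
    static final class Person {
        final String id;
        final int age;
        Person partner;
        final List<Person> children = new ArrayList<>();

        Person(String id, int age) {
            this.id = id;
            this.age = age;
        }

        /** Records a symmetric partnership link between two people. */
        void partnerWith(Person other) {
            this.partner = other;
            other.partner = this;
        }
    }

    /** Mid-level collection: the people who live together. */
    static final class Household {
        final List<Person> members = new ArrayList<>();

        int countYoungerThan(int age) {
            int n = 0;
            for (Person p : members) {
                if (p.age < age) {
                    n++;
                }
            }
            return n;
        }
    }

    /** Top-level collection: the population is a collection of households. */
    static final class Population {
        final List<Household> households = new ArrayList<>();

        int size() {
            int n = 0;
            for (Household h : households) {
                n += h.members.size();
            }
            return n;
        }
    }

    public static void main(String[] args) {
        Person mother = new Person("p1", 34);
        Person father = new Person("p2", 36);
        Person child = new Person("p3", 4);
        mother.partnerWith(father);
        mother.children.add(child);
        father.children.add(child);

        Household household = new Household();
        household.members.add(mother);
        household.members.add(father);
        household.members.add(child);

        Population population = new Population();
        population.households.add(household);

        System.out.println("population size: " + population.size());
        System.out.println("children under 18 in household: " + household.countYoungerThan(18));
    }
}
```

Keeping links as object references lets an outcome for one unit (for example a child) read the current state of a linked unit (for example a parent) within the same iteration.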
6.6
JAMSIM has been used for the development of static health microsimulation models. In particular it has been used to model
pathways to primary care under scenarios of population ageing (Davis et al. 2010; Pearson et al. 2011). The implementation
involved a data set of 13,500 patients. From sets of discretely specified probability distributions, the simulation models recent
illness conditions for each patient, the number of times they might visit a general practitioner (GP), the primary diagnosis given by
the GP, and the associated GP activity (e.g. investigation, prescription, followup, referral).
JAMSIM Model Simulation Process
7.1
The JAMSIM model simulation process consists of setup, simulation and output phases. This section describes these phases
using the JAMSIM Example Model (JEM). JEM is an implementation of the Simpex model from Modgen (Statistics Canada
2011a). It is a fictitious and simplistic model that serves to illustrate key features of JAMSIM. JEM simulates the disability state of
a population of males and females. A disability state is either no disability, or mild, moderate or severe disability. The disability
state of each individual influences their earning capacity, which is the key output of the model.
Setup
7.2
A population in JAMSIM is represented by a collection of Java MicroSimCell objects. The initial state of a population can be
generated from parameters, or each individual unit may be pre-specified in a base file. When starting from a base file, each item
in the base file represents a single unit (e.g. child, patient, household) with a corresponding set of variables (e.g. gender, age,
income, visits to the GP). When a JSM starts, it loads a user-specified CSV or Excel file that represents this initial base file. The
variable data types can be determined automatically by inspection, or specified by a data set definition file. A variable data type
may be specified as "optional", which will allow missing values. For each row in the base file, a corresponding MicroSimCell is
created with the variables specified in the base file. After loading, the entire set of MicroSimCells is stored in R as a dataframe.
Figure 1. Base file display
Figure 1 exhibits the base file from JEM. The fabricated base file, which is not based on any real world data, contains the
variables age, alive, sex, weight, etc. Those variables that are currently empty (e.g. disabilityState.1) will be generated during
simulation.
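The base file loading described in 7.2 can be illustrated with a simplified sketch. It is not JAMSIM's loader (which relies on Read My Tables and Casper datasets): the class `BaseFileSketch` is hypothetical, each unit is held as a plain map rather than a MicroSimCell, types are inferred per cell rather than per variable, empty cells are treated as missing values, and the CSV is assumed to contain no quoted fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turn CSV text into one record per population unit. */
final class BaseFileSketch {

    /** Infers a value by inspection: empty is missing (null), then Integer, Double, String. */
    static Object parseValue(String raw) {
        String s = raw.trim();
        if (s.isEmpty()) {
            return null; // missing value, as allowed for "optional" variables
        }
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            // not an integer; try a floating point number next
        }
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            // not numeric; keep the text as it is
        }
        return s;
    }

    /** Reads a header row followed by one row per unit. */
    static List<Map<String, Object>> load(String csv) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, Object>> units = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].split(",", -1);
            Map<String, Object> unit = new LinkedHashMap<>();
            for (int j = 0; j < header.length; j++) {
                String raw = j < cells.length ? cells[j] : "";
                unit.put(header[j].trim(), parseValue(raw));
            }
            units.add(unit);
        }
        return units;
    }

    public static void main(String[] args) {
        // Fabricated rows using variable names shown in the JEM base file.
        String csv = "age,alive,sex,weight,disabilityState.1\n"
                + "30,true,F,1.0,\n"
                + "45,true,M,1.0,\n";
        for (Map<String, Object> unit : load(csv)) {
            System.out.println(unit);
        }
    }
}
```

The empty `disabilityState.1` cells are loaded as missing values, mirroring the variables in Figure 1 that are filled in during simulation.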
7.3
A JSM may rely on a range of external data, e.g. transition tables, statistical coefficients and intercepts, incidence rates,
adjustment factors, and event probabilities. These can all be supplied as CSV or Excel files, and loaded in and represented as
Casper data sets, matrices, and/or R dataframes. Once loaded they are available globally for use in both simulation and analysis.
Figure 2 shows a screenshot of the data sets loaded from external data sources by JEM to be used in the simulation. The
disability state transition probabilities shown below are used to model the change of disability state from year to year, and the
earnings scale is used to calculate the amount earned per year, by disability state.
Figure 2. Data sets displaying statistical model parameters
Before the simulation begins, there may be some pre-simulation calculations that need to take place. For example the base file
may need to be augmented with additional fields from data sets, or data sets may need to be replicated and adjusted to
accommodate seasonal variations. Such manipulation can be performed in either Java or R.
7.4
In particular the user may wish to test a particular scenario. This can be done by changing the weights applied to the results of
the simulation according to the desired proportions of a particular categorical variable, e.g. to see what the results would be if the
balance of gender proportions of the results were changed from 50-50 to 10-90. Figure 3 shows the user interface to change the
weightings of the 'sex' variable in order to test a different gender scenario in JEM. Alternatively it is possible to adjust continuous
and categorical variables directly in the base file. Continuous variables can be displayed in bands and an adjustment applied to
each band. Categorical variables can be adjusted to match desired proportions by reassigning the variable randomly or
according to propensity scores. At the end of the simulation, results will be displayed for both the baseline and the scenario
tested.
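One simple way to realise the reweighting scenario above is to give each unit a weight equal to the desired proportion of its category divided by the observed proportion, and then compare weighted and unweighted results. The sketch below is an assumption about how such a weighting could be computed, not a description of JAMSIM's weighting calculators; the class name and data are hypothetical, and every category present in the data is assumed to have a target proportion.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: reweight results to match target category proportions. */
final class ReweightSketch {

    /** Weight for each unit = target proportion / observed proportion of its category. */
    static double[] weights(String[] category, Map<String, Double> target) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : category) {
            counts.merge(c, 1, Integer::sum);
        }
        double[] w = new double[category.length];
        for (int i = 0; i < category.length; i++) {
            double observed = (double) counts.get(category[i]) / category.length;
            w[i] = target.get(category[i]) / observed;
        }
        return w;
    }

    /** Weighted mean of an outcome variable. */
    static double weightedMean(double[] values, double[] w) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < values.length; i++) {
            numerator += w[i] * values[i];
            denominator += w[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        String[] sex = {"F", "F", "M", "M"};        // observed 50-50 split
        double[] earnings = {10.0, 10.0, 20.0, 20.0}; // fabricated outcome values

        Map<String, Double> target = new HashMap<>();
        target.put("F", 0.1);                        // scenario: 10-90 female-male
        target.put("M", 0.9);

        double[] baseline = {1.0, 1.0, 1.0, 1.0};
        System.out.println("baseline mean: " + weightedMean(earnings, baseline));
        System.out.println("scenario mean: " + weightedMean(earnings, weights(sex, target)));
    }
}
```

With the fabricated data above the baseline mean is 15 and the scenario mean is approximately 19, which is the value obtained by weighting the group means 10 and 20 by 0.1 and 0.9.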
Figure 3. Reweighting results by the variable Sex
7.5
Finally, the user may wish to perform analyses on the base file before it is transformed by the simulation. This may involve
inspecting graphs specified by the JSM, or by user-specified R commands entered on the console. Figure 4 illustrates a
graphical analysis that shows how the distribution of gender has been changed to test the 10-90 female-male scenario in JEM.
Figure 4. Graphing variables in the base file before simulation
Simulation
7.6
The simulation process is Ascape-based and consists of two nested loops. The outer loop is a user-specified number of
simulation runs, and the inner loop is a JSM-specified number of iterations. An iteration represents a single discrete time period,
e.g. a year, and may be repeated any number of times, e.g. 5 iterations representing 5 years. Within each iteration a series of state changes or
outcomes can be modelled for a unit, e.g. individual or household. Outcomes within each iteration are ordered, so that dependent
outcomes are processed after independent outcomes. JAMSIM does not provide a declarative framework to define outcomes,
the way they are generated, or their dependency hierarchy. Instead outcomes are defined directly via Java code and so their
order is implicit.
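The nested loop structure, and the way statement order fixes the order of outcomes, can be sketched in plain Java as follows. This is an illustration only: it does not use the Ascape API, and the Unit class is a hypothetical stand-in for a simulated unit with two invented outcomes.

```java
import java.util.ArrayList;
import java.util.List;

class NestedLoopSketch {
    /** Hypothetical stand-in for a simulated unit; not the JAMSIM MicroSimCell class. */
    static class Unit {
        final int initialAge;
        int age;
        boolean adult;

        Unit(int initialAge) {
            this.initialAge = initialAge;
            reset();
        }

        /** Restores the starting state before a new run. */
        void reset() {
            age = initialAge;
            adult = age >= 18;
        }

        /** One discrete time period. Statement order fixes the outcome order. */
        void iterate() {
            age = age + 1;        // independent outcome first
            adult = age >= 18;    // dependent outcome second, uses the new age
        }
    }

    /** Runs the nested loops and returns the number of adults at the end of each run. */
    static int[] simulate(List<Unit> units, int runs, int iterations) {
        int[] adultsPerRun = new int[runs];
        for (int run = 0; run < runs; run++) {            // outer loop: simulation runs
            for (Unit u : units) u.reset();
            for (int it = 0; it < iterations; it++) {     // inner loop: iterations
                for (Unit u : units) u.iterate();
            }
            int adults = 0;
            for (Unit u : units) if (u.adult) adults++;
            adultsPerRun[run] = adults;
        }
        return adultsPerRun;
    }

    public static void main(String[] args) {
        List<Unit> units = new ArrayList<>();
        units.add(new Unit(10));
        units.add(new Unit(15));
        units.add(new Unit(30));
        int[] result = simulate(units, 2, 5);
        System.out.println("Adults after run 1: " + result[0]);
    }
}
```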
7.7
The code to model outcomes is specified in Java relative to a single Java object (the MicroSimCell). Outcomes are generated for
all objects, although not all objects may have their state changed. An outcome is stored in a standard Java variable and generic
Java code can be used to generate the outcome. Typically, although not always, outcomes in JSMs are stochastic and occur via
a Monte Carlo draw from a set of probabilities. These probabilities are generated based on the unique attributes of the current
unit, and may be loaded from external tables of probabilities. As an example, in JEM the iteration begins by probabilistically
calculating the current disability state of an individual from discretely specified probability distributions based on their sex, age,
and current disability state. From the current disability state their annual income is determined by looking up a scale and adding
the result to their cumulative life total of earnings. Finally, whether an individual dies is calculated from a probability which varies
according to their sex and age.
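A Monte Carlo draw of this kind can be sketched as below. The sketch is a simplification under stated assumptions: the transition probabilities depend only on the current disability state (in JEM they also vary by sex and age), the death step is omitted, and all probabilities, earnings values and names are invented for illustration.

```java
import java.util.Random;

class MonteCarloDrawSketch {
    /**
     * Draws an index from a discrete distribution: returns the first index whose
     * cumulative probability exceeds a uniform random number in [0, 1).
     */
    static int draw(double[] probabilities, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            cumulative += probabilities[i];
            if (u < cumulative) return i;
        }
        // Guard against rounding when the probabilities sum to just under 1.
        return probabilities.length - 1;
    }

    public static void main(String[] args) {
        // Hypothetical transition matrix: rows are the current disability state,
        // columns the next state (0 = none, 1 = mild, 2 = severe).
        double[][] transition = {
            {0.90, 0.08, 0.02},
            {0.10, 0.75, 0.15},
            {0.00, 0.10, 0.90}
        };
        double[] earningsScale = {40000.0, 25000.0, 5000.0};  // invented earnings by state

        Random rng = new Random(42);
        int state = 0;
        double lifetimeEarnings = 0.0;
        for (int year = 0; year < 5; year++) {
            state = draw(transition[state], rng);       // stochastic outcome
            lifetimeEarnings += earningsScale[state];   // dependent outcome via lookup
        }
        System.out.println("state=" + state + " earnings=" + lifetimeEarnings);
    }
}
```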
7.8
Each outcome can be stored as an attribute of the unit, and thus may be used as an input to other outcomes in the current or
next iteration. In addition, an outcome may be stored in an outcome array as a series across all iterations. This preserves the
outcome's value in all previous iterations, rather than overwriting it on each iteration, so that it can be used to produce per-iteration results. After each iteration, the entire set of MicroSimCells is output to R as a dataframe. Output tables, showing
frequencies or means, and graphs, may be generated between iterations (in either Java or R) and displayed to provide in-progress results.
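The difference between an attribute that is overwritten and an outcome array that keeps the whole series can be sketched as follows. The Unit class and its field names are hypothetical, and the per-iteration mean stands in for the kind of table that might be produced from the stored series.

```java
import java.util.Arrays;

class OutcomeArraySketch {
    /** Hypothetical unit holding one outcome both as a current value and as a series. */
    static class Unit {
        double earnings;                      // current value, overwritten every iteration
        final double[] earningsByIteration;   // outcome array, one slot per iteration

        Unit(int iterations) {
            earningsByIteration = new double[iterations];
        }

        void record(int iteration, double value) {
            earnings = value;
            earningsByIteration[iteration] = value;
        }
    }

    /** Mean of the recorded outcome across units, one value per iteration. */
    static double[] meanByIteration(Unit[] units, int iterations) {
        double[] means = new double[iterations];
        for (int it = 0; it < iterations; it++) {
            double sum = 0.0;
            for (Unit u : units) sum += u.earningsByIteration[it];
            means[it] = sum / units.length;
        }
        return means;
    }

    public static void main(String[] args) {
        Unit a = new Unit(2);
        Unit b = new Unit(2);
        a.record(0, 10.0);
        a.record(1, 20.0);
        b.record(0, 30.0);
        b.record(1, 40.0);
        System.out.println(Arrays.toString(meanByIteration(new Unit[] {a, b}, 2)));
    }
}
```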
Outputs
7.9
A simulation run is a single run through all iterations. After a run, the set of MicroSimCells will be in their final state and will include
any outcome arrays containing results from all iterations. A set of run results may then be output and stored in R before the
MicroSimCells are reset and the run is repeated. At the end of all runs, the individual run results can be collated in R or Java and
the mean of results across all runs can be displayed in tables or graphs.
Figure 5. End of simulation output tables
7.10
Figure 5 shows the following single run table outputs from JEM:
Number of agents and people
A summary table which shows the total number of agents, or cases, simulated—in this case 1,000. These are
scaled up to a population size of 69,899,568 for both the base simulation and the scenario.
Population by gender
A breakdown of the population by gender, for both the base simulation and a scenario in which 10% of the
population are female and 90% are male.
Population age groups at death by gender (scenario)
A cross tabulation showing the age group at death by gender.
Population average age at death
The average age at death of the base and scenario populations.
Earnings summary (scenario)
A summary of total and average earnings for the scenario population.
Accumulated earnings (scenario)
The accumulated earnings per year by gender.
7.11
Likewise, output data can also be graphed using the functionality of R graphics. Figure 6 displays an age-sex pyramid, a bar
graph of earnings by age group and gender for the scenario population, and line graphs which can be used to compare baseline
and scenario accumulated earnings over the life course.
Figure 6. Output graphs including baseline comparison vs. scenario (weighted)
Limitations and future enhancements
8.1
JAMSIM has been built for models that run within the capacity and time constraints of a typical modern desktop machine. There
is no hard upper limit to population sizes or the numbers of variables that may be used—instead, the main restriction is the
amount of memory available. JAMSIM has been comfortably used on an Intel Core 2 Duo machine running Windows XP, with
4GB of RAM. As an example, in one simulation with a population of 80,000 units and 35 variables, the JAMSIM process running
on this machine consumed 250MB of memory. The simulation consisted of 4 iterations in which 10 transitions were calculated
every iteration. A single run took 9 seconds.
8.2
JAMSIM uses the same discrete-time simulation loop as Ascape. Because of this, it does not support some of the more
comprehensive discrete-event scheduling mechanisms. For example, JAMSIM does not have an event queue and cannot
generate events that occur on state changes. More generally, JAMSIM does not support continuous-time changes. Continuous-time allows state change events to occur at any time, rather than at a fixed time interval, and enables them to be triggered by
other changes. For example, a fertility event can be re-computed whenever a partnering event occurs, using the new status
immediately rather than waiting until the next cycle. This allows for more flexibility in the dependency of processes, and can
better approximate the real world (for more on this see Scott 2003:15). While continuous-time changes and the associated use of
survival functions have been prominent in some well-known microsimulation models and toolkits (e.g. Modgen), they have not been a
feature of the health-related models we have developed.
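For contrast, the kind of event queue that JAMSIM does not provide can be sketched in a few lines. This is a generic illustration of discrete-event scheduling, not code from JAMSIM, Ascape or Modgen; the event types, times and the fixed delay before the fertility event are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class EventQueueSketch {
    /** A timestamped event; ordering by time lets the queue return the earliest first. */
    static class Event implements Comparable<Event> {
        final double time;
        final String type;

        Event(double time, String type) {
            this.time = time;
            this.type = type;
        }

        @Override
        public int compareTo(Event other) {
            return Double.compare(time, other.time);
        }
    }

    /** Processes events in time order up to the horizon and returns them in the order handled. */
    static List<Event> run(double partneringTime, double fertilityDelay, double horizon) {
        PriorityQueue<Event> queue = new PriorityQueue<>();
        List<Event> handled = new ArrayList<>();
        queue.add(new Event(partneringTime, "partnering"));
        while (!queue.isEmpty() && queue.peek().time <= horizon) {
            Event e = queue.poll();
            handled.add(e);
            if (e.type.equals("partnering")) {
                // The state change immediately schedules a dependent event,
                // instead of waiting for the next fixed time step.
                queue.add(new Event(e.time + fertilityDelay, "fertility"));
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        for (Event e : run(2.5, 1.5, 10.0)) {
            System.out.println(e.type + " at t=" + e.time);
        }
    }
}
```

In a discrete-time loop the fertility outcome would instead be evaluated once per iteration, using whatever partnering status was current at that point in the cycle.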
8.3
JAMSIM does not provide a comprehensive set of domain level modules, e.g. marriage/partnering, migration, or labour market
modules. Again, this is largely because the types of health situations we have modelled have not required these modules. These
types of modules can be implemented in JAMSIM but the lack of any pre-existing code may make JAMSIM less appealing to
those performing more general life-cycle dynamic microsimulation. Nor does JAMSIM provide other functionality typical of these
types of simulations, in particular alignment functionality or a facility for behavioural feedback loops.
8.4
One of the key features of JAMSIM is the ability to test different scenarios. However, this is somewhat underdeveloped.
Currently it is possible to reweight results, or to change continuous and categorical variables on individuals in the base file to
produce a single scenario. Future development will include the ability to generate, save and load multiple scenarios, and to
compare them with the baseline and with one another.
8.5
JAMSIM lacks a declarative framework for model variables and instead they are specified directly as Java variables and R
objects. The problem this poses is that as the model grows it becomes difficult to keep track of all the instances where a variable
is used, which increases the time it takes to make model changes. The disadvantage of a declarative framework is that it can
add an extra level of runtime overhead. However, computing time is generally cheaper than programmer time and so future work
on JAMSIM will involve development of a declarative framework which will allow model variables to be parameterised and
specified in external parameter files rather than hard coded.
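One possible shape for externally specified parameters is sketched below using the standard java.util.Properties format. This is an assumption about how such a framework might look, not a description of planned JAMSIM functionality; the parameter names and values are invented, and the file content is supplied as a string so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ParameterFileSketch {
    /** Parses key=value parameter text; a real model would read this from a file. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Looks up a numeric parameter by name and fails loudly if it is missing. */
    static double number(Properties p, String name) {
        String value = p.getProperty(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing parameter: " + name);
        }
        return Double.parseDouble(value.trim());
    }

    public static void main(String[] args) {
        // Hypothetical parameter file content.
        String file = "iterations=5\nmortality.adjustment=0.95\n";
        Properties p = parse(file);
        System.out.println("iterations = " + (int) number(p, "iterations"));
        System.out.println("mortality.adjustment = " + number(p, "mortality.adjustment"));
    }
}
```

Because every parameter is looked up by name in one place, a change to a value or a search for its uses no longer requires tracing hard-coded variables through the model code.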
8.6
JAMSIM has been developed to be used by policy end-users, but data agreements may not allow such users access to the
original data set. JAMSIM does not provide base file encryption functionality, which would in any case only offer a moderate level
of protection as at some point the base file would need to be decrypted to be used. Instead the approach we plan to take is to
use a synthetic base file that does not represent any real individuals but has distributions of relevant characteristics that are
similar to a desired population. A safe alternative would be to only offer access to the application remotely in a secured
environment. This would require establishing and maintaining the appropriate server infrastructure.
8.7
While JAMSIM combines R and Java, model outcomes to date have only been implemented in Java. R has only been used for
results generation and graphical output. This underutilises the vast array of existing R code and packages that are useful for
modelling outcomes, for example, the ability to generate outcomes from logistic and other types of regression models. Many of
these R packages have underdeveloped corresponding open source packages in Java, or none at all, and replicating them
would require significant work. In addition, modelling outcomes in R makes the transition to microsimulation easier for statisticians
who may not be familiar with Java. For these reasons, development is currently underway to move the simulation loop and the
modelling of outcomes into R. The Java components will be retained and used specifically for the user interface, as existing R
user interfaces are relatively undeveloped and less attractive.
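To make concrete what generating an outcome from a logistic regression model involves, the following sketch applies already-estimated coefficients to a unit's covariates and then makes a Bernoulli draw. It is written in Java for consistency with the rest of the model code; fitting the model is not shown, and the coefficients, covariates and names are invented.

```java
import java.util.Random;

class LogisticOutcomeSketch {
    /** Probability of the outcome from a fitted logistic model: 1 / (1 + exp(-(b0 + b.x))). */
    static double probability(double intercept, double[] coefficients, double[] covariates) {
        double linearPredictor = intercept;
        for (int i = 0; i < coefficients.length; i++) {
            linearPredictor += coefficients[i] * covariates[i];
        }
        return 1.0 / (1.0 + Math.exp(-linearPredictor));
    }

    /** Bernoulli draw: true with probability p. */
    static boolean draw(double p, Random rng) {
        return rng.nextDouble() < p;
    }

    public static void main(String[] args) {
        // Hypothetical coefficients for age (in years) and sex (1 = female).
        double intercept = -3.0;
        double[] coefficients = {0.04, 0.5};
        double[] unit = {60.0, 1.0};

        double p = probability(intercept, coefficients, unit);
        boolean outcome = draw(p, new Random(7));
        System.out.println("p=" + p + " outcome=" + outcome);
    }
}
```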
8.8
Other future plans include the migration of JAMSIM from the Ascape Swing GUI to the alternative Eclipse Rich Client Platform
(RCP). The Eclipse RCP is a desktop application environment that provides a high-quality native looking GUI (McAffer and
Lemieux 2006). The transition to the Eclipse RCP is planned because the current Ascape user interface has some limitations in
terms of usability. For example, the Navigator tree used to access the components of a model is unordered and the hierarchy is
unintuitive. Furthermore, some of the Ascape agent-based modelling features are present but are not used by JAMSIM and ought
to be removed for a cleaner user interface. In contrast, an Eclipse-based user interface will allow more direct control over all user
interface components. In addition, Eclipse incorporates the robust OSGI component model which supports modular and
extensible plug-ins. This increases reusability of software components, contributing to reduced error rates and lower
development costs. Overall, the Eclipse RCP environment will make the development of additional output features, such as a
single scrollable window containing multiple tables and graphs, much easier.
8.9
Prototypes of these enhancements are currently under development and are being tested in a dynamic microsimulation model
that simulates health and education outcomes for children. The implementation uses a base file of 1,100 children derived from a
longitudinal study. For each individual it simulates child, parental and family factors and from these a set of final health and
education outcomes. Intermediate factors and final outcomes are generated from probabilities derived from binomial, negative
binomial and Poisson regression models. In addition, any of the variables used in the model can be altered to test a particular
scenario and its influence on outcomes.
Conclusion
9.1
JAMSIM combines relevant components from open source packages to provide an environment and features for the
development of dynamic discrete-time microsimulation models and their use by non-technical, policy-oriented users. It has been
designed to be as flexible as possible, and a major strength is its open source nature, which gives it the potential for further
enhancement by others in the modelling community.
Acknowledgements
The completed 'Primary Care in an Ageing Society' project was funded by the Health Research Council of New Zealand. The in-progress 'Modelling the Early Life Course' project is being funded by the Ministry of Science and Innovation. We thank all project
team members. We would also like to acknowledge the constant, ongoing and ever responsive support of Miles Parker during
the development of JAMSIM, and the very helpful comments of the anonymous reviewers of this paper. Finally, we are indebted
to Martin von Randow for his meticulous proof reading of the final draft of this paper.
References
BROWN, Laurie and Harding, Ann (2002), 'Social Modelling and Public Policy: Application of Microsimulation Modelling in
Australia', Journal of Artificial Societies and Social Simulation, 5 (4), 6 http://jasss.soc.surrey.ac.uk/5/4/6.html.
DAVIS, Peter, Lay-Yee, Roy, and Pearson, Janet (2010), 'Using micro-simulation to create a synthesised data set and test policy
options: the case of health service effects under demographic ageing', Health Policy, 97 (2), 267-74.
EVANDROU, Maria, et al. (2007), 'The SAGE Model : A Dynamic Microsimulation Population Model for Britain', in Anil Gupta and
Ann Harding (eds.), Modelling Our Future: Population Ageing, Health and Aged Care (Amsterdam, The Netherlands: Elsevier).
GAMPE, Jutta, et al. (2009), 'The Microsimulation Tool of the MicMac-Project', 2nd General Conference of the International
Microsimulation Association (Ottawa, Canada).
GILBERT, Nigel and Bankes, Steven (2002), 'Platforms and methods for agent-based modeling', Proceedings of the National
Academy of Sciences of the United States of America, 99 (3), 7197-8.
GILBERT, Nigel and Troitzsch, Klaus (2005), Simulation for the social scientist (2nd edn.; Maidenhead: Open University Press).
HIMMELSPACH, Jan and Uhrmacher, Adelinde M. (2007), 'Plug'n simulate', 40th Annual Simulation Symposium (Norfolk, Virginia,
USA: IEEE), 137-43.
HOLM, Einar, et al. (2007), 'SVERIGE', in Anil Gupta and Ann Harding (eds.), Modelling Our Future: Population Ageing, Health and
Aged Care (Amsterdam, The Netherlands: Elsevier).
HOLMER, Martin, Janney, Asa, and Cohen, Bob (2011), 'PENSIM Overview', (Office of Policy and Research, Employee Benefits
Security Administration, U.S. Department of Labor).
IMMERVOLL, Herwig, O'Donoghue, Cathal, and Sutherland, Holly (1999), 'An Introduction to EUROMOD', EUROMOD Working
Papers.
LAWSON, Tony (2008), 'Methods and Tools for the Microsimulation and Forecasting of Household Expenditure - A Review',
Technology and Social Change Working Papers.
LEOMBRUNI, Roberto and Richiardi, Matteo (2006), 'LABORsim: An Agent-Based Microsimulation of Labour Supply—An
Application to Italy', Computational Economics, 27 (1), 63-88.
MCAFFER, Jeff and Lemieux, Jean-Michel (2006), Eclipse Rich Client Platform: Designing, Coding, and Packaging Java™
Applications (Upper Saddle River, NJ: Addison-Wesley).
NIKOLAI, Cynthia and Madey, Gregory (2009), 'Tools of the trade: A survey of various agent based modeling platforms', Journal
of Artificial Societies and Social Simulation, 12 (2), http://jasss.soc.surrey.ac.uk/12/2/2.html.
NORTH, Michael (2011a), 'Repast Simphony', http://repast.sourceforge.net/repast_simphony.html, accessed 7 March.
NORTH, Michael (2011b), 'Repast 3', http://repast.sourceforge.net/repast_3/, accessed 7 March.
O'DONOGHUE, Cathal, Lennon, John, and Hynes, Stephen (2009), 'The Life-cycle Income Analysis Model (LIAM): a study of a
flexible dynamic microsimulation modelling computing framework', International Journal of Microsimulation, 2 (1), 16-31.
PARKER, Miles (2001), 'What is Ascape and Why Should You Care?', Journal of Artificial Societies and Social Simulation, 4 (1), 5
http://jasss.soc.surrey.ac.uk/4/1/5.html.
PARKER, Miles (2011), 'Ascape', http://ascape.sourceforge.net/, accessed 7 March.
PEARSON, Janet, et al. (2011), 'Primary Care in an Aging Society: Building and Testing a Microsimulation Model for Policy
Purposes', Social Science Computer Review, 29 (1), 21-36.
PERCIVAL, Richard (2007), 'APPSIM-Software Selection and Data Structures', NATSEM Working Papers (Canberra).
R Development Core Team (2011), 'R: A Language and Environment for Statistical Computing', http://www.r-project.org/,
accessed 7 March.
RAILSBACK, Steven, Lytinen, Steven, and Jackson, Stephen (2006), 'Agent-based simulation platforms: Review and
development recommendations', Simulation, 82 (9), 609-09.
ROPELLA, Glen, Railsback, Steven, and Jackson, Stephen (2002), 'Software Engineering Considerations For Individual-Based
Models', Natural Resource Modeling, 15 (1), 5-22.
ROWE, Geoff and Gribble, Steve (2007), 'LifePaths Model', in Anil Gupta and Ann Harding (eds.), Modelling Our Future:
Population Ageing, Health and Aged Care (Amsterdam, The Netherlands: Elsevier).
SAUERBIER, Thomas (2002), 'UMDBS - A New Tool for Dynamic Microsimulation', Journal of Artificial Societies and Social
Simulation, 5 (2), 5 http://jasss.soc.surrey.ac.uk/5/2/5.html.
SCOTT, Anne (2003), 'A computing strategy for SAGE: 2. Programming considerations', (London: Citeseer).
SONNESSA, Michele (2003), 'JAS: Java Agent-based Simulation library. An open framework for algorithm-intensive simulations',
Workshop on Industrial and Labor Dynamics - The Agent-Based Computational Approach (Gandolfi 1999 edn.; Torino, Italy: World
Scientific Publishing Co. Pte. Ltd.).
SONNESSA, Michele (2011), 'JAS: Java Agent-based Simulation Library', http://jaslibrary.sourceforge.net/, accessed 7 March.
STATISTICS CANADA (2011a), 'Modgen (Model generator)', http://www.statcan.gc.ca/microsimulation/modgen/modgen-eng.htm,
accessed 7 March.
STATISTICS CANADA (2011b), 'Microsimulation approaches', http://www.statcan.gc.ca/microsimulation/modgen/new-nouveau/chap2/chap2-eng.htm, accessed 7 March.
SUTHERLAND, Holly (2007), 'EUROMOD - The Tax-Benefit Microsimulation Model for the European Union', in Anil Gupta and
Ann Harding (eds.), Modelling Our Future: Population Ageing, Health and Aged Care (Amsterdam, The Netherlands: Elsevier).
SUTHERLAND, Holly, et al. (2008), 'Improving the Capacity and Usability of EUROMOD—Final Report', EUROMOD Working
Papers.
TOBIAS, Robert and Hofmann, Carole (2004), 'Evaluation of free Java-libraries for social-scientific agent based simulation',
Journal of Artificial Societies and Social Simulation, 7 (1), 6 http://jasss.soc.surrey.ac.uk/7/1/6.html.
UHRMACHER, Adelinde M., et al. (2011), 'JAMES II', http://wwwmosi.informatik.uni-rostock.de/mosi/projects/cosa/james-ii/,
accessed 7 March.
WITTENBERG, Raphael, et al. (2007), 'PSSRU Long-Term Care Finance Model and CARESIM: Two Linked UK Models of Long-Term Care for Older People', in Anil Gupta and Ann Harding (eds.), Modelling Our Future: Population Ageing, Health and Aged Care
(Amsterdam, The Netherlands: Elsevier).
XJ TECHNOLOGIES (2011), 'Anylogic Professional Edition', http://www.xjtek.com/anylogic/, accessed 7 March.
ZINN, Sabine, et al. (2009), 'MIC-CORE: A Tool for Microsimulation', Winter Simulation Conference (Austin, Texas, USA).