Tutorial:RunningSimulations


General overview

The data structures/files used in the workflow are illustrated in the following figure

Running simulations

In the ALPS library, simulations are based on the scheduler library, which lets you specify the parameters of your simulations, including several values for the same parameter (e.g. if you want to simulate a physical system at several temperatures). The scheduler library then starts a job for each parameter set, on either a serial or a parallel machine, and uses checkpoints to prevent data loss when machine walltime limits are exceeded. The scheduler library takes a job file, which lists a task file for each parameter set for which a Monte Carlo simulation is to be run. The job and task files are in XML format, following the schema at http://xml.comp-phys.org. The scheduler reads these files and writes the measured observables into the task files. An example job file could look like this:

<JOB>
 <OUTPUT file="parm.xml"/>
 <TASK status="new">
   <INPUT file="parm.task1.in.xml"/>
   <OUTPUT file="parm.task1.xml"/>
 </TASK>
 <TASK status="new">
   <INPUT file="parm.task2.in.xml"/>
   <OUTPUT file="parm.task2.xml"/>
 </TASK>
 <TASK status="new">
   <INPUT file="parm.task3.in.xml"/>
   <OUTPUT file="parm.task3.xml"/>
 </TASK> 
</JOB>

and an example task file like:

<SIMULATION>
 <PARAMETERS>
   <PARAMETER name="L">100</PARAMETER>
   <PARAMETER name="SWEEPS">10000</PARAMETER>
   <PARAMETER name="T">0.5</PARAMETER>
   <PARAMETER name="THERMALIZATION">100</PARAMETER>
   <PARAMETER name="WORK_FACTOR">SWEEPS * L</PARAMETER>
 </PARAMETERS> 
</SIMULATION>

Before a simulation starts, the task file just lists all simulation parameters. Afterwards results and checkpoint information will be added. See the schema documentation for more details.
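
To make the structure of the job file concrete, here is a minimal C++ sketch that writes out the same layout as the example job file for an arbitrary number of tasks. The function name make_job_xml is hypothetical and not part of ALPS; the file naming simply copies the pattern of the example above, and the prefix is inserted without any XML escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build the text of a job file listing `num_tasks` new tasks.
// File names copy the pattern of the example job file:
// <prefix>.task<i>.in.xml for input, <prefix>.task<i>.xml for output.
// Hypothetical helper for illustration; not an ALPS function.
std::string make_job_xml(const std::string& prefix, int num_tasks) {
  assert(num_tasks >= 0);
  std::ostringstream xml;
  xml << "<JOB>\n";
  xml << " <OUTPUT file=\"" << prefix << ".xml\"/>\n";
  for (int i = 1; i <= num_tasks; ++i) {
    xml << " <TASK status=\"new\">\n";
    xml << "   <INPUT file=\"" << prefix << ".task" << i << ".in.xml\"/>\n";
    xml << "   <OUTPUT file=\"" << prefix << ".task" << i << ".xml\"/>\n";
    xml << " </TASK>\n";
  }
  xml << "</JOB>\n";
  return xml.str();
}
```

Calling make_job_xml("parm", 3) yields the three TASK entries of the example job file. In practice you would not write this file by hand or by code of your own, since the parameter2xml tool generates it.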

Tools

Since the XML format of the job and task files is probably not something you want to deal with on a daily basis, the parameter2xml tool lets you specify the simulation parameters in a plain text file, which is then converted to the XML format for your convenience.

parameter2xml

The parameter2xml tool transforms a plain text parameter file into the above XML format, thereby creating the job file and all necessary task files. The parameter file consists of a number of parameter assignments of the form:

 MODEL="Ising";
 SWEEPS=1000;
 THERMALIZATION=100; 
 WORK_FACTOR=L*SWEEPS;
 { L=10; T=0.1; }
 { L=20; T=0.05; }

where each group of assignments inside a block of curly braces {...} defines the set of parameters for a single simulation. Assignments outside of a block of curly braces are valid globally for all simulations defined after the point of assignment. Strings are given in double quotes, as in "Ising".
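
The scoping rule can be illustrated with a small C++ sketch that expands a parameter text into one parameter set per block. This is only an illustration of the rule as described here, not the parser used by parameter2xml: the names ParameterSet and expand_parameter_text are hypothetical, values are kept as raw text (expressions such as L*SWEEPS are not evaluated), and semicolons or braces inside quoted strings are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

using ParameterSet = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
  std::size_t b = 0, e = s.size();
  while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
  while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
  return s.substr(b, e - b);
}

// Return one parameter set per {...} block. Each set starts from the
// global assignments seen so far and is then overridden by the
// assignments inside the block. Hypothetical helper, not ALPS code.
std::vector<ParameterSet> expand_parameter_text(const std::string& text) {
  std::vector<ParameterSet> simulations;
  ParameterSet globals, current;
  bool in_block = false;
  std::string statement;

  // Store a pending "NAME=VALUE" statement in the active scope.
  auto store = [&]() {
    const std::size_t eq = statement.find('=');
    if (eq != std::string::npos) {
      ParameterSet& target = in_block ? current : globals;
      target[trim(statement.substr(0, eq))] = trim(statement.substr(eq + 1));
    }
    statement.clear();
  };

  for (char c : text) {
    if (c == ';') {
      store();
    } else if (c == '{') {
      assert(!in_block);  // nested blocks are not supported in this sketch
      store();
      current = globals;  // a block inherits the globals defined so far
      in_block = true;
    } else if (c == '}') {
      assert(in_block);
      store();
      simulations.push_back(current);
      in_block = false;
    } else {
      statement += c;
    }
  }
  return simulations;
}
```

Applied to the parameter file above, the sketch returns two parameter sets. Both contain MODEL, SWEEPS, THERMALIZATION and WORK_FACTOR, and they differ in L and T.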

Two parameters have a special meaning:

Parameter | Default | Meaning
SEED | 0 | The random number seed used in the next Monte Carlo run created. After using a seed in the creation of a Monte Carlo run, this value gets incremented by one.
WORK_FACTOR | 1 | A factor by which the work that needs to be done for a simulation is multiplied in load balancing.
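
The SEED rule amounts to handing out consecutive seeds to the Monte Carlo runs in the order in which they are created. The following C++ sketch shows that bookkeeping only; assign_seeds is a hypothetical name, and the code is not taken from the scheduler.

```cpp
#include <cassert>
#include <vector>

// Return the seeds given to `num_runs` runs created one after another,
// starting from `first_seed` and incrementing by one after each run.
// Hypothetical helper for illustration; not an ALPS function.
std::vector<unsigned> assign_seeds(unsigned first_seed, int num_runs) {
  assert(num_runs >= 0);
  std::vector<unsigned> seeds;
  unsigned seed = first_seed;
  for (int run = 0; run < num_runs; ++run) {
    seeds.push_back(seed);  // this run uses the current value
    ++seed;                 // the next run created gets the next value
  }
  return seeds;
}
```

With the default SEED of 0, three runs would receive the seeds 0, 1 and 2.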


The syntax to invoke parameter2xml is:

 parameter2xml parameterfile [xmlfileprefix]

which converts the parameter file into a set of XML files whose names start with the prefix given as the optional second argument. If the second argument is omitted, the name of the parameter file is used as the prefix.


Invoking the program

Running the simulation on a serial machine

The simulation is started by first creating the job file, and then giving the name of the XML job file as argument to the program. In our example, the program is called my_program and the sequence for running it is:

 parameter2xml parm job 
 my_program  job.in.xml

The results will be stored in a file job.out.xml, which refers to the files job.task1.out.xml, job.task2.out.xml and job.task3.out.xml for the results of the three simulations.
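
If you want to check from your own code that a run has produced all of its result files, the naming convention just described is easy to reproduce. The C++ sketch below builds the list of expected names. The function expected_output_files is hypothetical, and the pattern is inferred only from the file names quoted in this section.

```cpp
#include <cassert>
#include <string>
#include <vector>

// List the result files expected for a job with the given prefix:
// <prefix>.out.xml followed by <prefix>.task<i>.out.xml for each task.
// Hypothetical helper; the pattern is taken from the names quoted above.
std::vector<std::string> expected_output_files(const std::string& prefix,
                                               int num_tasks) {
  assert(num_tasks >= 0);
  std::vector<std::string> files;
  files.push_back(prefix + ".out.xml");
  for (int i = 1; i <= num_tasks; ++i) {
    files.push_back(prefix + ".task" + std::to_string(i) + ".out.xml");
  }
  return files;
}
```

For the prefix job and three tasks this gives job.out.xml, job.task1.out.xml, job.task2.out.xml and job.task3.out.xml.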

Command line options

The program takes a number of command line options, to control the behavior of the scheduler:

Option | Default | Description
--time-limit timelimit | infinity | gives the time (in seconds) which the program should run before writing a final checkpoint and exiting.
--checkpoint-time checkpointtime | 1800 | gives the time (in seconds) after which the program should write a checkpoint.
--Tmin checkingtime | 60 | gives the minimum time (in seconds) which the scheduler waits before checking (again) whether a simulation is finished.
--Tmax checkingtime | 900 | gives the maximum time (in seconds) which the scheduler waits before checking (again) whether a simulation is finished.
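
To see how these options interact, the following C++ sketch models the timing they describe. It rests on two assumptions that go beyond the descriptions above: that checkpoints are written periodically every checkpoint-time seconds, and that the waiting time between checks is some estimate clamped to the range from Tmin to Tmax. The struct SchedulerOptions and both functions are hypothetical and do not reproduce the scheduler's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Defaults copied from the option list above (all times in seconds).
struct SchedulerOptions {
  double checkpoint_time = 1800.0;
  double tmin = 60.0;
  double tmax = 900.0;
};

// Times at which a run limited to `time_limit` seconds would write
// checkpoints, ASSUMING one checkpoint every checkpoint_time seconds
// plus the final checkpoint when the time limit is reached.
std::vector<double> checkpoint_times(double time_limit,
                                     const SchedulerOptions& opt) {
  assert(time_limit > 0.0 && opt.checkpoint_time > 0.0);
  std::vector<double> times;
  for (double t = opt.checkpoint_time; t < time_limit; t += opt.checkpoint_time) {
    times.push_back(t);
  }
  times.push_back(time_limit);  // final checkpoint before exiting
  return times;
}

// Waiting time before the next "is the simulation finished?" check,
// ASSUMING the scheduler clamps some estimate to [tmin, tmax].
double clamp_check_interval(double estimate, const SchedulerOptions& opt) {
  assert(opt.tmin <= opt.tmax);
  return std::min(std::max(estimate, opt.tmin), opt.tmax);
}
```

With the default options and a time limit of 4000 seconds, this model writes checkpoints at 1800, 3600 and 4000 seconds. The default time limit is infinity, in which case there is no final checkpoint time to compute and the first function does not apply.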

Running the simulation on a parallel machine

Running the simulation on a parallel machine is as easy as running it on a single machine. We will give the example using MPI. After starting the MPI environment (using e.g. lamboot for LAM MPI), you run the program in parallel using mpirun. In our example, to run it on four processes you do:

 parameter2xml parm job 
 mpirun -np 4 my_program job.in.xml

Command line options

In addition to the command line options for the sequential program there are two more for the parallel program:

Option | Default | Description
--Nmin numprocs | 1 | gives the minimum number of processes to assign to a simulation.
--Nmax numprocs | infinity | gives the maximum number of processes to assign to a simulation.

If there are more processors available than simulations, more than one Monte Carlo run will be started for each simulation.
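
The effect of Nmin and Nmax can be pictured with a simple distribution scheme, sketched below in C++. This is an assumed round-robin strategy chosen for illustration, not the load-balancing algorithm of the scheduler (which also takes WORK_FACTOR into account), and the function assign_processes is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Distribute `num_procs` processes over `num_sims` simulations so that
// every simulation that is started gets between nmin and nmax processes.
// ASSUMED strategy: give nmin to each simulation in order, then hand out
// the remaining processes one at a time, round-robin, up to nmax.
std::vector<int> assign_processes(int num_procs, int num_sims,
                                  int nmin, int nmax) {
  assert(num_procs >= 0 && num_sims >= 0 && nmin >= 1 && nmax >= nmin);
  std::vector<int> assigned(num_sims, 0);
  int remaining = num_procs;

  // First pass: start as many simulations as possible with nmin processes.
  for (int s = 0; s < num_sims && remaining >= nmin; ++s) {
    assigned[s] = nmin;
    remaining -= nmin;
  }

  // Second pass: spread the leftover processes over started simulations.
  bool progress = true;
  while (remaining > 0 && progress) {
    progress = false;
    for (int s = 0; s < num_sims && remaining > 0; ++s) {
      if (assigned[s] >= nmin && assigned[s] < nmax) {
        ++assigned[s];
        --remaining;
        progress = true;
      }
    }
  }
  return assigned;
}
```

In this model, four processes and two simulations give two processes per simulation, which corresponds to the case of more than one Monte Carlo run per simulation if each process carries one run. With two processes and three simulations, the third simulation has to wait.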

Analysing the results of a Monte Carlo simulation

During the Monte Carlo simulation, expectation values of a number of observables (specified and implemented in the simulation code) are measured and stored in the respective task files. The tools documented below archive the task files produced by a simulation and extract data from the task files or from the archive.


Tools

use_local_stylesheet

Viewing the XML output files in a browser such as Firefox depends on ALPS being installed on the same machine that the viewer runs on. If you want to view the files on a different computer than the one they were created on, you can convert them using:

  use_local_stylesheet example.task1.out.xml

After running the script, a file called ALPS.xsl can be found in the same directory, which you have to copy along with the XML file you want to view. This also provides a workaround for some buggy versions of Firefox that cannot access stylesheets at absolute paths.

convert2xml

The simulation output files only contain the collected measurements from all runs. Details about the individual Monte Carlo runs for each simulation can be obtained by converting the checkpoint files to XML, using the convert2xml tool, e.g.:

 convert2xml run-file

This will produce an XML file of the task, containing information extracted from this Monte Carlo run.

archivecat

The archivecat tool wraps the specified task files into an archive file.

 archivecat task-file [task-file [task-file ...]]

extracttext

The extracttext script can be used to extract data in the form of a plot from an archive or a set of task files. An input plot file in XML format specifies which observables should be extracted. For an example see below. The output format is plain text.

 extracttext plot-file archive-file 
 extracttext plot-file task-file [task-file [task-file ...]]

extractxmgr

The extractxmgr script works similarly to extracttext, but produces output in the xmgrace plot format.

 extractxmgr plot-file archive-file 
 extractxmgr plot-file task-file [task-file [task-file ...]]

extracthtml

The extracthtml script works similarly to extracttext, but produces output in HTML format.

 extracthtml plot-file archive-file 
 extracthtml plot-file task-file [task-file [task-file ...]]

Examples

An example plot file describing a plot of energy versus temperature for all system sizes calculated could look like this:

 <?xml version="1.0" encoding="UTF-8"?> 
 
 <plot name="Energy versus temperature for some model">
 
   <legend show="true"/>
   <xaxis label="Temperature" type="PARAMETER" name="T"/>
   <yaxis label="Energy"      type="SCALAR_AVERAGE"/>
 
   <for-each name="SystemSize"/>
 
   <constraint name="Energy"  type="SCALAR_AVERAGE" condition="<0" />
 
   <set label="start "/>
 
 </plot>

A number of constraints are used in this example to filter data from the archive. The first constraint, <for-each name="SystemSize"/>, describes a loop over all possible values of the specified parameter. In the given example multiple sets are generated, one for each system size found in the archive. The second constraint, <constraint name="Energy" type="SCALAR_AVERAGE" condition="<0" />, restricts the energy range to negative values. For more details and further examples go to the tool page.
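
The combined effect of the two constraints can be sketched in C++ as a grouping step followed by a filter. The sketch only mirrors the selection logic described above on plain in-memory data. The struct Measurement, the alias DataSet and the function build_plot_sets are hypothetical, and no ALPS archive is read.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// One data point as it might be found in an archive (hypothetical type).
struct Measurement {
  int system_size;
  double temperature;
  double energy;
};

// One plot set: (temperature, energy) pairs.
using DataSet = std::vector<std::pair<double, double>>;

// Mimic the plot file: one set per system size (for-each), keeping only
// points with negative energy (constraint with condition "<0").
std::map<int, DataSet> build_plot_sets(const std::vector<Measurement>& data) {
  std::map<int, DataSet> sets;
  for (const Measurement& m : data) {
    assert(m.system_size > 0);  // sanity check on the input
    if (m.energy < 0.0) {
      sets[m.system_size].emplace_back(m.temperature, m.energy);
    }
  }
  return sets;
}
```

Each entry of the returned map corresponds to one set in the plot, with temperature on the x axis and energy on the y axis.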

Evaluation of observables

Examples

The following example reads the expectation values of the particle number operators n and n^2 from the simulation of a bosonic Hubbard model, calculates the expectation value of the compressibility, and writes it back to the checkpoint.

#include <alps/scheduler.h>
#include <alps/alea.h>
 
void evaluate(const boost::filesystem::path& p, std::ostream& out) {
  alps::ProcessList nowhere;
  alps::scheduler::MCSimulation sim(nowhere,p);
 
  // read in parameters
  alps::Parameters parms=sim.get_parameters();
  double beta=parms.defined("beta") ? static_cast<double>(parms["beta"]) : (1./static_cast<double>(parms["T"]));             
 
  // determine compressibility
  alps::RealObsevaluator n  = sim.get_measurements()["Particle number"];
  alps::RealObsevaluator n2 = sim.get_measurements()["Particle number^2"];
  alps::RealObsevaluator kappa= beta*(n2 - n*n);  
  kappa.rename("Compressibility");
 
  // write compressibility back to checkpoint  
  sim << kappa;
  sim.checkpoint(p);
}
 
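int main(int argc, char** argv)
{
  // the file to evaluate is expected as the only command line argument
  if (argc != 2) {
    std::cerr << "Usage: " << argv[0] << " file\n";
    return 1;
  }
  alps::scheduler::BasicFactory<alps::scheduler::MCSimulation,alps::scheduler::DummyMCRun> factory;
  alps::scheduler::init(factory);
  boost::filesystem::path p(argv[1],boost::filesystem::native);
  evaluate(p,std::cout);
  return 0;
}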
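
The formula evaluated in the example is kappa = beta * (<n^2> - <n>^2). As a point of comparison, here is a standalone C++ sketch that computes the same quantity from a plain list of particle number samples, without any ALPS classes. The function compressibility is hypothetical, and it returns only a point estimate; it does not attempt the error estimation that one would want for real data.

```cpp
#include <cassert>
#include <vector>

// Point estimate of kappa = beta * (<n^2> - <n>^2) from raw samples of
// the particle number n. Hypothetical helper; no error bars are computed.
double compressibility(const std::vector<double>& n_samples, double beta) {
  assert(!n_samples.empty());
  double sum = 0.0;
  double sum_sq = 0.0;
  for (double n : n_samples) {
    sum += n;
    sum_sq += n * n;
  }
  const double mean = sum / n_samples.size();        // <n>
  const double mean_sq = sum_sq / n_samples.size();  // <n^2>
  return beta * (mean_sq - mean * mean);
}
```

For samples that are all equal the result is zero, as expected for a state without particle number fluctuations.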

Comment on random number generators

Whenever you use Monte Carlo simulations, you need to remember that you work with pseudo-random numbers. There is always a small chance that your application is the one that reveals a flaw in a pseudo-random number generator that has so far been considered good. Hence, as is standard practice for all high-accuracy Monte Carlo simulations, you should run a simulation with more than one random number generator if you strive for high accuracy. The RNG parameter of the simulation allows you to change the random number generator in order to validate your results.
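
The idea of cross-checking a result with a second generator can be tried out independently of ALPS. The C++ sketch below estimates pi by Monte Carlo integration with two different engines from the C++ standard library and compares the results. The engines std::mt19937 and std::ranlux48 are stand-ins chosen for illustration and say nothing about which generators the RNG parameter accepts. The function names and the seed value are hypothetical.

```cpp
#include <cassert>
#include <random>

// Estimate pi from the fraction of uniform random points in the unit
// square that fall inside the quarter circle of radius one.
template <class Engine>
double estimate_pi(Engine engine, int num_samples) {
  assert(num_samples > 0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  int inside = 0;
  for (int i = 0; i < num_samples; ++i) {
    const double x = uniform(engine);
    const double y = uniform(engine);
    if (x * x + y * y < 1.0) ++inside;
  }
  return 4.0 * inside / num_samples;
}

// Run the same estimate with two unrelated generators and report whether
// the results agree within `tolerance`.
bool generators_agree(int num_samples, double tolerance) {
  const double a = estimate_pi(std::mt19937(12345), num_samples);
  const double b = estimate_pi(std::ranlux48(12345), num_samples);
  const double diff = a > b ? a - b : b - a;
  return diff < tolerance;
}
```

The tolerance should be chosen well above the statistical error of the two estimates. A disagreement beyond that level is a reason to investigate the generators before trusting either result.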

© 2003-2008 by Simon Trebst and Synge Todo