ALPS using VisTrails
Running ALPS using VisTrails
Introduction to VisTrails
VisTrails is a provenance-enabled workflow system that lets one perform all steps, from data preparation to the final plot, in one graphical workflow system. An added advantage is that the final figures can link back to the workflow that created them, giving access to all details of the calculation that led to the final image. For a detailed introduction to VisTrails we refer to the documentation on the VisTrails page. Don't let the visualization examples there scare you off: VisTrails is far more than a visualization system for medical images.
For a quick introduction we recommend our screencast: [1]
Running ALPS using VisTrails should be straightforward to learn from the tutorials, and especially from the simple workflows below.
The history view
The History view of VisTrails shows a history tree of all the modifications done to your workflows. An example of a workflow preparing a plot is shown here:
Important versions should be tagged with labels, and notes can be added to them. An example of a workflow tree is shown here:
All the workflow images on this page and the history tree are taken directly from the server, requesting the vistrail used in this tutorial. You can enlarge the images using your browser or click on them to download the workflow file or the vistrail file with the full history respectively.
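Conceptually, the history tree records every change as a new child version instead of overwriting the workflow, so earlier versions and branches stay reachable, and a tag is simply a label pointing at one version. The Python sketch below illustrates only that idea; the classes History and Version are hypothetical and are neither the VisTrails API nor its storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """One node of the history tree: a change applied on top of its parent."""
    vid: int
    parent: Optional[int]
    action: str


@dataclass
class History:
    # Version 0 is the empty root workflow.
    versions: dict = field(default_factory=lambda: {0: Version(0, None, "root")})
    tags: dict = field(default_factory=dict)   # label -> version id
    notes: dict = field(default_factory=dict)  # version id -> note text

    def change(self, parent: int, action: str) -> int:
        """Record a modification as a new child of `parent`; nothing is overwritten."""
        vid = max(self.versions) + 1
        self.versions[vid] = Version(vid, parent, action)
        return vid

    def tag(self, vid: int, label: str, note: str = "") -> None:
        """Attach a label (and optionally a note) to an important version."""
        self.tags[label] = vid
        if note:
            self.notes[vid] = note

    def actions(self, vid: int) -> list:
        """Return the chain of actions leading from the root to version `vid`."""
        chain = []
        while vid != 0:
            version = self.versions[vid]
            chain.append(version.action)
            vid = version.parent
        return chain[::-1]


history = History()
v1 = history.change(0, "add lattice and model modules")
v2 = history.change(v1, "set T = 2.0")
v3 = history.change(v1, "set T = 3.0")  # a branch: v2 is kept, not replaced
history.tag(v2, "low temperature", note="first test run")
print(history.actions(history.tags["low temperature"]))
```

Because versions are only ever appended, any tagged or untagged version can be reconstructed later by replaying its chain of actions.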
Below we will walk you through the various example workflows in this tree, which each illustrate some important steps in preparing a workflow for VisTrails.
Running a simulation and preparing a plot
A basic example
Using Python's matplotlib and the VisTrails spreadsheet
We will start with a full workflow that runs a simulation and creates a plot using Python's matplotlib library. In the workflow shown here
the top modules prepare the lattice, model and Monte Carlo parameters.
The module PrepareMonteCarlo collects all the parameters needed for a Monte Carlo simulation. The module WriteInputFiles then writes the ALPS XML input files, and RunSpinMC runs the ALPS spinmc application for classical Monte Carlo simulations.
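As a rough illustration of what collecting the parameters amounts to, the sketch below merges lattice, model and Monte Carlo settings into one parameter set per temperature. The function prepare_monte_carlo is hypothetical and not the module's implementation; the keys LATTICE, MODEL, L, T, SWEEPS and THERMALIZATION are standard ALPS parameter names, while the values are arbitrary examples. In a plain pyalps script, a list like this is what one would pass to pyalps.writeInputFiles.

```python
def prepare_monte_carlo(lattice, model, mc, temperatures):
    """Merge lattice, model and Monte Carlo settings into one parameter set per temperature."""
    parms = []
    for T in temperatures:
        p = {}
        p.update(lattice)
        p.update(model)
        p.update(mc)
        p["T"] = T  # the scanned parameter
        parms.append(p)
    return parms


# Example inputs (arbitrary values, for illustration only).
lattice = {"LATTICE": "square lattice", "L": 8}
model = {"MODEL": "Ising"}
mc = {"SWEEPS": 10000, "THERMALIZATION": 1000}
parms = prepare_monte_carlo(lattice, model, mc, [2.0, 2.5, 3.0])
```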
After running the simulation, we use a PersistentIntermediateDir module to store the output in a persistent storage archive. The advantage is that if we run the workflow again at a later time, the time-consuming simulation is not rerun; instead, the output files are taken from the archive. Since the persistent file and directory modules check only the upstream workflow, we can change the analysis part below this module arbitrarily without the simulation being rerun. Only changes to the upstream simulation parameters cause a new simulation to be performed.
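The caching rule can be sketched as a lookup keyed on a hash of the upstream parameters only, so edits to the analysis part never change the key. This is a simplified model with hypothetical names (PersistentCache, upstream_key, fake_simulation), not how the VisTrails persistence package is implemented.

```python
import hashlib
import json


def upstream_key(params: dict) -> str:
    """Hash only the upstream (simulation) parameters; analysis settings are not included."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class PersistentCache:
    def __init__(self):
        self.archive = {}  # key -> stored simulation output
        self.runs = 0      # how often the expensive simulation actually ran

    def get_or_run(self, params, simulate):
        key = upstream_key(params)
        if key not in self.archive:
            self.runs += 1
            self.archive[key] = simulate(params)
        return self.archive[key]


def fake_simulation(params):
    # Hypothetical stand-in for the expensive simulation step.
    return f"output for T={params['T']}"


cache = PersistentCache()
cache.get_or_run({"L": 8, "T": 2.0}, fake_simulation)
cache.get_or_run({"T": 2.0, "L": 8}, fake_simulation)  # same parameters: taken from the archive
cache.get_or_run({"L": 8, "T": 3.0}, fake_simulation)  # changed upstream parameter: runs again
```

Sorting the keys before hashing makes the key independent of the order in which parameters were set.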
In the evaluation part we first call GetResultFiles to find all ALPS result files in the simulation directory and then LoadAlpsMeasurements to load the simulation results. In CollectDataSets we collect the magnetization as a function of temperature, and then prepare a plot in PreparePlot, setting axis properties such as the minimum and maximum values of the x- and y-axes. Finally we show the plot using ShowMplPlot.
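Reduced to its core, the collection step turns a set of per-simulation results into one curve of an observable versus temperature. In the sketch below, the record layout (dicts with "parameters" and "measurements" keys) and the function collect_datasets are assumptions for illustration; the actual modules work on ALPS result files and DataSet objects.

```python
def collect_datasets(results, x="T", observable="|Magnetization|"):
    """Return the observable as (xs, ys), sorted by the parameter x."""
    points = sorted(
        (r["parameters"][x], r["measurements"][observable])
        for r in results
        if observable in r["measurements"]
    )
    xs = [point[0] for point in points]
    ys = [point[1] for point in points]
    return xs, ys


# Hypothetical loaded results, one per simulation (placeholder values, not real data).
results = [
    {"parameters": {"T": 3.0}, "measurements": {"|Magnetization|": 0.2}},
    {"parameters": {"T": 2.0}, "measurements": {"|Magnetization|": 0.9}},
]
xs, ys = collect_datasets(results)
```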
The final plot looks like this, embedded here directly from running the workflow using the VisTrails wiki extension:
Preparing a grace plot
Instead of, or in addition to, showing the plot as a matplotlib plot in the spreadsheet, we can also use xmgrace to display the plot in the Grace format. This requires two modules: WriteGraceFile and DisplayGracePlot. In addition, you need to set the xdisplay variable in the ALPS module configuration in VisTrails to point to the correct X11 display, and set the toolpath configuration variable to point to the directory containing your xmgrace application:
Alternatively, you can write the Grace file to disk using the FileSink module after WriteGraceFile. Make sure to set the path in the FileSink module to a location that you can write to before executing this workflow.
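For orientation, a Grace file is plain text: commands start with @, followed by the x y pairs of a data set and a closing &. The sketch below builds such a minimal file; grace_text is a hypothetical helper and is not meant to reproduce the output of WriteGraceFile.

```python
def grace_text(xs, ys, xlabel="T", ylabel="|Magnetization|"):
    """Build a minimal Grace (xmgrace) file containing one xy data set."""
    lines = [
        f'@xaxis label "{xlabel}"',
        f'@yaxis label "{ylabel}"',
        "@target G0.S0",  # the data below go to set S0 of graph G0
        "@type xy",
    ]
    lines += [f"{x} {y}" for x, y in zip(xs, ys)]
    lines.append("&")  # ends the data set
    return "\n".join(lines) + "\n"


text = grace_text([2.0, 3.0], [0.9, 0.2])
```

Saving text to a file with an .agr suffix (for example with pathlib.Path.write_text) gives a file that can be opened with xmgrace.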
Preparing a gnuplot plot
Alternatively, the WriteGnuplotFile and DisplayGnuplot modules can be used to create and display gnuplot plots. WriteGnuplotFile also takes an optional output file input, specifying a file into which the plot is written when the script is executed. A file suffix of .pdf or .eps automatically sets the PDF or EPS output format in the gnuplot script.
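The suffix rule can be pictured as a small lookup from file extension to gnuplot terminal. In the sketch below, gnuplot_script is a hypothetical helper, and which terminal names WriteGnuplotFile actually emits is an assumption; the sketch uses the standard gnuplot terminals pdf and postscript eps, and passes the data inline after plot '-', terminated by e.

```python
from pathlib import PurePath

# Assumed mapping from output-file suffix to gnuplot terminal.
TERMINALS = {".pdf": "pdf", ".eps": "postscript eps"}


def gnuplot_script(xs, ys, output=None, title="|Magnetization|"):
    """Build a gnuplot script plotting inline xy data, optionally writing to a file."""
    lines = []
    if output is not None:
        suffix = PurePath(output).suffix.lower()
        if suffix in TERMINALS:
            lines.append(f"set terminal {TERMINALS[suffix]}")
        lines.append(f"set output '{output}'")
    lines.append(f"plot '-' using 1:2 with linespoints title '{title}'")
    lines += [f"{x} {y}" for x, y in zip(xs, ys)]
    lines.append("e")  # terminates the inline data
    return "\n".join(lines) + "\n"


script = gnuplot_script([2.0, 3.0], [0.9, 0.2], output="magnetization.pdf")
```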
A more advanced example with post-processing
As a next step we want to create a more involved plot, showing the Binder cumulant ratio <Magnetization^2>/<|Magnetization|>^2. To do so we include a TransformGroupedDataSets module, which lets us create a new data set with the Binder cumulant for each result file. This module contains Python source code that creates a new dataset for the Binder cumulant and sets its value to the calculated ratio:
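# Select the datasets in d whose 'observable' property equals o
obschoose = lambda d, o: np.array(d)[np.nonzero([xx.props['observable'] == o for xx in d])]
# data holds the datasets loaded from one result file
magn2 = obschoose(data, 'Magnetization^2')[0]
magnabs = obschoose(data, '|Magnetization|')[0]
# New dataset, copying the properties of the input datasets
binder = DataSet()
binder.props = data[0].props.copy()
# Binder cumulant ratio <Magnetization^2>/<|Magnetization|>^2
binder.y = np.array([magn2.y[0] / (magnabs.y[0] * magnabs.y[0])])
binder.props['observable'] = 'Binder cumulant'
return binder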
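The module code divides already averaged measurements. To see what the ratio itself measures, the following NumPy sketch estimates <m^2>/<|m|>^2 directly from magnetization samples; binder_ratio is a hypothetical helper, not part of ALPS. If |m| is the same in every sample (perfect order), the ratio is exactly 1, while for Gaussian fluctuations around zero (the disordered phase) it approaches pi/2.

```python
import numpy as np


def binder_ratio(m):
    """Estimate <m^2> / <|m|>^2 from an array of magnetization samples."""
    m = np.asarray(m, dtype=float)
    return np.mean(m ** 2) / np.mean(np.abs(m)) ** 2


ordered = binder_ratio([1.0, -1.0, 1.0, -1.0])  # |m| constant: ratio is exactly 1
rng = np.random.default_rng(0)
disordered = binder_ratio(rng.normal(size=100_000))  # close to pi/2 for Gaussian samples
```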
Interoperating with command line or Python tools
We might not want to perform all parts of the simulation in VisTrails but instead, e.g., use VisTrails only to create input files and/or perform the data analysis. In this example we thus split the workflow into three separate parts: preparation of input files, running of the simulation, and analysis and plotting.
Preparing input files to run a simulation elsewhere
In the first workflow we want to create a directory containing the input files to the simulation. We do this by picking the top part of the workflow and storing the input files to a location in our home directory, using the DirectorySink module which copies the directory from a temporary location to the specified directory.
To use this workflow you will need to edit the inputs to the DirectorySink module to specify a suitable location on your disk.
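What DirectorySink does corresponds roughly to copying a directory tree from a temporary location to a permanent one. The sketch below uses shutil.copytree; directory_sink and its overwrite flag are hypothetical, and refusing to replace an existing target by default is a design choice of the sketch, not documented behavior of the module.

```python
import shutil
from pathlib import Path


def directory_sink(source_dir, target_dir, overwrite=False):
    """Copy a (temporary) directory of input files to a permanent location."""
    target = Path(target_dir).expanduser()
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists")
    # dirs_exist_ok (Python 3.8+) lets copytree write into an existing target.
    shutil.copytree(source_dir, target, dirs_exist_ok=overwrite)
    return target
```

A call such as directory_sink(tmp_dir, "~/alps/ising-inputs") would then leave a copy of the input files in the (hypothetical) directory ~/alps/ising-inputs.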
Running a simulation from input files prepared elsewhere
Although one will typically run a big simulation on a cluster or from the command line, for completeness we show here a workflow that takes input files from a directory on your disk and runs a simulation. To use the workflow, edit the File input module and choose the main XML input file for your simulation.
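On the command line, the equivalent step is a single call of the spinmc application on the main XML input file. The sketch below builds and prints that command; spinmc_command and run_spinmc are hypothetical helpers, the options --Tmin 10 and --write-xml follow their use in the ALPS command-line tutorials, and parm.in.xml stands in for your own input file. Actually running the command requires ALPS on your PATH.

```python
import shlex
import subprocess


def spinmc_command(input_file, tmin=10, write_xml=True):
    """Build the argument list for running spinmc on a main XML input file."""
    cmd = ["spinmc", "--Tmin", str(tmin)]
    if write_xml:
        cmd.append("--write-xml")
    cmd.append(str(input_file))
    return cmd


def run_spinmc(input_file):
    # Raises subprocess.CalledProcessError if spinmc exits with an error.
    subprocess.run(spinmc_command(input_file), check=True)


print(shlex.join(spinmc_command("parm.in.xml")))  # spinmc --Tmin 10 --write-xml parm.in.xml
```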
Loading results from simulations run elsewhere and analyzing them
Finally we want to load simulation results and perform the analysis and plotting. For this we just take the lower part of the workflow and use a Directory module to specify the location of the result files that should be analyzed. Again, the Directory module's inputs have to be modified to point to the directory on your disk containing the results.
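Pointing the Directory module at a folder of results amounts to listing the result files in it. In this sketch get_result_files is a hypothetical helper, and the default pattern *.out.xml is an assumption about how your result files are named; depending on the ALPS version and settings, results may instead be stored in files such as *.out.h5, so adjust the pattern accordingly.

```python
from pathlib import Path


def get_result_files(directory, pattern="*.out.xml"):
    """Return the result files in `directory` that match `pattern`, sorted by name."""
    return sorted(Path(directory).expanduser().glob(pattern))
```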
Using persistent directories and files
The final tutorial is about naming persistent results. Instead of just using persistent files and directories as anonymous persistent caches in workflows, one can name the results and use them in other workflows. To show this, we will split the workflow into two parts, one to run the simulation, and one to analyze the data in a separate workflow.
Naming a persistent directory or file
The first workflow performs the simulation and stores the results in a persistent directory. To name the outputs, edit the configuration of the persistent directory module. You will get a window that lets you choose to 'create a new reference'. Click that option and enter a name for the results.
Using a named persistent directory or file
In a second workflow we want to start from the results of the previous simulation and make a plot. To access the results we use a PersistentInputDirectory module. Editing its configuration, we select 'Use existing reference' and choose the directory with the name we used in the simulation workflow. Clicking on the triangle to the left of the name shows the various versions that have been created for this directory. Choosing the top level always picks the latest version; alternatively, one can select a specific version of the directory. As you can see, no data is overwritten or lost: the results are automatically versioned and archived.
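The behavior described here (named references whose contents are versioned rather than overwritten, with the top level resolving to the newest version) can be modelled in a few lines. NamedArchive and its methods are hypothetical names for illustration; this is not the VisTrails persistence API.

```python
class NamedArchive:
    """Named references whose contents are versioned, never overwritten."""

    def __init__(self):
        self._refs = {}  # name -> list of stored versions, oldest first

    def create_reference(self, name):
        if name in self._refs:
            raise KeyError(f"reference {name!r} already exists")
        self._refs[name] = []

    def store(self, name, data):
        """Append a new version and return its version number (0-based)."""
        self._refs[name].append(data)
        return len(self._refs[name]) - 1

    def versions(self, name):
        return list(range(len(self._refs[name])))

    def load(self, name, version=None):
        """version=None (the top level) picks the latest; otherwise a specific version."""
        stored = self._refs[name]
        return stored[-1] if version is None else stored[version]


archive = NamedArchive()
archive.create_reference("ising-results")
archive.store("ising-results", "first run")
archive.store("ising-results", "second run")
latest = archive.load("ising-results")
```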