Developers:Workshops:Evaluation:Meeting Notes

November 1st, Morning

What do we need in ALPS2 RC1

  • ALEA (Key Component for ALPS2)
    • Clean Up
    • Replace recording part with boost
    • Accumulators (mean, median, statistics, errors)
    • Serialization (storage to file)
    • Combining Data from several runs
  • RNG (Brigitte's & Boost) [discuss in afternoon]
  • Parameter (later with expressions) [Lukas]
  • Lattice Implementations [copy from alps 1]
  • Scheduler [second priority]
  • XML Support (Wrapper for some library) [Lukas]
  • Hierarchical Structure for Parameters [discuss on Monday]

Discuss in the Afternoon

  • RNG
  • CMake
  • ALEA
  • Scheduler

Current Issues

  • Random Number Generators
  • Scheduler
  • SVN Repository for Development
  • Quickbook for Documentation
  • Build System => Move to CMake?
  • Provenance
  • Database


  • Parameter classes will be rewritten by Lukas Gamper, Synge Todo, and Emanuel Gull
    • use runtime polymorphism to access the value/convert
    • hierarchical naming
    • make sure that it is written out in the same order as it was read
  • Parallel random number generators: Jeongnim Kim, Matthias Troyer and Brigitte Surer
    • improve seeding for Well generator
    • make a parallel MT class
  • Build system: Synge Todo, Matthias Troyer, Jeongnim Kim, Peter Anders
    • Use NCSA CMake tools for ALPS 2, make it build using CMake
    • Contact Boost.CMake team regarding
      • Quickbook toolchain
      • Building Boost
      • Variants
  • rewrite Alea: Matthias Troyer, Emanuel Gull, Peter Anders
    • record numbers in a class based on Boost.Accumulator
    • write specific evaluation classes, that might later be moved to Boost
  • rewrite Lattice: Sergei Isakov, Matthias Troyer, Lukas Gamper
    • define concepts for building and accessing hypergraphs
    • then implement it
    • afterwards define the lattice descriptions in XML
  • scheduler: Synge Todo, Emanuel Gull, Matthias Troyer
    • flexible tree structure with abstract nodes and a common factory
      • nodes get a process group
    • expose it to the user at various levels in C++ and Python
    • use threading to respond to messages
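The parameter requirements listed above (hierarchical naming, output in the same order as the input) can be illustrated with a minimal sketch. The class name `ordered_parameters` and its members are hypothetical and not part of ALPS. The sketch converts values through string streams, which is a simplification and not the runtime polymorphism mentioned in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a parameter store that remembers the order
// in which keys were first set.
class ordered_parameters {
public:
    // Set a value under a hierarchical key such as "lattice/L".
    void set(const std::string& key, const std::string& value) {
        assert(!key.empty());
        auto it = index_.find(key);
        if (it == index_.end()) {
            index_[key] = entries_.size();
            entries_.push_back(std::make_pair(key, value));
        } else {
            entries_[it->second].second = value;  // keep the original position
        }
    }

    // Convert the stored string to the requested type on access.
    template <typename T>
    T get(const std::string& key) const {
        auto it = index_.find(key);
        assert(it != index_.end());
        std::istringstream in(entries_[it->second].second);
        T result;
        in >> result;
        return result;
    }

    // Write all parameters in the order in which they were first set.
    std::string dump() const {
        std::ostringstream out;
        for (std::size_t i = 0; i < entries_.size(); ++i)
            out << entries_[i].first << " = " << entries_[i].second << "\n";
        return out.str();
    }

private:
    std::vector<std::pair<std::string, std::string> > entries_;  // insertion order
    std::map<std::string, std::size_t> index_;                   // key -> position
};
```

Overwriting an existing key keeps its position, so a file that is read, modified, and written back would keep its original ordering.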

November 2nd

November 3rd, Morning

Brainstorming on Monte Carlo file format

  • what do we want to store or calculate
    • timeseries (~10^2...10^7 samples), mean, error, autocorrelation (by binning or fitting), equilibration
    • binning in linear/log scales
    • typical sample size: 1...10^8
    • we want to filter, bin, or block the time series (size is determined by covariance matrices)
    • functions (moments, etc) with bootstrapping/jackknife (O(10^2)...?)
    • running means of time series
    • histogram of time series
    • mean vs cutoff
    • "movies"
    • Fourier transform (FFT)
    • functions on multiple timeseries (arithmetic operations such as <m^2>-<m>^2)
    • reweighting (store whole timeseries or microcanonical averages)
    • checking reweighting range (from histogram)
    • linear least-squares fitting on (un)correlated data
    • nonlinear least-squares fitting (e.g. L^(-x) (1+a L^(-y)))
    • MaxEnt for analytic continuations
  • complex observables
    • extracting real/imaginary part of timeseries
    • Do we need to store errors (i.e. covariance matrices) for complex observables? Currently, no.
  • microcanonical averages for multi-canonical sampling
    • store full timeseries or histograms
  • optimizer
  • for random systems?
    • combine data from clones
  • alea evaluator by Matthias Troyer
    • python wrapper of alea?
  • archive/retrieve
  • plotting
    • mean vs parameter
    • g(r) vs r
  • code validation
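The first items of the brainstorming list (mean, error, autocorrelation by binning) can be sketched as follows. This is only an illustration under stated assumptions: the names `binning_result` and `binning_analysis` are made up here, a single fixed bin size is used, and the autocorrelation time follows the convention tau = 0.5 * ((binned error / naive error)^2 - 1), which is one of several conventions in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct binning_result {
    double mean;          // mean over all samples
    double naive_error;   // standard error assuming uncorrelated samples
    double binned_error;  // standard error computed from the bin means
    double tau;           // 0.5 * ((binned_error / naive_error)^2 - 1)
};

// Mean and standard error of the mean of a sample.
static void mean_and_error(const std::vector<double>& x, double& mean, double& error) {
    assert(x.size() > 1);
    const double n = static_cast<double>(x.size());
    double sum = 0.0;
    for (double v : x) sum += v;
    mean = sum / n;
    double sq = 0.0;
    for (double v : x) sq += (v - mean) * (v - mean);
    error = std::sqrt(sq / (n - 1.0) / n);
}

// Binning analysis with one bin size; incomplete trailing bins are ignored
// for the binned error but still contribute to the mean.
binning_result binning_analysis(const std::vector<double>& series, std::size_t bin_size) {
    assert(bin_size > 0 && series.size() / bin_size > 1);
    binning_result r;
    mean_and_error(series, r.mean, r.naive_error);

    const std::size_t n_bins = series.size() / bin_size;
    std::vector<double> bins(n_bins, 0.0);
    for (std::size_t i = 0; i < n_bins; ++i) {
        for (std::size_t j = 0; j < bin_size; ++j) bins[i] += series[i * bin_size + j];
        bins[i] /= static_cast<double>(bin_size);
    }

    double bin_mean = 0.0;  // mean over complete bins, not returned
    mean_and_error(bins, bin_mean, r.binned_error);

    const double ratio = r.naive_error > 0.0 ? r.binned_error / r.naive_error : 1.0;
    r.tau = 0.5 * (ratio * ratio - 1.0);
    return r;
}
```

In practice one would repeat this for bin sizes growing on a log scale and look for the binned error to converge, which ties in with the "binning in linear/log scales" item.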

November 3rd, Afternoon

  • HDF5 talk by Jeongnim Kim
    • HDF-EOS
    • HDF5 tools: HDFView, h5ls, h5dump
    • discussion about naming convention
  • Review of XML Schema from the last workshop by Matthias Troyer

November 4th


  • Summary of the last days
  • Program for today:
    • Find out most important operations on data
    • Define minimal set of data operations that tools should use
    • Basic provenance information that should be stored: who, when, program version, ...?
    • Fix timetable for first, small steps
  • Data provenance
    • What computer/node was the code run on? -> hostname
    • Date and time
    • Code (program name), code revision
    • Compiler information: name, version, options
    • Comment
    • Library information
    • Status: test run, production data, ...
    • Restart file (ALPS-specific)
  • ALPS currently stores hostname, user, date/time
  • All data should be stored with every simulation, not just in one master file
  • Discussion of possible data corruption issues: keep checkpoints? save to a copy? ...
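The provenance fields listed above could be collected in a small record like the one below. This is a sketch, not the ALPS implementation: the struct `provenance` and the function `make_provenance` are hypothetical, `gethostname` assumes a POSIX system, and `__VERSION__` is a GCC/Clang macro that other compilers may not define. Compiler options, library information, and the restart file are left out.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <unistd.h>  // gethostname (POSIX only)

// Hypothetical provenance record following the list in the notes.
struct provenance {
    std::string hostname;   // computer/node the code ran on
    std::string timestamp;  // date and time (UTC, ISO 8601)
    std::string program;    // program name
    std::string revision;   // code revision
    std::string compiler;   // compiler name and version
    std::string status;     // e.g. "test run" or "production data"
    std::string comment;    // free-form comment
};

provenance make_provenance(const std::string& program,
                           const std::string& revision,
                           const std::string& status) {
    assert(!program.empty());
    provenance p;
    p.program = program;
    p.revision = revision;
    p.status = status;

    char host[256] = {0};
    if (gethostname(host, sizeof(host) - 1) == 0) p.hostname = host;

    const std::time_t now = std::time(nullptr);
    char buf[32];
    const std::tm* utc = std::gmtime(&now);
    if (utc && std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", utc) > 0)
        p.timestamp = buf;

#ifdef __VERSION__
    p.compiler = __VERSION__;  // defined by GCC and Clang
#else
    p.compiler = "unknown";
#endif
    return p;
}
```

Writing such a record into every simulation file, rather than into one master file, would match the requirement noted above.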

After coffee

  • Prioritization:
  1. Subversion repository, Wiki page: CSCS?
  2. mean, error, autocorrelation
  3. filter, plot, running means
  4. Bootstrapping, ...
  • Workflow management with the tools to be developed
  • Finalize XML schema
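Item 4 of the prioritization (bootstrapping and related resampling) can be illustrated with a jackknife estimate of a function on a time series, here <m^2>-<m>^2 from the brainstorming list. The names `estimate` and `jackknife_variance` are made up for this sketch, and the samples are assumed to be independent. For correlated Monte Carlo data one would presumably apply the same procedure to bin means instead of raw samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct estimate {
    double value;  // bias-corrected jackknife estimate
    double error;  // jackknife standard error
};

// Jackknife estimate of <m^2> - <m>^2, assuming independent samples.
estimate jackknife_variance(const std::vector<double>& m) {
    assert(m.size() > 1);
    const double n = static_cast<double>(m.size());

    double s1 = 0.0, s2 = 0.0;  // sums of m and m^2
    for (double x : m) { s1 += x; s2 += x * x; }
    const double full = s2 / n - (s1 / n) * (s1 / n);

    // Leave-one-out estimates, each computed from the remaining n-1 samples.
    std::vector<double> theta(m.size());
    double theta_mean = 0.0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        const double a = (s1 - m[i]) / (n - 1.0);
        const double b = (s2 - m[i] * m[i]) / (n - 1.0);
        theta[i] = b - a * a;
        theta_mean += theta[i];
    }
    theta_mean /= n;

    double sq = 0.0;
    for (double t : theta) sq += (t - theta_mean) * (t - theta_mean);

    estimate e;
    e.value = n * full - (n - 1.0) * theta_mean;
    e.error = std::sqrt((n - 1.0) / n * sq);
    return e;
}
```

For this particular function the bias correction reproduces the usual unbiased sample variance, which gives a convenient check of the code.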


  • questions on possible ALPS workflows and provenance in VisTrails
    • submitting jobs, retrieving result files to/from remote computers
    • use of plotting tools, such as xmgrace, etc
    • typical ALPS workflow
      • parameter2xml
      • run applications
      • extracttext
      • plot
    • Workflow element/node for each lattice (e.g. "chain lattice"), model ("spin"...), application ("loop"...), etc. in VisTrails
    • VisTrails provenance API for Fortran 77/90, C, C++

November 5th

HDF5 Implementation Discussion

Participants: Lukas Gamper, Jeongnim Kim, Emanuel Gull

We need to avoid 2GB XML files!! How should we implement yesterday's schema in HDF5?

The HDF5 Container:

h5container c("filename");
int a = c["/Parameters/Energy"];      // read a value
c[" ... "] = 5;                       // write a value
c["/Parameters/"];                    // group iterator
c["/Parameters/Energy/@Error"];       // access attributes

How do we create groups? stripping:

  • access /a/b/c/d?
  • take /a/b/c, check if it exists, otherwise create recursively.
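The path stripping described above can be sketched without HDF5 by using a `std::set` as a stand-in for the groups that already exist in the file. The function names `parent_groups` and `ensure_groups` are hypothetical. The sketch walks the path from the outermost group inwards instead of recursing, which creates the same groups.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Return all parent groups of an absolute path, outermost first:
// "/a/b/c/d" yields "/a", "/a/b", "/a/b/c".
std::vector<std::string> parent_groups(const std::string& path) {
    assert(!path.empty() && path[0] == '/');
    std::vector<std::string> groups;
    for (std::size_t pos = path.find('/', 1); pos != std::string::npos;
         pos = path.find('/', pos + 1))
        groups.push_back(path.substr(0, pos));
    return groups;
}

// Make sure every parent group of `path` exists in `existing`
// (a stand-in for the file). Returns the number of groups created.
std::size_t ensure_groups(std::set<std::string>& existing, const std::string& path) {
    std::size_t created = 0;
    for (const std::string& group : parent_groups(path))
        if (existing.insert(group).second) ++created;
    return created;
}
```

With the HDF5 C library, the existence check and the creation step would presumably map to `H5Lexists` and `H5Gcreate2`.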

We need to do multi-dimensional array type!!

  • operator = maps to set and get functions
  • specialization for each set and get function!!
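One way to let `operator=` and value access map to set and get functions is a proxy object returned by `operator[]`. The sketch below uses an in-memory `mock_container` (a hypothetical name) that stores values as strings. A real container would instead specialize or overload `set` and `get` per type to select the matching HDF5 datatype, and would not lose precision through stream formatting.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory stand-in for the HDF5 container.
class mock_container {
public:
    // Proxy returned by operator[]: assignment calls set, conversion calls get.
    class proxy {
    public:
        proxy(mock_container& c, const std::string& path) : c_(c), path_(path) {}

        template <typename T>
        proxy& operator=(const T& value) {
            c_.set(path_, value);
            return *this;
        }

        template <typename T>
        operator T() const {
            T value;
            c_.get(path_, value);
            return value;
        }

    private:
        mock_container& c_;
        std::string path_;
    };

    proxy operator[](const std::string& path) { return proxy(*this, path); }

    // Generic set/get; per-type specializations would go here.
    template <typename T>
    void set(const std::string& path, const T& value) {
        std::ostringstream out;
        out << value;
        data_[path] = out.str();
    }

    template <typename T>
    void get(const std::string& path, T& value) const {
        auto it = data_.find(path);
        assert(it != data_.end());  // the path must have been written before
        std::istringstream in(it->second);
        in >> value;
    }

private:
    std::map<std::string, std::string> data_;
};
```

With this, `c["/Parameters/L"] = 16;` ends up in `set` and `int l = c["/Parameters/L"];` ends up in `get`, mirroring the container usage shown earlier.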

We need an interface for writing just part of a vector and for resizing, e.g. for time series: I want to be able to write just the last part! Interface proposition:

e.g. for std::vector:


start to write v from index 1052
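The proposed interface itself is not recorded in these notes. As a rough illustration of the idea only, the hypothetical function `write_tail` below copies the elements of `v` from a given start index into a stored copy and resizes the stored data to match. A `std::vector` stands in for the dataset in the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Write only v[start..end) into `stored`, leaving stored[0..start) untouched.
// Afterwards `stored` has the same size as `v`.
void write_tail(std::vector<double>& stored, const std::vector<double>& v,
                std::size_t start) {
    // The part before `start` must already be present in the stored data.
    assert(start <= v.size() && start <= stored.size());
    stored.resize(v.size());
    std::copy(v.begin() + start, v.end(), stored.begin() + start);
}
```

In HDF5 terms this would presumably correspond to extending a chunked dataset (`H5Dset_extent`) and writing through a hyperslab selection (`H5Sselect_hyperslab`).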

for multidimensional vectors?

  • currently: specify the vector, give the shape, put it in as multidimensional array
  • multidimensional array wrapper?
  • consensus: ublas is lame, should we support it?

If we need more complex data structures, we need to be able to derive from the set function or specialize it: change the variables from private to protected.

check out test_repository/sandbox/gamperl/hdf5 for what we have

Do we want to copy the BLAS specifications for accessing matrices? LDA, size A? such that one does not need contiguous memory?

how about an interface set<uint D>(V,LDA,A,B)?
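To make the LDA idea concrete: in the BLAS convention a matrix is stored column-major, and the leading dimension is the distance in memory between the starts of consecutive columns, so a sub-matrix can be addressed without being contiguous. The sketch below packs such a matrix into a contiguous buffer, as a writer might do before storing it. The function name `pack_matrix` is hypothetical, and the meaning of the A and B arguments in the proposed `set` interface is not specified in the notes, so they are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a rows x cols matrix stored column-major with leading dimension `lda`
// (element (i, j) lives at a[i + j * lda]) into a contiguous column-major buffer.
std::vector<double> pack_matrix(const double* a, std::size_t rows,
                                std::size_t cols, std::size_t lda) {
    assert(a != nullptr && lda >= rows);
    std::vector<double> packed;
    packed.reserve(rows * cols);
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            packed.push_back(a[i + j * lda]);
    return packed;
}
```

When `lda == rows` the data is already contiguous and could be written directly without the copy.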

We can probably do it without boost (maybe group iterators...)

We need to have access to size information.

Generic information -> can do resize. We still need shape of data!

How about an interface with pointers? Convenient, but not really C++. Give a std::vector to the get function -> nothing to worry about. But what about multidimensional data?

set function should mirror get function.... we don't need to know about B...

container needs a flush function