Difference between revisions of "Developers:Workshops:Evaluation:Meeting Notes"

From ALPS

Revision as of 12:23, 4 November 2008

November 1st, Morning

What do we need in ALPS2 RC1

  • ALEA (Key Component for ALPS2)
    • Clean Up
    • Replace recording part with boost
    • Accumulators (mean, median, statistics, errors)
    • Serialization (storage to file)
    • Combining Data from several runs
  • RNG (Brigitte's & boost) [discuss in afternoon]
  • Parameter (later with expressions) [Lukas]
  • Lattice Implementations [copy from ALPS 1]
  • Scheduler [second priority]
  • XML Support (Wrapper for some library) [Lukas]
  • Hierarchical Structure for Parameters [discuss on Monday]
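
The ALEA items above (accumulators for mean and error, combining data from several runs) can be illustrated with a small sketch. This is not the ALPS or Boost design; the class name `MeanAccumulator` and its methods are hypothetical, and the error estimate assumes uncorrelated samples.

```python
import math


class MeanAccumulator:
    """Running mean and variance (Welford update), mergeable across runs."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        """Record one measurement."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Unbiased sample variance; needs at least two measurements."""
        return self._m2 / (self.count - 1)

    def error(self):
        """Standard error of the mean, assuming uncorrelated samples."""
        return math.sqrt(self.variance() / self.count)

    def merge(self, other):
        """Combine two independent runs without revisiting the raw data."""
        merged = MeanAccumulator()
        merged.count = self.count + other.count
        if merged.count == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.count / merged.count
        merged._m2 = (self._m2 + other._m2
                      + delta ** 2 * self.count * other.count / merged.count)
        return merged
```

Merging two accumulators gives the same mean and variance as feeding all samples into a single one, which is the property needed for combining runs.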

Discuss in the Afternoon

  • RNG
  • CMake
  • ALEA
  • Scheduler

Current Issues

  • Random Number Generators
  • Scheduler
  • SVN Repository for Development
  • Quickbook for Documentation
  • Build System => Move to CMake?
  • Provenance
  • Database


Decisions

  • Parameter classes will be rewritten by Lukas Gamper, Synge Todo, and Emanuel Gull
    • use runtime polymorphism to access the value/convert
    • hierarchical naming
    • make sure that it is written out in the same order as it was read
  • Parallel random number generators: Jeongnim Kim, Matthias Troyer and Brigitte Surer
    • improve seeding for WELL generator
    • make a parallel MT class
  • Build system: Synge Todo, Matthias Troyer, Jeongnim Kim, Peter Anders
    • Use NCSA CMake tools for ALPS 2, make it build using CMake
    • Contact Boost.CMake team regarding
      • Quickbook toolchain
      • Building Boost
      • Variants
  • rewrite Alea: Matthias Troyer, Emanuel Gull, Peter Anders
    • record numbers in a class based on Boost.Accumulators
    • write specific evaluation classes, that might later be moved to Boost
  • rewrite Lattice: Sergei Isakov, Matthias Troyer, Lukas Gamper
    • define concepts for building and accessing hypergraphs
    • then implement it
    • afterwards define the lattice descriptions in XML
  • scheduler: Synge Todo, Emanuel Gull, Matthias Troyer
    • flexible tree structure with abstract nodes and a common factory
      • nodes get a process group
    • expose it to the user at various levels in C++ and Python
    • use threading to respond to messages
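
As an illustration of the parameter decisions above (hierarchical naming, conversion on access, writing out in the order read), a minimal sketch might look like the following. The class `Parameters`, the `/` separator, and the `key = value` output format are assumptions made for this example, not the agreed ALPS interface.

```python
class Parameters:
    """Order-preserving parameter store with hierarchical names."""

    def __init__(self):
        # dict keeps insertion order (guaranteed since Python 3.7),
        # so parameters are written out in the order they were read
        self._values = {}

    def set(self, path, value):
        """Store a value as text under a name such as 'lattice/L'."""
        self._values[path] = str(value)

    def get(self, path, convert=str):
        """Return the value, converted at access time by `convert`."""
        return convert(self._values[path])

    def group(self, prefix):
        """Return the parameters below `prefix` as their own store."""
        start = prefix.rstrip("/") + "/"
        sub = Parameters()
        for key, value in self._values.items():
            if key.startswith(start):
                sub._values[key[len(start):]] = value
        return sub

    def write(self):
        """Serialize as 'name = value' lines in the original order."""
        return "\n".join(f"{k} = {v}" for k, v in self._values.items())
```

For example, after `p.set("lattice/L", 16)`, the call `p.get("lattice/L", int)` returns the integer 16, and `p.group("lattice")` exposes the same value under the short name `L`.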

November 2nd

November 3rd, Morning

Brainstorming on Monte Carlo file format

  • what do we want to store or calculate
    • timeseries (~10^2...10^7 samples), mean, error, autocorrelation (by binning or fitting), equilibration
    • binning in linear/log scales
    • typical sample size: 1...10^8
    • we want to filter, bin, or block the time series (size is determined by covariance matrices)
    • functions (moments, etc) with bootstrapping/jackknife (O(10^2)...?)
    • running means of time series
    • histogram of time series
    • mean vs cutoff
    • "movies"
    • Fourier transform (FFT)
    • functions on multiple timeseries (arithmetic operations such as <m^2>-<m>^2)
    • reweighting (store whole timeseries or microcanonical averages)
    • checking reweighting range (from histogram)
    • linear least-squares fitting on (un)correlated data
    • nonlinear least-squares fitting (e.g. L^(-x) (1+a L^(-y)))
    • MaxEnt for analytic continuations
  • complex observables
    • extracting real/imaginary part of timeseries
    • Do we need to store errors (i.e. covariance matrices) for complex observables? Currently, no.
  • microcanonical averages for multi-canonical sampling
    • store full timeseries or histograms
  • optimizer
  • for random systems?
    • combine data from clones
  • alea evaluator by Matthias Troyer
    • python wrapper of alea?
  • archive/retrieve
  • plotting
    • mean vs parameter
    • g(r) vs r
  • code validation
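
The "error and autocorrelation by binning" item in the list above can be sketched as follows. This is a generic binning analysis, not the ALEA implementation; the function names, the `min_bins` cutoff, and the AR(1) test-data generator are all assumptions made for illustration.

```python
import math
import random


def binning_errors(series, min_bins=32):
    """Return [(bin_size, error_of_mean), ...] for bin sizes 1, 2, 4, ..."""
    data = list(series)
    results = []
    bin_size = 1
    while len(data) >= min_bins:
        n = len(data)
        mean = sum(data) / n
        var = sum((x - mean) ** 2 for x in data) / (n - 1)
        results.append((bin_size, math.sqrt(var / n)))
        # average neighbouring pairs; an unpaired last value is dropped
        data = [(data[i] + data[i + 1]) / 2 for i in range(0, n - 1, 2)]
        bin_size *= 2
    return results


def autocorrelation_time(results):
    """Estimate tau_int from the ratio of largest-bin to naive error."""
    naive = results[0][1]
    binned = results[-1][1]
    if naive == 0.0:
        return 0.0
    return 0.5 * ((binned / naive) ** 2 - 1.0)


def ar1_series(n, phi, seed):
    """Correlated test data: x_t = phi * x_(t-1) + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

For correlated data the error estimate grows with bin size until the bins are longer than the autocorrelation time, then levels off; the plateau value is the one to quote. If no plateau is reached, the time series is too short for a reliable error.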

November 3rd, Afternoon

  • HDF5 talk by Jeongnim Kim
    • HDF-EOS http://hdfeos.org/
    • HDF5 tools: HDFView, h5ls, h5dump
    • discussion about naming convention
  • Review of XML Schema from the last workshop by Matthias Troyer

November 4th, Morning

  • Summary of the last days
  • Program for today:
    • Find out most important operations on data
    • Define minimal set of data operations that tools should use
    • Basic provenance information that should be stored: who, when, program version, ...?
    • Fix timetable for first, small steps
  • Data provenance
    • What computer/node was the code run on? -> hostname
    • Date and time
    • Code (program name), code revision
    • Compiler information: name, version, options
    • Comment
    • Library information
    • Status: test run, production data, ...
    • Restart file (ALPS-specific)
  • ALPS currently stores hostname, user, date/time
  • All data should be stored with every simulation, not just in one master file
  • Discussion of possible data corruption issues: keep checkpoints? save to a copy? ...
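
A minimal sketch of the provenance and data-corruption points above: collect a provenance record, store it together with the data in every output file, and write via a temporary copy that replaces the target only once it is complete. The field names, the JSON layout, and both function names are assumptions for illustration; they do not describe what ALPS stores.

```python
import getpass
import json
import os
import platform
import socket
import tempfile
from datetime import datetime, timezone


def collect_provenance(program, revision, status="test run", comment=""):
    """Gather basic provenance: who, where, when, which code."""
    try:
        user = getpass.getuser()
    except Exception:  # no user name available in this environment
        user = "unknown"
    return {
        "hostname": socket.gethostname(),
        "user": user,
        "datetime": datetime.now(timezone.utc).isoformat(),
        "program": program,
        "revision": revision,
        # stand-in for compiler information in a compiled code
        "compiler": platform.python_compiler(),
        "status": status,
        "comment": comment,
    }


def save_with_provenance(path, data, provenance):
    """Write data plus provenance to a temporary copy, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump({"provenance": provenance, "data": data}, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # the old file stays intact until the new one is complete
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the program is interrupted mid-write, the previous file at `path` is still the last complete version, which addresses the "save to a copy" option from the discussion.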

Interesting plot package http://en.wikipedia.org/wiki/HippoDraw