Workshop on Monte Carlo data evaluation, archiving and provenance

November 1-4, 2008

As computer simulation programs become more complex and more costly, often running for many months on hundreds of CPUs, the question of validation becomes more important. With the growing complexity of the methods, algorithmic descriptions in research publications are no longer sufficient to reproduce simulation results, and even where the results could be reproduced, the effort required is becoming prohibitively large. This is a serious problem, since the scientific method relies on the reproducibility of results. The problem is compounded by the fact that raw simulation output is usually stored in proprietary formats known only to the graduate student performing the simulation, so it becomes inaccessible once the student graduates and moves on.

We thus need tools that enable and simplify the long-term archiving of data and the consistent and reproducible evaluation of results and their errors, as well as tools to record provenance information. Collaborative efforts are essential to achieve these goals. To follow up on a 2006 workshop on standard data formats and evaluation tools, we plan to hold another meeting this year. The discussions following the previous workshop, and the attempts made there at a standard data format, have shown that

  • there is a growing need for
    • simple standardized data formats for Monte Carlo data
    • efficient evaluation tools based on these formats
    • scalability to gigabyte-sized data sets
  • instead of attempting a unified data format for all data and provenance information, we need to
    • focus the efforts of the physics community on simpler data formats for Monte Carlo data
    • make use of tools developed in computer science departments to record process provenance information
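
As a rough illustration of the points above, and not a format proposed by the workshop, the following Python sketch stores raw Monte Carlo measurements together with an evaluated mean, a binning-based error estimate and a few provenance fields in a plain JSON record. All names here (`binned_error`, `make_record`, the field names and the example provenance values) are hypothetical, and the data are synthetic.

```python
import json
import math
import random
import statistics


def binned_error(samples, bin_size):
    """Standard error of the mean, estimated from averages over bins of length bin_size."""
    n_bins = len(samples) // bin_size
    if n_bins < 2:
        raise ValueError("need at least two complete bins")
    bins = [
        statistics.fmean(samples[i * bin_size:(i + 1) * bin_size])
        for i in range(n_bins)
    ]
    return statistics.stdev(bins) / math.sqrt(n_bins)


def make_record(name, samples, bin_size, provenance):
    """Bundle raw data, evaluated result and provenance in one plain dictionary."""
    # Only complete bins enter the evaluation, so mean and error use the same data.
    n_used = (len(samples) // bin_size) * bin_size
    return {
        "observable": name,
        "raw": list(samples),
        "evaluation": {
            "mean": statistics.fmean(samples[:n_used]),
            "error": binned_error(samples, bin_size),
            "bin_size": bin_size,
        },
        "provenance": dict(provenance),
    }


# Synthetic example data: uncorrelated Gaussian "measurements" (an assumption).
rng = random.Random(42)
samples = [rng.gauss(1.0, 0.5) for _ in range(1024)]
record = make_record(
    "energy", samples, bin_size=32,
    provenance={"code": "example-simulation", "version": "0.1", "seed": 42},
)

archived = json.dumps(record)    # plain text, readable without the original program
restored = json.loads(archived)  # anyone can re-run the evaluation from "raw"
```

Keeping the raw measurements next to the evaluated numbers means the error analysis can be repeated later with a different bin size. A text format such as JSON would not scale to gigabyte-sized data sets, so a real format would more likely be binary; the sketch only shows which pieces of information might travel together.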

We have invited Prof. Claudio Silva, an expert in computational provenance and developer of the VisTrails software, to join this workshop.


The workshop is sponsored by the Center for Theoretical Studies at ETH Zurich.


The workshop will be held November 1-4 in the information sciences building (HIT) on the Hönggerberg campus of ETH Zurich, Switzerland.

Travel to Zurich

Zurich is well connected to the national and international railway system, and timetables can be found online. The Hönggerberg campus can be reached by various buses; please check the local bus schedule.

The Hönggerberg campus is a 5-minute train ride plus a 10-minute bus ride from Zurich International Airport, and about 25 minutes from Zurich main station.

  • From the airport, take the train to 'Oerlikon' station, then take bus No. 80 from the bus stop 'Bahnhof Oerlikon Nord' and get off at the bus stop 'ETH Hönggerberg'.
  • From the main station, take tram No. 11 to 'Bucheggplatz', switch there to bus No. 69 and stay on board until its final stop at 'ETH Hönggerberg'. This is probably the shortest of the several routes between the main station and the Hönggerberg campus.

More detailed instructions on how to reach the Hönggerberg campus are available on the ETH webpage in German and in English.


Once we receive your travel details we will book a hotel room for you. Please let us know as soon as possible when you plan to arrive and depart.


Tentative Workshop Program

The style of the workshop will be informal, with presentations on the first day followed by a more discussion-centered program and work in groups on the following days. All talks and discussions will be held in room K 52 on floor K of the HIT building.

Saturday, November 1

  • Morning and early afternoon
    • 10: Arrival, Institute for Theoretical Physics on floor K of the HIT building
    • 10 - 12: Coffee and ALPS developer meeting: ALPS 2
    • 12 - 13: Lunch on ETH Hönggerberg campus
  • Afternoon
    • 13 - onward: Continuation of ALPS developer meeting: build system and framework for ALPS 2

Sunday, November 2

  • Morning and early afternoon
    • 9 - 9:30: Coffee
    • 9:30 - 11:30: Presentations by participants on the status of MC data evaluation, data formats and provenance in their groups
    • 11:30 - 12: Break
    • 12 - 13: Introduction to computational provenance by Prof. Claudio Silva
    • 13 - 14: Lunch on ETH Hönggerberg campus
  • Afternoon
    • 14 - 14:30: Coffee
    • 14:30 - 15:30: Introduction to VisTrails by Prof. Claudio Silva
    • 15:30 - 18:00: Discuss workflow in Monte Carlo simulations and the potential of using VisTrails or similar tools

Monday, November 3

  • Morning and early afternoon
    • 9 - 10: Short presentations on desirable tools and workflow for Monte Carlo data evaluation
    • 10 - 11: Discuss concrete tools for running, evaluating and archiving Monte Carlo simulations (assign project leaders)
    • 11 - 11:20: A review of past attempts at a common data format for MC simulations by Prof. Matthias Troyer
    • 11:20 - 12: Discuss minimal standard format for Monte Carlo data archiving
    • 12 - 13: Lunch on ETH Hönggerberg campus (Wok)
  • Afternoon
    • 13 - 13:30: Coffee
    • 13:30 - 15:00: Split into groups to discuss specific projects and goals

Tuesday, November 4

  • Morning and early afternoon
    • 9 - 10: Short presentations on ideas regarding essential provenance information (what should be recorded and what should not?)
    • 10 - 12: Discussion on provenance in MC simulations
      • What has to be stored with the raw or evaluated MC data?
      • What can be delegated to workflow provenance tools, e.g. VisTrails?
      • What tools already exist?
    • 12 - 13: Lunch on ETH Hönggerberg campus (Chemistry restaurant)
  • Afternoon
    • 13 - 13:30: Coffee
    • 13:30 - onward: Summary discussion
      • Finalize data format
      • List of projects and project leaders
      • Establish timetable


Participants (as of 02/10/2008)

  • Overseas
    • Jeongnim Kim (NCSA, Urbana, IL, USA)
    • Claudio Silva (University of Utah, Salt Lake City, USA)
    • Synge Todo (University of Tokyo, Japan)
    • Simon Trebst (Microsoft Station Q, Santa Barbara, USA), participating remotely in the afternoons via videoconferencing
  • Europe
    • Ulrich Schollwöck (RWTH Aachen, Germany)
    • Lars Bonnes (University of Stuttgart, Germany)
    • David Luitz (University of Wuerzburg, Germany)
    • Wolfhard Janke (Leipzig, Germany)
  • Switzerland
    • Matthias Troyer
    • Helmut Katzgraber
    • Philipp Werner
    • Emanuel Gull
    • Lukas Gamper
    • Vito Scarola
    • Peter Anders
    • Evgeny Kozik
    • Sergey Isakov
    • Brigitte Surer
    • Bela Bauer
    • Ruben Andrist
    • Christian May
    • Norbert Stoop
    • Tobias Kesselring
    • Eric Fehr
    • Hansjörg Seybold
    • Reza Mahmoodi Baram
    • Matthias Nyfeler
    • Wesley Petersen
    • Mauro Calderara

Meeting Notes