Morning Session: Provenance

From ALPS

Talks

  • Chair: Vito Scarola
  • Speakers
    • Vito Scarola
      • CCM <-> Hunters & gatherers
      • Reproducibility
      • What to keep for each algorithm? (create table in discussion)
      • VisTrails: storing information automatically
    • Juliana Freire
      • Trustworthiness of the results: assessment requires provenance
      • Reproducibility also requires storing at least software versions, or even virtual machines
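
The point about storing software versions could be sketched as follows. This is a minimal illustration in Python using only the standard library, not the mechanism VisTrails or ALPS actually uses; the function name `capture_environment` and the package list are hypothetical.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return interpreter, OS, and package versions as a plain dict.

    `packages` is a list of distribution names chosen by the caller
    (an assumption for this sketch; the notes do not prescribe a list).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed: record that explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; replace with whatever the simulation imports.
    print(json.dumps(capture_environment(["numpy", "scipy"]), indent=2))
```

Storing this dictionary next to the results covers the "at least software versions" case; a virtual machine image would be needed for anything the version strings do not capture.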

Discussion

  • Vito's ALPS example: Bose-Hubbard model
  • Idea: let the report be a paradigm for provenance in computational CM research
  • VisTrails allows documentation of boxes (the infrastructure is there); this can be done in the code
  • Provide examples in the report
    • Pointers to other potential provenance models
    • Make a real set of examples
  • Where does the logging take place?
    • Every execution is logged
    • Into files, databases, ...
  • Matthias: not too detailed at this stage
    • Coarse-grained view of the problem
  • Remote execution is possible
  • Integration with ALPS scheduler?
  • Two important questions: What do I have to do to reproduce the data? What do others have to do?
    • Having everything in the paper will make it messier
  • ALPS can have a standardized format; other people need not follow
  • Workflow need not know about intermediate files, checkpoints, etc.
    • How much the user wants to say remains in the user's own hands
  • Keeping the script is also a way of recording provenance
  • Providing the code?
    • In context with ALPS code: what if somebody screwed up using ALPS?
    • Have a licence: publications that use ALPS code must include the code
    • The focus is reproducibility
    • Will one want to make the latest discovery available?
      • Guifre: you will want to keep your small secrets
      • Matthias: should we stick to this paradigm?
        • Should we be forced to do that?
      • Rajiv: The person who wrote the code still has an advantage when applying it to other models
      • Two paradigms:
        • Publish nice, polished codes?
        • Ugly working codes?
      • Some journals already require source code
      • Public code will increase number of citations (significantly)
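
The discussion points "every execution is logged" into files and "keeping the script is also a way of recording provenance" could be combined roughly as below. This is a coarse-grained sketch under stated assumptions, not the VisTrails or ALPS logging format: the record layout, the function names `file_sha256` and `log_execution`, and the JSON-lines log file are all hypothetical.

```python
import hashlib
import json
import time
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_execution(log_path, script_path, parameters, outputs=()):
    """Append one provenance record per run to a JSON-lines file.

    The record ties together the exact script that ran (by hash),
    the input parameters, and hashes of any output files the user
    chooses to list. Intermediate files and checkpoints are left out
    unless the user names them.
    """
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "script": str(script_path),
        "script_sha256": file_sha256(script_path),
        "parameters": parameters,
        "outputs": {str(p): file_sha256(p) for p in outputs},
    }
    # Append mode: every execution adds a line, nothing is overwritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

A call such as `log_execution("provenance.jsonl", "run.py", {"U": 1.0})` (file names and parameter chosen for illustration) would append one line per run. A database could replace the file without changing what is recorded.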