Morning Session: Provenance
Talks
- Chair: Vito Scarola
- Speakers
- Vito Scarola
- CCM <-> Hunters & gatherers
- Reproducibility
- What to keep for each algorithm? (create table in discussion)
- VisTrails: storing information automatically
- Juliana Freire
- Trustworthiness of the results: assessment requires provenance
- Reproducibility also requires storing at least software versions, or even virtual machines
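The point about storing software versions can be illustrated with a minimal sketch. It assumes a Python-based workflow and uses only the standard library; the function name `capture_environment` and the choice of recorded fields are hypothetical and are not part of ALPS or VisTrails.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages=()):
    """Return a dict describing the interpreter, OS, and selected package versions.

    `packages` is a hypothetical list of distribution names the caller cares about.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of silently skipping them.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


def environment_as_json(packages=()):
    """Serialize the environment record so it can be stored next to the results."""
    return json.dumps(capture_environment(packages), indent=2, sort_keys=True)
```

Calling `environment_as_json(["numpy"])` would give a small JSON document that can be saved alongside simulation output. This covers only the "software versions" end of the spectrum; archiving a full virtual machine is a separate, heavier step.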
Discussion
- Vito's ALPS example: Bose-Hubbard model
- Idea: let the report serve as a paradigm for provenance in computational CM research
- VisTrails allows documentation of boxes (infrastructure is there); this can be done in the code
- Provide examples in the report
- Pointers to other potential provenance models
- Make a real set of examples
- Where does the logging take place?
- Every execution is logged
- Into files, databases, ...
- Matthias: not too detailed at this stage
- Coarse-grained view of the problem
- Remote execution is possible
- Integration with ALPS scheduler?
- Two important questions: What do I have to do, and what do others have to do, to reproduce data?
- Having everything in the paper will make it messier
- ALPS can have a standardized format; other people need not follow
- Workflow need not know about intermediate files, checkpoints, etc.
- How much the user wants to say remains in the user's own hands
- Keeping the script is also a way of recording provenance
- Providing the code?
- In context with ALPS code: what if somebody screwed up using ALPS?
- Have a licence: publications with ALPS code must include code
- The focus is reproducibility
- Will one want to make the latest discovery available?
- Guifre: you will want to keep your small secrets
- Matthias: should we stick to this paradigm?
- Should we be forced to do that?
- Rajiv: The person who writes the code still has an advantage when applying it to other models
- Two paradigms:
- Publish nice, polished codes?
- Ugly working codes?
- Some journals already require source code
- Public code will increase number of citations (significantly)
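Returning to the logging question raised in the discussion (every execution logged into files or databases, at a coarse-grained level, with the script itself kept as provenance), the following is a minimal sketch of one way this could look. It assumes a JSON Lines log file and a Python driver script. The names `log_execution` and `file_sha256` and the record layout are hypothetical; this is not how VisTrails or the ALPS scheduler record executions.

```python
import hashlib
import json
import time
from pathlib import Path


def file_sha256(path):
    """Hash a file's bytes so the exact script version can be identified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_execution(log_path, script_path, parameters, outputs=()):
    """Append one coarse-grained provenance record per run to a JSON Lines file.

    Intermediate files and checkpoints are deliberately left out; only the
    script, its hash, the input parameters, and the named outputs are recorded.
    """
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "script": str(script_path),
        "script_sha256": file_sha256(script_path),
        "parameters": parameters,
        "outputs": [str(p) for p in outputs],
    }
    # Append mode keeps earlier runs, so the file is a history of every execution.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

A run might then be recorded with `log_execution("provenance.jsonl", "run_simulation.py", {"L": 8}, ["result.h5"])`, where the file names and the parameter `L` are placeholders. How much detail goes into `parameters` stays with the user, and the same record could be written to a database instead of a file.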