Provenance Best Practices

From ALPS

Revision as of 11:41, 15 October 2013

During the ETH Provenance Challenge we identified some best practices for producing provenance-rich scientific work.

Minimal requirements:

  • use version control for sources and scripts
  • commit often
  • store the revision number/repository state
  • create a directory per figure containing relevant scripts
  • store the numbers for the data in the plot in an accompanying text file
  • upload raw output
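The following is a minimal Python sketch of two of the items above: storing the repository state and storing the plotted numbers in an accompanying text file. It assumes the sources live in a git repository (the page does not prescribe a particular version-control system), and the function names `repository_state` and `write_plot_data` are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def repository_state(repo_dir="."):
    """Return the current git commit hash, with '-dirty' appended when there are
    uncommitted changes. Returns 'unknown' if git or the repository is unavailable."""
    try:
        rev = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
        status = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    # A non-empty porcelain status means the working tree differs from the commit.
    return rev + ("-dirty" if status.strip() else "")


def write_plot_data(path, xs, ys, revision, labels=("x", "y")):
    """Write the plotted numbers as whitespace-separated columns, preceded by
    '#' comment lines that record the repository state (hypothetical layout)."""
    lines = [f"# revision: {revision}", f"# columns: {labels[0]} {labels[1]}"]
    lines += [f"{x!r} {y!r}" for x, y in zip(xs, ys)]
    Path(path).write_text("\n".join(lines) + "\n")
```

In a per-figure directory, the plotting script could call `write_plot_data("data.txt", xs, ys, repository_state())` next to the figure it produces. Because the header lines start with `#`, tools such as `numpy.loadtxt` (whose default comment character is `#`) can still read the columns directly.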


Additional features:

  • store build information
    • store branch, revision number, build time and node.
    • every data output should carry attributes from which this information can be recovered (e.g. headers of text files, or attributes in HDF5 files)
  • store runtime settings
    • store the command-line arguments, run time and node
  • link figures to evaluation scripts and data
    • if you only have the PDF figure, can you trace it back to the version of the code and the parameters used in the simulation?
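The additional features above can be sketched in Python as follows: collect build and runtime information into one dictionary, write it as a comment header at the top of a text output file, and read it back later to answer the "can you go back" question. This is an illustration under assumptions, not a prescribed format: the names `collect_provenance`, `provenance_header` and `read_provenance`, the `# key: value` header layout, and the use of git are all hypothetical choices. For compiled simulation codes, branch, revision and build time would instead be captured when the code is built (C and C++ compilers, for example, predefine the `__DATE__` and `__TIME__` macros); for HDF5 output, the same dictionary could be attached through the `attrs` mapping of an h5py dataset after converting values to strings.

```python
import datetime
import json
import platform
import socket
import subprocess
import sys


def _git(args, repo_dir):
    """Run a git command and return its output, or 'unknown' on failure."""
    try:
        return subprocess.run(["git", *args], cwd=repo_dir, capture_output=True,
                              text=True, check=True).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def collect_provenance(argv=None, repo_dir="."):
    """Gather build and runtime information into a plain dict."""
    return {
        "branch": _git(["rev-parse", "--abbrev-ref", "HEAD"], repo_dir),
        "revision": _git(["rev-parse", "HEAD"], repo_dir),
        "command_line": list(sys.argv if argv is None else argv),
        "start_time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "node": socket.gethostname(),
        "python": platform.python_version(),
    }


def provenance_header(meta, comment="# "):
    """Render metadata as comment lines; values are JSON-encoded so lists survive."""
    return "".join(f"{comment}{key}: {json.dumps(value)}\n"
                   for key, value in meta.items())


def read_provenance(path, comment="# "):
    """Recover the metadata from the leading comment lines of an output file."""
    meta = {}
    with open(path) as fh:
        for line in fh:
            if not line.startswith(comment):
                break  # the header ends at the first data line
            key, _, value = line[len(comment):].partition(": ")
            meta[key] = json.loads(value)
    return meta
```

A simulation script would call `provenance_header(collect_provenance())` once at startup and write the result before any data lines; given only an output file, `read_provenance` then returns the branch, revision, command line, start time and node that produced it.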