Provenance Best Practices

From ALPS
Revision as of 08:34, 16 October 2013 by Dolfim (talk | contribs)

During the ETH Provenance Challenge we identified several best practices for producing provenance-rich scientific work.

Minimal requirements:

  • use version control for sources and scripts
    • commit often
    • write descriptive but concise commit messages
  • store the revision number/repository state
  • store input parameters (incl. random seeds) used to obtain the data
  • create a directory per figure containing relevant scripts
  • store the plotted numbers in an accompanying text file
  • upload raw output
  • describe the post-processing procedure that turns raw data into plotted values
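
As an illustration of the points above, here is a minimal sketch of how the plotted numbers could be saved together with the revision and the input parameters. The function name format_plot_data, the '#' header convention and the parameter names used in the usage note are assumptions made for this example, not part of any ALPS interface.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the content of the text file that accompanies
// a figure. Metadata goes into '#' comment lines, the data into two columns.
std::string format_plot_data(const std::string& revision,
                             const std::map<std::string, std::string>& params,
                             const std::vector<double>& x,
                             const std::vector<double>& y)
{
    assert(x.size() == y.size());  // one y value per x value

    std::ostringstream out;
    // Enough digits so that every double can be read back exactly.
    out.precision(std::numeric_limits<double>::max_digits10);

    out << "# revision: " << revision << '\n';
    // std::map iterates in key order, so the header is reproducible.
    for (const auto& kv : params)
        out << "# " << kv.first << " = " << kv.second << '\n';
    out << "# columns: x y\n";
    for (std::size_t i = 0; i < x.size(); ++i)
        out << x[i] << ' ' << y[i] << '\n';
    return out.str();
}
```

The returned string can be written to a file in the figure's directory. Passing the random seed as an ordinary entry of params (for example "seed" = "42") keeps it next to the other input parameters.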


Additional features:

  • store build information
    • store branch, revision number, build time and node
    • every data output should carry attributes from which this information can be recovered (e.g. headers of a text file, or attributes in HDF5)
  • store runtime settings
    • store command line arguments, runtime and node
  • link figures to evaluation scripts and data
    • given only the PDF figure, can you trace it back to the version of the code and the parameters used in the simulation?
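
The runtime settings listed above could be collected along the following lines. This is only a sketch: the helper names node_name, seconds_since and runtime_header are invented for this example, and gethostname is a POSIX call, so the code assumes a Unix-like system.

```cpp
#include <cassert>
#include <chrono>
#include <sstream>
#include <string>
#include <unistd.h>  // gethostname (POSIX only)

// Name of the machine the program runs on, or "unknown" if the lookup fails.
std::string node_name() {
    char buf[256] = {0};
    if (gethostname(buf, sizeof(buf) - 1) != 0) return "unknown";
    return std::string(buf);
}

// Wall-clock seconds elapsed since `start`.
double seconds_since(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - start).count();
}

// Formats command line, node and runtime as '#' header lines that can be
// prepended to any text output.
std::string runtime_header(int argc, const char* const* argv,
                           const std::string& node, double seconds) {
    assert(argc >= 0 && argv != nullptr);
    std::ostringstream out;
    out << "# command:";
    for (int i = 0; i < argc; ++i) out << ' ' << argv[i];
    out << "\n# node: " << node << "\n# runtime_s: " << seconds << '\n';
    return out.str();
}
```

In a simulation one would take std::chrono::steady_clock::now() at the start of main and, just before writing the results, call runtime_header(argc, argv, node_name(), seconds_since(start)).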

Compiling code with provenance from Git repository

This example shows how to embed Git repository information, such as the branch and revision, into your code. The same approach ports easily to CMake and Subversion.

Makefile:

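# Temporary file holding the one-line build stamp
BUILDHEADER = /tmp/buildheader.info
# Expanded by the shell when the recipe runs; yields a quoted C string literal
BUILDSTAMP = "\"`head -n 1 ${BUILDHEADER}`\""
FLAGS = -O3 -DBUILD_STAMP=${BUILDSTAMP}

.PHONY: buildheader

# Record build date, branch and commit hash on one line; the literal "NL" tokens separate the fields
buildheader:
	command -v git >/dev/null 2>&1 && echo "Build date:" `date +'%y.%m.%d %H:%M:%S'` "NL" "Branch:" `git rev-parse --abbrev-ref HEAD` "NL" "Hash:" `git rev-parse HEAD` > ${BUILDHEADER}

# Recipe lines must start with a tab character, not spaces
program: buildheader
	c++ ${FLAGS} -o program program.cpp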

program.cpp:

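#include <iostream>

// BUILD_STAMP is normally injected by the Makefile via -DBUILD_STAMP=...;
// fall back to a placeholder when compiling without it.
#ifndef BUILD_STAMP
#define BUILD_STAMP "unknown build"
#endif

int main() {
    std::cout << "Save the macro BUILD_STAMP with your data." << std::endl;
    std::cout << BUILD_STAMP << std::endl;
    return 0;
}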

Note: The example only uses <iostream> and a string-literal macro, so it should compile as C++03; C++11 should not be required.

Subversion revision number in CMake

CMakeLists.txt:

set(MYPROJECT_VERSION_BUILD "")
find_package(Subversion) 
if(Subversion_FOUND)
  # get the Subversion info
  Subversion_WC_INFO(${PROJECT_SOURCE_DIR} MYPROJECT)
  # (optional) extract the branch path from the full url
  string(REPLACE "${MYPROJECT_WC_ROOT}" "" MYPROJECT_BRANCH "${MYPROJECT_WC_URL}")
  # combine revision number and branch path
  set(MYPROJECT_VERSION_BUILD "r${MYPROJECT_WC_REVISION} (${MYPROJECT_BRANCH})")
endif(Subversion_FOUND) 
 
# configure a C++ header file. build revision will then be available as a macro.
configure_file(version.hpp.in ${CMAKE_BINARY_DIR}/version.hpp)

version.hpp.in:

#ifndef MYPROJECT_VERSION
#define MYPROJECT_VERSION
 
#cmakedefine MYPROJECT_VERSION_BUILD "@MYPROJECT_VERSION_BUILD@"
 
#endif
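
A possible way to consume the generated header from C++ is sketched below. Note that #cmakedefine leaves the macro undefined when the CMake variable is empty (for instance when Subversion was not found), so the sketch adds a fallback. The function name build_line and the "unknown" placeholder are assumptions for this example. It also assumes that the directory containing version.hpp (${CMAKE_BINARY_DIR} above) is on the include path, and it relies on __has_include (C++17, available earlier as a compiler extension) so that it still compiles when the header is absent.

```cpp
#include <string>

// Pull in the generated header if the compiler can find it.
#if defined(__has_include)
#  if __has_include("version.hpp")
#    include "version.hpp"
#  endif
#endif

// #cmakedefine leaves the macro undefined for an empty value, so make sure
// it always expands to some string.
#ifndef MYPROJECT_VERSION_BUILD
#define MYPROJECT_VERSION_BUILD "unknown"
#endif

// Header line that can be written at the top of every output file.
std::string build_line() {
    return std::string("# build: ") + MYPROJECT_VERSION_BUILD + "\n";
}
```

Writing build_line() at the top of each output file makes the revision recoverable from the data alone.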