Thoughts on pipeline post Data Handling Meeting

Created by Richard Dubois on Jan 16, 2005

Here are what I take to be the primary issues that need addressing in the next iteration of the pipeline:

High level

More easily debuggable code, written in a language with a debugger. We've decided that the bulk of the code that performs logic will be in Java. Perl (or Python) would be used for those functions closely tied to the Unix OS.

Mid level

The graph of processes that can be supported needs to be much richer than what is provided now:
- conditions for running a process need not be restricted to availability of datasets
- multiple processes should be able to work in parallel on a given dataset
- a process may depend on multiple datasets
- a task can take inputs from another task
- the graph should support versions of process runs and, if desired, follow up by rerunning the subsequent processes whose inputs now have an incremented version. This could allow reprocessing while maintaining the basic identity of the original task
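
To make the list above concrete, here is a minimal Java sketch of one way such a graph could be modelled. It is not a design decision from the meeting. Every name in it (`PipelineGraphSketch`, `Dataset`, `PipelineProcess`, `runUntilIdle`, and the example process names) is hypothetical. The sketch assumes an acyclic graph, one output per process, and integer version numbers. It runs processes sequentially in a single thread; processes that are ready in the same pass are the ones a real scheduler could dispatch in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only: every name here is hypothetical, not part of the existing pipeline.
class PipelineGraphSketch {

    /** A dataset, which may also be the output of another task. Version 0 means "not produced yet". */
    static final class Dataset {
        final String name;
        int version = 0;
        Dataset(String name) { this.name = name; }
    }

    /** A process reads any number of datasets and writes one dataset. */
    static final class PipelineProcess {
        final String name;
        final List<Dataset> inputs;
        final Dataset output;
        final Predicate<List<Dataset>> condition;  // arbitrary run condition, not only "inputs exist"
        private final Map<String, Integer> consumed = new HashMap<>();  // input versions used by the last run

        PipelineProcess(String name, List<Dataset> inputs, Dataset output,
                        Predicate<List<Dataset>> condition) {
            this.name = name;
            this.inputs = inputs;
            this.output = output;
            this.condition = condition;
        }

        /** Ready when the condition holds and some input is newer than what the last run consumed. */
        boolean isReady() {
            if (!condition.test(inputs)) {
                return false;
            }
            for (Dataset in : inputs) {
                if (in.version > consumed.getOrDefault(in.name, 0)) {
                    return true;
                }
            }
            return false;
        }

        /** Stand-in for real work: record the consumed input versions and bump the output version. */
        void run() {
            for (Dataset in : inputs) {
                consumed.put(in.name, in.version);
            }
            output.version++;
        }
    }

    /** The condition the current pipeline effectively uses: every input has been produced. */
    static final Predicate<List<Dataset>> ALL_INPUTS_AVAILABLE =
            ins -> ins.stream().allMatch(d -> d.version > 0);

    /** Runs ready processes until none is ready; returns "name@vN" entries in execution order. */
    static List<String> runUntilIdle(List<PipelineProcess> processes) {
        List<String> log = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (PipelineProcess p : processes) {
                if (p.isReady()) {
                    p.run();
                    log.add(p.name + "@v" + p.output.version);
                    progressed = true;
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Dataset raw = new Dataset("raw");
        Dataset calib = new Dataset("calib");
        Dataset reconOut = new Dataset("reconOut");
        Dataset monitorOut = new Dataset("monitorOut");
        Dataset summaryOut = new Dataset("summaryOut");
        List<PipelineProcess> graph = Arrays.asList(
                new PipelineProcess("reconstruct", Arrays.asList(raw, calib), reconOut, ALL_INPUTS_AVAILABLE),
                new PipelineProcess("monitor", Arrays.asList(raw), monitorOut, ALL_INPUTS_AVAILABLE),
                new PipelineProcess("summarize", Arrays.asList(reconOut), summaryOut, ALL_INPUTS_AVAILABLE));

        raw.version = 1;
        calib.version = 1;
        System.out.println(runUntilIdle(graph));  // prints [reconstruct@v1, monitor@v1, summarize@v1]

        calib.version = 2;                        // a new calibration arrives
        System.out.println(runUntilIdle(graph));  // prints [reconstruct@v2, summarize@v2]
    }
}
```

In this sketch, `reconstruct` depends on two datasets, `monitor` works on the same `raw` dataset independently of `reconstruct`, and `summarize` takes its input from another task. Bumping the version of `calib` reruns only the processes downstream of it. Each process keeps its identity and its output simply gains a new version, which is one possible reading of the reprocessing point above.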
