Improvements to Pipeline database performance and scalability

What?

I'd like to split the ProcessInstance table to only have information common to both 'BATCH' and 'SCRIPT' processinstances.

This should significantly speed up row retrieval for a lot of operations where the other information is not needed (Especially because of the WorkingDir and LogFile columns), increase database performance

Things to do:

Make a BatchProcessInstance table with the following columns:

ProcessInstance
JobSite
JobID
SubmitDate (Maybe should just be in ProcessInstance even though Jython scriptlets will never use it)
CPUSecondsUsed
MemoryUsed
SwapUsed
ExecutionHost
WorkingDir
LogFile

This should nearly eliminate row migration in the ProcessInstance table which should really, really help index performance (especially with partitioning)
We can easily do a left outer join between the two tables to get the same thing as the processinstance table.

After that, things we should do:

Simplify ProcessInstance class in org.srs.pipeline.server.sql to correspond to the reduced ProcessInstance table
Create a new BatchProcessInstance class that corresponds to the join of ProcessInstance and BatchProcessInstance
We need to deal with Row migration/chaining due updating the logfile directory. Options are:
- PCTFREE should be tuned
- A temporary table to hold BatchProcessInstance data until we get LogDir, when we get that, insert it into BatchProcessInstance
- ?

This site has some interesting information: http://www.akadia.com/services/ora_chained_rows.html

Blog

Improvements to Pipeline database performance and scalability

What?

Things to do:

After that, things we should do: