Performance Priorities which involve a database change
- Splitting the ProcessInstance table into ProcessInstance and BatchProcessInstance
- This drastically reduces the size of the ProcessInstance table and prevents row migration
- Do we remove the fullpath from the logFile column, and assume it will always be under the WorkingDirectory?
- Partition the ProcessInstance table
- Partitioning requires inserting all rows into a new table.
- This may need to be modified to partition by reference to Stream.
- Partitioning by month seems to be the easiest way to eke out extra performance for current transactions
- Partitioning by month also theoretically boosts performance of the web interface, specifically the task.jsp page, if we allow users to limit ProcessInstance statistics to the last week or month(s), which currently isn't enabled.
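A hypothetical sketch of what monthly partitioning of ProcessInstance could look like with Oracle 11g interval partitioning (table and column names here are illustrative, not the real schema):

```sql
-- Hypothetical sketch: monthly interval partitioning for ProcessInstance.
-- Table and column names are illustrative.
CREATE TABLE PROCESSINSTANCE_BYMONTH (
    PROCESSINSTANCE NUMBER NOT NULL,
    STREAM          NUMBER NOT NULL,
    CREATEDATE      DATE   NOT NULL,
    CONSTRAINT PK_PI_BYMONTH PRIMARY KEY (PROCESSINSTANCE)
)
PARTITION BY RANGE (CREATEDATE)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
    -- One seed partition; Oracle creates later monthly partitions on demand.
    PARTITION P_SEED VALUES LESS THAN (TO_DATE('2014-01-01', 'YYYY-MM-DD'))
);

-- Queries need a date predicate so the optimizer can prune partitions:
SELECT COUNT(*)
  FROM PROCESSINSTANCE_BYMONTH
 WHERE CREATEDATE >= ADD_MONTHS(SYSDATE, -1);
```

This is the time-only variant: every query has to carry a time range or it will scan all partitions.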
- Partition the Stream
- Should probably be done before ProcessInstance partitioning
- Not sure what the best strategy is.
- One strategy is to partition based on Task, or a reference to a root-level task possibly. This gets a little messy.
- Second strategy is to partition based on RootStream. The problem with this is that it requires processing of the ancestor stream at table insertion.
- But maybe we could add the column to the current database, and start computing those values now. Then later, we can insert those into a new table with interval partitions
- Time-only partitioning requires all queries to specify a time range in order to limit search partitions
- Add a RootStream column to the Stream table, enable RootStream locking.
- Enables fast Stream Tree locking
- When a process instance acquires a lock, it will no longer lock its parent node (a stream). Instead, it will lock the Stream row whose primary key is the RootStream of the PI's parent stream node. This prevents deadlocks, because each change to a process instance's status serializes on the root and precludes concurrent modification of any other part of the tree.
- Enables the canceling of top-level streams. Without this, it's impossible to know if another database connection has a lock on a child node of a top level stream.
- To maintain full backward compatibility, the stream table will need to be modified to populate this column, which may take a while for all streams.
- Should speed up rolling back
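A sketch of what acquiring a RootStream lock could look like (column and bind names are illustrative):

```sql
-- Hypothetical sketch: before a ProcessInstance status change, lock the single
-- root Stream row rather than the immediate parent stream.
-- Column and bind names are illustrative.
SELECT S.STREAM
  FROM STREAM S
 WHERE S.STREAM = (SELECT P.ROOTSTREAM
                     FROM STREAM P
                    WHERE P.STREAM = :parentStream)
   FOR UPDATE;
```

Every writer in the same tree then serializes on one well-known row, so two sessions touching different branches of the same tree cannot take locks in conflicting orders.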
- Dead branch isLatest decrementing
- Currently, Stream Tree >= Stream Tree where isLatest = 1 >= Stream Latest Tree
- Proposal: When a branch is declared dead, all child nodes would be decremented by 1
- The relation then becomes ( Stream Tree where isLatest = 1 == Stream Latest Tree ), which eliminates the recursive queries needed to find the latest tree
- This requires more work when rolling back a stream, for instance, but may also be offset with the speed gained from a reduced query time.
- May greatly benefit status changes to a stream execution tree
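A sketch of what the decrement could look like when a branch is declared dead (assumes a populated ROOTSTREAM column; PARENTSTREAM and the bind names are illustrative):

```sql
-- Hypothetical sketch: mark a branch dead by decrementing isLatest for the
-- branch root and everything beneath it. PARENTSTREAM and the bind names
-- are illustrative.
UPDATE STREAM
   SET ISLATEST = ISLATEST - 1
 WHERE ROOTSTREAM = :rootStream
   AND STREAM IN (SELECT STREAM
                    FROM STREAM
                   START WITH STREAM = :deadBranchRoot
                 CONNECT BY PRIOR STREAM = PARENTSTREAM);
```

After this, "WHERE ISLATEST = 1" selects exactly the Stream Latest Tree with no recursion.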
Backwards Compatibility Issues
- Active streams during transition
- Recently active streams during transition (streams which may be rolled back within a day or so)
- Streams that haven't been active for a week.
With weekly/monthly partitioning, we can lock an older partition for a Stream, for example, and prevent modification by another process while we are backfilling computed values, say for isLatest decrementing.
Software Migration
- Continue to move as much code as possible from stored procedures back into the pipeline so we can support Postgres and SQLite. This was started in the last few releases, and we're continuing it.
- It also enables us to actually understand which operations are slow, and potentially modify them. Right now, stored procedures are a black box.
- In conjunction with the schema changes and partitioning, this should still provide an overall performance boost
- Move all database statements to Java 7 for readability (A branch with most of these changes already exists)
- Add lots of tests!
Testing
Database Migration Testing
- I need to be able to revert the changes I test for each of the Performance Priorities in the current database. This should probably be done with backup files of the current data on zglast-oracle03.
- This is more exploratory in nature. Once I'm able to decide on schema changes that are performant, I will write scripts to perform them. Right now, I'm using Alembic and SQLAlchemy to aid in these migrations. One major benefit of Alembic is that the database tables are defined in Python, and DBMS-specific SQL is automatically generated for every database system we plan to support.
Server Software Testing
First off, I would like to expand support to three databases going forward:
- Oracle 11g, possibly 12c in the future
- PostgreSQL 9.2+ (and by proxy, the commercial product EnterpriseDB)
- SQLite 3.8.3+ (3.8.3 adds recursive WITH statement support)
Now, a bit about Unit Testing and Integration Testing.
Unit testing should only test the code. Certain unit tests can and should be done against a database. Currently we use Oracle, but it could be convenient if, in addition to (or instead of) Oracle, we used SQLite, as it is file-based and flexible. Right now, the majority of unit tests only test the creation and uploading of a new Task. Many unit tests that aren't related to Task uploading are really some sort of integration test. Overall test coverage is pretty low and needs to be worked on.
Integration testing should test how certain parts of the code will interact with a database, and effectively simulate how the server would operate. Integration testing is loosely related to Database migration testing as well.
Integration testing is a lot harder than unit testing. Ideally, we would have some virtual machines and Jenkins integration that can bring up and tear down multiple virtual machines with a well-defined environment. Another option is Amazon, or some other VPS service, as they are well targeted at creating various environments easily. Integration testing should not be performed every time new code is checked in, but could run on a daily basis. There's a lot of urgency in this migration, but it's a very high priority for me to make sure that any changes I make to the server software for the migration DO NOT preclude potential support for Postgres/SQLite. The best way I can ensure this is with an adequate integration testing environment, but I don't believe I will be able to set up all of this testing infrastructure in a reasonable amount of time.
Caveats of these database systems:
SQLite has no row locking or partitioning.
PostgreSQL doesn't have friendly/automatic partitioning.
Oracle works best with CONNECT BY rather than recursive WITH statements.
SQLite is a bit more complicated to set up, as it requires a native library. The version we need is pretty new, and the Java support currently requires building by hand ( https://bitbucket.org/xerial/sqlite-jdbc ), but I was able to do this successfully.
Caveats of current systems:
Current Oracle installs do not use TIMESTAMP WITH TIME ZONE. This is not a problem as long as the server is in the same time zone (PST) as the database, which is the case. There are no plans to change this.
Additional Definitions
Task: A node which includes other nodes (sub-Tasks) or leaf nodes, which are of type Process
Process: A leaf node of a Task which defines a process that can be run either as a Batch job or internally to the pipeline server as a Jython scriptlet.
Stream: A stream is an execution instance node of a task.
ProcessInstance: An execution instance of a Process inside a Stream.
Stream Tree: The tree, with the root being a top-level stream, of all streams, including those in "dead" branches. The root of a dead branch is the first node where isLatest is not equal to 1. Each node is a Stream.
Stream Latest Tree: The tree, with the root being a top level stream, of all streams which are in the latest path.
Task Forest: All Stream Trees for a given root-level task.
Root Stream: The top-level stream for a given child, grandchild, etc. stream.
12 Comments
Brian Van Klaveren
Possible partitioning scheme for stream:
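(The snippet itself is not preserved here; a hypothetical reconstruction, with illustrative names, of a stream table hash-partitioned on its root stream:)

```sql
-- Hypothetical reconstruction of a stream partitioning scheme: hash-partition
-- on the root stream so each Stream Tree lands in a single partition.
-- Table and column names are illustrative.
CREATE TABLE STREAM_BY_ROOT (
    STREAM     NUMBER NOT NULL,
    ROOTSTREAM NUMBER NOT NULL,
    ISLATEST   NUMBER DEFAULT 1 NOT NULL,
    CONSTRAINT PK_STREAM_BY_ROOT PRIMARY KEY (STREAM)
)
PARTITION BY HASH (ROOTSTREAM) PARTITIONS 128;
```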
Brian Van Klaveren
Partitioning that way limits the ability to subpartition by date for any children. That might be irrelevant if indexes are created appropriately.
It would take an estimated 1 hour to build a new stream table with all ancestor streams populated.
Brian Van Klaveren
The following will partition Task, Stream, and ProcessInstance by a root-level task. It requires a root-level task.
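(The statements themselves are not preserved here; a hypothetical sketch of that reference-partitioned layout, with illustrative names:)

```sql
-- Hypothetical sketch: Task is hash-partitioned on its root-level task, and
-- Stream and ProcessInstance inherit that partitioning through their foreign
-- keys via REFERENCE partitioning. All names are illustrative.
CREATE TABLE TASK_REF (
    TASK     NUMBER NOT NULL,
    ROOTTASK NUMBER NOT NULL,
    CONSTRAINT PK_TASK_REF PRIMARY KEY (TASK)
)
PARTITION BY HASH (ROOTTASK) PARTITIONS 64;

CREATE TABLE STREAM_REF (
    STREAM NUMBER NOT NULL,
    TASK   NUMBER NOT NULL,
    CONSTRAINT PK_STREAM_REF PRIMARY KEY (STREAM),
    CONSTRAINT FK_STREAM_REF_TASK FOREIGN KEY (TASK) REFERENCES TASK_REF (TASK)
)
PARTITION BY REFERENCE (FK_STREAM_REF_TASK);

CREATE TABLE PROCESSINSTANCE_REF (
    PROCESSINSTANCE NUMBER NOT NULL,
    STREAM          NUMBER NOT NULL,
    CONSTRAINT PK_PI_REF PRIMARY KEY (PROCESSINSTANCE),
    CONSTRAINT FK_PI_REF_STREAM FOREIGN KEY (STREAM) REFERENCES STREAM_REF (STREAM)
)
PARTITION BY REFERENCE (FK_PI_REF_STREAM);
```

Each Stream row is stored with its Task's root-task partition, and each ProcessInstance follows its Stream, which is why a root-level task is required.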
Brian Van Klaveren
I've encountered problems with the redo logs when copying tables over without NOLOGGING and the /*+ append */ hint, so we need to remember to use both.
Secondly, some of the calculations could be precalculated and added to a new tablespace with GRANT SELECT privileges on zglast-oracle03, and we could do a join to help things move quicker. Testing the following was pretty slow on sca-oracle01 with ~12 million streams and ~15 million process instances:
800 seconds to calculate root_stream_of for 13 million rows in the stream table.
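A minimal sketch of such a direct-path copy (apart from ROOT_STREAM_OF, the names are illustrative):

```sql
-- Illustrative direct-path copy: NOLOGGING on the target plus the APPEND hint
-- minimizes redo generation during the bulk insert.
ALTER TABLE STREAM_TEST NOLOGGING;

INSERT /*+ APPEND */ INTO STREAM_TEST
SELECT S.*, ROOT_STREAM_OF(S.STREAM)
  FROM STREAM S;
COMMIT;

-- Re-enable logging (and take a fresh backup) afterwards, since NOLOGGING
-- changes cannot be recovered from the redo logs.
ALTER TABLE STREAM_TEST LOGGING;
```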
Brian Van Klaveren
Split of ProcessInstance: inserting into the processinstance_test table concurrently with batchprocessinstance
20:30:00 [INSERT - 16055625 row(s), 268.125 secs] Command processed
... 4 statement(s) executed, 16055625 row(s) affected, exec/fetch time: 268.371/0.000 sec [1 successful, 3 warnings, 0 errors]
20:28:14 [INSERT - 14009767 row(s), 71.928 secs] Command processed
... 1 statement(s) executed, 14009767 row(s) affected, exec/fetch time: 71.928/0.000 sec [1 successful, 0 warnings, 0 errors]
Brian Van Klaveren
Local indexes created for several tablespaces.
3x Speedup with the following simulated front page SQL:
Oracle doesn't always seem smart enough to push the partition predicate across joins, so that's why I did the (select * from PROCESSINSTANCE_TEST where stream....), because that ultimately references the partition from the root task, then one stream partition, then one processinstance partition (hopefully).
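The rough shape of that query (names and predicates are illustrative; the point is that the partition-key predicate sits directly on PROCESSINSTANCE_TEST inside the inline view):

```sql
-- Illustrative shape only: keeping the partition-key predicate inside the
-- inline view lets the optimizer prune PROCESSINSTANCE_TEST to one partition
-- instead of scanning all of them after the join.
SELECT S.STREAM, PI.PROCESSINSTANCE
  FROM STREAM_TEST S
  JOIN (SELECT *
          FROM PROCESSINSTANCE_TEST
         WHERE ROOTTASK = :rootTask) PI
    ON PI.STREAM = S.STREAM
 WHERE S.ROOTTASK = :rootTask;
```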
Brian Van Klaveren
Creating a partitioned index for the stream table on the rootstream makes the index scanning very quick.
For example, we can find all latest child streams really quickly for a given root stream:
CREATE INDEX idx_stream_t_rootstrm ON stream_test (rootstream, islatest) GLOBAL PARTITION BY HASH (ROOTSTREAM) PARTITIONS 128 NOLOGGING;
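For instance, a lookup of this shape (bind name illustrative) should touch only one index partition:

```sql
-- With the hash-partitioned index on (ROOTSTREAM, ISLATEST), this is a quick
-- scan of a single index partition.
SELECT STREAM
  FROM STREAM_TEST
 WHERE ROOTSTREAM = :rootStream
   AND ISLATEST = 1;
```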
Brian Van Klaveren
Modified the Pipeline web interface to use new Common Table Expression (CTE) based recursion. Performance is roughly equivalent to the previous query: in some cases much better, and in a few cases ~1.3x slower when tested with SRS queries. On top of that, the new query catches an edge case that wasn't being handled properly by the previous one.
Some additional benefits may come in the future when partitioning is enabled.
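A sketch of what such CTE-based recursion looks like (ANSI recursive WITH is supported by Oracle 11gR2, PostgreSQL, and SQLite 3.8.3+; table and column names are illustrative, not the actual query):

```sql
-- Hypothetical sketch of recursive-WITH tree walking: collect the latest
-- subtree under a top-level stream. Names are illustrative.
WITH STREAM_TREE (STREAM, PARENTSTREAM, ISLATEST) AS (
    SELECT STREAM, PARENTSTREAM, ISLATEST
      FROM STREAM
     WHERE STREAM = :topLevelStream
    UNION ALL
    SELECT C.STREAM, C.PARENTSTREAM, C.ISLATEST
      FROM STREAM C
      JOIN STREAM_TREE P ON C.PARENTSTREAM = P.STREAM
     WHERE C.ISLATEST = 1
)
SELECT STREAM FROM STREAM_TREE;
```

Unlike CONNECT BY, this form is portable across all three target databases, which is the main draw.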
Brian Van Klaveren
It's been decided that the potential benefits of REFERENCE partitioning are hard to materialize properly. I've switched to ROOTSTREAM-based partitioning, which should provide us with more runtime benefits, hopefully with some improvement in the web application, notably the task.jsp page.
Of note, three things are known issues:
Not enough tablespace, so we must create several new tablespace files immediately upon migrating.
Possible problems with the ROOT_STREAM_OF function
The following is a candidate script for the glast_dp_test migration:
Brian Van Klaveren
Query for determining the streams which are no longer in the latest path
Determining the actual streams that need to be updated (or in this case, the count):
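A hypothetical shape for that count (PARENTSTREAM and the hierarchical walk are illustrative): streams still flagged latest that sit below an ancestor which is no longer latest.

```sql
-- Illustrative only: count streams whose own ISLATEST is 1 but which have an
-- ancestor with ISLATEST <> 1, i.e. they now sit in a dead branch.
SELECT COUNT(*)
  FROM STREAM S
 WHERE S.ISLATEST = 1
   AND EXISTS (SELECT 1
                 FROM STREAM A
                WHERE A.ISLATEST <> 1
                START WITH A.STREAM = S.PARENTSTREAM
              CONNECT BY PRIOR A.PARENTSTREAM = A.STREAM);
```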
Brian Van Klaveren
Notes on my procedures:
1. mv monitor, stop pipeline.
2. ~bvan/download-maven-deps.sh -a srs:org-srs-pipeline-server -v 1.5-SNAPSHOT
3. ~bvan/download-maven-deps.sh -a srs:org-srs-pipeline-client -v 1.2-SNAPSHOT
-- copy group manager 1.21
-- copy sqlj-1.2.jar from another directory
4. wget maven projects
5. modify run-pipeline-1.5.csh
-- add dependency for sqlj-1.2.csh in plugin path
6. modify pipeline executable script
-- Check directories
7. perform SQL migration
8. run islatest migration completion
Brian Van Klaveren
Should optimize the pipeline and web application to get rid of the PII.GetStreamIsLatestPath nonsense after the move