Blog

There's a Safari online e-book available for Java RESTful services

RESTful Java with JAX-RS is available for reading online.

Strategies for long queries and RESTful services

I was thinking of adding a ConcurrentHashMap/AtomicInteger to a given service for long-running GET requests.

The idea and strategy is this (a rough sketch in code follows the list):

  1. For requests we think might take a long time, we start a new asynchronous query. These shouldn't be requests that are performed very often. Each query is assigned a new ID from an AtomicInteger.
  2. When the query executes, we set a timer, maybe 10 seconds. If the query finishes in time, we just return the result like normal.
  3. If the query doesn't finish, however, we create a new AsyncQueryStatus object and add it to the ConcurrentHashMap, keyed by the AtomicInteger value. We then return that status object.
  4. The client can poll a new REST resource, which corresponds to the ConcurrentHashMap, for status. While the query is still executing, the same status will be returned.
  5. When the query finishes, the status in the ConcurrentHashMap is updated and the original result object is added.
  6. The client gets the result on the next poll.
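Here's a minimal sketch of that flow, assuming a JAX-RS resource. LongQueryResource, runSlowQuery(), and the paths are names I made up for illustration, and instead of a dedicated AsyncQueryStatus class this sketch just parks the Future itself in the map:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.concurrent.atomic.AtomicInteger;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.core.Response;

    @Path("/longquery")
    public class LongQueryResource {

        // One counter hands out ids; the map tracks queries still running.
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        private static final ConcurrentHashMap<Integer, Future<String>> RUNNING =
                new ConcurrentHashMap<Integer, Future<String>>();
        private static final ExecutorService POOL = Executors.newCachedThreadPool();

        @GET
        public Response startQuery() throws Exception {
            final int id = NEXT_ID.incrementAndGet();
            Future<String> result = POOL.submit(new Callable<String>() {
                public String call() throws Exception {
                    return runSlowQuery(); // the potentially slow query
                }
            });
            try {
                // Finished inside the timer? Return the result like normal.
                return Response.ok(result.get(10, TimeUnit.SECONDS)).build();
            } catch (TimeoutException e) {
                // Still running: park it in the map, return a pollable status.
                RUNNING.put(id, result);
                return Response.status(Response.Status.ACCEPTED)
                               .entity("RUNNING:" + id).build();
            }
        }

        @GET
        @Path("/status/{id}")
        public Response poll(@PathParam("id") int id) throws Exception {
            Future<String> result = RUNNING.get(id);
            if (result == null) {
                return Response.status(Response.Status.NOT_FOUND).build();
            }
            if (!result.isDone()) {
                // Same status for as long as the query is still executing.
                return Response.status(Response.Status.ACCEPTED)
                               .entity("RUNNING:" + id).build();
            }
            RUNNING.remove(id);
            return Response.ok(result.get()).build();
        }

        private String runSlowQuery() throws Exception {
            Thread.sleep(30000); // stand-in for the actual long-running query
            return "result";
        }
    }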

Just some thoughts.

Getting your bind variables back

  1. Get a heap dump. 
    • For the pipeline:
      1. Hook up JConsole to the instance
      2. select com.sun.management
      3. select operations
      4. Click on dumpHeap. The dump will be written local to the Java instance you connected to; put in a file name and set the boolean parameter to true.
      5. Transfer that file back to your computer
  2. Open up the heap dump in NetBeans
    • Go to Profile -> Load Heap Dump
    • Select your heap dump
  3. Find your statements with OQL:
    • For prepared statements use:
      select {instance: sql, content: sql.sqlObject.odbc_sql.toString()} from oracle.jdbc.driver.T2CPreparedStatement sql
    • For callable statements (i.e. stored procedures) use:
      select {instance: sql, content: sql.sqlObject.odbc_sql.toString()} from oracle.jdbc.driver.T2CCallableStatement sql
  4. For the prepared statements, take a look at the content. The content should be the SQL you are looking for.
  5. You're going to have to look through the bind variables in the T2C*Statement.  They're usually called lastBound* or boundInt, boundLong, etc...

In addition to partitioning tables to speed things up, we should really think about adding these tables for active streams and process instances.

ActiveStream

This new table should hold all streams that are not in a finalized state.

ActiveProcessInstance

This is the same idea for process instances. Maybe this one makes more sense to do first.

I think we could keep simpler indexes, which would keep bookkeeping to a minimum, speeding up transaction time in general and reducing row locks. On top of that, this might be good if we move from the stored procedure model to a server-client DB model.
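As a rough illustration of the bookkeeping (table, column, and method names here are all hypothetical), the idea is just an insert when a stream is created and a delete once it's finalized:

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // Hypothetical upkeep for the proposed ActiveStream table.
    public class ActiveStreamBookkeeping {

        // Called when a stream is created.
        static void track(Connection connection, long streamId) throws Exception {
            PreparedStatement ins = connection.prepareStatement(
                "insert into ActiveStream (Stream) values (?)");
            ins.setLong(1, streamId);
            ins.executeUpdate();
        }

        // Called when the stream reaches a finalized state.
        static void untrack(Connection connection, long streamId) throws Exception {
            PreparedStatement del = connection.prepareStatement(
                "delete from ActiveStream where Stream = ?");
            del.setLong(1, streamId);
            del.executeUpdate();
        }
    }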

I'll need to detail any complications that might arise from this model.

One major thing I want to add is support for a RESTful JSON API for the pipeline and the datacatalog.

Right now, we have the following pipeline packages:

  • Server
  • Client
  • Stored Procedures

... and the same hierarchy for the datacatalog.

With the exception of the stored procedure packages, all of these packages rely on org-srs-datahandling-common.  I think it might be useful to move some interfaces for Task, Stream, Process, Dataset, DatasetGroup, DatasetLocation, etc... into srs-datahandling-common: things that might get used via a web application, a web/REST application/API, and the *-client and *-server projects.
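For instance, a shared Stream interface might look something like this (purely a hypothetical sketch; the real method list would come from the existing classes):

    package org.srs.datahandling.common;

    // Hypothetical shared interface so the web/REST layer and the
    // *-client/*-server projects can code against one type.
    public interface Stream {
        long getStreamId();
        String getTaskName();
        String getStatus();
    }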

What?

I'd like to split the ProcessInstance table so that it only has information common to both 'BATCH' and 'SCRIPT' process instances.

This should significantly speed up row retrieval for a lot of operations where the other information is not needed (especially because of the WorkingDir and LogFile columns) and increase database performance in general.

Things to do:

Make a BatchProcessInstance table with the following columns:

  • ProcessInstance
  • JobSite
  • JobID
  • SubmitDate (Maybe should just be in ProcessInstance even though Jython scriptlets will never use it)
  • CPUSecondsUsed
  • MemoryUsed
  • SwapUsed
  • ExecutionHost
  • WorkingDir
  • LogFile

This should nearly eliminate row migration in the ProcessInstance table, which should really, really help index performance (especially with partitioning).
We can easily do a left outer join between the two tables to get the same thing as the old ProcessInstance table, as in the sketch below.
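Something like this (column names follow the list above; the join key is assumed to be ProcessInstance):

    // Hypothetical query recombining the split tables into the old shape.
    String rejoin =
        "select pi.*, bpi.JobSite, bpi.JobID, bpi.SubmitDate, " +
        "       bpi.CPUSecondsUsed, bpi.MemoryUsed, bpi.SwapUsed, " +
        "       bpi.ExecutionHost, bpi.WorkingDir, bpi.LogFile " +
        "  from ProcessInstance pi " +
        "  left outer join BatchProcessInstance bpi " +
        "    on pi.ProcessInstance = bpi.ProcessInstance";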

After that, things we should do:

  1. Simplify ProcessInstance class in org.srs.pipeline.server.sql to correspond to the reduced ProcessInstance table
  2. Create a new BatchProcessInstance class that corresponds to the join of ProcessInstance and BatchProcessInstance
  3. We need to deal with row migration/chaining due to updating the logfile directory.  Options are:
    • PCTFREE should be tuned (see the sketch after this list)
    • A temporary table to hold BatchProcessInstance data until we get LogDir; when we get that, insert it into BatchProcessInstance
    • ?
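The PCTFREE tuning itself is one statement; 30 below is a made-up number that would have to be checked against real row growth:

    // Hypothetical tuning: leave extra free space in each block so the
    // later LogFile/WorkingDir update fits without migrating the row.
    // (Assumes an open java.sql.Connection named 'connection'.)
    java.sql.Statement stmt = connection.createStatement();
    stmt.execute("alter table BatchProcessInstance pctfree 30");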

This site has some interesting information: http://www.akadia.com/services/ora_chained_rows.html

Why?

I think we want to eventually explore support for MySQL, but it'd also be nice to really have these things cleaned up.

Steps:

  1. Determine performance critical calls and calls which may iterate over large result sets.  These should be run on the database, whether in a stored procedure or a Java application.
    • Like SPStream and SPProcessInstance methods, along with any helpers for these classes.
  2. Code which is called by DatabaseUtilities that doesn't conform to 1 should just be moved to DatabaseUtilities
  3. Clean up both DBU and stored procedures, separate out the SQL from the code, aim to reduce the LoC in DBU and Stored procedure classes
  4. Eventually move away from Oracle's JPub remote calls to something else (autocommit needs to be off); we still want SPs on the DB server, but we might want to move away from Oracle sometime in the future.
  5. After this, we'll also want to move towards handling more concurrency issues in code vs. in the database, to reduce the likelihood of locks I think.

Let's think about moving SQL out of Java in order to make queries easier to read and to allow for alternate queries per database.

A good first step I think would be to use QueryLoader and put the SQL in a properties file.  Maybe we could put our own query loader in org-srs-datahandling-common.

http://commons.apache.org/dbutils/apidocs/org/apache/commons/dbutils/QueryLoader.html

Example of use:

sql.properties
sqlstatement=\
select someid, someproperty from sometable where someid = ?

sqlstatementtrue=\
select someid from sometable\
    where someproperty = ?

MyClass.java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.Map;
    import org.apache.commons.dbutils.DbUtils;
    import org.apache.commons.dbutils.QueryLoader;

    public class MyClass {
        final Map queryMap;

        public MyClass() throws java.io.IOException {
            // Loads the queries keyed by their property names.
            queryMap = QueryLoader.instance().load("/org/srs/project/sql.properties");
        }

        void someMethod() {
            Connection connection = null;
            PreparedStatement stmt = null;
            ResultSet rs = null;
            try {
                // Utils.getConnection() stands in for however we get a connection.
                connection = Utils.getConnection();
                stmt = connection.prepareStatement((String) queryMap.get("sqlstatement"));
                stmt.setLong(1, 1234);
                rs = stmt.executeQuery();
                if (rs.next()) {
                    System.out.println(rs.getString(2));
                }
            } catch (Exception e) {
                System.out.println("Some error happened");
            } finally {
                // Always release JDBC resources.
                DbUtils.closeQuietly(connection, stmt, rs);
            }
        }
    }

I think I'm going to try to document coding and problems I run into using the blog format on my profile, although maybe a general "Coding Blog" might be better.

We'll see how the labels work, too.

Testing blog post

Testing my first blog post.