Skimmer Web Application (architecture and use cases)

Igor Pavlin
December 13, 2007
(Draft version 0.1)

...

There is a tension between specializing skimmer tasks for a particular purpose and building a full skimmer application that is more general and more powerful. This paper tries to address the differences and the problems involved. In simpler cases it is acceptable to use the standard Servlet/JSP architecture, in which business logic is embedded in JSP pages (as in the first version of DataPortal). However, the complexity of the interaction between the various parts of the application and the other GLAST servers and applications makes the classic Servlet/JSP architecture a developer's nightmare in terms of maintenance and the ever-present need for enhancements. I therefore distinguish two architectures: Simple Skimmer and Full Skimmer. Simple Skimmer provides limited functionality for a very specific purpose; this often leads to rewriting the whole application when some other simple functionality is needed. Full Skimmer needs a more general application framework, for which I have chosen Struts 2. The hope is that Full Skimmer functionality is a superset of Simple Skimmer functionality and that both applications will be built on a similar set of objects. However, the Simple Skimmer functionality is needed for immediate use in some cases.

1. Skimmer application from user point of view

Here are the usual steps in the current skimmer web applications:

1. User requests the /skimmer application and is forwarded to the Welcome page.
2. The header of every page informs the user about the name of the application, login status, session duration, and the type of database connection used by the application.
3. The header contains the GLAST-defined logo for the skimmer application.
4. User can click on links that lead to the help, the application version release notes, and the Confluence and Jira sites.
5. User clicks on one of the tabs or links in the Welcome page and performs an action in an attempt to form a Skimmer Task and submit it to the Pipeline server.
Note: "action" is used here in a more general sense. It means collecting some information in an application web page and submitting the form data to the application server for further processing.
6. If the action is guarded and the user is not authenticated, the user is forced to log in through the CAS server.
7. On the Welcome page the user is offered several links: to the Skimmer page, the Data Catalog page, the Skimmer Task page, the History page, the User page, and the Simple Skimmer page. Although the user will follow the links in this order most of the time, that is not required.
8. On the Skimmer page, the user fills in the data needed to form a SkimmerRequest.
9. On the Data Catalog page, the user fills in the data needed to form a FileList from Datasets stored in the Data Catalog.
10. On the Skimmer Task page, the user fills in the data needed to populate the SkimmerTask before submission to the Pipeline server.
11. On the History page, the user can review earlier tasks of a particular type and pre-populate request data from an earlier skimmer task of the same type.
12. On the User page, the user can modify the email address to which Pipeline sends notifications about the Skimmer Task, and the user expertise level. The latter enables an experienced user to see and change some more exotic fields used by the backend skimmer.
13. The footer informs the user about the running status of the application, the versions of the application and the backend skimmer, and, if the user is an expert, debug information.
14. If there is a user error in the application (improper field entries), the user is immediately warned to correct the error.
15. If there is an application error, the user is forwarded to the error pages.

Use case scenarios

Below we define several use case scenarios in greater detail:

1. Forming Simple Skimmer Request

This page lets the user select a subset of data from GLAST merit tuples stored in a predefined Data Catalog logical set of merit tuples. The user is offered a drop-down list of 'Data Source' entries to choose from.

...

The TCut, Min, and Max entries are verified. The user can proceed, ignoring verification warnings, or go back in case of a verification error.
The user is given an estimate of the CPU time that will be used by the task.

1.1 The following are new use cases and requirements to be added to the SimpleSkimmer task: TBD

2. Forming Full Skimmer Request

Full Skimmer Request gives the user more control and more choices when creating a Skimmer Task.

...

3. Forming File List from Data Catalog (Simple Skimmer)
TBD
4. Forming File List from Data Catalog (Full Skimmer)
TBD
5. Forming Pipeline Task (Full Skimmer)
TBD
6. Forming User data
TBD
7. Using History page
TBD

2. Skimmer application architecture

(Work in progress)

The skimmer applications (Simple and Full Skimmer) are Java web applications running on a Tomcat server.
In general, the skimmer applications use Servlet/JSP and some common libraries shared with other applications. In addition, the Full Skimmer application uses the Struts 2 framework.
The applications also use many other auxiliary libraries, which are not discussed here.
All the skimmer libraries are built with Maven 2.

2.1. Common libraries

Skimmer application uses the following libraries:
- org-glast-datahandling-common
- org-glast-dataportal-model
- org-glast-datacat-client
- org-glast-pipeline-client
The following are APIs used by Skimmer: ...

2.2. Simple Skimmer architecture

The main purpose of the Simple Skimmer application is to offer the user only the minimum functionality needed to accomplish simple skimming tasks. As such, most of the control and business logic is implemented with Java Servlet/JSP, and the application can handle only a limited set of simple skimming tasks.

2.3. Full Skimmer architecture

The architecture of the Full Skimmer application is as follows.

2.3.1 Model Objects

The model objects are User, Request, Task (inherited from DataPortalModel lib), DatacatInfo, and PipelineInfo. The model objects are used by Actions to populate user entries.

...

1. The User object lets the user change the email and the expertise level. It also contains the username, which is read-only.
i. Note: username is unique, because only SLAC authenticated users can create User objects, and SLAC usernames are unique.
2. The Request object lets the user populate SK_xxx environment variables and a user comment about the request.
3. Request objects also have a type, which distinguishes skimmer, simpleSkimmer, pruner, peeler, etc. requests. Since these requests are different, we can treat taskType (which is the same as requestType) as unique as well. If the request type is 'skimmer', it points to a SkimmerRequest object.
4. The DatacatInfo object contains the user choices needed to submit information to the pipeline task and to create the proper lists of files from the Data Catalog to work on. This information is pertinent to that particular user's task.
5. The PipelineInfo object contains the user choices needed to submit information to the pipeline task. This information is pertinent to that particular user's pipeline task.
6. Service object. A Service has a type: it can be an Oracle service (if the application is deployed on Tomcat with an Oracle database), a memory service, or a MySQL service. The Service is created by a factory depending on the type. It provides the connection to the Dataserver database (GLASTGEN) or to application memory. The Service is not used directly, but rather through the helper classes UserDao and RequestDao.
7. Task object. The connections to the Pipeline and Data Catalog are handled by the Task object.
8. The UserDao object creates, updates, retrieves, and deletes data for the User object.
9. The RequestDao object creates, updates, retrieves, and deletes current request data for the Request object.
10. The History object keeps collective information about the user's former requests. The History object can also display information about other user requests.
11. HistoryEntry objects keep only some information about the user's previous requests:
a. Request No
b. Task Type
c. Submit Time
d. Output Directory
e. User Comment
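
The Service object in item 6 is described as being created by a factory according to its type. A minimal sketch of that idea follows. The names ServiceType, Service, MemoryService, and ServiceFactory, and the put/get methods, are hypothetical and chosen for illustration only; the real Skimmer classes may look quite different. Only the memory variant is implemented, since the database variants would need a JDBC connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service types, mirroring the three kinds named in the text.
enum ServiceType { ORACLE, MYSQL, MEMORY }

// Hypothetical minimal contract: a keyed store standing in for the Dataserver connection.
interface Service {
    ServiceType type();
    void put(String key, String value);
    String get(String key);
}

// In-memory implementation; data lives only as long as the application runs.
final class MemoryService implements Service {
    private final Map<String, String> store = new HashMap<>();

    public ServiceType type() { return ServiceType.MEMORY; }
    public void put(String key, String value) { store.put(key, value); }
    public String get(String key) { return store.get(key); }
}

// Factory that picks the implementation from the requested type.
final class ServiceFactory {
    private ServiceFactory() {}

    static Service create(ServiceType type) {
        switch (type) {
            case MEMORY:
                return new MemoryService();
            case ORACLE:
            case MYSQL:
                // A real implementation would open a database connection here (assumption).
                throw new UnsupportedOperationException(type + " service is not part of this sketch");
            default:
                throw new IllegalArgumentException("Unknown service type: " + type);
        }
    }
}
```

With such a factory, helper classes like UserDao and RequestDao would depend only on the Service interface, so the same code could run against memory in tests and against a database in deployment.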

2.3.2 Actions

There are several categories of actions:

1. Actions that build the main domain objects: User object, Request object, Task object, History object.
2. Actions that display the user profile (object) and allow only authorized users to use the skimmer application.
3. Actions that display the user history to help create new tasks from the user's previous tasks.
4. Actions that interact with the Data Catalog to obtain the different data sets needed for skimming.
5. Actions that submit a Skimmer Task to the Pipeline server and collect information from the server.
6. BaseAction has an abstract method, getModel, which returns the domain object (model) the action is working with. For example, UserAction returns the User object.
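
The getModel contract in item 6 can be sketched in plain Java as shown below. The generic type parameter and the trimmed-down User class are assumptions made for illustration; the text only states that BaseAction declares an abstract getModel and that UserAction returns the User object. Struts 2 provides a ModelDriven interface with a method of the same name, but whether the Skimmer's BaseAction implements it is not stated here.

```java
// Hypothetical, trimmed-down User model: username is read-only, the rest is editable.
final class User {
    private final String username;
    private String email;
    private boolean expert;

    User(String username, String email, boolean expert) {
        this.username = username;
        this.email = email;
        this.expert = expert;
    }

    String getUsername() { return username; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }
    boolean isExpert() { return expert; }
    void setExpert(boolean expert) { this.expert = expert; }
}

// Base class for all actions; T is the type of the domain object (an assumption of this sketch).
abstract class BaseAction<T> {
    // Returns the domain object (model) this action works with.
    abstract T getModel();
}

// Action for the User page: its model is the User object.
final class UserAction extends BaseAction<User> {
    private final User user;

    UserAction(User user) { this.user = user; }

    @Override
    User getModel() { return user; }
}
```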

2.3.3 Persisting model objects

Typical flow of information for a persisted object is as follows (example of user object):

1. The user logs in, and the username is used to authenticate the user.
2. From the username, a User object is created and stored in the Session object. The presence of the User object in the Session object is a sign that the user is logged in. When the user logs out, the User object in the Session object needs to be set to null.
3. With the User object, the UserDao object is created.
4. User has access to the UserDao.
5. The UserDao object asks the database Service object to initiate the database connection and retrieves data for the user, based on the username.
6. If the user exists in the database, UserDao populates the User object with the e-mail information and the expertise level. Otherwise, it creates a default e-mail, username@slac.stanford.edu, and an expertise level of false, and in that initialization case it creates and stores the user in the database.
7. This completes the information about the User object and makes it available to the UserAction for display, through the UserAction model object, which is the User.
8. The user clicks on the User tab (link) and is shown the full information about the user.
9. The user can modify the e-mail or the expertise level and click on preview.
10. The user sees the new information on the preview page and clicks on the confirm button if he/she agrees with the user data.
11. When the user confirms the choice, the UserDao object stores the new User information in the database and the confirmation page is displayed to the user. Both User and UserDao now point to the new user data.
12. If anything goes wrong during the process, a DataserverException is thrown and the information is displayed on the error page.
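
Steps 5, 6, 11, and 12 above can be sketched as follows. This is an illustration under stated assumptions, not the actual Skimmer code: the method names loadOrCreate and update are hypothetical, a Map stands in for the database table reached through the Service, User is made immutable for brevity, and DataserverException is unchecked here because the text does not say whether the real one is checked.

```java
import java.util.HashMap;
import java.util.Map;

// Unchecked for brevity; the real exception type may differ (assumption).
class DataserverException extends RuntimeException {
    DataserverException(String message) { super(message); }
}

// Immutable stand-in for the User model object.
final class User {
    private final String username;
    private final String email;
    private final boolean expert;

    User(String username, String email, boolean expert) {
        this.username = username;
        this.email = email;
        this.expert = expert;
    }

    String getUsername() { return username; }
    String getEmail() { return email; }
    boolean isExpert() { return expert; }
}

// Stand-in for UserDao; the Map replaces the database table behind the Service.
final class UserDao {
    private final Map<String, User> table = new HashMap<>();

    // Steps 5 and 6: retrieve the user by username, or create and store a default one.
    User loadOrCreate(String username) {
        if (username == null || username.isEmpty()) {
            throw new DataserverException("username is required");
        }
        User stored = table.get(username);
        if (stored != null) {
            return stored;
        }
        User created = new User(username, username + "@slac.stanford.edu", false);
        table.put(username, created);
        return created;
    }

    // Step 11: store the confirmed user data.
    void update(User user) {
        if (!table.containsKey(user.getUsername())) {
            throw new DataserverException("unknown user: " + user.getUsername());
        }
        table.put(user.getUsername(), user);
    }
}
```

In this sketch, the preview page would hold a new User instance built from the form data, and only the confirm step would call update, so nothing is persisted before the user agrees.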

2.3.4 Tables in database:

There are three tables currently used by the Skimmer. They are hosted on GLASTGEN table space. They are available for Prod, Dev and Test versions of Skimmer. The tables are DP_TASK_TYPE, DP_REQUEST, and DP_CONFIG.

...

DP_CONFIG,
NAME, key of the configuration parameter
VALUE, value of the configuration parameter
VERSION, version of the key-value pair

2.3.5 Request Object

Request is an object that reflects the columns in DP_REQUEST and therefore also contains RequestData (clob).

...

1. Long requestId - unique identifier for the request. The request with largest requestId is considered the most recent
2. String username - unique name of the user who submitted the request
3. String taskType - skimmer, simpleSkimmer, ...
4. String userComment - user supplied comment that goes with the request.
5. String outputDir - output directory computed by the pipeline, where one can find the log files and results of the skimmer runs.
6. String id - might be obsolete, or not, an array of streamIds that processed the request

2.3.6 RequestData object

RequestData is saved as a clob in the database and can hold, in a similar manner, the different specific fields used by the Skimmer, Pruner, and Peeler actions. This is a simple way to extend RequestData beyond the initial implementation.

...

1. long requestId
2. String taskName
3. String branchListPath
4. String fileListPath
5. String eventListPath
6. String releaseFilePath
7. RootVersion rootVersion
8. String userOutputDirectory
9. String tCut
10. String tCutDataType
11. int runMin
12. int runMax
13. int maxFilesize
14. List<String> dataTypes
15. List<String> debugFlags
16. List<String> skipFlags
17. List<String> forceFlags
18. boolean testDB
19. int skimmerMaxCPU
20. int mergeMaxCPU
21. int numSubtasks
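
The text does not say how RequestData is encoded inside the clob. As one possible illustration of why a text clob makes the object easy to extend, the sketch below round-trips a set of key-value fields through the standard java.util.Properties text format. The class name RequestDataClob and the choice of Properties are assumptions; the actual Skimmer encoding may be entirely different.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class RequestDataClob {
    private RequestDataClob() {}

    // Serializes the fields to the text that would be stored in the clob column.
    static String toClob(Properties fields) {
        StringWriter out = new StringWriter();
        try {
            fields.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses clob text back into fields; keys this code does not know about are preserved.
    static Properties fromClob(String clob) {
        Properties fields = new Properties();
        try {
            fields.load(new StringReader(clob));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

Because unknown keys survive the round trip, a Pruner or Peeler request could add its own fields without a change to the DP_REQUEST table, which is the kind of extension the paragraph above describes.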

2.3.7 Workflow

The user is presented with several JSP pages that display forms to populate. Navigation between the pages is arranged so that all the information needed to create a Skimmer Task is collected gradually. This implies a usage model that does not collect all the information in one gigantic web page, but rather in several small pages, that is, a wizard type of workflow.
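
A wizard of this kind needs to track which pages have been filled in. The sketch below is one hypothetical way to do so. The step names follow the Skimmer, Data Catalog, and Skimmer Task pages described earlier, while the class names and the rule that all three steps must be complete before submission are assumptions of this sketch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical wizard steps, listed in the usual page order.
enum WizardStep { SKIMMER_REQUEST, DATA_CATALOG, PIPELINE_TASK }

final class SkimmerWizard {
    private final Set<WizardStep> completed = EnumSet.noneOf(WizardStep.class);

    // Marks a page as filled in; pages may be completed in any order.
    void complete(WizardStep step) {
        completed.add(step);
    }

    // Submission is allowed only once every step has been completed (assumption).
    boolean readyToSubmit() {
        return completed.containsAll(EnumSet.allOf(WizardStep.class));
    }

    // Returns the first incomplete step in the usual order, or null when none is missing.
    WizardStep nextMissing() {
        for (WizardStep step : WizardStep.values()) {
            if (!completed.contains(step)) {
                return step;
            }
        }
        return null;
    }
}
```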

...
