Web frontend for skimming (architecture and use cases)

Igor Pavlin
December 13, 2007
(Draft version 0.1)

This document is an attempt to formalize the scope of the Skimmer web application from both the user's and the developer's point of view. It is also intended to define the architecture of the web application more precisely and to stress the importance of use case scenarios, as the lack of these usually leads to misunderstanding of what the application can and should do.

In this paper I will define several use case scenarios, domain objects and the navigation between the web application pages. The skimmer application has changed dramatically since my first DataPortal web application. The change came with the introduction of the backend skimmer application and the use of Data Catalog, Pipeline II and xrootd servers, so I feel compelled to lay down the assumptions used in building this complex application. This is especially important now that we are trying to add more functionality to the skimmer and connect it to other tools that process GLAST ROOT files.

There is a tension between specializing skimmer tasks for a particular purpose and building a full skimmer application that is more general and powerful. I will try to address the differences and problems in this paper. In simpler cases it is acceptable to use the standard Servlet/JSP architecture, where business logic is embedded in JSP pages (as in the first version of DataPortal). However, the complexity of the interaction between the various parts of the application and the other GLAST servers and applications makes the classic Servlet/JSP architecture a developer's nightmare with regard to maintenance and the ever-present need for enhancements. So I distinguish two architectures: Simple Skimmer and Full Skimmer. Simple Skimmer offers limited functionality for a very specific purpose, which often leads to rewriting the whole application when some other simple functionality is needed. Full Skimmer needs a more general application framework, for which I have chosen Struts 2. The hope is that Full Skimmer functionality is a superset of Simple Skimmer functionality, and that both applications will be built on a similar set of objects. However, the Simple Skimmer functionality is needed for immediate use in some cases.

1. Skimmer application from user point of view

Here are the usual steps in the current skimmer web applications:

1. User asks for the /skimmer application and is forwarded to the Welcome page.
2. User is informed in the header of every page about the name of the application, login status, session duration, and the type of database connection used by the application.
3. The header contains the GLAST-defined logo for the skimmer application.
4. User can click on links that lead to the help, the application version release notes, and the Confluence and Jira sites.
5. User clicks on one of the tabs or links in the Welcome page and performs an action in an attempt to form a Skimmer Task and submit it to the Pipeline server.
Note: action is used here in a more general sense and means collecting some information in application's web page and submitting form data to the application server for further processing.
6. If the action is guarded and the user is not authenticated, the user is forced to log in through the CAS server.
7. On the Welcome page the user is offered several links: to the Skimmer page, Data Catalog page, Skimmer Task page, History page, User page and Simple Skimmer page. Although the user will follow the links in this order most of the time, that is not required.
8. On the Skimmer page, the user fills in the data needed to form a SkimmerRequest.
9. On the Data Catalog page, the user fills in the data needed to form a FileList from Datasets stored in Data Catalog.
10. On the Skimmer Task page, the user fills in the data needed to populate the SkimmerTask before submission to the pipeline server.
11. On the History page, the user can review earlier tasks of a particular type and pre-populate request data from an earlier skimmer task of the same type.
12. On the User page, the user can modify the email address to which Pipeline sends notifications about the Skimmer Task, and the user expertise level. The latter enables an experienced user to see and change some more exotic fields used by the backend skimmer.
13. In the footer, the user is informed about the running status of the application, the versions of the application and the backend skimmer and, if the user is an expert, debug information.
14. If there is a user error in the application (improper field entries), the user is immediately warned to correct the error.
15. If there is an application error, the user is forwarded to the error pages.

Use case scenarios

Below we define several use case scenarios in greater detail:

1. Forming Simple Skimmer Request

This page allows the user to select a subset of data from GLAST merit tuples stored in a predefined Data Catalog logical set of merit tuples. The user is offered a drop-down list of 'Data Source's to choose from.

The user is allowed to change a subset of the environment variables used by the most recent version of the backend skimmer:

i) TCut
ii) Min Run Number
iii) Max Run Number
iv) Branch List
v) Option to generate FITS files
vi) Max ROOT file size
vii) Request specific comment.

From this page the user is forwarded to the Confirm Skimmer Settings page, where he/she can either go back to change the settings or submit the Skimmer Task to the pipeline server. The user has no control over any other parameters or over execution.

The TCut, Min and Max entries are verified. On a verification warning the user may choose to proceed and ignore the warning; on a verification error the user has to go back.
The user is given an estimate of the CPU time that will be used by the task.
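
The verification step described above could be sketched roughly as follows. This is only an illustration: the class names VerificationResult and SimpleSkimmerVerifier, and the specific rules (inverted run range is an error, empty TCut is a warning, unbalanced parentheses is an error) are assumptions, not the rules the application actually applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result holder: errors block submission, warnings may be ignored. */
class VerificationResult {
    final List<String> errors = new ArrayList<String>();
    final List<String> warnings = new ArrayList<String>();

    boolean canProceed() {
        return errors.isEmpty();
    }
}

/** Hypothetical verifier for the TCut, Min and Max entries of the Simple Skimmer form. */
class SimpleSkimmerVerifier {

    static VerificationResult verify(String tCut, int runMin, int runMax) {
        VerificationResult result = new VerificationResult();

        // Assumed rule: negative run numbers or an inverted range are errors.
        if (runMin < 0 || runMax < 0) {
            result.errors.add("Run numbers must not be negative");
        } else if (runMin > runMax) {
            result.errors.add("Min Run Number exceeds Max Run Number");
        }

        // Assumed rule: an empty TCut is allowed but worth a warning.
        if (tCut == null || tCut.trim().isEmpty()) {
            result.warnings.add("TCut is empty; no selection will be applied");
        } else if (!hasBalancedParentheses(tCut)) {
            result.errors.add("TCut has unbalanced parentheses");
        }
        return result;
    }

    /** Cheap syntactic check only; it does not evaluate the cut expression. */
    static boolean hasBalancedParentheses(String expression) {
        int depth = 0;
        for (char c : expression.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (depth < 0) return false;
        }
        return depth == 0;
    }
}
```

With such a split, the Confirm Skimmer Settings page would show warnings next to a "proceed anyway" choice, while any entry in errors would send the user back to the form.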

1.1 The following are new use cases and requirements to be added to the SimpleSkimmer task: TBD

2. Forming Full Skimmer Request

Full Skimmer Request gives the user more control and choices when creating a Skimmer Task.

This page allows the user to define all the environment variables used by the backend skimmer and is therefore geared more towards an expert user. Besides the fields available on the Simple Skimmer request page, the user can specify the following on the Skimmer Request page:

1. A custom name for the Skimmer Task
2. ROOT version used to create the files to be skimmed
3. Choice of GLAST tuples (merit, svac)
4. Branch File to be uploaded from the local machine
5. Event File to be uploaded from the local machine
6. Control of more exotic backend skimmer request parameters (for expert user)

The user is then forwarded to the Data Catalog page, where he/she forms a Data Catalog query to build a File List of files to be skimmed (see the Forming File List from Data Catalog (Full Skimmer) use case).
After completing the FileList and SkimmerRequest forms, the user is forwarded to the Pipeline page, where he/she enters the properties relevant to the Skimmer Task (see the Forming Pipeline Task (Full Skimmer) use case).
Full Skimmer applies verification to each of the field entries.

3. Forming File List from Data Catalog (Simple Skimmer)
TBD
4. Forming File List from Data Catalog (Full Skimmer)
TBD
5. Forming Pipeline Task (Full Skimmer)
TBD
6. Forming User data
TBD
7. Using History page
TBD

2. Skimmer application architecture

(Work in progress)

Skimmer applications (Simple and Full Skimmer) are Java web applications running on a Tomcat server.
In general, the skimmer applications use Servlet/JSP and some common libraries shared with other applications. In addition, the Full Skimmer application uses the Struts 2 framework.
Many other auxiliary libraries are also used by the applications; they will not be discussed here.
All the skimmer libraries are built with Maven 2.

2.1. Common libraries

Skimmer application uses the following libraries:
- org-glast-datahandling-common
- org-glast-dataportal-model
- org-glast-datacat-client
- org-glast-pipeline-client
The following are APIs used by Skimmer: ...

2.2. Simple Skimmer architecture

The main purpose of the Simple Skimmer application is to limit the user to the minimum functionality needed to accomplish simple skimming tasks. As such, most of the control and business logic is implemented using Java Servlet/JSP, and the application can handle only a limited range of simple skimming tasks.

2.3. Full Skimmer architecture

The architecture of the Full Skimmer application is as follows.

2.3.1 Model Objects

The model objects are User, Request, Task (inherited from the DataPortalModel lib), DatacatInfo and PipelineInfo. Actions use the model objects to hold user entries.

Note: The model objects also serve directly as data transfer objects to persistent storage. This was one of the major reasons for the decision to switch from Struts 1 to Struts 2. Struts 2 simplifies the path from form data to permanent storage through its set of type converters, and eliminates the need for the FormData used in Struts 1.

1. User object allows the user to change the email and expertise level. It also contains the username, which is read only.
   i. Note: username is unique, because only SLAC authenticated users can create User objects, and SLAC usernames are unique.
2. Request object lets the user populate the SK_xxx environment variables and a user comment about that request.
3. Request objects also have a type, to distinguish skimmer, simpleSkimmer, pruner, peeler, etc. types of request. Since these requests are different, we can consider taskType (which is the same as requestType) to be unique as well. If the request type is 'skimmer', it points to a SkimmerRequest object.
4. DatacatInfo object contains the user choices needed to create the proper lists of files from the data catalog for the pipeline task to work on. This is information pertinent to that particular user's task.
5. PipelineInfo object contains the user choices needed to submit the pipeline task. This is information pertinent to that particular user's pipeline task.
6. Service object. A Service has a type: it can be an Oracle service (if the application is deployed on Tomcat with an Oracle database), a MySQL service, or a memory service. A factory creates the Service according to its type. The Service provides the connection to the Dataserver database (GLASTGEN) or to application memory. The Service does not work on the database directly, but rather through helper classes: UserDao and RequestDao.
7. Task object. The connections to the Pipeline and Data Catalog are handled by the Task object.
8. UserDao object creates, updates, retrieves and deletes data for the User object.
9. RequestDao object creates, updates, retrieves and deletes the current request data for the Request object.
10. History object keeps collective information about the user's former requests. The History object can also display information about other users' requests.
11. HistoryEntry object keeps only some information about the user's previous requests:
   a. Request No
   b. Task Type
   c. Submit Time
   d. Output Directory
   e. User Comment
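
As an illustration of item 6, the sketch below shows one way a factory could create a Service according to its type. Everything here is hypothetical: the ServiceType enum, the store/load methods and the class names are placeholders chosen for the example, and only the memory variant is filled in. The Oracle and MySQL variants are left out because their connection details are not described in this document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical service types, named after the three kinds listed above. */
enum ServiceType { ORACLE, MYSQL, MEMORY }

/** Hypothetical minimal contract; a real Service would hand out database connections. */
interface Service {
    ServiceType getType();
    void store(String key, String value);
    String load(String key);
}

/** Memory variant: keeps data in application memory instead of a database. */
class MemoryService implements Service {
    private final Map<String, String> data = new HashMap<String, String>();

    public ServiceType getType() { return ServiceType.MEMORY; }
    public void store(String key, String value) { data.put(key, value); }
    public String load(String key) { return data.get(key); }
}

/** Factory that picks the Service implementation from the requested type. */
class ServiceFactory {
    static Service create(ServiceType type) {
        switch (type) {
            case MEMORY:
                return new MemoryService();
            default:
                // Oracle and MySQL variants are not sketched here.
                throw new UnsupportedOperationException(type + " service not sketched");
        }
    }
}
```

A memory service of this kind is convenient for running the web application or its tests without a database, which is presumably why that type exists.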

2.3.2 Actions

There are several categories of actions:

1. Actions that build main domain objects: User object, Request object, Task object, History object
2. Actions that display the user profile (object) and allow only authorized users to use the skimmer application.
3. Actions that display the user history to help create new tasks from previous user tasks.
4. Actions that interact with the data catalog to obtain the different data sets needed for skimming.
5. Actions that submit the Skimmer Task to the Pipeline server and collect information from the server.
6. BaseAction has an abstract method, getModel, which returns the domain object (model) the action is working with. For example, UserAction returns the User object.
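
The relationship between BaseAction and a concrete action can be sketched in plain Java as below. This is a simplified stand-in that compiles without Struts 2 on the classpath: the execute method, the constructor and the User fields are assumptions made for the example. In Struts 2 itself the same idea is expressed by the ModelDriven interface, whose getModel method exposes the model object to the framework; whether the real BaseAction implements that interface is not stated here.

```java
/** Minimal stand-in for the User model object. */
class User {
    private final String username;   // read only
    private String email;
    private boolean expert;

    User(String username, String email) {
        this.username = username;
        this.email = email;
    }

    String getUsername() { return username; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }
    boolean isExpert() { return expert; }
    void setExpert(boolean expert) { this.expert = expert; }
}

/** Base class for actions; each action exposes the model it works with. */
abstract class BaseAction<T> {
    abstract T getModel();

    /** Hypothetical default behaviour: fail if there is no model to show. */
    String execute() {
        return getModel() == null ? "error" : "success";
    }
}

/** Action behind the User page; its model is the User object. */
class UserAction extends BaseAction<User> {
    private final User user;

    UserAction(User user) { this.user = user; }

    User getModel() { return user; }
}
```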

2.3.3 Persisting model objects

The typical flow of information for a persisted object is as follows (using the User object as an example):

1. User logs in and the username is used to authenticate the user.
2. From the username, a User object is created and stored in the Session object. The presence of the User object in the Session object is a sign that the user is logged in. When the user logs out, the User object in the Session object needs to be set to null.
3. With the User object, the UserDao object is created.
4. User has access to the UserDao.
5. UserDao object asks the database Service object to initiate the database connection and retrieves the data for the user, based on the username.
6. UserDao populates the User object with the e-mail information and expertise level if the user exists in the database; otherwise, it creates a default e-mail username@slac.stanford.edu and expertise level false. In that initialization case UserDao also creates and stores the user in the database.
7. This completes the information about the User object and makes it available to the UserAction for display, through the UserAction model object, which is the User.
8. User clicks on the User tab (link) and is shown the full information about the user.
9. User can modify the e-mail or expertise level and click on preview.
10. User sees the preview information on the preview page and clicks on the confirm button if he/she agrees with the user data.
11. When the user confirms the choice, the UserDao object stores the new User information in the database and the confirmation page is displayed to the user. Both User and UserDao now point to the new user data.
12. If anything goes wrong during the process, a DataserverException is thrown and the information is displayed on the error page.
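
The flow above can be condensed into the following sketch. It is a minimal illustration under several assumptions: the database is replaced by an in-memory map, the Session object is represented by a plain Map, DataserverException is assumed to be a checked exception, and the method names loadOrCreate, update, login and logout are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for the User model object. */
class User {
    private final String username;
    private String email;
    private boolean expert;

    User(String username, String email, boolean expert) {
        this.username = username;
        this.email = email;
        this.expert = expert;
    }

    String getUsername() { return username; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }
    boolean isExpert() { return expert; }
    void setExpert(boolean expert) { this.expert = expert; }
}

/** Stand-in for the exception named in step 12 (assumed to be checked). */
class DataserverException extends Exception {
    DataserverException(String message) { super(message); }
}

/** DAO sketch; the map stands in for the database reached through the Service. */
class UserDao {
    private final Map<String, User> table = new HashMap<String, User>();

    /** Steps 5 and 6: load the user, or create and store a default record. */
    User loadOrCreate(String username) throws DataserverException {
        if (username == null || username.isEmpty()) {
            throw new DataserverException("No username given");
        }
        User user = table.get(username);
        if (user == null) {
            user = new User(username, username + "@slac.stanford.edu", false);
            table.put(username, user);
        }
        return user;
    }

    /** Step 11: store the confirmed changes. */
    void update(User user) throws DataserverException {
        if (!table.containsKey(user.getUsername())) {
            throw new DataserverException("Unknown user: " + user.getUsername());
        }
        table.put(user.getUsername(), user);
    }
}

/** Step 2: the presence of the User in the session marks a logged-in user. */
class SessionLogin {
    static final String USER_KEY = "user";   // hypothetical session key

    static void login(Map<String, Object> session, UserDao dao, String username)
            throws DataserverException {
        session.put(USER_KEY, dao.loadOrCreate(username));
    }

    static void logout(Map<String, Object> session) {
        // Removing the entry has the same effect as setting it to null here.
        session.remove(USER_KEY);
    }

    static boolean isLoggedIn(Map<String, Object> session) {
        return session.get(USER_KEY) != null;
    }
}
```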

2.3.4 Tables in database:

There are three tables currently used by the Skimmer. They are hosted in the GLASTGEN tablespace and are available for the Prod, Dev and Test versions of Skimmer. The tables are DP_TASK_TYPE, DP_REQUEST, and DP_CONFIG.

At the moment the tables are not cleanly separated, but the idea is that DP_TASK_TYPE holds info about valid skimmer task types, DP_REQUEST holds data about the user entries for the request (skimmer, pruner, etc.) and DP_CONFIG holds the parameters of the pipeline task (for example, ROOTSYS, DP_REPLYTO, etc.).

The following are table entries:

DP_TASK_TYPE
   TASK_TYPE: task type, like skimmer (meaning Full Skimmer), SimpleSkimmer, DC2Prune
   URL: URL of the task, like http://glastground.slac.stanford.edu/DC2Prune

DP_REQUEST
   REQUEST_PK: primary key, number based on timestamp
   USERID: user name
   TASK_TYPE: from the table above
   USER_COMMENT: entered during user request creation
   ID: streamId (might be deprecated)
   REQUEST_DATA: clob of all data, typical to TASK_TYPE
   SUBMIT_DATA: when submitted, timestamp
   OUTPUT_DIR: where to expect the result of the task data (could be obsolete)

DP_CONFIG
   NAME: key of the configuration parameter
   VALUE: value of the configuration parameter
   VERSION: version of the key-value pair

2.3.5 Request Object

Request is an object that reflects the columns in DP_REQUEST and therefore also contains RequestData (clob).

Request fields are:

1. Long requestId - unique identifier for the request. The request with the largest requestId is considered the most recent.
2. String username - unique name of the user who submitted the request.
3. String taskType - skimmer, simpleSkimmer, ...
4. String userComment - user supplied comment that goes with the request.
5. String outputDir - output directory computed by the pipeline, where one can find the log files and results of the skimmer runs.
6. String id - an array of streamIds that processed the request (might be obsolete).
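
A minimal sketch of the Request object with the fields listed above follows. The helper class RequestHistory is hypothetical; it only illustrates the rule that the request with the largest requestId is the most recent, which is what pre-populating a form from an earlier request of the same type would rely on.

```java
import java.util.List;

/** Request model object with the fields listed above (immutable for simplicity). */
class Request {
    final Long requestId;
    final String username;
    final String taskType;
    final String userComment;
    final String outputDir;
    final String id;

    Request(Long requestId, String username, String taskType,
            String userComment, String outputDir, String id) {
        this.requestId = requestId;
        this.username = username;
        this.taskType = taskType;
        this.userComment = userComment;
        this.outputDir = outputDir;
        this.id = id;
    }
}

/** Hypothetical helper: finds the most recent request of a user for a task type. */
class RequestHistory {
    static Request mostRecent(List<Request> requests, String username, String taskType) {
        Request latest = null;
        for (Request r : requests) {
            if (!r.username.equals(username) || !r.taskType.equals(taskType)) {
                continue;   // other user's request or other type of task
            }
            // The largest requestId marks the most recent request.
            if (latest == null || r.requestId.compareTo(latest.requestId) > 0) {
                latest = r;
            }
        }
        return latest;   // null if the user has no request of that type
    }
}
```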

2.3.6 RequestData object

RequestData is saved as a clob in the database and can hold, in the same manner, the different specific fields used by the Skimmer, Pruner and Peeler actions. This is a simple way to extend RequestData beyond the initial implementation.

RequestData fields for Skimmer request are:

1. long requestId
2. String taskName
3. String branchListPath
4. String fileListPath
5. String eventListPath
6. String releaseFilePath
7. RootVersion rootVersion
8. String userOutputDirectory
9. String tCut
10. String tCutDataType
11. int runMin
12. int runMax
13. int maxFilesize
14. List<String> dataTypes
15. List<String> debugFlags
16. List<String> skipFlags
17. List<String> forceFlags
18. boolean testDB
19. int skimmerMaxCPU
20. int mergeMaxCPU
21. int numSubtasks
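
To show why storing RequestData as a clob makes it easy to extend, here is a sketch that turns a few of the fields above into a text value and back. The format is an assumption: this document does not say how the clob is actually encoded, and java.util.Properties is used here only because it is a simple key-value text format in the standard library. The methods toClob and fromClob are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

/** Subset of the Skimmer RequestData fields, enough to show the round trip. */
class SkimmerRequestData {
    long requestId;
    String taskName;
    String tCut;
    int runMin;
    int runMax;

    /** Serializes the fields to key-value text suitable for a clob column. */
    String toClob() {
        Properties p = new Properties();
        p.setProperty("requestId", Long.toString(requestId));
        p.setProperty("taskName", taskName == null ? "" : taskName);
        p.setProperty("tCut", tCut == null ? "" : tCut);
        p.setProperty("runMin", Integer.toString(runMin));
        p.setProperty("runMax", Integer.toString(runMax));
        StringWriter out = new StringWriter();
        try {
            p.store(out, null);
        } catch (IOException e) {
            throw new IllegalStateException(e);   // not expected for a StringWriter
        }
        return out.toString();
    }

    /** Restores the fields; missing keys fall back to defaults. */
    static SkimmerRequestData fromClob(String clob) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(clob));
        } catch (IOException e) {
            throw new IllegalStateException(e);   // not expected for a StringReader
        }
        SkimmerRequestData data = new SkimmerRequestData();
        data.requestId = Long.parseLong(p.getProperty("requestId", "0"));
        data.taskName = p.getProperty("taskName", "");
        data.tCut = p.getProperty("tCut", "");
        data.runMin = Integer.parseInt(p.getProperty("runMin", "0"));
        data.runMax = Integer.parseInt(p.getProperty("runMax", "0"));
        return data;
    }
}
```

With a key-value encoding like this, a new field for a Pruner or Peeler request only adds a key to the text; the DP_REQUEST table itself does not change, and older rows simply fall back to the default value.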

2.3.7 Workflow

The user is presented with several JSP pages displaying forms that he/she needs to fill in. Navigation between the pages is arranged so that the information needed to create a Skimmer Task is built up gradually. This implies a usage model that does not collect all the information in one gigantic web page, but rather in several small pages, i.e. a wizard type of workflow.

To create a Skimmer Task, the following are basically needed:

1. Data required to form a skimmer request (basically env variables needed by backend skimmer).
2. Data required to define various versions of backend skimmer release (release, libraries, paths, ROOT used, etc)
3. Various kinds of file lists that will collect particular type of merit or svac files and run.event type information from Data Catalog.
4. Various kinds of event lists that will be collected from local file system
5. Information needed to obtain data for file list from Data Catalog (which Data Catalog server, location of logical paths, and filter used to create such a file list).
6. Various branching parameters to be used during pruning.
7. Various datasets (datasets being special sets of files to be processed and are grouped together for a particular major initiative in GLAST).
8. Various parameters needed to form a Pipeline task (like number of subtasks, batch farm parameters, location of various working directories, etc).
9. Various parameters needed to inform the user about the execution and location of results of the Pipeline Task
10. Some help pages which will help user create new jobs based on previously submitted jobs (History)
11. Various help links, Jira and Confluence location, and (Simple) Skimmer location.
12. Error pages that will help user understand what went wrong
13. Validation of many fields in the forms will be required, as they are part of business model.

Since this information is scattered across many domain objects, the approach is to guide the user through a series of smaller forms that together collect all the information listed above. The information in the forms is entered, removed or updated until the user decides to submit the job to the pipeline, at which point the business object values are also stored in the database for future use.

The navigation is currently taking place between the following pages:

1. Welcome page (entry page, and a menu type of page that directs the user to various points in the workflow).
2. Skimmer page, which collects only request data (some of the fields will be moved to a Pipeline page, to collect Pipeline Task related parameters).
3. Skimmer flags page, which collects various backend skimmer flags, used by an expert user to alter the flow of processing of the backend skimmer.
4. FileUpload page (and MultipleFileUpload page), used for the steps in which file data need to be collected.
5. Datacat page, which collects information about file lists to be created using Data Catalog.
6. Skimmer Task page, which collects the parameters required to submit a Skimmer Request to the pipeline server.
7. History page, which gives pointers to previously submitted Pipeline tasks.
8. User page, which collects useful user information.
9. Login page (administered by CAS login).
10. Error page for various types of application errors.

The simplest solution for the workflow is to use the Struts 2 scope interceptor. The purpose of this interceptor is to collect information in session scope.

For example, as the user enters basic parameters for the backend skimmer request (in essence, various backend skimmer environment variables), the user must also provide a File List from Data Catalog (our business model). The File List will be part of the Skimmer Request, but it is updated from the Data Catalog page. Various parameters for the Skimmer Task are collected on a separate Skimmer Task page (number of sub-processes, various batch farm parameters, pipeline directories, etc.).
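
The idea of collecting form data in session scope across several pages can be illustrated without Struts 2 as follows. This is not the scope interceptor itself, only a sketch of what it achieves: the Map stands in for the HTTP session, and the class name SkimmerWizard and the three session keys are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical wizard helper; the Map stands in for the HTTP session scope. */
class SkimmerWizard {
    // Assumed session keys, one per form page that must be completed.
    static final String[] REQUIRED_KEYS = {"skimmerRequest", "fileList", "skimmerTask"};

    private final Map<String, Object> session;

    SkimmerWizard(Map<String, Object> session) {
        this.session = session;
    }

    /** Called by the action behind each page to keep its form data in the session. */
    void saveStep(String key, Object formData) {
        session.put(key, formData);
    }

    /** Pages that still have to be filled in before the task can be submitted. */
    List<String> missingSteps() {
        List<String> missing = new ArrayList<String>();
        for (String key : REQUIRED_KEYS) {
            if (session.get(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    boolean readyToSubmit() {
        return missingSteps().isEmpty();
    }
}
```

Because each page only writes its own key, the user can visit the pages in any order and return to a page to change its data; submission is simply refused until nothing is missing.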
