Current Status
SSRL currently makes its data available to users by sending it to San Diego, where it is stored in SRB and accessed through the JCSG web site (http://www.jcsg.org/). This web site makes the protein structure data available and handles user login. Metadata for the web site is sent from SLAC to San Diego as XML files, updated as necessary. Some manual intervention is required at San Diego to import the data into the web site. The details of the implementation of the JCSG web site are currently unknown (to us), but the implementers of the site are SSRL collaborators.
SSRL would like to make the current image data and additional raw data available directly from SLAC, but wants to continue to use the JCSG site as the main gateway, and in particular to continue to authenticate users via the JCSG web site. The data should not be made available anonymously. The data files are typically a few GBytes. Currently there is sufficient disk space at SLAC to hold all the data, but we should foresee the possibility that eventually it may be necessary to house some data only on tape.
Solution Outline
One possible solution would be to host the data at SLAC and serve it via a web server (Apache or Tomcat?) which would control access to the data. Conceptually, access could be granted by having the JCSG site generate a random authentication token, valid for a limited time (a few minutes), and send it along with the URL for the data. The server at SLAC would then check the authentication token with the server at JCSG before allowing download of the data file. HTTP access to the data should be adequate for the size of files we are dealing with. If necessary in future, the server at SLAC could also deal with staging the data files from tape, and with providing feedback to the user while this is done.
A dedicated web server at SLAC would probably be needed for this task.
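The token exchange outlined above can be illustrated with a minimal sketch. It makes several assumptions that are not part of the proposal: the names TokenIssuer and slac_allows_download, the "token" query parameter, the 300-second lifetime and the example host are all hypothetical, and the check that the SLAC server would make against the JCSG server over the network is replaced here by a direct function call.

```python
import secrets
import time
from urllib.parse import urlencode, urlparse, parse_qs

TOKEN_LIFETIME_SECONDS = 300  # "a few minutes"; the exact value is an assumption


class TokenIssuer:
    """Stands in for the JCSG site: issues and verifies short-lived tokens."""

    def __init__(self, lifetime=TOKEN_LIFETIME_SECONDS, clock=time.time):
        self._lifetime = lifetime
        self._clock = clock
        self._tokens = {}  # token -> (file path, expiry time)

    def issue_url(self, base_url, path):
        """Return a download URL for `path` carrying a fresh random token."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (path, self._clock() + self._lifetime)
        return f"{base_url}{path}?{urlencode({'token': token})}"

    def verify(self, token, path):
        """True only if the token is known, unexpired, and issued for `path`."""
        entry = self._tokens.get(token)
        if entry is None:
            return False
        issued_path, expires_at = entry
        if self._clock() > expires_at:
            del self._tokens[token]  # forget expired tokens
            return False
        return issued_path == path


def slac_allows_download(url, verify):
    """Stands in for the SLAC server: extract the token and ask JCSG to verify it.

    `verify` is a placeholder for the call to the JCSG server.
    """
    parsed = urlparse(url)
    tokens = parse_qs(parsed.query).get("token", [])
    if len(tokens) != 1:
        return False  # no token (or an ambiguous one): refuse anonymous access
    return verify(tokens[0], parsed.path)
```

For example, a URL produced by issuer.issue_url("https://data.example.org", "/images/set1.img") would be accepted by slac_allows_download(url, issuer.verify) until the token expires, while the same URL without the token, or with a different file path, would be refused. Binding the token to a single file path is one possible design choice; a token covering a set of files would be an alternative if users turn out to need bulk downloads.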
Questions
- Would users often want to download a set of files, or is allowing them to access files one by one sufficient?
- How many users would we expect to use this each day, and what would the total volume of data be?
Next Steps
Discuss the feasibility of this approach with the implementers of the JCSG web site.