This page is a place to post and answer questions about the transition of operations from SLAC staff to the FSSC.

Unanswered Questions

Which of the contents of the ISOC Subversion repository will the FSSC be responsible for maintaining?

Are the versions of the software currently in the repository and those being used on disk in sync?  If not, when will the repository be updated with the correct versions and by whom?

The Pipeline II User's Guide contains a big red box that says "*Everything beyond this point is a big mess and probably wrong.*".  Who at SLAC can update that section so that it is correct and no longer a mess?

Who will be able to grant permissions in the future?  When we needed access to the glast and glastraw accounts, Tom Glanzman provided it.  For the glastops account, he had Steve Tether do it. 

  There are several types of permissions to consider.  Access to the glast and glastraw accounts (i.e., login privilege) is granted through AFS groups (see AFS Group Structure and Ownership Hierarchy for details).  Currently, four individuals can modify the 'login' privileges: richard, tonyj, dragon, and heather.  This list could be expanded in the future.

Who do we contact for system-related problems?  When there was an Oracle database failure, Warren sent an email about it.  He then said that "there's nothing more they can do but hope someone from db-admin responds".  Who is db-admin?

Partially Answered Questions

How do we access the Subversion repository (as mentioned in the ISOC software summary page) for the ISOC software?  The root of the repository is at /nfs/slac/g/glast/online/svnroot.  What permissions are needed to access this directory?

Partial answer from Brian on Slack regarding permissions: "I think you need to be added to the `glonline` group" to access the directory.

You do need to be a member of glonline.  There doesn't seem to be a web interface.  Steve accesses it through an emacs plugin.

Where does each piece of software in the Subversion repository get installed when it is updated?  And what is deprecated and no longer needs to be maintained?

Steve Tether updated the page we created to keep track of this (ISOC SVN Repository Information) with a lot of information.  It's still incomplete but a lot closer.

Is responsibility for maintaining the Fermi LAT Portal and its associated Java web applications moving to the FSSC as well, or is that staying at SLAC?

Partial answer from Brian (definitive answer still needed):

For the web applications, I think we are having a discussion about that, but in general they stay at SLAC because they rely on Oracle. I think we're still going to try to figure out how support for them will work.

Some of the applications are used by other experiments, so we are still maintaining those (and will have to keep maintaining them)

Is there any documentation on what the SLAC computing center is looking at for containers going forward?  There was some talk about it at the software week and IIRC it wasn't Docker.  I'd like to start getting tied into their plans for the future of the pipeline.

The technology to be used is Singularity.  Still need documentation on the plans.

Is this page - Servers and Aliases -  up to date?  If not, who at SLAC is responsible for getting it up to date and what is the timeline for doing so?

No.  Tom Glanzman is the contact for getting it up to date.

How/when do you decide that a running job is hung?  Is there a set timescale?  Does it depend on the type of job?  Is this documented anywhere?

After a few hours.  It does typically depend on the job but if a job is not making any progress after an hour or so, it's probably hung.  This is not really documented anywhere.
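Since this is not documented, one rough way to spot candidates is to look for job log files that have stopped changing. The sketch below is only an illustration: it assumes that log modification time is a reasonable stand-in for "making progress", that logs end in .log, and that the one-hour default matches the rule of thumb above. The function name and the directory passed to it are hypothetical.

```shell
# stale_logs DIR [MINUTES]
# Hypothetical helper: print log files under DIR that have not been modified
# in the last MINUTES minutes (default 60). Assumes logs are named *.log and
# that a quiet log is a sign of a job that is not progressing.
stale_logs() {
    find "$1" -type f -name '*.log' -mmin +"${2:-60}"
}

# Example: list logs untouched for more than two hours.
# stale_logs /path/to/job/logs 120
```

A quiet log is only a hint; whether the job is really hung still depends on the type of job.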

How do you close a host in the batch system?  What criteria determine if it should be closed or not?  Do we ever reopen them?  If so, what determines when we should?  How is that done?

I think that at least how to open and close hosts is documented somewhere here in Confluence; we need to find that page and link it in.  We still need answers on how to decide when to do so.
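For reference until that page is linked in: assuming the batch system is LSF (which the bjobs, bstop, and bresume commands elsewhere on this page suggest), hosts are normally closed and reopened with badmin hclose and badmin hopen, both of which require LSF administrator rights. The wrapper below is a hypothetical sketch and not the documented SLAC procedure; the name host_ctl and the RUN switch are made up for illustration, and by default it only prints the command it would run.

```shell
# host_ctl close|open HOST
# Hypothetical helper: print the LSF command that closes or reopens HOST.
# Set RUN=1 in the environment to execute it instead of printing it
# (assumes LSF is installed and the caller has administrator rights).
host_ctl() {
    action=$1
    host=$2
    case "$action" in
        close) cmd="badmin hclose $host" ;;
        open)  cmd="badmin hopen $host" ;;
        *) echo "usage: host_ctl close|open HOST" >&2; return 2 ;;
    esac
    if [ "${RUN:-0}" = 1 ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

# Example: show what would be run to close a host (host name is a placeholder).
# host_ctl close somehost
```

Printing by default makes it easy to review the command before anything is changed in the batch system.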

Answered Questions

What is the timeline for working the FSSC staff into the rotation of watching the pipeline?

Done.  Tom is regularly monitoring the pipeline and will be training Don and Joe E.

Where can I find top level information about various software?

What are the commands to change the state of running jobs (i.e. from RUN to USUSP and back) in the batch system?  Is there a reference somewhere?  I'm assuming you need to be on fermilnx-v16 (where the pipeline is running) and logged in as glastraw.  Can it be done from the pipeline monitoring page?

The bstop and bresume commands are used to pause and resume jobs, respectively.  Each takes the jobID of the batch job to act on.

You need to be logged in as the pipeline user (glastraw) in order to execute the commands.

You can act on multiple batch jobs at a time with a command similar to:

          bresume `bjobs -u glastraw -q medium | awk '/USUSP/{print $1}' | head -100`

which would select jobs from user glastraw in the medium queue that have the state USUSP and list out the first 100 jobID values to be submitted to bresume.
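The same pattern can be turned around to pause jobs. The sketch below is a hypothetical helper (select_jobs is not part of the batch system) that assumes the default bjobs column layout, with the job ID in the first column and the state in the third. It compares the state column exactly instead of searching the whole line, so a job whose name happens to contain the state string is not picked up.

```shell
# select_jobs STATE [LIMIT]
# Hypothetical helper: read `bjobs` output on stdin and print up to LIMIT
# (default 100) job IDs whose STAT column (assumed to be column 3) is STATE.
select_jobs() {
    awk -v state="$1" -v limit="${2:-100}" \
        '$3 == state && n < limit { print $1; n++ }'
}

# Example (run as glastraw): pause up to 100 running jobs in the medium queue.
# ids=$(bjobs -u glastraw -q medium | select_jobs RUN)
# [ -n "$ids" ] && bstop $ids
```

The check on ids in the example avoids calling bstop with no job IDs when nothing matches.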

Is the OpsLog still used? (My guess is no since the last entry I can see is from Feb. 2013.)  If it is, how do we get login privileges since my SLAC username/password don't work?




