- This page is meant to organize the discussion around the virtualization of Fermi Science Analysis Systems software. Some of this software has a long history, and there is a clear lack of manpower to keep it running on recent platforms. Some packages are stuck on RHEL5, others on RHEL6, and others run on modern platforms. A detailed status review is needed for each piece to understand the way forward: maintenance, VM, or container.
- Information on this page was first gathered from a number of reference pages:
Summary
- Here is a summary table with the main software packages:
- see data flow section of the Software Week Meeting notes
- created by Johan on Tuesday 6th 2017, limiting myself to RHEL5, RHEL6 and RHEL7/CentOS7
| Name | Build platform | Running platform | Special dependencies | Upgradable? | Existing VM | Existing container | Links | Comments | Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FastCopy | RHEL5 | RHEL6 | ? | ?, to RHEL6 | ? | | FASTCopy processing chain | to be reviewed by experts | |
| Halfpipe | | | | | | | Halfpipe | to be reviewed by experts | |
| Isoc Monitoring | | | | | | | | to be reviewed by experts | |
| GlastRelease | RHEL6 | RHEL6 | ROOT v5.34 | maybe, with a lot of work and clean-up | | | | huge work, including science verification | |
| DataMonitoring | RHEL6 | RHEL6 | python2, numpy, ROOT | probably, if GR is upgraded | | | | svac/monitor is similar to GR, then mostly python code | |
| ScienceTools | RHEL7 | RHEL7 | | RHEL7 | Giacomo's | Sam's / Matt's | | | |
| ASP | RHEL7 | RHEL7 | | RHEL7 | | | | | |
- Halfpipe sounds like a candidate.
- No, it runs on RHEL6, but it is unlikely to move beyond that. So yes, virtualize at RHEL6.
- GlastRelease is also stuck on RHEL6
- A couple of APIs need Qt and use the commercial version
- Release Manager uses the free version of Qt
- Unsure why the commercial version is used.
- Might be worth exploring a move to the free version
- Need to have a discussion about FastCopy, as it requires RHEL5.
- ISOC ops boxes are mostly under RHEL5. It has been demonstrated that the tools can run under RHEL6.
- Backup ISOC is no longer supported.
What kind of virtualization? VM or container?
GlastRelease:
- GlastRelease needs virtualization
- RHEL 6 is last release that we have the personnel to support
- A few people run GlastRelease (developers) - a nice use case for Docker. Getting GlastRelease to run on your laptop is painful.
- GlastRelease carries around geant4
- Is there a distinction between Users and Developers for GlastRelease?
- No
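Since GlastRelease is frozen at RHEL6 and getting it onto a laptop is painful, a Docker image could capture the whole environment once. A minimal sketch, assuming a centos:6 base image as a freely redistributable stand-in for RHEL6; the package list and the /opt/glastrelease install path are placeholders, not the real build recipe:

```dockerfile
# Sketch only: RHEL6-era userland for GlastRelease.
# centos:6 is an assumed stand-in for RHEL6.
FROM centos:6

# Era-appropriate toolchain; the exact package list is a placeholder.
RUN yum install -y gcc gcc-c++ make python && yum clean all

# The GlastRelease build (including geant4 and ROOT v5.34) would be
# installed here; /opt/glastrelease is an assumed location.
COPY glastrelease/ /opt/glastrelease/
ENV PATH=/opt/glastrelease/bin:$PATH
```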
Science Tools:
- Focus with ScienceTools is just ease of distribution
Would it be useful to distribute the tools in VMs? Containers? Both?
Joris: I found this VM: Virtual Machine version 3
- Are there external dependencies (like xrootd) that would cause problems with virtualization if the backend changes?
We need an automated build system for ST: Release Manager vs. manual builds
- GR uses xrootd, ST does not (Eric)
- Use of virtualization is for convenience - which is most useful thing to do? (Richard)
- Don't depend on NFS/AFS if the container is built right. xrootd is stable for data:
- getting files/libraries and also output data.
- Container helps with diffuse model
- on nodes, not on NFS
- on nodes there's low overhead.
- Caching image on all of the nodes.
- Fermi ST image will have the diffuse model in it.
Release Manager: Release Manager doesn't talk to Oracle, but it does talk to a database. Not user friendly.
- For slac farm - docker containers for GlastRelease. Need docker registry
- Docker containers is the right solution for batch farm (Brian)
Use their system to run a RHEL6 container, but the batch host is RHEL7.
- Carefully build the container (works nicely with xrootd)
- need to find out from Warren if FT1, FT2 files are included (Richard)
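The RHEL6-container-on-RHEL7-host pattern above could look like this on a batch node. The image name, job script, and xrootd URL are hypothetical; the point is that input and output go through xrootd rather than NFS/AFS mounts:

```shell
# Hypothetical image name; the real one would live in a SLAC Docker registry.
IMAGE=glast/glastrelease:rhel6

# Run one L1 job in the RHEL6 userland on the RHEL7 batch host.
# runL1Job.sh and the xrootd URL are placeholders for the real chain.
docker run --rm "$IMAGE" \
    runL1Job.sh "root://glast-xrootd//glast/FT1/input.fits"
```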
What systems need what kinds of containers?
- Samuel needs to discuss with the simulations people at Lyon. (He is sick today.)
- What is different for developers/users?
- Same image for all the GR uses.
- Don't want to pull a 3 GB image just to get FT1; GR is 3x bigger. Just have one image at the moment.
- One giant image - good command line interface installed in that image.
- Images built such that the top looks the same between GR and ST. Keep same image.
- Separate builds for debugging purposes?
- GlastRelease is frozen, ST is constantly evolving. Debugging GR is not a problem, debugging ST is important
- Giacomo
- Mount code at runtime, container doesn't have debugging tools.
- Container provides environment.
- Compile inside the container.
- run debugger inside container.
- User image has everything - compiled.
- Lightweight container for developers so they can compile; users get the fully compiled image.
- Debugging in GR and ST is very different
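The user/developer split discussed above might look like the following: users pull the fully compiled image, while developers bind-mount their checkout into a lighter image and compile inside it, so the container only provides the environment. The image names and tags are assumptions; gtlike and scons stand in for whatever tool and build command are actually used:

```shell
# User: everything pre-compiled in the image (hypothetical name/tag).
docker run --rm glast/sciencetools:latest gtlike --help

# Developer: mount the working copy and compile inside the container,
# so the container supplies the environment, not the code.
docker run --rm -it -v "$PWD":/workspace -w /workspace \
    glast/sciencetools:devel scons all
```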
- The computing center will have a cache of Docker images.
- Every project will be asked which Docker images it wants on the batch nodes.
- Plan for managing cached images. Work out allocations for collaborations.
- Cost of using docker?
Pipeline:
- Needs someone he can show the pipeline code to and train to do the heavy lifting when it comes to kicking the pipeline
- Docker containers for something like the batch system may cause problems, since for something like the L1 pipeline a number of images would need to be launched simultaneously
- Would size of the software cause problems with deployment?
- We would need a system where you restrict loading images to the batch farm to prevent collisions/problems
- There is probably a precedent for this, however, Matt has no experience deploying on this scale
- A file size of ~1 GB is best; a few GB is manageable for production.
- The IT dept is supportive of Docker at SLAC. There is one machine with RHEL7.
- Lyon is a much larger computing center - likely they will upgrade to Docker first
- Now full support for Docker at Lyon (Fred)
- Joris: Lyon wants to use Singularity because they have security issues with UGE + Docker.
Infrastructure:
- Last purchase went into dev cluster
- many nodes @RHEL6; upgrade to RHEL7 and do Docker with these
- Still figuring out how NFS/AFS get sorted out with RHEL7. GPFS?
- It's good to come up with a plan because of the security implications if NFS is underneath.
- Use Docker right (UID issues w/security)
- SLAC will give us a few nodes for testing docker. Fall back way to install on user machines. (Brian)
- AFS on RHEL6 docker
- read files if world readable.
- NFS is hardest.
- Timeline for RHEL7, 12mo? 2018? (Matt)
- RHEL7 support is dodgy.
- Configuration stuff is hard part
1. Use cases
- GlastRelease - frozen on RHEL6
- L1 processing, reprocessing in SLAC batch farm
- RHEL6 container on a RHEL7 host
- do FT1, FT2 files go to xrootd? (Warren)
- separate containers for L1? Maybe not an issue if we can preload batch nodes. We're guessing ~5 GB image.
- Simulations at Lyon, SLAC, GRID
- maybe the same as for SLAC - check with Samuel for details
- Developers & Users
- maybe separate versions for debug symbols and source for developers. Could be on-demand production of this version.
- Release Manager or manual builds
- L1 processing, reprocessing in SLAC batch farm
- Science Tools
- Caching big files (e.g. templates) in container image. Need a strategy with SCS for this for how containers are cached.
2. Software dependencies:
- GPL_TOOLS (staging and logging)
- REPRO common tools
- REPRO task scripts
- GlastRelease
- ScienceTools
- GLAST_EXT software (e.g., python, root)
- Ftools (KIPAC installation)
- ROOT skimmer
- FITS skimmer (possibly unnecessary?)
- evtClassDefs
- calibration and alignment files
- diffuse models
- xroot tools
- xroot /glast/Scratch space
- /scratch on local batch machines
- data catalog query (FT2 file and current version of FITS files)
- mySQL DB (calibration and alignment)
- Fermi astrotools (could probably eliminate)
3. Questions
- Joris: Are there security issues with LSF & Docker? (https://developer.ibm.com/storage/2017/01/09/running-ibm-spectrum-lsf-jobs-in-docker-containers/)
- Joris: We need to verify the compatibility between Singularity (Lyon CC) and Docker
- etc.
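Since Singularity can import Docker images directly, the compatibility check raised above could start by running the same command under both runtimes and comparing. The image name is hypothetical, and the .sif filename follows Singularity 3.x naming:

```shell
# Convert the Docker image to a Singularity image, then run the same
# entry point under Singularity that the SLAC farm runs under Docker.
singularity pull docker://glast/sciencetools:latest
singularity exec sciencetools_latest.sif gtlike --help
```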