Snapshot of my current (June 2012) thinking on how Fermi might use VMs to freeze on a particular OS, presumably rhel6.


h2. Scope

What activities will take place inside VMs?  Here is a provisional list:
* Interactive code development, testing and debugging
* SCons release manager
* sys tests
* MC production
* Reprocessing

{bgcolor:#D0D0D0}
Depending on exactly when rhel6 becomes deprecated, the following might also have to run in VMs (but I hope not):
* L1 processing
* Online monitoring
* ASP
{bgcolor}

h2. Resources

What resources do the above activities require?   Which activities require which resources? The following table addresses these questions.


|| Activity\Resource || RM db (MySQL) || Calib db (MySQL) || Moot db (MySQL) || Calibrations (archive) || Moot configs (archive) || CVS || RM builds || Batch system ||
| *Code devel* | no | yes | yes | yes (or copy) | yes (or copy) | yes | yes | would be nice |
| *RM* | yes | yes (for running test programs) | yes | yes | yes | yes | yes | yes |
| *Sys tests* | no | yes | yes | yes | yes | no | yes | yes |
| *MC prod* | no | yes | yes | yes | yes | no | yes | yes |
| *Reprocessing* | no | yes | yes | yes | yes | no | yes | yes |

Note that "batch system" should be interpreted loosely.  It might be the centrally-supported batch system (LSF or its descendant), or it might be something specialized for use from VMs.

h2. Where

..are the various activities run?  Where do the resources reside? Assuming all the activities occur inside VMs, there are at least three possibilities:
# Activity is available from an "appliance": a pre-configured VM.  It must be used on a machine that can act as "host"; that is, one that has VirtualBox or similar software installed. For more about how this works, see [Tom's page on a ScienceTools appliance|https://confluence.slac.stanford.edu/x/kwpsBg].
# Activity is available in a VM which is normally already up and running (e.g., with a SLAC-maintained machine as host).
# Activity is available in a transient VM: a VM exists which has been configured to support the activity, but it may not be up and running. A "start VM" step is required before using it.

..are resources such as MySQL and CVS located?  There are two plausible places:
# SLAC centrally-maintained server (where they are now)
# Stable VM running within a SLAC centrally-maintained host.

Option 1 will most likely be preferable, so that we won't have the burden of maintaining them and so that the databases may be easily read, e.g. by the server for SCons RM web pages.  It seems likely, but not certain, that CVS and MySQL will continue to be supported on centrally-maintained machines through the end of Fermi's lifetime.  If not, we'll have to go with option 2.

File systems that VM-sequestered processes can write to, such as the calibration archive, moot archive and RM builds, will need some special handling.

h2. Where to draw the line

The BaBar design puts everything behind barbed wire, or at least a moat.  This can seem like overkill, but it does simplify some things.  We're aiming for a hybrid scheme in which we use standard centrally-supported resources as much as possible, but that means that every interaction which has to cross the line has to be examined very carefully. Among the areas to consider:

h4. Interaction with batch

How will jobs to be run in VMs be specified?  Will they be submitted to the host or directly to the guest? If the latter, the VM nodes must always be up.  If the former, there must be a separate step to start up the guest and there has to be some synchronization to ensure the guest is ready to execute by the time it receives a request. (I have been unable to find a good way to do this with VirtualBox.  The only technique seems to be to make an initial guess at how long the boot will take, then try to execute, perhaps with retries and a sleep in between.)
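
The guess-then-retry technique described above might look roughly like the following. This is only a sketch under stated assumptions: {{wait_until_ready}} and its parameters are hypothetical names, and {{probe}} stands in for whatever check we end up using to decide that the guest can accept a job (nothing here calls VirtualBox itself).

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    initial_wait: float,
    retry_interval: float,
    max_retries: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once probe() succeeds, False if every attempt fails.

    probe          -- hypothetical readiness check supplied by the caller
    initial_wait   -- initial guess (seconds) at how long the boot takes
    retry_interval -- sleep (seconds) between failed attempts
    max_retries    -- number of retries after the first attempt
    sleep          -- injectable so the loop can be exercised without waiting
    """
    # Wait out the guessed boot time before the first attempt.
    sleep(initial_wait)
    for attempt in range(max_retries + 1):
        if probe():
            return True
        # Sleep only if another attempt is still to come.
        if attempt < max_retries:
            sleep(retry_interval)
    return False
```

A caller would pass a {{probe}} that tries to execute something trivial in the guest and reports whether it worked. The total wait is bounded by {{initial_wait + max_retries * retry_interval}} plus the time spent in the probes themselves, so a job submitted to the host can give up cleanly if the guest never comes up.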


h4. Security