Snapshot of my current (June, 2012) thinking on how Fermi might make use of VMs to freeze on a particular OS, presumably rhel6.


h2. Scope

A provisional list of activities to take place inside VMs:

* Interactive code development, testing and debugging
* SCons release manager
* sys tests
* MC production
* Reprocessing

{bgcolor:#D0D0D0}
Depending on exactly when rhel6 becomes deprecated, the following might also have to run in VMs (but I hope not):
* L1 processing
* Online monitoring
* ASP
{bgcolor}

h2. Resources

Which activities require which resources? Here I consider only the more restricted set of activities listed above; if the online activities had to be added, more resources would be involved as well.



|| Activity\Resource || RM db (MySQL) || Calib db (MySQL) || Moot db (MySQL) || Calibrations (archive) || Moot configs (archive) || CVS || RM builds || Batch system ||
| *Code devel* | no | yes | yes | yes (or copy) | yes (or copy) | yes | yes | would be nice |
| *RM* | yes | yes (for running test programs) | yes | yes | yes | yes | yes | yes |
| *Sys tests* | no | yes | yes | yes | yes | no | yes | yes |
| *MC prod* | no | yes | yes | yes | yes | no | yes | yes |
| *Reprocessing* | no | yes | yes | yes | yes | no | yes | yes |

Note that "batch system" should be interpreted loosely. It might be the centrally supported batch system (LSF or its descendant), or it might be something specialized for use from VMs.
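When deciding what a given VM must be able to reach, the table can be encoded as one set of resources per activity. This is a minimal sketch; the resource names are hypothetical shorthand for the table columns, and "batch" is included for code development even though the table only calls it nice to have.

```python
# Hypothetical encoding of the Resources table; names are shorthand
# for the table columns, not identifiers from any existing tool.
COMMON = frozenset({
    "calib db", "moot db", "calib archive", "moot archive",
    "rm builds", "batch",
})

NEEDS = {
    "code devel": COMMON | {"cvs"},  # batch is only "would be nice" here
    "rm": COMMON | {"rm db", "cvs"},
    "sys tests": COMMON,
    "mc prod": COMMON,
    "reprocessing": COMMON,
}


def resources_for(activities):
    """Union of resources a VM hosting all these activities must reach."""
    needed = set()
    for activity in activities:
        needed |= NEEDS[activity]
    return needed


def activities_needing(resource):
    """Sorted list of activities whose row has 'yes' for this resource."""
    return sorted(a for a, res in NEEDS.items() if resource in res)


print(activities_needing("cvs"))  # ['code devel', 'rm']
```

For example, only the RM row needs the RM db, so a VM dedicated to sys tests, MC production and reprocessing would never need access to it.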

h2. Where to draw the line

The BaBar design puts everything behind barbed wire, or at least a moat. This can seem like overkill, but it does simplify some things. We're aiming for a hybrid scheme in which we use standard centrally-supported resources as much as possible, but that means that every interaction which has to cross the line has to be examined very carefully. Among the areas to consider:

h4. Where do activities run?

Assuming all the activities occur inside VMs, there are at least three possibilities:
# Activity is available from an "appliance", i.e. a pre-configured VM. It must be used on a machine that can act as host; that is, one that has VirtualBox or similar software installed. For more about how this works, see [Tom's page on a ScienceTools appliance|https://confluence.slac.stanford.edu/x/kwpsBg].
# Activity is available in a VM which is normally already up and running (e.g., with SLAC-maintained machine as host)
# Activity is available in a transient VM: VM exists which has been configured to support the activity, but it may not be up and running. A "start VM" step is required before using it.
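For the transient case (3), the "start VM" step and the matching shutdown could be scripted with VBoxManage. This is a minimal sketch that only builds argument lists and parses output; it assumes the standard VBoxManage startvm, controlvm ... acpipowerbutton and showvminfo --machinereadable commands, and the VM name is hypothetical.

```python
def start_cmd(vm, headless=True):
    """argv for the "start VM" step; headless means no window on the host."""
    return ["VBoxManage", "startvm", vm, "--type",
            "headless" if headless else "gui"]


def stop_cmd(vm):
    """argv for a clean shutdown: sends an ACPI power-button event."""
    return ["VBoxManage", "controlvm", vm, "acpipowerbutton"]


def vm_state(showvminfo_output):
    """Extract VMState (e.g. "running", "poweroff") from the output of
    'VBoxManage showvminfo <vm> --machinereadable'; None if absent."""
    for line in showvminfo_output.splitlines():
        if line.startswith("VMState="):
            return line.split("=", 1)[1].strip('"')
    return None


# Usage (needs VirtualBox; "rhel6-batch" is a hypothetical VM name):
#   import subprocess
#   info = subprocess.run(["VBoxManage", "showvminfo", "rhel6-batch",
#                          "--machinereadable"],
#                         capture_output=True, text=True, check=True).stdout
#   if vm_state(info) != "running":
#       subprocess.run(start_cmd("rhel6-batch"), check=True)
```

Building argv lists rather than shell strings avoids quoting problems with VM names that contain spaces.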

h4. Where do resources reside?

There are two plausible places for the MySQL and CVS servers:

# SLAC centrally-maintained server (where they are now)
# Stable VM running within a SLAC centrally-maintained host.

Option 1 will most likely be preferable, both so that we won't have the burden of maintaining the servers and so that the databases can easily be read, e.g. by the server for the SCons RM web pages. It seems likely, but not certain, that CVS and MySQL will continue to be supported on centrally-maintained machines through the end of Fermi's lifetime. If not, we'll have to go with option 2.

File collections that VM-sequestered processes can write to, including the CVS archive, calibration archive, Moot archive and RM builds, will need some special handling.
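One possible form of that handling, assuming VirtualBox shared folders are used (rather than, say, network file systems mounted inside the guest), is to export each collection read-only by default and make writable only those that VM processes really must write to. Which collections need write access is an assumption here (RM builds is shown as writable), and the names and host paths are hypothetical; only the sharedfolder add options are real.

```python
# Hypothetical collection names and host paths; only the
# "VBoxManage sharedfolder add" options (--name, --hostpath,
# --readonly) are real.
COLLECTIONS = {
    # name: (path on host, may VM processes write to it?)
    "calib": ("/hypothetical/calib-archive", False),
    "moot": ("/hypothetical/moot-archive", False),
    "rmbuilds": ("/hypothetical/rm-builds", True),
}


def sharedfolder_cmds(vm, collections):
    """One 'sharedfolder add' argv per collection, read-only unless writable."""
    cmds = []
    for name, (hostpath, writable) in sorted(collections.items()):
        argv = ["VBoxManage", "sharedfolder", "add", vm,
                "--name", name, "--hostpath", hostpath]
        if not writable:
            argv.append("--readonly")
        cmds.append(argv)
    return cmds
```

Keeping the writable set small means only a few collections need the careful review discussed above.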


h4. Interaction with batch

How will jobs to be run in VMs be specified? Will they be submitted to the host or directly to the guest? If the latter, the VM nodes must always be up. If the former, there must be a separate step to start up the guest, and there has to be some synchronization to ensure the guest is ready to execute by the time it receives a request. (I have been unable to find a good way to do this with VirtualBox. The only technique seems to be to make an initial guess at how long the boot will take, then try to execute, perhaps with retries and a sleep in between.)
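The guess-then-retry technique might look like the sketch below. The probe is a hypothetical callable (for instance one that runs a trivial command in the guest and reports success); the default timings are placeholders, not measured boot times.

```python
import time


def wait_until_ready(probe, boot_guess=60.0, retry_interval=10.0,
                     max_retries=12, sleep=time.sleep):
    """Wait for a just-started guest to accept work.

    probe: hypothetical callable returning True once the guest can run
           a command.
    Returns the number of probes made; raises TimeoutError if the guest
    never becomes ready.
    """
    sleep(boot_guess)  # initial guess at how long the boot takes
    attempts = 1 + max_retries
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(retry_interval)  # not ready yet: pause, then retry
    raise TimeoutError(f"guest not ready after {attempts} attempts")
```

Passing sleep in as a parameter is only a convenience for testing; the worst-case wait is boot_guess + max_retries * retry_interval plus the time spent in the probes.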


h4. Security

Points to be considered include:
* applications allowed to run in VMs (e.g., exclude web browsers, email programs, etc.?)
* network access allowed from VMs
* access to file systems from VMs (e.g., exclude SLAC home directories?)
* Logins. The VirtualBox API routine *ExecuteProcess* takes username and password as required arguments; I don't believe there is any other means of authentication. Accounts may have an empty password.
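From the command line, the counterpart of *ExecuteProcess* is VBoxManage guestcontrol. The sketch below builds such a command using the run subcommand syntax of VirtualBox 5.x and later; older releases, including those current in 2012, used a different subcommand, so the installed version's help should be checked. The VM name, user name and paths are hypothetical.

```python
import posixpath


def guest_run_cmd(vm, user, exe, args=(), passwordfile=None):
    """argv to run exe inside the guest as user, echoing its stdout.

    passwordfile: path on the host to a file holding the password, which
    keeps the password itself off the command line.
    """
    argv = ["VBoxManage", "guestcontrol", vm, "run", "--username", user]
    if passwordfile is not None:
        argv += ["--passwordfile", passwordfile]
    argv += ["--exe", exe, "--wait-stdout", "--"]
    argv += [posixpath.basename(exe), *args]  # argv[0], then arguments
    return argv
```

Reading the password from a file rather than passing it with --password keeps it out of process listings on the host.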

Since the Redhat 6 production phase (including, among other things, security patches) is now projected to last through November 2020, isolation of VMs for security is no longer an urgent concern. There might still be a need for it toward the end of Fermi offline activity, however, so the architecture chosen should allow for such isolation, even if it's not turned on initially.


h2. To explore


h4. Remote logins

With VirtualBox if you're sitting in front of the host you can boot up a VM interactively and log into it via its display just as if it were a physical machine.  It is also possible to log in via programs like Remote Desktop (Windows) or the Linux equivalent, rdesktop.  I got this to work some months ago but failed in more recent attempts.

(/) It turned out I needed to reinstall the Oracle extension pack, which provides the server support for this. I'm not sure why, but it's possible I upgraded VirtualBox without also upgrading the extension pack, a no-no.

To allow connections to more than one guest on the same host, each guest on that host should be configured to use a distinct port.
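Assigning the ports could be scripted with VBoxManage modifyvm and its --vrde and --vrdeport options. This is a minimal sketch; the starting port 3390 and the guest names are hypothetical choices, and since modifyvm changes the settings of a VM that is not running, the guests must be powered off when the commands are applied.

```python
def vrde_port_cmds(guests, base_port=3390):
    """Give every guest on one host its own VRDE (remote desktop) port.

    base_port is a hypothetical choice that skips 3389, the default
    VRDE/RDP port.
    """
    if len(set(guests)) != len(guests):
        raise ValueError("guest names must be unique")
    return {
        vm: ["VBoxManage", "modifyvm", vm,
             "--vrde", "on", "--vrdeport", str(base_port + i)]
        for i, vm in enumerate(guests)
    }


# A client then reaches one particular guest with, e.g.,
#   rdesktop hostnode:3391
```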



h4. MySQL, CVS access

Make sure VM can get to these SLAC resources.

(/) Without any special configuration, processes on VMs can access resources such as mysql databases or cvs archives across the net just as if they were running on the host.
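A quick way to repeat this check from inside a new guest is to open a TCP connection to each server's port. This is only a sketch: it shows that the network path is open, not that logins or queries will succeed, and the server name in the usage line is a placeholder. 3306 is MySQL's default port; the port to test for CVS depends on how the archive is accessed (e.g. 22 if over ssh).

```python
import socket


def can_reach(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False


# Run inside the guest; the server name is a placeholder:
#   can_reach("mysql-server.example", 3306)
```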

h4. ssh

I was able to ssh _out_ of the VM just as if I were logged directly into the host. 

To ssh in to the guest, the VM configuration has to be adjusted by adding a port-forwarding entry. This is easy to do from the VirtualBox GUI, but the end result is not ideal. I can ssh in like this:

{code}
ssh user@hostnode -p alternate-port
{code}
but, depending on the ssh configuration of the node I'm coming from, I might get a complaint about host keys not matching. ssh expects the host key to be the same when the host is the same, regardless of port number. If the host key doesn't match what's in your known_hosts file, it won't connect; you have to edit out the old entry in known_hosts first.
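The forwarding rule can also be added from a script. One possible way around the host-key complaint, sketched here, is ssh's HostKeyAlias option, which makes ssh look up and record the guest's key under a separate name in known_hosts. The VM name, rule name and port 2222 are hypothetical; the rule string follows VBoxManage's name,protocol,host IP,host port,guest IP,guest port format, with both IPs left empty.

```python
def natpf_ssh_cmd(vm, host_port, rule="guestssh", running=False):
    """argv adding a NAT rule on adapter 1 that forwards host_port on the
    host to port 22 in the guest."""
    spec = f"{rule},tcp,,{host_port},,22"
    if running:  # a running VM is changed through controlvm
        return ["VBoxManage", "controlvm", vm, "natpf1", spec]
    return ["VBoxManage", "modifyvm", vm, "--natpf1", spec]


def ssh_to_guest_cmd(user, hostnode, host_port, alias):
    """ssh argv for the forwarded port; HostKeyAlias files the guest's key
    under alias instead of under hostnode, so it cannot clash with the
    host's own key."""
    return ["ssh", "-p", str(host_port), "-o", f"HostKeyAlias={alias}",
            f"{user}@{hostnode}"]
```

The same option can instead go in the user's ssh configuration, so that plain ssh commands pick it up.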


h4. VirtualBox and beyond

Is VirtualBox the right product for all our needs?



h5. VirtualBox ease of use, features

VirtualBox supports all platforms of interest and appears to have the most comprehensive set of features, all accessible via the API. But the documentation is incomplete and confusing, especially for use directly from C+\+ (rather than via COM). The command-line program VBoxManage is much easier to use and exports essentially everything available from the API.
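As an illustration of why the command-line route is convenient, a script can get the list of running guests from VBoxManage list runningvms and parse its one-line-per-VM output. This is a minimal sketch, assuming the usual "name" {uuid} line format; the sample names in the test are hypothetical.

```python
import re

# Each line of 'VBoxManage list runningvms' (or 'list vms') looks like:
#   "vm name" {uuid}
_VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(output):
    """Map VM name -> UUID from 'VBoxManage list runningvms' output."""
    vms = {}
    for line in output.splitlines():
        m = _VM_LINE.match(line.strip())
        if m:
            vms[m.group("name")] = m.group("uuid")
    return vms


# Usage (VirtualBox must be installed):
#   import subprocess
#   out = subprocess.run(["VBoxManage", "list", "runningvms"],
#                        capture_output=True, text=True, check=True).stdout
#   print(parse_vm_list(out))
```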

h5. VirtualBox reliability

I've encountered some unpleasant behavior with VirtualBox. Once or twice I think it caused my laptop (the host) to shut down when only the VM should have shut down. Another time, a VM booted "headless" got bogged down after being given a couple of ExecuteProcess commands: execution slowed to a crawl, and even shutting it down took a very long time.

h5. Alternatives

The most attractive general-purpose alternative is probably VMware. My impression is that it's a little behind VirtualBox in the variety of platforms supported, especially newer OS versions, and its API might not be as complete, but in both of these areas VMware would probably be adequate for our needs. It's certainly worth investigating if VirtualBox reliability is questionable.

There was a meeting in April about future LSF versions, including a [presentation about Platform LSF8|https://confluence.slac.stanford.edu/download/attachments/105710791/SLAC+-+Platform+LSF8+Feature+Overview.pptx]. One of its features, known as "Platform Adaptive Cluster" (discussion starts on slide 78), involves the use of VMs. Conceivably this could handle our batch VM needs; at least we wouldn't have to worry about integration with LSF\! But we would still need some other form of VM for interactive code development and debugging.

h2. Timeline

This should be implemented and checked out at least a year before the end of the Redhat 6 "Production 2" phase, since after that point the OS will not be updated to accommodate new hardware. The [current estimate|https://access.redhat.com/support/policy/updates/errata/] for the end of Production 2 is Q2 of 2017, so the target is about 4 years from now (mid-2016). That seems like ample time to get the work done, especially since various pieces can be done in parallel, as long as initial decisions concerning tools and architecture are made in a timely fashion (and correctly\!)