Introduction

This document describes a proposed computer system for modelling multi-particle dynamics in the Linac Coherent Light Source (LCLS). The LCLS is an electron accelerator and coherent x-ray laser being developed at the Stanford Linear Accelerator Center to study phenomena in the 1.5-15 Ångstrom, 1-230 femtosecond realm.

The objective is to provide a production-quality system that offers a number of suitable tracking codes in a unified environment, for both beam simulation and automated optimization.

Given the computational requirements of tracking many particles through the LCLS lattice, we will use a number of multiprocessor farms to perform the computation. Additionally, the tracking codes will be compiled with MPI (Message Passing Interface) support, so that each processing step is carried out in parallel.

The LCLS beamline is sensibly broken down into three "regions" for this modelling: the injector (which is dominated by space-charge effects), the main e- linac (acceleration), and the x-ray beamline. A different tracking code can be used for each region; for instance, Impact or Parmela will be used for the injector, Elegant for the linac, and Genesis for the x-ray beamline. The modelling framework will handle pipelining the results of one tracker into the inputs of the next, so the machine can be modelled "end-to-end" in one run:

[Figure: LCLS Modelling Regions]

Functional Requirements and Design References

The end-to-end simulation project has the following overall requirements and design constraints.

  1. Easy problem statement and submission for processing; that is, easy selection of run settings such as time-step size, number of particles, grid size, and number of CPUs.
  2. Good management of output results, so users can easily find their results files and compare them to measured profile image data, etc.
  3. Automated multi-processing (using the LSF system already in place at SLAC).
  4. Automated pipelining (using a variation of the GLAST Pipeline-II system).
  5. Connection of model definition decks to the extant control system values (by the "skeleton-deck" method we used for SLC/PEP-II).
  6. Support for the four modes of modelling listed below:
    1. Design. Modelling the nominal "design" machine, in which tracking is based only on the literal values of the variables in the deck.
    2. Extant Machine. Tracking is based exclusively on values of variables derived from the extant accelerator. That is, all model parameters which can be acquired from the control system, such as the injector gun phase, will be used in the simulation, so that such things as the predicted beam profile at a given camera can be compared to the real camera image. This will be done by the "skeleton deck" method, which was used successfully in the SLC and the B-factory accelerators previously built at SLAC.
    3. Test. Allowing a combination of Design and Extant Machine, which facilitates easier "what-if" tests. This is the mode that was always wanted but never supported in the SLC system. The problem is really a simple one: distinguishing the "gold" design from these test models and the results of runs on them, and providing an intuitive way of combining input parameter sources.
    4. Optimization. An extension of the Test mode; this is a way of automatically varying ("scanning") some parameter in simulation, such as gun phase or charge, to find its optimum value with respect to some observable such as beam width. The computed optimum value can then be used in the real machine (a minimal sketch of such a scan follows this list).
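
To illustrate the Optimization mode, here is a minimal Python sketch of a parameter scan: it varies a gun phase over a range, runs the tracker once per value, and picks the value that minimizes the beam width. The functions run_tracker and beam_width are stand-ins for the real job submission and profile analysis, and the numbers are synthetic.

    import numpy as np

    def run_tracker(params):
        # Stand-in for generating a deck from params, submitting the tracking
        # job, and collecting its output; it simply echoes the parameters so
        # the scan logic below can be exercised on its own.
        return params

    def beam_width(run_output):
        # Stand-in observable: a synthetic beam width (arbitrary units) with a
        # minimum at a gun phase of 1.5 degrees, in place of analysing the real
        # output particle distribution.
        return 100.0 + (run_output["gun_phase_deg"] - 1.5) ** 2

    def scan_parameter(values):
        """Run the tracker once per value and return the value that minimizes
        the observable, together with the full scan for plotting."""
        scan = [(v, beam_width(run_tracker({"gun_phase_deg": v}))) for v in values]
        best_value, best_width = min(scan, key=lambda point: point[1])
        return best_value, best_width, scan

    if __name__ == "__main__":
        best, width, scan = scan_parameter(np.arange(-5.0, 5.0, 0.5))
        print(f"optimum gun phase: {best:.1f} deg, beam width: {width:.2f}")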

Computational Flow and the Multiprocessing Pipeline

This section describes the basic inputs and outputs of each tracking code. These collectively define the inputs to a task in the pipeline (see #regions above).

Basic Computational Flow

This section describes the model data sources and sinks, how data flows from one tracker to another, and the kinds of tracking we intend.

Inputs

The inputs common to all trackers are:

  1. The "deck", which defines the beamline and tracking parameters for a tracking code. Nominally this is the "input file" of the tracking code. For this project, the input file will be generated at runtime from a combination of a "skeleton" file (an input file modified to include the names of the control system devices which can be polled to acquire the present values of independent parameters) and the control system values acquired at run time (a minimal sketch of this substitution follows this list).
  2. EM field files
  3. Control system values. For the modelling modes "Extant Machine", "Test", and "Optimization" above, the present values of the control system variables whose names are defined by the skeleton deck will be acquired and used in the model run.
  4. Computational parameters. This is a file containing canned run configurations, such as the number of particles to track, the grid size, the step size, etc.
  5. Particle definitions. The 6-d description of each particle may be specified by file, or by instructing the tracker to first generate a particle distribution. This choice is specified in the computational parameters.
  6. Pipelined particle data. The inputs to "downstream" trackers may optionally be the outputs from the upstream tracker. The Unix tool "sed" will be used to map the syntax of one tracker's output to the input format of another. In fact, Parmela is already able to output an Elegant input particle definition file.
  7. Kind of tracking. We will include the option to switch easily between at least three particle multiplicities: a single particle, for simply converting t to z; an intermediate number like 5,000, for speed; and a number meaningful for simulation, like 200,000.
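
The skeleton-deck substitution described in items 1 and 3 above could work roughly as follows. This is a minimal sketch only: the $(device-name) placeholder syntax, the device names, and the values dictionary are assumptions, standing in for whatever syntax and control-system read mechanism are finally chosen.

    import re

    # Assumed placeholder syntax: $(device-name), e.g. $(GUN:PHASE).
    PLACEHOLDER = re.compile(r"\$\(([^)]+)\)")

    def expand_skeleton(skeleton_text, values):
        """Return a runnable tracker deck, replacing each device-name
        placeholder in the skeleton with the supplied value. In Extant Machine
        mode the values dictionary would be filled by polling the control
        system; in Test/Optimization modes some entries would be user overrides."""
        def substitute(match):
            name = match.group(1)
            if name not in values:
                raise KeyError(f"no value supplied for device {name}")
            return str(values[name])
        return PLACEHOLDER.sub(substitute, skeleton_text)

    # Example with made-up device names and values.
    skeleton = "gun_phase = $(GUN:PHASE)\nbunch_charge = $(INJ:CHARGE)\n"
    print(expand_skeleton(skeleton, {"GUN:PHASE": 1.5, "INJ:CHARGE": 1.0e-9}))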

Outputs

The primary outputs of the simulation are particle distributions at some number of locations along the beamline. These will be stored in a well-defined directory structure, which will be made easy to peruse and in which the files can be easily compared to camera image data. The present camera image analysis application will be adapted to allow a user to compare, side-by-side, real camera images to the computed particle distribution scatter plot at the same location.
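
As an illustration of the intended side-by-side comparison (not the actual image-analysis application, which already exists and will be adapted), the following matplotlib sketch places a camera image next to a scatter plot of a simulated particle distribution; the data here are entirely synthetic.

    import numpy as np
    import matplotlib.pyplot as plt

    def compare_profile(camera_image, x_mm, y_mm, title=""):
        """Show a measured camera image next to the simulated particle
        distribution at the same beamline location."""
        fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
        left.imshow(camera_image, origin="lower", cmap="viridis")
        left.set_title("measured camera image")
        right.scatter(x_mm, y_mm, s=1, alpha=0.3)
        right.set_title("simulated distribution")
        right.set_xlabel("x (mm)")
        right.set_ylabel("y (mm)")
        fig.suptitle(title)
        plt.show()

    # Synthetic stand-ins for a real camera image and tracker output.
    rng = np.random.default_rng(0)
    image = rng.poisson(5.0, size=(480, 640)).astype(float)
    x, y = rng.normal(0.0, 0.5, 200_000), rng.normal(0.0, 0.2, 200_000)
    compare_profile(image, x, y, title="camera at DL1 (synthetic data)")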

The predicted storage requirements of this project are now being established by a sizing rollup. Results will be added to this document, together with estimated costs.

The Computational Pipeline System

The process of feeding the outputs of one tracking run into the inputs of the next, in order to use suitable tracking codes for each region of the machine, is being called the "pipeline". Additionally, the pipeline system takes care of such things as aborting the processing if one part of the tracking fails, monitoring the progress of the processing, and error handling in general.

For this functionality we will use the existing Pipeline-II system developed for the GLAST project at SLAC. In Pipeline-II, each task is defined by an XML file, which essentially contains the sequence of executable commands necessary for completion of the task, together with instructions about what to do should each sub-task succeed or fail. A major advantage of using Pipeline-II is that it already uses the LSF load-balancing system that we plan to use for managing the batch-processed high-performance compute farms used for tracking.
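
The Pipeline-II XML schema is not reproduced here. Purely to illustrate the kind of sequencing and failure handling the pipeline provides, the following sketch runs the three regional tracking stages in order and aborts the chain if any stage fails; the executable and deck names are placeholders, and in production each stage would be submitted to LSF and managed by Pipeline-II rather than run directly.

    import subprocess
    import sys

    # Placeholder commands for the three regions; the names are illustrative only.
    STAGES = [
        ("injector (Impact)", ["run_impact", "injector.deck"]),
        ("linac (Elegant)", ["run_elegant", "linac.deck"]),
        ("x-ray beamline (Genesis)", ["run_genesis", "xray.deck"]),
    ]

    def run_pipeline(stages):
        """Run each stage in order, aborting the whole chain as soon as one
        stage cannot be started or exits with a non-zero status."""
        for name, command in stages:
            print(f"starting stage: {name}")
            try:
                result = subprocess.run(command)
            except FileNotFoundError:
                print(f"stage '{name}': executable not found; aborting", file=sys.stderr)
                return False
            if result.returncode != 0:
                print(f"stage '{name}' failed (exit {result.returncode}); aborting", file=sys.stderr)
                return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if run_pipeline(STAGES) else 1)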

We additionally plan to extend the GLAST Pipeline system a little, so that one can check tracking results (beam profile and phase images, etc.) directly from the Pipeline-II web interface.

Computational Requirements

Computational Roll-up

To estimate the computer processing requirements for the end-to-end modelling project, we shall establish the "Big-O" upper bound of the computation for each tracker, map that to elapsed time for each processing mode as run on each compute farm (which may or may not be MPI capable) by a scaling factor which must be established empirically, and multiply by the number of simultaneous runs we expect to make.

For instance, the Big-O characterization for Impact appears to be O(N^3 log N), where N is the number of grid points in each dimension, and the cost is almost linear in the number of particles. As an example, on the Orvolv farm, which is MPI capable, the 2D case run to DL1 with N=32 and 200k particles usually takes around 50 minutes on 32 CPUs. We have not yet established how long we would like such a computation to take, or how many runs we will be making. When we do, we will make the necessary roll-up and estimate what additional resources will be required, in phases from now (commissioning) until production operation.
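
As an example of the kind of roll-up intended, the sketch below scales the quoted Orvolv baseline (N=32, 200k particles, about 50 minutes on 32 CPUs) to other grid sizes using the assumed O(N^3 log N) grid cost, a roughly linear particle cost, and an ideal 1/CPU speed-up; all three scaling assumptions are crude and would have to be checked against real timings as described above.

    import math

    # Quoted baseline on the MPI-capable Orvolv farm (2D case run to DL1).
    BASE_N, BASE_PARTICLES, BASE_CPUS, BASE_MINUTES = 32, 200_000, 32, 50.0

    def estimated_minutes(n_grid, n_particles, n_cpus):
        """Scale the baseline by the assumed O(N^3 log N) grid cost, a roughly
        linear particle cost, and an ideal 1/CPU speed-up."""
        grid = (n_grid ** 3 * math.log(n_grid)) / (BASE_N ** 3 * math.log(BASE_N))
        particles = n_particles / BASE_PARTICLES
        cpus = BASE_CPUS / n_cpus
        return BASE_MINUTES * grid * particles * cpus

    # Example: doubling the grid resolution at the same particle and CPU counts.
    print(f"N=64 estimate: {estimated_minutes(64, 200_000, 32):.0f} minutes")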

Storage Roll-up

To estimate the total storage resources necessary, we will establish the Big-O characterization for storage and decide how many run instances there are likely to be for each of the modelling modes for which we shall store output files over the long term. This, again, will be estimated for a number of phases from now until production operation.
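
For illustration only, a back-of-the-envelope version of this storage roll-up might look like the following; the per-particle size, number of dump locations, and number of retained runs are all placeholders pending the real characterization.

    # All of the numbers below are placeholders pending the real characterization.
    BYTES_PER_PARTICLE = 6 * 8      # six double-precision phase-space coordinates
    N_PARTICLES = 200_000
    N_DUMP_LOCATIONS = 20           # assumed saved distributions per run
    N_RUNS_RETAINED = 500           # assumed runs kept over the long term

    total_bytes = BYTES_PER_PARTICLE * N_PARTICLES * N_DUMP_LOCATIONS * N_RUNS_RETAINED
    print(f"rough long-term storage: {total_bytes / 1e9:.1f} GB")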

Possible new architectures that may save costs

In Phase 3 of this project we will look at possible replacements for the existing high-performance computing hardware configuration used at SLAC, for instance the use of Grid or GRAPE technologies. However, at first we will use the existing LSF facility at SLAC, with the MPI system now supported by our scientific computing support center. We are presently working with them to decide future technology needs for HPC.

Approach and Schedule

We propose to develop this project in three phases:

Phase 1: Concentrate on Ease-of-Use.

This phase concentrates on each individual tracking code. The objective is to make the following nominal tasks *easy*:

  • Problem definition (i.e., easy parameter variation)
  • Problem submission and monitoring (submit jobs to LSF, notification of success or failure)
  • Comparison of results to measured profiles, etc.
  • Parameter scanning and optimization
  • Connecting the models to the control system to get extant values.

Basically, this phase will be addressed by a combination of scripting, filtering (for setting the desired parameters in a deck for a run, and for getting extant machine values), and filesystem organization (so it is clear where input and output files are kept in a production environment). We will also provide some GUIs for setting up the tracking problem, scanning for optimization, and viewing the results. The image viewing application now in development will be modified to help a user compare camera images to beam profiles output from tracking.

Additionally, we will check the build options that were used in compiling the tracking executables, to make sure the compiler switches were correct and the codes being executed are properly optimized for MPI processing.

This phase will cover the period up to and including the LCLS injector commissioning.

Phase 2: Pipelining

This phase integrates two additional features:

  1. Bring the codes together as an end-to-end system, connecting outputs to inputs
  2. Use of the GLAST Pipeline management system to define and manage the tracking tasks.

This phase will cover the summer and fall of 2007.

Phase 3: Expanded High-Performance Computing (HPC) support.

In this phase we will expand the computational hardware support to bring in necessary processing power, based on the results of the previous two phases.

This phase will be undertaken from fall to December 2007.
