
Policies

LCLS users are responsible for complying with the data management and curation policies of their home institutions and funding agencies and authorities. To enhance the scientific productivity of the LCLS user community, LCLS supplies on-site disk, tape and compute resources for prompt analysis of LCLS data, together with software to access those resources, consistent with the published data retention policy. Compute resources are preferentially allocated to recent and running experiments.

Getting an Account

You will need a valid SLAC UNIX account in order to use the LCLS computing system. The instructions for getting a SLAC UNIX account are here.

Getting Access to the System

You can access the LCLS photon computing system by connecting via ssh to:

psexport.slac.stanford.edu

From psexport you can then reach the analysis nodes (see below). You can also move data files in and out of the system.
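
For example, to log in with your SLAC UNIX account (the username below is a placeholder):

ssh <username>@psexport.slac.stanford.edu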

Each control room has a number of nodes for local login. These nodes have access to the Internet and are named psusr<id>.

The controls and DAQ nodes used for operating an instrument work in kiosk mode, so you do not need a personal account to run an experiment from the control room. Remote access to these nodes is not allowed for normal users.

Data Management

LCLS provides space for all your experiment's data at no cost to you. This includes the raw data from the detectors as well as the data derived from your analysis. Your raw data are available as XTC files or, on demand, as HDF5 files. The path to the experimental data is:

/reg/d/psdm/<instrument>/<experiment>
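
For example, data for a hypothetical CXI experiment named cxi12345 would be found under:

/reg/d/psdm/cxi/cxi12345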

The LCLS data policy is described here. The tools for managing files are described here.

Data Export

You can use the psexport nodes for copying your data. The recommended tools for exporting data offsite are bbcp and Globus Online. The former, bbcp, is slightly simpler to set up. On the other hand, Globus Online is more efficient when transferring large amounts of data because it babysits the overall process by, for example, automatically restarting a failed or stalled transfer. The performance of the two tools is very similar.
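
As a rough sketch (the remote host, file names and tuning options below are placeholders, and the best settings depend on your network), a bbcp transfer from psexport to your home institution might look like:

bbcp -P 2 -s 8 /reg/d/psdm/<instrument>/<experiment>/xtc/<file>.xtc <username>@<remote.host>:/path/to/destination/

Here -P 2 prints progress messages every two seconds and -s 8 uses eight parallel network streams.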

All control rooms and the overflow room in the Far Experimental Hall (FEH) have one or more taps on the Visitor Data Network. These taps can be used to transfer data to a laptop or a storage device. These devices are automatically assigned an IP address through DHCP as soon as they are connected to the network tap.

There is a web interface to the experimental data accessible via

https://pswww.slac.stanford.edu/apps/explorer

The web interface also allows you to generate file lists that can be fed to the tool you use to export the data from SLAC to your home institution.

Running the Analysis

The analysis framework is documented in the Data Analysis page. This section describes the nodes which are available for running the analysis.

Interactive Pools

To get access to the interactive nodes, connect to psananeh if your experiment is in the Near Experimental Hall (NEH) or to psanafeh if it is in the Far Experimental Hall (FEH); a load-balancing mechanism will connect you to the least loaded node in the pool:

ssh psananeh
ssh psanafeh

Each pool currently consists of six servers with the following general specifications:

  • 8 cores, Opteron 2384, 16 GB memory, diskless, 10 Gb/s network

Each node in the interactive pools has a single-user Matlab license. You can find out which nodes in the pool currently have a Matlab license available by running the following command on any of the psana nodes:

/reg/common/package/scripts/matlic

The current Matlab version is R2012a:

/reg/common/package/matlab/r2012a/bin/matlab

Batch Farms

There are a number of batch farms (i.e. collections of compute nodes) located in the NEH and FEH. Depending on where your data reside, you may need to submit jobs to a specific farm; this is done by submitting to the appropriate LSF batch queue (refer to the table below). Multi-core OpenMPI jobs should be run in either the psnehmpiq or psfehmpiq batch queue; see the section "Submitting OpenMPI Batch Jobs" below. Simulation jobs should be submitted to the low-priority queues psnehidle and psfehidle.

Experimental Hall | Queue      | Nodes                | Data         | Comments
NEH               | psnehq     | psana11xx, psana12xx | ana01, ana02 | Jobs <= 6 cores
NEH               | psnehmpiq  | psana11xx, psana12xx | ana01, ana02 | OpenMPI jobs > 6 cores, preemptable
NEH               | psnehidle  | psana11xx, psana12xx |              | Simulations, preemptable, low priority
FEH               | psfehq     | psana13xx, psana14xx | ana11, ana12 | Jobs <= 6 cores
FEH               | psfehmpiq  | psana13xx, psana14xx | ana11, ana12 | OpenMPI jobs > 6 cores, preemptable
FEH               | psfehidle  | psana13xx, psana14xx |              | Simulations, preemptable, low priority

The batch farms listed above consist of eighty nodes with the following general specifications:

  • 12 cores, Xeon X5675, 24 GB memory, 500 GB disk, QDR InfiniBand
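
As an illustration (the script and log file names are placeholders, not a tested recipe), a small job could be submitted to the NEH farm with:

bsub -q psnehq -n 4 -o %J.log ./my_analysis_script

where -q selects the queue, -n requests four job slots and -o writes the job output to a log file named after the LSF job ID.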

LSF Overview

LSF (Load Sharing Facility) is a job scheduler provided by Platform Computing. It is used at SLAC to execute user batch jobs on the various batch farms. A short list of useful LSF status commands follows (see the following sections for submitting jobs):

Report status of all jobs (running, pending, finished, etc) submitted by the current user:

bjobs -w -a

Report only running or pending jobs submitted by user "radmer":

bjobs -w -u radmer

Report running or pending jobs for all users in the psnehq queue:

bjobs -w -u all -q psnehq

Report current node usage on the two NEH batch farms:

bhosts -w ps11farm ps12farm

The following links give more detailed LSF usage information:

PowerPoint presentation describing LSF for LCLS users at SLAC

Batch system in a nutshell

Overview of LSF at SLAC

Submitting Batch Jobs

Instructions describing how to submit OpenMPI (parallel) jobs can be found on the Submitting OpenMPI Batch Jobs web page. For normal batch submissions, see the Submitting Batch Jobs page.
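
The exact OpenMPI invocation (including any wrapper script required on the psana farms) is described on that page. As a rough, untested sketch, assuming the OpenMPI installation is integrated with LSF so that mpirun picks up the allocated hosts, a submission to the NEH MPI queue requesting 24 slots might look like:

bsub -q psnehmpiq -n 24 -o %J.log mpirun ./my_mpi_program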
