ATLAS computing tricks

Here are some tricks I use. Some are SLAC-specific, but many apply to ATLAS in general.

If you need a particular database release (typically for running over data), you can select it by adding this to your cmthome/requirements file:

 set DBRELEASE_OVERRIDE 7.1.1

I still sometimes get a "Word too long" message after setting up an ATLAS release. It seems to come from the PATH variable growing past a length that even bash can't handle. You can fix it with this, which shortens every /afs/slac.stanford.edu to just /afs/slac (which works just as well):

export PATH=$(echo "$PATH" | sed 's%\.stanford\.edu%%g')
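
If you'd rather only touch the AFS entries and leave any other ".stanford.edu" text in PATH alone, something like this might do. It's a minimal sketch: shorten_afs_path is a made-up helper name, not part of any ATLAS setup script, and it assumes the entries start with /afs/slac.stanford.edu/ as described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of any ATLAS setup script).
# Prints its argument with every /afs/slac.stanford.edu/ shortened to
# /afs/slac/, leaving all other text untouched.
shorten_afs_path() {
    local from='/afs/slac.stanford.edu/'
    local to='/afs/slac/'
    # Quoting "$from" makes bash treat it as a literal string, not a pattern.
    printf '%s\n' "${1//"$from"/$to}"
}

# Usage:
# export PATH=$(shorten_afs_path "$PATH"); echo "PATH is now ${#PATH} characters"
```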

To kill ALL your batch jobs at SLAC:

for j in $(bjobs | awk '$1 ~ /^[0-9]+$/ {print $1}'); do bkill "$j"; echo "$j"; done #the awk filter skips the JOBID header line
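
Before killing everything it can be worth doing a dry run. The sketch below pulls the job IDs out of bjobs output and nothing more. job_ids is a hypothetical helper name, and it assumes the default bjobs layout, where the first line is a header and the first column of every other line is a numeric JOBID.

```shell
#!/bin/bash
# Hypothetical helper: reads "bjobs" output on stdin and prints only the job IDs.
# Assumes the first column of each job line is a numeric JOBID; the header
# line (which starts with the word JOBID) is skipped because it is not numeric.
job_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Dry run: show what would be killed, without killing anything.
# bjobs | job_ids | while read -r j; do echo "would run: bkill $j"; done
```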

To run eclipse (see this page):

unset _JAVA_OPTIONS
/afs/slac.stanford.edu/g/atlas/work/a/ahaas/eclipse/eclipse

There's a lot more space in /nfs/slac/g/atlas/u01/users:

mkdir /nfs/slac/g/atlas/u01/users/<username>
cd; ln -s /nfs/slac/g/atlas/u01/users/<username> nfs2

Run this in a release; afterwards you can just grep the packages.txt file to see where things are, or what versions are needed:

cmt show packages > packages.txt
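
The grep step could be wrapped like this. It's only a sketch: find_pkg is a made-up name, and it makes no assumption about the columns in packages.txt beyond the package name appearing somewhere on the line.

```shell
#!/bin/bash
# Hypothetical helper: case-insensitive lookup of a package name in the saved list.
# Usage: find_pkg <name> [file]   (file defaults to packages.txt)
find_pkg() {
    grep -i -- "$1" "${2:-packages.txt}"
}

# Usage, from the directory holding packages.txt:
# find_pkg pythia
```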

Sometimes a digi job won't work (in 15.3.0?) because "chappy fails" on the input file. The problem can be fixed by adding the right Python library directory to your LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/afs/slac/g/atlas/b/sw/lcg/external/Python/2.5.4/slc4_ia32_gcc34/lib

One of my favorites: this does a "fast" build, which is all you need if you've only changed a src file:

cd cmt; make QUICK=1; cd ..

hadd is so useful for joining together lots of ROOT files from many jobs into a single ROOT file:

hadd -h #show how to use
hadd -f step.root */*step.root #for instance

Sometimes when running over data it helps to put a link in your running directory:

mkdir sqlite200; ln -s /afs/cern.ch/user/a/atlcond/coolrep/sqlite200/COMP200.db sqlite200/ALLP200.db

You can run these on a POOL file to see what StoreGate keys are in there:

checkFile.py <file>
checkSG.py <file>

You can put this near the top of a bash script to check that you have a GRID cert:

voms-proxy-info
if [ $? -ne 0 ] ; then echo "You need to get a GRID cert"; exit 1; fi
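
The same check-then-bail pattern can be written once and reused for other prerequisites. This is a sketch under my own naming: "require" is a hypothetical function, not something the GRID tools provide. It treats any non-zero exit status (including "command not found") as a failure.

```shell
#!/bin/bash
# Hypothetical guard: run a check command quietly; if it fails, print a
# message on stderr and return non-zero so the calling script can stop.
# Usage: require <message> <command> [args...]
require() {
    local msg=$1
    shift
    if ! "$@" > /dev/null 2>&1; then
        echo "$msg" >&2
        return 1
    fi
}

# Near the top of a script:
# require "You need to get a GRID cert" voms-proxy-info || exit 1
```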

To get a list of filenames (to load into athena) from a given dataset (here $1 is the dataset name):

dq2-ls -f -p -H $1 | sed "s%srm://osgserv04.slac.stanford.edu:8443/srm/v2/server?SFN=/xrootd/atlas/%filelist += [\"root://atl-xrdr//atlas/xrootd/%g" | sed "s%$%\"]%g" | grep xrootd
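
The two sed steps and the grep can be folded into one filter, which is easier to test without touching the GRID. A rough sketch: srm_to_filelist is a made-up name, and the SRM prefix and the root://atl-xrdr target are simply copied from the one-liner above, so they are only right for the SLAC setup described here.

```shell
#!/bin/bash
# Hypothetical wrapper around the sed expressions above: reads file names on
# stdin and prints one 'filelist += ["root://..."]' line per SLAC xrootd file.
# Lines that do not contain the SLAC SRM prefix are dropped.
srm_to_filelist() {
    local prefix='srm://osgserv04\.slac\.stanford\.edu:8443/srm/v2/server?SFN=/xrootd/atlas/'
    sed -n "s%${prefix}\(.*\)\$%filelist += [\"root://atl-xrdr//atlas/xrootd/\1\"]%p"
}

# Usage (dataset name as first script argument):
# dq2-ls -f -p -H "$1" | srm_to_filelist
```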

This gets a ROOT file with info on a given data run (mag field configuration, #events, streams, etc.):

#!/bin/bash
#gets a ROOT file with info on a run (takes run number as argument)
wget "http://atlas-runquery.cern.ch/query.py?q=find+run+${1}+%2F+show+all+%2F+nodef"
sleep 3
wget http://atlas-runquery.cern.ch/data/atlrunquery.root
rm -v query.py\?q\=find+run*

There are a few useful athena options (I like -s, -c, etc.):

athena -h #show athena help

Sometimes my JiveXML files get messed up and can't be read, due to a binary character in the trigger string. Fix it with:

#!/bin/bash
for f in JiveXML*; do sed -i '/Obeys/d' "$f" ; done
for f in JiveXML*; do sed -i 's/<trigInfoStreamTag>/<trigInfoStreamTag>fixJive/' "$f" ; done

If a script is expecting a particular ATLAS release version, you can check it with:

#!/bin/bash
if [ "$AtlasVersion" != "15.3.1" ]; then echo "Go to a cmthome and do . setup.sh -tag=15.3.1"; exit 1; fi
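
If several scripts need this, the check could live in a small function that takes the wanted version as an argument. A sketch only: require_release is a hypothetical name, and it assumes, as above, that the release setup exports AtlasVersion.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if $AtlasVersion matches the wanted release.
# An unset AtlasVersion is reported as "none" and counts as a mismatch.
# Usage: require_release <version>
require_release() {
    local want=$1
    if [ "${AtlasVersion:-}" != "$want" ]; then
        echo "Found release '${AtlasVersion:-none}', need $want: go to a cmthome and do . setup.sh -tag=$want" >&2
        return 1
    fi
}

# In a script:
# require_release 15.3.1 || exit 1
```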

Check out all the CSC transforms:

csc_<tab> #will show them all... look at csc_atlasG4_trf.py, csc_digi_trf.py, csc_reco_trf.py, etc...

This will actually put your files into the catalog, so you don't get annoying warnings:

pool_insertFileToCatalog <file>

When running on the batch farm, you really should write things out into the /scratch area on the batch node during the job, and then cp it all back at the end of the job, to prevent hammering on NFS. Here's an example script:

#!/bin/bash

. /u/at/ahaas/cmthome/setup.sh -tag=15.3.0 #setup the ATLAS release

export d=`date +%s`; echo $d #name the directory after the number of seconds since 1970 (the Unix epoch)
mkdir -p /scratch/ahaas/${d}/temp; cd /scratch/ahaas/${d}; pwd;

athena.py -c "TIMESHIFT=0" -c "DECAY=False" /u/at/ahaas/reldirs/15.3.0/Generators/Pythia_i/share/jobOptions.pythiaRhad.py >  temp/pyth.log.txt
#all outputs of the athena job that are important should get put into the temp directory too...

echo copying back results
pwd; ls -lh temp
mv -v /scratch/ahaas/${d}/temp /nfs/slac/g/atlas/u01/users/ahaas/temp/rh_production_stripped_files/temp_${d}
cd; rm -rfv /scratch/ahaas/${d}; echo done

You can submit the batch script above (saved in a file called myjob.sh) with:

bsub -q xlong -R rhel40 -J myjobname time myjob.sh

The xlong queue will kill your job after 177.6 hours of CPU time in "SLAC units", which is about 15 hours of real CPU time.
See all queues with "bqueues". You can see the details of a queue with "bqueues -l xlong".
Note the "-R rhel40" above, which forces your job onto a machine compatible with the ATLAS releases (gcc34, RHEL4).
"bhosts -R rhel40" will show you which batch nodes are in that list.
"lsinfo -r" will show you all resource lists, like the rhel40 one.
Check your batch jobs with "bjobs".
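
The scratch-then-copy-back pattern from the example job script above can also be written as a reusable function. This is a minimal sketch, not the script I actually run: run_in_scratch and its arguments are hypothetical, it uses mktemp instead of the date stamp to get a unique directory, and it copies the temp directory back even when the job fails so the log survives.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a private scratch directory,
# copy its temp/ output directory back to a destination, then clean up.
# Usage: run_in_scratch <scratch_base> <dest_dir> <command> [args...]
run_in_scratch() {
    local base=$1 dest=$2
    shift 2
    local work status=0
    # Unique working directory under the scratch area.
    work=$(mktemp -d "$base/job.XXXXXX") || return 1
    mkdir -p "$work/temp" "$dest" || return 1
    # Run in a subshell so the caller's working directory is unchanged;
    # stdout and stderr of the job go to temp/job.log.txt.
    ( cd "$work" && "$@" > temp/job.log.txt 2>&1 ) || status=$?
    # Copy results back even if the job failed, so the log is preserved.
    cp -r "$work/temp/." "$dest/" || status=1
    rm -rf "$work"
    return $status
}

# Usage inside a batch script (paths are placeholders):
# run_in_scratch /scratch/<username> /nfs/slac/g/atlas/u01/users/<username>/results athena.py myJobOptions.py
```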
