Here's a bunch of random tricks I use, some SLAC-related, but many are general to ATLAS.
If you need a particular database release (typically for running over data), you can select it by adding this to your cmthome/requirements file:
set DBRELEASE_OVERRIDE 7.1.1
I still get a "Word too long" message sometimes after setting up an ATLAS release. It seems to be from the PATH variable getting over a certain length that even bash can't handle. You can fix it with this, which turns all the /afs/slac.stanford.edu to just /afs/slac, which works just as well:
export PATH=`echo "$PATH" | sed 's%\.stanford\.edu%%g'`
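To see what the substitution does before touching the real PATH, it can be wrapped in a small function. This is only a sketch: shorten_afs is a made-up helper name, and the sample path list is illustrative rather than a real SLAC setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of any ATLAS setup): strip ".stanford.edu"
# from every entry of the colon-separated path list given as the first argument.
shorten_afs() {
    printf '%s\n' "$1" | sed 's%\.stanford\.edu%%g'
}

# Illustrative value only; pass "$PATH" instead to try it on the real thing.
sample="/afs/slac.stanford.edu/g/atlas/bin:/usr/bin:/afs/slac.stanford.edu/package/bin"
short=$(shorten_afs "$sample")
echo "before: ${#sample} characters"
echo "after:  ${#short} characters"
echo "$short"
```

If the result looks right, export PATH=$(shorten_afs "$PATH") applies it.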
To kill ALL your batch jobs at SLAC:
for j in `bjobs | cut -f 1 -d " "`; do bkill $j; echo $j; done
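The loop above also hands the header line of the bjobs output to bkill, which just produces a harmless error. A slightly tidier variant, as a sketch, filters the listing with awk first. The name job_ids and the sample listing are made up for illustration; the assumption is that bjobs prints one header line followed by one job per line, with the numeric job ID in the first column.

```shell
#!/bin/bash
# Hypothetical helper: read bjobs-style output on stdin and print only the job IDs.
# Assumes line 1 is a header and the job ID is the first whitespace-separated column.
job_ids() {
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Illustrative listing, not real output.
cat <<'EOF' | job_ids
JOBID   USER    STAT  QUEUE  JOB_NAME
123456  ahaas   RUN   xlong  myjobname
123457  ahaas   PEND  xlong  myjobname
EOF

# Real use (commented out so that pasting this block does not kill anything):
# for j in $(bjobs | job_ids); do bkill "$j"; echo "$j"; done
```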
To run eclipse (see this page):
unset _JAVA_OPTIONS
/afs/slac.stanford.edu/g/atlas/work/a/ahaas/eclipse/eclipse
There's a lot more space in /nfs/slac/g/atlas/u01/users:
mkdir /nfs/slac/g/atlas/u01/users/<username>
cd; ln -s /nfs/slac/g/atlas/u01/users/<username> nfs2
Do this in a release, and then you can always just grep the packages.txt file to see where things are, or what versions are needed:
cmt show packages > packages.txt
Sometimes a digi job won't work (in 15.3.0?) because "chappy fails" on the input file. The problem can be fixed by adding the right Python library directory to your LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/afs/slac/g/atlas/b/sw/lcg/external/Python/2.5.4/slc4_ia32_gcc34/lib
One of my favorites, this will do a "fast" build, if you've only changed a src file:
cd cmt; make QUICK=1; cd ..
So useful for joining together lots of ROOT files from many jobs into a single ROOT file:
hadd -h #show how to use hadd
hadd -f step.root */*step.root #for instance
Sometimes when running over data it helps to put a link in your running directory:
mkdir sqlite200; ln -s /afs/cern.ch/user/a/atlcond/coolrep/sqlite200/COMP200.db sqlite200/ALLP200.db
Can run these on a POOL file to see what StoreGate keys are in there:
checkFile.py <file>
checkSG.py <file>
Can put this in a bash script near the top, to check if you have a GRID cert:
voms-proxy-info
if [ $? -eq 1 ] ; then echo You need to get a GRID cert; exit; fi
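The check above only reacts to an exit status of exactly 1. A somewhat more defensive sketch treats any non-zero status (including the command not being found) as "no usable proxy". The function name have_proxy and its optional argument are made up here; the argument exists only so the logic can be tried with a stand-in command when voms-proxy-info is not installed.

```shell
#!/bin/bash
# Hypothetical helper: run a proxy-checking command (voms-proxy-info by default)
# and return non-zero, with a message on stderr, if it fails for any reason.
have_proxy() {
    local checker="${1:-voms-proxy-info}"
    if ! "$checker" > /dev/null 2>&1; then
        echo "You need to get a GRID cert" >&2
        return 1
    fi
}

# Near the top of a script:
# have_proxy || exit 1
```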
To get a list of filenames (to load into athena) from a given dataset:
dq2-ls -f -p -H $1 | sed "s%srm://osgserv04.slac.stanford.edu:8443/srm/v2/server?SFN=/xrootd/atlas/%filelist += [\"root://atl-xrdr//atlas/xrootd/%g" | sed "s%$%\"]%g" | grep xrootd
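To check the rewriting without querying a dataset, the same two substitutions can be put in a function and fed a sample line. This is a sketch: to_filelist is a made-up name, and the file path after /xrootd/atlas/ in the sample is invented; the SRM prefix and the root://atl-xrdr target are the ones from the one-liner above.

```shell
#!/bin/bash
# Hypothetical helper: turn SRM URLs on stdin into athena "filelist += [...]" lines.
# Same substitutions as the one-liner above, with the dots in the host name escaped.
to_filelist() {
    grep xrootd |
    sed -e 's%srm://osgserv04\.slac\.stanford\.edu:8443/srm/v2/server?SFN=/xrootd/atlas/%filelist += ["root://atl-xrdr//atlas/xrootd/%' \
        -e 's%$%"]%'
}

# Illustrative input; the path after /xrootd/atlas/ is made up.
echo 'srm://osgserv04.slac.stanford.edu:8443/srm/v2/server?SFN=/xrootd/atlas/some/dataset/file.root' | to_filelist

# Real use inside a script that takes the dataset name as $1:
# dq2-ls -f -p -H "$1" | to_filelist
```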
This gets a ROOT file with info on a given data run (mag field configuration, #events, streams, etc.):
#!/bin/bash
#gets a ROOT file with info on a run (takes run number as argument)
wget http://atlas-runquery.cern.ch/query.py?q=find+run+${1}+%2F+show+all+%2F+nodef
sleep 3
wget http://atlas-runquery.cern.ch/data/atlrunquery.root
rm -v query.py\?q\=find+run*
There are a few useful athena options (I like -s and -c, etc.):
athena -h #show athena help
Sometimes my JiveXML files get messed up and can't be read, due to a binary character in the trigger string. Fix it with:
#!/bin/bash
for f in JiveXML*; do sed -i '/Obeys/d' $f ; done
for f in JiveXML*; do sed -i 's/<trigInfoStreamTag>/<trigInfoStreamTag>fixJive/' $f ; done
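The two sed passes can be tried safely on a throwaway file first. The sketch below combines them into one pass per file; fix_jive is a made-up name and the sample file content is invented purely to show the effect. It relies on GNU sed's -i option, as the commands above do.

```shell
#!/bin/bash
# Hypothetical helper: apply both fixes to one file in a single sed pass.
# Lines containing "Obeys" are deleted; "fixJive" is inserted after the
# opening <trigInfoStreamTag> tag on the remaining lines.
fix_jive() {
    sed -i -e '/Obeys/d' -e 's/<trigInfoStreamTag>/<trigInfoStreamTag>fixJive/' "$1"
}

# Try it on a temporary file with made-up content.
demo=$(mktemp)
printf '%s\n' '<Event>' 'a line with Obeys in it' \
    '<trigInfoStreamTag>abc</trigInfoStreamTag>' '</Event>' > "$demo"
fix_jive "$demo"
cat "$demo"
rm -f "$demo"

# Real use:
# for f in JiveXML*; do fix_jive "$f"; done
```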
If a script is expecting a particular ATLAS release version, you can check it with:
#!/bin/bash
if [ "$AtlasVersion" != "15.3.1" ]; then echo "Go to a cmthome and do . setup.sh -tag=15.3.1"; exit; fi
Check out all the CSC transforms:
csc_<tab> #will show them all... look at csc_atlasG4_trf.py, csc_digi_trf.py, csc_reco_trf.py, etc...
This will actually put your files into the catalog, so you don't get annoying warnings:
pool_insertFileToCatalog <file>
When running on the batch farm, you really should write things out into the /scratch area on the batch node during the job, and then cp it all back at the end of the job, to prevent hammering on NFS. Here's an example script:
#!/bin/bash
. /u/at/ahaas/cmthome/setup.sh -tag=15.3.0 #setup the ATLAS release
export d=`date +%s`; echo $d #make a variable name for the directory which is the number of seconds since 1970
mkdir /scratch/ahaas; mkdir /scratch/ahaas/${d}; mkdir /scratch/ahaas/${d}/temp;
cd /scratch/ahaas/${d}; pwd;
athena.py -c "TIMESHIFT=0" -c "DECAY=False" /u/at/ahaas/reldirs/15.3.0/Generators/Pythia_i/share/jobOptions.pythiaRhad.py > temp/pyth.log.txt
#all outputs of the athena job that are important should get put into the temp directory too...
echo copying back results
pwd; ls -lh temp
mv -v /scratch/ahaas/${d}/temp /nfs/slac/g/atlas/u01/users/ahaas/temp/rh_production_stripped_files/temp_${d}
cd; rm -rfv /scratch/ahaas/${d}; echo done
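The same scratch-then-copy-back pattern can be factored into a reusable function. This is a sketch under assumptions, not the script actually used for production: run_in_scratch, its argument order, and the job.log.txt file name are all made up, and mktemp -d replaces the timestamp so that two jobs starting in the same second on one node cannot collide.

```shell
#!/bin/bash
# Hypothetical helper: run a command in a fresh scratch directory, then move
# its temp/ subdirectory to a destination and remove the scratch directory.
#   $1 = scratch base (e.g. /scratch/$USER)
#   $2 = destination path for the results (should not exist yet)
#   remaining arguments = the command to run
run_in_scratch() {
    local base="$1" dest="$2"
    shift 2
    mkdir -p "$base" || return 1
    local work
    work=$(mktemp -d "$base/job.XXXXXX") || return 1
    mkdir "$work/temp"
    # Run in a subshell so the caller's working directory is untouched.
    ( cd "$work" && "$@" > temp/job.log.txt 2>&1 )
    local rc=$?
    # Move results back even if the job failed, so the log survives.
    if mkdir -p "$(dirname "$dest")" && mv "$work/temp" "$dest"; then
        rm -rf "$work"
    else
        echo "could not move results; left in $work" >&2
    fi
    return $rc
}

# Example call (commented out; the paths are the ones used in the script above):
# run_in_scratch /scratch/ahaas \
#     /nfs/slac/g/atlas/u01/users/ahaas/temp/rh_production_stripped_files/temp_$(date +%s) \
#     athena.py -c "TIMESHIFT=0" -c "DECAY=False" \
#     /u/at/ahaas/reldirs/15.3.0/Generators/Pythia_i/share/jobOptions.pythiaRhad.py
```

As in the original script, anything the job should keep has to be written under temp/ relative to its working directory.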
You could run this batch script above (put in a file called myjob.sh) with:
bsub -q xlong -R rhel40 -J myjobname time myjob.sh
The xlong queue will kill your job after 177.6 hours of CPU time in "SLAC units", which is roughly 15 hours of real CPU time.
See all queues with "bqueues". You can see the details of a queue with "bqueues -l xlong".
Note the "-R rhel40" above, which forces your job onto a machine compatible with the ATLAS releases (gcc34, RHEL4).
"bhosts -R rhel40" will show you which batch nodes are in that list.
"lsinfo -r" will show you all resource lists, like the rhel40 one.
Check your batch jobs with "bjobs".