...
No Format |
---|
ssh -F ssh_config USERNAME@detsim.fnal.gov |
Set Up the Grid Tools
Set up the grid tools in a bash shell:
No Format |
---|
source /fnal/ups/grid/setup.sh
|
Set up the grid tools in tcsh or csh:
No Format |
---|
source /fnal/ups/grid/setup.csh
|
Session Certificate and Quotas
...
No Format |
---|
rm -f slic_grid.csh
cat > slic_grid.csh << +EOF
#!/bin/csh
echo start
/bin/date
cd \${_CONDOR_SCRATCH_DIR}
setenv LABELRUN slic_grid-\${ClusterProcess}
setenv TARFILE \${LABELRUN}-results.tar
echo \${TARFILE}
echo start
/bin/date
mkdir results
/grid/app/ilc/sid/SimDist/v2r4p2/SimDist/scripts/slic.sh -r 5 \
-g /grid/app/ilc/detector/SimDist/detectors/sid01/sid01.lcdd \
-i /grid/data/ilc/detector/LDC/stdhep/ZZ_run10.stdhep -o ./results/ZZ_run10\${LABELRUN} >& \
./results/ZZ_run10\${LABELRUN}.lis
ls -lh results
/bin/date
echo "build output tarball: " \${TARFILE}
tar -cf \${TARFILE} results
echo done
+EOF
chmod +x slic_grid.csh
rm -f slic_grid.run
cat > slic_grid.run << +EOF
universe = grid
GridResource = gt2 fnpcosg1.fnal.gov/jobmanager-condor
executable = ./slic_grid.csh
transfer_output = true
transfer_error = true
transfer_executable = true
environment = "ClusterProcess=\$(Cluster)-\$(Process)"
transfer_output_files = slic_grid-\$(Cluster)-\$(Process)-results.tar
log = slic_grid.log.\$(Cluster).\$(Process)
notification = NEVER
output = slic_grid.out.\$(Cluster).\$(Process)
error = slic_grid.err.\$(Cluster).\$(Process)
stream_output = false
stream_error = false
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
globusrsl = (jobtype=single)(maxwalltime=999)
queue
+EOF
condor_submit slic_grid.run
|
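Each finished job transfers back a tarball named slic_grid-<Cluster>-<Process>-results.tar that contains a top-level results directory. The following is a minimal sketch of one way to unpack them, assuming the tarballs have arrived in the current directory. The function name unpack_results and the per-job directory layout are illustrative choices, not part of the original recipe.

```shell
# Unpack every slic_grid-<Cluster>-<Process>-results.tar in the current directory.
# All tarballs contain the same top-level "results" directory, so each one is
# extracted into its own subdirectory to keep jobs from overwriting each other.
# (unpack_results is a hypothetical helper name.)
unpack_results() {
    for tarball in slic_grid-*-results.tar; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$tarball" ] || continue
        # Strip the suffix, e.g. slic_grid-1234-0-results.tar -> slic_grid-1234-0
        label="${tarball%-results.tar}"
        mkdir -p "$label"
        tar -xf "$tarball" -C "$label"
        echo "unpacked $tarball into $label/"
    done
}
```

Run unpack_results from the directory you submitted from. The slic output and its .lis log for each job then sit under slic_grid-<Cluster>-<Process>/results/.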
...
No Format |
---|
condor_q -submitter <username> |
You can view information about all requests with the following command:
No Format |
---|
condor_status -submitters |
To cancel a job, type condor_rm followed by the job number:
No Format |
---|
condor_rm <job number> |
Condor can put a job into the held state, for example when the proxy expires while the job is running. In that case the job may still be running fine on the worker node, but even after successful completion no log files or other output will be copied back. To remedy this, renew the proxy and then release the jobs.
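The recovery step can be sketched as a small helper. It assumes the proxy has already been renewed with whatever command you used to create it, which is not shown here. The function name release_held_jobs and the DRY_RUN switch are hypothetical conveniences. condor_q -hold and condor_release are standard Condor commands, but check that your installation accepts these options.

```shell
# Sketch: inspect and release held jobs after the proxy has been renewed.
# release_held_jobs and DRY_RUN are illustrative names, not Condor features.
# Usage: release_held_jobs <username>
# Set DRY_RUN=1 to print the commands instead of running them.
release_held_jobs() {
    user="$1"
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Show the user's held jobs together with the reason for the hold.
    $run condor_q -hold -submitter "$user"
    # Release all held jobs belonging to this user.
    $run condor_release "$user"
}
```

To release a single job instead of all of them, pass its job number to condor_release, just as with condor_rm.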
...