...


(Skip to the bottom of this page for a concise reminder of all Best Practices.)

...

Tip: Start faster

Please also see this page to learn how to get your jobs to start running sooner.

 

...

 

...

Known problems to avoid

PFILE (and other) Simultaneous File Writing Conflicts

...

Finally, note that all Linux machines have a /tmp disk partition.  It is strongly recommended that /tmp NOT be used: if it becomes full, the machine will crash.


Monitoring remote file servers

First, identify the server that holds the job's input files and will hold its output files.

...

  • CPU utilization > 50%  (especially "System CPU")
  • NFS disk I/O > 30 MB/s
  • AFS disk I/O > 5-10 MB/s
  • xroot disk I/O >> 200 MB/s (wains only)
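
The thresholds above can be turned into a simple check. The following is a minimal sketch in Python, assuming the server's metrics have already been obtained by some other means. The function name and dictionary keys are hypothetical, and the limits chosen for AFS (the low end of the 5-10 MB/s range) and xroot (200 MB/s, although the list says "much greater than") are cautious assumptions rather than site policy.

```python
# Thresholds taken from the list above.
# Units: percent for CPU, MB/s for disk I/O.
# Assumptions: AFS uses the low end of the 5-10 MB/s range, and xroot
# uses 200 MB/s even though the list says ">>", so the check is cautious.
OVERLOAD_THRESHOLDS = {
    "cpu_percent": 50.0,
    "nfs_mb_per_s": 30.0,
    "afs_mb_per_s": 5.0,
    "xroot_mb_per_s": 200.0,
}


def overload_warnings(metrics):
    """Return the sorted names of the metrics that exceed their threshold.

    `metrics` is a dict holding any subset of the keys in
    OVERLOAD_THRESHOLDS; metrics that were not measured are skipped.
    """
    return sorted(
        name
        for name, limit in OVERLOAD_THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    )


if __name__ == "__main__":
    # Hypothetical readings for an NFS server.
    sample = {"cpu_percent": 72.0, "nfs_mb_per_s": 12.5}
    print(overload_warnings(sample))
```

An empty result means none of the supplied readings crossed a threshold; it does not mean the server is idle, only that these particular indicators look acceptable.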

...

Summary

  • Store analysis code and scripts in your AFS home directory (which is backed up).
  • Assessment.  For every new task, assess its impact on key servers to ensure they will not be overloaded.
  • File staging.  Files that remain open for the duration of the job (either reading or writing) should be located in local scratch space.  Copy needed input files to local scratch at the beginning of your job; write output data products to their final destinations at the end of the job.
  • Submitting jobs.  
    • Never submit a large number of jobs (more than about 50) without first assessing their impact on key shared resources.
    • If your jobs are known to produce a large I/O load only during the start-up phase, submit them in small batches: wait for each batch to start running and pass the start-up phase, and only then submit the next small batch.
    • If you are planning a large batch operation of, say, more than 50 simultaneous jobs, please inform and coordinate with SAS management (Richard Dubois).
  • PFILES. Arrange for the parameter files for ScienceTools, FTools, etc. to be stored in a directory unique to each batch job.
  • Core dumps.  Completely disable core dumps.
  • Cleanup. Be sure to perform a cleanup on the local scratch space after your jobs have completed!
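
Several of the practices above (file staging, a per-job PFILES directory, disabled core dumps, and cleanup) can be combined in a small job wrapper. The following is a minimal sketch in Python, not a site-provided tool: the function name `run_staged_job` and all of its parameters are hypothetical, the location of local scratch space is site specific and must be supplied by the caller, and the handling of PFILES assumes the usual "writable directories;system directories" form of that variable.

```python
import os
import resource
import shutil
import subprocess
import tempfile


def _disable_core_dumps():
    # Runs in the child process just before the job starts.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def run_staged_job(command, inputs, output_names, dest_dir, scratch_root):
    """Run `command` in a private scratch directory; return its exit code.

    command      -- argument list for the job
    inputs       -- paths of input files to copy into scratch first
    output_names -- file names the job writes in its working directory
    dest_dir     -- final destination for the outputs
    scratch_root -- local scratch area on the batch host (not /tmp)
    """
    # A directory unique to this job for staged files and parameter files.
    workdir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        # File staging: copy inputs to local scratch at the start.
        for path in inputs:
            shutil.copy2(path, workdir)

        # PFILES: a per-job writable directory, followed by whatever
        # system directories were already configured (assumed to be the
        # part after the last ';' of the existing value).
        pfiles_dir = os.path.join(workdir, "pfiles")
        os.makedirs(pfiles_dir)
        env = dict(os.environ)
        system_part = env.get("PFILES", "").split(";")[-1]
        env["PFILES"] = pfiles_dir + ";" + system_part

        # Core dumps are disabled in the child process only.
        result = subprocess.run(
            command, cwd=workdir, env=env, preexec_fn=_disable_core_dumps
        )

        # Write outputs to their final destination at the end of the job.
        if result.returncode == 0:
            os.makedirs(dest_dir, exist_ok=True)
            for name in output_names:
                shutil.copy2(os.path.join(workdir, name), dest_dir)
        return result.returncode
    finally:
        # Cleanup: remove the scratch directory even if the job failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

A batch script would call `run_staged_job` once per job, passing the host's local scratch area as `scratch_root`. Because the copy to `dest_dir` happens only once, at the end, the remote file server sees two short bursts of I/O instead of files held open for the whole job.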