Overview
Ideally, job management would be a system that lived in ASE that "abstracted out" the concepts of determining if a job was:
- running
- completed
- exited with error
It would ideally also abstract out the directory structure that a user wanted for his/her jobs. This would probably be a significant amount of work, so here we just summarize what has been done, in the hope that other people can re-use and perhaps improve it over time.
Scripts described below for job management can be found in /afs/slac/g/suncat/share/scripts. Less user-friendly scripts can be found in the "devel" subdirectory. Anyone in the g-suncat afs group can put scripts in these directories (type "pts mem g-suncat" to see if you're in the list ... if not, ... cpo). If you don't want someone to modify scripts that you have put there, remember to "chmod -w" the file.
Andy Peterson Job Management System
Andy has two reusable scripts that are in the above directory: "running" and "rundirs". Change the username to yours within the python script. He has some help enabled (e.g., "running -h"). Those are for dealing with jobs in the queue.

He also has scripts for dealing with jobs and determining if they are done. These are not as "clean"; that is, they won't just work for you like the two scripts above should, so they are in the "devel" subdirectory. The one he uses for dealing with the hundreds of alloys he screens is called "checkdone.py". He organizes the output with "analyze". Both have the "-h" option enabled, so you might get an idea of how he approached the problem, but they probably won't work directly for you.
More Info from Andy on Job Submission
The basic python tool I use in submitting large numbers of jobs is:

    from string import Template
With that, you first read in a template file with ${keyword} in all the places you want to make substitutions, and then use the Template class to make the substitutions. The ${keyword} can potentially be as simple as a lattice constant or as complex as several lines of (ASE) code. Within the directory you looked at (20110110_alloys), take a look at the script:

    makescripts-OH.py

That has the procedure for all of my runs with OH on something like 1400 surfaces. This script is obviously a bit complicated. I copied a much simpler implementation of the same concept into the below directory. Check it out and let me know if it makes sense to you:
    /a/suncatfs1/u1/aap/temp/adam-example
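The template approach described above can be sketched roughly as follows. This is only a minimal illustration, not the contents of makescripts-OH.py or of the example directory: the template text, the keyword names (metal, lattice_constant, extra_code), the lattice-constant values and the make_scripts helper are all made up here, and in practice the template would be read from a file and each result written into its own job directory.

```python
from string import Template

# Hypothetical job-script template; normally this would be read from a file.
TEMPLATE_TEXT = """\
# run script for ${metal}
a = ${lattice_constant}
${extra_code}
"""


def make_scripts(template_text, systems):
    """Return {name: script_text}, one filled-in script per system."""
    template = Template(template_text)
    scripts = {}
    for name, keywords in systems.items():
        # substitute() raises KeyError if any ${keyword} is left unfilled,
        # which catches typos before hundreds of jobs are submitted.
        scripts[name] = template.substitute(keywords)
    return scripts


# Illustrative keyword values: a keyword can be a single number
# or several lines of code.
systems = {
    "Pt": {"metal": "Pt", "lattice_constant": 3.92,
           "extra_code": "print('Pt run')"},
    "Cu": {"metal": "Cu", "lattice_constant": 3.61,
           "extra_code": "print('Cu run')\nprint('second line')"},
}

scripts = make_scripts(TEMPLATE_TEXT, systems)
for name, text in scripts.items():
    print("---", name)
    print(text)
```

Using substitute() rather than safe_substitute() is a deliberate choice in this sketch: a missing keyword stops the script generation instead of silently leaving ${keyword} in a submitted job.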
Some Utilities from AJ Medford
Just in case anyone is interested, I have written a few more simple commands for managing large numbers of jobs. They aren't that pretty, but they are functional and I have found them useful. The commands can be found in my development folder (/afs/slac.stanford.edu/g/suncat/vol3/scripts/devel/ajmedfor) and a brief summary is:
parseErr: a tremendously simple script which parses error output from jobs and does not allow duplicate lines. This makes it much easier to read error files output by multiple cores. The argument should be an error file.
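The duplicate-line filtering that parseErr is described as doing can be sketched in a few lines. This is not the parseErr source; the unique_lines helper and the sample error lines are hypothetical, and keeping the first occurrence of each line in its original order is an assumption about the desired behaviour.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Made-up error output, as if several cores had written the same message.
error_output = [
    "RuntimeError: something failed\n",
    "RuntimeError: something failed\n",
    "MemoryError\n",
    "RuntimeError: something failed\n",
]

for line in unique_lines(error_output):
    print(line, end="")
```

To apply the same idea to a real error file, the list could be replaced by an open file object, since iterating over a file also yields lines.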
jobInfo: list all jobs along with their status, run time, and submission directory. If no argument is supplied then it only shows the name of the submission directory, but if you pass any argument then it will show the full absolute path of the submission directory. This is significantly slower than bjobs, but gives a lot more information. You can probably figure out how to hack it into doing your bidding if you want something more detailed.
jobDir: this command takes a job's ID as an argument and will return the directory of that job. If you couple this with an alias (csh/tcsh syntax) like: alias bcd 'cd `jobDir !*`' then you can automatically change to a job's directory by typing "bcd jobID".
resub: takes a job ID as the first argument. If nothing else is supplied, it moves to that job's directory and resubmits the job using the gpaw-bsub command and the original name of the submission script. You can optionally supply a different submission command as the second argument (e.g. 'gpaw-ver-bsub 21') and you can optionally supply a different file name as the third argument. Say you originally submitted 'run_k441.py' but this time you want to submit using 'run_k881.py'; then you could do: resub jobID gpaw-bsub run_k881.py
massCommand: this allows you to issue a command to a large number of jobs. The first argument is required, and is the command to issue. If no other argument is passed it issues the command to all jobs. The command should take a job ID as its argument (bkill, btop, bbot, etc). The second argument is a "flag" or "filter word". If it is supplied without any additional arguments then the "filter word" must be contained in the absolute path of the submission directory, or else the command will not be issued to that job. The third optional argument can be specified with -f and gives the "field" to check for the "filter word" in. If it is not specified then it defaults to filepath, but the other options are:
- submissiondir (the directory it was submitted from)
- filename (the name of the submission script)
- command (the actual submission command ...)
- runtime (the time in hours that the job has been running)
- id (job ID)
- status (RUN/PEND)
The final optional argument is the "condition". This is the operator which is used to compare the "filter word" and the "field". The default is "in", but you can supply anything which python would understand. A few examples (* means they are untested):
    massCommand bkill  # kill all jobs
    massCommand bbot Ir  # move all jobs with Ir in their absolute path to the bottom of the queue
    massCommand btop COOH -f filepath  # move all jobs with 'COOH' in the filepath to the top of the queue
    massCommand bkill CH2 -f submissiondir -c 'not in'  # kill all jobs withOUT 'CH2' in the submission directory
    massCommand resub 49 -f runtime -c '<'  # resubmit all jobs which have been running for more than 49 hours (using the custom resub command)
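To make the "filter word", "field" and "condition" semantics concrete, here is a rough Python sketch of the selection step only. It is not the massCommand source: the select_jobs function, the job dictionaries and their values are hypothetical, and it uses a small fixed table of operators instead of accepting arbitrary Python operators. The comparison is read as "filter word, condition, field", which matches the runtime example above (49 < runtime).

```python
import operator

# Supported conditions, applied as: filter_word <condition> field_value.
CONDITIONS = {
    "in": lambda word, value: word in value,
    "not in": lambda word, value: word not in value,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}


def select_jobs(jobs, filter_word=None, field="filepath", condition="in"):
    """Return the IDs of the jobs a command would be issued to."""
    if filter_word is None:
        return [job["id"] for job in jobs]  # no filter: all jobs
    compare = CONDITIONS[condition]
    selected = []
    for job in jobs:
        value = job[field]
        # Compare numerically when the field is numeric (e.g. runtime in hours).
        word = float(filter_word) if isinstance(value, (int, float)) else filter_word
        if compare(word, value):
            selected.append(job["id"])
    return selected


# Made-up job records standing in for what would be parsed from the queue.
jobs = [
    {"id": "101", "filepath": "/home/user/Ir/COOH/run.py",
     "submissiondir": "COOH", "runtime": 50.5, "status": "RUN"},
    {"id": "102", "filepath": "/home/user/Pt/CH2/run.py",
     "submissiondir": "CH2", "runtime": 3.0, "status": "RUN"},
]

print(select_jobs(jobs))                                    # every job
print(select_jobs(jobs, "Ir"))                              # 'Ir' in filepath
print(select_jobs(jobs, "CH2", "submissiondir", "not in"))  # without 'CH2'
print(select_jobs(jobs, "49", "runtime", "<"))              # 49 < runtime
```

Each selected ID would then be passed to the chosen command (bkill, btop, bbot, resub and so on); that step is left out here because it needs a live queue.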
Other Thoughts
- Heine has also said Jacapo has something related to job management. I took a quick look but didn't spot it.