Overview

Ideally, job management would be a system that lived in ASE that "abstracted out" the concepts of determining if a job was:

  • running
  • completed
  • exited with error

It would ideally also abstract out the directory structure that a user wanted for his/her jobs. This would probably be a significant amount of work, so here we just summarize what has been done, in the hope that other people can reuse it and perhaps improve it over time.

Scripts described below for job management can be found in /afs/slac/g/suncat/share/scripts. Less user-friendly scripts can be found in the "devel" subdirectory. Anyone in the g-suncat AFS group can put scripts in these directories (type "pts mem g-suncat" to see if you're in the list; if not, email cpo). If you don't want someone to modify scripts that you have put there, remember to "chmod -w" the file.

Andy Peterson Job Management System

Andy has two reusable scripts that are in the above directory: "running" and "rundirs". Change the username to yours within the Python script. He has some help enabled (e.g., "running -h"). Those are for dealing with jobs in the queue. He also has scripts for dealing with jobs and determining if they are done. These are not as "clean"; that is, they won't just work for you like the two scripts above should, so they are in the "devel" subdirectory. The one he uses for dealing with the hundreds of alloys he screens is called "checkdone.py". He organizes the output with "analyze". Both have the "-h" option enabled, so you might get an idea of how he approached the problem, but they probably won't work directly for you.

More Info from Andy on Job Submission

The basic Python tool I use in submitting large numbers of jobs is:

from string import Template

With that, you first read in a template file with ${keyword} in all the places you want to make substitutions, and then use the Template class to make the substitutions. The ${keyword} can potentially be as simple as a lattice constant or as complex as several lines of (ASE) code.
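As a minimal sketch of that approach, the substitution step might look like the following. The template text, the keyword names (element, lattice_constant, extra_code) and the helper make_script are made up for illustration; they are not taken from makescripts-OH.py.

```python
from string import Template

# Hypothetical template text. In practice this would be read from a file,
# e.g. open("template.py").read(). The keyword names are illustrative only.
TEMPLATE_TEXT = """\
# job script for ${element}
a = ${lattice_constant}
${extra_code}
"""


def make_script(element, lattice_constant, extra_code=""):
    """Return the text of one job script with all keywords filled in."""
    template = Template(TEMPLATE_TEXT)
    # substitute() raises KeyError if a keyword used in the template is not
    # supplied, which catches typos before many jobs are generated.
    return template.substitute(
        element=element,
        lattice_constant=lattice_constant,
        extra_code=extra_code,
    )


# One script per (element, lattice constant) pair; the values are placeholders.
scripts = {
    name: make_script(name, a, extra_code="print('setup for %s')" % name)
    for name, a in [("Pt", 3.92), ("Cu", 3.61)]
}
```

Each entry in scripts could then be written to its own run directory and submitted. Using substitute() rather than safe_substitute() is a deliberate choice here: a forgotten keyword fails loudly instead of leaving a literal ${keyword} in a submitted script.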

Within the directory you looked at (20110110_alloys), take a look at the script makescripts-OH.py. That has the procedure for all of my runs with OH on something like 1400 surfaces. This script is obviously a bit complicated. I copied a much simpler implementation of the same concept into the directory below. Check it out and let me know if it makes sense to you:

{code}
/a/suncatfs1/u1/aap/temp/adam-example
{code}

Some Utilities from AJ Medford

Just in case anyone is interested, I have written a few more simple commands for managing large numbers of jobs. They aren't that pretty, but they are functional and I have found them useful. The commands can be found in my development folder (/afs/slac.stanford.edu/g/suncat/vol3/scripts/devel/ajmedfor), and a brief summary is:

parseErr: a tremendously simple script which parses error output from jobs and does not allow duplicate lines. This makes it much easier to read error files output by multiple cores. The argument should be an error file.
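The idea behind parseErr can be sketched in a few lines of Python. This is not the script from the devel folder; the function names unique_lines and parse_err are hypothetical, and keeping the first occurrence of each line in its original order is an assumption about the desired behavior.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield key


def parse_err(path):
    """Return the de-duplicated lines of the error file at `path`."""
    with open(path) as handle:
        return list(unique_lines(handle))
```

When many cores write the same traceback to one error file, printing the result of parse_err on that file shows each message only once.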

jobInfo: lists all jobs along with their status, run time, and submission directory. If no argument is supplied then it only shows the name of the submission directory, but if you pass any argument then it will show the full absolute path of the submission directory. This is significantly slower than bjobs, but gives a lot more information. You can probably figure out how to hack it into doing your bidding if you want something more detailed.

jobDir: this command takes a job's ID as an argument and will return the directory of that job. If you couple this with an alias like: alias bcd 'cd `jobDir !*`' then you can automatically change to a job's directory by typing "bcd jobID".

resub: takes a job ID as the first argument. If nothing else is supplied it moves to that job's directory and resubmits the job using the gpaw-bsub command and the original name of the submission script. You can optionally supply a different submission command as the second argument (e.g. 'gpaw-ver-bsub 21') and a different file name as the third argument. For example, if you originally submitted 'run_k441.py' but this time you want to submit using 'run_k881.py', you could do: resub jobID gpaw-bsub run_k881.py

massCommand: this allows you to issue a command to a large number of jobs. The first argument is required, and is the command to issue. If no other argument is passed it issues the command to all jobs. The command should take a job ID as its argument (bkill, btop, bbot, etc). The second argument is a "flag" or "filter word". If it is supplied without any additional arguments then the "filter word" must be contained in the absolute path of the submission directory, or else the command will not be issued to that job. The third optional argument can be specified with -f and gives the "field" to check for the "filter word" in. If it is not specified then it defaults to filepath, but the other options are submissiondir (the directory it was submitted from), filename (the name of the submission script), command (the actual submission command, pam -g...), runtime (the time in hours that the job has been running), id (job ID), and status (RUN/PEND). The final optional argument is the "condition" (passed with -c in the examples below). This is the operator which is used to compare the "filter word" and the "field". The default is "in", but you can supply anything which Python would understand. A few examples (* means they are untested):

{code}
massCommand bkill  # kill all jobs
massCommand bbot Ir  # move all jobs with Ir in their absolute path to the bottom of the queue
massCommand btop COOH -f filepath  # move all jobs with 'COOH' in the filepath to the top of the queue
massCommand bkill CH2 -f submissiondir -c 'not in'  # kill all jobs withOUT 'CH2' in the submission directory
massCommand resub 49 -f runtime -c '<'  # resubmit all jobs which have been running for more than 49 hours (using the custom resub command)
{code}
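The filtering logic behind these examples can be sketched as follows. This is an illustration only, not the massCommand source: the function select_jobs, the job dictionaries and their paths are hypothetical, and a small lookup table of operators is used here in place of accepting arbitrary Python. The comparison is applied as "filter word, condition, field value", which matches the examples above ('Ir' in filepath, 49 < runtime).

```python
import operator

# Conditions are applied as `filter_word <condition> field_value`.
CONDITIONS = {
    "in": lambda word, value: word in value,
    "not in": lambda word, value: word not in value,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}


def select_jobs(jobs, filter_word=None, field="filepath", condition="in"):
    """Return the IDs of the jobs whose `field` satisfies the condition."""
    if filter_word is None:
        # No filter word: the command applies to every job.
        return [job["id"] for job in jobs]
    compare = CONDITIONS[condition]
    selected = []
    for job in jobs:
        value = job[field]
        # Numeric fields (e.g. runtime in hours) are compared as numbers.
        word = float(filter_word) if isinstance(value, (int, float)) else filter_word
        if compare(word, value):
            selected.append(job["id"])
    return selected


# Hypothetical job records; a real tool would build these from the queue.
jobs = [
    {"id": "101", "filepath": "/home/user/Ir/CH2/run.py", "runtime": 50.5},
    {"id": "102", "filepath": "/home/user/Pt/COOH/run.py", "runtime": 2.0},
]

long_running = select_jobs(jobs, "49", field="runtime", condition="<")
```

The selected IDs would then be passed one at a time to the chosen command (bkill, btop, resub, and so on). Restricting the condition to a fixed table is safer than evaluating a user-supplied string, at the cost of supporting fewer operators.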

Other Thoughts

  • Heine has also said Jacapo has something related to job management. I took a quick look but didn't spot it.