...

To the extent possible, the options supported by the PilotJobDaemon are the same as those supported by the SlurmJobDaemon. Details below:

| Workflow XML | Option | Alias | Default | Meaning | Comments |
|---|---|---|---|---|---|
| maxCPU | | | 1 hour | Max CPU used by the job (in seconds). | This is used for scheduling the job but is not currently enforced by the pilot. |
| maxMemory | | | 1 GB | Max memory used by the job (in kB). | This is used for scheduling the job but is not currently enforced by the pilot. |
| batchOptions | -N | --nodes | 1 | The number of nodes on which the job will run. | Only for compatibility with SLURM. Option is ignored. |
| batchOptions | -t | --time | 01:00:00 | The wall-clock time allowed for the job. | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
| batchOptions | -L | --license | none | The list of licenses required by the job, separated by commas, e.g. -L SCRATCH. | PilotJobs will only accept jobs if all licenses are available in the pilot job. |
| batchOptions | -C | --constraint | none | The list of constraints required by the job, separated by commas, e.g. -C haswell. | PilotJobs will only accept jobs if all constraints are satisfied by the pilot job. |
| batchOptions | -p | --partition | none | The partition in which the job will be run. | Allows a PilotJob to selectively run only jobs submitted for a particular partition. Partition names can be assigned by the user. |
| batchOptions | -c | --cpus-per-task | 1 | The number of CPUs (threads) which will be allocated to this job. | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
| batchOptions | | --ntasks-per-node | 1 | | Only for compatibility with SLURM. Option is ignored. |
| batchOptions | -J | --job_name | | The name of the job. | Only for compatibility with SLURM. Option is ignored. |
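For illustration, a batchOptions string combining several of the flags above might look like the following (the values here are hypothetical, and the enclosing workflow-XML syntax is not shown):

    -t 02:00:00 -L SCRATCH -C haswell -p debug -c 4

This requests two hours of wall-clock time, the SCRATCH license, the haswell constraint, a partition named "debug", and four CPUs; a pilot will only accept the job if it provides the requested license and constraint and serves the requested partition.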

...


Pilot Jobs

In the current implementation the pilot jobs are not submitted automatically, although this may change in the future. Currently, to submit the default pilot job, simply log in as user "desc" (separate instructions needed?) and run the following:

...

The options supported by the JobControlPilot are:

-C (--constraint) VAL : Constraints satisfied by this pilot
-L (--license) VAL : Licenses provided by this pilot
-P N : The port that the pilot will attempt to pull jobs from (default: 0)
-c N : The total number of cores to share among all running jobs (default: 32)
-h VAL : The host from which this pilot will attempt to pull jobs (default: corigrid.nersc.gov)
-i N : The time after which the pilot will die if no work is provided (seconds) (default: 300)
-m N : The total memory (in kB) of this machine to share among all running jobs (default: 64000000)
-o : True if OK to overwrite existing files (default: false)
-p (--partition) VAL : If specified, only jobs requesting this partition will be run by this pilot
-r N : The maximum runtime for the job (seconds) (default: 172800)
-s VAL : The service name of the pilot service (default: PilotJobProvider)
-u VAL : The user name under which the pilot service is running (default: desc)

Any number of pilot jobs can be submitted.
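For example, a pilot that provides the haswell constraint and the SCRATCH license, and serves only a partition named "debug", might be started with options such as the following (the actual launch command is elided above, so the command name here is a placeholder):

    <pilot-launch-command> -h corigrid.nersc.gov -c 32 -m 64000000 -C haswell -L SCRATCH -p debug -i 300

Such a pilot pulls jobs from corigrid.nersc.gov, shares 32 cores and 64,000,000 kB of memory among its running jobs, and exits if no work is provided for 300 seconds.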

...

  • Currently, while jobs are running in the JobControlPilot, the memory and CPU time used will always be reported as zero, although when a job completes the CPU time used will be reported normally. This will be fixed soon.
  • Currently, if the PilotJobDaemon is stopped, all information about running jobs will be lost. This will be fixed soon.
  • There is no command-line tool for killing running jobs, although jobs can be killed via the workflow engine (perhaps).
  • There is currently no support for checkpointing jobs running in the JobControlPilot, although plans are in place to develop such a feature in the future, and most of the required infrastructure is already in place.

...