...
To the extent possible, the options supported by the PilotJobDaemon are the same as those supported by the SlurmJobDaemon. Details below:
| Workflow XML | Option | Alias | Default | Meaning | Comments |
|---|---|---|---|---|---|
| maxCPU | | | 1 hour | Max CPU used by the job (in seconds). | This is used for scheduling the job but is not currently enforced by the pilot. |
| maxMemory | | | 1GB | Max memory used by the job (in kB). | This is used for scheduling the job but is not currently enforced by the pilot. |
| batchOptions | -N | --nodes | 1 | The number of nodes on which the job will run. | Only for compatibility with SLURM. Option is ignored. |
| batchOptions | -t | --time | 01:00:00 | The wallclock time allowed for the job. | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
| batchOptions | -L | --license | none | The list of licenses required by the job, separated by commas, e.g. -L SCRATCH | PilotJobs will only accept jobs if all licenses are available in the pilot job. |
| batchOptions | -C | --constraint | none | The list of constraints required by the job, separated by commas, e.g. -C haswell | Accepted but not yet used. PilotJobs will only accept a job if all constraints are satisfied by the pilot job. |
| batchOptions | -p | --partition | none | The partition in which the job will be run. | Allows a PilotJob to selectively run only jobs submitted for a particular partition. Partition names can be assigned by the user. |
| batchOptions | -c | --cpus-per-task | 1 | The number of CPUs (threads) which will be allocated to this job. | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
| batchOptions | | --ntasks-per-node | 1 | | Only for compatibility with SLURM. Option is ignored. |
| batchOptions | -J | --job_name | | The name of the job. | Only for compatibility with SLURM. Option is ignored. |
In addition, maxMemory and maxCPU can be specified as part of the workflow job definition (in XML). These are used for scheduling the job in the pilot, but are not currently enforced by the pilot.
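For illustration, a workflow job definition might carry these settings as shown below. The maxCPU, maxMemory, and batchOptions names come from the table above, but the enclosing element and attribute names are assumptions, since the exact workflow XML schema is not shown here:

```xml
<!-- Illustrative sketch only: the <job> wrapper is a hypothetical element;
     maxCPU, maxMemory, and batchOptions are the options described above. -->
<job name="example-job">
  <maxCPU>3600</maxCPU>           <!-- seconds; used for scheduling, not enforced -->
  <maxMemory>1000000</maxMemory>  <!-- kB; used for scheduling, not enforced -->
  <batchOptions>-t 02:00:00 -L SCRATCH -C haswell -p mypartition</batchOptions>
</job>
```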
Pilot Jobs
In the current implementation the pilot jobs are not submitted automatically, although this may change in the future. Currently, to submit the default pilot job, simply log in as user "desc" (separate instructions needed?) and run the following:
...
The options supported by the JobControlPilot are:
-C (--constraint) VAL : Constraints satisfied by this pilot
-L (--license) VAL : Licenses provided by this pilot
-P N : The port that the pilot will attempt to pull jobs from (default: 0)
-c N : The total number of cores to share among all running jobs (default: 32)
-h VAL : The host from which this pilot will attempt to pull jobs (default: …)
-… N : The time after which the pilot will die if no work is provided (seconds) (default: 300)
-m N : The total memory (in kB) of this machine to share among all running jobs (default: 64000000)
-o : True if OK to overwrite existing files (default: false)
-p (--partition) VAL : If specified, only jobs requesting this partition will be run by this pilot
-r N : The maximum runtime for the job (seconds) (default: 172800)
-s VAL : The service name of the pilot service (default: PilotJobProvider)
-u VAL : The user name under which the pilot service is running (default: desc)
Any number of pilot jobs can be submitted.
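The acceptance rules described by the table and option list above can be sketched as follows. This is an illustrative model only, not the actual JobControlPilot code; all class, field, and method names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Job:
    """A job's requests (illustrative names, not the real API)."""
    cpus: int = 1                 # --cpus-per-task
    memory_kb: int = 1_000_000    # maxMemory (default 1 GB)
    licenses: Set[str] = field(default_factory=set)     # -L / --license
    constraints: Set[str] = field(default_factory=set)  # -C / --constraint
    partition: Optional[str] = None                     # -p / --partition

@dataclass
class Pilot:
    """A pilot's advertised capacity, mirroring the -c/-m/-L/-C/-p options."""
    cores: int = 32               # -c: cores to share among all running jobs
    memory_kb: int = 64_000_000   # -m: total memory (kB) to share
    licenses: Set[str] = field(default_factory=set)
    constraints: Set[str] = field(default_factory=set)
    partition: Optional[str] = None
    used_cores: int = 0
    used_memory_kb: int = 0

    def accepts(self, job: Job) -> bool:
        # A job is accepted only if every license and constraint it requests
        # is provided by the pilot, its partition matches when the pilot was
        # started with -p, and enough cores and memory remain free.
        return (job.licenses <= self.licenses
                and job.constraints <= self.constraints
                and (self.partition is None or job.partition == self.partition)
                and job.cpus <= self.cores - self.used_cores
                and job.memory_kb <= self.memory_kb - self.used_memory_kb)

pilot = Pilot(licenses={"SCRATCH"}, constraints={"haswell"})
print(pilot.accepts(Job(licenses={"SCRATCH"})))   # -> True
print(pilot.accepts(Job(licenses={"PROJECT"})))   # -> False: license unavailable
print(pilot.accepts(Job(cpus=64)))                # -> False: not enough free cores
```

Note that, as the table states, the CPU and memory figures enter only this scheduling decision; nothing in the pilot currently enforces them while the job runs.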
Limitations and future plans
- Currently, while jobs are running in the JobControlPilot, the memory and CPU time used will always be reported as zero, although when the job completes the CPU time used will be reported normally. This will be fixed soon.
- Currently if the PilotJobDaemon is stopped all information about running jobs will be lost. This will be fixed soon.
- There is currently no support for checkpointing jobs running in the JobControlPilot, although there are plans to develop such a feature in the future, and most of the required infrastructure is already in place.