...
To a first approximation, switching a workflow from one to the other only requires changing the jobsite. Because the jobsite can be set dynamically, a single workflow can submit jobs to either site (or even to both sites).
Batch Options
To the extent possible, the options supported by the PilotJobDaemon are the same as those supported by the SlurmJobDaemon. Details are given below:
Option | Alias | Default | Meaning | Comments |
---|---|---|---|---|
-N | --nodes | 1 | The number of nodes on which the job will run | Only for compatibility with SLURM. Option is ignored. |
-t | --time | 01:00:00 | The wallclock time allowed for the job | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
-L | --license | none | The list of licenses required by the job separated by commas, e.g. -L SCRATCH | Accepted but not yet used. |
-C | --constraint | none | The list of constraints required by the job, separated by commas, e.g. -C haswell | Accepted but not yet used. |
-p | --partition | none | The partition in which the job will be run. | Only for compatibility with SLURM. Option is ignored. |
-c | --cpus-per-task | 1 | The number of cpus (threads) which will be allocated to this job. | This is used for scheduling jobs in the pilot, but is not currently enforced by the pilot. |
--ntasks-per-node | | 1 | | Only for compatibility with SLURM. Option is ignored. |
-J | --job_name | | The name of the job. | Only for compatibility with SLURM. Option is ignored. |
In addition, memory and maxcpu can be specified as part of the workflow job definition (in XML). These are used for scheduling the job in the pilot, but are not currently enforced by the pilot.
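As a rough illustration of the option table above, the following Python sketch parses the same option names and defaults with `argparse` and groups them by how the table says they are treated. It is not the PilotJobDaemon's actual parser: the function names `walltime_seconds` and `parse_batch_options` and the grouping keys are hypothetical, and the sketch assumes that `--time` is always given as HH:MM:SS.

```python
import argparse


def walltime_seconds(value):
    """Convert an HH:MM:SS string such as 01:00:00 to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def split_list(value):
    """Split a comma-separated option value; None becomes an empty list."""
    return value.split(",") if value else []


def parse_batch_options(argv):
    # Option names and defaults are copied from the table above.
    # Hypothetical helper: the real daemon's parsing is not documented here.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-N", "--nodes", type=int, default=1)
    parser.add_argument("-t", "--time", default="01:00:00")
    parser.add_argument("-L", "--license", default=None)
    parser.add_argument("-C", "--constraint", default=None)
    parser.add_argument("-p", "--partition", default=None)
    parser.add_argument("-c", "--cpus-per-task", type=int, default=1)
    parser.add_argument("--ntasks-per-node", type=int, default=1)
    parser.add_argument("-J", "--job_name", default=None)
    opts = parser.parse_args(argv)

    return {
        # Used for scheduling in the pilot, but not enforced.
        "scheduling": {
            "walltime_s": walltime_seconds(opts.time),
            "cpus": opts.cpus_per_task,
        },
        # Accepted but not yet used.
        "accepted_unused": {
            "license": split_list(opts.license),
            "constraint": split_list(opts.constraint),
        },
        # Present only for compatibility with SLURM.
        "ignored": {
            "nodes": opts.nodes,
            "partition": opts.partition,
            "ntasks_per_node": opts.ntasks_per_node,
            "job_name": opts.job_name,
        },
    }


if __name__ == "__main__":
    print(parse_batch_options(["-t", "02:30:00", "-c", "4", "-L", "SCRATCH"]))
```

Grouping the options this way makes it easy to see that only the wallclock time and the cpu count influence where a job is placed; everything else is carried along without effect.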
Pilot Jobs
In the current implementation, pilot jobs are not submitted automatically, although this may change in the future. Currently, to submit the default pilot job, log in as user "desc" (separate instructions needed?) and run the following:
...
-c N : The total number of cores to share among all running jobs (default: 64)
-h VAL : The host from which this pilot will attempt to pull jobs (default: corigrid.nersc.gov)
-i N : The time after which the pilot will die if no work is provided (seconds) (default: 300)
-m N : The total memory (in kB) of this machine to share among all running jobs (default: 64000000)
-o : True if OK to overwrite existing files (default: false)
-p N : The port that the pilot will attempt to pull jobs from (default: 0)
-r N : The maximum runtime for the job (seconds) (default: 172800)
-s VAL : The service name of the pilot service (default: PilotJobProvider)
-u VAL : The user name under which the pilot service is running (default: desc)
Any number of pilot jobs can be submitted.
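To make the idea of sharing a pilot's cores and memory among running jobs more concrete, here is a minimal Python sketch of the bookkeeping a pilot could do. The actual scheduling policy of the pilot is not described here, so this is an assumption: a job is started only if its requested cpus and memory fit into what is still free. The class names `Job` and `PilotBudget` and the method `try_start` are hypothetical; the defaults mirror the pilot's `-c` and `-m` options.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int = 1        # from -c / --cpus-per-task
    memory_kb: int = 0   # from the workflow job definition (XML)


@dataclass
class PilotBudget:
    total_cores: int = 64               # pilot option -c (default: 64)
    total_memory_kb: int = 64_000_000   # pilot option -m (default: 64000000)
    running: list = field(default_factory=list)

    def free_cores(self):
        return self.total_cores - sum(job.cpus for job in self.running)

    def free_memory_kb(self):
        return self.total_memory_kb - sum(job.memory_kb for job in self.running)

    def try_start(self, job):
        """Assumed policy: start the job only if it fits the free resources."""
        if job.cpus <= self.free_cores() and job.memory_kb <= self.free_memory_kb():
            self.running.append(job)
            return True
        return False


if __name__ == "__main__":
    pilot = PilotBudget()
    for job in (Job("a", cpus=48, memory_kb=32_000_000),
                Job("b", cpus=32, memory_kb=1_000_000),
                Job("c", cpus=16, memory_kb=32_000_000)):
        started = pilot.try_start(job)
        print(job.name, "started" if started else "deferred")
```

Because the requested values are used only for scheduling and are not enforced, a job that uses more than it asked for would not be stopped by a check like this one; the accounting is only as accurate as the requests.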