...
Code Block |
---|
conda install chardet |
conda install --force-reinstall -c conda-forge charset-normalizer=3.2.0 |
Interactive testing
We suggest not training on interactive nodes. Instead, open a terminal on an SDF node by:
...
Code Block |
---|
salt fit -c configs/<my_config>.yaml --data.num_jets_train <small_number> --data.num_workers <num_workers> |
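As a minimal sketch, the interactive test command above can be wrapped in a small shell helper that checks its arguments and prints the command before you run it. The function name `salt_smoke_cmd` and the example values are hypothetical, not part of SALT; only the flags already shown above are used.

```shell
# Hypothetical helper: print a short interactive `salt fit` test command.
# It only assembles the flags shown in the command above; it does not run salt.
salt_smoke_cmd() {
  config="$1"; num_jets="$2"; num_workers="$3"
  usage="usage: salt_smoke_cmd <config.yaml> <num_jets> <num_workers>"
  if [ -z "$config" ]; then
    echo "$usage" >&2
    return 2
  fi
  # Reject missing or non-numeric counts before building the command.
  for n in "$num_jets" "$num_workers"; do
    case "$n" in
      ''|*[!0-9]*) echo "$usage" >&2; return 2 ;;
    esac
  done
  printf 'salt fit -c %s --data.num_jets_train %s --data.num_workers %s\n' \
    "$config" "$num_jets" "$num_workers"
}

# Example with placeholder values (assumptions, not recommendations):
salt_smoke_cmd configs/my_config.yaml 1000 4
```

Printing the command first makes it easy to confirm the config path and the reduced jet count before starting a test run.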
Training on Slurm
See the SALT on SDF documentation (also linked at the top of this page) and example configs in the SALT fork in the slac_bjr GitLab project.
For additional clarity, see the following description of the submission scripts. Change the submit_slurm.sh script as follows:
...
After submitting with sbatch, you can use standard Slurm commands such as squeue and sacct (see the SDF documentation) to check the state of your job.
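As a rough sketch of checking on a submitted job, the helper below wraps two standard Slurm commands: `squeue` lists pending and running jobs, and `sacct` reports accounting records, including those of finished jobs. The function name `job_state` is hypothetical, and the chosen `sacct` fields are only one reasonable selection; SDF may recommend others.

```shell
# Hypothetical helper: report the state of a Slurm job given its job ID.
job_state() {
  job_id="$1"
  if [ -z "$job_id" ]; then
    echo "usage: job_state <job_id>" >&2
    return 2
  fi
  if ! command -v squeue >/dev/null 2>&1; then
    echo "Slurm client tools not found on this node" >&2
    return 127
  fi
  # Pending or running jobs appear in the queue listing.
  squeue -j "$job_id"
  # Finished jobs are reported by the accounting database instead.
  sacct -j "$job_id" --format=JobID,JobName,State,Elapsed,ExitCode
}
```

Call it with the job ID that sbatch prints at submission time, for example `job_state <job_id>`.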
Comet Training Visualization
In your Comet profile, you should start seeing live updates for the training. The project name you specified in the submit script appears under your workspace; click it to view graphs of the live training updates.
Training Evaluation
Follow the SALT documentation to evaluate the trained model on the test dataset. Evaluation uses a separate batch submission script, which is very similar to the one used for training.
The main difference is the salt command that is run (see below). The job writes a log to the same directory as the other log files and produces a new output h5 file alongside the one you pass in for evaluation.
...