Before You Start

Please follow the instructions below before processing data with CryoSPARC on S3DF.

  1. Request a CryoSPARC license at https://cryosparc.com/download
  2. Contact a Computing Specialist to create a folder /sdf/group/cryoem/g/<project number>, where you will store the processed data (requires permission from S2C2 management).
  3. Register as a Cryo-EM facility user in S3DF by one of the following procedures:
    a) Submit a request to be added to the Cryo-EM facility via the S3DF Coact user portal
      i)   In a browser, navigate to the S3DF Coact user portal at https://coact.slac.stanford.edu
      ii)  Click the "Log in with S3DF (unix)" button, then enter your SLAC Unix credentials
      iii) Click on "Repos" in the menu bar, click the "Request Access to Facility" button, select "CryoEM" from the drop down, and press the "Request Facility Access" button
    b) Submit a ticket to the S3DF-Help queue in ServiceNow
      i) Send an email to s3df-help@slac.stanford.edu with the subject line "User access request to S3DF CryoEM facility" and include your SLAC unix username and proposal number(s).
  4. Register a new repo for your proposal number(s), or request membership in an existing one. Each proposal for which you will process data with SLAC Cryo-EM compute resources on S3DF needs its own S3DF repo for compute usage accounting. For instance, if you are a member of proposals CX00 and CX01 and will use S3DF compute resources for both, each proposal needs its own corresponding repo. Once you have completed S3DF user registration for the CryoEM facility, log into https://coact.slac.stanford.edu and do one of the following:
    a) To request a new repo for your proposal(s)
      i)   Click on "Repos" in the menu bar
      ii)  Click the "Request New Repo" button
      iii) Enter the name of your repo (should match the proposal, e.g. "CA00"), select "CryoEM" from the drop down, and enter the username of the principal who will be utilizing the S3DF compute resources (your own SLAC unix username will suffice)
    b) To request membership in an existing repo
      i)  Click on "Repos" in the menu bar
      ii) Click the "Request Repo Membership" button
      iii) Enter the existing repo name (should match the proposal number) and select "CryoEM" from the drop down menu, then submit.
  5. Go to your project folder provisioned in step (2), make a subfolder to contain symlinks to your raw data, and another folder to store your CryoSPARC database.  

    For example, in the CT00 project folder I created a folder called apoferritin, and under it a folder called 20221121-temalpha to hold the symlinked raw data from the exp drive. I also created a folder called cryosparc under the CT00 folder to store the data generated by running CryoSPARC:

    # change to project folder
    $ cd /sdf/group/cryoem/g/CT00
    
    # create subfolders to store CryoSPARC data
    $ mkdir -p apoferritin/20221121-temalpha
    
    # create a folder for the CryoSPARC database and output
    $ mkdir cryosparc
    
    # change to the newly-created CryoSPARC data directory
    $ cd apoferritin/20221121-temalpha
    
    # continue to step 6
  6. Create the symbolic links. First, find the path of your experimental data directory, which should have the following format: /sdf/group/cryoem/exp/YYYYMM/YYYYMMDD-PP###_<instrument>/<sample>/raw/<path_to_sample_directory>/... Then run the commands below; note that echo is used first to preview the command before executing it.

    If the data is in tiff format:

    # ensure that you are in the data directory created under your project folder in step (5) (e.g., /sdf/group/cryoem/g/<proposal_number>/...)
    $ pwd
    
    # Check the output of the `find` command in your experimental data directory to get a list of all *fractions.tiff raw image files,
    # and create symbolic links to the data in your project folder
    
    # Do a dry-run of the command
    $ find [directory with data] -name '*fractions.tiff' -exec echo ln -sf \{\} . \;  
    # Execute the command
    $ find [directory with data] -name '*fractions.tiff' -exec ln -sf \{\} . \;


    If the data is in mrc format:

    # ensure that you are in the data directory created under your project folder in step (5) (e.g., /sdf/group/cryoem/g/<proposal_number>/...)
    $ pwd
    
    # Check the output of the `find` command in your experimental data directory to get a list of all *Fractions.mrc raw image files, 
    # and create symbolic links to the data in your project folder
    
    # Do a dry-run of the command
    $ find [directory with data] -name '*Fractions.mrc' -exec echo ln -sf \{\} . \;
    # Execute the command
    $ find [directory with data] -name '*Fractions.mrc' -exec ln -sf \{\} . \;

    For example, 

    # Do a dry-run of the command
    $ find /sdf/group/cryoem/exp/202101/20210105-CF01_TEM2/Apoferritin/raw/ApoTest/Images-Disc1/ -name '*fractions.tiff' -exec echo ln -sf \{\} . \;
    # Execute the command
    $ find /sdf/group/cryoem/exp/202101/20210105-CF01_TEM2/Apoferritin/raw/ApoTest/Images-Disc1/ -name '*fractions.tiff' -exec ln -sf \{\} . \;
  7. Do the same for the gain reference (you can preview with echo as in step 6; a quick way to verify the links is sketched after this list):

    # create a symbolic link to the gain reference in your project folder
    $ find [directory with gainref data] -name '*.mrc' -exec ln -sf \{\} . \;
  8. Now you are ready to process data with CryoSPARC!
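
Optional: before moving on, you can sanity-check the symbolic links you just created. This is a minimal sketch run from your data directory, using the same conventions as steps (6) and (7); adjust the commands to your own paths.

    # count the symbolic links created in the current directory
    $ find . -maxdepth 1 -type l | wc -l
    
    # list any broken links (prints nothing if every link resolves to an existing file)
    $ find . -maxdepth 1 -xtype l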

Accessing CryoSPARC on S3DF

Note: Once your session has started, the batch scheduler reserves your requested resources for the number of hours requested, regardless of whether you are actively running jobs in your CryoSPARC session. To reduce idling of compute resources and promote fair allocation, please request session time only when you will be actively running jobs in CryoSPARC. For example, avoid requesting a 72-hour CryoSPARC session when you will only be running jobs for a few hours during that timeframe. To enforce this policy, we will be implementing an idle timeout for S3DF CryoSPARC sessions that have not run batch jobs for a certain length of time.


  • Log in to S3DF Ondemand by clicking the "Log in with S3DF (unix)" button, then enter your SLAC Unix credentials.
  • Under "Interactive Apps", select "CryoSPARC".
  • Fill out the Ondemand session request form and click “Launch” at the bottom of the page.

    The CryoEM facility on S3DF has the following resources available as of this writing (2024-06-03):

    | Partition (cluster) name | Nodes available | CPU model | CPU cores per node | Usable memory per node | GPU model | GPUs per node | Local scratch per node |
    |---|---|---|---|---|---|---|---|
    | turing | 12 | Intel Xeon Gold 5118 | 40 (hyperthreaded) | 160GB | NVIDIA GeForce 2080Ti | 10 | 300GB |
    | ampere | 25 | Rome 7542 | 112 (hyperthreaded) | 960GB | Tesla A100 (40GB) | 4 | 14TB |
    | roma | 7 | Rome 7702 | 120 | 480GB | - | - | 300GB |


    By default, S2C2 and Chiu Lab-affiliated proposals are allowed at most the equivalent of 1 node's compute resources for each cluster type above, summed over all of the proposal's running jobs on that cluster. For example, a given proposal may be allocated up to the equivalent of 1 turing node, i.e. 10 GPUs / 40 CPU cores / 160GB of memory, across all jobs run under that proposal on the turing cluster. A user may run jobs in multiple clusters, but the allocation allowed per proposal, per cluster remains the equivalent of 1 node. (One way to check what is already running under your proposal is sketched at the end of this section.)

    For most jobs it is recommended to request GPUs, CPU cores, and memory in a ratio of 1 GPU : 4 CPU cores : 16GB of memory (the example session settings below follow this ratio: 2 GPUs, 8 CPU cores, 32768MB). Please scale your resource requests according to the type of jobs being run.


    Here are some example session settings:

    Cryosparc license ID: <a valid CryoSPARC license>
    Cryosparc Version: XXXXXX (currently deployed: 4.4.1+240110)
    Account: cryoem:<your proposal number> (e.g. cryoem:CT00)
    Partition: turing (requests 2080Ti GPUs)
    Number of hours: XX (if running CryoSPARC Live, this should match the length of your Live session)
    Number of CPU cores: 8
    Mem: 32768MB or higher
    Number of GPUs: 2 (do not request more than 4 GPUs)

    Advanced Settings
    Cryosparc Datadir: /sdf/group/cryoem/g/<your proposal number> (do not use your home directory)

  • Your session will go into the scheduling queue; it may take a while to start depending on current cluster usage and resource availability. Your session will only start when all of the following conditions are met:
    - ALL of the resources you requested for this CryoSPARC session are available
    - Your job has reached the top of the scheduling queue
    Once your session is ready, you will see a “Launch Cryosparc” button on the bottom left corner of your session card (you can also check the "I would like to receive an email when the session starts" option in the interactive session request form to be notified when your session is available). Click "Launch Cryosparc" to start your session.

  • If this is your first time launching CryoSPARC, you may need to log in with your SLAC email (e.g. <your_slac_unix_username>@slac.stanford.edu) as the username and your CryoSPARC license ID as the password.

  • To open the running CryoSPARC instance, enter localhost:<CRYOSPARC_BASE_PORT> in the browser address bar.
    The value of CRYOSPARC_BASE_PORT can be viewed by opening a terminal in your CryoSPARC interactive session desktop and running:

    # display the contents of cryosparc/config.sh and filter for the CRYOSPARC_BASE_PORT field (default value is 39100)
    $ cat cryosparc/config.sh | grep CRYOSPARC_BASE_PORT
    export CRYOSPARC_BASE_PORT=39100

    Then enter the value you found in the config.sh script into the browser address bar, e.g. localhost:39100.

    Note that if other CryoSPARC users have sessions on the same host, your CRYOSPARC_BASE_PORT value may differ.

  • To access CryoSPARC Live, connect to your running CryoSPARC instance as above, then click the CryoSPARC Live (⚡️) "All Live Sessions" icon in the navigation bar. For more details on using CryoSPARC Live, see: https://guide.cryosparc.com/live/new-live-session-start-to-finish-guide

  • Now you should be able to see the CryoSPARC interface and start data processing. Detailed tutorials on using CryoSPARC to process data can be found here: https://guide.cryosparc.com/processing-data/tutorials-and-case-studies
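
Because each proposal is capped at the equivalent of one node's resources per cluster, it can help to check what is already running under your proposal before requesting a new session. The sketch below assumes the S3DF batch scheduler is Slurm (suggested by the Partition and Account fields in the session form); the account string is the example value from the settings above, shown only as an illustration.

    # list your own running and pending jobs (CryoSPARC Ondemand sessions appear here too)
    $ squeue -u $USER
    
    # list every job charged to a proposal account on a given partition
    # (account assumed to match the session form, e.g. cryoem:CT00)
    $ squeue -A cryoem:CT00 -p turing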

Best Practices For Using CryoSPARC on S3DF

  1. DO request only the resources that you need. If you ask for more time, CPUs, or GPUs than your jobs can actually use, your job will take longer to start, your fairshare will be reduced so that later jobs may be de-prioritised, and others are prevented from using the idle resources you have reserved (see the note above).
  2. DO be respectful of other users’ jobs - you will be sharing a limited set of nodes with many other users. Please consider the type, size, and quantity of jobs that you submit so that you do not starve others of compute resources.
  3. DO NOT request more than one CryoSPARC instance at a time. Having multiple instances will likely corrupt your database. Delete any currently running CryoSPARC sessions by logging in to S3DF Ondemand with your SLAC unix credentials, clicking on "My Interactive Sessions", and deleting any "Running" Cryosparc Ondemand session cards.
  4. DO NOT store data in your home directory (e.g., /sdf/home/<username_initial>/<username>/... ), which has a limited disk quota of 25GB. Use a proposal directory under group space (e.g., /sdf/group/cryoem/g/<proposal_number> ) for each proposal you will be processing. See step (2) in the "Before You Start" section above.
  5. DO NOT use proposal compute resource allocations on a different proposal. For example, if you are a member of proposals CX00 and CX01, do not process CX00 data under the cryoem:CX01 account or vice versa. Only process the data associated with the proposal specified in the CryoSPARC session's Account field.
  6. DO NOT copy your experimental data (raw images) into your proposal folder; instead, create symlinks to the experimental data in your proposal folder as described in steps (6) and (7) above. This reduces storage quota usage on the filesystem (a quick way to check usage is sketched below).
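
To confirm you are following (4) and (6), here is a minimal sketch for checking your home-directory usage and verifying that raw data in your proposal folder is symlinked rather than copied; the paths are placeholders, and S3DF may also provide its own quota-reporting tools.

    # total size of your home directory (the quota is 25GB)
    $ du -sh ~
    
    # show entries in your data directory that are symlinks (lines starting with 'l')
    $ ls -l /sdf/group/cryoem/g/<proposal_number>/<data_directory> | grep '^l' | head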

FAQs

  1. How many GPUs and how much memory do I need?

We recommend a maximum of 4 GPUs. Preprocessing can use one or more GPUs, with throughput scaling roughly linearly. Reconstruction uses one or more GPUs for 2D classification and one GPU for 3D refinement.

A minimum of 11GB of GPU memory per GPU is recommended to process most types of data successfully in CryoSPARC.

Minimum GPU Memory Requirements

Preprocessing
  • Gatan K3, Gatan K3 Super Resolution, TFS Falcon 4 Images, TFS Falcon 4 EER Images: 11GB+

Reconstruction
  • 2D Classification: 4GB+
  • 3D Refinement (heavily dependent on box size): 11GB+
