...

The reprocessing is usually split into three steps:

Step 1: Collect the files that need to be reprocessed


I have a file called "preparelist.py" that helps query the data catalog and saves the list of runs in an appropriate text file, which will be used in the next step. The file (/nfs/farm/g/glast/u38/Reprocess-tasks/P310-FT2/preparelist.py) looks like this:

Code Block
languagepy
#!/usr/bin/env python                                                                                                                                                                                   
import sys,os                                                                                                                                                                                           
def run(cmd,test=False):                                                                                                                                                                                
    '''                                                                                                                                                                                                 
    Simple interface to execute a system call
    '''                                                                                                                                                                                                 
    print cmd                                                                                                                                                                                           
    if not test: os.system(cmd)                                                                                                                                                                         
    pass                                                                                                                                                                                                
                                                                                                                                                                                                        
def extractRunNumber(out_file_list,out_run_list):                                                                                                                                                       
    '''                                                                                                                                                                                                 
    Extract run number from the name of the file                                                                                                                                                        
    '''                                                                                                                                                                                                 
    runs=[]                                                                                                                                                                                             
    out_run_list_file=file(out_run_list,'w')                                                                                                                                                            
    for l in file(out_file_list,'r').readlines():                                                                                                                                                       
        run=l.split('_v')[0].split('_r0')[-1]                                                                                                                                                           
        runs.append(run)                                                                                                                                                                                
        out_run_list_file.write(run+'\n')                                                                                                                                                               
        pass                                                                                                                                                                                            
    return runs                                                                                                                                                                                         
                                                                                                                                                                                                        
#############################################################                                                                                                                                           
# REPROCESSED 310:                                                                                                                                                                                      
# MINIMUM RUN NUMBER TO BE REPROCESSED
RunMin='239557414'
# MAXIMUM RUN NUMBER TO BE REPROCESSED
RunMax='604845703' # earlier value, overridden by the next line
RunMax='625881605' #2020-11-01 00:00:00
#############################################################                                                                                                                                           
out_file_list = 'FileList_%(RunMin)s_%(RunMax)s.txt' % locals()                                                                                                                                         
out_run_list  = 'RunsList_%(RunMin)s_%(RunMax)s.txt' % locals()                                                                                                                                         
                                                                                                                                                                                                        
p310_file_list = 'P310_FileList_%(RunMin)s_%(RunMax)s.txt' % locals()                                                                                                                                   
p310_run_list  = 'P310_RunsList_%(RunMin)s_%(RunMax)s.txt' % locals()                                                                                                                                   
                                                                                                                                                                                                        
p310_remaining_run_list  = 'P310_Remaining_RunsList_%(RunMin)s_%(RunMax)s.txt' % locals()
#############################################################
# This is the list of files in the datacatalog:
cmd="/afs/slac.stanford.edu/u/gl/glast/datacat/prod/datacat find --mode PROD --site SLAC_XROOT --group FT2 --filter 'RunMin >=%(RunMin)s && RunMin<=%(RunMax)s' --sort nRun --show-non-ok-locations /Data/Flight/Level1/LPA > %(out_file_list)s" % locals()
# --display 'RunMin' > $out_list                                                                                                                                                                        
run(cmd,test=False)                                                                                                                                                                                     
to_process=extractRunNumber(out_file_list,out_run_list)                                                                                                                                                 
print 'split -l25 ../%(out_run_list)s -a 3' % locals()                                                                                                                                                  
                                                                                                                                                                                                        
#############################################################                                                                                                                                           
# This is the list of files that are already reprocessed:                                                                                                                                               
cmd="/afs/slac.stanford.edu/u/gl/glast/datacat/prod/datacat find --mode PROD --site SLAC_XROOT --group FT2 --filter 'RunMin >=%(RunMin)s && RunMin<=%(RunMax)s' --sort nRun --show-non-ok-locations /Data/Flight/Reprocess/P310 > %(p310_file_list)s" % locals()
run(cmd,test=False)                                                                                                                                                                                     
processed=extractRunNumber(p310_file_list,p310_run_list)                                                                                                                                                
#wc $out_list                                                                                                                                                                                           
                                                                                                                                                                                                        
remaining=[]                                                                                                                                                                                            
out_run_list_file=file(p310_remaining_run_list,'w')
                                                                                                                                                                                                        
for x in to_process:                                                                                                                                                                                    
    if not x in processed:                                                                                                                                                                              
        if int(x)>240729801:# We skip the first two runs                                                                                                                                                
            remaining.append(x)                                                                                                                                                                         
            out_run_list_file.write(x+'\n')                                                                                                                                                             
        pass                                                                                                                                                                                            
    pass                                                                                                                                                                                                
out_run_list_file.close()                                                                                                                                                                               
print 'To process:%d, processed:%d, remaining: %d' % (len(to_process),len(processed),len(remaining))
print 'split -l25 ../%(p310_remaining_run_list)s -a 3' % locals()   

It basically makes two calls to the data catalog. The first retrieves the list of runs to reprocess, and the second retrieves the list of runs already reprocessed. Files are created to keep track of these lists, and the names of the files contain the minimum and the maximum run number.
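
The core of the script is the run-number extraction and the comparison of the two lists. A minimal Python 3 sketch of that logic, without the data catalog calls, might look like the following. The function names and the sample file name are illustrative assumptions and are not part of preparelist.py; the threshold 240729801 is the one used in the script.

```python
def extract_run_number(line):
    """Return the run number embedded in a catalog line.

    Same string logic as extractRunNumber in the script: keep what
    precedes '_v', then keep what follows the last '_r0'.
    """
    return line.split('_v')[0].split('_r0')[-1]


def remaining_runs(to_process, processed, skip_up_to=240729801):
    """Runs that still need reprocessing, in their original order."""
    done = set(processed)  # set lookup instead of scanning a list
    return [run for run in to_process
            if run not in done and int(run) > skip_up_to]


# Hypothetical file name, only meant to show the expected pattern.
sample = 'root://some-host//some/path/file_r0239557414_v001.fit\n'
print(extract_run_number(sample))
```

Using a set for the already-processed runs keeps the comparison fast even when both lists contain many thousands of runs.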

...

Step 2: Prepare the files in "bunches", so that the reprocessing task (only one is running) submits a bunch of jobs before entering a sleep period. This is done so as not to kill the pipeline.

As the last print statement suggests, I split the list of runs into files containing 25 runs each. First, I create two directories, and then I cd into the todo one. For example:

Code Block
mkdir todo-2020-11/
mkdir done-2020-11/
cd todo-2020-11/

Then, the command I use is simply, for example:

Code Block
split -l25 ../P310_Remaining_RunsList_239557414_625881605.txt -a 3

This will create a series of files containing 25 runs each.
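
To illustrate what the split command produces, here is a small Python 3 sketch that groups a run list into bunches of 25 and generates names in the style split uses with -a 3 (default prefix x followed by three lowercase letters). The helper names and the run numbers are made up for the example.

```python
from itertools import islice, product
from string import ascii_lowercase


def bunches(runs, size=25):
    """Cut the run list into consecutive groups of at most `size` runs."""
    return [runs[i:i + size] for i in range(0, len(runs), size)]


def split_style_names(count, suffix_length=3, prefix='x'):
    """First `count` names as produced by `split -a 3`: xaaa, xaab, ..."""
    suffixes = product(ascii_lowercase, repeat=suffix_length)
    return [prefix + ''.join(s) for s in islice(suffixes, count)]


runs = [str(239557414 + i) for i in range(60)]  # made-up run numbers
groups = bunches(runs)
for name, group in zip(split_style_names(len(groups)), groups):
    print(name, len(group))
# prints:
# xaaa 25
# xaab 25
# xaac 10
```

The last file is simply shorter when the number of runs is not a multiple of 25.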

...

Step 3: Submit the reprocessing task. This has to be a process that runs in the background. Usually, I open a terminal using a FASTX session, so I can go back to it any time, but I don't have to keep the connection active.

There is a simple file (submitter-prod-2020-11) containing the sequence of bash commands I submit:

Code Block
#!/bin/bash                                                                                                                                                                                             
                                                                                                                                                                                                        
delay=300                                                                                                                                                                                               
                                                                                                                                                                                                        
while true ; do                                                                                                                                                                                         
        rf=$(ls todo-2020-11/* | head -1)                                                                                                                                                               
        echo $rf                                                                                                                                                                                        
        for run in $(<$rf) ; do                                                                                                                                                                         
            /afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod/pipeline -m PROD createStream --stream $run --define RUNID=r0$run P310-FT2                                                               
        done                                                                                                                                                                                            
        mv $rf done-2020-11/.                                                                                                                                                                           
        date                                                                                                                                                                                            
        sleep $delay                                                                                                                                                                                    
done                                                                                                                                                                                                    
    

Note that the directory names in this script (todo-2020-11/ and done-2020-11/) have to be changed every time I create a new backfill.

The script reads one file from the todo-2020-11 directory and submits N streams of the P310-FT2 task, each with the input run (RUNID) as an argument. In our case, N=25. It then moves the input file to the done-2020-11 directory and sleeps for 5 minutes.
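
For readers who prefer Python, one cycle of this loop could be sketched as below (Python 3). This is not the script I run, only an illustration: the function names are invented, the pipeline command line is copied from the bash script, and by default the sketch only prints each command instead of executing it. Unlike the bash loop, it reports when the todo directory is empty so that the caller can stop.

```python
import os
import shutil

PIPELINE = '/afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod/pipeline'


def stream_command(run, task='P310-FT2'):
    """Argument list equivalent to the createStream line in the bash script."""
    return [PIPELINE, '-m', 'PROD', 'createStream',
            '--stream', run, '--define', 'RUNID=r0' + run, task]


def process_one_bunch(todo_dir, done_dir, submit=print, task='P310-FT2'):
    """Submit the runs of the first file in todo_dir, then move that file.

    Returns False when todo_dir is empty, True otherwise.
    `submit` receives one argument list per run (default: just print it).
    """
    names = sorted(os.listdir(todo_dir))
    if not names:
        return False
    bunch = os.path.join(todo_dir, names[0])
    with open(bunch) as handle:
        runs = handle.read().split()  # one run number per line
    for run in runs:
        submit(stream_command(run, task))
    shutil.move(bunch, os.path.join(done_dir, names[0]))
    return True
```

A driver would call process_one_bunch in a loop, sleeping 300 seconds between calls until it returns False. Passing subprocess.check_call as submit would actually run the commands.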

...

Step 4: Applying BTI

This is very similar to the reprocessing run. Check the BAD time periods page, which lists runs with bad time intervals (BTIs). Then create a list (called BTI-list.txt here) with the runs in the reprocessing that contain BTIs. As before, I split the file into subfiles with 25 runs each:

Code Block
mkdir todo-bti/
mkdir done-bti/
cd todo-bti/
split -l25 ../BTI-list.txt -a 3
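
The content of BTI-list.txt is the overlap between the reprocessed runs and the runs with BTIs. A minimal Python 3 sketch of that selection follows, assuming the BTI run numbers have already been copied by hand from the BAD time periods page into a list; the function name and the run numbers are hypothetical.

```python
def runs_with_bti(reprocessed_runs, bti_runs):
    """Keep the reprocessed runs that also appear in the BTI list."""
    bti = {run.strip() for run in bti_runs}
    return [run.strip() for run in reprocessed_runs if run.strip() in bti]


# Made-up run numbers, one per line as in the run list files.
reprocessed = ['250000000\n', '260000000\n', '270000000\n']
with_bti = ['260000000', '999999999']
print(runs_with_bti(reprocessed, with_bti))
```

Writing the returned runs one per line gives a file that can be passed to split as shown above.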

Then I run the following script (submitter-prod-bti):

Code Block
#!/bin/bash                                                                                                                                                                                             
                                                                                                                                                                                                        
delay=600                                                                                                                                                                                               
                                                                                                                                                                                                        
while true ; do                                                                                                                                                                                         
        rf=$(ls todo-bti/* | head -1)                                                                                                                                                                   
        echo $rf                                                                                                                                                                                        
        for run in $(<$rf) ; do                                                                                                                                                                         
            /afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod/pipeline -m PROD createStream --stream $run --define RUNID=r0$run flagFT2-P310                                                           
        done                                                                                                                                                                                            
        mv $rf done-bti/.                                                                                                                                                                               
        date                                                                                                                                                                                            
        rf=$(ls todo-bti/* | head -1)                                                                                                                                                                   
        echo $rf                                                                                                                                                                                        
        for run in $(<$rf) ; do                                                                                                                                                                         
            /afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod/pipeline -m PROD createStream --stream $run --define RUNID=r0$run flagFT2-P310                                                           
        done                                                                                                                                                                                            
        mv $rf done-bti/.                                                                                                                                                                               
        date                                                                                                                                                                                            
                                                                                                                                                                                                        
        sleep $delay                                                                                                                                                                                    
done                                                                                                                                                                                                    
                                                                                                                                                                                                        
  

This script submits the flagFT2-P310 task. Note that here the script submits 50 runs (two files of 25) per cycle and then sleeps for 10 minutes.
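
As a quick check of the pacing, the following Python 3 sketch compares the two submitters. It only counts the sleep time and ignores the time spent in the createStream calls themselves, so the rates are upper bounds; the function names are illustrative.

```python
import math


def runs_per_minute(runs_per_cycle, delay_seconds):
    """Upper bound on the submission rate set by the sleep period."""
    return runs_per_cycle * 60 / delay_seconds


def cycles_needed(n_runs, runs_per_cycle):
    """Number of loop iterations needed to submit n_runs runs."""
    return math.ceil(n_runs / runs_per_cycle)


print(runs_per_minute(25, 300))   # reprocessing submitter
print(runs_per_minute(50, 600))   # BTI submitter
print(cycles_needed(1000, 50))    # 1000 is an arbitrary example size
```

Both settings correspond to the same average rate; the BTI submitter just sends larger bursts less often.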

...