Profiling one of my programs, I found some surprising things in gtobssim.

I am using ScienceTools-LATEST-1-4066, build x86_64-64bit-gcc47-Optimized.

Execution time

I am simulating a single point source, with this xml file:

<source_library title="Library">
  <source flux="0.667283950775" name="bn131231198">
    <spectrum escale="MeV">
    <particle name="gamma">
      <power_law emax="200000.0" emin="15.0" gamma="2.0" />
    </particle>
    <celestial_dir dec="-2.42" ra="10.11" />
    </spectrum>
  </source>
</source_library>

This is the command line I am using:

gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786

These are the execution times with the flux as reported above, and then "boost" times that value:

BoostExecution time (s)
14.438
26.176
1020.802
5092.315
100197.873

This shows that the algorithm is essentially linearly dependent on the number of events (the inflection between 50 and 100 might be due to memory management, or just noise):

Profiling

I used this command to profile the execution:

valgrind --tool=callgrind gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786

and then the tool kcachegrind to produce this plot:

 

I am puzzled by the fact that 50 % of the time is spent in the FluxSource::fullTitle() function, which afaik should only return the full name of the source. The relevant piece of code which calls fullTitle is in observationSim/src/Simulator.cxx:

void Simulator::makeEvents(EventContainer &events, 
                           ScDataContainer &scData, 
                           std::vector<irfInterface::Irfs *> &respPtrs, 
                           Spacecraft *spacecraft,
                           bool useSimTime) {
   m_useSimTime = useSimTime;
   m_elapsedTime = 0.;
// Loop over event generation steps until done.
   while (!done()) {

         ...
         
         std::string name = m_newEvent->fullTitle();
         if (name.find("TimeTick") != std::string::npos) {
            if (!m_usePointingHistory) {
               scData.addScData(m_newEvent, spacecraft);
            }
         } else {
            if (events.addEvent(m_newEvent, respPtrs, spacecraft)) {
               m_numEvents++;
            }
         }

    ...

   } // while (!done())
}

This is essentially only checking if the name contains "TimeTick", because in that case it has to generate a new piece of simulated FT2 file. In my case I am using a real FT2 file, so the check is false. Now, as a quick-and-dirty fix, just by substituting that line with:

  while (!done()) {

         ...
         
         //std::string name = m_newEvent->fullTitle();
         
         std::string name = "just a source";
         
         if (name.find("TimeTick") != std::string::npos) {
            if (!m_usePointingHistory) {
               scData.addScData(m_newEvent, spacecraft);
            }
         } else {
            if (events.addEvent(m_newEvent, respPtrs, spacecraft)) {
               m_numEvents++;
            }
         }

    ...

   } // while (!done())

I reduce the execution time to ~1/3 of what it used to be:

BoostExecution time (s)Old execution time (s)
107.63620.802
5028.37592.315
10053.868197.873

An easy fix would be to cache the name of the source in the FluxSource class, so that it is only computed once.

  • No labels

8 Comments

  1. That implementation in FluxSource is awful and surprisingly convoluted.  Of course, the fix is not to modify anything in the observationSim package, but to put a sensible implementation of FluxSource::fullTitle() in the flux package along the lines you suggest.

  2. BTW, this sort of thing should be filed in a JIRA issue rather than as a confluence page.

  3. hi, I agree that the fullTitle in SimpleSpectrum should be constructed once and for all, but it was not meant to be used in an event loop, so I disagree that observationSim/Simulator.cxx does not need modification. The following one does away with the problem and is a much more natural way to use the Spectrum interface :

     

    diff -r1.64 Simulator.cxx
    234,235c234,235
    <          std::string name = m_newEvent->fullTitle();
    <          if (name.find("TimeTick") != std::string::npos) {
    ---
    >          std::string name = m_newEvent->particleName();
    >          if (name.compare("TimeTick") == 0) {

     m_newEvent->particleName() calls for each new event the corresponding spectrum object particle name, which for the TimeCandle class is TimeTick. I quickly check that the events are the same in output, and indeed I gain 30% increase of speed

    real    0m2.065s

    user    0m1.986s

    sys    0m0.076s

     against

    real    0m3.403s

    user    0m3.298s

    sys    0m0.100s

    If you agree I can commit it and update the JIRA.

  4. Giacomo sees a factor of three speed up, not 30%.  Given all of the stuff in the FluxSource interface I don't know how anyone can claim that there is a "natural" way to use it.  In any case, that fullTitle() abomination should be fixed or removed from the public interface.  Don't worry about this.  I will fix it.  That's why I assigned the JIRA to myself.

  5. Btw, future comments should be made on the JIRA issue, not here.

  6. I have uploaded the FT2 file as attachment to this page, as requested by Johan.

    I believe that the reason why you see only 30% improvement is that you are running a very small simulation. If you use a high flux for the point source, so that there are many events to simulate, you will see a much higher gain.

    There is a startup cost (loading the IRFS and so on) which is not negligible if you run for 3 seconds.