Profiling one of my programs, I found some surprising things in gtobssim.
I am using ScienceTools-LATEST-1-4066, build x86_64-64bit-gcc47-Optimized.
Execution time
I am simulating a single point source, with this xml file:
<source_library title="Library"> <source flux="0.667283950775" name="bn131231198"> <spectrum escale="MeV"> <particle name="gamma"> <power_law emax="200000.0" emin="15.0" gamma="2.0" /> </particle> <celestial_dir dec="-2.42" ra="10.11" /> </spectrum> </source> </source_library>
This is the command line I am using:
gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786
These are the execution times with the flux as reported above, and then "boost" times that value:
Boost | Execution time (s) |
---|---|
1 | 4.438 |
2 | 6.176 |
10 | 20.802 |
50 | 92.315 |
100 | 197.873 |
This shows that the algorithm is essentially linearly dependent on the number of events (the inflection between 50 and 100 might be due to memory management, or just noise):
Profiling
I used this command to profile the execution:
valgrind --tool=callgrind gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786
and then the tool kcachegrind to produce this plot:
I am puzzled by the fact that 50 % of the time is spent in the FluxSource::fullTitle() function, which afaik should only return the full name of the source. The relevant piece of code which calls fullTitle is in observationSim/src/Simulator.cxx:
void Simulator::makeEvents(EventContainer &events, ScDataContainer &scData, std::vector<irfInterface::Irfs *> &respPtrs, Spacecraft *spacecraft, bool useSimTime) { m_useSimTime = useSimTime; m_elapsedTime = 0.; // Loop over event generation steps until done. while (!done()) { ... std::string name = m_newEvent->fullTitle(); if (name.find("TimeTick") != std::string::npos) { if (!m_usePointingHistory) { scData.addScData(m_newEvent, spacecraft); } } else { if (events.addEvent(m_newEvent, respPtrs, spacecraft)) { m_numEvents++; } } ... } // while (!done()) }
This is essentially only checking if the name contains "TimeTick", because in that case it has to generate a new piece of simulated FT2 file. In my case I am using a real FT2 file, so the check is false. Now, as a quick-and-dirty fix, just by substituting that line with:
while (!done()) { ... //std::string name = m_newEvent->fullTitle(); std::string name = "just a source"; if (name.find("TimeTick") != std::string::npos) { if (!m_usePointingHistory) { scData.addScData(m_newEvent, spacecraft); } } else { if (events.addEvent(m_newEvent, respPtrs, spacecraft)) { m_numEvents++; } } ... } // while (!done())
I reduce the execution time to ~1/3 of what it used to be:
Boost | Execution time (s) | Old execution time (s) |
---|---|---|
10 | 7.636 | 20.802 |
50 | 28.375 | 92.315 |
100 | 53.868 | 197.873 |
An easy fix would be to cache the name of the source in the FluxSource class, so that it is only computed once.
8 Comments
James Chiang
That implementation in FluxSource is awful and surprisingly convoluted. Of course, the fix is not to modify anything in the observationSim package, but to put a sensible implementation of FluxSource::fullTitle() in the flux package along the lines you suggest.
James Chiang
BTW, this sort of thing should be filed in a JIRA issue rather than as a confluence page.
James Chiang
Filed as OBS-23@JIRA
Giacomo Vianello
thank you!
Johann Cohen-Tanugi
hi, I agree that the fullTitle in SimpleSpectrum should be constructed once and for all, but it was not meant to be used in an event loop, so I disagree that observationSim/Simulator.cxx does not need modification. The following one does away with the problem and is a much more natural way to use the Spectrum interface :
m_newEvent->particleName() calls for each new event the corresponding spectrum object particle name, which for the TimeCandle class is TimeTick. I quickly check that the events are the same in output, and indeed I gain 30% increase of speed
real 0m2.065s
user 0m1.986s
sys 0m0.076s
against
real 0m3.403s
user 0m3.298s
sys 0m0.100s
If you agree I can commit it and update the JIRA.
James Chiang
Giacomo sees a factor of three speed up, not 30%. Given all of the stuff in the FluxSource interface I don't know how anyone can claim that there is a "natural" way to use it. In any case, that fullTitle() abomination should be fixed or removed from the public interface. Don't worry about this. I will fix it. That's why I assigned the JIRA to myself.
James Chiang
Btw, future comments should be made on the JIRA issue, not here.
Giacomo Vianello
I have uploaded the FT2 file as attachment to this page, as requested by Johan.
I believe that the reason why you see only 30% improvement is that you are running a very small simulation. If you use a high flux for the point source, so that there are many events to simulate, you will see a much higher gain.
There is a startup cost (loading the IRFS and so on) which is not negligible if you run for 3 seconds.