Profiling one of my programs, I found some surprising things in gtobssim.
I am using ScienceTools-LATEST-1-4066, build x86_64-64bit-gcc47-Optimized.
Execution time
I am simulating a single point source, with this xml file:
<source_library title="Library"> <source flux="0.667283950775" name="bn131231198"> <spectrum escale="MeV"> <particle name="gamma"> <power_law emax="200000.0" emin="15.0" gamma="2.0" /> </particle> <celestial_dir dec="-2.42" ra="10.11" /> </spectrum> </source> </source_library>
This is the command line I am using:
gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786
These are the execution times with the flux as reported above, and then "boost" times that value:
Boost | Execution time (s) |
---|---|
1 | 4.438 |
2 | 6.176 |
10 | 20.802 |
50 | 92.315 |
100 | 197.873 |
This shows that the algorithm is essentially linearly dependent on the number of events (the inflection between 50 and 100 might be due to memory management, or just noise):
Profiling
I used this command to profile the execution:
valgrind --tool=callgrind gtobssim infile=sim_model.xml srclist=source_list.txt scfile=/home/giacomov/FermiData/bn131231198/gll_ft2_tr_bn131231198_v00.fit evroot=boost simtime=100000 ltfrac=1 tstart=410157919.083 nevents=no maxtime=3.155e8 use_ac=no emin=30 emax=100000 edisp=no irfs=P8_SOURCE_V4 maxrows=100000 seed=653786
and then the tool kcachegrind to produce this plot:
I am puzzled by the fact that 50 % of the time is spent in the FluxSource::fullTitle() function, which afaik should only return the full name of the source. The relevant piece of code which calls fullTitle is in observationSim/src/Simulator.cxx:
void Simulator::makeEvents(EventContainer &events, ScDataContainer &scData, std::vector<irfInterface::Irfs *> &respPtrs, Spacecraft *spacecraft, bool useSimTime) { m_useSimTime = useSimTime; m_elapsedTime = 0.; // Loop over event generation steps until done. while (!done()) { ... std::string name = m_newEvent->fullTitle(); if (name.find("TimeTick") != std::string::npos) { if (!m_usePointingHistory) { scData.addScData(m_newEvent, spacecraft); } } else { if (events.addEvent(m_newEvent, respPtrs, spacecraft)) { m_numEvents++; } } ... } // while (!done()) }
This is essentially only checking if the name contains "TimeTick", because in that case it has to generate a new piece of simulated FT2 file. In my case I am using a real FT2 file, so the check is false. Now, as a quick-and-dirty fix, just by substituting that line with:
while (!done()) { ... //std::string name = m_newEvent->fullTitle(); std::string name = "just a source"; if (name.find("TimeTick") != std::string::npos) { if (!m_usePointingHistory) { scData.addScData(m_newEvent, spacecraft); } } else { if (events.addEvent(m_newEvent, respPtrs, spacecraft)) { m_numEvents++; } } ... } // while (!done())
I reduce the execution time to ~1/3 of what it used to be:
Boost | Execution time (s) | Old execution time (s) |
---|---|---|
10 | 7.636 | 20.802 |
50 | 28.375 | 92.315 |
100 | 53.868 | 197.873 |
An easy fix would be to cache the name of the source in the FluxSource class, so that it is only computed once.
8 Comments
James Chiang
Dec 05, 2014That implementation in FluxSource is awful and surprisingly convoluted. Of course, the fix is not to modify anything in the observationSim package, but to put a sensible implementation of FluxSource::fullTitle() in the flux package along the lines you suggest.
James Chiang
Dec 05, 2014BTW, this sort of thing should be filed in a JIRA issue rather than as a confluence page.
James Chiang
Dec 05, 2014Filed as OBS-23@JIRA
Giacomo Vianello
Dec 05, 2014thank you!
Johann Cohen-Tanugi
Dec 06, 2014hi, I agree that the fullTitle in SimpleSpectrum should be constructed once and for all, but it was not meant to be used in an event loop, so I disagree that observationSim/Simulator.cxx does not need modification. The following one does away with the problem and is a much more natural way to use the Spectrum interface :
m_newEvent->particleName() calls for each new event the corresponding spectrum object particle name, which for the TimeCandle class is TimeTick. I quickly check that the events are the same in output, and indeed I gain 30% increase of speed
real 0m2.065s
user 0m1.986s
sys 0m0.076s
against
real 0m3.403s
user 0m3.298s
sys 0m0.100s
If you agree I can commit it and update the JIRA.
James Chiang
Dec 06, 2014Giacomo sees a factor of three speed up, not 30%. Given all of the stuff in the FluxSource interface I don't know how anyone can claim that there is a "natural" way to use it. In any case, that fullTitle() abomination should be fixed or removed from the public interface. Don't worry about this. I will fix it. That's why I assigned the JIRA to myself.
James Chiang
Dec 06, 2014Btw, future comments should be made on the JIRA issue, not here.
Giacomo Vianello
Dec 06, 2014I have uploaded the FT2 file as attachment to this page, as requested by Johan.
I believe that the reason why you see only 30% improvement is that you are running a very small simulation. If you use a high flux for the point source, so that there are many events to simulate, you will see a much higher gain.
There is a startup cost (loading the IRFS and so on) which is not negligible if you run for 3 seconds.