Michael Kuss is kindly leading the charge to check that our simulation results from the RHEL4 builds of GlastRelease are statistically equivalent to those from our RHEL3 builds.
Simulations Run
GR v17r34p0rhel4B: http://glast-ground.slac.stanford.edu/SystemTests/?releaseVersionId=11881 OutputLevel was set to Debug.
Use v17r34p0, the RHEL3 version of the systests, as the reference. Note that v17r34p0B is another RHEL3 run whose results are seemingly identical, even though OutputLevel was set to Debug. The resulting log file may be of interest.
Of particular interest has been the BackGndMixDC2 test.
Full ROOT files are available on u17/systests/GlastRelease where one uses the name of the systest to drill down into the directory structure.
System test report with more plots is here
Testing on Real Data Run 274279559 (Sept 13, 2009)
Using GR v15r47p12gr1
http://glast-ground.slac.stanford.edu/DataProcessing/run.jsp?runid=274279559
while the RHEL4 files can be found in the DEV Data Catalog. Here are the most relevant ones:
root://glast-test-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/dev/1.77/merit/r0274279559_v003_merit.root
root://glast-test-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/dev/1.77/digi/r0274279559_v001_digi.root
root://glast-test-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/dev/1.77/svac/r0274279559_v001_svac.root
A diff was performed using Luca Baldini's diff tool; the results are displayed in this PDF from Anders. Note from Anders: "We do expect CalCfpEnergy (and CTBBestEnergy) to change for quite a few events." Anders suggests: "Have a look at p108 (bottom plot), p109 (both plots), 110 (top plot)."
Calling All GR Package Owners
All GR package owners are asked to take a hard look at the warnings associated with their packages in the RHEL4 builds. Not sure which packages you own? Check the list:
https://confluence.slac.stanford.edu/display/SAS/List+of+GR+Package+Owners+as+of+September+8+2009
Update Sept 15, 2009
Some owners have taken the time to fix up warnings in their packages. So far these tags have not propagated into any version of GR. No smoking gun has been found yet, but the clean up is to our own benefit regardless.
Questions Swirl Around MCTERMZ
Looking at BackGndMixDC2, one failed plot is MCTERMZ.
However, for the other tests, this distribution is very similar for the rh9 and the rhel4 runs.
[Plots not reproduced here: Vertical Proton 1 GeV, AllGamma, Vertical Muon 1 GeV, Vertical Gamma 100 GeV]
And here are the results for dedicated CrElectronPrimary and CrElectronSplash runs for the rh9 and rhel4 builds. These are not composite flux sources. The plots that differ for the background mix runs are not identical, but are very similar as might be reasonably expected for minor differences caused by random number generators. This strongly supports the idea that it is just the balance among the composite mix that is changing in the MC generation and not anything in the actual performance of the reconstruction, etc.
Outputs available in
/afs/slac/g/glast/users/ehays/CrElectronPrimary
/afs/slac/g/glast/users/ehays/CrElectronPrimary_rhel4
/afs/slac/g/glast/users/ehays/CrElectronSplash
/afs/slac/g/glast/users/ehays/CrElectronSplash_rhel4
[Plots not reproduced here: CrElectronSplash (40k), CrElectronPrimary (40k)]
Unsigned versus Signed Ints
There are a number of warnings in our builds concerning unsigned/signed mismatches. These should be looked over and fixed.
An audit of existing JO properties is also in order to check for mismatches, as these will not be caught at compile time.
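As a reminder of why these warnings matter, here is a minimal C++ sketch of what happens when a signed value meets an unsigned one. The helper names are hypothetical and not taken from any GR package; the explicit casts only spell out the conversion that the compiler applies implicitly in a mixed comparison.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helpers (not from GlastRelease). In a mixed int/unsigned
// comparison the int operand is converted to unsigned int first; the
// explicit casts make that implicit conversion visible.
bool equalAfterConversion(int s, unsigned int u) {
    return static_cast<unsigned int>(s) == u;
}

bool lessAfterConversion(int s, unsigned int u) {
    return static_cast<unsigned int>(s) < u;
}

// Small self-check showing the surprising results.
void checkConversions() {
    assert(equalAfterConversion(-1, UINT_MAX));  // -1 becomes "all bits set"
    assert(!lessAfterConversion(-1, 1u));        // so -1 is not less than 1u
}
```

A value that arrives as text from the job options goes through a similar conversion at run time, where no compiler warning can flag it.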
One specific JO issue concerns TriggerAlg.mask:
Trigger: there is something really weird with the trigger mask:
In the jobOptions:
TriggerAlg.mask = -1;
rh9_gcc32opt logs:
TriggerAlg INFO No trigger requirement
rhel_gcc34opt logs:
TriggerAlg INFO Applying trigger mask: 589cba38
In TriggerAlg::TriggerAlg() we find:
declareProperty("mask" , m_mask=0xffffffff);
and later, in TriggerAlg::initialize():
    log << MSG::INFO;
    if(log.isActive()) {
        if (m_mask==0xffffffff) log.stream() << "No trigger requirement";
        else log.stream() << "Applying trigger mask: " << std::setbase(16) << m_mask << std::setbase(10);
        if( m_throttle) log.stream() << ", throttled by rejecting the value " << m_vetobits;
    }
    log << endreq;
mask is an unsigned int. For the moment, could we rerun both systests with the Trigger.mask=-1; line simply commented out, since all bits set is the default anyway? Note that the code should be changed in any case. rh9 interprets the value as intended, in the sense that (signed int) -1 is meant to give 0xffffffff. How rhel4 comes up with 0x589cba38 beats me.
Suggested Fix:
Make TriggerAlg.m_mask a StringProperty and convert the value to an unsigned int via the tools available in facilities::Util. This has been tried out and seems to work.
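A minimal sketch of the string-based approach, assuming the mask arrives as text from the job options. It uses only the C++ standard library rather than facilities::Util, whose interface is not shown on this page. The name parseTriggerMask is hypothetical, and the accepted range (-1 as shorthand for all bits set, otherwise 0 to 0xffffffff) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of TriggerAlg or facilities::Util.
// Converts a job-options string into a 32-bit mask without relying on an
// implicit signed-to-unsigned conversion.
std::uint32_t parseTriggerMask(const std::string& text) {
    errno = 0;
    char* end = nullptr;
    // Base 0 accepts decimal, 0x-prefixed hex, and an optional leading sign.
    const long long value = std::strtoll(text.c_str(), &end, 0);
    if (end == text.c_str() || *end != '\0') {
        throw std::invalid_argument("trigger mask is not a number: " + text);
    }
    // Assumed policy: only -1 (all bits set) and 0..0xffffffff are valid.
    if (errno == ERANGE || value < -1 || value > 0xffffffffLL) {
        throw std::out_of_range("trigger mask out of range: " + text);
    }
    // Conversion to an unsigned type is modulo 2^32, so -1 maps to 0xffffffff.
    return static_cast<std::uint32_t>(value);
}

// Small self-check with the two values that appear in the logs.
void checkTriggerMaskParsing() {
    assert(parseTriggerMask("-1") == 0xffffffffu);
    assert(parseTriggerMask("0x589cba38") == 0x589cba38u);
}
```

Parsing through a 64-bit signed intermediate keeps the range check explicit, so a malformed or out-of-range value fails loudly instead of silently producing an unexpected mask.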
Current Status:
Unfortunately, the systest results remain the same (comparing v17r34p0 versus v17r34p0B and v17r34p0rhel4 versus v17r34p0rhel4B), even after making sure that the Trigger.mask is set to the appropriate default. It seems that when the ConfigSvc is enabled, the trigger mask has no effect.
Exception in CalLikelihoodManagerTool
The rhel4 run also contains one event that threw an exception in CalLikelihoodManagerTool.
Random Sequences Diverge
Regarding the BackGndMixDC2 run and the mix of events:
"As Richard mentioned, the two random sequences will diverge eventually. Hence, the particle mix for the 40k triggers is different. The biggest discrepancy is in "1002 CrElectronSplash", with 6409 trigger for rh9_gcc32opt vs. 6603 for rhel4. Consequently, the other sources are weighted stronger in rh9, in general."
2 Comments
Anders W. Borgland
I have seen the exception in CalLikelihoodManagerTool on RHEL3 too.
Kelly (Arrighi), Heather
From Michael Kuss October 20, 2009
Conclusions after studying differences between background runs of rh9_gcc32opt and rhel_gcc34opt builds of GR:
Won't fix
This is an excerpt of the diff of two looong log files (rh9_gcc32opt_rhel3 left, rhel4_gcc34opt_rhel4 right):...
CrSpectrum::setPosition m_latitude 25.4705134181355 CrSpectrum::setPosition m_latitude 25.4705134181355
CrSpectrum::setPosition m_longitude -91.2545661814805 CrSpectrum::setPosition m_longitude -91.2545661814805
CrSpectrum::setPosition m_altitude 574.785474150153 CrSpectrum::setPosition m_altitude 574.785474150153
CrSpectrum::setPosition astro::IGRField::Model().lambda() 0.659972429275513 CrSpectrum::setPosition astro::IGRField::Model().lambda() 0.659972429275513
CrSpectrum::setPosition m_geomagneticLambda 0.659972429275513 CrSpectrum::setPosition m_geomagneticLambda 0.659972429275513
CrSpectrum::setPosition astro::IGRField::Model().R() 1.05167531967163 | CrSpectrum::setPosition astro::IGRField::Model().R() 1.05167520046234
CrSpectrum::setPosition m_geomagneticR 1.05167531967163 | CrSpectrum::setPosition m_geomagneticR 1.05167520046234
CrSpectrum::setPosition a*::Model().verticalRigidityCutoff() 5.06358528137207 | CrSpectrum::setPosition a*::Model().verticalRigidityCutoff() 5.06358623504639
CrSpectrum::setPosition m_cutOffRigidity 5.06358528137207 | CrSpectrum::setPosition m_cutOffRigidity 5.06358623504639
...
CrProton::flux particle proton CrProtonPrimary CrProton::flux particle proton CrProtonPrimary
CrProton::flux particle entering CrProtonPrimary->flux() CrProton::flux particle entering CrProtonPrimary->flux()
CrProtonPrimary::flux xxx15 m_cutOffRigidity 5.06358528137207 | CrProtonPrimary::flux xxx15 m_cutOffRigidity 5.06358623504639
CrProtonPrimary::flux xxx15 cor 5.06358528137207 | CrProtonPrimary::flux xxx15 cor 5.06358623504639
CrProtonPrimary::flux xxx15 phi 1068.90738142523 CrProtonPrimary::flux xxx15 phi 1068.90738142523
CrProtonPrimary::flux xxx15 tmp1 388.120685195923 | CrProtonPrimary::flux xxx15 tmp1 388.120604515076
CrProtonPrimary::flux xxx15 tmp2 368.821272468567 | CrProtonPrimary::flux xxx15 tmp2 368.821197795868
CrProtonPrimary::flux xxx15 energy_integral 374.821965255054 | CrProtonPrimary::flux xxx15 energy_integral 374.821888714265
CrProton::flux xxx15 (*i)>flux() 374.821965255054 | CrProton::flux xxx15 (*i)>flux() 374.821888714265
CrProton::flux xxx15 total_flux 374.821965255054 | CrProton::flux xxx15 total_flux 374.821888714265
FluxSource::flux calls m_spectrum->flux(time) FluxSource::flux calls m_spectrum->flux(time)
CrProton::flux particle proton CrProtonPrimary CrProton::flux particle proton CrProtonPrimary
CrProton::flux particle entering CrProtonPrimary->flux() CrProton::flux particle entering CrProtonPrimary->flux()
CrProtonPrimary::flux xxx15 m_cutOffRigidity 5.06358528137207 | CrProtonPrimary::flux xxx15 m_cutOffRigidity 5.06358623504639
CrProtonPrimary::flux xxx15 cor 5.06358528137207 | CrProtonPrimary::flux xxx15 cor 5.06358623504639
CrProtonPrimary::flux xxx15 phi 1068.90738142523 CrProtonPrimary::flux xxx15 phi 1068.90738142523
CrProtonPrimary::flux xxx15 tmp1 388.120685195923 | CrProtonPrimary::flux xxx15 tmp1 388.120604515076
CrProtonPrimary::flux xxx15 tmp2 368.821272468567 | CrProtonPrimary::flux xxx15 tmp2 368.821197795868
CrProtonPrimary::flux xxx15 energy_integral 374.821965255054 | CrProtonPrimary::flux xxx15 energy_integral 374.821888714265
CrProton::flux xxx15 (*i)>flux() 374.821965255054 | CrProton::flux xxx15 (*i)>flux() 374.821888714265
CrProton::flux xxx15 total_flux 374.821965255054 | CrProton::flux xxx15 total_flux 374.821888714265
FluxSource::flux xxx15 m_spectrum->flux(time) 374.821965255054 | FluxSource::flux xxx15 m_spectrum->flux(time) 374.821888714265
FluxSource::rate xxx15 flux(time) 374.821965255054 | FluxSource::rate xxx15 flux(time) 374.821888714265
FluxSource::rate xxx15 totalArea() 6 FluxSource::rate xxx15 totalArea() 6
FluxSource::calculateInterval xxx15 r 19782.6372651494 | FluxSource::calculateInterval xxx15 r 19782.6332254217
FluxSource::calculateInterval xxx15 val 3.35892711929847e-05 | FluxSource::calculateInterval xxx15 val 3.35892780521074e-05
EventSource::setInterval xxx15 input: time 3.35892711929847e-05 | EventSource::setInterval xxx15 input: time 3.35892780521074e-05
EventSource::setInterval xxx15 output: m_interval 3.35892711929847e-05 | EventSource::setInterval xxx15 output: m_interval 3.35892780521074e-05
...
To make a long story short: in EventSource::setInterval() the time to the next event is being set. The input time stems from FluxSource::calculateInterval(), which depends on FluxSource::rate(), which depends on FluxSource::flux(). The latter calls the flux() function of the source, here CrProton, which calls the flux() of the sub component, here CrProtonPrimary. This last one depends on m_cutOffRigidity, which is being set in the beginning of a run in CrSpectrum::setPosition() by a call to astro::IGRField::Model().verticalRigidityCutoff().
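The dependency chain can be illustrated with a short sketch. The exponential waiting-time formula below is an assumption (the body of FluxSource::calculateInterval() is not shown on this page) and the function names are hypothetical; the four constants are the "r" and "val" values copied from the log excerpt above. Under that assumption, for the same random number the interval scales as 1/rate, so a relative shift in the rate reappears with opposite sign in the event time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed form of the waiting time to the next event for a Poisson process
// with the given rate, driven by a uniform random number u in [0, 1).
double intervalFromRate(double rate, double u) {
    assert(rate > 0.0 && u >= 0.0 && u < 1.0);
    return -std::log(1.0 - u) / rate;
}

// |a - b| relative to the larger magnitude; 0 when both are 0.
double relativeDifference(double a, double b) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return scale == 0.0 ? 0.0 : std::fabs(a - b) / scale;
}

// "r" and "val" from FluxSource::calculateInterval in the log excerpt.
const double kRateRh9 = 19782.6372651494;
const double kRateRhel4 = 19782.6332254217;
const double kIntervalRh9 = 3.35892711929847e-05;
const double kIntervalRhel4 = 3.35892780521074e-05;
```

The logged rates differ by about 2e-7 in relative terms, while the products kRateRh9 * kIntervalRh9 and kRateRhel4 * kIntervalRhel4 agree far more closely. That is what one would expect if both builds drew the same random number at this point and only the rate differed.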
And here is the problem: the call returns slightly different values for the two builds, which eventually show up in the event time. I'm unsure where the difference comes from. It could be because all the flux classes use double, while astro primarily uses float variables. I'm very unsure whether I want to modify anything in IGRField or even in igrf_sub/igrf_sub.
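One way to probe the float hypothesis is to ask how far apart the logged values are in units of single-precision spacing. The sketch below is an illustration added here, not code from astro or the flux packages. The function names are hypothetical, the constants are copied from the log excerpt, and IEEE 754 32-bit float is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 float");

// Signed count of representable floats from 'from' to 'to'. Valid for
// positive finite values, whose bit patterns are ordered like the values.
std::int64_t ulpDistance(float from, float to) {
    assert(from > 0.0f && to > 0.0f);
    std::uint32_t a, b;
    std::memcpy(&a, &from, sizeof a);
    std::memcpy(&b, &to, sizeof b);
    return static_cast<std::int64_t>(b) - static_cast<std::int64_t>(a);
}

// How much a double changes when squeezed through a float and back.
double floatRoundTripError(double x) {
    return std::fabs(static_cast<double>(static_cast<float>(x)) - x);
}

// Values from CrSpectrum::setPosition in the log excerpt.
const double kRRh9 = 1.05167531967163;
const double kRRhel4 = 1.05167520046234;
const double kCutoffRh9 = 5.06358528137207;
const double kCutoffRhel4 = 5.06358623504639;
const double kLatitude = 25.4705134181355;
```

With the logged numbers, ulpDistance gives -1 for R and +2 for the vertical rigidity cutoff (going from rh9 to rhel4). Both quantities survive a round trip through float to within the printed precision, whereas m_latitude does not. That is consistent with the suspicion that these quantities pass through float variables, where a last-bit difference between compilers is already a relative change of about 1e-7. It does not identify where in IGRField the difference arises.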
Event with different G4 behavior. The initially small spatial position differences add up, and different active volumes are hit. There is also an effect because of different step lengths.
PosDetectorManager::ProcessHits energy 0.171759 | PosDetectorManager::ProcessHits energy 0.0424398
PosDetectorManager::ProcessHits hit base class McPositionHit : PosDetectorManager::ProcessHits hit base class McPositionHit :
Volume ID = /0/1/1/1/1/0/0/0/1 Volume ID = /0/1/1/1/1/0/0/0/1
Entry point (x, y, z) = ( -42.7608 , 6.49615 , 0.0653712 ) | Entry point (x, y, z) = ( -43.9036 , 5.90944 , -0.132663 )
Deposited Energy = 0.171759 | Deposited Energy = 0.0424398
Particle Energy = 0 Particle Energy = 0
Time of flight = 0 Time of flight = 0
Exit point (x, y, z) = ( -42.772 , 6.4347 , 0.0464215 ) | Exit point (x, y, z) = ( -43.9001 , 5.90916 , -0.13473 )
McParticle = 0 McParticle = 0
ancestor McParticle = 0 ancestor McParticle = 0
PosDetectorManager::ProcessHits aStep->GetStepLength() 0.00235144 | PosDetectorManager::ProcessHits aStep->GetStepLength() 0.163489
PosDetectorManager::ProcessHits energy 0.0131892 | PosDetectorManager::ProcessHits energy 0.0960532
PosDetectorManager::ProcessHits hit base class McPositionHit : PosDetectorManager::ProcessHits hit base class McPositionHit :
Volume ID = /0/1/1/1/1/0/0/0/1 Volume ID = /0/1/1/1/1/0/0/0/1
Entry point (x, y, z) = ( -42.772 , 6.4347 , 0.0464215 ) | Entry point (x, y, z) = ( -42.7622 , 6.49562 , 0.0653716 )
Deposited Energy = 0.0131892 | Deposited Energy = 0.0960532
Particle Energy = 0 <
Time of flight = 0 <
Exit point (x, y, z) = ( -42.7717 , 6.43455 , 0.0460847 ) <
McParticle = 0 <
ancestor McParticle = 0 <
PosDetectorManager::ProcessHits aStep->GetStepLength() 0.356458 <
PosDetectorManager::ProcessHits energy 0.200804 <
PosDetectorManager::ProcessHits hit base class McPositionHit : <
Volume ID = /0/1/1/1/2/1/0/2/1 <
Entry point (x, y, z) = ( -19.718 , -10.7067 , -0.185961 ) <
Deposited Energy = 0.200804 <
Particle Energy = 0 Particle Energy = 0
Time of flight = 0 Time of flight = 0
Exit point (x, y, z) = ( -19.7685 , -10.6031 , 0.088998 ) | Exit point (x, y, z) = ( -42.7638 , 6.41965 , 0.105701 )
McParticle = 0 McParticle = 0
ancestor McParticle = 0 ancestor McParticle = 0
PosDetectorManager::ProcessHits aStep->GetStepLength() 0.247625 | PosDetectorManager::ProcessHits aStep->GetStepLength() 0.0628186