Comparison of performance of silicon clustering between
single muon events and b-bbar events.
On average a single muon event took 0.014 seconds and a b-bbar event 0.135 seconds.
The table below shows how the time in the event step of the SiClusteringModule was spend
for both types of events. Only the 5 most prominent functions and the entry for operator
new is shown in the table. Notice that the percentage entry for new is not correct.
The entries for Incl.. CPU state how much time was spend in the function and all functions
called from it in total. The Excl. CPU column shows how much time was spend in the
function alone, excluding all of the functions it has called. The calls column indicates
how often during the program the function was called.
----------------------------------------------------------------------
Incl. Excl. Calls
Attrib. CPU CPU
% (secs) (secs)
1000 single muons:
-----------------
Immediate Parents:
100.0% 14.019 0.010 1000 APPSequence::event(TRY_Abstract_Record*)
Selected function:
0.2% 14.009 0.030 1000 SiClusteringModule::event(TRY_Abstract_Record*)
Immediate Children:
61.4% 8.595 0.050 1000 TRYSiClusterData::writePersistent(void)
16.7% 2.333 1.080 1000 TRYRun2SiStripSet::_readSIXDBank(void)
12.3% 1.722 1.170 1000 TRYRun2SiStripSet::_readISLDBank(void)
6.6% 0.924 0.020 1000 SiClusterFinder::findClusters(void)
1.3% 0.184 0.000 1000 TRYSiClusterData::~TRYSiClusterData(void)
0.0% 0.536 0.030 312683 operator new(unsigned int)
10 b-bbar:
----------
46.3% 0.627 0.000 10 SiClusterFinder::findClusters(void)
18.4% 0.249 0.020 10 TRYSiClusterData::writePersistent(void)
18.3% 0.248 0.010 10 TRYRun2SiStripSet::_readSIXDBank(void)
10.1% 0.137 0.000 10 TRYSiClusterData::~TRYSiClusterData(void)
3.4% 0.047 0.010 10 TRYRun2SiStripSet::_readISLDBank(void)
0.0% 0.383 0.050 182745 operator new(unsigned int)
For the single muons the by far dominant task is the persistent write of the cluster data.
The persistent write is less than one third of the percentage in the b-bbar events, but is
still the second largest time consumer. This indicates that the persistent write has a large
"once only" overhead. For b-bbar the write persistent and "read DBanks" functions are
comparable in time consumption and make up approximately have of the total CPU time. In the
muon events the total time spent on persistent I/O is approximately 90% of the time.
It is also apparent that the readISLDBank/ReadCVXDBank ratio is close to 1 in the muon case
indicating that these functions have similar overhead. However for the b-bbar events the
ISLDBank seems to read much fast than the SVXD bank indicating that it contains either less
data or is more efficiently read.
The firsts level breakdown shows that in semi "realistic" events like b-bbar the cluster
finding is dominant and performance increase is most likely to be gained there.
Caveat:
Note that these events do not have an underlying event nor do they have any noise or
realistic hit generation simulations in them. It is therefore to be expected that
"Real" events will have significantly more hits in them and increase the time spend in
both the read DBank and findCluster functions.
In order to be able to compare the muon and b-bar events more directly with each other
the following table shows the inclusive and exclusive CPU times spend in each function
divided by the number of events in milli seconds for both types of events.
single-mu | b-bbar | Function Name
Incl. Excl. | Incl. Excl.|
------------------------------------------------------------
8.595 0.050 | 24.9 2.0 | TRYSiClusterData::writePersistent(void)
2.333 1.080 | 24.8 1.0 | TRYRun2SiStripSet::_readSIXDBank(void)
1.722 1.170 | 4.7 1.0 | TRYRun2SiStripSet::_readISLDBank(void)
0.924 0.020 | 62.7 0.0 | SiClusterFinder::findClusters(void)
0.184 0.000 | 13.7 0.0 | TRYSiClusterData::~TRYSiClusterData(void)
0.536 0.030 | 38.3 5.0 | operator new(unsigned int)
Most noticeable in this table is the increase in the findClusters function. The
findCluster function spends it time mainly in SiClusterFinder::_ClusterFindingLoop
(70% for muons, 81% for b-bbar) and in SiClusterList::addDetector (21% mu, 16% b-bbar).
The SiClusterFinder::_ClusterFindingLoop is in both cases dominated by
SiClusterFinder::_generateClusters(const SiDigiCode&) (78% mu, 72% b-bbar)
Inside the _generateClusters function the muon and b-bbar sample start to deviate as
can be seen in the table below.
--------------------------------------------------------------------
Incl. Excl. Calls
Attrib. CPU CPU
% (secs) (secs)
1000 single muons:
-----------------
Immediate Parents:
100.0% 0.653 0.040 12470 SiClusterFinder::_clusterFindingLoop(const SiDigiCode&)
Selected function:
11.7% 0.511 0.060 12474 SiClusterFinder::_generateClusters(const SiDigiCode&)
Immediate Children:
27.8% 0.142 0.000 9590 SiLayerFinder::newHit(SiCluster*,const SvxIslWafer*)
14.4% 0.074 0.000 2884 SiZLayerFinder::newHit(SiCluster*,const SvxIslWafer*)
11.7% 0.536 0.030 312683 operator new(unsigned int)
7.9% 0.041 0.000 9590 SiLayerFinder::newCluster(void)
7.2% 0.037 0.000 12474 __insert_multi__Q2_5__kai72...(Abbreviated)
5.9% 0.030 0.030 2884 SiZLayerFinder::findClusters(void)
5.9% 0.030 0.010 12474 SiLayerFinder::reset(void)
2.4% 0.012 0.000 2884 SiZLayerFinder::newCluster(void)
2.0% 0.010 0.010 9590 SiLayerFinder::findClusters(void)
1.2% 0.020 0.020 72778 SiStrip::SiStrip(const SiStrip&)
1.1% 0.060 0.060 138255 __search__Q2_5__kai72...(Abbreviated)
0.7% 0.010 0.010 37388 __upper_bound__Q2_5__kai72...(Abbreviated)
0.2% 0.010 0.010 136151 SiDigiCode::SiDigiCode(const SiDigiCode&)
0.2% 0.010 0.010 160768 SiDigiCode::~SiDigiCode(void)
0.0% 0.010 0.010 112758 __kai::rb_tree_base::...(Abbreviated)
10 b-bbar:
----------
22.9% 0.083 0.000 7372 SiLayerFinder::newHit(SiCluster*,const SvxIslWafer*)
21.4% 0.383 0.050 182745 operator new(unsigned int)
12.7% 0.046 0.000 7372 SiLayerFinder::newCluster(void)
8.0% 0.029 0.000 9607 __insert_multi__Q2_5__kai72...(Abbreviated)
6.9% 0.025 0.000 2235 SiZLayerFinder::newHit(SiCluster*,const SvxIslWafer*)
6.5% 0.050 0.050 20395 __upper_bound__Q2_5__kai72...(Abbreviated)
3.8% 0.014 0.000 2235 SiZLayerFinder::newCluster(void)
3.3% 0.030 0.030 69014 SiStrip::SiStrip(const SiStrip&)
2.8% 0.010 0.010 7351 SiLayerFinder::findClusters(void)
2.3% 0.090 0.090 103218 __search__Q2_5__kai72...(Abbreviated)
0.8% 0.030 0.030 80208 __kai::rb_tree_base::...(Abbreviated)
0.4% 0.010 0.010 65559 SiDigiCode::~SiDigiCode(void)
0.0% 0.000 0.000 2224 SiZLayerFinder::findClusters(void)
Note: The strange looking names ending in ....(abbreviated) are templated functions
which are not correctly De-mangled by cvd.
It is difficult to interpret the above table. It seems that, as expected from the number
of Z- and phi-Layers, the fraction of time spent in the making of ZLayer Clusters and Hits
is always smaller than that of the Phi Layers. For the muons however the Z-Hits and
Z-Clusters seem to take slightly more time than expected.
Footnote:
Hits are, as far as I understand the space points made from the clusters as is indicated
by the fact that the newHit and newCluster functions are equally often called.
#ZHits or ZClusters #Z-Layers
---------------------- = 0.300 (for muons) 0.303 (for b-bbar) =approx. ----------
#PhiHits or PhiClusters #Phi-Layers
%time(ZHits and ZClusters) 16.8 10.7
------------------------------ = ------=0.47(muons) ------=0.300(b-bbar)
%time(PhiHits and PhiClusters) 35.7 35.6
However the time spend in operator new is taking a very large fraction of the total time.
This is in particular so for the b-bbar events. This may be a sign that the creation
of individual objects for Clusters and Hits may be rather expensive and these objects
could be merged, saving the creation of the Hits but possibly requiring a larger Object
like a cluster with additional space point information being used instead.
++++++++
Summary:
++++++++
In total the clustering of a bbar event takes 0.135 seconds which is 14% of the estimated
time available for an event to pass the complete level3 trigger. This does seem to be
a lot given the many other task the level3 trigger has to perform.
46% of the time (for b-bbar events) is spend in the actual cluster finding and
Hit making whereas 22% fall to the reading of the DBanks (SVXD=18.3 ISLD=3.4) and
18.4% to the persistent write of the results banks (clusters and hits).
During the cluster finding a significant fraction of the time is spent in operator new,
creating Hit and Cluster Objects. Maybe a merge of these structures should be considered.
New as called in the cluster finding only takes 5% of the total time of the
SiClusteringModule. In total new takes 28.3%, so nearly one third of the modules time.
The corresponding time spend in delete or automatic delete-like functions is not easy
to determine but can be nearly as big. This may indicate that the module is spending
over 50% of its time in memory management functions.
Regional unpacking of the DBanks may be a solution to this problem as it will limit the
number of clusters and hits to be created. Even a rather complicated access mechanism
may still prove profitable for this task.