Offline analysis hints
Some hints on how to plan for CDF Run 2 analysis at home.
This is not supposed to be anything special, it is just what I found
useful to think about when making a plan for doing analysis
in Italy. I am not trying to sell this to you if you feel it
does not help. Some numbers (in the same spirit) are in
Stefano's rules of thumb
- define the size of the group: how many people will be
working at home, on how many different topics
- define the size of the data you will need. High-pt or Low-pt
can make a difference.
- what environment will you have/ can you have / do you want to
have at home ? Only Root/Paw ? Offline ? Frozen ? Tapes ? Robots ?
Disk servers ? Clusters ?
- in deciding how much hardware you need, you will need to figure
out how many passes of a given analysis tool you will make, how long
each takes on one computer, decide how fast you want to do it, and
this will tell you how many computers you need. It is a hard guess,
but a wild one will still give you a feeling for orders of
magnitude.
- if your data volume at home exceeds a few TB you will need
a tape robot to be competitive with Fermilab.
- define which data sets you will need at home, how/where you
will create them (copy PADs from tape at FCC ? Strip PADs to
files at FCC ? Make ntuples ? ) and how many times you will have
to do it. How many resources will this take ?
- then worry about getting that data
- look at network connectivity to your institution, and especially
at realistic foreseeable increases in the next 2-4 years
- remember that tape will cost 20 times more than last Run
(still 1$/GB, but 20x the data !), do you plan on buying new tapes
every time a new reconstruction is run on your data ?
- buying hardware is easy, operating it takes manpower. Unless you are
overseas you are likely to be able to telnet to FCC and get the
same speed as being at Fermilab.
- you may have computing resources now, that are likely to be
obsolete by the time we really have 2 fb^-1 of data.
- do not only look at the final sample (my usual mistake), also
think of the usage pattern you will follow as data is being accumulated
- in the end people's time is more scarce than computers
- not enough ? Worry about Run 2b.
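
To make the "wild guess" about hardware and tapes concrete, here is a minimal
sketch in Python. The function names, the efficiency factor and all the input
numbers are my own illustrative assumptions, not CDF figures; only the 1$/GB
tape price is taken from the list above. Plug in your own guesses.

```python
import math

def computers_needed(n_events, sec_per_event, n_passes, days_allowed,
                     efficiency=0.7):
    """Rough number of computers needed to finish all passes in time.

    efficiency is an assumed fraction of wall-clock time that is
    actually spent crunching events (the rest is downtime, I/O, etc.).
    """
    cpu_seconds = n_events * sec_per_event * n_passes
    usable_seconds_per_computer = days_allowed * 86400 * efficiency
    return math.ceil(cpu_seconds / usable_seconds_per_computer)

def tape_cost_dollars(data_gb, dollars_per_gb=1.0, n_reconstructions=1):
    """Tape cost if a fresh set of tapes is bought for every reconstruction."""
    return data_gb * dollars_per_gb * n_reconstructions

# Hypothetical inputs: 1e8 events, 0.5 s/event, 5 passes, done in 30 days.
print(computers_needed(1e8, 0.5, 5, 30))
# Hypothetical 5 TB sample at 1$/GB, rewritten for 3 reconstructions.
print(tape_cost_dollars(5000, 1.0, 3))
```

Even if every input is off by a factor of two, the result still tells you
whether you are talking about a handful of computers or a hundred.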
Stefano Belforte
Last modified: Thu Jun 21 20:49:41 CDT 2001