Creating a used_set given a run number, jobset, and process_name
Introduction
A used_set is a way of attaching one or more calibrations to a given CDF run number.
Suppose that for your subdetector (say, the SVX) you have somehow calculated 100 rows
of calibrations to be put into the SiChipPed table. (This isn't really how it happens,
but it suffices for the example.) Now you want to run your offline physics
analysis job and access these calibrations. But which runs are these calibrations
valid for? In your analysis job you will iterate over runs and events. For
a given run, the used_set for that run defines the list of calibrations valid
for that run. (Note that getting access to calibrations offline is a little
more complicated than this, involving the PASSES table etc.)
In the database, the concept of a used_set is basically realised by attaching
a run_number to a valid_set. A valid_set in turn is a set of "valid" calibrations.
It consists of a list of "CID" numbers where each CID refers to N rows in
a given table.
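To make these relationships concrete, here is a minimal C++17 sketch of the data model just described. The names (CalibRef, ValidSet, UsedSets, jobsetForRun) are hypothetical and are not the CalibDB classes; the sketch only assumes that a used_set can be looked up by (process_name, run) to give a jobset, and that a jobset lists CIDs in calibration tables.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model; real CalibDB classes and table layouts differ.
struct CalibRef {
    std::string table;  // calibration table, e.g. "SiChipPed"
    long cid;           // CID: identifies a group of N rows in that table
};

// A valid_set: a named list of calibrations, identified by its jobset.
struct ValidSet {
    std::string processName;      // e.g. "PROD_PHYSICS_CDF"
    long jobset;                  // identifier of this valid_set
    std::vector<CalibRef> calibs; // the "valid" calibrations
};

// A used_set attaches a run number to a valid_set. Here it is keyed on
// (process_name, run), mirroring the PROC_NAME/RUN fields DBUsedSet_multi prints,
// and maps to the jobset of the attached valid_set.
using UsedSetKey = std::pair<std::string, long>;
using UsedSets = std::map<UsedSetKey, long>;

// Returns the jobset whose calibrations are valid for `run`, if there is one.
std::optional<long> jobsetForRun(const UsedSets& used,
                                 const std::string& processName, long run) {
    auto it = used.find({processName, run});
    if (it == used.end()) return std::nullopt;
    return it->second;
}
```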
Required inputs
The inputs you require to make the used_set are as follows:
- run number. This is the CDF run number for which your calibrations
are going to be valid.
- jobset. This is the identifier for your calibrations. It must already
exist in the VALID_SETS table. A jobset consists of a list of CIDs, where
each CID refers to N rows in a given calibration table.
- process_name. This is the "name" of the valid_set defined by the jobset
you are giving. A valid_set needs a name because we often require
the "latest" valid_set with a given name. For example, the "latest" L3_PHYSICS_CAL
valid_set is always used for Level 3, because each time the calorimetry calibrations
change, Level 3 needs to keep up.
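The idea of picking the "latest" valid_set for a given name can be sketched as below. How CalibDB actually orders valid_sets is not described on this page, so this sketch assumes, purely for illustration, that a larger jobset number means a more recent valid_set. ValidSetRow and latestJobset are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical row of VALID_SETS: only process_name and jobset are modelled.
struct ValidSetRow {
    std::string processName;
    long jobset;
};

// Returns the jobset of the "latest" valid_set with this process_name.
// ASSUMPTION: "latest" is taken to mean the largest jobset number; the real
// database may order valid_sets differently (e.g. by creation time).
std::optional<long> latestJobset(const std::vector<ValidSetRow>& rows,
                                 const std::string& processName) {
    std::optional<long> best;
    for (const auto& row : rows) {
        if (row.processName != processName) continue;
        if (!best || row.jobset > *best) best = row.jobset;
    }
    return best;  // empty if no valid_set has this name
}
```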
The tool
There are actually several tools to make a used_set. You can use CalibDB/bin/DBSetMerge.cc;
however, as its name might suggest, this does more than create a used_set:
it also merges valid_sets. If all you want to do is create a used_set (or
many used_sets) given a run (or runs), a single jobset and a single
process_name, then it is recommended that you use:
CalibDB/bin/DBUsedSet_multi.cc
This is a source file in CalibDB, a package of the CDF Run II software.
To use it, you must check out the package and build
the binary executable. Do that like this:
source ~cdfsoft/cdf2.cshrc
setup cdfsoft2 development
newrel -t development calibdb_rel
cd calibdb_rel
addpkg -h CalibDB
gmake CalibDB.bin
This makes the binaries for a number of calibration utilities which will
then be in the directory:
calibdb_rel/bin/LinuxBlahBlah/
where LinuxBlahBlah is your architecture directory (there should be only one
in the bin directory, probably something like Linux2-GCC_3_3).
You can get immediate help on using DBUsedSet_multi by executing
the binary with no arguments:
./calibdb_rel/bin/LinuxBlahBlah/DBUsedSet_multi
This gives the following help lines:
Usage: ./bin/Linux2-GCC_3_3/DBUsedSet_multi dbtype dbconnect data_run_file pname jobset testmode
Where:
* dbtype is the type of database (example OTL/Msql/Text)
* dbconnect is the connect string (example myname/mypassword@cdfonprd)
* data_run_file is a file with a list of runs to make used_set entries for
* pname is a process_name (example (PROD_PHYSICS_CDF)
* jobset is the jobset number identifying the calibrations to put into the used_set entries
* testmode = TESTMODE sets the script in testmode. So it tells what it will do,but doesn't actually do it. testmode = REAL (or anything else) actually executes.
Rules:
MUST have 6 arguments (which is probably why you're seeing this message
The format of the data_run_file for used sets is:
lowrun1 highrun1
lowrun2 highrun2
etc...
Note1 : No white space except the space between lowrun and highrun
Note2 : data_run_file is assumed to a filename of a file in the current directory
Note3 : In order to get the last line of the file, put a hard return at the end of it
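The run file format can be illustrated with a small C++ sketch that expands each inclusive lowrun/highrun pair into individual runs, in the same way as the "List of runs to be added" in the example output later on this page. This is not the parser in DBUsedSet_multi.cc, and expandRunRanges and expandRunFile are hypothetical names. One deliberate difference: std::getline also returns a final line with no trailing newline, whereas Note3 suggests the real tool needs that hard return.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Expands "lowrun highrun" lines into the full, inclusive list of runs.
std::vector<long> expandRunRanges(std::istream& in) {
    std::vector<long> runs;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate blank lines
        std::istringstream fields(line);
        long low = 0, high = 0;
        if (!(fields >> low >> high) || low > high)
            throw std::runtime_error("bad run range: " + line);
        for (long run = low; run <= high; ++run) runs.push_back(run);  // inclusive
    }
    return runs;
}

// Reads the data_run_file by name, relative to the current directory (Note2).
std::vector<long> expandRunFile(const std::string& filename) {
    std::ifstream in(filename);
    if (!in) throw std::runtime_error("cannot open " + filename);
    return expandRunRanges(in);
}
```

With the two ranges used in the example output (155150 155155 and 166144 166148), this gives the same 11 runs that the tool lists.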
So you have to run DBUsedSet_multi with the 6 arguments specified. Normally,
the first argument will be OTL (Oracle Template Library). You need
calibration write access to create used_sets at all; your credentials
go in the dbconnect string, which is the second argument. Replace
myname with your database username and mypassword with your password. Usually
you will be writing used_sets to cdfonprd (the production database). The other
databases (int and dev) are not intended for physics-quality information.
The tool expects to find the data_run_file in the directory where you run
it. So if you run it as ./bin/Linux2-GCC_3_3/DBUsedSet_multi etc., then you
are running from the top level of your local release, and the data_run_file
should be there. Read the rules and the three notes at the end of the usage
message for how to format this file.
The pname and jobset must be consistent; that is, there must be a
row in the VALID_SETS table with this process_name and jobset.
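Before running, it is worth confirming that the pair really exists. The sketch below models VALID_SETS as in-memory rows and checks for one matching both values. ValidSetRow and isCoherent are hypothetical names, and the real table's column names are not given on this page, so in practice you would express the same check as a query against the database.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for rows of the VALID_SETS table.
struct ValidSetRow {
    std::string processName;
    long jobset;
};

// True if some VALID_SETS row has exactly this process_name AND this jobset,
// i.e. the pname/jobset pair passed to DBUsedSet_multi is consistent.
bool isCoherent(const std::vector<ValidSetRow>& validSets,
                const std::string& processName, long jobset) {
    return std::any_of(validSets.begin(), validSets.end(),
                       [&](const ValidSetRow& row) {
                           return row.processName == processName &&
                                  row.jobset == jobset;
                       });
}
```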
Lastly, you are advised to run the tool in test mode first to make sure
that you don't do "nasty things" to the database that will be a hassle to clean
up! Put TESTMODE as the last argument to do this.
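The argument handling described in the usage message can be sketched as follows. UsedSetArgs and parseArgs are hypothetical names, not taken from DBUsedSet_multi.cc. One consequence of the rule "REAL (or anything else) actually executes" is worth spelling out: if the comparison is exact, as it is in this sketch, a misspelled last argument such as testmode runs for real rather than as a dry run.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the six arguments, in the order of the usage message.
struct UsedSetArgs {
    std::string dbType;       // e.g. OTL
    std::string dbConnect;    // e.g. myname/mypassword@cdfonprd
    std::string runFile;      // data_run_file in the current directory
    std::string processName;  // e.g. PROD_PHYSICS_CDF
    long jobset = 0;          // e.g. 32083
    bool testMode = false;    // true only for the exact string TESTMODE
};

UsedSetArgs parseArgs(const std::vector<std::string>& args) {
    if (args.size() != 6)
        throw std::invalid_argument("MUST have 6 arguments");
    UsedSetArgs a;
    a.dbType = args[0];
    a.dbConnect = args[1];
    a.runFile = args[2];
    a.processName = args[3];
    a.jobset = std::stol(args[4]);  // throws if the jobset is not a number
    // Dry run only for the literal TESTMODE; anything else really writes.
    a.testMode = (args[5] == "TESTMODE");
    return a;
}
```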
Example output
Here is some example output, running in test mode with the following data_run_file (called runfile):
155150 155155
166144 166148
With the command line:
./bin/Linux2-GCC_3_3/DBUsedSet_multi OTL username/password@cdfonint runfile PROD_PHYSICS_CDF 32083 TESTMODE
Gives:
Reading runfile:
low_run: 155150 hig_run: 155155
low_run: 166144 hig_run: 166148
List of runs to be added:
155150
155151
155152
155153
155154
155155
166144
166145
166146
166147
166148
jobset: 32083
DBType=OTL
DBConnect=username/password@cdfonint
runfilename= runfile
process_name= PROD_PHYSICS_CDF
jobset= 32083
testmode= 1
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155150
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155151
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155152
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155153
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155154
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155155
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166144
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166145
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166146
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166147
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166148
Then running the real thing with the command line:
./bin/Linux2-GCC_3_3/DBUsedSet_multi OTL username/password@cdfonint runfile PROD_PHYSICS_CDF 32083 REAL
Gives:
Reading runfile:
low_run: 155150 hig_run: 155155
low_run: 166144 hig_run: 166148
List of runs to be added:
155150
155151
155152
155153
155154
155155
166144
166145
166146
166147
166148
jobset: 32083
DBType=OTL
DBConnect=username/password@cdfonint
runfilename= runfile
process_name= PROD_PHYSICS_CDF
jobset= 32083
testmode= 0
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155150
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155151
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155151
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155152
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155152
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155153
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155153
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155154
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155154
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155154
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155155
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155155
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155155
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166144
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166144
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166144
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166145
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166145
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166145
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166146
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166146
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166147
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166147
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166148
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166148
The "USED_SETS already has entry for" warnings are given because entries for some of
these runs (155150, 155154, 155155, 166144 and 166145) already existed for
PROD_PHYSICS_CDF in the integration database (cdfonint).
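Note that the TESTMODE run above did not report which runs already had entries. Those warnings only appeared in the REAL run, and the tool then went on to create the used_sets for those runs anyway. If you want to know beforehand which runs would be affected, a check along these lines could help. It is a hypothetical C++ sketch (runsAlreadyUsed is not part of CalibDB), and it assumes the existing (process_name, run) pairs have already been fetched from USED_SETS by some other means.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Given the (process_name, run) pairs already present in USED_SETS, returns
// the runs in `runs` that already have an entry for `processName`.
std::vector<long> runsAlreadyUsed(
        const std::set<std::pair<std::string, long>>& existing,
        const std::string& processName, const std::vector<long>& runs) {
    std::vector<long> hits;
    for (long run : runs)
        if (existing.count({processName, run})) hits.push_back(run);
    return hits;
}
```

Fed the 11 runs from the example and the five pre-existing PROD_PHYSICS_CDF entries, it returns exactly the runs the REAL output warned about.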
For any further questions, please email me at msmartin@fnal.gov.