Creating a used_set given run number, jobset, and process_name


Introduction

A used_set is a way of attaching one or more calibrations to a given CDF run number. Suppose you have a subdetector (the SVX) and calculate 100 rows of calibrations to be put into the SiChipPed table. This doesn't really happen, but it suffices for example purposes. Now you want to run your offline physics analysis job and access these calibrations. But which runs are these calibrations valid for? In your analysis job you will iterate over runs and events. For a given run, the used_set for that run defines the list of calibrations valid for it. (Note that getting access to calibrations in the offline is a little more complicated than this, involving the PASSES table and others.)

In the database, the concept of a used_set is basically realised by attaching a run_number to a valid_set. A valid_set in turn is a set of "valid" calibrations. It consists of a list of "CID" numbers where each CID refers to N rows in a given table.
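
As a rough mental model only (not the real database schema), the relationship between runs, used_sets, valid_sets and CIDs can be sketched in Python. The class names, field names and CID values below are hypothetical and chosen purely for illustration; only the process_name, jobset and run numbers are borrowed from the example later on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ValidSet:
    """A set of "valid" calibrations, identified by process_name and jobset."""
    process_name: str
    jobset: int
    # Each CID refers to N rows in some calibration table (e.g. SiChipPed).
    cids: list = field(default_factory=list)


@dataclass
class UsedSet:
    """Attaches a run number to a valid_set."""
    run_number: int
    valid_set: ValidSet


def calibrations_for_run(used_sets, run_number, process_name):
    """Return the CIDs valid for a run, or None if no used_set exists."""
    for us in used_sets:
        if (us.run_number == run_number
                and us.valid_set.process_name == process_name):
            return us.valid_set.cids
    return None


# Made-up CIDs; one used_set per run in the range 155150..155155.
vs = ValidSet("PROD_PHYSICS_CDF", 32083, cids=[1001, 1002])
used_sets = [UsedSet(run, vs) for run in range(155150, 155156)]
```

A run with no used_set simply has no calibrations attached for that process_name, which is the situation the tool described below is meant to fix.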

Required inputs

To make a used_set you require the inputs named in the title: a run number (or a list of run ranges), a jobset number, and a process_name.

The tool

There are actually several tools to make a used_set. You can use CalibDB/bin/DBSetMerge.cc; however, as its name might suggest, it does more than create a used_set: it also merges valid_sets. If all you want to do is create a used_set (or many used_sets) given a run (or runs), a single jobset and a single process_name, then it is recommended that you use:

CalibDB/bin/DBUsedSet_multi.cc

This is the name of the source code file in the CDF Run 2 software package called CalibDB. To use it, you must check out the package and build the binary executable, like this:

source ~cdfsoft/cdf2.cshrc
setup cdfsoft2 development
newrel -t development calibdb_rel
cd calibdb_rel
addpkg -h CalibDB
gmake CalibDB.bin


This makes the binaries for a number of calibration utilities which will then be in the directory:

calibdb_rel/bin/LinuxBlahBlah/

Here LinuxBlahBlah stands for your "architecture" directory (there should be only one in the bin directory, probably something like Linux2-GCC_3_3).

You can get immediate help on using DBUsedSet_multi by executing the binary with no arguments:

./calibdb_rel/bin/LinuxBlahBlah/DBUsedSet_multi

This gives the following help lines:

Usage: ./bin/Linux2-GCC_3_3/DBUsedSet_multi dbtype dbconnect data_run_file pname jobset testmode
Where:
* dbtype is the type of database (example OTL/Msql/Text)
* dbconnect is the connect string (example myname/mypassword@cdfonprd)
* data_run_file is a file with a list of runs to make used_set entries for
* pname is a process_name (example (PROD_PHYSICS_CDF)
* jobset is the jobset number identifying the calibrations to put into the used_set entries
* testmode = TESTMODE sets the script in testmode. So it tells what it will do,but doesn't actually do it. testmode = REAL (or anything else) actually executes.
Rules:
MUST have 6 arguments (which is probably why you're seeing this message

The format of the data_run_file for used sets is:
lowrun1 highrun1
lowrun2 highrun2
etc...

Note1 : No white space except the space between lowrun and highrun
Note2 : data_run_file is assumed to a filename of a file in the current directory
Note3 : In order to get the last line of the file, put a hard return at the end of it


So you have to run DBUsedSet_multi with the 6 arguments specified. Normally the first argument will be OTL (Oracle Template Library). You need calibration write access to create used_sets at all; this is determined by the account given in the db-connect string, which is the second argument. Replace myname with your database username and mypassword with your password. Usually you will be writing used_sets to cdfonprd (the production database). The other databases (int and dev) are not intended for physics-quality information.

The script expects to see the data_run_file in the directory where you run it. So if you run using ./bin/Linux2-GCC_3_3/DBUsedSet_multi etc, then you are running from the top level of your local release and the data_run_file should be there. Read the 3 notes and rules at the end of the usage for how to format this file.
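
If you generate the data_run_file from a list of run ranges, a small helper can guard against the formatting pitfalls in the three notes. The following is a minimal sketch; the function names format_run_file and write_run_file are hypothetical and not part of CalibDB.

```python
def format_run_file(ranges):
    """Return data_run_file text: one 'lowrun highrun' pair per line."""
    if not ranges:
        raise ValueError("need at least one (lowrun, highrun) pair")
    lines = []
    for low, high in ranges:
        if low > high:
            raise ValueError(f"low run {low} exceeds high run {high}")
        # Exactly one space between the two run numbers, nothing else.
        lines.append(f"{int(low)} {int(high)}")
    # End with a newline so that the last line is read (Note3).
    return "\n".join(lines) + "\n"


def write_run_file(path, ranges):
    """Write the formatted ranges to path (run this in the directory
    from which DBUsedSet_multi will be executed, as per Note2)."""
    with open(path, "w") as f:
        f.write(format_run_file(ranges))


text = format_run_file([(155150, 155155), (166144, 166148)])
```

Calling write_run_file("runfile", [(155150, 155155), (166144, 166148)]) would then produce the file used in the example further down.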

The pname and jobset should be coherent. That is to say, there must be a row in the table valid_sets which has this process_name and jobset.

Lastly, you are advised to run the script in test mode first to make sure that you don't do "nasty things" to the dbase which will be a hassle to clean up! Put TESTMODE as the last argument to do this.
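
One way to make the test-mode-first habit hard to forget is to assemble the command line in a small wrapper that defaults to TESTMODE. This is only a sketch: build_command is a hypothetical helper, the argument order is taken from the usage text above, and the connect string shown is a placeholder.

```python
def build_command(binary, dbconnect, run_file, pname, jobset,
                  real=False, dbtype="OTL"):
    """Assemble the six arguments in the order given by the usage text."""
    # Anything other than TESTMODE makes the tool write to the database,
    # so only pass REAL when explicitly asked to.
    mode = "REAL" if real else "TESTMODE"
    return [binary, dbtype, dbconnect, run_file, pname, str(jobset), mode]


cmd = build_command(
    "./bin/Linux2-GCC_3_3/DBUsedSet_multi",
    "username/password@cdfonint",  # placeholder credentials
    "runfile",
    "PROD_PHYSICS_CDF",
    32083,
)
# To execute it from the top level of the release, where runfile lives:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Once the TESTMODE output looks right, the same call with real=True produces the command that actually writes the used_sets.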

Example output


Here is some example output. Running in testmode with the following data_run_file:

155150 155155
166144 166148

With the command line:

./bin/Linux2-GCC_3_3/DBUsedSet_multi OTL username/password@cdfonint runfile PROD_PHYSICS_CDF 32083 TESTMODE

Gives:

Reading runfile:
low_run: 155150 hig_run: 155155
low_run: 166144 hig_run: 166148
List of runs to be added:
155150
155151
155152
155153
155154
155155
166144
166145
166146
166147
166148
jobset: 32083
DBType=OTL
DBConnect=username/password@cdfonint
runfilename= runfile
process_name= PROD_PHYSICS_CDF
jobset= 32083
testmode= 1


TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155150
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155151
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155152
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155153
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155154
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155155
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166144
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166145
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166146
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166147
TESTMODE: would have created used_set from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166148
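
Judging from the output above, the "List of runs to be added" is each lowrun/highrun pair expanded inclusively. A minimal Python sketch of that expansion (expand_runs is a hypothetical name, not the tool's own code) lets you predict the list before running the tool:

```python
def expand_runs(ranges):
    """Expand inclusive (low, high) pairs into a flat list of run numbers."""
    runs = []
    for low, high in ranges:
        # high + 1 because both ends of the range are included.
        runs.extend(range(low, high + 1))
    return runs


runs = expand_runs([(155150, 155155), (166144, 166148)])
```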






Then running the real thing with the command line:

./bin/Linux2-GCC_3_3/DBUsedSet_multi OTL username/password@cdfonint runfile PROD_PHYSICS_CDF 32083 REAL

Gives:

Reading runfile:
low_run: 155150 hig_run: 155155
low_run: 166144 hig_run: 166148
List of runs to be added:
155150
155151
155152
155153
155154
155155
166144
166145
166146
166147
166148
jobset: 32083
DBType=OTL
DBConnect=username/password@cdfonint
runfilename= runfile
process_name= PROD_PHYSICS_CDF
jobset= 32083
testmode= 0


USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155150
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155151
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155151
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155152
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155152
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155153
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155153
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155154
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155154
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155154
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155155
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155155
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 155155
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166144
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166144
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166144
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166145
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166145
VERSION = GENERATED
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166145
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166146
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166146
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166147
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166147
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 166148
VERSION = 1
used_set created from jobset: 32083 process_name: PROD_PHYSICS_CDF Run: 166148






The "USED_SETS already has entry for" warnings are given because entries already existed for some of these runs for PROD_PHYSICS_CDF in the integration database (cdfonint).
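
For a long run list it can be handy to pull the affected runs out of a saved copy of the output. The sketch below assumes the output has exactly the layout shown above (a warning line, then PROC_NAME, then RUN); runs_with_existing_entries is a hypothetical helper, and the sample text is an excerpt of the output on this page.

```python
def runs_with_existing_entries(output):
    """Return run numbers that follow a 'USED_SETS already has entry for' line."""
    existing = []
    in_warning = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("USED_SETS already has entry for"):
            in_warning = True
        elif in_warning and line.startswith("RUN ="):
            existing.append(int(line.split("=", 1)[1]))
            in_warning = False
        elif line.startswith("USED_SETS being Updated"):
            # RUN lines in an update block are not warnings.
            in_warning = False
    return existing


sample = """\
USED_SETS already has entry for
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155150
VERSION = GENERATED
USED_SETS being Updated with values :
PROC_NAME = PROD_PHYSICS_CDF
RUN = 155151
VERSION = 1
"""

already_present = runs_with_existing_entries(sample)
```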

For any further questions please mail me at msmartin@fnal.gov