
CDF Run II Data-File Catalog

This URL: http://www-cdf.fnal.gov/upgrades/computing/projects/dfcatalog/dfcatalog.html

Introduction:

An overview of the Datafile Catalog is given below. Here is a view of some of the Catalog's contents, presented in a way that illustrates use of the web browser. The Catalog is explained in more detail here; this is also the developer's view. The Project Management page lists the people involved and the status of the project.

CDF Notes:

Further documentation of the CDF Data-File Catalog is found in the following CDF Notes: 4761, 5380 and 5983.

Overview:

The CDF Datafile Catalog is a database that keeps track of the experiment's datasets. A dataset is a collection of filesets, a fileset is a collection of files, and a file is a collection of events. The events of a dataset are selected by a Level 3 trigger bit, by an offline filter module, or by other means. The objects stored for each event are in raw bank format for raw datasets, in production output format for primary datasets, or in a user-defined format for secondary, tertiary, and higher-level datasets.
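The dataset, fileset, file, and event hierarchy described above can be pictured as nested records. The sketch below is hypothetical: the class names, fields, and the `n_events` helpers are illustrative assumptions, not the Catalog's actual Oracle schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class DatasetKind(Enum):
    """Hypothetical labels for the per-event storage format of a dataset."""
    RAW = "raw bank format"
    PRIMARY = "production output"
    USER = "user-defined format"  # secondary, tertiary, ... datasets


@dataclass
class DataFile:
    name: str
    n_events: int  # a file is a collection of events


@dataclass
class Fileset:
    fileset_id: str
    files: list[DataFile] = field(default_factory=list)

    def n_events(self) -> int:
        """Total events in all files of this fileset."""
        return sum(f.n_events for f in self.files)


@dataclass
class Dataset:
    name: str
    kind: DatasetKind
    filesets: list[Fileset] = field(default_factory=list)

    def n_events(self) -> int:
        """Total events in the dataset, summed over its filesets."""
        return sum(fs.n_events() for fs in self.filesets)
```

For example, a dataset with one fileset holding files of 100 and 50 events and a second fileset holding a 25-event file reports 175 events.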

The Datafile Catalog keeps track of the luminosity of each dataset. For this purpose, runs are divided into runsections (about 30 to 60 seconds of online run time each), and the integrated luminosity is calculated for each runsection. The total for a dataset is then the sum of the luminosities of the runsections in that dataset. The list of runsections in each file is kept with that file's record in the Catalog, so that the luminosity of the whole dataset can be obtained.
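The luminosity bookkeeping above amounts to collecting the runsections listed in each file record and summing a per-runsection luminosity table over them. This is a minimal sketch under stated assumptions: the `(run, runsection)` key, the function name, and the choice to count a runsection only once when it appears in several files are illustrative, not taken from the Catalog code.

```python
from typing import Iterable

# Hypothetical key for a runsection: (run number, runsection number).
RunSection = tuple[int, int]


def dataset_luminosity(file_runsections: dict[str, Iterable[RunSection]],
                       lumi_by_runsection: dict[RunSection, float]) -> float:
    """Sum integrated luminosity over the distinct runsections of a dataset.

    file_runsections maps each file name to the runsections recorded in its
    Catalog record. A runsection appearing in several files is counted once
    (an assumption of this sketch).
    """
    runsections: set[RunSection] = set()
    for sections in file_runsections.values():
        runsections.update(sections)
    return sum(lumi_by_runsection[rs] for rs in runsections)
```

With files `f1` covering runsections (1000, 1) and (1000, 2) and `f2` covering (1000, 2) and (1000, 3), and illustrative luminosities of 1.5, 2.0, and 0.5, the shared runsection is counted once and the total is 4.0.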

Data Quality bits are defined in the Catalog for each runsection, so that an AC++ input module can apply cuts on these bits. In this way, all events from a runsection that fails the cuts can be excluded from an analysis.
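A cut on Data Quality bits can be thought of as a bitmask test applied per runsection, with every event inheriting the result for its runsection. The sketch below is hypothetical: the bit names and values in `DQ`, the function names, and the rule that a runsection with no recorded bits fails the cut are assumptions for illustration, not the real Catalog bit definitions or the AC++ input module's interface.

```python
from enum import IntFlag
from typing import Iterable, Iterator


class DQ(IntFlag):
    """Hypothetical data-quality bits; the real Catalog defines its own set."""
    TRACKING_OK = 1
    CALORIMETER_OK = 2
    MUON_OK = 4


Key = tuple[int, int]         # (run, runsection), hypothetical key
Event = tuple[int, int, int]  # (run, runsection, event number)


def runsection_passes(bits: int, required: DQ) -> bool:
    """True if every required bit is set for this runsection."""
    return (bits & required) == required


def select_events(events: Iterable[Event], dq_bits: dict[Key, int],
                  required: DQ) -> Iterator[Event]:
    """Yield only events whose runsection passes the data-quality cut.

    Runsections with no recorded bits are treated as failing (an assumption).
    """
    for run, section, number in events:
        if runsection_passes(dq_bits.get((run, section), 0), required):
            yield (run, section, number)
```

Because the decision is made per runsection, the bits can be looked up once per runsection rather than stored with each event.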

Data is stored on tape by fileset, and a list of tapes is maintained in the Catalog. As explained later, user jobs always access data from disk, so a user never needs to know which tape holds the data. The Catalog will also keep lists of user tapes; every tape stored in the tape robot will have a record in the Catalog.
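The split between tape and disk can be sketched as a lookup that a Data Handling staging service might perform: a fileset already on disk is served from there, and otherwise the Catalog's fileset-to-tape record tells a separate staging job which tape to read. Everything here is a hypothetical illustration: the `TapeIndex` class, its fields, the tape labels, and the disk paths are assumptions, not the DH software's interface.

```python
from dataclasses import dataclass, field


@dataclass
class TapeIndex:
    """Hypothetical view of fileset locations known to Data Handling."""
    # fileset id -> tape label (the Catalog keeps a list of tapes per fileset)
    tape_of_fileset: dict[str, str] = field(default_factory=dict)
    # fileset id -> disk directory, for filesets currently staged on disk
    on_disk: dict[str, str] = field(default_factory=dict)

    def locate(self, fileset_id: str) -> tuple[str, str]:
        """Return ('disk', path) if staged, else ('tape', label) for a staging job."""
        if fileset_id in self.on_disk:
            return ("disk", self.on_disk[fileset_id])
        if fileset_id in self.tape_of_fileset:
            return ("tape", self.tape_of_fileset[fileset_id])
        raise KeyError(f"unknown fileset {fileset_id!r}")
```

In this picture a user job only ever consumes the `("disk", path)` answer; the `("tape", label)` case is handled by the staging side before the job reads the data.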

Multiple Catalogs, called "books", will exist to allow individual users and physics groups to create datasets. These books will be separate from the main Catalog but will reside in the same Oracle database server. The main Catalog book will thus contain the raw datasets and the datasets created by production processing, while the user books will contain secondary, tertiary, and further datasets. Catalog meta-data can be copied between books, and also between offsite institutions and Fermilab.

Input and Output modules for AC++ have been written by the DH group; they exist in the CDF CVS package DHMODS under the names DHInput and DHOutput. A user guide exists. These modules allow users to specify input by dataset as well as by file. When a new dataset is created, DHOutput writes files that DH software and daemons then collect into filesets, put on tape, and enter in the user's Catalog. DHOutput will also write simple files to a disk area specified in the talkto. AC++ modules always read from and write to disk; data transfer between tape and disk is done by separate Data Handling jobs or daemons.

Online datasets are defined as a stream of triggers at Level 1, Level 2, and Level 3, with a Level 3 bit pattern used for selection. These definitions are maintained in the CDF trigger table database, not in the Datafile Catalog. Reconstruction processing in the offline Production Farm produces primary datasets from the online datasets, and the primary datasets will be remade whenever a new reconstruction procedure is adopted. Offline datasets are therefore characterized both by the online dataset specification and by the version of the reconstruction procedure.
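Since an offline dataset is characterized by both its online dataset and the reconstruction version, a natural identifier is a composite key of the two, so reprocessing with a new version adds a new entry rather than overwriting the old one. The sketch is hypothetical: `PrimaryDatasetKey`, the helper functions, the choice to reject duplicate registrations, and the example names `STREAM_A`, `v1`, and `v2` are illustrative assumptions, not the Catalog's actual keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryDatasetKey:
    """Hypothetical key: an offline dataset is identified by both parts."""
    online_dataset: str  # online dataset name (defined in the trigger table database)
    reco_version: str    # version of the reconstruction procedure


def register_primary(catalog: dict[PrimaryDatasetKey, list[str]],
                     online_dataset: str, reco_version: str,
                     filesets: list[str]) -> PrimaryDatasetKey:
    """Record a primary dataset; a new reconstruction version adds a new entry."""
    key = PrimaryDatasetKey(online_dataset, reco_version)
    if key in catalog:
        # Assumption of this sketch: re-registering the same pair is an error.
        raise ValueError(f"primary dataset already registered: {key}")
    catalog[key] = list(filesets)
    return key


def versions_of(catalog: dict[PrimaryDatasetKey, list[str]],
                online_dataset: str) -> list[str]:
    """List the reconstruction versions available for one online dataset."""
    return sorted(k.reco_version for k in catalog if k.online_dataset == online_dataset)
```

Registering `STREAM_A` once with `v1` and again with `v2` leaves two distinct primary datasets, and `versions_of(catalog, "STREAM_A")` lists both.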

The Catalog does not store calibration constants; these are stored in the CDF calibration database. Nor does the Catalog store lists of valid calibration constants; these are kept in a validity table in the calibration database.

A prototype version of the Datafile Catalog is described here.

The CDF Data-File Catalog performed well during the Commissioning Run in 2000; a status report is given here.

For an overview of the Catalog and its place in the DH project, see
http://www-cdf.fnal.gov/cdfnotes/cdf5310_chep2000_dh_overview.ps   (2695 kb)
Abstract:   /cdf/pub/cdf5310_chep2000_dh_overview.txt   (<1 kb)

This web page is maintained by Terry Watts.