
Datafile Catalog - Contents and Browser

This URL: http://www-cdf.fnal.gov/upgrades/computing/projects/dfcatalog/browse_content.html

The main tables in a Datafile Catalog (a table is a group of records) are:

Dataset/Stream Records (browse the dataset table or compose a dataset query):

A dataset (or stream) is a collection of filesets. There may be about 1000 datasets, with 5-20,000 filesets in a dataset (raw streams are larger). A job will usually specify a dataset; it can specify a list of files instead, but normally will not. The job accesses the Catalog just once, near its start. The Catalog table name is CDF2_DATASETS.
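The access pattern described above (one Catalog lookup near job start, no further Catalog traffic while processing) can be sketched as follows. This is a minimal illustration assuming a simple in-memory stand-in for the Catalog hierarchy; the class `CatalogSnapshot`, the functions `resolve_dataset` and `run_job`, and the sample names are hypothetical and do not correspond to the real Data Handling interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogSnapshot:
    """Hypothetical in-memory stand-in for the Catalog, not the real DH API.

    datasets: dataset name -> names of its filesets (a fileset has one dataset)
    filesets: fileset name -> names of its files (a file has one fileset)
    """
    datasets: dict[str, list[str]] = field(default_factory=dict)
    filesets: dict[str, list[str]] = field(default_factory=dict)
    lookups: int = 0  # number of times the "Catalog" has been consulted

    def resolve_dataset(self, name: str) -> list[str]:
        """Return every file in a dataset using a single Catalog access."""
        self.lookups += 1
        return [f for fs in self.datasets[name] for f in self.filesets[fs]]


def run_job(catalog: CatalogSnapshot, dataset: str) -> int:
    """Resolve the dataset once near job start, then loop over its files."""
    files = catalog.resolve_dataset(dataset)
    processed = 0
    for _file_name in files:  # the processing loop never returns to the Catalog
        processed += 1
    return processed


if __name__ == "__main__":
    cat = CatalogSnapshot(
        datasets={"demo_ds": ["FS0001", "FS0002"]},
        filesets={"FS0001": ["f1", "f2"], "FS0002": ["f3"]},
    )
    print(run_job(cat, "demo_ds"), cat.lookups)  # 3 1
```

Resolving the whole file list up front keeps the load on the Catalog server proportional to the number of jobs rather than the number of files read.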

Fileset Records (browse some recent records or compose a fileset query):

A fileset is a group of files stored in the same partition of the same physical tape. Typically, a fileset contains 10 files, and there may be about 200,000 filesets in total. A fileset belongs to exactly one dataset. The Catalog table name is CDF2_FILESETS.
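To illustrate how fileset records might be queried, here is a sketch that groups the filesets of one dataset by the tape that holds them, showing which tapes a job reading that dataset would touch. The column names are taken from the `describe cdf2_filesets` output in the SQL*Plus session on this page; the use of Python's built-in sqlite3 module as a stand-in for the Oracle server, the sample rows, and the dataset ID `demo01` are made up for illustration.

```python
import sqlite3


def filesets_by_tape(conn: sqlite3.Connection, ds_name_id: str) -> dict[str, list[str]]:
    """Group the filesets of one dataset by tape label, in partition order."""
    rows = conn.execute(
        "SELECT tape_label, fileset_name FROM cdf2_filesets "
        "WHERE ds_name_id = ? ORDER BY tape_label, tape_partition",
        (ds_name_id,),
    ).fetchall()
    by_tape: dict[str, list[str]] = {}
    for tape, fileset in rows:
        by_tape.setdefault(tape, []).append(fileset)
    return by_tape


def demo() -> dict[str, list[str]]:
    conn = sqlite3.connect(":memory:")
    # Columns follow the describe output; SQLite types stand in for Oracle's.
    conn.execute(
        """CREATE TABLE cdf2_filesets (
               fileset_name   TEXT    NOT NULL,
               create_time    INTEGER NOT NULL,
               tape_label     TEXT    NOT NULL,
               ds_name_id     TEXT    NOT NULL,
               tape_partition INTEGER,
               file_count     INTEGER)"""
    )
    # Made-up sample rows for illustration only.
    conn.executemany(
        "INSERT INTO cdf2_filesets VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("FS0001", 1, "TAPE01", "demo01", 1, 10),
            ("FS0002", 2, "TAPE01", "demo01", 2, 10),
            ("FS0003", 3, "TAPE02", "demo01", 1, 9),
            ("FS0004", 4, "TAPE02", "demo02", 1, 10),
        ],
    )
    result = filesets_by_tape(conn, "demo01")
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())  # {'TAPE01': ['FS0001', 'FS0002'], 'TAPE02': ['FS0003']}
```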

File Records (browse some recent records or compose a file query):

Files are planned to be about 1 Gbyte in size, and about 1 million file records are expected in the Catalogs. A file belongs to exactly one fileset. Within a given dataset, all events of a run section are contained in a single file. The Catalog table name is CDF2_FILES.
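The rule that a run section never straddles two files of the same dataset lends itself to a mechanical check. The sketch below assumes the events of one dataset are available as (file name, run section number) pairs; the function name and input format are hypothetical and not part of the Catalog. A file may hold several run sections; only the reverse (one run section spread over several files) violates the rule.

```python
from collections import defaultdict


def runsections_split_across_files(events):
    """Return the run sections whose events appear in more than one file.

    `events` is an iterable of (file_name, runsection_number) pairs for the
    events of ONE dataset. An empty result means the rule holds.
    """
    files_per_rs = defaultdict(set)
    for file_name, rs in events:
        files_per_rs[rs].add(file_name)
    return sorted(rs for rs, files in files_per_rs.items() if len(files) > 1)


if __name__ == "__main__":
    ok = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
    bad = ok + [("b", 2)]  # run section 2 now appears in files a and b
    print(runsections_split_across_files(ok))   # []
    print(runsections_split_across_files(bad))  # [2]
```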

Run Section Records (compose a run section query):

Run section records are produced at a rate of about one run section per 500 Mbytes of raw data flowing from the online system during data taking. The current plan is to break run sections every 30 seconds, but a break may also occur at specific Level 2 event numbers. Run sections apply to all datasets and will be numbered sequentially from 1 to about 2,000,000. The Catalog table name is CDF2_RUNSECTIONS.
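The size estimates on this page can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 Pbyte = 10^9 Mbytes = 10^6 Gbytes) and that the 500 Mbyte and 30 second figures hold at the same time. Under those assumptions they imply an average raw-data rate of about 16.7 Mbytes/s, about 1 Pbyte of raw data over 2,000,000 run sections (the same order as 1 million 1-Gbyte files), and roughly 694 days of live data taking.

```python
MB_PER_RUNSECTION = 500        # ~500 Mbytes of raw data per run section
SECONDS_PER_RUNSECTION = 30    # planned break interval
MAX_RUNSECTIONS = 2_000_000    # run sections numbered 1 .. ~2,000,000
FILES = 1_000_000              # expected number of file records
GB_PER_FILE = 1                # planned file size

rate_mb_per_s = MB_PER_RUNSECTION / SECONDS_PER_RUNSECTION
raw_total_pb = MAX_RUNSECTIONS * MB_PER_RUNSECTION / 1e9   # Mbytes -> Pbytes
live_days = MAX_RUNSECTIONS * SECONDS_PER_RUNSECTION / 86_400
files_total_pb = FILES * GB_PER_FILE / 1e6                 # Gbytes -> Pbytes

print(f"{rate_mb_per_s:.1f} MB/s")          # 16.7 MB/s
print(f"{raw_total_pb:.1f} PB raw")         # 1.0 PB raw
print(f"{live_days:.0f} days")              # 694 days
print(f"{files_total_pb:.1f} PB in files")  # 1.0 PB in files
```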

General Browser for the Catalog:

The general Catalog browser is here. This configuration of the browser allows a wider variety of queries than the links above. Details of the browser can be found in CDF Note 5591.

The complete set of tables in the Datafile Catalog is (each table name links to a listing of its fields):

(Note: these listings are most useful for experts)

CDF2_DATASETS_REGISTRY         Registers dataset IDs and other information (like the CDF note number registry)
CDF2_DATASETS                  Event counts, file counts, luminosity, description, etc.
   CDF2_PARENT_DATASETS        Input dataset(s) used to create a given dataset
   CDF2_DATASET_STATUSES       Text strings for status numbers
   CDF2_PROD_VERSION_DESCS     Text strings for production version numbers
CDF2_FILESETS                  File count, dataset ID, etc.
CDF2_FILES                     Event and run information, luminosity
   CDF2_RUNSECTION_RANGES      Ranges of run sections in a given file
   CDF2_FILE_LIVETIMES         Average prescales for triggers with dynamic prescaling
CDF2_RUNSECTIONS               Time stamp, event and run information, data quality, luminosity (online, offline)
   CDF2_RUNSECTION_LIVETIMES   Dynamic prescale values
   CDF2_DATA_QUALITY_DESCS     Text string describing each data quality bit
CDF2_TAPES                     Tapes in the tape robot and their contents
   CDF2_TAPE_STATUSES          Text strings for tape status numbers
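CDF2_RUNSECTION_RANGES records ranges of run sections for a file rather than every run section individually. The sketch below shows one plausible way to build such ranges from a file's run section numbers and to test membership by binary search; the inclusive (first, last) tuple layout and both function names are assumptions for illustration, not the table's actual columns.

```python
from bisect import bisect_right


def to_ranges(runsections):
    """Collapse run section numbers into sorted, inclusive (first, last) ranges."""
    ranges = []
    for rs in sorted(set(runsections)):
        if ranges and rs == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], rs)  # extend the current range
        else:
            ranges.append((rs, rs))  # start a new range
    return ranges


def file_contains(ranges, rs):
    """Binary-search sorted, non-overlapping ranges for run section rs."""
    # Index of the last range whose first run section is <= rs.
    i = bisect_right(ranges, (rs, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= rs <= ranges[i][1]


if __name__ == "__main__":
    r = to_ranges([5, 3, 4, 10, 11, 20])
    print(r)                     # [(3, 5), (10, 11), (20, 20)]
    print(file_contains(r, 10))  # True
    print(file_contains(r, 12))  # False
```

Storing ranges keeps the table small when a file holds long runs of consecutive run sections, which is the common case if files are written in time order.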

Catalog Books

The Datafile Catalog is partitioned into "Books". The main "Production" book is where the raw and primary datasets are recorded during online data logging and Production Farm processing. Individual user books and physics group books can also exist (ask for one when you apply for an account on the CDF Central Analysis Cluster). All books can be read by anyone, but write access is limited to the owner: the individual for a user book, the physics group designees for a group book, and the operators of data logging and the Production Farm for the Production Book.

User books are intended for creating secondary, tertiary, and further derived datasets. The user first registers a new dataset (on a web registry page that works like the CDF note number registry page), which creates a row in table JOSEPH.CDF2_DATASETS (for a user with Unix login "joseph"). When joseph runs his AC++ job and writes files with the DHOutput module, file records are created in table JOSEPH.CDF2_FILES. Subsequently, Data Handling jobs put the files into filesets and onto tape; these actions create rows in table JOSEPH.CDF2_FILESETS.
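The naming convention behind this workflow (the owner's login as the schema prefix of each table) can be summarized in a short sketch. The helper names `book_table` and `tables_touched` are hypothetical; upper-casing the login mirrors Oracle's default treatment of unquoted identifiers, which matches the JOSEPH.* names above.

```python
def book_table(login: str, table: str) -> str:
    """Qualified name of a table in a user's book, e.g. JOSEPH.CDF2_FILES.

    Upper-casing mirrors Oracle's default handling of unquoted identifiers.
    """
    return f"{login.upper()}.{table.upper()}"


# Which step of the user-book workflow writes rows to which table.
LIFECYCLE = [
    ("register the dataset on the web registry page", "CDF2_DATASETS"),
    ("AC++ job writes files with the DHOutput module", "CDF2_FILES"),
    ("Data Handling jobs put the files into filesets on tape", "CDF2_FILESETS"),
]


def tables_touched(login: str) -> list[tuple[str, str]]:
    """Pair each workflow step with the table it writes in the user's book."""
    return [(step, book_table(login, table)) for step, table in LIFECYCLE]


if __name__ == "__main__":
    for step, table in tables_touched("joseph"):
        print(f"{table:24} <- {step}")
```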

Monte Carlo jobs which generate simulated data can also create datasets in user books.

Group books should not be used for creating datasets, because there is no easy way to prevent users from erasing each other's entries in a group book. A method for moving the book entries of a completed dataset from a user book to a group book will be provided. Creating a dataset will probably be complicated and involve false starts, so keeping its catalog entries isolated in a user book until the dataset is final makes the process easier to manage.

The browser can show all books in use by querying for all *.CDF2_DATASETS tables (the "*" can be used as a wildcard in the browser; here it matches any book, i.e. any owner). Note that the Production Book is the set of FILECATALOG.* tables.
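The wildcard lookup can be imitated offline on a list of OWNER.TABLE names. The sketch below assumes the browser's "*" behaves like a shell-style wildcard (any run of characters) and uses Python's `fnmatch.fnmatchcase` for the matching; the function name and table list are made up for illustration. On the Oracle side, a comparable owner list could presumably be read from the ALL_TABLES data dictionary view (columns OWNER and TABLE_NAME), subject to the reader account's privileges; that query is not shown here.

```python
from fnmatch import fnmatchcase


def books_with(table_names, pattern="*.CDF2_DATASETS"):
    """Return the books (owners) of qualified table names matching pattern.

    "*" matches any run of characters, so "*.CDF2_DATASETS" finds the
    dataset table of every book.
    """
    return sorted(
        {name.split(".", 1)[0] for name in table_names if fnmatchcase(name, pattern)}
    )


if __name__ == "__main__":
    # Made-up list of OWNER.TABLE names for illustration.
    names = [
        "FILECATALOG.CDF2_DATASETS",
        "FILECATALOG.CDF2_TAPES",
        "JOSEPH.CDF2_DATASETS",
        "JOSEPH.CDF2_FILES",
    ]
    print(books_with(names))                   # ['FILECATALOG', 'JOSEPH']
    print(books_with(names, "FILECATALOG.*"))  # ['FILECATALOG']
```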

Some tables exist only in the Production Book: CDF2_TAPES, CDF2_TAPEPOOLS, etc. The run section tables for detector data also exist only in the Production Book; run sections written for MC simulated data can exist in other books.

Generalized Browser for All Offline Database Tables:

The browser links above are specialized queries derived from a general browser for all offline tables, including calibration tables, etc.

Queries Using SQL*Plus

Queries can be given directly to the Offline Oracle Database in SQL form. Here is an introductory page on SQL; alternatively, try one of the many books, such as "SQL for Dummies", to learn SQL (Structured Query Language).

An Oracle client is set up as part of the CDF Run II Offline setup (onsite and offsite, even abroad). Here is a session on fcdfsgi2.fnal.gov that listed the fields (columns) of table CDF2_FILESETS:


fcdfsgi2> sqlplus cdf_reader/reader@cdfofprd

SQL*Plus: Release 8.0.5.0.0 - Production on Thu Jul 27 15:18:16 2000

(c) Copyright 1998 Oracle Corporation.  All rights reserved.
 

Connected to:
Oracle8i Enterprise Edition Release 8.1.6.1.0 - Production
With the Partitioning option
JServer Release 8.1.6.1.0 - Production

SQL> describe cdf2_filesets
 Name                            Null?    Type
 ------------------------------- -------- ----
 FILESET_NAME                    NOT NULL VARCHAR2(12)
 CREATE_TIME                     NOT NULL NUMBER(38)
 TAPE_LABEL                      NOT NULL VARCHAR2(6)
 DS_NAME_ID                      NOT NULL VARCHAR2(6)
 TAPE_PARTITION                           NUMBER(38)
 FILE_COUNT                               NUMBER(38)

SQL> exit
Disconnected from Oracle8i Enterprise Edition Release 8.1.6.1.0 - Production
With the Partitioning option
JServer Release 8.1.6.1.0 - Production



PLEASE DO NOT STAY AT THE "SQL>" PROMPT LONGER THAN NECESSARY: KEEPING DOWN THE NUMBER OF SIMULTANEOUS CONNECTIONS TO THE ORACLE SERVER SAVES MONEY.
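One way to honor this request in scripts is to skip the interactive prompt entirely: open a connection, run a single query, and close it immediately. The sketch below shows that pattern with Python's standard sqlite3 module standing in for the Oracle server; connecting to the real cdfofprd database would require an Oracle client library, which is not shown. The function name `query_once` is hypothetical.

```python
import sqlite3
from contextlib import closing


def query_once(db: str, sql: str, params=()):
    """Connect, run one query, fetch all rows, and disconnect at once.

    A sqlite3 connection used directly in a `with` block only commits or
    rolls back; it does not close. Wrapping it in closing() guarantees
    conn.close() runs, even if the query raises.
    """
    with closing(sqlite3.connect(db)) as conn:
        return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    print(query_once(":memory:", "SELECT 1 + 1"))  # [(2,)]
```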