Minutes of the Event Data Model Working Group Meeting
13 April 1999

Rob Kennedy, for the CDF Run II Event Data Model Working Group

Attending: Rob Kennedy, Jim Kowalkowski, Marc Paterno, Philippe Canal, Liz Sexton-Kennedy, Rick Snider, Stephan Lammel
By Video: MIT (Betsy Hafen)
By Phone: Berkeley (Marge Shapiro)

Note: The text of RDK's slides has been folded into these minutes. There is no separate posting of postscript for this week.

I) EDM Prototype Status - Rob Kennedy

Rob K presented an update on the EDM prototype status. The goal for May 01 is "More realistic EDM infrastructure, uses more realistic objects, higher-level physics objects, collection, and an initial version of ROOT I/O. Level 3 record format defined. Begin integration with Framework/AC++ where possible."

Status: EDM prototype has core infrastructure working (Read and Write Handles, EventRecord, RecordIterator, EventHeader (similar content to the LRIH bank), IdManager, ToyMuon). ROOT I/O nominally in StorableObject classes. Similar to, but not identical to, released design notes. Fails to compile due to *apparent* lack of support for namespaces in ROOT Object I/O system. Will work around, do I/O this week.

(19-Apr-99: This problem has been confirmed by RDK and Philippe Canal and will be reported to the ROOT team. ROOT Object I/O just does not work right now with user-defined classes embedded in a namespace. Meanwhile the Edm package will have use of the Edm namespace stripped out, which has been confirmed to eliminate the current ROOT Object I/O problems.)

Discussion: Marc Paterno asked about the mechanism to mark ReadHandles and WriteHandles as no longer valid once an EventRecord takes over control of an object. ReadHandles should become invalid when an EventRecord is destroyed. WriteHandles should become invalid as soon as an object is added to the EventRecord. The current prototype forces the pointer in a WriteHandle to null during EventRecord::append().
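
The WriteHandle behavior described above (the handle's pointer forced to null once the EventRecord takes control) can be illustrated with a minimal sketch. This is not the EDM prototype code: the class layouts, the isValid() and release() members, the size() and at() accessors, and the helper appendInvalidatesHandle() are all hypothetical names chosen for illustration, and the sketch uses present-day standard C++ (std::unique_ptr) rather than anything available to the 1999 prototype. Only the names WriteHandle, EventRecord, ToyMuon, and EventRecord::append() come from the minutes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a storable physics object (not the CDF class).
struct ToyMuon {
    double pt;
};

// Hypothetical write handle: owns the object until it is appended.
template <typename T>
class WriteHandle {
public:
    explicit WriteHandle(std::unique_ptr<T> obj) : ptr_(std::move(obj)) {}

    // False once the object has been handed to an EventRecord.
    bool isValid() const { return ptr_ != nullptr; }

    // Caller is expected to check isValid() before dereferencing.
    T* operator->() { return ptr_.get(); }

    // Transfer ownership out; the handle is left holding a null pointer.
    std::unique_ptr<T> release() { return std::move(ptr_); }

private:
    std::unique_ptr<T> ptr_;
};

// Hypothetical event record that takes ownership on append().
class EventRecord {
public:
    void append(WriteHandle<ToyMuon>& handle) {
        if (!handle.isValid()) return;        // nothing to take over
        muons_.push_back(handle.release());   // handle is now invalid
    }

    std::size_t size() const { return muons_.size(); }
    const ToyMuon& at(std::size_t i) const { return *muons_.at(i); }

private:
    std::vector<std::unique_ptr<ToyMuon>> muons_;
};

// Demonstration helper (hypothetical): true if append() nulls the handle
// and the record ends up holding the modified object.
inline bool appendInvalidatesHandle() {
    WriteHandle<ToyMuon> handle(std::make_unique<ToyMuon>(ToyMuon{25.0}));
    handle->pt = 30.0;                        // writable while still valid
    EventRecord record;
    record.append(handle);
    return !handle.isValid() && record.size() == 1 && record.at(0).pt == 30.0;
}
```

In this sketch the null check is the only run-time cost of invalidation. Invalidating ReadHandles when the EventRecord is destroyed is a separate problem and is not addressed here.
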
There is no treatment of ReadHandles going invalid at present. Marc pointed out that D0 has an implementation of ReadHandles which can perform as desired without adding significant run-time overhead. Rob will look into this after the 01-May milestone is reached.

There was mention of the TString class in ROOT not being derived from TObject. Post-meeting, Rob found that this is true, but that there is a TObject-derived wrapper class TObjString which contains an instance of TString. This is similar to our adapter approach, and so has already been considered.

II) ROOT Event Model / Data Handling Discussion

A) Event Data and User Access Patterns

There was a suggestion, which may have been mis-interpreted, that users performing analyses should be able to interact with event data files in the PAW-style analysis pattern. In other words, users would be able to interactively query event data files as if they were column-wise n-tuples. We have referred to this as event data browsing; it is considered desirable but is not a requirement of the EDM.

Rob K's elaborated response:

"Using event data files as CWN requires that either all objects be written in split-object mode (each primitive data member is split into a separate branch), or all objects written in whole-object mode must be interpretable by cint. With ~100+ classes coming from Production, the split-object mode approach would give us events with 1000s of branches. This could be applied to tertiary data sets, however, but the result would end up looking like PAW-style non-object CWNs (containing only primitive data types). The EDM should not be required to restore a CDF Track class instance if only two of its data members are stored in a PAW-style non-object CWN."

"The whole-object/branch approach requires the use of cint to interact with the data. We have determined that we cannot rely on cint to interpret the Standard C++ classes which we already have in Offline software.
We have as a requirement that we not depend on cint to achieve the Data Handling Decision to implement Event I/O with ROOT."

"On the user interface side, the hierarchical ROOT event model we have described so far should be no more difficult to navigate than an n-tuple file containing multiple directories."

"We are trying to separate how we treat bulk event data sets from user physics analysis data sets. In the former, we see the EDM and the DH system treating files containing composite objects in a manageable number of branches. In the latter, we see many potential approaches using HepTuple, ROOT Object CWNs, or PAW-style non-object CWN, etc. We are considering event data browsing as a desirable evolutionary path, but it is not a requirement of the EDM. This would still not provide the compact datasets of the PAW-style non-object CWN where only the exact number of primitive data type fields were stored per event."

B) General ROOT Event Model

There was a detailed exchange of ideas regarding the ROOT Event Model coming from the EDM Working Group (which Rob outlined), and the current Data Handling system design (which Stephan outlined). Stephan pointed out that the out-of-order arrival of files defeats the ROOT synchronization of data coming from different sources or from different stages of processing. There then appeared to be general agreement that the so-called "TTree1" layer of the EDM ROOT Event Model served little purpose. Data from different stages of processing (which TTree1s would co-ordinate) could not generally be served in an efficient manner.

The TTree2 layer in the EDM-WG ROOT Event Model, which co-ordinates groups of branches saved together in split-files, still seems to be consistent with the Data Handling system, though not perhaps as originally envisioned by the EDM Working Group. There were many details in this discussion, some of which I (RDK) missed writing down in detail.
Whole file-sets must be spooled to avoid delivery-out-of-order problems, which occur at file granularity only. How to best utilize split-file mode... with more revised datasets or fewer datasets spread over different media? So long as there is agreement on this TTree2 layer's utility, perhaps these details can be reviewed at a later date or in a write-up (volunteers?).

C) Meta-data size

It was generally agreed that the amount and location of ROOT meta-data for split-file mode should be investigated. Rob will pursue this in the EDM prototype when feasible.

The "EventHeader" branch, which the EDM-WG suggested may be always disk-resident (or not), might consist only of the LRIH_Bank-type information (added per Stephan's request: a 4-byte int for run_section, and 64 bits for a primary_dataset_mask) and any ROOT meta-data for split-file mode. The EventHeader LRIH_Bank-type data is only about 16 words long (8 old words + 9 additional words - 1 old word unused).

.the end.