CDF Grid Computing Projects and Documentation

 

The basic goal of the effort summarized here is to greatly increase the amount of effective computing available to CDF members and to special projects, while maintaining an environment that is easy for the average physicist to use. Where possible, we use familiar tools and interfaces and extend their applicability to a global computing environment.

At present, we have achieved a uniform job submission environment across a set of resources that increases the total CPU available beyond the central systems by approximately 1.8 THz on a guaranteed basis. Over the next few months to a year, we expect to expand this base and to bring in additional resources that do not use the "classic CAF" paradigm by moving into a more general grid environment. We will also move the base operating systems and software to more modern versions in an effort to stay in sync with (and also lead) other developments in the field. Throughout this process, we expect to maintain our present focus on ease of use for the physicist.

The number of active remote facilities open to general CDF use has expanded considerably since the inception of this project in January 2004. In addition, other facilities are on the horizon. If you want to join or add resources to this effort, send e-mail to Igor Sfiligoi and/or Ian Fisk, or more generally to CDF-DCAF-Admin at fnal.gov.

The next CDF Grid workshop will be held March 10-11, 2005 in room FCC1 in the Feynman Center at Fermilab. All collaborators interested in grid computing are invited. Our most recent workshop was held Sep. 9-10, 2004. The resulting WebTalks pages are available to CDF members for the first, second, and third sessions, and the full agenda is also publicly available in plain text form.

 

Current Resources [*]
Cluster Name and Home Page | Monitoring and Direct Information Links | CPU (GHz) | Disk Space (TBytes)
Original FNAL CAF | queues, user history, analyze, ganglia, sam station, consumption | 1000 | 370
FNAL CondorCAF (Fermilab) | queues, user history, analyze, ganglia, sam station, consumption | 2200 | (shared w/CAF)
CNAFCAF (Bologna, Italy) | queues, user history, analyze, resources, network, sam station, datasets, consumption | 480 | 32
KORCAF (KNU, Korea) | queues, user history, ganglia, sam station, datasets, consumption | 178 | 5.1
ASCAF (Academia Sinica, Taiwan) | queues, user history, ganglia, sam station, datasets, consumption | 134 | 3.0
SDSC CondorCAF (San Diego) | queues, user history, analyze, ganglia, sam station, datasets, consumption | 380 | 4.0
HEXCAF (Rutgers) | queues, cpu, sam station, datasets, consumption | 100 | 4.0
TORCAF (Toronto CDF) | queues, user history, analyze, ganglia, disk status, sam station, datasets, consumption | 576 | 10
JPCAF (Tsukuba, Japan) | queues, user history, ganglia, sam station, datasets, consumption | 152 | 10
CANCAF (Cantabria, Spain) | queues, user history, ganglia, sam station | 50 | 1.5
MIT (Boston, USA) (MC only) | queues, user history, analyze | 322 | 3.2
Current Totals [*] | | 5572 | 448

[*] Total resources at each cluster are only listed for those portions dedicated to CDF on a guaranteed average basis, and open to all CDF users. In many cases, especially for the larger offsite clusters, more capacity is available and can be scheduled for special needs. Contact the site administrator at each site above if you have questions or would like to pursue a special project.
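
For site administrators who update a row of the resource table, the CPU total can be re-checked with a short script. What follows is only a minimal sketch: the dictionary keys are shortened cluster names, and the helper names total_cpu and cpu_share are hypothetical, chosen for illustration rather than taken from any CDF tool. The figures are the CPU (GHz) values as listed in the table.

```python
# CPU (GHz) column as listed in the Current Resources table.
# Keys are shortened cluster names (an assumption made for readability).
cpu_ghz = {
    "Original FNAL CAF": 1000,
    "FNAL CondorCAF": 2200,
    "CNAFCAF": 480,
    "KORCAF": 178,
    "ASCAF": 134,
    "SDSC CondorCAF": 380,
    "HEXCAF": 100,
    "TORCAF": 576,
    "JPCAF": 152,
    "CANCAF": 50,
    "MIT": 322,
}

LISTED_TOTAL_GHZ = 5572  # value from the "Current Totals" row


def total_cpu(table):
    """Sum the per-cluster CPU figures, in GHz."""
    return sum(table.values())


def cpu_share(table):
    """Return each cluster's fraction of the summed CPU."""
    total = total_cpu(table)
    return {name: ghz / total for name, ghz in table.items()}


if __name__ == "__main__":
    total = total_cpu(cpu_ghz)
    print(f"summed CPU: {total} GHz (table lists {LISTED_TOTAL_GHZ})")
    # Print clusters from largest to smallest share of the total.
    for name, share in sorted(cpu_share(cpu_ghz).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {share:6.1%}")
```

Keeping the figures in one dictionary means a changed row only needs to be edited in one place before the total is recomputed.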

 

DCAF to principal mapping

 

For a list of data sets currently pinned at each of these clusters, see this link.

 

User Instructional Pages and Useful Links: 

Job submission, Data Handling and Use of CDF Grid

User tutorial and brief guide to general use.

CDF CAF User Guide (postscript version).

What time is it? (Java applet, self-updating) or normal HTML (static).

CDF FroNtier User's Guide describing the tiered cache server interface for the calibration database (soon to be extended to other portions of the CDF database).

CDF SAM Fast Navigator and HowTo use SAM in an AC++ analysis.

SAM User registration and Use Case document for analysis and MC request generation.

CDF SAM-At-A-Glance (status of SAM stations).

CDF SamTV (data transfer monitoring).

How to store a CDF file in SAM (useful for remote MC and data storage to FNAL).

 

Administrative Installation, Setup and Reference pages: 

CDF Remote Computing Tools Installation and Maintenance

DCAF Installation manual (postscript version): This is the definitive guide to installing a DCAF. Soon to be updated with a CondorCAF manual!

CDF software environment setup and installation for offsite usage.

SAM and data transfer tools installation and use, including gridftp.

More test scripts for remote cluster submission intended for initial use and checkout.

FroNtier wiki and squid cache installation guide for administrators.

The above links are current as of the most recent workshop, Sep. 9-10, 2004 at Fermilab. Previous workshops have been held Apr. 1-2, 2004 at Fermilab and Jan. 20-22, 2004 at the University of Florida. (The original documentation from the Florida workshop has largely been kept up to date rather than replaced, so these links still provide a valuable reference.)

 

Related documents

Related documentation stored in this directory can also be consulted for historical interest and reference.

Monitoring of network bandwidth and SAM data transfer throughput is also available through SamTV. In the future, this monitoring will likely be incorporated into, or used to supplement, a larger project based on, for example, the MonALISA framework.

The CDF SAM web pages and the associated SAM-At-A-Glance and administrative back-door pages are also useful, as is the cdfkits CDF software distribution site and its associated links.

Site administrators can consult the archives of the CDF DCAF admin list on the FNAL Listserv site to search for previously covered installation and administration topics.

 


 


Last edited 24 Sep 2004 AFS