21 May 2003 ICRB meeting minutes
Meeting's summary by the chair
This meeting saw a smaller audience than past ones. Do we need to
reflect on the format and usefulness of the ICRB?
Are there other topics that the collaboration feels more strongly about?
Here are the minutes, which try to capture only what is not
in the slides. Summaries by the Chair,
which involve some judgement and are thus subjective, are in
italics.
- The network to offsite should no longer be a limiting factor;
FNAL will be able to increase capacity as demand grows. The present
link is still far from saturated.
- SAM will also give offsite users friendlier access to metadata,
not only to data
- Recent success stories with SAM:
- Ulrich moved 2.5TB to Karlsruhe with a simple script and
quite happily, vs. several weeks of work when he did 350GB "by hand"
with lots of network/server errors, reboots, retries etc.
Ulrich decided to use SAM just to avoid having to go through that
again!
- Rick can copy 1GB/4min from FNAL to Glasgow
- Stefano imported a raw data file from Enstore tapes directly
to his desktop
- The CRC check that detects file corruption in transfers and
automatically retries works
- It appears to the Chair that most of the functionality
remote data users may desire is now available in SAM, but
using it is still difficult
- Rick believes SAM to be more or less in beta test now.
A lot of work is being done to make installation easier and
smoother, e.g. by distributing SAM software along with CDF offline
software. Some sample configurations will be validated, and
how to set up SAM Stations will be documented. Enthusiasts can
start with Ulrich's document on the Karlsruhe installation,
CDF6441
- SAM deployment schedule according to Rick:
- dataset import to remote institutions works now
- Monte Carlo production with output in File Catalog by October
- SAM-aware AC++ on CAF in the fall
- there is ample room for additional help on SAM
- Usage of the cdf_grid@fnal mailing list for SAM communication to users
has been suggested, since the current cdfsam_admin
list is heavily used by developers.
- Usage of SAM now requires being more of a
developer than a tester. In any case, in the end a SAM station is not
a piece of code; it is a cluster of interoperating computers, so
each site will need to be set up a bit differently, and operation of
a SAM station will require some intelligence, vision, and system
administration knowledge
- In parallel with SAM, CAF cloning (aka dCAF installation) is
now possible at remote institutions. There is a document available
and a person (at UCSD) who will help. Details can be obtained from Frank
- The CAF installation document also contains indications
on suggested hardware configurations
- The statement is that dCAF installation can be done in one month
at most by a person with minimal Linux system administration
knowledge, and maintenance is then a very modest load.
- Clearly CAF installation is also the setup of a computer
cluster, even more than SAM. Most of the workload will
presumably go into hardware maintenance and CDF software maintenance,
more than into CAF operation. Frank estimates 0.5 FTE as the
support needed for a well-functioning system
- Sites that have a dCAF installation can use the present CafGUI
for transparent operation with respect to FNAL-CAF. It is also
possible to set up a very-low-priority queue for usage by the other
CDF groups when computers are otherwise idle.
- Frank encourages sites to open their site in this way,
starting asap, so that we can gain experience in the political
side of computing sharing (still the most obscure aspect of
the GRID)
- For some sites job submission from the outside will not
be possible until JIM (the SAM/CAF GRID tools) is deployed, which
will provide authentication even for places that do not want to be
part of the FNAL Kerberos realm.
- JIM, and thus access to SAM/CAF via certificates instead
of tickets, will be available by the end of this year
- It is true on the other hand that e.g. Toronto is
producing millions of MC events a day and copying them to
FNAL for import into DFC/tape without any of the new tools.
Chair's comments
It may pay to slow down SAM/JIM development
a little bit and get out some more documentation to help
others to join and try it out. So far each new SAM-Station
has generated tons of e-mails on the cdfsam mailing list.
While a non-Kerberos-based authentication may be
essential for places that do not allow their computers to
be put in the FNAL realm, we can go a long way without
automatic resource brokering, deciding by "human intelligence"
where best to submit a job.
While some sites have managed to
tap into "GRID-labelled" hardware resources and are now eager to
open them to CDF, a top-down specification of CDF computing
needs that indicates how much each (or some) should
provide in order to allow CDF to do the physics is still
beyond the horizon. A voluntary and experimental approach to
resource sharing, as indicated by Frank, is likely the
best way to approach this.
Disclaimer: I have made a good-faith effort to report what speakers
said from my notes and memory, but both are fallible. Any responsibility
for omissions, errors, or misinterpretation is mine and I will
be glad to correct this page as soon as those are pointed out.
Stefano Belforte
Last modified: Mon May 26 17:01:47 CDT 2003