From pyeh@phys.sinica.edu.tw Mon Mar  8 18:36:03 1999
Date: Thu, 03 Dec 1998 13:26:22 +0800 (CST)
From: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>
To: Yen-Chu Chen <chenyc@fnal.gov>, Antonio Wang Chan <tony@fnal.gov>
Cc: run2farms@fnal.gov
Subject: about Farm Batch System


Hi Yen-Chu and Tony,

    I have a question about FBS:  What is the main reason for using
a batch system for production?

    I understand that in a multi-user environment a queueing/batch
system is necessary for submitting jobs to remote nodes and keeping
the load balanced.  But a well-controlled farm system is basically
not a multi-user environment.  Only the production accounts are using
the system.  Load balancing is built into the design of the production
control/monitor software, and it should not rely on a batch system
to keep the load balanced.

    As for submitting jobs to worker nodes, commands like rsh
should suffice.

    As for the dependencies provided by FBS, they are nice but may not
be enough.  For example, if a tape contains 30 files and we only have
20 worker nodes, do you want to wait for the end of tape-copying of
all 30 files to start your reconstruction job?  No, you want to start
the reconstruction job as soon as there are 20 files on disk.  That means
you need to monitor the raw files on disk and determine when to start
reconstruction jobs.  Dependencies don't help in this case.
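
A minimal sketch of the monitoring described above, in Python.  The
spool directory, the ".raw" suffix and both function names are
assumptions made for illustration; they are not part of FBS or of any
Run 1 script.  The sketch also assumes a file only appears under its
final name once tape-copying of that file has finished.

```python
import os
import tempfile


def raw_files_on_disk(spool_dir, suffix=".raw"):
    """Return the sorted names of finished raw files in spool_dir."""
    return sorted(n for n in os.listdir(spool_dir) if n.endswith(suffix))


def ready_to_reconstruct(spool_dir, n_workers):
    """True once there is at least one raw file per worker node."""
    return len(raw_files_on_disk(spool_dir)) >= n_workers


if __name__ == "__main__":
    # Simulate tape-copying of 30 files with empty files in a temp directory.
    with tempfile.TemporaryDirectory() as spool:
        for i in range(30):
            open(os.path.join(spool, f"run{i:02d}.raw"), "w").close()
            if ready_to_reconstruct(spool, n_workers=20):
                print(f"start reconstruction after {i + 1} files")
                break
```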


    I think I must have missed some important points.  Please comment.

							Thanks,
							Ping

From chenyc@fnal.gov Mon Mar  8 18:36:21 1999
Date: Thu, 3 Dec 1998 00:12:58 -0600 (CST)
From: Yenchu Chen <chenyc@fnal.gov>
To: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>
Cc: Antonio Wang Chan <tony@fnal.gov>, run2farms@fnal.gov
Subject: Re: about Farm Batch System

Hi Yeh Ping,

>     I understand that in a multi-user environment a queueing/batch
> system is necessary for submitting jobs to remote nodes and keeping
> the load balanced.  But a well-controlled farm system is basically
> not a multi-user environment.  Only the production accounts are using
> the system.  Load balancing is built into the design of the production
> control/monitor software, and it should not rely on a batch system
> to keep the load balanced.

1. I would prefer that the batch system takes care of the load balance.
   Our duty is to submit jobs based on our need.

2. The farm batch system is not necessarily just for farming; it can also
   be used in analysis computing, where many users may submit
   jobs.

>     As for the dependencies provided by FBS, they are nice but may not
> be enough.  For example, if a tape contains 30 files and we only have
> 20 worker nodes, do you want to wait for the end of tape-copying of
> all 30 files to start your reconstruction job?  No, you want to start
> the reconstruction job as soon as there are 20 files on disk.  That means
> you need to monitor the raw files on disk and determine when to start
> reconstruction jobs.  Dependencies don't help in this case.

1. A similar question was raised at a meeting I attended. My recollection
   of the answer is that if one has 30 distributed files to analyze
   while only 20 processes are allowed at that moment, 20 files
   will be analyzed right away. The other 10 files will wait until
   processes become available.

2. The idea of distributing data on tape into smaller files on disk and
   analyzing them there is that while the worker nodes are busy analyzing
   data, the next job in the queue can start to get data onto disk. In that
   way we ideally keep all worker nodes busy all the time.


   I believe that one can do the same thing by running script files, but
I would think that it should be easier to use the farm batch system. We need
to write our own script file to produce the JDF file and submit the job. We
want it to run more or less automatically and continuously,
while remaining flexible enough to allow people to redistribute
computing power to certain streams as needed. I would concentrate
our effort on this controlling script and leave the load balancing to
FBS.


   Best regards,    Yen-Chu Chen
                    chenyc@fnal.gov
                    (630) 840-8871 (experiment)
                    (886)-(2) 2789-9681 (Inst. of Phys., Academia Sinica)



From GPYEH@fnald Mon Mar  8 18:36:49 1999
Date: Wed, 02 Dec 1998 23:37:04 -0600
From: GPYEH@fnald
To: chenyc, tony@fibi02
Subject: Re: about Farm Batch System


   Dear Yeh Ping,

   It may be that we let  CDF groups  submit their big jobs to the
   PC Farms.

   So,  we may have more than a few users.

   Also,  batch system should be more flexible than submitting 
   sequential jobs ...

   Thanks.

                                               Cheers,
                                                 gp

From GPYEH@fnald Mon Mar  8 18:36:58 1999
Date: Wed, 02 Dec 1998 23:38:31 -0600
From: GPYEH@fnald
To: chenyc, tony@fibi02
Subject: Re: about Farm Batch System


   by  "CDF groups"

   I meant    Physics groups
              (for example)

From chenyc@fnal.gov Mon Mar  8 18:38:17 1999
Date: Thu, 3 Dec 1998 10:16:15 -0600 (CST)
From: Yenchu Chen <chenyc@fnal.gov>
To: Igor Mandrichenko <ivm@hppc>
Cc: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>,
     Antonio Wang Chan <tony@fnal.gov>, run2farms@fnal.gov
Subject: Re: about Farm Batch System

Hi Igor,

> Your dumping job (not section!) dumps one file after another to the disk.
> After each file is ready, it submits a single-process processing job for the
> file.
> This way
> 
> 	- your workers will start immediately (providing there are free CPU
> 	nodes, of course) when they have some work to do;
> 	- your workers will not waste time waiting for the data;
> 	- eventually, they process all 30 files, no matter how many
> 	worker nodes we have, even if this number is not constant over time.

   I was thinking about this last night. It should be a good thing if we
can try it out. 

   Best regards,    Yen-Chu Chen
                    chenyc@fnal.gov
                    (630) 840-8871 (experiment)
                    (886)-(2) 2789-9681 (Inst. of Phys., Academia Sinica)



From pyeh@phys.sinica.edu.tw Mon Mar  8 18:38:45 1999
Date: Thu, 03 Dec 1998 18:12:15 +0800 (CST)
From: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>
To: Yenchu Chen <chenyc@fnal.gov>
Cc: Antonio Wang Chan <tony@fnal.gov>, run2farms@fnal.gov
Subject: Re: about Farm Batch System


Hello people on run2farms mailing list:

   I'm sorry for the lengthy mail....  And I'm not trying to offend
anyone.  I'm just attempting to see the merits of having FBS by
arguing on the other side...  8)

							cheers,
							Ping

=================================================================


Hi Yen-Chu,

> 1. I would prefer that the batch system takes care of the load balance.
>    Our duty is to submit jobs based on our need.

   There should be no dynamic load balancing in a farm system.  You know
how many worker nodes you have in advance.  You know how many worker
nodes you should use for a given set of raw data files in advance.
You don't have to determine that on a job-by-job basis.  It is a constant
when there is no change in hardware.

   Your control script will just submit a job consisting of a number of
tasks for worker nodes to do.  You may use FBS to do the dispatching,
or you may use a hand-made script to do the dispatching.  What's the
difference?  Why is FBS superior in this case?



> 
> 2. The farm batch system is not necessarily just for farming; it can also
>    be used in analysis computing, where many users may submit
>    jobs.

   I'm only talking about the batch system on the Run 2 production farm.
Are you saying you want to allow many users to submit analysis jobs
on the production farm?  I'm afraid we won't have that luxury.


> 1. A similar question was raised at a meeting I attended. My recollection
>    of the answer is that if one has 30 distributed files to analyze
>    while only 20 processes are allowed at that moment, 20 files
>    will be analyzed right away. The other 10 files will wait until
>    processes become available.
> 
> 2. The idea of distributing data on tape into smaller files on disk and
>    analyzing them there is that while the worker nodes are busy analyzing
>    data, the next job in the queue can start to get data onto disk. In that
>    way we ideally keep all worker nodes busy all the time.

   So you agree that dependencies do NOT help you.  Your
data-reconstruction jobs do not depend on the finishing of
any tape-copying jobs.  They can start as long as
	1. there are enough raw data on disk and
	2. there are enough processors to reconstruct them.
Whether or not both conditions are true can't be determined by
FBS alone.  You need to check them in your own control/monitor
script.  If you dispatch tasks to worker nodes by your own dispatch
script, you can know if condition 2 is met.  FBS is also one
alternative, but may not be absolutely necessary in this sense.
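
A sketch of how a hand-made dispatch script could keep track of both
conditions, in Python.  The Dispatcher class, its methods and the node
and file names are hypothetical and only illustrate the bookkeeping;
actually starting a task on a node (for example via rsh) is left out.

```python
class Dispatcher:
    """Tracks free worker nodes so the control script knows whether
    both start conditions (data on disk, free processors) hold."""

    def __init__(self, nodes):
        self.free = list(nodes)   # nodes with no task assigned
        self.busy = {}            # node -> file being reconstructed
        self.done = set()         # files already reconstructed

    def dispatch(self, files_on_disk):
        """Assign waiting files to free nodes; return the new assignments."""
        in_progress = set(self.busy.values())
        waiting = [f for f in files_on_disk
                   if f not in in_progress and f not in self.done]
        started = {}
        while waiting and self.free:
            node = self.free.pop(0)
            started[node] = self.busy[node] = waiting.pop(0)
        return started

    def finish(self, node):
        """Record that the task on this node ended and free the node."""
        self.done.add(self.busy.pop(node))
        self.free.append(node)


if __name__ == "__main__":
    d = Dispatcher(["node1", "node2"])
    print(d.dispatch(["a.raw", "b.raw", "c.raw"]))  # two files start
    d.finish("node1")
    print(d.dispatch(["a.raw", "b.raw", "c.raw"]))  # third file starts
```

Calling dispatch() again with the same file list is harmless: files
that are in progress or already done are skipped.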


> 
>    I believe that one can do the same thing by running script files, but
> I would think that it should be easier to use the farm batch system. We need
> to write our own script file to produce the JDF file and submit the job. We
> want it to run more or less automatically and continuously,
> while remaining flexible enough to allow people to redistribute
> computing power to certain streams as needed. I would concentrate
> our effort on this controlling script and leave the load balancing to
> FBS.
> 

    All of these (automatic, constant, redistribute) were accomplished
in Run 1 without FBS.  We had all the flexibility to reconfigure the
farm by ourselves.  For example, whenever I felt one worker node was not
stable, I could remove it from my worker node list and keep the jobs going,
even if I found it in the middle of a Friday night.  I would call/write
FCC experts to investigate it a little bit later.
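
Dropping an unstable node from a worker node list while jobs keep
going, as described above, might look roughly like this in Python.  The
function name, the round-robin assignment and the node names are
assumptions for illustration; the actual Run 1 scripts are not shown in
this thread.

```python
def assign_tasks(files, nodes, excluded=()):
    """Spread files over the usable nodes in round-robin order.

    Nodes listed in `excluded` (for example one that looks unstable)
    are skipped; the remaining nodes absorb their share of the work.
    """
    usable = [n for n in nodes if n not in excluded]
    if not usable:
        raise ValueError("no usable worker nodes left")
    return {f: usable[i % len(usable)] for i, f in enumerate(files)}


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    files = ["a.raw", "b.raw", "c.raw", "d.raw"]
    # node2 misbehaves: exclude it and keep going with the others.
    print(assign_tasks(files, nodes, excluded={"node2"}))
```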

    With FBS, the queues are defined by FCC experts.  Without redefining
the queues you will always have failures from the problematic node.

    This is one example of restrictions imposed by a batch system.

    I'm not saying that FBS is not useful for farm production.  We
eventually have to sit down and think hard about the detailed design
of the farm data flow, error prevention, error handling, ... etc.
All of these will be affected by how the job submission/monitor/report
system works.  Then we will have a better idea if FBS suits all our
needs, and if not, how it should be improved.

							Regards,
							Ping

From ivm@hppc Mon Mar  8 18:39:02 1999
Date: Thu, 03 Dec 1998 09:13:45 -0600
From: Igor Mandrichenko <ivm@hppc>
To: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>,
     Yen-Chu Chen <chenyc@fnal.gov>, Antonio Wang Chan <tony@fnal.gov>
Cc: run2farms@fnal.gov
Subject: Re: about Farm Batch System

On Dec 3,  1:26pm, Ping Yeh / Academia Sinica wrote:
> Subject: about Farm Batch System
>
> Hi Yen-Chu and Tony,
>
>     I have a question about FBS:  What is the main reason for using
> a batch system for production?
>
>     I understand that in a multi-user environment a queueing/batch
> system is necessary for submitting jobs to remote nodes and keeping
> the load balanced.  But a well-controlled farm system is basically
> not a multi-user environment.  Only the production accounts are using
> the system.  Load balancing is built into the design of the production
> control/monitor software, and it should not rely on a batch system
> to keep the load balanced.

There is another solution: load balancing is the responsibility of the batch
system, and users can spend more time on application programming
rather than load balancing. This is the approach we took in the design
of FBS.

>
>     As for submitting jobs to worker nodes, commands like rsh
> should suffice.

Yes, if you know where to start your process. But when you have 300
4-CPU nodes to choose from, this may not be an easy task...
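
To make the node-selection problem concrete, here is a small Python
sketch that picks the node with the most free CPUs.  It is not how FBS
chooses nodes; the pick_node function, the node names and the
"number of busy CPUs" bookkeeping are assumptions for illustration.

```python
def pick_node(busy_cpus, cpus_per_node=4):
    """Return the node with the most free CPUs, or None if all are busy.

    busy_cpus maps a node name to the number of CPUs in use on it.
    """
    if not busy_cpus:
        return None
    node = min(busy_cpus, key=busy_cpus.get)
    return node if busy_cpus[node] < cpus_per_node else None


if __name__ == "__main__":
    # 300 hypothetical 4-CPU nodes, all fully busy except one.
    farm = {f"node{i:03d}": 4 for i in range(300)}
    farm["node137"] = 1
    print(pick_node(farm))
```

Even this simple rule needs up-to-date load information from every
node, which is the bookkeeping a batch system does on the user's behalf.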

>     As for the dependencies provided by FBS, they are nice but may not
> be enough.  For example, if a tape contains 30 files and we only have
> 20 worker nodes, do you want to wait for the end of tape-copying of
> all 30 files to start your reconstruction job?  No, you want to start
> the reconstruction job as soon as there are 20 files on disk.  That means
> you need to monitor the raw files on disk and determine when to start
> reconstruction jobs.  Dependencies don't help in this case.

Well, I can give you a lot more examples where dependencies do not help.
It does not mean that there are no cases when they do help.

In the example above, you probably would want to start your first worker
immediately after the first file is ready, the second worker when the second
file is ready, and so on... Why wait for all 20 files to be there?

I understand that the dependency feature as it is implemented in FBS is not
the best solution for this problem. But there is another solution:

Your dumping job (not section!) dumps one file after another to the disk.
After each file is ready, it submits a single-process processing job for the
file.
This way

	- your workers will start immediately (providing there are free CPU
	nodes, of course) when they have some work to do;
	- your workers will not waste time waiting for the data;
	- eventually, they process all 30 files, no matter how many
	worker nodes we have, even if this number is not constant over time.
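
The scheme above can be sketched in Python, with a thread pool standing
in for the batch system.  This is only an illustration: dump_file,
reconstruct and run_tape are hypothetical names, and a real dumping job
would submit each processing job through FBS rather than to a local pool.

```python
from concurrent.futures import ThreadPoolExecutor


def dump_file(index):
    """Stand-in for copying one file from tape to disk; returns its name."""
    return f"raw_{index:02d}.dat"


def reconstruct(filename):
    """Stand-in for a single-process reconstruction job on one file."""
    return filename.replace("raw_", "reco_")


def run_tape(n_files, n_workers):
    """Dump files one at a time and submit a processing job after each."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each job is submitted as soon as its file has been dumped; the
        # pool queues jobs whenever all workers are busy.
        jobs = [pool.submit(reconstruct, dump_file(i)) for i in range(n_files)]
        return [job.result() for job in jobs]


if __name__ == "__main__":
    outputs = run_tape(n_files=30, n_workers=20)
    print(len(outputs), "files processed")
```

All 30 files get processed whatever value n_workers has, which is the
third point in the list above.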

There are a lot of ways to use FBS (one of them is not to use it at all,
by the way), as with any other batch system. Nobody suggests or advocates any
particular way of using it. This is completely up to you to decide.

On the other hand, if you see how we can improve one or another feature
of FBS, or you have an idea of some new feature which you believe would
make your life easier, please let us know. The whole purpose of setting
this prototype farm up is to let you try and come up with your suggestions.

Thanks for your comments.

Igor


-- 
Igor Mandrichenko
Computing Division
Fermi National Accelerator Laboratory
E-mail: ivm@fnal.gov

From ivm@hppc Mon Mar  8 18:39:28 1999
Date: Thu, 03 Dec 1998 10:39:50 -0600
From: Igor Mandrichenko <ivm@hppc>
To: Yenchu Chen <chenyc@fnal.gov>
Subject: Re: about Farm Batch System

On Dec 3, 10:16am, Yenchu Chen wrote:
> Subject: Re: about Farm Batch System
> Hi Igor,
>
> > Your dumping job (not section!) dumps one file after another to the disk.
> > After each file is ready, it submits a single-process processing job for the
> > file.
> > This way
> >
> > 	- your workers will start immediately (providing there are free CPU
> > 	nodes, of course) when they have some work to do;
> > 	- your workers will not waste time waiting for the data;
> > 	- eventually, they process all 30 files, no matter how many
> > 	worker nodes we have, even if this number is not constant over time.
>
>    I was thinking about this last night. It should be a good thing if we
> can try it out.
>

Yen-Chu,

Let us know if you need any help from us in trying it out.

Igor


-- 
Igor Mandrichenko
Computing Division
Fermi National Accelerator Laboratory
E-mail: ivm@fnal.gov

From wolbers@fnal.gov Mon Mar  8 18:41:09 1999
Date: Sat, 05 Dec 1998 15:17:47 -0600
From: S. Wolbers <wolbers@fnal.gov>
To: chenyc@fnal.gov, tony@fnal.gov, pyeh@fnal.gov
Subject: Batch on the farms

Ping,

    I tend to think that using a Fermilab-supported batch system for Run
II farming is an excellent idea.  I have not yet read through all of the
discussion that went on last week but some of my reasons are the
following:

    1. Something has to schedule and queue work on the farms.  A batch
system is a nice way (certainly a great deal nicer than some other
techniques) to do this.  The batch system gives an "environment" in
which to schedule work on the farms and to move that work around.  It
also allows a group of people to work together by use of the same tools. 

    2. Using a Fermilab supported batch system allows the CDF
collaboration to worry more about CDF-specific aspects of production.  

    3. The Fermilab system can easily be modified to accommodate CDF's (and
D0's) needs.  All that is required is to make requests and work with the
farms groups to get changes made.

    4. We should try FBS to see how well it works.  It certainly is not
a bad thing to give this a try, see what is needed, get modifications
made, and then make a decision as to how we use it.

    5. Using a batch system allows for much better monitoring of what is
going on.  It allows ease in scheduling as well (cancelling jobs,
starting jobs at certain times, etc.).  Monitoring is very important
with a system as large as the one we are going to use.  

Steve

From pyeh@phys.sinica.edu.tw Mon Mar  8 18:44:24 1999
Date: Tue, 08 Dec 1998 11:12:08 +0800 (CST)
From: Ping Yeh / Academia Sinica <pyeh@phys.sinica.edu.tw>
To: Steve Wolbers <wolbers@fnal.gov>
Cc: chenyc@fnal.gov, tony@fnal.gov, pyeh@fnal.gov
Subject: Re: Batch on the farms


Hi Steve,

    You made several excellent points on using a batch system.
I agree that using a batch system like FBS/LSF makes life easier
for most tasks, especially submitting and monitoring.

    As I said in my previous mail, Paoti and I are interested in
studying FBS/LSF from the point of view of error prevention, reporting,
and handling.  I think we'll find out if FBS can do what we want.

							cheers,
							Ping

