CDF RUN 1 M&C (MONITORING AND CONTROLS) SYSTEM -- AKA ACNET / ALARMS AND LIMITS

PARTIAL NOTE : UNDER CONSTRUCTION, version 0.3, John Yoh, 2/3/99
(1) INTRODUCTION AND OVERVIEW OF RUN 1 HV/LV/... ONLINE MONITORING
(2) ACNET/HV/LV ONLINE CONTROL AND MONITORING
(3) OTHER ONLINE DETECTOR MONITORING
(4) SOME GENERAL COMMENTS
(5) SUGGESTIONS FOR RUN 2
(A) APPENDIX A : SOME MAJOR FAILURES IN ONLINE MONITORING IN RUN 1

Note : The following preliminary note is based on my memory of what happened
in run 1, when I was 
  ...ACNET/Alarms-and-Limits Fermilab liaison with the Penn group (1990-1995),
  ...shift captain (1992-1994), and then 
  ...operations manager (1994-1996). 
I will have to check the log book and other records to verify some
of the statements below. If anyone notices an error, please send me email at
johny@fnal.gov. Thanks.

(1) INTRODUCTION AND OVERVIEW OF RUN 1 HV/LV/... ONLINE MONITORING

	The ACNET/A&L (Alarms and Limits) system for run 1 provided control 
and monitoring of HV for the detectors, as well as monitoring of LV/temp/....
During the period 1989 to 1996, the system was continuously upgraded and
performed the task reasonably well, thanks to the efforts of the CDF Fermilab
and Penn groups as well as the Fermilab support groups.
	We summarize below the system as it evolved during run 1, and discuss
some of the pros and cons of the system, including a post-mortem on some
of the major failures or near-failures, and suggestions for improvement
on the whole issue of quality control for run 2.

(2) ACNET/HV/LV ONLINE CONTROL AND MONITORING

	During the late run 1 running, the online HV/.. control/monitoring
(A&L --Alarms and Limits) scenario was as follows :
  (a) 3 ACNET consoles in the control room, 
	one in the middle of the control room displaying one of the 4 CDF 
	pages (E33, 34, 35, 36 ??), 
	  with an additional monitor just above it which displayed two graphics 
	  charts :
  (a1) one, a bar graph containing the HV info for the (17??) systems,
      with 2 bars per system (Min/Max), color coded :
	green for 95% to 105 % (??)
	purple for above 105%
	yellow for 25% to 95%
	red for below 25%   ?????
  (a2) the second, a table, contained the alarm status of (20 ??) systems,
      including the above systems as well as LV, GAMMA supply, etc. (total 
      500+ alarms possible). The limits for alarms and severe alarms were 
      adjustable, and were typically +- 5 V for PM's, and perhaps +- ?? for 
      chambers. The A&L system ran off an ACNET program and could be disabled. 
      Alarms were queried every few seconds, and if an alarm existed, a 
      DECTALK message was issued (roughly 10 seconds afterwards ??--this could
      be disabled). It was a real pain to bypass the few % of known bad
      channels (this meant changing their limits so that these known bad
      alarms would not trigger the A&L table). Alarm information was stored
      in disk files, and could be recalled.
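
      As a rough illustration of (a1) and (a2), the sketch below (Python,
      NOT the actual ACNET/A&L code) sorts a reading into the color bands
      quoted above and checks it against an alarm limit. The band edges are
      the uncertain (??) percentages from (a1), the +- 5 V limit is the
      typical PM value from (a2), and all function names, channel names and
      voltages are made up for illustration. Note that the bypass list is a
      different approach from run 1, where known bad channels were bypassed
      by changing their limits.

```python
# Hypothetical sketch only; not the run 1 ACNET/A&L code.
# Band edges follow the (uncertain) percentages quoted in (a1).

def bar_color(reading_v, nominal_v):
    """Return the bar color for a reading, as a fraction of nominal HV."""
    frac = reading_v / nominal_v
    if frac > 1.05:
        return "purple"   # above 105%
    if frac >= 0.95:
        return "green"    # 95% to 105%
    if frac >= 0.25:
        return "yellow"   # 25% to 95%
    return "red"          # below 25%


def in_alarm(reading_v, nominal_v, limit_v=5.0):
    """True if the reading is outside nominal +- limit_v (5 V assumed)."""
    return abs(reading_v - nominal_v) > limit_v


def active_alarms(readings, nominals, bypassed=frozenset(), limit_v=5.0):
    """List channels in alarm, skipping channels on the bypass list."""
    return sorted(
        ch for ch, v in readings.items()
        if ch not in bypassed and in_alarm(v, nominals[ch], limit_v)
    )


# Example with made-up channel names and voltages.
nominals = {"PM01": 1500.0, "PM02": 1500.0, "PM03": 1500.0}
readings = {"PM01": 1502.0, "PM02": 1490.0, "PM03": 0.0}
print(bar_color(readings["PM01"], nominals["PM01"]))
print(active_alarms(readings, nominals, bypassed={"PM03"}))
```

      Here PM03 stands for a known bad channel: it would show as a red bar,
      but is kept out of the alarm list by the bypass set.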

   In addition, 2 more ACNET consoles resided at the west end of the control
	room, each with an additional monitor above; these were used mainly
	as spares for the control and A&L system, as well as for when shots
	were in, and for luminosity and loss monitoring. Typically, integrated
	luminosity for the store (both delivered and recorded) was 
	continuously monitored during a run, along with such other items as
	beam rates : LOSTP, LOSTPB, Luminosity, current in the ring, and
	other losses (e.g., muon losses), etc.
   Information on certain items was stored on ACNET disks --such as the
	integrated luminosity every 10 minutes, etc. Other information
	such as losses, etc. could be stored, perhaps even at finer
	time intervals --but such storage was limited (???). 
   Another item was the hard copy unit (color plotter--real slow)--though 
	B&W copies could be routed to the control room laser printer (??)

 ** HV CONTROL **
	
	The shift people interacted with the HV control and monitoring
system via the ACNET terminals, using special programs invoked by clicking on
certain BUTTONS. For many of the HV systems, these allowed :
  ..Turning the HV of the various chambers ON, OFF or STANDBY (some systems)
  ..Adjusting some of the HV of individual units (e.g., PM tubes)--password
	protected. 

	A sheet of instructions was pasted next to the console (as well as
in various instruction books), which detailed :
  ..Which HV is to be turned ON or OFF, and when (e.g., some HV could be raised
	to STANDBY even before the scraping is finished)
  ..What is the order of the HV turn on to shorten the period of raising HV
	before data-taking (it usually takes about 5+ minutes for most of the
	chambers to be ready for data taking --this is quite costly, since
	5 minutes is about 1% of an 8 hour store, and 2% of the integrated
	luminosity, the initial lum being roughly twice the average lum)
  ..Certain chambers must be turned off (or to standby) before the store is
	dumped (again a few minutes wastage !!!).
  ..Certain devices must not be turned off (e.g., SVX could only be alternated
	between STANDBY and ON by shift people; turning it OFF would change the
	cooling scenario and could damage the device !!!)
  ..PM tube HV was not touched by shift people --even if a unit caused an
	alarm due to voltage fluctuations. Instead experts were called, and shift
	people only changed it under the supervision of the experts. 
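
	The cost estimate quoted for the HV turn-on period can be checked with
a few lines of arithmetic. The sketch below (Python) simply redoes that
estimate : the 5 minute ramp, the 8 hour store and the factor of 2 between
initial and average luminosity are the rough numbers given in the list above,
while the function name and the assumption that the luminosity stays at its
initial value during the short ramp are mine, for illustration only.

```python
def ramp_cost(ramp_min=5.0, store_hours=8.0, initial_over_average=2.0):
    """Return (fraction of store time lost, fraction of integrated lum lost).

    Assumes the luminosity during the short ramp equals the initial
    luminosity, taken as initial_over_average times the store average.
    """
    time_fraction = ramp_min / (store_hours * 60.0)
    lum_fraction = time_fraction * initial_over_average
    return time_fraction, lum_fraction


time_frac, lum_frac = ramp_cost()
print(f"time lost: {time_frac:.1%}, integrated luminosity lost: {lum_frac:.1%}")
```

	With the default numbers this gives about 1% of the store time and
about 2% of the integrated luminosity, in line with the figures quoted.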

 ** HV MONITORING **

	The HV and other items were monitored by the A&L (Alarms and Limits)
system of ACNET, written by the Penn group and Maki Sekiguchi.

(3) OTHER ONLINE DETECTOR MONITORING

	Other online monitoring includes :
  (3A) ACNET accessible items :
	Accelerator items such as cryogenics, machine performance,
	  
  (3B) non-ACNET items
       GASMON
       data-taking online consumers such as YMON, LUMMON, TRIGMON, PHYSMON, 
	  SPY, 
  (3C) miscellaneous 

(4) SOME GENERAL COMMENTS

(to be completed)

(5) SUGGESTIONS FOR RUN 2

	Monitoring and Controls of HV/LV/misc for run 2 should build on our
experience in run 1, providing at least the functionality available in run
1 and adding other desirable features.

	The current plan to use the PC based FIX-Dynamics commercial package
is a step in the right direction. It should result in a more robust and
more flexible system.

(A) APPENDIX A : SOME MAJOR FAILURES IN ONLINE MONITORING IN RUN 1

	A list of the major online monitoring failures is given below :

(A1) SVX temperature bump of ??/??/95?? On one occasion, the SVX temperature
	rose (20??) degrees due to a failure of the cooling system(??)

(A2) ACNET System crashes
      ACNET system crashes occurred rarely, usually in tandem with machine
	breakdowns or power outages. We probably lost no more than a few
	hours of beam time due to ACNET system crashes.
      ACNET problems

(A3) Equipment failures
      There were many occasions where equipment failed, such as a console.
	Since there were spares, this usually did not mean downtime.

(A4) Noise problems : During a period early in run 1A, there were occasions
	where an erroneous command was received by the program, sometimes
	setting the voltage to a dangerously high level, way above the normal
	on-voltage. This was determined to be a noise problem; subsequently,
	all commands were sent twice, requiring a confirming 2nd sending before
	the program would accept the command. The problem disappeared. The 1st
	floor electronics room is a very noisy environment; there was also
	some concern about using a walkie-talkie there (a wired intercom system
	was available and recommended instead, unlike in the collision hall,
	where walkie-talkies were used). 
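
	The send-twice fix in (A4) can be sketched as follows. This is only an
	illustration in Python of the idea of requiring a confirming 2nd
	sending; how the real program compared the two sendings is not
	recorded here, and the class name, the command format and the
	rule that a mismatch is held as a new first copy are all assumptions.

```python
class ConfirmedCommandReceiver:
    """Accept a command only when the same command arrives twice in a row."""

    def __init__(self):
        self._pending = None   # first copy, waiting for its confirmation

    def receive(self, command):
        """Return the command if this copy confirms it, else None."""
        if self._pending is not None and command == self._pending:
            self._pending = None
            return command          # second identical copy: accept
        self._pending = command     # first copy, or mismatch: hold and wait
        return None


rx = ConfirmedCommandReceiver()
set_hv = ("SET_HV", "CH12", 1500)          # made-up command tuple
assert rx.receive(set_hv) is None          # first sending: not yet accepted
assert rx.receive(set_hv) == set_hv        # confirming sending: accepted

# A single noise-corrupted copy does not match its partner, so it is dropped.
assert rx.receive(("SET_HV", "CH12", 9500)) is None
assert rx.receive(set_hv) is None          # mismatch: held as a new first copy
assert rx.receive(set_hv) == set_hv
```

	The point of the design is that one corrupted sending can never be
	acted upon by itself; the same corruption would have to occur on two
	successive sendings for a bad voltage to be set.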

(A5) Pilot errors : (If my memory is correct) Very early on, there was a period 
	when the PMs of the CHA/WHA/CEM were under the control of the ACNET
	system. Some unauthorized changes of voltages (??) occurred, which got
	Paolo really mad--he pulled the system from ACNET. The system was only 
	reinstalled after control was password protected (??).