Changes between Initial Version and Version 1 of HetProcedures/Virus


Timestamp:
Aug 20, 2019 9:32:54 PM (5 years ago)
Author:
admin
Comment:

--

  • HetProcedures/Virus

    v1 v1  
     1== Virus Troubleshooting Procedures ==
     2
     3Tips and Tricks when trying to bring up the Virus array
     4
     5=== PON error while loading microcode ===
     6
     7If you fail to load microcode and get a PON cArc error, then a power supply on a controller
     8has probably failed.  I saw this when the +16.5 and +36 VDC power supplies on a controller
     9would not come on while trying to load the microcode. You will then have to turn off enclosures
     10and multiplexers one at a time in order to find the bad controller.  See [#testOrder Testing Order]
     11for a method of finding a bad controller.
     12
     13=== Power on/off order for the multiplexers and controllers ===
     14
     15I usually power on the multiplexers first, then I power on the controllers.  Since
     16the multiplexers supply the clock and sync signals to the controllers, I start the multiplexers
     17first.  For the same reason, I power down the multiplexers after powering off the controllers.
     18The system is not supposed to care in which order they are started, but I have not done
     19extensive testing to see if this is the case.  Chalk it up to paranoia.
     20
     21=== Ghost controllers ===
     22
     23Sometimes you will get an error message in the `vdas:/var/log/tcs_logs/virus/virus_server.log` file
     24that reads
     25
     26`ERROR [camra_hardware.cpp  :update_status       : 358] - hardware - Found reply, at global index 22, to RTD from controller ID 2 that was not present at initialization`
     27
     28The global index and command may be different from '22' and 'RTD'.
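
Since the index, command, and controller ID vary from one message to the next, it can help to pull them out of the log automatically. The following is a minimal sketch, not part of the TCS software: the function name `parse_ghost_line` and the regular expression are assumptions modeled only on the single example message quoted above, so other messages in the log may need a different pattern.

```python
import re

# Pattern modeled only on the one example message shown above (an assumption);
# it is not taken from the TCS source code.
GHOST_RE = re.compile(
    r"Found reply, at global index (?P<index>\d+), "
    r"to (?P<command>\S+) from controller ID (?P<controller_id>\d+) "
    r"that was not present at initialization"
)


def parse_ghost_line(line):
    """Return (global_index, command, controller_id), or None if no match."""
    match = GHOST_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("index")),
        match.group("command"),
        int(match.group("controller_id")),
    )


example = (
    "ERROR [camra_hardware.cpp  :update_status       : 358] - hardware - "
    "Found reply, at global index 22, to RTD from controller ID 2 "
    "that was not present at initialization"
)
print(parse_ghost_line(example))  # prints (22, 'RTD', 2)
```

The same function could be applied to each line of `virus_server.log` to list every ghost controller reported in a session.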
     29
     30This problem usually means that a controller has powered off on its own for some reason. We have
     31seen this happen if there is a strong spike on the power lines to the controllers.  You can determine
     32which multiplexer this failed unit is attached to by counting the 'alive' multiplexers starting from zero
     33and then seeing which spectrograph is on that multiplexer port.  This will be much easier when all
     34spectrographs are installed and connected to the correct multiplexer ports.
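
The counting step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which number to count up to, so the sketch takes it as an input `n` (presumably the ID from the log message), and the list of alive flags is hypothetical data that you would fill in by hand from what is actually powered on.

```python
def nth_alive(alive_flags, n):
    """Return the position of the n-th alive entry, counting alive ones from zero.

    alive_flags is a list of booleans, one per multiplexer in physical order
    (hypothetical input). Entries that are powered off are skipped when counting.
    """
    count = -1
    for position, alive in enumerate(alive_flags):
        if alive:
            count += 1
            if count == n:
                return position
    raise IndexError("fewer than n + 1 multiplexers are alive")


# Hypothetical example: four multiplexers, the second one is powered off.
flags = [True, False, True, True]
print(nth_alive(flags, 2))  # prints 3
```

Once the multiplexer position is known, the spectrograph on that port still has to be looked up from the current cabling.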
     35
     36=== Testing for a bad controller === #testOrder
     37
     38To find a single bad controller (or multiplexer), use the half-splitting method to
     39narrow the fault down to the controller:
     40
     41 1. Turn on only one side to localize the fault to one side; if
     42    that side works, test the other side.
     43
     44 2. Turn on half the multiplexers and all controllers on the faulty side; if those multiplexers/controllers
     45    work, turn on and test the other set of multiplexers/controllers.
     46
     47 3. Turn on half of those multiplexers/controllers;  if the system works,
     48    turn on and test the other half.
     49
     50 4. Turn on the suspect multiplexers/controllers one at a time to finally isolate the fault to
     51    a particular multiplexer.
     52
     53 5. Turn on only the faulty multiplexers and only one power supply to isolate
     54    half the spectrographs.
     55
     56 6. Once you are down to four spectrographs, you will need to have someone
     57    go up and unplug two spectrographs for testing until you find the faulty one.
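
The half-splitting idea behind the steps above can be sketched in a few lines. This is a simplified model, not a script that drives the hardware: it assumes exactly one faulty unit, and `system_works` is a hypothetical callback standing in for the manual step of powering on a subset and checking whether the system comes up. Unlike the manual procedure, it does not re-test the other half as a confirmation; it infers that the fault is there.

```python
def find_faulty(units, system_works):
    """Bisect a list of units to find the single faulty one.

    system_works(subset) must return True when the system comes up with only
    the units in subset powered on (hypothetical callback; in practice this
    is a person flipping switches and watching the result).
    Returns (faulty_unit, number_of_tests).
    """
    candidates = list(units)
    tests = 0
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        tests += 1
        if system_works(first_half):
            # The first half is good, so the fault is in the remainder.
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0], tests


# Hypothetical example: eight multiplexers, with "mux5" as the bad one.
units = ["mux%d" % i for i in range(8)]
result = find_faulty(units, lambda subset: "mux5" not in subset)
print(result)  # prints ('mux5', 3)
```

With eight units the fault is found in three tests instead of up to eight, which is why splitting in halves pays off when each test means walking to the enclosures.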
     58
     59
     60