Changes between Initial Version and Version 1 of HetProcedures/Virus

Timestamp:: Aug 20, 2019 9:32:54 PM (5 years ago)
Author:: admin
Comment:: --

HetProcedures/Virus

                       v1
+== Virus Troubleshooting Procedures ==
+Tips and tricks for bringing up the Virus array.
+=== PON error while loading microcode ===
+If the microcode fails to load and you get a PON cArc error, a power supply on a controller
+has probably failed.  I saw this when the +16.5 and +36 VDC power supplies on a controller
+would not come on while trying to load the microcode.  You will then have to turn off enclosures
+and multiplexers one at a time to find the bad controller.  See [#testOrder Testing Order]
+for a method of finding a bad controller.
+=== Power on/off order for the multiplexers and controllers ===
+I usually power on the multiplexers first and then the controllers, because
+the multiplexers supply the clock and sync signals to the controllers.
+For the same reason I power off the controllers before powering off the multiplexers.
+The system is not supposed to care in which order they are started, but I have not done
+extensive testing to see if this is the case.  Chalk it up to paranoia.
+=== Ghost controllers ===
+Sometimes you will get an error message in the `vdas:/var/log/tcs_logs/virus/virus_server.log` file
+that reads:
+`ERROR [camra_hardware.cpp  :update_status       : 358] - hardware - Found reply, at global index 22, to RTD from controller ID 2 that was not present at initialization`
+The global index and command may be different from '22' and 'RTD'.
+This problem usually means that a controller has powered off on its own for some reason. We have
+seen this happen if there is a strong spike on the power lines to the controllers.  You can determine
+which multiplexer this failed unit is attached to by counting the 'alive' multiplexers starting from zero
+and then seeing which spectrograph is on that multiplexer port.  This will be much easier when all
+spectrographs are installed and connected to the correct multiplexer ports.
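
A minimal sketch of how these messages could be pulled out of the log for review. The pattern is built only from the single example line quoted above, so the assumption that every ghost-controller message has exactly this wording is untested. The names `GHOST_RE`, `parse_ghost_line` and `find_ghosts` are made up for illustration and are not part of the Virus software.

```python
import re

# Pattern derived from the one example message quoted above.
# Assumption: other ghost-controller messages use the same wording.
GHOST_RE = re.compile(
    r"Found reply, at global index (?P<index>\d+), "
    r"to (?P<command>\S+) from controller ID (?P<controller>\d+) "
    r"that was not present at initialization"
)


def parse_ghost_line(line):
    """Return (global_index, command, controller_id), or None if no match."""
    match = GHOST_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("index")),
        match.group("command"),
        int(match.group("controller")),
    )


def find_ghosts(lines):
    """Collect the distinct ghost-controller reports from an iterable of log lines."""
    seen = []
    for line in lines:
        hit = parse_ghost_line(line)
        if hit is not None and hit not in seen:
            seen.append(hit)
    return seen
```

To use it, pass the open log file to `find_ghosts`, for example `find_ghosts(open(path))`, where `path` is wherever the log is readable from. The result only reports the global index, command and controller ID; mapping those to a multiplexer port still has to be done by hand as described above.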
+=== Testing for a bad controller === #testOrder
+To find a single bad controller (or multiplexer), use the half-splitting method to
+isolate the fault down to the controller:
+. Turn on only one side to localize the fault to one side; if
+    that side works, test the other side.
+. Turn on half the multiplexers and all controllers on the faulty side; if those multiplexers/controllers
+    work, turn on and test the other set of multiplexers/controllers.
+. Turn on half of those multiplexers/controllers; if the system works,
+    turn on and test the other half.
+. Turn on the faulty multiplexers/controllers one at a time to finally isolate the fault to
+    a particular multiplexer.
+. Turn on only the faulty multiplexer and only one power supply to isolate
+    half the spectrographs.
+. Once you are down to four spectrographs, you will need to have someone
+    go up and unplug two spectrographs for testing until you find the faulty one.
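
The half-splitting idea in the steps above can be sketched in a few lines. This is only an illustration of the search logic, not a tool that exists on the system: `system_works` is a hypothetical callback standing in for "power on only these units and try to load the microcode", and the sketch assumes exactly one faulty unit. Unlike the manual procedure, it does not power on the other half to confirm the fault; it infers it.

```python
def isolate_fault(units, system_works):
    """Half-split `units` until a single faulty unit remains.

    `system_works(subset)` is a hypothetical callback: power on only the
    units in `subset`, try to bring the system up, and return True on
    success. Assumes exactly one unit in `units` is faulty.
    """
    candidates = list(units)
    if not candidates:
        raise ValueError("need at least one unit to test")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # If the first half comes up cleanly, the fault must be in the
        # second half; otherwise it is somewhere in the first half.
        candidates = second if system_works(first) else first
    return candidates[0]
```

With eight units this takes three power cycles instead of up to eight, which is the reason for splitting in halves rather than testing one unit at a time from the start.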