
Virus Troubleshooting Procedures

Tips and Tricks when trying to bring up the Virus array

PON error while loading microcode

If you fail to load microcode and get a PON cArc error, then a power supply on a controller has probably failed. I saw this when the +16.5 and +36 VDC power supplies on a controller would not come on while trying to load the microcode. You will then have to turn off enclosures and multiplexers one at a time in order to find the bad controller. See "Testing for a bad controller" below for a method of finding a bad controller.

Power on/off order for the multiplexers and controllers

I usually power on the multiplexers first and then the controllers, since the multiplexers supply the clock and sync signals to the controllers. For the same reason, I power off the controllers before powering down the multiplexers. The system is not supposed to care in which order they are started, and I have not done extensive testing to see if this is the case. Chalk it up to paranoia.

Ghost controllers

Sometimes you will get an error message in the vdas:/var/log/tcs_logs/virus/virus_server.log file that reads:

ERROR [camra_hardware.cpp :update_status : 358] - hardware - Found reply, at global index 22, to RTD from controller ID 2 that was not present at initialization

The global index and command may be different from '22' and 'RTD'.
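When scanning a long log for these messages, a small script can pull out the global index, command, and controller ID. The following is a minimal sketch in Python; the pattern is inferred only from the single sample line above, and the function name `find_ghost_controllers` is made up for illustration, so other message variants may need a different pattern.

```python
import re

# Pattern inferred from the one sample message above (an assumption);
# the real log may contain variants this does not match.
GHOST_RE = re.compile(
    r"Found reply, at global index (?P<index>\d+), "
    r"to (?P<command>\S+) from controller ID (?P<controller_id>\d+) "
    r"that was not present at initialization"
)


def find_ghost_controllers(lines):
    """Return (global_index, command, controller_id) for each matching line."""
    hits = []
    for line in lines:
        m = GHOST_RE.search(line)
        if m:
            hits.append(
                (int(m.group("index")), m.group("command"), int(m.group("controller_id")))
            )
    return hits


sample = (
    "ERROR [camra_hardware.cpp :update_status : 358] - hardware - "
    "Found reply, at global index 22, to RTD from controller ID 2 "
    "that was not present at initialization"
)
print(find_ghost_controllers([sample]))  # [(22, 'RTD', 2)]
```

To use it on the real file, pass an open file handle for virus_server.log as `lines`.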

This problem usually means that a controller has powered off on its own for some reason. We have seen this happen when there is a strong spike on the power lines to the controllers. You can determine which multiplexer the failed unit is attached to by counting the 'alive' multiplexers starting from zero and then seeing which spectrograph is on that multiplexer port. This will be much easier when all spectrographs are installed and connected to the correct multiplexer ports.

Testing for a bad controller

To find a single bad controller (or multiplexer), use the half-splitting method to isolate the fault down to the controller:

  1. Turn on only one side to localize the fault to one side; if that side works, test the other side.
  1. Turn on half the multiplexers and all controllers on the faulty side; if those multiplexers/controllers work, turn on and test the other set of multiplexers/controllers.
  1. Turn on half of those multiplexers/controllers; if the system works, turn on and test the other half.
  1. Turn on the faulty multiplexers/controllers one at a time to finally isolate the fault to a particular multiplexer.
  1. Turn on only the faulty multiplexers and only one power supply to isolate half the spectrographs.
  1. Once you are down to four spectrographs, you will need to have someone go up and unplug two spectrographs for testing until you find the faulty one.
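
The half-splitting idea in the steps above can be sketched in Python as follows. This only illustrates the search logic and is not a tool that exists on the system: `works_with` is a hypothetical callback standing in for "power on only these units and try to load microcode", the unit names are invented, and the sketch assumes exactly one faulty unit. Unlike the manual steps, it does not re-test the other half to confirm the fault.

```python
def isolate_faulty_unit(units, works_with):
    """Half-split `units` until a single faulty unit remains.

    works_with(subset) is a hypothetical callback: power on only `subset`,
    try to bring the system up, and return True if it works.
    Assumes `units` is non-empty and contains exactly one faulty unit.
    """
    candidates = list(units)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half = candidates[:mid]
        if works_with(half):
            # The powered half is good, so the fault is in the other half.
            candidates = candidates[mid:]
        else:
            # The powered half failed, so the fault is inside it.
            candidates = half
    return candidates[0]


# Simulated run: eight made-up multiplexer names, one of them bad.
muxes = [f"mux{i}" for i in range(8)]
bad = "mux5"
tests = []


def simulated_test(subset):
    tests.append(list(subset))
    return bad not in subset


print(isolate_faulty_unit(muxes, simulated_test))  # mux5
print(len(tests))  # 3
```

With eight units the fault is found in three power cycles instead of up to eight, which is the reason to split in halves rather than test units one at a time.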
Last modified on Aug 20, 2019 9:32:54 PM