| 1 | == Virus Troubleshooting Procedures == |
| 2 | |
| 3 | Tips and tricks for bringing up the Virus array. |
| 4 | |
| 5 | === PON error while loading microcode === |
| 6 | |
| 7 | If the microcode fails to load and you get a PON cArc error, a power supply on a controller |
| 8 | has probably failed. I saw this when the +16.5 and +36 VDC power supplies on a controller |
| 9 | would not come on while loading the microcode. You will then have to turn off enclosures |
| 10 | and multiplexers one at a time to find the bad controller. See [#testOrder Testing Order] |
| 11 | for a method of finding a bad controller. |
| 12 | |
| 13 | === Power on/off order for the multiplexers and controllers === |
| 14 | |
| 15 | I usually power on the multiplexers first and then the controllers. Because |
| 16 | the multiplexers supply the clock and sync signals to the controllers, I start the multiplexers |
| 17 | first. For the same reason, I power down the multiplexers after powering off the controllers. |
| 18 | The system is not supposed to care in which order they are started, but I have not done |
| 19 | extensive testing to see if this is the case. Chalk it up to paranoia. |
| 20 | |
| 21 | === Ghost controllers === |
| 22 | |
| 23 | Sometimes you will get an error message in the `vdas:/var/log/tcs_logs/virus/virus_server.log` file |
| 24 | that reads |
| 25 | |
| 26 | `ERROR [camra_hardware.cpp :update_status : 358] - hardware - Found reply, at global index 22, to RTD from controller ID 2 that was not present at initialization` |
| 27 | |
| 28 | The global index and command may be different from '22' and 'RTD'. |
| 29 | |
| 30 | This problem usually means that a controller has powered off on its own for some reason. We have |
| 31 | seen this happen when there is a strong spike on the power lines to the controllers. You can determine |
| 32 | which multiplexer the failed unit is attached to by counting the 'alive' multiplexers, starting from zero, |
| 33 | and then seeing which spectrograph is on that multiplexer port. This will be much easier when all |
| 34 | spectrographs are installed and connected to the correct multiplexer ports. |
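
When scanning `virus_server.log` for this message, it can help to pull out the global index, command, and controller ID automatically. The following is a minimal sketch, not part of the Virus software: the regular expression is inferred only from the single sample message quoted above, and the names `GHOST_RE` and `parse_ghost_line` are hypothetical. Other variants of the message may need a different pattern.

```python
import re

# Pattern inferred from the one sample message in this page (an assumption);
# the global index, command, and controller ID are the parts that may vary.
GHOST_RE = re.compile(
    r"Found reply, at global index (?P<index>\d+), "
    r"to (?P<command>\S+) from controller ID (?P<controller>\d+) "
    r"that was not present at initialization"
)


def parse_ghost_line(line):
    """Return the varying fields of a ghost-controller error, or None."""
    match = GHOST_RE.search(line)
    if match is None:
        return None
    return {
        "global_index": int(match["index"]),
        "command": match["command"],
        "controller_id": int(match["controller"]),
    }


sample = (
    "ERROR [camra_hardware.cpp :update_status : 358] - hardware - "
    "Found reply, at global index 22, to RTD from controller ID 2 "
    "that was not present at initialization"
)
print(parse_ghost_line(sample))
```

Lines that do not match return `None`, so the function can be applied to every line of the log and the non-matches filtered out.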
| 35 | |
| 36 | === Testing for a bad controller === #testOrder |
| 37 | |
| 38 | To find a single bad controller (or multiplexer), use the half-splitting method to |
| 39 | narrow the system down to the faulty controller: |
| 40 | |
| 41 | 1. Turn on only one side to localize the fault to one side; if |
| 42 | that side works, test the other side. |
| 43 | |
| 44 | 2. Turn on half the multiplexers and all controllers on the faulty side; if those multiplexers/controllers |
| 45 | work, turn on and test the other set of multiplexers/controllers. |
| 46 | |
| 47 | 3. Turn on half of those multiplexers/controllers; if the system works, |
| 48 | turn on and test the other half. |
| 49 | |
| 50 | 4. Turn on the faulty multiplexers/controllers one at a time to finally isolate the fault to |
| 51 | a particular multiplexer. |
| 52 | |
| 53 | 5. Turn on only the faulty multiplexer and only one power supply to isolate |
| 54 | half the spectrographs. |
| 55 | |
| 56 | 6. Once you are down to four spectrographs, someone will need to |
| 57 | go up and unplug two spectrographs at a time for testing until you find the faulty one. |
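
The half-splitting idea behind the steps above can be sketched in a few lines. This is only an illustration under stated assumptions: exactly one unit is faulty, and `works` is a hypothetical callback standing in for the manual step of powering on only the given units and checking whether the system comes up. Neither `find_faulty` nor `works` exists in the Virus software.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_faulty(units: Sequence[T], works: Callable[[List[T]], bool]) -> T:
    """Isolate the single faulty unit by repeatedly testing half the candidates."""
    if not units:
        raise ValueError("need at least one unit to test")
    candidates = list(units)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if works(half):
            # This half is good, so the fault is in the remaining units.
            candidates = candidates[len(half):]
        else:
            # This half fails, so the fault is somewhere inside it.
            candidates = half
    return candidates[0]


# Example with made-up unit numbers: unit 5 of 8 is the bad one.
bad_unit = 5
tested = []


def works(subset):
    tested.append(list(subset))  # record each power-on test
    return bad_unit not in subset


print(find_faulty(list(range(8)), works), "found in", len(tested), "tests")
```

Each test halves the number of suspects, so eight units need three power-on tests instead of up to eight when testing one at a time.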
| 58 | |
| 59 | |
| 60 | |