
Details on Fixing problems

Trouble restarting the Camra servers

  • Symptom:
Traceback (most recent call last):
  File "/opt/het/hetdex/bin/syscmd", line 173, in <module>
    result = eval( 'cli.' + command )
  File "<string>", line 1, in <module>
  File "<string>", line 3, in get_hardware_status_method
pytcs.error: remote exception: Failed to update hardware state. 
  • Solution:
    • First, try to reset the camra servers in the usual way.
    • If the camra server will not come up, try the following:
The hardware was in a faulted state and refusing to respond to any commands.  Here is the error I see in /var/log/vdas.log from Steve's attempt to restart vdas:

  what():  ( CArcMuxPCIe::Command() ): ( CArcMuxPCIe::ReadReply() ): [ Mux Id: 0x0 ] Time Out [ 31 sec ] while reading reply registers [ status: 0x80 ]!
Exception Details: 0x202 GID 0xffffffff 0xffffffff 0xffffffff 0xffffffff

This indicates a problem with the ARC hardware and/or software, and the only recourse I have is to power cycle and restart things (starting with the readout controllers).
The controllers for both LRS2 and VIRUS are on the same power supplies, so we have to stop both ldas and vdas any time we power cycle the controllers associated with either of them.

# logged in to lrs2
sudo /etc/init.d/ldas stop
# logged in to vdas
$HET_SRC_ROOT/camra/testing/cycle-readout-controller-power.sh
sudo /etc/init.d/vdas restart
# logged back in to lrs2
sudo /etc/init.d/ldas restart

Recall that we can monitor the log output from either of these with

 tail -f /var/log/vdas.log

and/or

tail -f /var/log/ldas.log
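
Rather than reading the log by eye, a quick check for the ARC time-out signature quoted above can be sketched as follows. has_arc_timeout is a hypothetical helper name, not part of the HET tools, and the search pattern is taken from the single log excerpt above, so it assumes other occurrences of the fault look the same.

```shell
#!/bin/sh
# Sketch only: has_arc_timeout is a hypothetical helper, not an HET tool.
# It exits with status 0 if the given log file contains the ARC reply
# time-out message quoted in the log excerpt above, non-zero otherwise.
has_arc_timeout() {
    log_file=$1
    # -q: print nothing; the exit status alone says whether a line matched.
    grep -q 'CArcMuxPCIe::ReadReply.*Time Out' "$log_file"
}

# Example (log path taken from the text above):
#   has_arc_timeout /var/log/vdas.log && echo "ARC time out seen"
```

If the check succeeds, the power cycle sequence above is the likely remedy; if it fails, the restart problem probably has a different cause.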

Data is not transferring or problems with pivot

If the files have already been synced, running the $HET_SRC_ROOT/camra/testing/proxy-pivot-<instrument>-data.sh script should produce no output, aside from the motd banner shown when the copy to TACC happens.
If there are any files under /mnt/camra_ramfs (on either lrs2 or vdas) and this script produces no output, then the files have been copied.
Just for safety's sake, one should check that the local file system and the HET RAID array have been synced.

Restart the pivot with 'sudo /etc/init.d/vdas_pivot restart'.  You can watch the pivot console with 'tail -f /var/log/vdas_pivot.log'.

Here is how to check the file synchronization. Once you are happy that the files under /mnt/camra_ramdisk have been duplicated, you can rm * in that directory:

sudo su - hetdex

$HET_SRC_ROOT/camra/testing/proxy-pivot-virus-data.sh 
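
The comparison between the local file system and the RAID copy could be scripted along these lines before removing anything. This is a minimal sketch: all_files_synced is a hypothetical helper, and the destination directory is an assumption, since this page does not name the RAID mount point.

```shell
#!/bin/sh
# Sketch only: all_files_synced is a hypothetical helper, not an HET tool.
# Usage: all_files_synced SRC_DIR DST_DIR
# Exits 0 if every regular file under SRC_DIR has a byte-identical copy at
# the same relative path under DST_DIR, 1 if any file is missing or differs,
# and 2 if either directory cannot be entered.
all_files_synced() {
    # Resolve both arguments to absolute paths.
    src=$(cd -- "$1" >/dev/null && pwd) || return 2
    dst=$(cd -- "$2" >/dev/null && pwd) || return 2

    # Collect the relative paths of files that are missing or different.
    unsynced=$(find "$src" -type f | while IFS= read -r path; do
        rel=${path#"$src"/}
        # cmp -s is silent and returns non-zero on a mismatch or missing file.
        cmp -s "$path" "$dst/$rel" || printf '%s\n' "$rel"
    done)

    if [ -n "$unsynced" ]; then
        printf 'not synced:\n%s\n' "$unsynced" >&2
        return 1
    fi
    return 0
}
```

For example, `all_files_synced /mnt/camra_ramdisk <RAID directory> && echo "safe to remove"`, where the RAID directory is whatever path holds the copied data. A non-zero exit status means at least one file should not be removed yet.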


Last modified on May 2, 2016 7:49:17 AM