Virtual Host and/or Network Switch outage recovery procedure

Shutdown before outage

For a planned outage, do the following before shutting down the virtual hosts and network switches:

  • on utility-apps.het.astronomy.utexas.edu
    1. shut down the apc server in order to cleanly close the network connections to the TDK power supplies
      1. sudo service apcServer stop
  • on htcs.het.astronomy.utexas.edu
    1. shut down all the servers to cleanly close the network connections
      1. sudo service pfipServer stop
      2. sudo service pasServer stop
      3. sudo service legacyServer stop
      4. sudo service trackerServer stop
      5. sudo service tcsServer stop
    2. Kill the caRepeater process
  • on lrs2
    1. Disable vacuum gauges and ion pumps in lrsServer
      1. syscmd -l "disable_pressure_gauges()"
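
The htcs shutdown order above can be sketched as a small shell helper. This is only a sketch under stated assumptions, not an existing HET tool: the function name stop_htcs_servers and the DRY_RUN switch are hypothetical, and only the service names and the "sudo service <name> stop" form come from the steps above. By default it prints the commands instead of running them.

```shell
# Sketch only: stop the htcs servers in the order listed in the procedure.
# HTCS_SERVERS, stop_htcs_servers and DRY_RUN are hypothetical names.
HTCS_SERVERS="pfipServer pasServer legacyServer trackerServer tcsServer"

stop_htcs_servers() {
    for svc in $HTCS_SERVERS; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Really stop the service; keep going if one of them fails.
            sudo service "$svc" stop || echo "warning: failed to stop $svc" >&2
        else
            # Default: only show what would be run.
            echo "sudo service $svc stop"
        fi
    done
}
```

Running "DRY_RUN=0 stop_htcs_servers" on htcs would issue the real stop commands. The caRepeater process still has to be killed separately, as in step 2.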

Restart and recovery

Once the virtual hosts and network switches are operating again the following steps are needed for recovery of operations.

  • on utility-apps.het.astronomy.utexas.edu
    1. shut down the running tcs services
      1. sudo service tcsNamed stop
      2. sudo service logRelay stop
      3. sudo service apcServer stop
    2. restart the tcs services
      1. sudo service tcsNamed start
      2. sudo service logRelay start
      3. sudo service apcServer start
    3. verify that the services are running
      1. run chksys; it should return a response like,
         hetdex   19373     1  0 102152 24116  1 Mar06 ?        00:00:45 /opt/het/hetdex/bin/tcsnamed --named-route tcp://192.168.66.99:30000 -c /opt/het/hetdex/etc/configfiles/tcsnamed.conf
         hetdex   19508     1  0 156388 95896  1 Mar06 ?        00:01:17 /opt/het/hetdex/bin/tcs_log_relay --named-route tcp://192.168.66.99:30000
         hetdex   11093     1  0 335329 14600  1 Mar06 ?        00:05:04 /opt/het/hetdex/bin/apcServer --named-route tcp://192.168.66.99:30000 -c /opt/het/hetdex/etc/configfiles/apcServer.conf
        
  • start a weather gui and verify that all readings are present. It may take 5 minutes for dust readings to appear. If weather readings are missing,
    1. try restarting the weather system from the tolauncher
    2. check the weather process logs in /data1/archive/weather/logs/
    3. log in to hetwx.het.astronomy.utexas.edu
      1. su - wx
      2. troubleshoot the programs in /home/hetwx/wx/weather_sys
    If truss temperatures are missing,
    4. use rdesktop to connect to trusst.het.astronomy.utexas.edu and log in as guider
    5. restart the Truss Temp Reader and Server - v3.3 via the shortcut on the desktop
  • on ute2.het.astronomy.utexas.edu
    1. verify that event monitors are running
      1. run chksys; it should return a response like,
        hetdex   12996     1  0 121132 6124   0 12:02 ?        00:00:37 /opt/het/hetdex/bin/tcs_monitor --named-route tcp://192.168.66.99:30000 --system-names pfip_epics_relay --key-filter .*status --db-file /tmp/MonLogs/20220307T120000.pfipEpicsRelay_db
        hetdex   13062     1  0 122182 11208  0 12:05 ?        00:00:25 /opt/het/hetdex/bin/tcs_monitor --named-route tcp://192.168.66.99:30000 --system-names tcp://scs-smoco1:10001 --db-file /tmp/MonLogs/20220307T120501.scsMonitor_db
        
        hetdex   12895     1  4 122291 10724  2 12:00 ?        00:12:05 /opt/het/hetdex/bin/event_monitor --named-route tcp://192.168.66.99:30000 --system-names apc,tracker,tcs,pas,pfip,legacy,lrs2,virus,log-relay,event-monitor,thermocube_side1_epics_relay,thermocube_side2_epics_relay,dimmpoller,pfipmaxon --system-filter .* --source-filter .* --key-filter .* --db-file /tmp/MonLogs/20220307T120000.db
        hetdex   12996     1  0 121132 6124   0 12:02 ?        00:00:37 /opt/het/hetdex/bin/tcs_monitor --named-route tcp://192.168.66.99:30000 --system-names pfip_epics_relay --key-filter .*status --db-file /tmp/MonLogs/20220307T120000.pfipEpicsRelay_db
        hetdex   12895     1  4 122291 10724  2 12:00 ?        00:12:05 /opt/het/hetdex/bin/event_monitor --named-route tcp://192.168.66.99:30000 --system-names apc,tracker,tcs,pas,pfip,legacy,lrs2,virus,log-relay,event-monitor,thermocube_side1_epics_relay,thermocube_side2_epics_relay,dimmpoller,pfipmaxon --system-filter .* --source-filter .* --key-filter .* --db-file /tmp/MonLogs/20220307T120000.db
        
  • on htcs.het.astronomy.utexas.edu
    1. Restart the Tcs servers
      1. sudo service pfipServer start
      2. sudo service pasServer start
      3. sudo service legacyServer start
      4. sudo service trackerServer start
      5. sudo service tcsServer start
    2. verify the servers are running by starting the tcsGui and checking for green lights.
    3. the caRepeater will start with the pfipServer
  • on lrs2
    1. Enable pressure gauges
      1. syscmd -l "enable_pressure_gauges()"
      2. if no connection, power off pressure gauges and ion pumps
      3. restart lrs2 server
      4. power on pressure gauges and ion pumps
  • launch rdesktop and connect to mirrormaster.het.astronomy.utexas.edu
    1. login as guider
    2. verify the Y:\ drive is connected to gracie::\common
    3. launch the CXRecorder from the desktop shortcut.
    4. on the Devices tab, select both accelerometers
    5. Click Start Recording
    6. close the remote desktop connection
  • on dome.het.astronomy.utexas.edu
    1. ensure that the DAS server has been restarted
  • on scs-smoco1
    1. ensure that SCS server has been restarted
      1. on tolauncher SCS->Restart SCS System
  • On poster
    1. ensure that het-operations email list is operational
      1. Restart sendmail (Not sure if this is all that needs to be done. May need to start mailman, but mailman won't remain started)
  • On izar
    1. Make sure the weather display is shown on the LCD on top of the control room rack
      1. With keyboard in rack, click button to reload web page.
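
The chksys verification steps above amount to confirming that each expected daemon appears in a process listing. A minimal sketch of that check follows. The function name missing_processes is hypothetical, and it assumes the listing has the same layout as the sample output in this procedure (binaries under a .../bin/ directory). It reads a listing on stdin, so it can be fed from "ps -eF" or from saved output.

```shell
# Sketch only: print the expected daemons that are absent from a process
# listing read on stdin, and return non-zero if any are missing.
# missing_processes is a hypothetical helper, not part of the HET tooling.
missing_processes() {
    listing=$(cat)      # read the whole listing once
    missing=""
    for name in "$@"; do
        # Match ".../bin/<name>" followed by a space or end of line.
        if ! printf '%s\n' "$listing" | grep -Eq -- "/bin/${name}( |\$)"; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "${missing# }"
        return 1
    fi
}
```

On utility-apps this could be used as "ps -eF | missing_processes tcsnamed tcs_log_relay apcServer", and on ute2 as "ps -eF | missing_processes tcs_monitor event_monitor". No output and a zero exit status mean every named process was found.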
Last modified on Oct 2, 2023 3:13:44 PM