Virtual Host and/or Network Switch outage recovery procedure
Shutdown before outage
For a planned outage, do the following before shutting down the virtual hosts and network switches:
- on utility-apps.het.astronomy.utexas.edu
- shut down the apc server in order to cleanly close the network connections to the TDK power supplies
- sudo service apcServer stop
- on htcs.het.astronomy.utexas.edu
- shut down all the servers to cleanly close the network connections
- sudo service pfipServer stop
- sudo service pasServer stop
- sudo service legacyServer stop
- sudo service trackerServer stop
- sudo service tcsServer stop
- Kill the caRepeater process
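The htcs shutdown steps above can be sketched as a small bash function, assuming the service names listed in this procedure. `DRY_RUN` is a hypothetical convenience flag (not part of the TCS tooling) that prints each command instead of running it. Using `sudo` for `pkill` is also an assumption, made because caRepeater is started alongside the sudo-started pfipServer.

```shell
#!/usr/bin/env bash
# Sketch: stop the TCS servers on htcs in the order given in the procedure,
# then kill the caRepeater process.
# DRY_RUN=1 is a hypothetical flag: print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stop_tcs_servers() {
  local svc
  # Order copied from the shutdown list
  for svc in pfipServer pasServer legacyServer trackerServer tcsServer; do
    # Keep going if one server fails to stop, but report it
    run sudo service "$svc" stop || echo "warning: $svc did not stop cleanly" >&2
  done
  # caRepeater is not managed by 'service'; match its process name exactly
  # (sudo is an assumption, in case it runs as another user)
  run sudo pkill -x caRepeater || echo "note: no caRepeater process was found" >&2
}
```

Preview with `DRY_RUN=1 stop_tcs_servers`, then run `stop_tcs_servers` for real. The loop deliberately continues past a failed stop so that the remaining network connections still get closed.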
- on lrs2
- Disable vacuum gauges and ion pumps in lrsServer
- syscmd -l "disable_pressure_gauges()"
Restart and recovery
Once the virtual hosts and network switches are operating again, the following steps are needed to recover operations.
- on utility-apps.het.astronomy.utexas.edu
- shut down the running tcs services
- sudo service tcsNamed stop
- sudo service logRelay stop
- sudo service apcServer stop
- restart the tcs services
- sudo service tcsNamed start
- sudo service logRelay start
- sudo service apcServer start
- verify that the services are running
- run chksys; it should return a response like:
hetdex 19373 1 0 102152 24116 1 Mar06 ? 00:00:45 /opt/het/hetdex/bin/tcsnamed --named-route tcp://192.168.66.99:30000 -c /opt/het/hetdex/etc/configfiles/tcsnamed.conf
hetdex 19508 1 0 156388 95896 1 Mar06 ? 00:01:17 /opt/het/hetdex/bin/tcs_log_relay --named-route tcp://192.168.66.99:30000
hetdex 11093 1 0 335329 14600 1 Mar06 ? 00:05:04 /opt/het/hetdex/bin/apcServer --named-route tcp://192.168.66.99:30000 -c /opt/het/hetdex/etc/configfiles/apcServer.conf
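The utility-apps stop/start/verify sequence above can be sketched in bash. This is a sketch under assumptions: `DRY_RUN` is a hypothetical preview flag, and `check_processes` assumes chksys writes its process listing to stdout in the format of the sample output, where the executables appear as `.../tcsnamed`, `.../tcs_log_relay` and `.../apcServer`.

```shell
#!/usr/bin/env bash
# Sketch of the utility-apps recovery: stop the three TCS services, start
# them again in the listed order, then check a process listing.
# DRY_RUN=1 is a hypothetical flag that prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_utility_services() {
  local svc
  for svc in tcsNamed logRelay apcServer; do
    # A service that is already down may return non-zero; ignore that here
    run sudo service "$svc" stop || true
  done
  # tcsNamed goes first, as in the procedure
  for svc in tcsNamed logRelay apcServer; do
    run sudo service "$svc" start || { echo "error: $svc failed to start" >&2; return 1; }
  done
}

# Read a process listing (for example chksys output) on stdin and print
# "MISSING: <name>" for each expected executable that does not appear.
# Returns 0 only if every name was found.
check_processes() {
  local listing name status=0
  listing=$(cat)
  for name in "$@"; do
    if ! grep -qE -- "/${name}( |\$)" <<<"$listing"; then
      echo "MISSING: $name"
      status=1
    fi
  done
  return "$status"
}
```

Typical use: `restart_utility_services && chksys | check_processes tcsnamed tcs_log_relay apcServer`. Note that the executable names checked here are the lowercase binary names from the sample output, not the service names passed to `service`.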
- start a weather gui and verify that all readings are present. Dust readings may take 5 minutes to appear.
If any weather readings are missing:
- try restarting the weather system from the tolauncher
- check the weather process logs in /data1/archive/weather/logs/
- login to hetwx.het.astronomy.utexas.edu
- su - wx
- troubleshoot the programs in /home/hetwx/wx/weather_sys
- use rdesktop to connect to trusst.het.astronomy.utexas.edu and log in as guider
- restart the Truss Temp Reader and Server - v3.3 via the shortcut on the desktop
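For the "check the weather process logs" step, a quick way to look at the most recently written logs is sketched below. The directory comes from this procedure; the file names inside it are not documented here, so the function simply takes the newest files by modification time. The count and line-number defaults are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: show the tail of the most recently modified files in the weather
# log directory. Log file names are not documented, so this just picks the
# newest entries by modification time.

show_recent_logs() {
  local dir=${1:-/data1/archive/weather/logs} count=${2:-3} lines=${3:-20} f
  # ls -t sorts newest first; assumes file names contain no newlines
  ls -1t -- "$dir" | head -n "$count" | while IFS= read -r f; do
    [ -f "$dir/$f" ] || continue   # skip subdirectories
    echo "==> $dir/$f <=="
    tail -n "$lines" -- "$dir/$f"
  done
}
```

Run `show_recent_logs` for the defaults, or for example `show_recent_logs /data1/archive/weather/logs 5 50` to see more files and more lines per file.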
- on ute2.het.astronomy.utexas.edu
- verify that the event monitors are running
- run chksys; it should return a response like:
hetdex 12996 1 0 121132 6124 0 12:02 ? 00:00:37 /opt/het/hetdex/bin/tcs_monitor --named-route tcp://192.168.66.99:30000 --system-names pfip_epics_relay --key-filter .*status --db-file /tmp/MonLogs/20220307T120000.pfipEpicsRelay_db
hetdex 13062 1 0 122182 11208 0 12:05 ? 00:00:25 /opt/het/hetdex/bin/tcs_monitor --named-route tcp://192.168.66.99:30000 --system-names tcp://scs-smoco1:10001 --db-file /tmp/MonLogs/20220307T120501.scsMonitor_db
hetdex 12895 1 4 122291 10724 2 12:00 ? 00:12:05 /opt/het/hetdex/bin/event_monitor --named-route tcp://192.168.66.99:30000 --system-names apc,tracker,tcs,pas,pfip,legacy,lrs2,virus,log-relay,event-monitor,thermocube_side1_epics_relay,thermocube_side2_epics_relay,dimmpoller,pfipmaxon --system-filter .* --source-filter .* --key-filter .* --db-file /tmp/MonLogs/20220307T120000.db
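The ute2 check can be automated roughly as follows, assuming chksys prints one process per line on stdout. The three patterns are taken from the sample output (one event_monitor, plus tcs_monitor instances for pfip_epics_relay and scs-smoco1) and may need adjusting if the monitor set changes.

```shell
#!/usr/bin/env bash
# Sketch: check a process listing (for example chksys output on ute2) for the
# three monitors in the sample output. Assumes one process per line; the
# patterns come from the sample and may need adjusting.

check_monitors() {
  local listing status=0
  listing=$(cat)
  if ! grep -q '/event_monitor ' <<<"$listing"; then
    echo "MISSING: event_monitor"; status=1
  fi
  if ! grep -q '/tcs_monitor .*--system-names pfip_epics_relay' <<<"$listing"; then
    echo "MISSING: tcs_monitor for pfip_epics_relay"; status=1
  fi
  if ! grep -q '/tcs_monitor .*--system-names tcp://scs-smoco1' <<<"$listing"; then
    echo "MISSING: tcs_monitor for scs-smoco1"; status=1
  fi
  return "$status"
}
```

Typical use: `chksys | check_monitors`. It prints nothing and returns 0 when all three monitors are present.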
- on htcs.het.astronomy.utexas.edu
- Restart the TCS servers
- sudo service pfipServer start
- sudo service pasServer start
- sudo service legacyServer start
- sudo service trackerServer start
- sudo service tcsServer start
- verify that the servers are running by starting the tcsGui and checking for green lights.
- the caRepeater process starts automatically with the pfipServer
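The htcs restart can be sketched the same way as the shutdown, again with a hypothetical `DRY_RUN` preview flag. Unlike the shutdown, this sketch stops at the first server that fails to start, since later servers may depend on earlier ones. That is a design assumption, not something this procedure states. The tcsGui green-light check is still the authoritative verification.

```shell
#!/usr/bin/env bash
# Sketch: start the TCS servers on htcs in the listed order and warn if
# caRepeater did not come up with pfipServer.
# DRY_RUN=1 is a hypothetical flag that prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_tcs_servers() {
  local svc
  for svc in pfipServer pasServer legacyServer trackerServer tcsServer; do
    # Stop at the first failure so it can be investigated before continuing
    run sudo service "$svc" start || { echo "error: $svc failed to start" >&2; return 1; }
  done
  # caRepeater is expected to start with pfipServer; only check on a real run
  if [ "${DRY_RUN:-0}" != "1" ] && ! pgrep -x caRepeater >/dev/null; then
    echo "warning: caRepeater is not running" >&2
  fi
}
```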
- on lrs2
- Enable pressure gauges
- syscmd -l "enable_pressure_gauges()"
- if there is no connection, power off the pressure gauges and ion pumps
- restart the lrs2 server
- power on the pressure gauges and ion pumps
- launch rdesktop and connect to mirrormaster.het.astronomy.utexas.edu
- log in as guider
- verify the Y:\ drive is connected to gracie::\common
- launch the CXRecorder from the desktop shortcut.
- on the Devices tab, select both accelerometers
- click the Start Recording button
- close the remote desktop connection
- on dome.het.astronomy.utexas.edu
- ensure that the DAS server has been restarted
- on scs-smoco1
- ensure that SCS server has been restarted
- on tolauncher SCS->Restart SCS System
- On poster
- ensure that het-operations email list is operational
- Restart sendmail. (It is not certain that this is all that is needed: mailman may also need to be started, but mailman does not stay running once started.)
- On izar
- Make sure the weather display is shown on the LCD on top of the control room rack
- Using the keyboard in the rack, click the button to reload the web page.
Last modified on Oct 2, 2023 3:13:44 PM