
Software Policy for Night Staff

Version April 29, 2021

All night staff members may work on projects or produce software* which are used in science and engineering operations. Software developed by night staff must follow these guidelines in order to be integrated and maintained successfully in our ever-growing software landscape, with support from the software engineers in west Texas and in Austin.

Goals of software version control:

  • flexible modification of code as needed
  • reverting errors associated with changes
  • developing code that depends on other code
  • backup and restore capabilities
  • avoiding code that only a single user can understand/fix

Guidelines for night staff software development

  • Users have and use their own user accounts on all systems:
    • software is designed to be run by all users, not by a common user (e.g., astronomer or guider)
    • users make changes and run code as their own user, not as a common user
  • All operational software is stored in repositories
    • includes all code that:
      • interacts with telescope hardware
      • is intended to be run by more than one user
      • is run as part of science/engineering operations
    • repositories are currently managed in SVN on cetus
    • Chris can create new repositories on request
    • Each repository should have a script called "install" (or something similar) which contains all necessary information to install the code. Ask Chris if you need help.
    • Changes (commits) must include comments describing the nature of the change.
  • Software is installed onto the computers (zeus/janus/juno) from these repositories
    • Chris, Jim, Sergey, and Steven J can all perform installations
    • improvements to and standardization of our installation procedures are underway; feedback is welcome

Best practices in night staff software

  • In Python, prefer version 3 over version 2, for long-term support
  • Python scripts should use argparse (or similar) to provide help from "-h", to improve their usability
    • non-Python codes should also produce their own documentation/instructions
    • all code beyond a reasonable complexity should have additional documentation on this wiki as necessary
  • Avoid using syscmd calls unless absolutely necessary, since they are very slow and will be deprecated in the future
  • Use "try" blocks in Python whenever sending hardware commands
  • log messages for console output should (probably) be saved in
    /data1/archive/logs/
    
    or
    /opt/het/hetdex/logs/
    
    • Recommended formatting for console log files is: execname-user@machine-timestamp.log
      • execname: an arbitrary name for the program, probably the name of the script, but it could be anything that uniquely identifies the program that created the log file.
      • user: username of the user executing the process.
      • machine: hostname of the machine on which the process executes (not necessarily the one from which the user appeared to launch it, e.g., from a Launcher menu entry).
      • timestamp: a non-punctuated ISO 8601 timestamp in UTC (so, ending with a 'Z').
    • Example:
      /opt/het/hetdex/logs/console/tcsGui/tcsGui-stevenj@zeus-20210427T002804Z.log
      
  • output or results from analysis should be saved in their own folder in this directory:
    /data1/archive/
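
The practices above (argparse help, "try" blocks around hardware commands, and the console log naming scheme) can be sketched roughly as follows. This is only an illustration, not an existing HET tool: the function names, the command-line options, and the hardware "send" callable are hypothetical, and the log root is assumed from the example path above.

```python
import argparse
import getpass
import socket
from datetime import datetime, timezone
from pathlib import Path

# Assumption: console logs live under this root, as in the example path above.
LOG_ROOT = Path("/opt/het/hetdex/logs/console")


def console_log_path(execname, user=None, machine=None, when=None, root=LOG_ROOT):
    """Return root/execname/execname-user@machine-timestamp.log."""
    user = user or getpass.getuser()
    # Short hostname of the machine actually running the process.
    machine = machine or socket.gethostname().split(".")[0]
    when = when or datetime.now(timezone.utc)
    # Non-punctuated ISO 8601 in UTC, ending in 'Z'. A naive datetime is
    # interpreted as local time by astimezone().
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(root) / execname / f"{execname}-{user}@{machine}-{stamp}.log"


def build_parser():
    """Build a parser so that the script answers -h/--help with usage text."""
    parser = argparse.ArgumentParser(
        description="Send one command to a (hypothetical) piece of hardware.")
    parser.add_argument("command", help="command string to send")
    parser.add_argument("-n", "--dry-run", action="store_true",
                        help="print the command instead of sending it")
    return parser


def send_safely(send, command, log=print):
    """Call send(command) inside a try block; report failures and return None."""
    try:
        return send(command)
    except Exception as exc:  # broad on purpose: hardware layers fail in many ways
        log(f"hardware command {command!r} failed: {exc}")
        return None
```

A real script would call build_parser().parse_args() at startup, open the file returned by console_log_path() for its console output, and pass its actual hardware call as the "send" argument.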
    

Searching the repositories of installed code

codepuller is a tool written by Chris R which regularly collects all code from our repositories on cetus and dumps the latest trunk into /data1/het/sources, or the entire repository where /trunk does not exist. The resulting directory tree is updated daily. You can grep this directory for any string, thus searching all current production code. It does not include branches (development code) or tags (past releases). Here are some examples of how to search for commands:

[stevenj@zeus ~]$ grep -r V309 /data1/het/sources/*
/data1/het/sources/cetus/astronomy/srdev/astro/operations/eon_tasks.sh:    log "Script aborted. Check state of lamps, LRS2 IP and VG and VIRUS V309 IP"
/data1/het/sources/cetus/astronomy/srdev/astro/operations/eon_tasks.sh:    apcCmd on V309_IONPump
/data1/het/sources/cetus/astronomy/miscutils/opscals.sh:##check V309 IP
/data1/het/sources/cetus/astronomy/miscutils/opscals.sh:#removed 18 Aug 2022 when V309IP was decommissioned and new ion pumps installed
/data1/het/sources/cetus/astronomy/miscutils/opscals.sh:#    echo "V309 IP is still powered on!! I will not continue";
/data1/het/sources/cetus/hetdex-pr/het/apc/testing/VEncl2MiscPDU.conf:outlets = VEncl2PLC,V309_IONPump,VEncl2Camera


[stevenj@zeus ~]$ grep -r ipvg /data1/het/sources/*
/data1/het/sources/cetus/astronomy/srdev/astro/operations/eon_tasks.sh:    ipvg on > /dev/null
/data1/het/sources/cetus/astronomy/srdev/astro/operations/eon_tasks.sh:    log "Script did not touch IP/VG so please do not forget to run ipvg on -V when finished"
/data1/het/sources/cetus/astronomy/ra/operations/eon_tasks.sh:    ipvg on > /dev/null
/data1/het/sources/cetus/astronomy/ra/operations/eon_tasks.sh:[ $skipip = "y" ] && log "Script did not touch IP/VG so please do not forget to run ipvg on when finished"
/data1/het/sources/cetus/astronomy/ra/operations/virus_tests.sh:        ipvg on -V > /dev/null
/data1/het/sources/cetus/astronomy/ra/operations/virus_tests.sh:    ipvg off -V > /dev/null
/data1/het/sources/cetus/astronomy/adsf/adsf.sh:		ipvg on -V > /dev/null
/data1/het/sources/cetus/astronomy/adsf/adsf.sh:    ipvg off -V > /dev/null
/data1/het/sources/cetus/hetdex-pr/het/scripting/scripts/ipvg.sh:    USAGE: ipvg <on|off> [stat]
/data1/het/sources/cetus/hetdex-pr/het/scripting/scripts/cal.py:            os.system("ipvg off >/dev/null 2>&1")
/data1/het/sources/cetus/hetdex-pr/het/scripting/scripts/cal.py:                os.system("ipvg on")
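
If the search results need further processing, the same kind of search can be done from Python. The sketch below is a plain substring search over a directory tree, roughly equivalent to grep -r; the function name is hypothetical and it is not part of codepuller.

```python
import os


def search_sources(needle, root="/data1/het/sources"):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files from
                # raising decode errors.
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # skip files that cannot be opened or read
```

For example, list(search_sources("ipvg")) would collect every matching line under /data1/het/sources, along with its file and line number.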




Links to our documentation on workstations and software repositories from Chris:



*"Software" here is used broadly to include all types of instructions and data that tell our systems how to work. It may include shell or Python scripts, compiled code, libraries, data files, etc.



Details about our implementation of Python 3

(This comes from an email from Chris Robison on 13 Jan 2022)

In short, /usr/local/bin/python3.10 is the new "official" Python 3 platform on HET systems, as of a recent change to the scripts that configure workstations and servers here. Python 3.10 is the most current stable version of Python, released 2021-10-04. Scripts written for Python 3 at HET should use one of the following shebang lines:

Recommended:
#!/usr/local/bin/python3.10 (explicit reference to the official platform)

Will also work:
#!/usr/local/bin/hetpython (more on this below)
#!/usr/bin/env python3.10 (may break or behave unexpectedly on future systems, but will work on other systems where 3.10 is installed differently)
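
As an illustration only (not part of the original email), a minimal script using the recommended shebang might look like the following. The function name is hypothetical; the script just reports which interpreter is actually running it, which can help confirm that the shebang resolved as expected.

```python
#!/usr/local/bin/python3.10
"""Report which Python interpreter is running this script."""
import sys


def interpreter_report():
    """Return the interpreter path and its version as one line of text."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"{sys.executable} (Python {version})"


print(interpreter_report())
```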

The longer explanation:
For Python 3, we can no longer follow the configuration approach we've been using for Python 2. We were doing that for a while, and it was causing problems with OS-managed software. The configuration I'm describing here is a means to fix and prevent these problems in a way that requires the least change to existing development workflow.

In our Python 2 environment, we have the following characteristics:

  • /usr/local/bin/python points to /bin/python2, so that python runs Python 2.7. This is a step I've had to do myself in the script that builds new systems -- Red Hat no longer supplies a "python" symlink at all. To be clear, by default there is no "python" so the shebang line #!/usr/bin/env python does not work by default, and any documentation you may find suggesting this line is implying that you're going to replace this line with whatever works on your system, whether this is mentioned or not. Due to the vast majority of our python code using some call to "python" and assuming Python 2, I've set up new systems here to work with this assumption. "python" will continue to point to Python 2.7 for the foreseeable future, to avoid breaking existing code.
  • Dependencies are installed globally, using sudo python2 -m pip install or equivalent. This means that any Python 2 script can use these dependencies, which are guaranteed to be present in the global configuration.

The latter point is important -- using this configuration is risky, as it A) forces everyone to write against exactly the same set of dependencies and versions thereof, and B) can potentially conflict with any software managed by the OS package manager (yum/dnf). It works well enough in our case, because Python 2 is no longer used for any OS-managed software in RHEL 8, and so far we're all okay with the dependencies we're using (outside of a few complaints that have emerged occasionally in the past).

Why we can't do the same with Python 3:
RHEL 8 uses Python 3.6 extensively for its own management utilities, and for OS-managed packages of third-party software. For OS utilities, they've created their own isolated Python 3 environment in an attempt to reduce their exposure to user modifications of the main Python 3.6 environment (/usr/bin/python3); you can see this in the shebang for utilities like dnf (try less /usr/bin/dnf): #!/usr/libexec/platform-python. However, this only covers the RHEL utilities themselves, and even so, certain utilities are still exposed via shell calls to other third-party Python 3 software that uses /usr/bin/python3. So when you run sudo python3 -m pip install ... or equivalent, as we do for python2, you may end up upgrading or changing software in such a way that it becomes incompatible (e.g., due to an API change), and critical system utilities break. We've seen this happen.

For these reasons, using 'sudo' to install software into the global Python environment managed by the OS is strongly discouraged in the Python community, and modern versions of 'pip' print warning messages if they detect they might have been invoked in this way.

In summary, stuff was starting to break -- Stephen Cook and I were both seeing management utilities failing with Python tracebacks, and something had to be done.

The current solution:

  • Isolation: We have adopted a separate, isolated Python 3 environment by means of a custom installation (which is why it's located in /usr/local). While it's not perfect, this gets around the most critical problems with installing dependencies globally, and allows the selection of any suitable Python version -- the latest stable version 3.10 is the obvious choice in this context.
  • Updates: As new stable versions of Python are released (3.11, 3.12, etc), we will add these to the configuration of HET systems, and deprecate and remove old versions on a separate, delayed schedule. The symlink /usr/local/bin/hetpython will always point to the latest version officially supported for HET systems; if you use this in your scripts (#!/usr/local/bin/hetpython or #!/usr/bin/env hetpython) then you'll need to make sure that your code still runs on the newer versions or be aware that you may see failures when HET migrates. By using an explicit reference to a Python version as recommended above, you can avoid this situation and be assured that your code will not break (for this reason, anyway).
  • Dependencies: The current plan is to install dependencies (astropy, matplotlib, pandas, etc) to all these future Python 3 environments in the same way, via the same requirements file (/usr/local/share/hetsetup_res/hetsetup-hetpython.lst). As HET migrates to new versions, the dependencies present in each will be kept the same to ease migration. To put it in another way, new versions of these dependencies will appear in all environments at the same time as they're released. If such an upgrade breaks existing scripts, those scripts can be fixed or a version specifier can be added to the requirements file to prevent the upgrade until the incompatibility is resolved.
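
For scripts that use the hetpython symlink, one way to fail early and clearly after a migration is to check the interpreter version and the required dependencies at startup. The sketch below is an assumption about how such a check could look, not an existing HET utility; the function names and the minimum version are hypothetical, and the module list simply reuses the examples named above.

```python
import importlib.util
import sys

MIN_VERSION = (3, 10)  # assumption: oldest version this script was tested on
REQUIRED_MODULES = ("astropy", "matplotlib", "pandas")


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_version=MIN_VERSION, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if sys.version_info[:2] < tuple(min_version):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted} or newer is required, found {found}")
    for name in missing_modules(modules):
        problems.append(f"required module {name!r} is not installed")
    return problems
```

A script could call check_environment() first and exit with the returned messages if the list is not empty, rather than failing later with a traceback.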

Though I have other ideas for future development and deployment mechanisms that would provide greater developer freedom with respect to versions and dependencies, the above solution will remain in place indefinitely.

Another reason to avoid using the OS-provided Python 3.6 -- it also is no longer supported. Python 3.6.15 end-of-life was 2021-12-23. And the version of Python 3.6 provided by RHEL 8 is actually an older patch release, currently 3.6.8. This is fine for software packaged with the OS; it's supported by Red Hat. But 3.6 is definitely no longer an acceptable Python version for new software development. We can't get rid of it, but we should avoid using it for our own code.






Last modified on Oct 3, 2022 3:21:23 PM