HPC Platform Engineer (Contractor)
About the Role
We are seeking a highly skilled contractor to help build and support our high-performance computing environment. In this role, you’ll ensure our compute clusters, GPUs, storage, and scheduling systems run efficiently so our teams can execute large-scale research, engineering, and AI/ML workloads. The ideal candidate brings strong technical proficiency, excellent communication skills, and the ability to adapt quickly within a fast-paced environment. This is a three-month contract with the possibility of extension.
Key Responsibilities
· Develop automation for provisioning a Slurm cluster
· Assist with Slinky deployments
· Review and contribute to enhancements of the Slinky deployment workflow
· Support customers by addressing queries related to the Slurm HPC cluster
· Manage and support HPC clusters, including compute nodes, GPUs, and storage
· Maintain and optimize job schedulers (e.g., Slurm, PBS, LSF)
· Automate deployments and configuration using Python, Bash, Ansible/Terraform
· Monitor performance and troubleshoot cluster or job issues
· Partner with users to improve workflow speed, reliability, and resource usage
What We’re Looking For
· Strong Linux systems experience (RHEL, Rocky, Ubuntu, etc.)
· Hands-on experience with HPC scheduling systems
· Familiarity with high-speed networking (InfiniBand/RoCE) and parallel file systems
· Strong scripting and automation skills
· Excellent problem-solving and communication abilities
Nice to Have
· Experience with GPU clusters (NVIDIA, CUDA stack)
· Knowledge of containers in HPC (Apptainer/Singularity)
· Exposure to AI/ML workloads or cloud-based HPC tools




