HPC Platform Engineer (Contractor)

Contractor

About the Role

We are seeking a highly skilled contractor to help build and support our high-performance computing environment. In this role, you’ll ensure our compute clusters, GPUs, storage, and scheduling systems run efficiently so our teams can execute large-scale research, engineering, and AI/ML workloads. The ideal candidate brings strong technical proficiency, excellent communication skills, and the ability to adapt quickly within a fast-paced environment. This is a three-month contract with the possibility of extension.

Key Responsibilities

·       Develop automation for provisioning a Slurm cluster.

·       Assist with Slinky deployments.

·       Review and contribute to enhancements of the Slinky deployment workflow.

·       Support customers by addressing queries related to the Slurm HPC cluster.

·       Manage and support HPC clusters, including compute nodes, GPUs, and storage

·       Maintain and optimize job schedulers (e.g., Slurm, PBS, LSF)

·       Automate deployments and configuration using Python, Bash, Ansible/Terraform

·       Monitor performance and troubleshoot cluster or job issues

·       Partner with users to improve workflow speed, reliability, and resource usage

What We’re Looking For

·       Strong Linux systems experience (RHEL, Rocky, Ubuntu, etc.)

·       Hands-on experience with HPC scheduling systems

·       Familiarity with high-speed networking (InfiniBand/RoCE) and parallel file systems

·       Strong scripting and automation skills

·       Excellent problem-solving and communication abilities

Nice to Have

·       Experience with GPU clusters (NVIDIA, CUDA stack)

·       Knowledge of containers in HPC (Apptainer/Singularity)

·       Exposure to AI/ML workloads or cloud-based HPC tools
