Lead HPC & AI Infrastructure Engineer
JOB_53494454733888
Job type: Permanent
Location: Hampshire
Working Pattern: Full-time
Specialism: Infrastructure
Industry: Technology & Internet Services
Pay: 130,000
Lead HPC & AI Infrastructure Engineer – Fully Remote | Cutting-Edge Projects | Unlimited Holiday
Your new company
Step into the future of computing with a trailblazing organisation at the intersection of AI innovation and High Performance Computing (HPC). This company is redefining scalable infrastructure, building GPU-optimised environments that power advanced research and enterprise workloads. With a strong commitment to ethical computing and technical excellence, they’re shaping the next generation of AI platforms.
Your new role
- Designing end-to-end infrastructure solutions across compute, storage, and networking
- Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling
- Installing and tuning Linux-based operating systems and configuring SLURM job schedulers
- Optimising high-speed networking technologies (InfiniBand, RoCE)
- Automating deployments and maintenance using Ansible, Terraform, Bash, and Python
- Troubleshooting complex distributed systems and mentoring junior engineers
 
What you'll need to succeed
- Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes)
- Strong SLURM configuration skills – partitions, priorities, resource management
- Advanced Linux administration and performance tuning
- Expertise in high-performance networking (InfiniBand, RoCE, RDMA)
- Experience with distributed file systems (Lustre, Ceph, WEKA, VAST)
- Proficiency in automation and scripting (Ansible, Terraform, Bash, Python)
- A solid understanding of monitoring, resilience, and security compliance
- Excellent documentation skills and a passion for mentoring and knowledge sharing
 
- Containerisation in HPC (Singularity, Docker, Apptainer)
- Familiarity with AI/ML workflows, GPU-aware MPI, NVLink
- Experience in cloud, academic, or research environments
- Vendor hardware validation and data centre planning
 
What you'll get in return
- Share options and long-term incentives
- Unlimited holiday policy
- 100% remote working with flexible hours
- A culture of internal promotion and career development
- A collaborative, forward-thinking team
- Enhanced family-friendly policies
- A truly flexible and supportive workplace
 
What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.
Talk to Jacob Clift, the specialist consultant managing this position
Located in Southampton, 3rd Floor, One Dorset Street, Southampton
Telephone 023 82 020 113