Lead HPC & AI Infrastructure Engineer
JOB_53494454733888
Job type: Permanent
Location: Hampshire
Working Pattern: Full-time
Specialism: Infrastructure
Industry: Technology & Internet Services
Pay: 130,000
Lead HPC & AI Infrastructure Engineer – Fully Remote | Cutting-Edge Projects | Unlimited Holiday
Your new company
Step into the future of computing with a trailblazing organisation at the intersection of AI innovation and High Performance Computing (HPC). This company is redefining scalable infrastructure, building GPU-optimised environments that power advanced research and enterprise workloads. With a strong commitment to ethical computing and technical excellence, they’re shaping the next generation of AI platforms.
Your new role
- Designing end-to-end infrastructure solutions across compute, storage, and networking
- Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling
- Installing and tuning Linux-based operating systems and configuring SLURM job schedulers
- Optimising high-speed networking technologies (InfiniBand, RoCE)
- Automating deployments and maintenance using Ansible, Terraform, Bash, and Python
- Troubleshooting complex distributed systems and mentoring junior engineers
What you'll need to succeed
- Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes)
- Strong SLURM configuration skills – partitions, priorities, resource management
- Advanced Linux administration and performance tuning
- Expertise in high-performance networking (InfiniBand, RoCE, RDMA)
- Experience with distributed file systems (Lustre, Ceph, WEKA, VAST)
- Proficiency in automation and scripting (Ansible, Terraform, Bash, Python)
- A solid understanding of monitoring, resilience, and security compliance
- Excellent documentation skills and a passion for mentoring and knowledge sharing
- Containerisation in HPC (Singularity, Docker, Apptainer)
- Familiarity with AI/ML workflows, GPU-aware MPI, NVLink
- Experience in cloud, academic, or research environments
- Vendor hardware validation and data centre planning
What you'll get in return
- Share options and long-term incentives
- Unlimited holiday policy
- 100% remote working with flexible hours
- A culture of internal promotion and career development
- A collaborative, forward-thinking team
- Enhanced family-friendly policies
- A truly flexible and supportive workplace
What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.
Talk to Jacob Clift, the specialist consultant managing this position
Located in Southampton, 3rd Floor, One Dorset Street, Southampton
Telephone: 023 82 020 113