Build aligned and more complete AI to accelerate humanity’s progress on the world’s most important problems
March 18
🏢 In-office - San Francisco
• Lead and participate in projects to build massive-scale (thousands of GPUs), highly available, and secure AI training and inference infrastructure
• Ensure high availability of GPU workloads for training and production inference
• Respond to and investigate security and availability incidents
• Do platform engineering that lets our engineers move fast
• Propose ways to accelerate the team
• Troubleshoot and resolve issues across GPU resources, networking, OS, drivers, and cloud environments, and automate detection and recovery of such issues
• Extensive experience with GCP or similar cloud platforms
• Strong Linux knowledge in k8s, VM, and bare-metal contexts, including bash scripting and troubleshooting
• Strong general coding skills
• Strong IaC knowledge, with extensive experience in Terraform or Pulumi
• Knowledge of server hardware, networking hardware, and GPUs
• Strong knowledge of hardening cloud-native workloads, especially on k8s
• Benchmark-based compensation in the 75th or 90th percentile, including base salary, generous equity, and benefits
• 401K with 6% match
• Flexible working hours
• In-person (SF or Vienna) or remote
• A small, fast-paced, highly focused team