August 28
🏢 In-office - Bay Area
• Build and maintain the infrastructure that enables development, deployment, and scaling of machine learning, data, and service pipelines
• Collaborate closely with data scientists and full-stack developers to shape machine learning services and systems
• Ensure scalability, reliability, availability, and performance of RAG 2.0 systems
• Master’s degree in Computer Science, Engineering, or a related field (Ph.D. preferred)
• 5+ years of experience in building highly available, production-grade containerized distributed ML systems
• 5+ years of experience with ML infrastructure technologies such as Kubernetes, GCP, AWS, ArgoCD, Terraform, CloudFormation, infrastructure as code, containerization, SLURM automation, Linux fundamentals, GitHub Actions, and CI/CD
• Demonstrated experience in managing cloud-based Kubernetes services
• 5+ years of experience in one programming language such as Python, Java, or Go
• Familiarity with key concepts in machine learning, natural language processing, and computer vision
• Familiarity with data preprocessing, feature engineering, and data visualization
• Familiarity with large GPU clusters and high-performance computing/networking
• Experience in mentoring and growing junior engineers into successful leaders
• Excellent communication and collaboration skills, with the ability to work effectively in a fast-paced and dynamic environment
• Equity
• Benefits