March 12
🔄 Hybrid – Bay Area
• The platform for running Big Data and AI workloads
• Join as an early member of the infrastructure team, building the platform that will scale Kumo to handle huge cloud datasets
• Maximize the productivity of dozens of engineers and future customers
• Work with a diverse group of ML scientists, infrastructure engineers, product engineers, and leaders to influence how new ML technologies are productionized and scaled, build tools that increase velocity, and help with full-stack user experiences
• Design and build multiple core systems from scratch and make key design decisions; the role suits someone passionate about foundational infrastructure work, including managing model lifecycles and MLOps, CI/CD, and advanced packaging, versioning, and deployment strategies
• BS in Computer Science (MS or PhD preferred).
• B2B SaaS experience and experience architecting and building large-scale distributed systems.
• 3+ years of experience writing production code in Java, JavaScript, C++, or Python (NO NEW GRADS).
• Experience productionizing cloud applications, including Docker and Kubernetes, CI/CD and advanced packaging, versioning, and deployment strategies, containers and serverless architecture, online/offline feature stores, and model performance monitoring.
• Familiarity with popular MLOps tooling from cloud vendors, such as GCP (Vertex AI), AWS (SageMaker), or Azure Machine Learning, as well as MLflow, Kubeflow, etc.
• Proficiency with general full-stack application development, such as defining data models, building abstractions for business logic, and developing customer-facing web front ends or public APIs/SDKs for the application.
• Experience with infrastructure-as-code development (e.g., Terraform, CloudFormation, Ansible, Chef, Bash scripting).
• Core understanding of data modeling and the fundamentals of data engineering (e.g., integrations/connectors, pipelines, ETL/ELT processes).
• Build and extend components of the core Kumo infrastructure.
• Be willing to respond to incidents and be a key participant in our incident management process, developing tools for better root cause analysis and reduced MTTR (mean time to respond).
• Build and automate CI/CD pipelines and release tooling to support continuous delivery and true zero-downtime deployments across different cloud providers using the latest cloud-native technologies.
• Build the Kumo MLOps platform, which will detect data drift, track model versions, report on production model performance, alert the team to any anomalous model behavior, and run programmatic A/B tests on production models.
• Work on advanced tools developed for the world’s leading cloud-native machine learning engine, which uses graph deep learning technology.