April 18
🏢 In-office - Bay Area
ML systems engineers on our team are responsible for one or more of the following:
• Deployment and management of high-performance compute clusters.
• Enhancing inference and training performance through optimizations across the system stack: high-level mechanisms such as queuing and scheduling, mid-level optimizations within inference and training engines, and low-level optimizations targeting GPU kernel efficiency.
• Experience building and rapidly prototyping production cloud-based software
• Demonstrated fluency with data structures, algorithms, architecture, and agile software best practices in any language
• Experience in Python and C++/Rust
• Understanding of the latest technologies in LLMs, such as LoRA, Mamba, etc.
• Understanding of, or willingness to learn about, the entire system stack
• Desire to work in an inclusive and collaborative environment
• An interest in continually learning from others, teaching others, and digging into new challenges
• Desire to create speed-of-light training and inference systems for next-generation AI
• Deep technical expertise in machine learning systems, e.g. TinyML, Triton, CUDA, ROCm, Exo, MLIR, Halide, etc.