We are a small, friendly and multi-disciplinary AI studio creating a personal AI for everyone.
July 24
🏢 In-office - Bay Area
• As part of Inflection’s commitment to deploying high-performance models for enterprise applications, our inference team ensures that these models run efficiently and effectively in real-world scenarios.
• Research engineers in this role focus on optimizing model inference processes, reducing latency, and improving throughput without compromising model performance, ensuring robust deployment in enterprise environments.
• Have experience deploying and optimizing LLMs for inference, in both cloud and on-prem environments.
• Are adept at using tools and frameworks for model optimization and acceleration, such as ONNX, TensorRT, or TVM.
• Enjoy troubleshooting and solving complex problems related to model performance and scaling.
• Have a deep understanding of the trade-offs involved in model inference, including hardware constraints and real-time processing requirements.
• Are proficient with PyTorch and familiar with infrastructure management tools like Docker and Kubernetes for deploying inference pipelines.
• Unlimited paid time off
• Parental leave and flexibility for all parents and caregivers
• Generous medical, dental, and vision plans for US employees
• Compliance with country-specific benefits for non-US employees
• Visa sponsorship for new hires
• Avenues for personal growth, such as coaching, conference attendance, or specific trainings
Apply Now