March 12
🏢 In-office - Bay Area
• Design and develop data collection pipelines to gather and preprocess diverse datasets from various sources.
• Design and develop data processing pipelines, including data labeling, data filtering, data cleaning, data visualization, data auditing, etc.
• Implement machine learning models to improve the quality and diversity of data.
• Strong proficiency in building large-scale data processing pipelines
• Familiarity with distributed workloads (e.g., multiprocessing)
• Proficiency in at least one programming language commonly used in machine learning, such as Python
• Ability to write clean, maintainable code
• Proficiency in at least one deep learning framework, such as PyTorch
• Bachelor's degree in computer science or equivalent
• Excellent problem-solving skills and attention to detail
• Experience in building large-scale datasets
• Hands-on experience with a cloud platform, such as AWS, Azure, or GCP
• Experience in machine learning
• Active GitHub contributions are a big plus
• Multilingual skills, which contribute to the language diversity crucial for robust model training
• Experience with fairness, toxicity, data privacy regulations, and compliance considerations
• Opportunity to work in an early-stage startup
• Join a team of Deep Learning, Optimization, NLP, AutoML, and Statistics scientists and engineers
• Work on high-quality generative AI models for language and beyond
• Opportunity to implement and train deep neural networks
• Potential for aligning models to human values
• Opportunity to develop state-of-the-art models towards AGI
• Opportunity to design and develop data collection and processing pipelines
• Opportunity to implement machine learning models to improve data quality and diversity