March 28
🔄 Hybrid – Bay Area
• Serve as a thought leader in data engineering, including schemas, frameworks, and data models
• Implement new sources and connectors for data ingestion
• Build scalable job management on Kubernetes for petabyte-scale data
• Optimize Spark or Flink applications in batch or streaming modes (a minimal sketch follows this list)
• Tune clusters for resource efficiency and reliability
• Collaborate with a passionate team working at the intersection of open source and enterprise
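As a concrete illustration of the batch side of this work, here is a minimal PySpark sketch (Python used here for brevity, though the role itself centers on Java). The input path and the `event_type` column are hypothetical stand-ins, not details from this team's pipelines.

```python
from pyspark.sql import SparkSession

# Local session for illustration; a production job would run on a YARN or Kubernetes cluster.
spark = SparkSession.builder.appName("event-counts").master("local[*]").getOrCreate()

# Hypothetical input: any Parquet dataset with an `event_type` column works here.
events = spark.read.parquet("/tmp/events")

# A simple batch aggregation: count events per type.
events.groupBy("event_type").count().show()

spark.stop()
```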
• 3+ years of experience building data pipelines with Apache Spark or Apache Flink
• 2+ years of experience with workflow orchestration tools such as Apache Airflow or Dagster (a minimal DAG sketch follows this list)
• Proficiency in Java and in build tools such as Maven and Gradle
• Strong SQL query writing and troubleshooting skills
• Experience managing large-scale data on cloud storage
• Excellent problem-solving and debugging skills
• Operational excellence in monitoring, deploying, and testing job workflows
• Nice to have: experience with Kubernetes (k8s), terabyte-scale data pipelines, and other big data technologies
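For the orchestration requirement, here is a minimal Airflow 2.x DAG sketch. The DAG id, schedule, and spark-submit command are illustrative assumptions, not this team's actual setup.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical nightly pipeline: a single task that submits a Spark batch job.
with DAG(
    dag_id="nightly_event_counts",  # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    BashOperator(
        task_id="spark_submit",
        # Illustrative command; a real deployment would target the cluster's master URL.
        bash_command="spark-submit /opt/jobs/event_counts.py",
    )
```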
• Opportunity to work on cutting-edge technology in data engineering
• Chance to contribute to and grow Apache Hudi, used by major enterprises
• Joining a team of experienced builders backed by significant funding
• Fast expansion and growth opportunities within the company