September 6
🏡 Remote – Anywhere in California
• Model and manage data assets in a data lake architecture.
• Build and manage a data lake in AWS, leveraging and augmenting the existing LakeFormation-based architecture.
• Build and maintain data pipelines from various sources, including streaming datasets, APIs, and data stores, using PySpark and AWS Glue.
• Create datasets from the data lake to support use cases such as business analytics, dashboards, reports, and machine learning.
• Drive technical decisions on how best to serve data consumers.
• Leverage existing AWS architectures and design new ones where needed, using the CDK toolkit.
• Operationalize data workloads in AWS, automating pipelines and implementing appropriate monitoring.
• Work with cross-functional teams to discover business needs and design appropriate data flows.
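To give a flavor of the pipeline work described above, here is a minimal, dependency-free Python sketch of the kind of transform such a pipeline performs: parsing raw event records, dropping malformed input, and partitioning by event date. The record schema (`timestamp`, `user_id`) is hypothetical, chosen purely for illustration; in practice this logic would run as a PySpark job on AWS Glue rather than plain Python.

```python
import json
from collections import defaultdict
from datetime import datetime

def partition_events(raw_records):
    """Normalize raw JSON event records and group them by event date.

    Each record is expected to be a JSON string carrying an ISO-8601
    'timestamp' field (hypothetical schema). Malformed records are
    skipped, standing in for a pipeline's dead-letter path.
    """
    partitions = defaultdict(list)
    for raw in raw_records:
        try:
            event = json.loads(raw)
            ts = datetime.fromisoformat(event["timestamp"])
        except (ValueError, KeyError):
            continue  # drop records that fail to parse or lack a timestamp
        event["event_date"] = ts.date().isoformat()  # derived partition key
        partitions[event["event_date"]].append(event)
    return dict(partitions)

# Example: one valid record and one malformed record
rows = [
    '{"timestamp": "2023-09-06T12:00:00", "user_id": "a1"}',
    'not json',
]
partitioned = partition_events(rows)
```

In a Glue/PySpark setting the same shape appears as a `DataFrame` write partitioned by a derived date column; the sketch only illustrates the normalize-then-partition pattern.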
• Bachelor’s degree in computer science or a similar technical field of study, or equivalent practical experience.
• Minimum of 3 years of hands-on experience developing data solutions in a modern cloud environment.
• Fluency in Python.
• Experience authoring and maintaining ETL jobs (PySpark experience a plus).
• Experience designing and interacting with relational and non-relational data stores.
• Experience with the AWS ecosystem and with infrastructure-as-code methodologies (CDK a plus).
• Demonstrated ability to manage production data workloads.