Swirlds is a software platform designed to build fully distributed applications that harness the power of the cloud without servers.
September 8
🏡 Remote – Anywhere in California
Ansible
AWS
Azure
Bash
Cloud
Distributed Systems
Docker
Firewalls
Google Cloud Platform
Kubernetes
Open Source
Python
Terraform
Web3
• Leading the design, deployment, and management of infrastructure, ensuring high availability, reliability, and scalability
• Building, mentoring, and leading a globally distributed SRE team across multiple time zones (APAC, LATAM, etc.) with a follow-the-sun on-call support model
• Developing and managing SLAs for availability, performance, and uptime while driving operational excellence and automation
• Creating and implementing strategies for continuous delivery, monitoring, and incident response to ensure minimal downtime and rapid recovery
• Partnering with engineering teams to design scalable and fault-tolerant architecture and processes
• Overseeing security best practices, including vulnerability management, monitoring, and compliance with industry standards
• Developing tools and processes for automation of infrastructure, monitoring, alerting, and incident management
• Managing budgets, vendors, and third-party tools related to infrastructure, ensuring cost-effectiveness and efficiency
• Ensuring comprehensive documentation and training for all infrastructure, deployment, and operational processes
• 10+ years of experience in Site Reliability Engineering (SRE) or infrastructure engineering, with at least 5 years in leadership roles
• Proven experience in designing, deploying, and managing large-scale distributed systems, preferably in a cloud environment (AWS, GCP, Azure)
• Strong expertise in automation tools (Terraform, Ansible, etc.) and scripting languages (Python, Bash, etc.)
• Strong experience with containerization and orchestration technologies such as Docker and Kubernetes
• Deep understanding of network infrastructure, load balancing, firewalls, VPNs, and security best practices
• Proven track record of meeting or exceeding SLAs for system uptime and performance
• Experience building and leading teams across multiple regions and time zones
• Familiarity with managing infrastructure in a highly regulated or security-sensitive environment
• Strong understanding of CI/CD pipelines and incident management platforms (PagerDuty, Opsgenie)
• Strong understanding of the LGTM stack
• Excellent leadership, communication, and project management skills