Online accounting software. Connects to all things business: accountants, bookkeepers, banks, enterprise & apps.
Accounting • SaaS • Banking • Invoicing • Design
2 days ago
🏡 Remote – Anywhere in California
• Investigating operational surprises and supporting teams in post-incident activities
• Conducting in-depth incident analysis and maximizing post-incident learning across the organization
• Completing short-term reliability consultancy and enablement engagements, such as SLO reviews and facilitating pre-mortems
• Improving on-call health, uplifting observability and addressing any operational hotspots
• Identifying, planning and leading the implementation of reliability uplift work and initiatives
• Supporting delivery of strategic features and initiatives with reliability and distributed systems expertise
• Observing and improving rituals and practices relating to production operations, incident response and incident learning
• Solid experience in logging, monitoring and observability of a highly distributed system
• Leading incident management, response and troubleshooting efforts, including critical, complex and high-severity incidents
• Post-incident reviews, incident analysis and learning from incidents
• Experience working in a tech or product company of comparable scale and complexity
• Systems thinking: understanding how systems and components interact and how they respond to failure
• Proficiency in one or more object-oriented programming languages (C#, JavaScript, Java, Python, etc.) or experience with infrastructure-as-code (e.g. Terraform, CloudFormation)
• Experience working with cloud providers such as AWS, Azure or GCP
• Experience designing, developing and operating distributed systems and large-scale software systems
• Strong experience delivering technical initiatives in an operational, site reliability or platform engineering capacity
• The ability to solve engineering challenges outside of your own team, including using influence rather than authority to enact change
• Demonstrated experience with reliability concepts such as capacity management, autoscaling, deployment and release safety, software strategies for reliability, fault tolerance and graceful failure
• Experience implementing customer-focused Service Level Objectives (SLOs)
• Experience using software engineering to solve operational and reliability challenges
• Understanding of human factors, safety science and resilience engineering
• Experience working in environments with advanced security and networking