About Confluent

The leading data streaming platform for enterprises

🏢 Tech • 👥 1001+ employees • 📅 Founded 2014 • 📍 Mountain View, California • 💰 $455.9m • ⭐ 3.6

B2B • Enterprise • Internal tools • Big data • Cloud Computing

Key Highlights

Founded in 2014, built on Apache Kafka technology
Raised $455.9 million in Series D funding
Headquartered in Mountain View, California
1,001+ employees dedicated to data streaming solutions

Confluent, founded in 2014 and headquartered in Mountain View, California, is a leading data streaming platform built on the open-source Apache Kafka project. With more than 1,000 employees, Confluent has raised $455.9 million in funding and serves a diverse range of enterprise customers, providing real-...

🎁 Benefits

Confluent offers unlimited holiday, life insurance, income protection insurance, and delicious catered food twice a week. Employees also benefit from ...

🌟 Culture

Confluent fosters a culture that values open-source collaboration and innovation, emphasizing the importance of real-time data solutions. The company ...


Site Reliability Engineer • Staff

Confluent • Ontario - Remote

Posted 4w ago • 🏠 Remote • Staff Site Reliability Engineer • 📍 Ontario


Skills & Technologies

aws gcp azure automation incident management

Overview

Confluent is hiring a Staff Site Reliability Engineer to strengthen incident management and reliability for its cloud platform. You'll work with AWS, GCP, and Azure to build automation and improve tooling. This position requires expertise in proactive reliability improvements and incident response.

Job Description

Who you are

You have extensive experience in site reliability engineering, particularly in incident management and reliability within multi-cloud environments. Your background combines hands-on technical work with strategic program ownership, allowing you to drive proactive reliability improvements that prevent incidents. You thrive in collaborative settings, coaching teams through post-mortems and evolving incident response practices.

You are a strong systems thinker, able to analyze systemic failure patterns and design effective reliability improvements. Your expertise spans cloud platforms such as AWS, GCP, and Azure, and you are adept at building automation and improving tooling to enhance operational efficiency. You are a strong communicator, capable of teaching and coordinating with teams to ensure sustainable operations.

What you'll do

In this role, you will spend approximately 75% of your time on engineering tasks, focusing on building automation and improving tooling for incident management. You will analyze failure patterns and design reliability improvements that enhance the overall performance of Confluent Cloud, which processes millions of events per second. The remaining 25% of your time will be dedicated to teaching and coordination, where you will coach teams through post-mortems and train incident commanders to ensure effective incident response practices.

You will be part of a global team that provides follow-the-sun coverage, ensuring clean handoffs and sustainable working hours for all team members. Your contributions will directly impact the reliability and performance of the data streaming platform, allowing companies to react faster and build smarter solutions. You will collaborate with cross-functional teams to foster a culture of continuous improvement and innovation in incident management.

What we offer

Confluent offers a dynamic work environment where you can lead, grow, and challenge what’s possible in the field of data streaming. You will have the opportunity to work with cutting-edge technologies and be part of a team that values diverse perspectives and collaboration. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of every individual to contribute to our mission.

