About Replit

The coding platform that empowers everyone to learn

🏢 Tech • 👥 101-200 employees • 📅 Founded 2016 • 📍 SoMa, San Francisco, CA • 💰 $472.2m

B2C • B2B • Artificial Intelligence • Enterprise • Training • Learning • SaaS

Key Highlights

Raised $472.2 million in funding
Millions of users, including Google and Facebook employees
Supports popular languages like C++, JavaScript, and PHP
Remote-first culture with flexible work hours

Replit is a collaborative coding platform that simplifies programming for learners, educators, and developers. Based in SoMa, San Francisco, Replit has attracted millions of users, including employees from major tech companies like Google, Facebook, and Stripe. The company has raised $472.2 million ...

🎁 Benefits

Replit offers a remote-first work environment with flexible hours, equity options, and a home office setup stipend. Employees enjoy comprehensive heal...

🌟 Culture

Replit's culture is centered around accessibility in coding, allowing users to start programming without complex setups. The company values innovation...


Site Reliability Engineer • Staff

Replit • Foster City - Hybrid

Posted 3 months ago • 🏢 Hybrid • Staff • Site Reliability Engineer • 📍 Foster City


Skills & Technologies

aws docker kubernetes prometheus grafana

Overview

Replit is hiring a Staff Site Reliability Engineer to ensure the reliability and performance of its infrastructure. You'll work with AWS, Docker, and Kubernetes to implement automation and best practices. This role requires a strong background in SRE principles and experience building resilient systems.

Job Description

Who you are

You have a strong background in Site Reliability Engineering, with a passion for building and maintaining resilient systems at scale. You possess deep expertise in cloud infrastructure, particularly with AWS, and have experience implementing automation to enhance operational efficiency. Your ability to analyze reliability problems and design effective solutions sets you apart as a leader in your field.

You are skilled in observability practices, having designed and implemented comprehensive monitoring, logging, and tracing solutions. You understand the importance of real-time visibility into system health and performance, and you are adept at creating dashboards and metrics that facilitate proactive issue detection. Your experience in leading incident response efforts demonstrates your capability to manage high-pressure situations effectively.

You thrive in collaborative environments, mentoring and educating your peers to instill a culture of reliability within the engineering team. Your communication skills allow you to bridge the gap between development and operations, ensuring that reliability is a core value at Replit. You are committed to continuous improvement and are always looking for ways to enhance the reliability of infrastructure.

Desirable

Experience with container orchestration tools like Kubernetes and monitoring tools such as Prometheus and Grafana is a plus. Familiarity with CI/CD pipelines and infrastructure as code practices will further enhance your contributions to the team.

What you'll do

As a Staff Site Reliability Engineer at Replit, you will architect and implement observability solutions that provide comprehensive insights into system performance. You will lead the design and implementation of monitoring, logging, and tracing systems, ensuring that the infrastructure can scale efficiently while maintaining high availability. Your role will involve defining and driving reliability standards across the organization, collaborating closely with development teams to integrate these practices into their workflows.

You will proactively identify and analyze reliability issues across the stack, designing and implementing software and systems that create significant improvements. Your responsibilities will include automating operational tasks to reduce manual intervention and enhance system resilience. You will also lead incident response efforts, ensuring that issues are resolved swiftly and effectively, minimizing downtime and impact on users.

In addition to your technical responsibilities, you will play a crucial role in mentoring and educating the broader engineering team. You will share your knowledge of reliability best practices, fostering a culture of continuous improvement and collaboration. Your contributions will directly impact the reliability and performance of Replit's infrastructure, enabling millions of developers to build applications seamlessly.

What we offer

At Replit, you will be part of a mission-driven team dedicated to democratizing software development. We offer a hybrid work environment that allows you to collaborate in the office while also providing flexibility for remote work. Our culture values diverse perspectives and experiences, and we encourage candidates from all backgrounds to apply. You will have the opportunity to work with cutting-edge technologies and contribute to a platform that empowers users worldwide.

Similar Jobs You Might Like

Based on your interests and this role

Site Reliability Engineer

Replit • 📍 Foster City - Hybrid

Replit is hiring a Site Reliability Engineer to ensure the reliability and performance of its infrastructure. You'll work with tools like Terraform and Ansible to automate operational tasks and implement monitoring solutions. This role requires a passion for building resilient systems at scale.

🏢 Hybrid

11 months ago

Site Reliability Engineer

Zoox • 📍 Foster City - On-Site

Zoox is seeking a Site Reliability Engineer to ensure the availability and performance of services for autonomous vehicles. You'll work with systems processing massive data volumes and support compute-intensive pipelines. This role requires expertise in Linux, Docker, and Kubernetes.

🏛️ On-Site

1 month ago

Site Reliability Engineer

Assured • 📍 Remote - Remote

Assured is hiring a Staff Site Reliability Engineer to build efficient, reliable, secure, and scalable infrastructure. You'll work with AWS, Docker, and Kubernetes to automate the delivery of modern SaaS platforms. This position requires experience in a start-up environment.

🏠 Remote • Staff

1 year ago

Site Reliability Engineer

Bugcrowd • 📍 United States - Remote

Bugcrowd is seeking a Staff Site Reliability Engineer to lead the reliability strategy across engineering teams. You'll work with technologies like AWS, Docker, and Kubernetes to tackle complex infrastructure challenges. This role requires significant experience in site reliability engineering.

🏠 Remote • Staff

2 months ago

Site Reliability Engineer

Zscaler • 📍 San Jose - Hybrid

Zscaler is hiring a Staff Site Reliability Engineer to enhance their zero trust security platform. You'll work in a hybrid role based in San Jose or fully remote across the USA. This position requires expertise in site reliability engineering.

🏢 Hybrid • Staff

1 week ago
