PagerDuty

About PagerDuty

The digital operations platform for modern enterprises

🏢 Tech · 👥 1001+ employees · 📅 Founded 2009 · 📍 SoMa, San Francisco, CA · 💰 $173.8m · 3.9
B2B · Enterprise · Internal tools · SaaS · DevOps · Cloud Computing

Key Highlights

  • Over 10,500 enterprise customers including GE & Capital One
  • Raised $173.8 million in total funding
  • Headquartered in SoMa, San Francisco, CA
  • Recognized in AWS Partner Awards for innovation

PagerDuty is a leading digital operations management platform headquartered in SoMa, San Francisco, CA. Serving over 10,500 enterprises, including GE, Capital One, IBM, and Spotify, PagerDuty specializes in incident management and alerting services. The company has raised $173.8 million in funding a...

🎁 Benefits

Employees enjoy 20 hours of paid volunteer time annually, comprehensive health insurance, and a generous paid parental leave policy. The company also ...

🌟 Culture

PagerDuty fosters a culture centered around operational excellence and innovation, focusing on empowering DevOps teams. The company emphasizes hands-o...

Site Reliability Engineer Mid-Level

PagerDuty · Toronto - On-Site

Posted 1w ago · 🏛️ On-Site · Mid-Level · Site Reliability Engineer · 📍 Toronto · 💰 CA$115,000 - CA$165,000 per year

Overview

PagerDuty is hiring a Site Reliability Engineer II in Toronto to support and improve the foundational infrastructure behind its real-time digital operations platform. You'll work with technologies like Kubernetes and AWS. The position requires experience keeping systems reliable and scalable.

Job Description

Who you are

You have a solid background in site reliability engineering, with experience in building and operating foundational infrastructure that supports real-time digital operations. Your expertise in Kubernetes and Docker allows you to manage containerized applications effectively, ensuring high availability and performance. You are comfortable working in a Linux environment and have a good understanding of networking concepts, which helps you troubleshoot and optimize system performance. You are a collaborative team player who enjoys tackling complex problems and contributing to the overall reliability and security of systems.

You have experience with cloud platforms, particularly AWS, and understand how to leverage cloud services to enhance system scalability and resilience. Your ability to analyze system metrics and logs enables you to identify potential issues before they impact users. You are proactive in implementing best practices for incident management and have a keen interest in continuous improvement processes. You thrive in a flexible work environment and are committed to building a more equitable world through your work.

What you'll do

As a Site Reliability Engineer II at PagerDuty, you will support and improve foundational infrastructure, including networking, compute platforms, and ingress/traffic management systems. You will contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new features. Your role will involve collaborating with cross-functional teams to ensure that the infrastructure meets the needs of the business and its customers. You will also participate in incident response and post-mortem analysis to drive improvements in system reliability.

You will work closely with the Core Infrastructure team to build and evolve foundational network and compute infrastructure, ensuring that it can handle millions of events and alerts daily. Your contributions will directly impact the reliability, scalability, and security of the services that customers rely on to keep their businesses running. You will have the opportunity to mentor junior engineers and share your knowledge of best practices in site reliability engineering.

What we offer

At PagerDuty, you will be part of a dynamic team that values collaboration and innovation. We offer a flexible, award-winning workplace where you can grow your skills and advance your career. You will have access to professional development opportunities and the chance to work on meaningful projects that make a difference in the world. We are committed to providing reasonable accommodations for qualified individuals with disabilities in our job application process. Join us in building a more equitable world through technology.

Similar Jobs You Might Like

Fivetran

Site Reliability Engineer

Fivetran · 📍 Toronto - On-Site

Fivetran is seeking a Site Reliability Engineer II to enhance the reliability of their data platform. You'll collaborate with engineering teams and contribute to incident response and infrastructure monitoring. This role requires experience with AWS, Docker, and Kubernetes.

🏛️ On-Site · Mid-Level · 1 month ago

Achievers

Site Reliability Engineer

Achievers · 📍 Toronto

Achievers is hiring a Staff Site Reliability Engineer to manage and advance their global infrastructure. You'll work with GCP/GKE and AI-driven workflows to build reliable, scalable cloud systems. This position requires approximately 15 years of technical expertise in distributed systems.

Staff · 3w ago

MongoDB

Site Reliability Engineer

MongoDB · 📍 Alberta - Remote

MongoDB is seeking a Senior Site Reliability Engineer to join the Fabric team, focusing on building and maintaining robust infrastructure for secure communication. You'll leverage your expertise in networking and distributed systems. This role requires 6+ years of experience.

🏠 Remote · Senior · 10h ago

Fivetran

Site Reliability Engineer

Fivetran · 📍 Toronto

Fivetran is seeking a Senior Site Reliability Engineer to enhance the reliability and performance of their data platform. You'll collaborate with engineering teams and utilize skills in AWS, Docker, and Kubernetes to ensure infrastructure stability.

Senior · 1 month ago

Pinterest

Site Reliability Engineer

Pinterest · 📍 Toronto

Pinterest is hiring a Senior Site Reliability Engineer to ensure the reliability of their large-scale distributed systems. You'll work with technologies like AWS, Docker, and Kubernetes to develop software solutions that enhance system operability. This role requires significant experience in site reliability engineering.

Senior · 1w ago