
About PandaDoc
Streamlining document workflows for growing organizations
Key Highlights
- Over 35,000 customers including Cisco and HubSpot
- Headquartered in San Francisco, California
- Raised $50M+ from investors like Rembrandt Venture Partners
- Offers unlimited PTO and flexible remote work options
PandaDoc is a document workflow automation platform headquartered in San Francisco, California, that serves over 35,000 organizations, including notable clients like Cisco and HubSpot. The platform streamlines the creation, management, and signing of digital documents such as proposals, quotes, and ...
🎁 Benefits
PandaDoc offers competitive salaries, equity options, unlimited PTO, and a flexible remote work policy, allowing employees to maintain a healthy work-...
🌟 Culture
PandaDoc fosters a remote-friendly culture that emphasizes collaboration and innovation, encouraging employees to contribute ideas and take ownership ...
Overview
PandaDoc is hiring a Senior Site Reliability Engineer to ensure reliable service with minimal downtime. You'll work with Python, Java, AWS, and Kubernetes to manage incident processes and observability tools. This role requires solid programming experience and expertise in maintaining production services.
Job Description
Who you are
You have solid programming experience, particularly with Python (Django and AsyncIO) and/or Java (Spring Boot), which allows you to contribute effectively to service codebases and enhance production reliability. Your experience maintaining an observability tool suite, specifically the LGTM stack (Loki, Grafana, Tempo, and Mimir), equips you to manage and evolve the observability stack effectively.
You possess strong experience with AWS and Kubernetes, enabling you to maintain production applications and participate in on-call rotations. Your proficiency in working with relational databases, particularly PostgreSQL, and messaging systems like RabbitMQ, NATS, or Kafka, ensures that you can handle the complexities of production environments.
As an experienced on-call SRE, you understand the importance of incident management processes and tools, and you are committed to driving efforts in observability and capacity planning. You enjoy collaborating with product engineers to foster SRE principles within the R&D organization and are eager to mentor both the SRE team and product engineers.
What you'll do
In this role, you will own and influence the incident management process from end to end, ensuring that incidents are managed efficiently and effectively. You will maintain and evolve the on-prem observability stack, keeping production applications running smoothly by participating in the on-call rotation and developing automations and tools to support platform reliability.
You will contribute to production services with a focus on performance and resiliency, collaborating closely with product engineers to implement SRE principles throughout the organization. Your role will involve mentoring junior engineers and fostering a culture of reliability and excellence within the team.
What we offer
PandaDoc offers a dynamic work environment where you can make a significant impact on the reliability of our services. You will have the opportunity to work with cutting-edge technologies and be part of a team that values collaboration and innovation. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds and perspectives.