About PandaDoc

Streamlining document workflows for growing organizations

🏢 Retail, Tech 👥 251-1K 📅 Founded 2013 📍 San Francisco, California, United States

Key Highlights

Over 35,000 customers including Cisco and HubSpot
Headquartered in San Francisco, California
Raised $50M+ from investors like Rembrandt Venture Partners
Offers unlimited PTO and flexible remote work options

PandaDoc is a document workflow automation platform headquartered in San Francisco, California, that serves over 35,000 organizations, including notable clients like Cisco and HubSpot. The platform streamlines the creation, management, and signing of digital documents such as proposals, quotes, and ...

🎁 Benefits

PandaDoc offers competitive salaries, equity options, unlimited PTO, and a flexible remote work policy, allowing employees to maintain a healthy work-...

🌟 Culture

PandaDoc fosters a remote-friendly culture that emphasizes collaboration and innovation, encouraging employees to contribute ideas and take ownership ...

Site Reliability Engineer • Senior

PandaDoc • Germany - Remote

Posted 1d ago 🏠 Remote Senior Site Reliability Engineer 📍 Germany

Skills & Technologies

Python, Java, Spring Boot, AWS, Kubernetes, PostgreSQL, Grafana, RabbitMQ, Kafka

Overview

PandaDoc is hiring a Senior Site Reliability Engineer to ensure reliable service with minimal downtime. You'll work with Python, Java, AWS, and Kubernetes to manage incident processes and observability stacks. This role requires solid programming experience and expertise in maintaining production services.

Job Description

Who you are

You have solid programming experience, particularly with Python (Django and AsyncIO) and/or Java (Spring Boot), which allows you to contribute effectively to service codebases and enhance production services. Your experience maintaining an observability tool suite, specifically the LGTM stack (Loki, Grafana, Tempo, and Mimir), equips you to manage the observability stack efficiently. You have strong experience with AWS and Kubernetes, ensuring that production applications run smoothly and reliably.

Your proficiency with relational databases, particularly PostgreSQL, and messaging systems such as RabbitMQ, NATS, or Kafka enables you to handle data flow and communication effectively within the system. As an experienced on-call SRE, you understand the importance of incident management processes and tools, and you are committed to minimizing downtime for customers.

You enjoy collaborating with product engineers to foster SRE principles within the R&D organization, and you are eager to mentor both the SRE team and product engineers. Your proactive approach to preventing incidents and resolving performance bottlenecks is a key asset in maintaining service resiliency.

Desirable

Experience with additional programming languages or tools is a plus, as is familiarity with CI/CD pipelines and infrastructure as code practices. You thrive in environments where you can influence processes and contribute to the overall reliability of services.

What you'll do

In this role, you will own and influence the incident management process end-to-end, ensuring that incidents are managed effectively and efficiently. You will maintain and evolve the on-prem observability stack, and you will keep production applications running smoothly by participating in the on-call rotation. You will also develop automations and tools to support platform reliability, which will be crucial in enhancing the overall performance of production services.

You will collaborate closely with product engineers to implement SRE principles, fostering a culture of reliability and resilience within the team. Your role will also involve mentoring junior engineers, sharing your knowledge and expertise to help them grow in their careers. By actively contributing to service codebases, you will help prevent incidents and resolve performance bottlenecks, ensuring that customers receive a reliable service with minimal downtime.

What we offer

PandaDoc offers a dynamic work environment where you can make a significant impact on the reliability of our services. We value collaboration and encourage you to apply even if your experience doesn't match every requirement. Join us in our mission to provide exceptional service to our customers while working with cutting-edge technologies in a supportive and innovative team.

We provide competitive compensation and benefits, along with opportunities for professional growth and development. As part of our team, you will have the chance to work on exciting projects that challenge your skills and expand your knowledge in the field of Site Reliability Engineering.
