
About Affirm
Transparent financing for modern consumers
Key Highlights
- 21M+ consumers and 337,000+ merchants using Affirm
- Raised $1.1B in funding, currently in Series F
- Flexible payback options from 3 to 36 months
- Headquartered in Chinatown, San Francisco, CA
Affirm, headquartered in Chinatown, San Francisco, CA, is a leading fintech company specializing in point-of-sale installment loans. With over 21 million consumers and 337,000+ merchants including Shopify, KAYAK, and Walmart, Affirm offers flexible payback options ranging from 3 to 36 months. The co...
🎁 Benefits
Affirm offers a remote-first workforce policy, allowing employees to work from anywhere in their home country. Benefits include 18 weeks of paid paren...
🌟 Culture
Affirm's culture is centered around transparency and consumer empowerment, with a focus on delivering honest financial products. The company actively ...
Overview
Affirm is seeking a Staff Site Reliability Engineer to enhance platform reliability and incident management. You'll work with AWS, Docker, and Kubernetes to ensure application performance and resilience. This role requires extensive experience in SRE practices.
Job Description
Who you are
You are a seasoned Site Reliability Engineer with a strong background in software and systems engineering, possessing at least 5 years of experience in the field. You have a deep understanding of incident management and are skilled in building and iterating on reliability practices that enhance the overall performance of applications. Your expertise in cloud infrastructure, particularly with AWS, allows you to effectively manage and optimize resources for high availability and scalability.
You are proficient in containerization and orchestration technologies such as Docker and Kubernetes, which lets you streamline deployments and improve operational efficiency. Your experience with monitoring tools like Prometheus gives you visibility into application performance, so you can address potential issues before they impact users. You are also familiar with infrastructure-as-code practices, particularly Terraform, which you use to automate and manage infrastructure deployments.
You thrive in collaborative environments, engaging in architectural discussions and providing guidance on best practices for operating applications. Your ability to communicate effectively with engineering teams helps foster a culture of reliability and accountability. You are passionate about mentoring others and sharing your knowledge to elevate the team's overall capabilities.
Desirable
Experience with chaos engineering and load testing is a plus, as it demonstrates your commitment to building resilient systems. Familiarity with configuration management tools and observability frameworks will further enhance your contributions to the SRE team.
What you'll do
In this role, you will lead efforts to define and implement frameworks for operating applications effectively. You will guide the development of Service Level Objectives (SLOs) and drive the incident management process, ensuring that incidents are handled efficiently and lessons learned are documented for future reference. Your responsibilities will include steering the implementation of change management practices and engaging in architectural conversations to improve system reliability.
You will collaborate closely with engineering teams to provide training and consulting on best practices for application performance. By recommending observability and alerting configurations, you will help teams gain insights into their systems, enabling them to make informed decisions about performance improvements. Your role will also involve building tooling that enhances operational capabilities and supports the SRE team's mission to protect the customer experience.
What we offer
At Affirm, you will join a small team that plays a vital role in the company's mission to reinvent credit. We offer a supportive environment where you can grow your skills and make a significant impact on our engineering organization. You will be able to work remotely from Spain, which supports a flexible work-life balance. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds on our team.