
About ThoughtWorks
Transforming businesses through technology and innovation
Key Highlights
- Headquartered in Chicago, Illinois, with 43 global offices
- Approximately 7,000 employees worldwide
- Serves clients including BMW, BBC, and the UN
- Focus on software development and digital transformation
ThoughtWorks is a global technology consultancy headquartered in Chicago, Illinois, with over 43 offices across 14 countries. The company specializes in software development, digital transformation, and agile consulting, serving clients like BMW, the BBC, and the United Nations. With a workforce of ...
🎁 Benefits
ThoughtWorks offers competitive salaries, equity options, a generous PTO policy, and flexible remote work arrangements. Employees also benefit from a ...
🌟 Culture
ThoughtWorks fosters a culture of continuous learning and innovation, emphasizing agile methodologies and collaborative problem-solving. The company v...
Overview
ThoughtWorks is hiring a Lead Service Reliability Engineer to enhance operational efficiency and reliability within the infrastructure domain. You'll work with AWS, Docker, and Kubernetes to implement solutions that improve system performance. This role requires expertise in incident management and automation.
Job Description
Who you are
You have a strong background in Site Reliability Engineering (SRE) with a focus on reliability, resilience, and system performance, and your experience allows you to champion SRE principles effectively. You understand the importance of integrating automation and monitoring into operational processes, ensuring that systems are not only reliable but also agile and customer-focused.
With a commitment to continuous improvement, you cultivate a collaborative culture within your team. You believe in shared responsibility and are dedicated to helping organizations meet their reliability and business objectives. Your technical expertise is complemented by your ability to communicate effectively with both technical and business stakeholders.
You have experience in enhancing incident management processes: you know how to develop prioritization matrices, manage triage, and conduct post-mortem analyses to implement corrective actions. Your proactive approach means you focus on improvements rather than just reactive fixes, ensuring that systems evolve alongside client needs.
You are familiar with cost optimization strategies and scalable solutions, and your expertise plays a key role in streamlining operations and boosting efficiency. You thrive in environments that encourage curiosity and innovation, and you are eager to contribute to a team that values purpose and impact.
Desirable
Experience with cloud platforms such as AWS or Azure is a plus, and familiarity with containerization technologies like Docker and orchestration tools like Kubernetes will set you apart. A solid understanding of Linux systems and scripting languages will enhance your ability to automate processes and improve system reliability.
What you'll do
As a Lead Service Reliability Engineer at ThoughtWorks, you will take a lead role in ensuring technical excellence within the DAMO service line. You will be responsible for understanding SRE goals from both technical and business perspectives, providing solutions that enhance reliability and system performance. Your role will involve identifying and implementing architectures that enable fault tolerance and improve response times during incidents.
You will enhance the incident management process by developing effective communication strategies during production incidents, ensuring that client stakeholders are kept informed and that issues are resolved efficiently. Your focus will be on proactive improvements, allowing the organization to move from traditional operations to a more agile approach.
You will work closely with cross-functional teams to integrate automation and monitoring into daily operations, and your contributions will help facilitate a culture of continuous improvement and shared responsibility. You will also play a key role in mentoring junior engineers, fostering a collaborative environment where everyone can thrive.
What we offer
At ThoughtWorks, you will be part of a team that values innovation and curiosity. We encourage you to apply even if your experience doesn't match every requirement. You will have the opportunity to work on impactful projects that drive efficiency and reliability in our clients' systems. We offer a competitive salary and benefits package, along with opportunities for professional growth and development in a supportive environment.