ThoughtWorks

About ThoughtWorks

Transforming businesses through technology and innovation

🏢 Tech · 👥 5K-10K · 📅 Founded 1993 · 📍 Chicago, Illinois, United States

Key Highlights

  • Headquartered in Chicago, Illinois, with 43 global offices
  • Approximately 7,000 employees worldwide
  • Serves clients including BMW, BBC, and the UN
  • Focus on software development and digital transformation

ThoughtWorks is a global technology consultancy headquartered in Chicago, Illinois, with over 43 offices across 14 countries. The company specializes in software development, digital transformation, and agile consulting, serving clients like BMW, the BBC, and the United Nations. With a workforce of ...

🎁 Benefits

ThoughtWorks offers competitive salaries, equity options, a generous PTO policy, and flexible remote work arrangements. Employees also benefit from a ...

🌟 Culture

ThoughtWorks fosters a culture of continuous learning and innovation, emphasizing agile methodologies and collaborative problem-solving. The company v...

Overview

ThoughtWorks is hiring a Lead Service Reliability Engineer to enhance operational efficiency and reliability within the infrastructure domain. You'll work with AWS, Docker, and Kubernetes to implement solutions that improve system performance. This role requires expertise in incident management and automation.

Job Description

Who you are

You have a strong background in Site Reliability Engineering (SRE), with a focus on reliability, resilience, and system performance, and your experience allows you to champion SRE principles effectively. You understand the importance of integrating automation and monitoring into operational processes, ensuring that systems are not only reliable but also agile and customer-focused.

With a commitment to continuous improvement, you cultivate a collaborative culture within your team. You believe in shared responsibility and are dedicated to helping organizations meet their reliability and business objectives. Your technical expertise is complemented by your ability to communicate effectively with both technical and business stakeholders.

You have experience in enhancing incident management processes: you know how to develop prioritization matrices, manage triage, and conduct post-mortem analyses to implement corrective actions. Your proactive approach means you focus on improvements rather than just reactive fixes, ensuring that systems evolve alongside client needs.

You are familiar with cost optimization strategies and scalable solutions, and your expertise plays a key role in streamlining operations and boosting efficiency. You thrive in environments that encourage curiosity and innovation, and you are eager to contribute to a team that values purpose and impact.

Desirable

Experience with cloud platforms such as AWS or Azure is a plus, and familiarity with containerization technologies like Docker and orchestration tools like Kubernetes will set you apart. A solid understanding of Linux systems and scripting languages will enhance your ability to automate processes and improve system reliability.

What you'll do

As a Lead Service Reliability Engineer at ThoughtWorks, you will take a lead role in ensuring technical excellence within the DAMO service line. You will be responsible for understanding SRE goals from both technical and business perspectives, providing solutions that enhance reliability and system performance. Your role will involve identifying and implementing architectures that enable fault tolerance and improve response times during incidents.

You will enhance the incident management process by developing effective communication strategies during production incidents, ensuring that client stakeholders are kept informed and that issues are resolved efficiently. Your focus will be on proactive improvements, allowing the organization to move from traditional operations to a more agile approach.

You will work closely with cross-functional teams to integrate automation and monitoring into daily operations, helping to build a culture of continuous improvement and shared responsibility. You will also play a key role in mentoring junior engineers, fostering a collaborative environment where everyone can thrive.

What we offer

At ThoughtWorks, you will be part of a team that values innovation and curiosity. We encourage you to apply even if your experience doesn't match every requirement. You will have the opportunity to work on impactful projects that drive efficiency and reliability in our clients' systems. We offer a competitive salary and benefits package, along with opportunities for professional growth and development in a supportive environment.

