Microsoft

About Microsoft

Empowering every person and organization on the planet

🏢 Tech | 👥 100K+ | 📅 Founded 1975 | 📍 Redmond, Washington, United States

Key Highlights

  • Market cap exceeds $2 trillion
  • 100,000+ employees worldwide
  • Leading cloud services through Azure
  • Major clients include Walmart and BMW

Microsoft Corporation, headquartered in Redmond, Washington, is a leading technology company known for its software products like Windows and Office, as well as cloud services through Azure. With over 100,000 employees, Microsoft serves millions of customers globally, including major enterprises lik...

🎁 Benefits

Microsoft offers competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. Employees also enjoy a flexible remote...

🌟 Culture

Microsoft fosters a culture of innovation and inclusivity, emphasizing collaboration across teams and a commitment to diversity. The company values em...

Site Reliability Engineer (Mid-Level)

Microsoft | Redmond - On-Site

Posted 2w ago | 🏛️ On-Site | Mid-Level | Site Reliability Engineer | 📍 Redmond | 💰 $100,600 - $199,000 / year

Overview

Microsoft is hiring a Site Reliability Engineer II for the M365 Copilot App Platform team to improve the reliability and performance of its services. You'll work with Azure, Kubernetes, and Docker to keep systems robust. This position requires deep technical expertise in distributed systems and incident management.

Job Description

Who you are

You have 3+ years of experience in site reliability engineering or a related field, with a strong focus on distributed systems and infrastructure. You possess deep technical expertise in Azure and Linux, and you are comfortable working with Kubernetes and Docker to manage containerized applications. Your experience includes analyzing production telemetry and participating in incident response, ensuring that systems remain reliable and performant.

You are proficient in Python and have a solid understanding of CI/CD practices, which allows you to automate deployment pipelines effectively. Your familiarity with monitoring tools like Prometheus enables you to maintain observability across services, ensuring that any issues are quickly identified and resolved. You thrive in collaborative environments, working closely with partner teams to enhance service robustness and scalability.

What you'll do

As a Site Reliability Engineer II, you will be responsible for improving the availability, reliability, and performance of the M365 Copilot app's middle-tier services. You will analyze production telemetry to identify areas for improvement and participate in on-call rotations to respond to incidents as they arise. Your role will involve driving engineering changes that enhance service robustness at scale, ensuring that partner teams can depend on the platform for their AI-enabled experiences.

You will collaborate with cross-functional teams to develop and implement best practices for incident management and system monitoring. Your contributions will directly impact the success of Microsoft 365 Copilot, one of the company's key strategic products in the competitive AI landscape. You will also have opportunities to mentor junior engineers and contribute to a culture of continuous improvement within the team.

What we offer

Microsoft offers a competitive salary and benefits package, including opportunities for professional development and growth. You will work in a supportive environment that values collaboration and innovation, allowing you to make a meaningful impact on the future of AI at Microsoft. Join us in empowering every person and organization on the planet to achieve more.
