
About Microsoft
Empowering every person and organization on the planet to achieve more
Key Highlights
- Market cap exceeds $2 trillion
- 100,000+ employees worldwide
- Leading cloud services through Azure
- Major clients include Walmart and BMW
Microsoft Corporation, headquartered in Redmond, Washington, is a leading technology company known for its software products like Windows and Office, as well as cloud services through Azure. With over 100,000 employees, Microsoft serves millions of customers globally, including major enterprises lik...
🎁 Benefits
Microsoft offers competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. Employees also enjoy a flexible remote...
🌟 Culture
Microsoft fosters a culture of innovation and inclusivity, emphasizing collaboration across teams and a commitment to diversity. The company values em...
Overview
Microsoft is hiring a Site Reliability Engineer II for the M365 Copilot App Platform team to improve the reliability and performance of its services. You'll work with Azure, Kubernetes, and Docker to keep systems robust. This position requires deep technical expertise in distributed systems and incident management.
Job Description
Who you are
You have 3+ years of experience in site reliability engineering or a related field, with a strong focus on distributed systems and infrastructure. You possess deep technical expertise in Azure and Linux, and you are comfortable working with Kubernetes and Docker to manage containerized applications. Your experience includes analyzing production telemetry and participating in incident response, ensuring that systems remain reliable and performant.
You are proficient in Python and have a solid understanding of CI/CD practices, which allows you to automate deployment pipelines effectively. Your familiarity with monitoring tools like Prometheus enables you to maintain observability across services, ensuring that any issues are quickly identified and resolved. You thrive in collaborative environments, working closely with partner teams to enhance service robustness and scalability.
What you'll do
As a Site Reliability Engineer II, you will be responsible for improving the availability, reliability, and performance of the M365 Copilot app's middle-tier services. You will analyze production telemetry to identify areas for improvement and participate in on-call rotations to respond to incidents as they arise. Your role will involve driving engineering changes that enhance service robustness at scale, ensuring that partner teams can depend on the platform for their AI-enabled experiences.
You will collaborate with cross-functional teams to develop and implement best practices for incident management and system monitoring. Your contributions will directly impact the success of Microsoft 365 Copilot, one of the company's key strategic products in the competitive AI landscape. You will also have opportunities to mentor junior engineers and contribute to a culture of continuous improvement within the team.
What we offer
Microsoft offers a competitive salary and benefits package, including opportunities for professional development and growth. You will work in a supportive environment that values collaboration and innovation, allowing you to make a meaningful impact on the future of AI at Microsoft. Join us in empowering every person and organization on the planet to achieve more.