
About Nebius AI
Empowering AI with robust infrastructure solutions
Key Highlights
- Publicly traded on Nasdaq and operating in the expanding AI infrastructure market
- Headquartered in Amsterdam with hubs in the US, Europe, and Israel
- Team of around 400 skilled engineers focused on AI/ML
- Specializes in large-scale GPU clusters and cloud platforms
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
🎁 Benefits
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
🌟 Culture
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...
Overview
Nebius AI is seeking an Incident Manager to oversee incident and problem management processes across its data center infrastructure. You'll coordinate between internal teams and external partners to ensure effective resolution of issues. This role requires strong process improvement skills and a focus on automation.
Job Description
Who you are
You have a strong background in incident management, with experience in coordinating between various teams to resolve complex issues efficiently. You enjoy diving deep into processes and are committed to continuously improving workflows to enhance service reliability. Your ability to communicate effectively with both technical and non-technical stakeholders is essential for success in this role. You thrive in collaborative environments and are eager to contribute to a team that values innovation and initiative.
Desirable
Experience in cloud computing or AI infrastructure is a plus, as is familiarity with ITIL frameworks. You are proactive in seeking out opportunities for automation and process optimization, and you have a keen eye for detail that helps you identify areas for improvement. Your passion for technology and problem-solving drives you to stay updated on industry trends and best practices.
What you'll do
As an Incident Manager at Nebius AI, you will lead the Incident and Problem Management processes for all hardware, firmware, and IT operational issues that impact services. You will coordinate efforts between the IT Infrastructure team, Engineering, and external vendors to ensure quick and effective resolution of incidents. Your role will involve managing and maintaining the Knowledge Base, encouraging contributions to service documentation, and fostering a culture of continuous improvement within the team. You will also analyze incident trends to identify root causes and implement preventive measures, ensuring that the organization learns from past incidents to enhance future performance.
What we offer
Nebius AI provides a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. You will enjoy flexible working arrangements in a dynamic and collaborative work environment that values initiative and innovation. As Nebius continues to grow and expand its product offerings, you will have the chance to work at the forefront of AI and cloud computing, contributing to solutions that address real-world challenges.
Similar Jobs You Might Like

Incident Manager
Crusoe is seeking an Incident Manager to lead the management of high-visibility technical incidents and customer escalations. You'll ensure service reliability and drive product improvements based on incident data. This role requires strong leadership skills and the ability to thrive in high-pressure situations.

Incident Manager
Doctolib is seeking an Incident Manager Team Leader to enhance operational excellence and reliability within their Operations Center team. You'll lead a team focused on incident management and collaborate with engineering and business teams to improve platform reliability.