Nebius AI

About Nebius AI

Empowering AI with robust infrastructure solutions

🏢 Tech · 👥 51-250 · 📅 Founded 2022 · 📍 Amsterdam, North Holland, Netherlands

Key Highlights

  • Publicly traded on Nasdaq and expanding in the AI infrastructure market
  • Headquartered in Amsterdam with hubs in the US, Europe, and Israel
  • Team of around 400 skilled engineers focused on AI/ML
  • Specializes in large-scale GPU clusters and cloud platforms

Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...

🎁 Benefits

Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...

🌟 Culture

Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Overview

Nebius AI is seeking a Senior Software Engineer to join its Hardware Infrastructure Observability team. You'll design and develop services for monitoring server fleets and data center systems, working primarily with Python and Linux. This role is based in Amsterdam.

Job Description

Who you are

You have 5+ years of experience in software engineering, particularly in building and maintaining infrastructure observability systems. Your expertise in Python and Linux allows you to develop robust monitoring solutions that ensure the reliability of large-scale server fleets. You are familiar with containerization technologies such as Docker and orchestration tools like Kubernetes, which you have used to streamline deployment processes and enhance system performance. Your experience with monitoring tools like Prometheus and Grafana enables you to create insightful dashboards and alerts that help maintain system health. You thrive in collaborative environments, working closely with cross-functional teams to drive improvements and resolve incidents effectively. You are proactive in investigating issues and implementing root-cause fixes, ensuring that systems remain operational and efficient.

Desirable

Experience with cloud infrastructure and AI/ML systems is a plus, as is familiarity with incident response protocols and debugging techniques. You are comfortable working in a fast-paced environment and are eager to learn new technologies that can enhance your contributions to the team.

What you'll do

As a Senior Software Engineer at Nebius, you will be responsible for designing and developing services and agents that provide deep visibility into a large server fleet and data center engineering systems. You will evolve metrics, aggregation, and alerting pipelines to improve signal quality and ensure that the infrastructure remains healthy. Your role will involve building maintenance workflows and automation processes that facilitate safe and predictable fleet-wide changes. You will also investigate incidents hands-on, including on-host debugging, and drive root-cause fixes to enhance system reliability. Collaboration with other engineers and teams will be key as you work to improve the overall performance and efficiency of the infrastructure.

What we offer

Nebius offers a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. You will enjoy flexible working arrangements and be part of a dynamic and collaborative work environment that values initiative and innovation. As Nebius continues to grow and expand its products, you will have the chance to contribute to exciting projects that shape the future of AI cloud infrastructure.

