Nebius AI

About Nebius AI

Empowering AI with robust infrastructure solutions

🏢 Tech · 👥 51-250 · 📅 Founded 2022 · 📍 Amsterdam, North Holland, Netherlands

Key Highlights

  • Publicly traded on Nasdaq and expanding in the AI infrastructure market
  • Headquartered in Amsterdam with hubs in the US, Europe, and Israel
  • Team of around 400 skilled engineers focused on AI/ML
  • Specializes in large-scale GPU clusters and cloud platforms

Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...

🎁 Benefits

Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...

🌟 Culture

Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Overview

Nebius AI is hiring a Senior Site Reliability Engineer to join the Compute Node team. You'll focus on Linux systems engineering and operational reliability while managing virtual machines across cloud regions. This position requires expertise in Linux and virtualization.

Job Description

Who you are

You have 5+ years of experience in site reliability engineering, with a strong focus on Linux systems and virtualization technologies. Your background includes troubleshooting complex production issues and ensuring the reliability and performance of compute nodes running virtual machines. You understand the intricacies of operating systems and hypervisors, and you are adept at analyzing and debugging systems across user space and kernel space.

You thrive in collaborative environments and enjoy working closely with cross-functional teams to embed reliability and observability into cloud infrastructure. Your problem-solving skills are top-notch, and you have a keen ability to analyze trade-offs at various layers of the system. You are passionate about cloud computing and are excited to contribute to the AI economy.

What you'll do

As a Senior Site Reliability Engineer at Nebius AI, you will be responsible for ensuring the reliability, availability, and performance of compute nodes across all cloud regions. You will analyze and debug Linux systems, working closely with the Compute Node team to enhance the cluster scheduler and node-level services. Your role will involve troubleshooting complex production issues and implementing solutions that improve operational reliability.

You will collaborate with other engineers to shape how reliability and observability are integrated into the Compute platform. Your contributions will directly impact the performance of virtual machines and the overall efficiency of the cloud infrastructure. You will also have opportunities to mentor junior engineers and share your expertise within the team.

What we offer

Nebius AI provides a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. We value initiative and innovation, fostering a dynamic and collaborative work environment. With flexible working arrangements, you can balance your professional and personal life while contributing to cutting-edge AI cloud infrastructure.
