Nebius AI

About Nebius AI

Empowering AI with robust infrastructure solutions

🏢 Tech | 👥 51-250 | 📅 Founded 2022 | 📍 Amsterdam, North Holland, Netherlands

Key Highlights

  • Publicly traded on Nasdaq, expanding AI infrastructure market
  • Headquartered in Amsterdam with hubs in the US, Europe, and Israel
  • Team of around 400 skilled engineers focused on AI/ML
  • Specializes in large-scale GPU clusters and cloud platforms

Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...

🎁 Benefits

Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...

🌟 Culture

Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Overview

Nebius AI is seeking a Senior Site Reliability Engineer to ensure fault tolerance and scale for its cloud services. You'll work with technologies like Go, Python, and Kubernetes to solve infrastructure challenges. This role requires solid experience in programming and systems management.

Job Description

Who you are

You have solid experience with programming languages such as Go, Python, or C++. You've tackled complex problems with them and understand the nuances of each language. Your deep understanding of classic algorithms and data structures lets you optimize solutions effectively. You have commercial experience with Unix systems and network technology, and you are at ease working in these environments. Your expertise extends to containerization and configuration management tools such as Ansible, Terraform, Docker, Kubernetes, and Helm, and you know how to implement and improve CI/CD processes to enhance operational efficiency.

Desirable

You want to be involved in backend development, and your interest in backend systems drives you to explore new technologies and methodologies. Experience designing, developing, and running high-load distributed systems is a bonus, since it means you understand the challenges of scaling applications and how to solve them. Commercial experience with various cloud platforms will also set you apart, particularly familiarity with the intricacies of cloud infrastructure and its management.

What you'll do

In this role, you will ensure fault tolerance, scale, and uninterrupted operations for Nebius's services, so your contributions will directly affect the reliability of the company's cloud infrastructure. You will use cutting-edge cloud technology to solve a variety of infrastructure problems in the rapidly evolving AI cloud landscape. Implementing and improving CI/CD processes will be a key responsibility: you will streamline deployment pipelines to improve efficiency and reduce downtime. You will also collaborate with a team of highly skilled engineers to tackle complex challenges and contribute to the overall success of Nebius's projects.

What we offer

Nebius AI provides a competitive salary and a comprehensive benefits package, and invests in its employees' growth. As the company expands its products, you will have the chance to develop your skills and advance your career. Flexible working arrangements are available, whether you prefer remote work or a hybrid model. You will join a dynamic and collaborative work environment that values initiative and innovation, where your contributions will help shape the future of AI cloud infrastructure.
