
About Nebius AI
Empowering AI with robust infrastructure solutions
Key Highlights
- Publicly traded on Nasdaq and operating in the expanding AI infrastructure market
- Headquartered in Amsterdam with hubs in the US, Europe, and Israel
- Team of around 400 skilled engineers focused on AI/ML
- Specializes in large-scale GPU clusters and cloud platforms
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
🎁 Benefits
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
🌟 Culture
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...
Overview
Nebius AI is seeking a Senior Site Reliability Engineer to ensure fault-tolerance and scale for its cloud services. You'll work with technologies like Go, Python, and Kubernetes to solve infrastructure challenges. This role requires solid experience in programming and systems management.
Job Description
Who you are
You have solid experience with programming languages such as Go, Python, or C++, and you have used them to tackle complex problems. You have a deep understanding of classic algorithms and data structures and can apply it to optimize solutions. You have commercial experience with Unix systems and network technology. You also know containerization and configuration management tools such as Ansible, Terraform, Docker, Kubernetes, and Helm, and you know how to implement and improve CI/CD processes.
Desirable
You want to be involved in backend development. Experience designing, developing, and running high-load distributed systems is a bonus, as is commercial experience with various cloud platforms.
What you'll do
In this role, you will ensure fault-tolerance, scale, and uninterrupted operations for Nebius's services. You will use cloud technology to solve a variety of infrastructure problems. Implementing and improving CI/CD processes will be a key responsibility, including streamlining deployment pipelines to improve efficiency and reduce downtime. You will work alongside a team of highly skilled engineers on complex challenges.
What we offer
Nebius AI provides a competitive salary and a comprehensive benefits package. There are opportunities for professional growth as the company expands its products. Flexible working arrangements are available, including remote work and a hybrid model. The work environment is collaborative and values initiative and innovation.