About Nebius AI

Empowering AI with robust infrastructure solutions

🏢 Tech • 👥 51-250 • 📅 Founded 2022 • 📍 Amsterdam, North Holland, Netherlands

Key Highlights

Publicly traded on Nasdaq and expanding in the AI infrastructure market
Headquartered in Amsterdam with hubs in the US, Europe, and Israel
Team of around 400 skilled engineers focused on AI/ML
Specializes in large-scale GPU clusters and cloud platforms

Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...

🎁 Benefits

Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...

🌟 Culture

Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Site Reliability Engineer • Senior

Nebius AI • Prague - Remote

Posted 15h ago • 🏠 Remote • Senior Site Reliability Engineer • 📍 Prague

Skills & Technologies

Go, Python, C++, Unix, Ansible, Terraform, Docker, Kubernetes, Helm

Overview

Nebius AI is seeking a Senior Site Reliability Engineer to ensure the fault tolerance and scalability of its cloud services. You'll work with technologies like Go, Python, and Kubernetes to solve infrastructure challenges. This role requires solid experience in programming and Unix systems.

Job Description

Who you are

You have solid experience with programming languages such as Go, Python, or C++ — your expertise allows you to tackle complex problems and implement effective solutions in cloud environments. Your understanding of classic algorithms and data structures enables you to optimize performance and efficiency in your work.

You possess a deep understanding of Unix systems and network technology — this knowledge is crucial for ensuring the reliability and scalability of services. Your commercial experience with these systems equips you to handle real-world challenges effectively.

You have experience with systems for containerization and configuration management, including tools like Ansible, Salt, Terraform, Docker, Kubernetes, and Helm — these skills are essential for managing and deploying applications in a cloud environment. Your familiarity with these technologies allows you to streamline processes and improve operational efficiency.

You are eager to be involved in backend development — your interest in this area drives you to collaborate with other engineers and contribute to the development of high-load distributed systems. Your desire to learn and grow in this field is evident in your proactive approach to professional development.

You have commercial experience with a variety of cloud platforms — this experience gives you a broad perspective on cloud technologies and their applications in different contexts. Your ability to adapt to various platforms enhances your versatility as an engineer.

Desirable

You have a desire to be involved in backend development — this interest motivates you to explore new technologies and methodologies that can enhance your contributions to the team. Your willingness to learn and grow in this area is a valuable asset.

What you'll do

In this role, you will ensure fault-tolerance, scale, and uninterrupted operations for Nebius AI's services — your primary responsibility will be to maintain the reliability of cloud infrastructure while addressing any issues that arise. You will use cutting-edge cloud technology to solve a variety of infrastructure problems, ensuring that services run smoothly and efficiently.

You will implement and improve CI/CD processes — your expertise in continuous integration and deployment will help streamline development workflows and enhance collaboration among team members. You will work closely with other engineers to identify areas for improvement and implement best practices.

You will collaborate with cross-functional teams to design and develop high-load distributed systems — your ability to work effectively with others will be key to delivering robust solutions that meet the needs of customers. You will engage in discussions about system architecture and design, contributing your insights and expertise.

You will participate in coding interviews as part of the hiring process — your experience and knowledge will help identify top talent for the team. You will play a role in shaping the future of the engineering team by selecting candidates who align with Nebius AI's values and goals.

What we offer

Nebius AI offers a competitive salary and comprehensive benefits package — we value our employees and strive to provide a supportive work environment that fosters growth and development. You will have opportunities for professional growth within the company, allowing you to advance your career while contributing to innovative projects.

We provide flexible working arrangements — whether you prefer to work remotely or in the office, we accommodate your needs to ensure a healthy work-life balance. Our dynamic and collaborative work environment values initiative and innovation, encouraging you to share your ideas and contribute to the team's success.

As part of a growing company, you will have the chance to work at the cutting edge of AI cloud infrastructure — your contributions will directly impact the development of tools and resources that help customers solve real-world challenges. You will be part of a team that is dedicated to transforming industries and shaping the future of cloud computing.
