Nebius AI

About Nebius AI

Empowering AI with robust infrastructure solutions

🏢 Tech · 👥 51-250 · 📅 Founded 2022 · 📍 Amsterdam, North Holland, Netherlands

Key Highlights

  • Publicly traded on Nasdaq and expanding in the AI infrastructure market
  • Headquartered in Amsterdam with hubs in the US, Europe, and Israel
  • Team of around 400 skilled engineers focused on AI/ML
  • Specializes in large-scale GPU clusters and cloud platforms

Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...

🎁 Benefits

Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...

🌟 Culture

Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Overview

Nebius AI is hiring a Network Site Reliability Engineer (NetSRE) to enhance network reliability and performance. You'll work with technologies like AWS, Docker, and Kubernetes. This position requires experience in network engineering and site reliability.

Job Description

Who you are

You have a strong background in network engineering and site reliability, with a focus on building and maintaining robust network infrastructures. You understand the importance of reliability goals and have experience defining SLIs, SLOs, and error budgets to ensure optimal performance. Your expertise in Linux and cloud technologies, particularly AWS, allows you to effectively manage and scale network services. You are proficient in containerization and orchestration tools such as Docker and Kubernetes, which you use to streamline operations and improve efficiency. Your experience with monitoring tools like Prometheus and Grafana enables you to proactively identify and resolve issues before they impact users. You are a collaborative team player who enjoys working with cross-functional teams to drive reliability improvements across the network.

Desirable

Experience with configuration management tools like Ansible is a plus, as is familiarity with version control systems such as Git. Knowledge of programming languages like Python will help you automate tasks and enhance network operations. You are passionate about continuous learning and staying up to date with the latest trends in network reliability and cloud computing.

What you'll do

As a Network Site Reliability Engineer at Nebius AI, you will define and own reliability goals for network services, ensuring they meet the highest standards of availability and performance. You will drive reliability improvements across the entire network, focusing on site readiness, inter-site connectivity, and operational standards. Your role will involve owning incident response for your areas, leading investigations and postmortems to turn failures into durable fixes. You will collaborate closely with engineering teams to build the tooling and automation necessary to meet reliability targets. Your contributions will be critical in making the network safer to operate as the company scales quickly. You will also have opportunities to mentor junior engineers and contribute to the overall growth of the team.

What we offer

Nebius AI provides a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. We value initiative and innovation, fostering a dynamic and collaborative work environment. Flexible working arrangements are available, allowing you to balance your professional and personal life. Join us at Nebius AI and be part of a team that is shaping the future of AI cloud infrastructure.

