
About Nebius AI
Empowering AI with robust infrastructure solutions
Key Highlights
- Publicly traded on Nasdaq and expanding in the AI infrastructure market
- Headquartered in Amsterdam with hubs in the US, Europe, and Israel
- Team of around 400 skilled engineers focused on AI/ML
- Specializes in large-scale GPU clusters and cloud platforms
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
🎁 Benefits
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
🌟 Culture
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...
Overview
Nebius AI is hiring a Network Site Reliability Engineer (NetSRE) to enhance network reliability and performance. You'll work with technologies like AWS, Docker, and Kubernetes. This position requires experience in network engineering and site reliability.
Job Description
Who you are
You have a strong background in network engineering and site reliability, with a focus on building and maintaining robust network infrastructure. You understand why reliability goals matter and have experience defining SLIs, SLOs, and error budgets to keep services within their targets. Your expertise in Linux and cloud technologies, particularly AWS, allows you to manage and scale network services effectively. You are proficient with container and orchestration tools such as Docker and Kubernetes, which you use to streamline operations and improve efficiency. Your experience with monitoring tools like Prometheus and Grafana enables you to identify and resolve issues before they impact users. You are a collaborative team player who enjoys working with cross-functional teams to drive reliability improvements across the network.
Desirable
Experience with configuration management tools like Ansible is a plus, as is familiarity with version control systems such as Git. Knowledge of programming languages like Python will help you automate tasks and enhance network operations. You are passionate about continuous learning and staying updated with the latest trends in network reliability and cloud computing.
What you'll do
As a Network Site Reliability Engineer at Nebius AI, you will define and own reliability goals for network services, ensuring they meet the highest standards of availability and performance. You will drive reliability improvements across the entire network, focusing on site readiness, inter-site connectivity, and operational standards. Your role will involve owning incident response for your areas, leading investigations and postmortems to turn failures into durable fixes. You will collaborate closely with engineering teams to build the tooling and automation necessary to meet reliability targets. Your contributions will be critical in making the network safer to operate as the company scales quickly. You will also have opportunities to mentor junior engineers and contribute to the overall growth of the team.
What we offer
Nebius AI provides a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. We value initiative and innovation, fostering a dynamic and collaborative work environment. Flexible working arrangements are available, allowing you to balance your professional and personal life. Join us at Nebius AI and be part of a team that is shaping the future of AI cloud infrastructure.