
About Nebius AI
Empowering AI with robust infrastructure solutions
Key Highlights
- Publicly traded on Nasdaq, serving the growing AI infrastructure market
- Headquartered in Amsterdam with hubs in the US, Europe, and Israel
- Team of around 400 skilled engineers focused on AI/ML
- Specializes in large-scale GPU clusters and cloud platforms
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
🎁 Benefits
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
🌟 Culture
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...
Overview
Nebius AI is seeking a Senior HPC Cluster Engineer to enhance and optimize its cloud platform, which is built around GPU computing and InfiniBand networks. You'll work with technologies like KVM and QEMU to ensure high performance in multi-GPU environments. This role requires expertise in Linux and virtualization technologies.
Job Description
Who you are
You have extensive experience in high-performance computing (HPC) environments, particularly with GPU clusters and InfiniBand networks. Your background includes a strong understanding of Linux systems and virtualization technologies such as KVM and QEMU. You are skilled in performance tuning and have a knack for troubleshooting complex infrastructure issues. You thrive in collaborative settings, working closely with hardware and software teams to optimize system performance. You are proactive in automating fault detection and resolution processes, ensuring the reliability of HPC systems. You are passionate about AI and cloud computing, eager to contribute to innovative solutions that drive the AI economy forward.
Desirable
Experience with cloud infrastructure and a solid understanding of AI/ML workloads would be a plus. Familiarity with automation tools and scripting languages can enhance your effectiveness in this role. A background in working with large-scale systems and a keen interest in emerging technologies will set you apart.
What you'll do
As a Senior HPC Cluster Engineer at Nebius AI, you will play a pivotal role in the development of our cutting-edge hyperscaler platform. You will be responsible for tuning the performance of GPU clusters and InfiniBand networks, ensuring optimal operation in multi-GPU environments. Your role will involve analyzing and improving infrastructure to support new hardware, as well as fine-tuning system performance to meet the demands of our AI-driven applications. You will collaborate with cross-functional teams to implement enhancements and optimizations that elevate our cloud platform's capabilities. Additionally, you will automate fault detection and resolution processes, contributing to the overall reliability and efficiency of our systems. Your expertise will help shape the future of AI cloud infrastructure, making a significant impact on how industries leverage AI technologies.
What we offer
At Nebius AI, we provide a competitive salary and a comprehensive benefits package designed to support your professional growth. You will have opportunities for career advancement within our organization, as we are committed to fostering talent and innovation. We offer flexible working arrangements to accommodate your lifestyle, along with a collaborative work environment that values initiative and creativity. Join us in our mission to transform industries through AI and cloud computing, and be part of a team that is at the forefront of technological advancement.