Crusoe

About Crusoe

Sustainable AI cloud solutions for a greener future

🏢 Tech | 👥 501-1000 employees | 📅 Founded 2018 | 📍 Denver, Colorado, United States

Key Highlights

  • Headquartered in Denver, Colorado
  • 501-1000 employees focused on AI and renewable energy
  • First vertically integrated AI cloud platform
  • Committed to sustainable computing practices

Crusoe is a pioneering AI cloud platform headquartered in Denver, Colorado, that utilizes clean, renewable energy to power its operations. The company focuses on providing scalable computing resources for AI and machine learning applications, serving a diverse range of clients across various industr...

🎁 Benefits

Crusoe offers competitive salaries, equity options, generous PTO, and a flexible remote work policy to support work-life balance....

🌟 Culture

Crusoe fosters a culture centered on sustainability and innovation, encouraging employees to contribute to environmentally friendly computing solution...

Senior Infrastructure Engineer

Crusoe 📍 San Francisco - On-Site

Posted 3 months ago | 🏛️ On-Site | Senior | Infrastructure Engineer | 📍 San Francisco | 💰 $183,000 - $210,000 per year

Overview

Crusoe is hiring a Senior Infrastructure Engineer to maintain and optimize high-performance GPU compute clusters. You'll work with NVIDIA and AMD technologies to ensure maximum uptime and reliability. This role requires hands-on experience with GPU troubleshooting and data center operations.

Job Description

Who you are

You have a strong background in infrastructure engineering with a focus on GPU technologies. Your hands-on experience with high-performance GPU compute clusters has equipped you to diagnose and repair complex hardware issues. You are familiar with NVIDIA and AMD GPUs and understand what it takes to maintain a GPU fleet at scale. You can quickly identify and resolve hardware faults within GPU racks and high-density compute systems.

You thrive in collaborative environments, working closely with data center operations and engineering teams to ensure the health and performance of the infrastructure. Your ability to communicate effectively with vendors and internal teams makes you a key player in maintaining operational excellence. You are detail-oriented and committed to ensuring maximum uptime and reliability across the GPU fleet.

What you'll do

In this role, you will perform deep-level diagnosis and troubleshooting of hardware faults within GPU racks and high-density compute systems. You will support GPU platforms including NVIDIA A100, H200, GB200, and B200, and AMD Instinct MI350X / MI355X. You will carry out component-level diagnosis and remediation for failed or degraded hardware so that the GPU fleet operates at peak performance.

You will partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for GPUs, power supplies, cooling systems, interconnects, and other critical components. Your role will involve collaborating with cross-functional teams to implement innovative solutions that enhance the efficiency and reliability of our infrastructure. You will also contribute to the development of best practices for GPU maintenance and operations, helping to shape the future of Crusoe's cloud infrastructure.

What we offer

At Crusoe, you will be part of a mission-driven team that is at the forefront of the AI revolution. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in a dynamic environment where your contributions will have a tangible impact on the company's success and the advancement of sustainable technology. Join us in crafting the engine that powers a world where people can create ambitiously with AI.

