
About Crusoe
Sustainable AI cloud solutions for a greener future
Key Highlights
- Headquartered in Denver, Colorado
- 501-1000 employees focused on AI and renewable energy
- First vertically integrated AI cloud platform
- Committed to sustainable computing practices
Crusoe is a pioneering AI cloud platform headquartered in Denver, Colorado, that utilizes clean, renewable energy to power its operations. The company focuses on providing scalable computing resources for AI and machine learning applications, serving a diverse range of clients across various industr...
🎁 Benefits
Crusoe offers competitive salaries, equity options, generous PTO, and a flexible remote work policy to support work-life balance....
🌟 Culture
Crusoe fosters a culture centered on sustainability and innovation, encouraging employees to contribute to environmentally friendly computing solution...
Overview
Crusoe is hiring a Senior Infrastructure Engineer to maintain and optimize high-performance GPU compute clusters. You'll work with NVIDIA and AMD technologies to ensure maximum uptime and reliability. This role requires hands-on experience with GPU troubleshooting and data center operations.
Job Description
Who you are
You have a strong background in infrastructure engineering with a focus on GPU technologies. Your hands-on experience with high-performance GPU compute clusters has equipped you to diagnose and repair complex hardware issues effectively. You are familiar with NVIDIA and AMD GPUs and understand the intricacies of maintaining a scalable GPU fleet. Your troubleshooting skills allow you to quickly identify and resolve hardware faults within GPU racks and high-density compute systems.
You thrive in collaborative environments, working closely with data center operations and engineering teams to ensure the health and performance of the infrastructure. Your ability to communicate effectively with vendors and internal teams makes you a key player in maintaining operational excellence. You are detail-oriented and committed to ensuring maximum uptime and reliability across the GPU fleet.
What you'll do
In this role, you will perform deep-level diagnosis and troubleshooting of hardware faults within GPU racks and high-density compute systems. You will troubleshoot and support GPU platforms, including NVIDIA A100, H200, GB200, and B200, and AMD Instinct MI350X / MI355X. Your expertise will be crucial in executing component-level diagnosis and remediation for failed or degraded hardware, ensuring that our GPU fleet operates at peak performance.
You will partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for GPUs, power supplies, cooling systems, interconnects, and other critical components. Your role will involve collaborating with cross-functional teams to implement innovative solutions that enhance the efficiency and reliability of our infrastructure. You will also contribute to the development of best practices for GPU maintenance and operations, helping to shape the future of Crusoe's cloud infrastructure.
What we offer
At Crusoe, you will be part of a mission-driven team that is at the forefront of the AI revolution. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in a dynamic environment where your contributions will have a tangible impact on the company's success and the advancement of sustainable technology. Join us in crafting the engine that powers a world where people can create ambitiously with AI.