
About Crusoe
Sustainable AI cloud solutions for a greener future
Key Highlights
- Headquartered in Denver, Colorado
- 501-1000 employees focused on AI and renewable energy
- First vertically integrated AI cloud platform
- Committed to sustainable computing practices
Crusoe is a pioneering AI cloud platform headquartered in Denver, Colorado, that utilizes clean, renewable energy to power its operations. The company focuses on providing scalable computing resources for AI and machine learning applications, serving a diverse range of clients across various industr...
🎁 Benefits
Crusoe offers competitive salaries, equity options, generous PTO, and a flexible remote work policy to support work-life balance....
🌟 Culture
Crusoe fosters a culture centered on sustainability and innovation, encouraging employees to contribute to environmentally friendly computing solution...
Overview
Crusoe is hiring a Site Reliability Engineer to ensure the reliability and performance of their cloud infrastructure. You'll work with Linux, networking, and automation to maintain high service levels. This role requires experience in SRE practices and distributed systems.
Job Description
Who you are
You have a strong background in Site Reliability Engineering (SRE) practices, with a focus on maintaining high service levels through effective monitoring and automation. Your experience with distributed systems allows you to understand the complexities involved in ensuring reliability and performance. You are proficient in Linux and have a solid understanding of networking principles, which are crucial for troubleshooting and optimizing infrastructure. Your passion for automation drives you to seek out opportunities to improve processes and reduce manual intervention, ensuring that systems run smoothly and efficiently.
You thrive in a collaborative environment, working closely with engineering teams to advise on building resilient code. Your problem-solving skills enable you to anticipate potential issues and implement proactive measures to prevent them from impacting customers. You are committed to continuous improvement and conduct thorough post-mortems to learn from incidents, sharing insights with your team to enhance overall performance. You understand the importance of a customer-centric approach and strive to ensure that clients have reliable access to the virtual machines they depend on.
Desirable
Experience with cloud infrastructure and familiarity with various cloud service providers would be a plus. Knowledge of monitoring tools and practices, as well as experience with incident management, will further enhance your ability to contribute to the team's success. A background in software development can also be beneficial, as it allows for better collaboration with engineering teams.
What you'll do
In this role, you will be responsible for ensuring the reliability and performance of Crusoe's AI platform. You will work on automation and tool development to streamline routine processes, allowing for more efficient operations. Your expertise in SRE practices will guide you in detecting, analyzing, and preventing issues that could affect service levels. You will collaborate with various engineering teams to advise them on best practices for building resilient code, ensuring that systems are designed with reliability in mind.
You will also conduct thorough post-mortems following incidents, identifying root causes and implementing solutions to prevent recurrence. Your proactive approach will help anticipate issues before they impact customers, maintaining the high standards of service that Crusoe is known for. You will play a key role in driving continuous improvement initiatives, working to enhance the overall performance of the infrastructure.
What we offer
At Crusoe, you will be part of a mission-driven team that is dedicated to accelerating the abundance of energy and intelligence through sustainable technology. We offer a collaborative work environment where innovation is encouraged, and your contributions will have a tangible impact on the future of AI and cloud infrastructure. You will have opportunities for professional growth and development, as well as the chance to work on cutting-edge projects that are shaping the industry. Join us in our commitment to responsible and transformative technology solutions.