
About Crusoe
Sustainable AI cloud solutions for a greener future
Key Highlights
- Headquartered in Denver, Colorado
- 501-1000 employees focused on AI and renewable energy
- First vertically integrated AI cloud platform
- Committed to sustainable computing practices
Crusoe is a pioneering AI cloud platform headquartered in Denver, Colorado, that utilizes clean, renewable energy to power its operations. The company focuses on providing scalable computing resources for AI and machine learning applications, serving a diverse range of clients across various industr...
🎁 Benefits
Crusoe offers competitive salaries, equity options, generous PTO, and a flexible remote work policy to support work-life balance....
🌟 Culture
Crusoe fosters a culture centered on sustainability and innovation, encouraging employees to contribute to environmentally friendly computing solution...
Overview
Crusoe is hiring a Senior Software Engineer for their Cloud Availability Platform Engineering team to design and operate observability systems. You'll work with technologies like Kubernetes, Prometheus, and Grafana to ensure reliability and performance across Crusoe’s cloud infrastructure.
Job Description
Who you are
You have deep expertise in building and operating observability platforms at scale. Your experience includes designing, developing, and running observability stacks that provide actionable insights into distributed systems, and you understand how metrics, logs, and traces each contribute to system reliability and performance.
You are skilled in architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization. Your knowledge of tools like Prometheus, Grafana, and OpenTelemetry allows you to extend monitoring and alerting capabilities effectively. You have experience working with multi-datacenter Kubernetes environments and keeping observability systems scalable and robust across them.
You are familiar with building scalable log collection and processing pipelines using Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks. Your background includes implementing distributed tracing platforms such as Tempo and Jaeger and integrating them with service meshes, load balancers, and APIs. You are proactive in defining and driving the adoption of SLOs, SLIs, and error budgets across teams.
What you'll do
In this role, you will design and operate scalable observability systems that provide insights into the internal state of distributed systems, so that engineers can understand system performance and reliability. You will architect telemetry pipelines for collecting and analyzing metrics, logs, and traces, ensuring that the observability stack meets the needs of Crusoe’s global infrastructure.
You will collaborate with cross-functional teams to extend monitoring and alerting capabilities, leveraging tools like Prometheus and Grafana to visualize system performance. Your responsibilities will include building scalable log collection and processing pipelines, ensuring that logs are efficiently ingested and processed for analysis.
You will implement distributed tracing platforms, integrating them with existing infrastructure to provide comprehensive visibility into system interactions. Your role will also involve defining SLOs and SLIs, working closely with teams to ensure that service reliability meets organizational standards.
What we offer
At Crusoe, you will be part of a mission-driven team focused on accelerating the abundance of energy and intelligence through sustainable technology. We offer a collaborative work environment where innovation is encouraged, and your contributions will have a tangible impact on the future of cloud infrastructure. You will have opportunities for professional growth and development, working alongside talented engineers who are passionate about their work.