About Crusoe

Sustainable AI cloud solutions for a greener future

🏢 Tech • 👥 501-1000 • 📅 Founded 2018 • 📍 Denver, Colorado, United States

Key Highlights

Headquartered in Denver, Colorado
501-1000 employees focused on AI and renewable energy
First vertically integrated AI cloud platform
Committed to sustainable computing practices

Crusoe is a pioneering AI cloud platform headquartered in Denver, Colorado, that utilizes clean, renewable energy to power its operations. The company focuses on providing scalable computing resources for AI and machine learning applications, serving a diverse range of clients across various industr...

🎁 Benefits

Crusoe offers competitive salaries, equity options, generous PTO, and a flexible remote work policy to support work-life balance....

🌟 Culture

Crusoe fosters a culture centered on sustainability and innovation, encouraging employees to contribute to environmentally friendly computing solution...

Software Engineering • Senior

Crusoe • San Francisco - On-Site

Posted 4 months ago • 🏛️ On-Site • Senior • Software Engineering • 📍 San Francisco • 💰 $166,000 - $201,000 / year

Skills & Technologies

Kubernetes, Prometheus, Grafana, OpenTelemetry, Fluent Bit, ELK Stack, Jaeger, Tempo

Overview

Crusoe is hiring a Senior Software Engineer for their Cloud Availability Platform Engineering team to design and operate observability systems. You'll work with technologies like Kubernetes, Prometheus, and Grafana to ensure reliability and performance across Crusoe’s cloud infrastructure.

Job Description

Who you are

You have deep expertise in building and operating observability platforms at scale. Your experience includes designing, developing, and running observability stacks that provide actionable insights into distributed systems. You understand the importance of metrics, logs, and traces in ensuring system reliability and performance.

You are skilled in architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization. Your knowledge of tools like Prometheus, Grafana, and OpenTelemetry allows you to extend monitoring and alerting capabilities effectively. You have experience working with multi-datacenter Kubernetes environments, ensuring that observability systems are scalable and robust.

You are familiar with building scalable log collection and processing pipelines using Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks. Your background includes implementing distributed tracing platforms such as Tempo and Jaeger, and integrating them with service meshes, load balancers, and APIs. You are proactive in defining and driving the adoption of SLOs, SLIs, and error budgets across teams.

What you'll do

In this role, you will design and operate scalable observability systems that provide insights into the internal state of distributed systems. Your work will enable engineers to understand system performance and reliability. You will architect telemetry pipelines that facilitate the collection and analysis of metrics, logs, and traces, ensuring that the observability stack meets the needs of Crusoe’s global infrastructure.

You will collaborate with cross-functional teams to extend monitoring and alerting capabilities, leveraging tools like Prometheus and Grafana to visualize system performance. Your responsibilities will include building scalable log collection and processing pipelines, ensuring that logs are efficiently ingested and processed for analysis.

You will implement distributed tracing platforms, integrating them with existing infrastructure to provide comprehensive visibility into system interactions. Your role will also involve defining SLOs and SLIs, working closely with teams to ensure that service reliability meets organizational standards.

What we offer

At Crusoe, you will be part of a mission-driven team focused on accelerating the abundance of energy and intelligence through sustainable technology. We offer a collaborative work environment where innovation is encouraged, and your contributions will have a tangible impact on the future of cloud infrastructure. You will have opportunities for professional growth and development, working alongside talented engineers who are passionate about their work.
