
About Scaleway
Your cloud ecosystem for sustainable growth
Key Highlights
- Over 38,000 customers in 160 countries
- Headquartered in Paris with 501-1000 employees
- Part of the Iliad Group with profit-sharing options
- Offers a Startup Program with cloud credits and expertise
Scaleway, headquartered in Paris, France, is a leading cloud computing provider that empowers over 38,000 businesses across 160 countries. With a focus on sustainability and flexibility, Scaleway offers a comprehensive cloud ecosystem, including bare metal, containerization, and serverless architect...
Benefits
Scaleway offers a range of benefits including 75% reimbursement on public transportation, a €200 annual discount on Scaleway Elements, and profit-shar...
Culture
Scaleway fosters a culture centered around sustainability and flexibility, making it an attractive choice for startups. The company emphasizes a smoot...
Overview
Scaleway is hiring a Site Reliability Engineer in Paris to build and maintain reliable infrastructure for AI GPU clusters. You'll work with technologies like Kubernetes, Docker, and AWS. This position requires experience in managing production environments.
Job Description
Who you are
You have a strong background in site reliability engineering, with experience in building and maintaining reliable, observable, and secure infrastructure. Your expertise in managing production environments ensures optimal service availability for customers around the world. You are familiar with cloud computing technologies and have a passion for supporting AI initiatives. Your collaborative spirit allows you to thrive in diverse teams, and you are committed to technical excellence.
You possess a deep understanding of containerization and orchestration tools, particularly Kubernetes and Docker. Your experience with monitoring and alerting tools like Prometheus and Grafana enables you to proactively manage system performance and reliability. You are comfortable working in a Linux environment and have experience with cloud platforms such as AWS. Your problem-solving skills and attention to detail help you identify and resolve issues efficiently.
What you'll do
In this role, you will be responsible for building and maintaining the infrastructure that supports Scaleway's AI GPU clusters. You will work closely with engineering teams to ensure that the systems are reliable and scalable. Your mission will involve implementing best practices for infrastructure management, including CI/CD pipelines and automation. You will monitor system performance and respond to incidents to minimize downtime and ensure service availability.
You will collaborate with cross-functional teams to design and implement solutions that meet the needs of our customers. Your role will also involve capacity planning and optimizing resource usage to support Scaleway's growth. You will contribute to the development of documentation and training materials to help onboard new team members and improve team efficiency.
What we offer
Scaleway offers a dynamic work environment where you can contribute to shaping the future of cloud computing. You will have the opportunity to work on cutting-edge technologies and be part of a team that values collaboration and innovation. We encourage you to apply even if your experience doesn't match every requirement. Join us in building a sovereign cloud alternative that supports ambitious companies across Europe.
Similar Jobs You Might Like

Site Reliability Engineer
Scaleway is hiring a Site Reliability Engineer to build and maintain reliable, observable, and secure infrastructure. You'll work with technologies like Docker, Kubernetes, and AWS to ensure optimal service availability. This role requires experience in cloud computing and infrastructure management.

Site Reliability Engineer
Scaleway is hiring a Site Reliability Engineer to enhance the reliability and performance of their network products. You'll work with technologies like Linux, Docker, and Kubernetes to automate and monitor infrastructure. This role requires expertise in SRE practices and tools.

Site Reliability Engineer
amo is hiring a Lead Site Reliability Engineer (SRE) to ensure their systems handle high traffic and maintain performance and reliability. You'll work with technologies like ScyllaDB and focus on automation and system design. This role requires strong leadership and experience in distributed systems.

Site Reliability Engineer
Scaleway is hiring a Site Reliability Engineer to ensure the robustness and performance of their cloud services. You'll work with technologies like Linux, Docker, and Kubernetes in a collaborative environment based in Paris.

Site Reliability Engineer
Algolia is seeking a Senior Site Reliability Engineer to ensure the availability of their search products. You'll work with technologies like AWS, Docker, and Kubernetes to optimize performance at scale. This role requires experience in building and operating scalable architectures.