
About Krea
AI tools for instant image creation and enhancement
Key Highlights
- Headquartered in Stockholm, Sweden
- Specializes in real-time image generation and enhancement
- Offers tools for logo design, pattern generation, and photo upscaling
- Unique real-time 'canvas' feature for iterative design
Krea is an AI-powered platform based in Stockholm, Sweden, specializing in real-time image generation and enhancement. The platform provides tools for instant AI image creation, logo design, pattern generation, and photo upscaling, catering to designers and creatives. With its unique real-time 'canv...
🎁 Benefits
Krea offers competitive equity options, flexible remote work arrangements, and generous PTO policies to support work-life balance....
🌟 Culture
Krea fosters a culture of creativity and innovation, emphasizing rapid iteration and user feedback to enhance its AI tools. The team values collaborat...
Overview
Krea is hiring a Distributed Systems Engineer to design and maintain large-scale distributed infrastructure for AI research and real-time model serving. You'll work with technologies like Kubernetes and Python, and collaborate closely with ML engineers. This position requires experience in distributed systems and cloud deployments.
Job Description
Who you are
You have a strong background in distributed systems engineering, with experience designing and maintaining large-scale infrastructure that supports AI research and real-time applications. Your expertise in Kubernetes allows you to manage multi-thousand-node GPU clusters efficiently, ensuring fault-tolerant operations and optimal performance.
You are proficient in programming languages such as Python, Go, Ruby, and Rust, which you use to develop and optimize distributed systems. Your understanding of low-level Linux interfaces and administration enables you to debug complex issues in production environments effectively.
You thrive in collaborative settings, working closely with machine learning engineers and researchers to architect systems that facilitate rapid experimentation and deployment. Your experience with load balancing and network architecture helps streamline operations across multi-zone cloud deployments.
Desirable
Familiarity with Infrastructure as Code tools like Terraform is a plus, as it complements your skills in managing and automating infrastructure. You are always eager to learn and adapt to new technologies that enhance system reliability and performance.
What you'll do
In this role, you will own and manage a large-scale Kubernetes cluster designed for extensive machine learning training and inference workloads. You will architect fault-tolerant systems that ensure uninterrupted model training and real-time inference, even in the face of individual node failures.
Your responsibilities will include developing and implementing optimized load-balancing strategies to efficiently distribute workloads across different zones. You will also improve network architecture and streamline operational practices to enhance the overall performance of the distributed systems.
Collaboration is key in this position, as you will work closely with cross-functional teams to ensure that the infrastructure meets the needs of AI research and deployment. You will participate in design discussions and contribute to the technical direction of the projects.
What we offer
Krea is committed to building next-generation AI creative tools that empower human creativity. As part of our team, you will have the opportunity to work on innovative projects that impact millions of users. We offer a competitive salary and benefits package, along with a dynamic work environment that fosters creativity and collaboration.
Join us in our mission to make AI intuitive and controllable for creatives. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.