About Baseten

Simplifying machine learning for every organization

🏢 Tech • 👥 21-100 employees • 📍 Union Square, San Francisco, CA • 💰 $285m

B2B • Analytics • Business Intelligence • Machine Learning

Key Highlights

Headquartered in Union Square, San Francisco, CA
$285 million raised in Series C funding
Team growth of 3x over the last five years
Unlimited PTO with a company-wide holiday break

Baseten is a machine learning application builder headquartered in Union Square, San Francisco, CA. With $285 million in funding from investors like Coatue Management and Founders Fund, Baseten simplifies AI integration for businesses, enabling data scientists to deploy ML models without needing spe...

🎁 Benefits

Baseten offers a remote-first work environment with a $1,000 stipend for home office setup, unlimited PTO with a company-wide break during the holiday...

🌟 Culture

Baseten's culture emphasizes simplifying complex AI technologies for businesses, fostering a collaborative environment where team members can connect ...

Software Engineering

Baseten • San Francisco

Posted 4 months ago • Software Engineering • 📍 San Francisco

Skills & Technologies

CUDA • TensorRT • Distributed systems • Model serving

Overview

Baseten is hiring a Software Engineer focused on Model APIs to design and optimize infrastructure for AI models. You'll work with technologies like CUDA and TensorRT in San Francisco.

Job Description

Who you are

You have a strong background in software engineering with a focus on building and optimizing APIs for AI models. Your experience includes working with distributed systems and model serving, ensuring that the infrastructure you create is both reliable and efficient. You are familiar with advanced inference capabilities and understand the intricacies of structured outputs and multi-modal serving.

You possess expertise in performance profiling and optimization, particularly with TensorRT and CUDA. Your ability to analyze kernel performance and implement custom CUDA operators allows you to maximize throughput and optimize memory allocation patterns. You thrive in collaborative environments and are eager to contribute to a forward-thinking team.

What you'll do

In this role, you will design, build, and operate the Model APIs surface, focusing on advanced inference capabilities. You will profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, and implement custom CUDA operators. Your work will involve tuning memory allocation patterns for maximum throughput and optimizing communication patterns across multi-GPU setups. You will collaborate closely with product and model performance teams to define how developers interact with AI models at scale.

What we offer

At Baseten, you will be part of a rapidly growing company that is at the forefront of AI technology. We provide a diverse and inclusive workplace where your contributions will have a significant impact. You will have the opportunity to work with cutting-edge technologies and be part of a team that is dedicated to pushing the boundaries of AI applications.

Similar Jobs You Might Like

Software Engineering

Baseten • 📍 San Francisco

Baseten is hiring a Software Engineer focused on Model Performance to enhance AI model inference. You'll work with technologies like Python, PyTorch, and TensorFlow in San Francisco.

1 year ago

Software Engineering

OpenAI • 📍 San Francisco - On-Site

OpenAI is seeking a Software Engineer to design and build their API products. You'll work with the OpenAI API and collaborate with various teams to ensure high-quality API design. This role requires a strong appreciation for API design and development.

🏛️ On-Site • Mid-Level

2 months ago

Software Engineering

Databricks • 📍 San Francisco - On-Site

Databricks is hiring a Senior Software Engineer for their Model Serving team to design and build systems for deploying AI/ML models. You'll work with technologies like Python and focus on scalability and reliability in San Francisco.

🏛️ On-Site • Senior

1 day ago

Software Engineering

Baseten • 📍 San Francisco - On-Site

Baseten is hiring a Senior Software Engineer - Model Training to build infrastructure for large-scale training of foundation models. You'll work with technologies like Python and TensorFlow to optimize GPU utilization and create scalable pipelines. This position requires significant experience in software engineering and machine learning.

🏛️ On-Site • Senior

5 months ago

Software Engineering

OpenAI • 📍 San Francisco - On-Site

OpenAI is hiring a Software Engineer for their Model Inference team to optimize AI models for high-volume production environments. You'll work with Azure and Python to enhance model performance and efficiency. This position requires 5+ years of experience in software engineering.

🏛️ On-Site • Mid-Level

1 year ago
