
About Sesame
Affordable healthcare access without insurance hassles
Key Highlights
- Headquartered in New York, NY on Canal Street
- $76.1 million raised in Series B funding
- 101-200 employees, fostering a diverse workplace
- Marketplace model for both B2C and B2B healthcare
Sesame is a healthcare marketplace platform headquartered on Canal Street in New York, NY, that enables patients to access high-quality medical care at affordable self-pay prices. Backed by $76.1 million in Series B funding, the platform allows users to search, co...
🎁 Benefits
Employees enjoy a flexible vacation policy, comprehensive health care coverage options, and the opportunity to work in a fun, international environment...
🌟 Culture
Sesame fosters a unique culture focused on transparency and accessibility in healthcare, empowering patients to make informed decisions while simplifying...
Overview
Sesame is hiring an ML Model Serving Engineer to enhance their serving layer for LLM, speech, and vision models. You'll work with PyTorch and optimize machine learning models for high throughput and low latency. This position requires significant systems programming experience.
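To make "high throughput and low latency" concrete, here is a minimal micro-benchmark sketch of the kind a serving engineer might run before and after an optimization. The toy two-layer model, batch sizes, and iteration counts are illustrative assumptions, not anything from Sesame's stack.

```python
# Illustrative only: a toy latency/throughput micro-benchmark for a PyTorch
# model. The model, batch sizes, and timing loop are assumptions for the sketch.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
).eval()

@torch.inference_mode()
def benchmark(batch_size: int, iters: int = 50) -> tuple[float, float]:
    """Return (mean latency in ms, throughput in samples/sec) for one batch size."""
    x = torch.randn(batch_size, 512)
    # Warm up so one-time costs (allocator, autotuning) don't skew the numbers.
    for _ in range(5):
        model(x)
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000
    throughput = batch_size * iters / elapsed
    return latency_ms, throughput

for bs in (1, 8, 64):
    lat, tput = benchmark(bs)
    print(f"batch={bs:3d}  latency={lat:7.2f} ms  throughput={tput:9.1f} samples/s")
```

In practice this would run on the target hardware under realistic request mixes; larger batches typically trade per-request latency for aggregate throughput, which is exactly the balance this role is about.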
Job Description
Who you are
You are an expert in differentiable array computing frameworks, preferably PyTorch, with a strong background in optimizing machine learning models to serve reliably at high throughput and low latency. Your significant systems programming experience means you are comfortable with the internals of high-performance server systems, and you have a knack for modifying and extending LLM serving frameworks like vLLM and SGLang to leverage the latest techniques in model serving. You understand the importance of reducing model initialization times without sacrificing quality, and you are familiar with techniques like in-flight batching, caching, and custom kernels to speed up inference.
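As a rough illustration of the in-flight (continuous) batching idea mentioned above, here is a minimal, framework-free sketch: new requests join the running batch as soon as slots free up instead of waiting for the whole batch to drain. The Request class, step() function, and scheduling loop are hypothetical stand-ins; production frameworks such as vLLM and SGLang implement this far more efficiently, alongside KV caching and custom kernels.

```python
# Illustrative only: a toy "in-flight" (continuous) batching loop. The Request
# class and step() model are hypothetical stand-ins, not any real serving API.
import random
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list[int] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.tokens) >= self.max_new_tokens

def step(batch: list[Request]) -> None:
    """Pretend to run one forward pass that appends one token to each request."""
    for req in batch:
        req.tokens.append(random.randint(0, 50_000))

def serve(incoming: list[Request], max_batch: int = 4) -> list[Request]:
    active: list[Request] = []
    finished: list[Request] = []
    while incoming or active:
        # Admit new requests as soon as a slot frees up, rather than waiting
        # for the whole batch to finish (the key idea of in-flight batching).
        while incoming and len(active) < max_batch:
            active.append(incoming.pop(0))
        step(active)
        still_running = []
        for req in active:
            (finished if req.done() else still_running).append(req)
        active = still_running
    return finished

requests = [Request(f"prompt {i}", max_new_tokens=random.randint(2, 6)) for i in range(10)]
for req in serve(requests):
    print(req.prompt, "->", len(req.tokens), "tokens generated")
```

The payoff is utilization: short requests retire early and their slots are immediately refilled, so the accelerator stays busy instead of idling until the slowest request in a static batch completes.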
What you'll do
In this role, you will turbocharge Sesame's serving layer, which consists of a variety of LLM, speech, and vision models. You will partner closely with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer that powers a new consumer product category. Your responsibilities will include identifying opportunities to produce faster models without sacrificing quality and implementing optimizations that enhance the overall performance of the serving layer. You will also collaborate with the training team to ensure that the models are not only efficient but also maintain high standards of accuracy and reliability.
What we offer
Sesame provides a full-time position with competitive benefits, including 401k matching, 100% employer-paid health, vision, and dental benefits, unlimited PTO, and flexible spending account matching. You will be part of a team that is shaping the future of lifelike computers, working alongside founders from Oculus and Ubiquity6, and leaders from Meta, Google, and Apple. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
Similar Jobs You Might Like

Software Engineering
Databricks is hiring a Senior Software Engineer for their Model Serving team to design and build systems for deploying AI/ML models. You'll work with technologies like Python and focus on scalability and reliability in San Francisco.

Software Engineering
Anyscale is seeking a Software Engineer for their Model Serving Infrastructure team to develop high-performance machine learning serving systems. You'll work with Python and distributed systems to democratize AI applications. This role requires expertise in machine learning and a passion for scalable computing.

Staff Engineer
Databricks is hiring a Staff Software Engineer for their Model Serving team to design and build systems for AI/ML model deployment. You'll work with Python and cloud technologies to ensure high-throughput, low-latency inference. This position requires significant experience in software engineering and machine learning.

Engineering Manager
Databricks is seeking a Senior Engineering Manager to lead the Model Serving team, focusing on product experience and foundational infrastructure. You'll work with technologies like Python, Java, and Kubernetes to enhance AI/ML model deployment. This role requires strong leadership and technical expertise.

Machine Learning Engineer
Uber is hiring a Senior ML Engineer to develop and productionize machine learning models for dynamic security systems. You'll work with Python, TensorFlow, and Kubernetes in San Francisco. This position requires 5+ years of experience in ML and security contexts.