
About Databricks
Empowering data teams with unified analytics
Key Highlights
- Headquartered in San Francisco, CA
- Valuation of $43 billion with $3.5 billion raised
- Serves over 7,000 customers including Comcast and Shell
- Utilizes Apache Spark for big data processing
Databricks, headquartered in San Francisco, California, provides a unified data analytics platform that simplifies data engineering and collaborative data science. Trusted by over 7,000 organizations, including Fortune 500 companies like Comcast and Shell, Databricks has raised $3.5 billion in funding at a $43 billion valuation.
🎁 Benefits
Databricks offers competitive salaries, equity options, generous PTO policies, and a remote-friendly work environment.
🌟 Culture
Databricks fosters a culture of innovation with a strong emphasis on data-driven decision-making. The company values collaboration across teams.
Overview
Databricks is hiring a Staff Software Engineer for GenAI inference to lead the architecture and optimization of their inference engine. You'll work with technologies like Python and TensorFlow to ensure high throughput and low latency. This position requires significant experience in machine learning and AI.
Job Description
Who you are
You have a strong background in software engineering with a focus on AI and machine learning — your experience includes leading architecture and development for large-scale systems. You are proficient in Python and have hands-on experience with machine learning frameworks like TensorFlow, which you have used to optimize inference engines for performance and scalability.
You understand the intricacies of building and maintaining high-performance systems — your expertise includes working with large language models (LLMs) and optimizing for latency, throughput, and memory efficiency. You are comfortable collaborating with researchers to integrate new model architectures and features into production systems.
You have a proven track record of driving cross-team collaboration — you thrive in environments where you can partner with platform engineers and cloud infrastructure teams to ensure seamless integration and performance of inference workloads. Your strong problem-solving skills enable you to define and guide standards for instrumentation and profiling to uncover bottlenecks.
Desirable
Experience with orchestration systems and distributed inference infrastructure is a plus — you are familiar with techniques for dynamic loading and scheduling of inference workloads. Knowledge of GPU and accelerator utilization will further enhance your contributions to the team.
What you'll do
In this role, you will own and drive the architecture, design, and implementation of the inference engine for Databricks Foundation Model API. You will collaborate closely with researchers to bring new model architectures and features into the engine, ensuring that the system is optimized for large-scale LLM inference.
You will lead the end-to-end optimization efforts for latency, throughput, and memory efficiency across various hardware platforms, including GPUs. Your responsibilities will include defining standards for building and maintaining instrumentation, profiling, and tracing tooling to guide optimizations and ensure reliability in the inference pipelines.
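To make the instrumentation responsibility concrete, here is a minimal sketch (not Databricks' actual tooling; all names are illustrative) of per-stage latency tracking for an inference pipeline, the kind of measurement that guides optimization work:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)   # total seconds per stage
        self.counts = defaultdict(int)     # invocations per stage

    @contextmanager
    def stage(self, name):
        # Time the wrapped block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_latencies(self):
        """Average seconds per stage -- highest values point at bottlenecks."""
        return {name: self.totals[name] / self.counts[name]
                for name in self.totals}
```

A real system would export these numbers to a tracing backend rather than hold them in memory, but the pattern of wrapping each stage (tokenization, model forward pass, decoding) in a timed context is the same.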
You will architect scalable solutions for routing, batching, and scheduling inference workloads, ensuring that the system can handle high demand while maintaining fault tolerance and reproducibility. Your role will also involve collaborating with cross-functional teams to integrate with federated and distributed inference infrastructure, balancing load and managing communication overhead effectively.
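The batching part of that responsibility can be sketched in a few lines. This is a toy illustration, not Databricks' implementation: incoming requests are grouped into batches that flush either when full or when the oldest request has waited past a deadline, which is the core latency-versus-throughput trade-off dynamic batching manages. All class and parameter names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    arrival: float  # arrival timestamp in seconds

class DynamicBatcher:
    """Groups inference requests into size- or deadline-triggered batches."""

    def __init__(self, max_batch=8, max_wait_s=0.01):
        self.max_batch = max_batch    # flush when this many requests queue up
        self.max_wait_s = max_wait_s  # ...or when the oldest waits this long
        self.queue = []

    def submit(self, prompt, now):
        self.queue.append(Request(prompt, now))

    def maybe_flush(self, now):
        """Return a batch if full or stale, else None (keep waiting)."""
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch
        stale = now - self.queue[0].arrival >= self.max_wait_s
        if full or stale:
            batch = self.queue[:self.max_batch]
            self.queue = self.queue[self.max_batch:]
            return batch
        return None
```

Larger `max_batch` values improve GPU utilization and throughput; a shorter `max_wait_s` bounds the latency any single request can accrue while waiting for peers.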
What we offer
At Databricks, you will be part of a dynamic team that is at the forefront of AI and machine learning technology. We offer competitive compensation and benefits, along with opportunities for professional growth and development. You will work in a collaborative environment that values innovation and encourages you to push the boundaries of what is possible in AI inference.
Join us in shaping the future of data and AI — your contributions will directly impact how organizations leverage machine learning to drive their business forward. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
Similar Jobs You Might Like

Software Engineering
Databricks is hiring a Software Engineer for GenAI inference to design and optimize the inference engine for their Foundation Model API. You'll work with technologies like Python, Java, and TensorFlow in San Francisco.

Staff Engineer
Databricks is hiring a Staff Software Engineer for GenAI Performance and Kernel to lead the design and optimization of high-performance GPU kernels. You'll work closely with ML researchers and systems engineers to enhance inference performance. This role requires expertise in performance engineering and GPU optimization.

Staff Engineer
Plaid is hiring a Staff Software Engineer - AI to drive the integration of AI into financial tools and experiences. You'll work on innovative solutions that empower developers and consumers alike. This role requires expertise in AI and machine learning.

Staff Engineer
Cohere is hiring a Staff Software Engineer for their Inference Infrastructure team to build high-performance AI platforms. You'll work with technologies like Python and Docker to deploy optimized NLP models. This role requires experience in machine learning and scalable systems.

Machine Learning Engineer
Tonal is hiring a Staff Machine Learning Engineer to design and implement intelligent systems that enhance coaching and personalize workouts. You'll work with advanced AI technologies and large datasets in San Francisco.