Meta (Facebook)

About Meta (Facebook)

Connecting people through innovative technology

Key Highlights

  • Over 2.9 billion monthly active users across platforms
  • Headquartered in Menlo Park, California
  • Valued at over $800 billion
  • Significant investments in Oculus and AR/VR technology

Meta (formerly Facebook) is a leading technology company focused on building the metaverse, with over 2.9 billion monthly active users across its platforms, including Facebook, Instagram, and WhatsApp. Headquartered in Menlo Park, California, Meta has invested heavily in virtual reality and augmente...

🎁 Benefits

Meta offers competitive salaries, equity compensation, generous PTO policies, comprehensive health benefits, and a robust parental leave program. Empl...

🌟 Culture

Meta fosters a culture of innovation and experimentation, encouraging employees to take risks and explore new ideas. The company emphasizes a mission-...

Overview

Meta is hiring an AI/HPC Systems Performance Engineer to enhance its AI Training and Inference Infrastructure. You'll work with technologies like Linux, Python, and TensorFlow to ensure optimal performance of network systems. This role requires experience in high-performance computing and networking.

Job Description

Who you are

You have a strong background in high-performance computing and networking, with experience in building and optimizing systems that support AI workloads. Your expertise in Linux and Python allows you to troubleshoot and enhance system performance effectively. You understand the intricacies of RDMA workloads and are familiar with lossless fabric interconnects, ensuring that network infrastructure meets stringent performance and availability requirements.

You are skilled in using TensorFlow and Kubernetes, which enables you to manage and deploy AI models efficiently. Your experience with Docker helps you create and manage containerized applications, ensuring smooth integration and deployment across various environments. You thrive in collaborative settings, working closely with cross-functional teams to address scaling challenges and improve system performance.

What you'll do

In this role, you will be responsible for building and evolving Meta's network infrastructure that connects training accelerators such as GPUs. You will ensure that the network operates smoothly and meets the performance requirements for AI workloads. Your daily tasks will involve identifying opportunities for performance improvements across the stack, including network fabric, host networking, and scheduling infrastructure.

You will collaborate with engineers to tackle scaling challenges and implement solutions that enhance the overall efficiency of the AI Training and Inference Infrastructure. Your role will also involve monitoring system performance and availability, troubleshooting issues, and optimizing configurations to meet the demands of rapidly growing AI use cases.

What we offer

Meta provides a dynamic work environment where innovation is encouraged. You will have the opportunity to work on cutting-edge technologies and contribute to projects that have a significant impact on the future of AI. The company offers competitive compensation and benefits, fostering a culture of collaboration and continuous learning. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.

Similar Jobs You Might Like

Meta (Facebook)

AI Engineer

Meta (Facebook) • 📍 Menlo Park - On-Site

Meta is hiring an AI Engineer to tackle scaling challenges in AI Training and Inference Infrastructure. You'll work with technologies like Python and TensorFlow to enhance network performance. This role requires a PhD and expertise in high-performance computing.

πŸ›οΈ On-Site
2w ago
Apple

AI Engineer

Apple • 📍 Santa Clara

Apple is hiring a Senior AI Infra Performance Engineer to tackle performance challenges in machine learning workloads. You'll work with technologies like C++, Python, and frameworks such as PyTorch and JAX. This position requires 7+ years of experience in large-scale distributed systems.

Senior
3w ago
Meta (Facebook)

AI Engineer

Meta (Facebook) • 📍 Menlo Park - On-Site

Meta is hiring an AI Production Engineer to build and scale production-grade AI systems that enhance executive productivity. You'll work with Python and automation tools to design resilient systems. This role requires strong systems engineering skills.

πŸ›οΈ On-SiteMid-Level
2w ago
Meta (Facebook)

Product Designer

Meta (Facebook)β€’πŸ“ Menlo Park

Meta is hiring a Product Design Engineer to build cutting-edge AI developer tools. You'll work on tooling and platforms that power Meta’s AI efforts, contributing to product strategy and user experience design. This role requires a strong background in design and engineering.

Mid-Level
2 months ago
Meta (Facebook)

AI Engineer

Meta (Facebook) • 📍 Menlo Park - On-Site

Meta is hiring an AI Capacity Planning Engineer to focus on AI strategy and planning projects. You'll work cross-functionally to optimize AI computing resources. This role requires experience in performance and capacity engineering.

πŸ›οΈ On-SiteMid-Level
3w ago