About AvePoint

Empowering organizations to manage Microsoft 365 data securely

🏢 Tech • 👥 1K-5K • 📅 Founded 2001 • 📍 Jersey City, New Jersey, United States

Key Highlights

Over 16,000 customers including the UN and U.S. Department of Defense
Headquartered in Jersey City, New Jersey
Raised over $200 million in funding
Approximately 1,300 employees worldwide

AvePoint is the largest independent software vendor of SaaS solutions for Microsoft 365, specializing in data migration, management, and protection. Headquartered in Jersey City, New Jersey, AvePoint serves over 16,000 customers globally, including organizations like the United Nations and the U.S. ...

🎁 Benefits

AvePoint offers competitive salaries, equity options, generous PTO policies, remote work flexibility, and a comprehensive health benefits package....

🌟 Culture

AvePoint fosters a culture of innovation and collaboration, emphasizing a strong commitment to customer success and employee well-being. The company v...

Site Reliability Engineer • Mid-Level

AvePoint • Singapore

Posted 1w ago • Mid-Level Site Reliability Engineer • 📍 Singapore

Skills & Technologies

AWS, GitLab, Kubernetes, CI/CD, Incident management, Observability, Automation, Security

Overview

AvePoint is seeking a Site Reliability Engineer to build and operate a Whole-of-Government runtime platform. You'll design and manage AWS and Kubernetes-based infrastructure while ensuring system stability and performance. This role requires experience with GitLab and CI/CD automation.

Job Description

Who you are

You have a strong background in Site Reliability Engineering, with experience in designing and operating cloud-based infrastructure, particularly using AWS and Kubernetes. Your expertise in GitLab and CI/CD pipelines allows you to automate repetitive tasks effectively, enhancing operational efficiency across cross-functional teams.

You understand the importance of observability for system health and implement comprehensive solutions that monitor the four golden signals: latency, traffic, errors, and saturation. Your proactive approach to system health assessments and self-remediation helps you maintain high availability and performance standards.

You are skilled in incident management, participating in on-call rotations and responding promptly to incidents to minimize mean time to recovery (MTTR). Your ability to conduct thorough post-incident reviews helps in implementing preventive measures that enhance system resilience.

You have solid knowledge of security and compliance, and you design and implement secure solutions in collaboration with dedicated security teams. You conduct regular audits and integrate advanced vulnerability scanning tools to safeguard the infrastructure.

You are adept at identifying and resolving performance bottlenecks, defining and tracking key performance indicators (KPIs) such as system uptime and cost efficiency. Your focus on ongoing optimization drives improvements in operational processes.

You possess leadership and mentoring capabilities, contributing to the development of a resilient and collaborative team environment. You are passionate about sharing your knowledge and helping others grow in their roles.

What you'll do

In this role, you will build and operate a Whole-of-Government runtime platform, ensuring its stability and scalability. You will design and manage AWS and Kubernetes-based infrastructure, using GitLab for CI/CD automation to streamline deployment processes.

You will implement observability solutions that provide insights into system health, allowing for proactive monitoring and quick response to any issues that arise. Your work will involve collaborating with cross-functional teams to integrate automation that reduces manual intervention and enhances operational efficiency.

As part of your responsibilities, you will participate in incident management, responding to incidents swiftly to minimize downtime. You will conduct post-incident reviews to identify root causes and implement preventive measures to improve system resilience.

You will also focus on security and compliance, working closely with security teams to ensure that all solutions are secure and meet compliance standards. Regular audits and vulnerability scanning will be part of your routine to maintain a secure environment.

Your role will require you to continuously identify performance bottlenecks and operational issues, driving ongoing optimization efforts. You will define and track KPIs to measure the effectiveness of your initiatives and ensure that the platform operates efficiently.

What we offer

At AvePoint, you will be part of a dynamic team that values collaboration and innovation. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in an environment that encourages knowledge sharing and mentorship, allowing you to enhance your skills and advance your career.
