
About Arcesium
Empowering asset managers with advanced fintech solutions
Key Highlights
- Spin-out from D. E. Shaw group, enhancing expertise
- Headquartered in New York City with a strong fintech focus
- Supports over 100 clients in the asset management sector
- Employs between 1,000 and 5,000 professionals
Arcesium, a spin-out of the D. E. Shaw group, specializes in software and services for asset managers, focusing on post-trade activities. Headquartered in New York City, Arcesium supports over 100 clients, including hedge funds and investment firms, with its comprehensive suite of financial technolo...
Benefits
Arcesium offers competitive salaries, equity options, generous PTO policies, and a flexible remote work environment to support work-life balance....
Culture
Arcesium fosters a culture centered on innovation and collaboration, emphasizing a strong engineering focus and a commitment to delivering high-qualit...
Overview
Arcesium is hiring a Senior Site Reliability Engineer to ensure the stability and reliability of mission-critical production applications. The role is based in Lisbon and involves technologies such as AWS, Docker, and Kubernetes.
Job Description
Who you are
You are an experienced Site Reliability Engineer with a strong background in maintaining and improving the reliability of complex systems. With 5+ years of experience in a similar role, you have a deep understanding of observability, monitoring, and incident management, and you've successfully implemented tools and processes that enhance system stability and resilience. Your expertise in cloud platforms, particularly AWS, allows you to design and manage scalable infrastructure that meets the demands of high-traffic applications.
You possess strong troubleshooting skills and can quickly diagnose and resolve live production issues. Your analytical mindset helps you detect potential problems before they escalate. You are comfortable working in a collaborative environment, where you can share knowledge and mentor junior engineers, fostering a culture of continuous improvement and learning.
Your technical skills include proficiency in containerization technologies like Docker and orchestration tools such as Kubernetes, and you understand how to leverage these tools to streamline deployment processes and improve system performance. You are also well-versed in using monitoring and alerting tools like Prometheus and Grafana to ensure that systems are operating optimally.
Desirable
Experience with infrastructure as code tools such as Terraform or CloudFormation would be a plus, as would familiarity with CI/CD pipelines and automation frameworks. You are always eager to learn new technologies and methodologies that can enhance your team's capabilities and improve operational efficiency.
What you'll do
In this role, you will be a key member of the Platform Site Reliability Engineering (PSRE) team, responsible for ensuring the stability, reliability, and availability of mission-critical production applications on the Arcesium platform. You will implement observability practices, including monitoring, logging, and tracing, to proactively detect and prevent issues that could impact system performance.
You will build and maintain tools and infrastructure that enhance system stability and resilience, working closely with development teams to ensure that applications are designed with reliability in mind. Your responsibilities will include troubleshooting live production issues, focusing on rapid incident resolution, and conducting post-mortem analyses to identify root causes and prevent future occurrences.
You will collaborate with cross-functional teams to improve operational processes and contribute to the development of best practices for incident management and response. Your role will also involve mentoring junior engineers, sharing your expertise, and helping to cultivate a culture of reliability within the organization.
What we offer
Arcesium offers a dynamic work environment where you can make a meaningful impact from day one. We value intellectual curiosity and proactive ownership, providing opportunities for professional development and growth. You will be part of a collaborative team that is committed to innovation and excellence in the financial technology sector. We encourage you to apply even if your experience doesn't match every requirement; we believe diverse teams build better products.