Staff Software Engineer, Machine Learning Platform

Stripe

Full-time | Lead | San Francisco, Seattle

Job Description

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world's largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.

About the team

Stripe processes over $1.9T in payments volume per year, roughly 1.6% of the world's GDP, for millions of customers ranging from startups to enterprises. This tremendous amount of data makes Stripe one of the best places to do machine learning. Because ML is integral to almost every product line at Stripe (e.g., Payments, Radar, Capital, and Billing), there are many exciting opportunities to innovate on the ML Platform.

The ML Platform team builds the platforms and services that enable ML engineers and data scientists across Stripe to take data and build features and models from prototype to production—reliably, at low latency, and at scale. Our scope spans ML training infrastructure, model serving and deployment, feature computation and online serving, observability and monitoring, and agentic AI capabilities. We work closely with product teams, data scientists, and platform infrastructure teams to build powerful, flexible, and user-friendly systems that substantially increase ML velocity across the company.

What you'll do

You'll serve as a technical lead across the ML Platform space and a key contributor to the evolution of the platforms that power Stripe's ML-driven products. As a Staff Engineer, you'll make decisions with a large impact on Stripe. You'll influence our investments and strategy while making our systems more reliable, secure, and a delight to use. You'll work cross-functionally with other technical staff, data science, product, and senior leadership to increase the impact of ML at Stripe.

You'll help define the long-term strategy and lead the technical direction for the next generation of ML infrastructure that powers Stripe's ML-driven products.

Responsibilities

• Take ownership of end-to-end architecture and system design for large, complex projects across ML Platform.

• Define technical direction for highly ambiguous projects, transforming complex user needs into long-lasting platform strategy.

• Design system architectures for the most challenging ML Platform problems in one or more areas, including AI and ML workflow orchestration, scalable CPU and GPU compute infrastructure, model training, LLM fine-tuning, low-latency model inference, large-scale feature stores, real-time monitoring, and LLM and agent orchestration.

• Turn high-leverage ideas into tangible, robust solutions that shape the platform and product roadmaps, combining technical excellence with creative problem-solving.

• Scope and lead large projects with significant business impact, driving them from requirements through design, implementation, and production operation.

• Work directly with ML engineers, data scientists, and product teams to translate their needs into functional requirements and scalable technical solutions.

• Arbitrate critical decisions that balance competing priorities while meeting latency, reliability, cost, and security constraints.

• Serve as a key engineering representative, engaging senior leaders across Stripe and advising the leadership team on technical considerations across the end-to-end ML lifecycle.

• Drive cross-team technical initiatives that improve ML development velocity and MLOps maturity across the company.

• Mentor and grow other engineers. Serve as a role model for designing, implementing, and operating great software systems.

Who you are

We're looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

• 10+ years of professional software development experience, or equivalent domain expertise, with a solid background in service-oriented architecture and large-scale distributed systems.

• Track record of serving as a technical lead, with the ability to provide technical direction, lead multi-team initiatives, and mentor team members.

• Experience building and operating production ML platforms in one or more areas, such as model training, model serving, orchestration, or ML data systems, with demanding requirements for performance, reliability, scalability, and cost efficiency.

• Strong product instincts and a deep understanding of the business context in which you operate.

• Strong communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.

• Demonstrated ability to work cross-functionally, collaborating effectively with ML engineers, data scientists, software engineers, product managers, and business stakeholders.

• The ability to thrive on a high level of autonomy and responsibility, and comfort operating in ambiguous environments.

• Hands-on experience using AI tools to accelerate how you work.

Preferred qualifications

• Experience building large-scale ML training, serving, or data infrastructure for machine learning use cases, such as distributed training, model inference, feature stores, real-time feature computation, and model registries.

• Experience with distributed ML training systems, accelerator-backed compute, training data pipelines, experiment tracking, and model evaluation.

• Experience rapidly developing prototypes and iterating based on user feedback.

• Experience training and shipping machine learning models to production to solve critical business problems.

• Familiarity with LLMs, LLM application frameworks, and agentic AI patterns (e.g., tool use, multi-agent orchestration, retrieval-augmented generation).

• Familiarity with cloud services (e.g., AWS) and cloud-based AI and ML services (e.g., SageMaker, Bedrock, Databricks, OpenAI).

• Ability to synthesize ideas across the organization while setting a compelling technical vision.

• Comfortable working with geographically distributed teams.

• Passion for side projects, open source, or self-driven technical initiatives.

Requirements

Department: 8212 ML Foundations