Principal Machine Learning Engineer
- US$240,000 - US$280,000 per annum + bonus and benefits
- USA
- Permanent
Location: Remote (U.S.-based); not eligible for visa transfer or sponsorship
Industry: Ad Tech
Salary: $240,000 - $280,000 base + benefits
About the Role
I have partnered with a leading AI platform in the programmatic ad buying space. A recognized leader in its field, the company is seeking a Principal Machine Learning Engineer to drive the strategy, design, and execution of next-generation ML systems across the company.
This is a hands-on, high-leverage role for a senior technical leader with deep experience in distributed ML, large-scale MLOps, and ad tech systems. You will define the ML roadmap, build scalable infrastructure, mentor senior engineers, and influence product and business strategy.
As a principal-level engineer, you will not only lead technical execution but also shape the vision for AI products, ML infrastructure, and observability frameworks. You will partner with leadership across Engineering, Product, and Data Science to deliver systems that directly power the company's AI products.
Key Responsibilities:
Strategic Leadership & Technical Vision
- Define and drive the ML strategy across multiple product lines, aligning technical decisions with business objectives and KPIs.
- Set best practices for architecture, design, and deployment of distributed ML systems in a fast-growing startup environment.
- Lead technical design reviews, enforce engineering standards, and guide adoption of modular, maintainable, and scalable ML infrastructure.
- Mentor and grow senior and mid-level ML engineers; foster cross-functional collaboration with Product, Data Science, and Engineering teams.
Distributed ML Systems & Model Development
- Architect, implement, and optimize large-scale neural network systems for audience modeling, bid optimization, and real-time decisioning.
- Lead multi-GPU, distributed training pipelines using PyTorch DDP and Ray (Train, Tune), including automated hyperparameter search (ASHA, early stopping).
- Design robust feature engineering pipelines with PySpark and embedding layers for categorical, behavioral, and contextual features.
- Establish system-wide standards for model evaluation, champion/challenger workflows, and performance benchmarking.
MLOps & Production Excellence
- Own the end-to-end ML lifecycle, from training and batch inference to monitoring, observability, and automated rollback/recovery.
- Architect fault-tolerant, reproducible ML pipelines leveraging Databricks, Delta Lake, Unity Catalog, MLflow, and cloud platforms (AWS S3, EC2).
- Define and implement model versioning, artifact management, experiment tracking, and observability standards across products.
- Collaborate with engineering teams to optimize production dataflows, ensure high availability, and scale infrastructure for multiple product lines.
Innovation & Future ML Capabilities
- Evaluate and integrate emerging ML technologies (LLMs, vector embeddings, reinforcement learning, large-scale ETL/ELT, feature stores).
- Explore new approaches to programmatic optimization, audience modeling, and AI-driven bid strategies.
- Provide technical leadership in cross-functional planning and product roadmap discussions; influence strategic decisions on ML infrastructure and AI products.
Required Qualifications:
- Master's or PhD in Computer Science, Statistics, Machine Learning, or a related field, with 10+ years of ML engineering experience, including distributed systems.
- Deep expertise in PyTorch (custom architectures, embedding layers, MLPs, binary classification heads).
- Proven experience designing and deploying production ML systems at scale (Databricks, Delta Lake, Unity Catalog, Ray, MLflow).
- Expert-level Python and PySpark skills; experience with large-scale feature engineering and batch inference pipelines.
- Strong MLOps knowledge: versioning, monitoring, reproducibility, model serving, observability (Prometheus, Grafana, Datadog).
- Cloud platform experience (AWS S3, EC2) and data warehousing (Snowflake).
- Experience building and mentoring teams, leading cross-functional projects, and influencing product and technical strategy.
- Strong communication skills and ability to work with senior stakeholders in fast-paced environments.
Preferred Qualifications:
- AdTech/programmatic advertising experience (DSPs, bid optimization, lookalike modeling).
- Experience with LLMs, vector embeddings, reinforcement learning, feature stores, or clean rooms.
- Distributed training across multi-GPU clusters using Ray (Train, Tune, Datasets).
- Experience deploying ML in Kubernetes-based environments and integrating event-driven messaging systems (SQS, SNS, MSK, Redpanda).
- Experience in CI/CD automation, internal ML libraries, and observability tooling at scale.
Sphere Digital Recruitment currently has a variety of job opportunities across digital, so feel free to get in touch with us to find out how we can help you. Please take a look at our website.
Sphere is an equal opportunities employer. We encourage applications regardless of ethnic origin, race, religious beliefs, age, disability, gender, sexual orientation, or any other protected status as required by applicable law.
If you require any adjustments or additional support during the recruitment process for any reason whatsoever, please let us know.