Job Details

Job Information

Governance & Operations Lead, Infrastructure & Planning
AWM-690-Governance & Operations Lead, Infrastructure & Planning
4/23/2026
4/28/2026
Negotiable
Permanent

Other Information

www.apple.com
Cupertino, CA, 95015, USA
Cupertino
California
United States
95015

Job Description

Role Number: 200659391-0836

Summary

Apple’s Platform Acceleration & Compute Efficiency (PACE) is a high-leverage team operating at the critical intersection of our ML organizations, underlying compute infrastructure, and core platform tooling. Our mission is to empower Apple’s software engineering teams with efficient, scalable compute. By driving out operational friction and optimizing the broader machine learning ecosystem, we directly accelerate the pace of development across the company.

We are seeking a founding Operations Lead with a passion for ML compute lifecycle management to build and own our governance and operations function. This is a ground-up opportunity to develop the roadmap and build a world-class, high-level operations team. You will partner closely with tools and analytics leads to define the governance systems, telemetry frameworks, utilization dashboards, and analytical models that give PACE and Apple's ML leadership a clear, continuously updated picture of how compute is being used. Success for this role means zero project slips due to process, platform, or mis-prioritization. Mastery of this role means establishing and tracking key productivity metrics and proactively solving problems to keep those metrics consistently high. Your work will power core ML analytics both for internal development and inference serving.

Most operations roles involve endless runbooks. This role is for someone who is also an engineer and artist at heart. Your operations work will empower a high-leverage team to drive decisions that have a significant impact on Apple’s financial results.

Description

  • Own the daily operations of the systems you architect. You will design and oversee a scalable hub-and-spoke support model, spanning cross-functional tier-1 on-call teams, tier-2 team leads, and a dedicated tier-3 engineering escalation group that you will build and manage.
  • Own and evolve PACE's governance tooling and related systems, ensuring that compute resource requests, allocations, and utilization data are accurately captured to support rapid, at-scale analysis.
  • Bridge coverage gaps as Apple's ML ecosystem expands to new hardware (GPUs, TPUs, and custom silicon) and workloads (inference, on-device), balancing power, performance, cost, and compatibility.
  • Partner with the Data & Analytics Lead to maintain the analytical layer, building the dashboards, reports, and automated alerts that surface efficiency opportunities and track infrastructure savings.
  • Identify system anomalies and operational bottlenecks that degrade utilization and drive up costs, building financial impact models that translate technical metrics into actionable insights for leadership.
  • Partner with Apple's ML engineering teams, delivering data-driven analytics to optimize the foundation models, inference workloads, and platform tooling that rely on your data for success.
  • Design robust governance processes and automated operations engineered specifically to meet Apple-scale ML demands.
  • Partner to produce strategic analyses that inform executive decisions on ML compute investment, allocation, and strategy, directly influencing Apple's ML growth and feature development.

Minimum Qualifications

  • BS in Computer Science, Data Science, Computer Engineering, or equivalent practical experience

  • 5+ years of experience in governance/operations, data engineering, analytics engineering, or technical program management, or in a large-scale compute or cloud environment

  • Organized, process-oriented, and comfortable owning operational systems other people depend on daily

  • Strong cross-functional experience working with capable engineers, managers, EPMs, and leaders

  • Proven experience designing and operating complex systems and processes from the ground up

  • AI-fluent and able to adapt quickly to AI-assisted workflows

  • Direct experience managing SRE and hierarchical technical support systems

  • Proficiency in SQL and experience building analytical dashboards or data products (Tableau, Looker, Grafana, or similar)

  • Experience designing data models or telemetry schemas for infrastructure, capacity, or utilization data

  • Ability to translate raw technical metrics into clear business narratives for both engineers and executives

Preferred Qualifications

  • Experience with Python for data analysis (pandas, notebooks) or lightweight pipeline development

  • Familiarity with ML training infrastructure concepts: what GPU utilization, training throughput, and scheduling efficiency mean, even if you have not optimized them directly

  • Prior experience in FinOps, capacity planning, cloud cost management, or IT governance

  • Experience building or operating data analytics systems

  • Background in automated alerting or anomaly detection for infrastructure metrics

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).
