Job Details

Job Information

Job Title :

Sr. / Staff ML Engineer, FM Training Integration - ML Compute

Job Code :

AWM-1807-Sr. / Staff ML Engineer, FM Training Integration - ML Compute

Job Announced :

5/9/2026

Job Closed :

5/14/2026

Pay Rate:

Negotiable

Duration:

Permanent

Other Information

Organization Name:

Apple

Organization Url:

www.apple.com

Address :

Santa Clara, CA, 95054, USA

City :

Santa Clara

State :

California

Country :

United States

Zip Code :

95054

Job Description

Weekly Hours: 40

Role Number: 200661658-3760

Summary

We are a group of engineers who support the training of foundation models at Apple! We build infrastructure for training foundation models with general capabilities, such as understanding and generating text, images, speech, video, and other modalities, and we apply these models to Apple products. We are looking for engineers who are passionate about building systems that push the frontier of deep learning in scaling, efficiency, and flexibility, and that delight millions of users of Apple products.

Description

We are looking for an ML Engineer to join our ML Compute team and help improve the efficiency, scalability, and reliability of model training and inference workloads in the cloud. In this role, you will lead the integration of large-scale ML workloads with cloud infrastructure, working cross-functionally with ML engineers, infrastructure engineers, and researchers to optimize performance, improve system efficiency, and drive high utilization of accelerator resources.

Minimum Qualifications

5+ years of experience in software engineering, ML infrastructure, or related domains.
Hands-on experience with machine learning workflows, including training, evaluation, and inference at scale.
Proficiency in Python and experience with at least one major ML framework (e.g., PyTorch or JAX).
Experience with cloud-based infrastructure and distributed systems (e.g., containers, orchestration, storage, and networking).
Bachelor’s degree in Computer Science, Engineering, or a related field.

Preferred Qualifications

Experience working with accelerator-based systems (e.g., GPUs/TPUs), including performance tuning and debugging of ML workloads.
Hands-on experience with distributed training or inference at scale (e.g., data, model, or pipeline parallelism).
Experience optimizing large-scale ML systems, including bottleneck analysis across compute, memory, and networking.
Familiarity with profiling, tracing, and benchmarking tools for ML workloads (e.g., PyTorch Profiler, NVIDIA Nsight).
Experience building or operating ML infrastructure using containerization and orchestration frameworks (e.g., Docker, Kubernetes).
Advanced degree in Computer Science, Engineering, or a related field.
