at Apple
Location: Sunnyvale, United States of America
Compensation: $212k–$318k USD
Type: Full time
Posted: 3 months ago
As a technical leader within the Cloud Networking organization, you will define and drive the reliability and resiliency architecture for Apple's network platform services. You will be responsible for establishing SRE and SWE best practices, architecting fault-tolerant network control and data planes, and championing data-driven decision-making through observability and automation.
You will drive resilient cloud networking solutions that operate reliably across multiple cloud providers and global regions, handling failures gracefully and maintaining service availability. Your technical leadership will ensure Apple's network services meet demanding availability, latency, resilience, and security requirements while continuously improving operational maturity.
We are looking for a technical expert who deeply understands cloud networking at scale, is passionate about operating mission-critical, globally distributed infrastructure, preventing outages through proactive engineering, and driving long-term reliability improvements through architectural excellence.
Define and drive the long-term technical vision, architecture, and reliability strategy for large-scale cloud networking platforms spanning control plane and data plane systems.
Architect and evolve fault-tolerant, highly available network services, ensuring graceful degradation and consistent performance under partial and systemic failure scenarios.
Establish platform-wide resiliency patterns including service discovery, health checking, automated failover, rate limiting, circuit breaking, and traffic management across multi-region and multi-cloud environments.
Lead the design of network configuration management, routing state distribution, traffic engineering, and capacity planning systems, balancing scalability, correctness, and operational simplicity.
Serve as a senior technical authority and architectural reviewer, influencing critical design decisions across multiple teams and ensuring network failure modes are explicitly addressed.
Build and champion automation-first reliability solutions, including topology discovery, deployment safety mechanisms, self-healing systems, and operational tooling that reduce toil and improve MTTR.
Define and own reliability metrics and observability standards (SLIs, SLOs, error budgets), using data to drive engineering trade-offs, reliability investments, and incident response improvements.
Multiply impact through cross-team technical leadership, embedding reliability early in design, mentoring engineers, and sharing deep technical knowledge through documentation and technical talks.
Extensive experience in software engineering, systems engineering, or infrastructure engineering.
Strong background in designing, operating, and supporting highly available, fault-tolerant distributed systems at hyperscale.
Strong systems programming skills, including multithreading, concurrency, caching, and batching.
Solid understanding of network infrastructure and software-defined networking (SDN).
Ability to lead cross-functional collaboration and influence technical decisions across teams.
Expert knowledge of API design and interface technologies (JSON, Protobuf, REST, RPC, XML, etc.).
In-depth knowledge of Kubernetes (K8s), OpenStack, system virtualization, build systems, and infrastructure as code.
Strong knowledge of observability systems (metrics, logging, tracing) and qualification engineering.
Broad knowledge of networking solutions across OSI layers 3 through 7.
Excellent written and verbal communication skills with the ability to clearly articulate risk, reliability trade-offs, and operational priorities.
Proven ability to manage competing priorities, drive initiatives to completion, and deliver results in fast-paced environments.
The Apple Cloud Networking team builds and operates large-scale, software-defined networking platforms that enable secure, resilient, and highly available multi-cloud connectivity with a global footprint. Our infrastructure powers critical Apple services such as iCloud, iTunes, Siri, and Maps.
We are seeking an experienced and visionary Cloud Network Reliability Engineer to drive the technical strategy and execution for ensuring the availability, performance, scalability, and resiliency of Apple's global network services. In this role, you will work as a technical leader solving complex networking challenges at massive scale, partnering with engineering, infrastructure, and operations teams across Apple to deliver reliable, fault-tolerant systems.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here — in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
Learn about accessibility in Apple’s workplace
Learn about reasonable accommodations for job applicants
Apple accepts applications to this posting on an ongoing basis.