Apple Intelligence and On-Device AI: A Builder's Guide
- iOS
- Apple Intelligence
- On-Device AI
- Swift
Apple Intelligence arrived with a lot of marketing and not much guidance for the people who actually have to build on it. Two years in, the picture is clearer. For app teams, the most important piece is not Siri or the headline consumer features. It is the Foundation Models framework, which gives your app direct access to an on-device language model from Swift. This is a practical look at what that model can do, what it cannot, and how to decide when on-device AI is the right tool for an iOS feature.
What is actually on the device
Apple ships an on-device language model of around three billion parameters that powers Apple Intelligence, and since iOS 26 that model is exposed to apps through the Foundation Models framework. It runs on Apple silicon across the CPU, GPU, and Neural Engine, which is why it can work with no network connection and no server bill. You reach it from Swift directly; there is no separate SDK to bolt on or API key to rotate.
The key mental model: this is not a general-knowledge chatbot. It is a compact, capable engine for language tasks inside your app: understanding text, classifying it, pulling structured data out of it, and generating short, well-formed output. Ask it to write an essay on world history and you will be disappointed. Ask it to turn a messy block of user text into a typed Swift value and it shines.
Guided generation and tool calling
Two features make the framework genuinely useful rather than a demo. Guided generation lets you constrain the model's output to a structure you define, so instead of parsing freeform prose and hoping, you get values that map onto your types. That single capability removes most of the fragile glue code that made early LLM integrations miserable.
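To make guided generation concrete, here is a minimal sketch of extracting a typed value from messy text. The `ExpenseEntry` and `ExpenseCategory` types, their fields, and the instruction string are hypothetical and chosen only for illustration; the `@Generable` and `@Guide` macros and `LanguageModelSession.respond(to:generating:)` come from the Foundation Models framework. Treat it as a starting point under those assumptions, not a reference implementation.

```swift
import FoundationModels

// Hypothetical category list; @Generable on an enum restricts the model to these cases.
@Generable
enum ExpenseCategory {
    case groceries
    case transport
    case dining
    case other
}

// Hypothetical target type. The guide descriptions steer what the model puts in each field.
@Generable
struct ExpenseEntry {
    @Guide(description: "Name of the merchant as written in the text")
    var merchant: String

    @Guide(description: "Total amount as a decimal number, without a currency symbol")
    var amount: Double

    @Guide(description: "The spending category that best fits the expense")
    var category: ExpenseCategory
}

// Asks the on-device model for an ExpenseEntry instead of freeform prose.
func extractExpense(from text: String) async throws -> ExpenseEntry {
    let session = LanguageModelSession(
        instructions: "Extract a single expense from the user's text."
    )
    let response = try await session.respond(to: text, generating: ExpenseEntry.self)
    return response.content
}
```

The structure is guaranteed by the framework, but the values are still a model's best guess, so validate anything that matters (amounts, dates, identifiers) before acting on it.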
Tool calling lets the model invoke functions you expose, such as fetching data, performing an action, or looking something up, so it can take part in real app workflows instead of only emitting text. Together they turn the model from a novelty into a component you can wire into a feature with predictable behavior.
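A tool, in practice, is a type conforming to the framework's `Tool` protocol. The sketch below uses a hypothetical `OrderStatusTool` backed by an in-memory dictionary; the tool name, arguments, and data are invented for illustration. It also assumes the released iOS 26 SDK, where `call(arguments:)` can return a `String`; the return type changed during the betas, so check it against the SDK you build with.

```swift
import FoundationModels

// Hypothetical tool: looks up an order status from in-memory stand-in data.
struct OrderStatusTool: Tool {
    let name = "lookUpOrderStatus"
    let description = "Look up the current status of an order by its order number."

    // The model fills in these arguments when it decides to call the tool.
    @Generable
    struct Arguments {
        @Guide(description: "The order number the user asked about")
        var orderNumber: String
    }

    // Stand-in for a real data source such as a local database or a network client.
    let statuses: [String: String]

    // Assumption: the released SDK accepts a String as the tool's output.
    func call(arguments: Arguments) async throws -> String {
        statuses[arguments.orderNumber] ?? "No order found with that number."
    }
}

// Gives the session access to the tool and lets the model decide when to call it.
func answerOrderQuestion(_ question: String) async throws -> String {
    let session = LanguageModelSession(
        tools: [OrderStatusTool(statuses: ["A1001": "Shipped"])],
        instructions: "Answer questions about orders. Use the tool to look up statuses."
    )
    let response = try await session.respond(to: question)
    return response.content
}
```

Keeping the tool small and side-effect free, as here, makes it easier to test on its own before the model is ever involved.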
On-device vs cloud: choosing per feature
On-device AI is the right default when the task is bounded, latency matters, the data is sensitive, or the feature has to work offline. Summarizing a note, tagging a transaction, drafting a reply, extracting fields from a document, ranking a short list: these belong on the device, where they cost nothing per request, keep the data local, and respond without a network round trip.
Cloud models still win for tasks that need broad world knowledge, very large context, or the strongest possible reasoning. The mature approach is not to treat on-device versus cloud as a matter of doctrine; it is to route each feature to the right place. A well-built app often does both: on-device for the common, privacy-sensitive path, with an optional cloud model (a third-party provider, or Apple's larger models via Private Cloud Compute where Apple makes them available to apps) for the heavy cases. Designing that boundary deliberately is most of the engineering.
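One way to make that boundary explicit is a small routing function that every AI feature passes through. The sketch below is plain Swift with no framework dependency. All of the names (`FeatureRequest`, `Route`, `route`) are hypothetical, the token budget is a placeholder rather than a documented limit, and the rule that sensitive data never goes to the cloud is a policy assumption you may want to relax.

```swift
// Hypothetical description of what a feature needs; all names are illustrative.
struct FeatureRequest {
    var containsSensitiveData: Bool
    var mustWorkOffline: Bool
    var needsWorldKnowledge: Bool
    var estimatedInputTokens: Int
}

enum Route: Equatable {
    case onDevice
    case cloud
    case unavailable(reason: String)
}

// Decides where a request should run.
// `onDeviceTokenBudget` is a placeholder; tune it to the model's real context window.
func route(_ request: FeatureRequest,
           onDeviceModelAvailable: Bool,
           userAllowsCloud: Bool,
           onDeviceTokenBudget: Int = 3_000) -> Route {
    // Bounded tasks that fit the budget are candidates for the on-device model.
    let fitsOnDevice = !request.needsWorldKnowledge
        && request.estimatedInputTokens <= onDeviceTokenBudget

    if fitsOnDevice && onDeviceModelAvailable {
        return .onDevice
    }
    // Policy assumption: sensitive or offline-only work never leaves the device.
    if request.containsSensitiveData || request.mustWorkOffline {
        return .unavailable(reason: "This feature needs the on-device model.")
    }
    // Everything else may use the cloud, but only with the user's consent.
    return userAllowsCloud
        ? .cloud
        : .unavailable(reason: "Cloud processing is turned off.")
}
```

Centralizing the decision in one function means the privacy promise lives in a single testable place instead of being scattered across features.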
Privacy as a real feature, not a slogan
On-device processing changes what you can promise users. When a feature genuinely never transmits data, you can say so plainly, in onboarding, on the App Store page, and in a privacy review, and mean it. For categories like health, finance, legal, and anything touching personal notes or messages, that is not a nice-to-have; it is often what makes the feature shippable at all. Several early adopters of the framework chose it precisely because it let them add intelligence to sensitive data without building, and then defending, a pipeline to a server.
Where it goes wrong
The on-device model is small by design, and small models hallucinate, miss nuance, and fall over on tasks they were not meant for. Treating a three-billion-parameter model like a frontier-scale cloud model is the most common mistake we see. The second is skipping evaluation: shipping an AI feature without a test set of real inputs and expected outputs is how you discover failures in App Store reviews instead of in development. On-device AI lowers the cost of adding a feature; it does not lower the cost of doing it responsibly.
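An evaluation set does not need heavy tooling. As a minimal sketch, assuming a classification-style feature whose output can be compared as a normalized string, the harness below runs each case through whatever function wraps your model call and reports the failures. `EvalCase`, `EvalFailure`, `EvalReport`, and `evaluate` are hypothetical names, and exact-match scoring is a simplification that will not suit generative features such as summaries.

```swift
import Foundation

// One real input paired with the output you expect for it.
struct EvalCase {
    let input: String
    let expected: String
}

struct EvalFailure {
    let input: String
    let expected: String
    let actual: String
}

struct EvalReport {
    let total: Int
    let failures: [EvalFailure]

    // Fraction of cases that matched; 0 when there are no cases.
    var passRate: Double {
        total == 0 ? 0 : Double(total - failures.count) / Double(total)
    }
}

// Ignores case and surrounding whitespace so trivial formatting does not count as a failure.
private func normalize(_ text: String) -> String {
    text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

// Runs every case through `run` (your model-backed function) and collects the mismatches.
func evaluate(_ cases: [EvalCase],
              using run: (String) async throws -> String) async throws -> EvalReport {
    var failures: [EvalFailure] = []
    for testCase in cases {
        let actual = try await run(testCase.input)
        if normalize(actual) != normalize(testCase.expected) {
            failures.append(EvalFailure(input: testCase.input,
                                        expected: testCase.expected,
                                        actual: actual))
        }
    }
    return EvalReport(total: cases.count, failures: failures)
}
```

Running this against a few dozen real inputs on every prompt or schema change, and reading the failures rather than only the pass rate, catches most regressions before users do.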
Building on it, end to end
Getting this right is the work my team and I do day to day: deciding which features belong on-device, designing the boundary between on-device and cloud, wiring up the Foundation Models framework with proper guided generation and evaluation, and shipping the result as a polished native app, from the first idea all the way to App Store approval. If you are weighing an Apple Intelligence feature and want a straight answer on whether it fits on-device, that is exactly the kind of question we like. Use the contact page to tell us what you are building.
Frequently asked questions
Is the Foundation Models framework free to use?
Yes. Using the on-device model carries no per-request cost, because inference runs locally on the user's device. You are not paying an API bill for each call. It is available on Apple Intelligence-capable devices running iOS 26, iPadOS 26, macOS 26, or visionOS 26, and later versions of each.
How big is Apple's on-device model, and what is it good at?
It is roughly a three-billion-parameter model tuned for on-device language tasks: summarization, classification, extraction, structured output, and tool calling. It is deliberately not a general-knowledge chatbot, so it excels at focused in-app tasks rather than open-ended questions about the world.
Can I still use ChatGPT, Claude, or Gemini in my iOS app?
Yes. On-device and cloud models are not mutually exclusive. A common pattern is on-device for fast, private, offline tasks and a cloud model for the heavy cases that need more reasoning or world knowledge. The right split depends on the feature.
Which devices support Apple Intelligence and on-device AI?
The Foundation Models framework runs on Apple Intelligence-capable devices with Apple silicon when Apple Intelligence is enabled. For older devices, and for capable devices where Apple Intelligence is turned off or the model is not ready yet, you will want a graceful fallback, either a reduced feature or a cloud path. Handling those cases is part of designing the feature properly.
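A fallback starts with checking availability at runtime. The sketch below maps the framework's `SystemLanguageModel.default.availability` onto a hypothetical `AIFeatureMode` enum; the enum and the user-facing reason strings are illustrative assumptions, while the availability cases are the ones the framework defines.

```swift
import FoundationModels

// Hypothetical app-level mode; decide per feature what the fallback actually does.
enum AIFeatureMode {
    case onDevice
    case fallback(reason: String)
}

// Translates the framework's availability into something the UI can act on.
func currentAIFeatureMode() -> AIFeatureMode {
    switch SystemLanguageModel.default.availability {
    case .available:
        return .onDevice
    case .unavailable(.deviceNotEligible):
        return .fallback(reason: "This device does not support Apple Intelligence.")
    case .unavailable(.appleIntelligenceNotEnabled):
        return .fallback(reason: "Apple Intelligence is turned off in Settings.")
    case .unavailable(.modelNotReady):
        return .fallback(reason: "The model is still downloading or preparing.")
    case .unavailable(_):
        // Catch-all for any reason added in a later OS release.
        return .fallback(reason: "The on-device model is currently unavailable.")
    }
}
```

Check this each time the feature is shown rather than once at launch, since a user can enable Apple Intelligence or finish the model download while the app is running.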