Most "how to add AI to your iOS app" tutorials get you to the first API call and stop. The actual work is the next ten things. This is a practical walkthrough of shipping an iOS app that calls a hosted LLM, from API key to a live App Store listing, based on what we did for Antique Identifier and Minutelore.
1. Do not put the API key in the app
We see this every week in indie projects on GitHub. Embedding your OpenAI key in the iOS bundle exposes it to anyone with thirty minutes and a binary inspector. They will use it. Your monthly bill will explode.
Proxy every model call through a small server. We use Cloudflare Workers for ours: three minutes of setup, and the free tier covers most apps until they are profitable. The Worker validates a per-user JWT, applies a per-user rate limit, and forwards the request to OpenAI with the real key, which never leaves the server.
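The per-user rate limit is what caps your exposure if a user's token is abused. Below is a minimal sketch of a fixed-window limiter, written in Swift for consistency with the rest of this post even though a Worker typically runs JavaScript or TypeScript. `RateLimiter`, its limits, and the plain-string user ID (in practice the subject of the verified JWT) are hypothetical. The counters live in memory here for illustration only; a real Worker would need shared storage such as Workers KV or Durable Objects, since in-memory state is not guaranteed to persist between requests.

```swift
// Hypothetical per-user fixed-window rate limiter (illustration only).
// In production this check runs in the proxy, keyed by the user ID from the
// verified JWT, with counters kept in shared storage rather than in memory.
struct RateLimiter {
    let limit: Int              // max requests per user per window (assumed)
    let windowSeconds: Double   // window length in seconds (assumed)

    // userID -> start of the current window and requests counted in it
    private var windows: [String: (start: Double, count: Int)] = [:]

    init(limit: Int, windowSeconds: Double) {
        self.limit = limit
        self.windowSeconds = windowSeconds
    }

    /// Returns true if the request may be forwarded to the model provider;
    /// false means the proxy should reply with HTTP 429 instead.
    mutating func allow(userID: String, now: Double) -> Bool {
        if let current = windows[userID], now - current.start < windowSeconds {
            guard current.count < limit else { return false }
            windows[userID] = (start: current.start, count: current.count + 1)
            return true
        }
        // No window yet, or the previous one has expired: open a new one.
        windows[userID] = (start: now, count: 1)
        return true
    }
}

var limiter = RateLimiter(limit: 3, windowSeconds: 60)
var results: [Bool] = []
for _ in 0..<4 {
    results.append(limiter.allow(userID: "user-a", now: 0))
}
print(results)                                    // [true, true, true, false]
print(limiter.allow(userID: "user-b", now: 1))    // true: limits are per user
print(limiter.allow(userID: "user-a", now: 61))   // true: a new window has started
```

A fixed window is the simplest option; it permits a short burst at window boundaries, which a sliding window or token bucket would smooth out.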
2. Decide what is on-device vs server-side
Apple's on-device frameworks (Vision, Natural Language, and the new Foundation Models framework, which exposes the on-device model behind Apple Intelligence) are good and getting better. Use them when:
- Latency must be sub-200ms.
- The data is genuinely private and should not leave the device.
- A hosted call would cost too much per request and you cannot reasonably charge for it.
Use a hosted LLM when:
- You need accuracy that on-device models cannot yet provide.
- You want centralised control of the prompts and outputs.
- You can pass the cost through pricing.
In Antique Identifier, the photo classification runs partly on-device and partly through GPT-4o vision. The split is not ideological; it is economic.
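One way to keep that split deliberate rather than ad hoc is to write the checklist down as code, so each feature's choice is explicit and reviewable. This is a hypothetical sketch: `FeatureProfile`, `chooseBackend`, and the order in which the rules are checked (privacy and latency first) are assumptions, not a description of how Antique Identifier is built.

```swift
// Hypothetical encoding of the on-device vs hosted checklist.
enum InferenceBackend {
    case onDevice
    case hosted
}

struct FeatureProfile {
    var latencyBudgetMs: Int             // how long this feature may take to respond
    var dataMustStayOnDevice: Bool       // the input is genuinely private
    var needsHostedAccuracy: Bool        // on-device models are not accurate enough yet
    var wantsCentralPromptControl: Bool  // prompts/outputs should be changeable server-side
    var costCanBePassedThrough: Bool     // pricing can absorb the per-call cost
}

func chooseBackend(for f: FeatureProfile) -> InferenceBackend {
    // Hard constraints first: privacy and a budget of 200 ms or less keep work on-device.
    if f.dataMustStayOnDevice || f.latencyBudgetMs <= 200 {
        return .onDevice
    }
    // A hosted call has to buy something, and the price has to cover it.
    if (f.needsHostedAccuracy || f.wantsCentralPromptControl) && f.costCanBePassedThrough {
        return .hosted
    }
    // Otherwise default to the option with no per-call cost.
    return .onDevice
}

// Two made-up features for a photo-identification app.
let cropToObject = FeatureProfile(latencyBudgetMs: 100, dataMustStayOnDevice: false,
                                  needsHostedAccuracy: false, wantsCentralPromptControl: false,
                                  costCanBePassedThrough: false)
let describeItem = FeatureProfile(latencyBudgetMs: 6_000, dataMustStayOnDevice: false,
                                  needsHostedAccuracy: true, wantsCentralPromptControl: true,
                                  costCanBePassedThrough: true)
print(chooseBackend(for: cropToObject))   // onDevice
print(chooseBackend(for: describeItem))   // hosted
```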
3. Watch the per-call cost like a hawk
Before you build anything, calculate:
- Average tokens per request.
- Average requests per active user per month.
- Cost per million tokens for your model.
Multiply the three and divide by one million. The result is your monthly model cost per active user. If it is more than 30% of your monthly subscription price, you have a problem: either the prompt is too long, the model is too big, or the pricing is too low.
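As a worked example of that arithmetic, here is a small Swift helper. The inputs in the usage lines (1,500 tokens per request, 60 requests per user per month, $5 per million tokens, a $4.99 subscription) are placeholders, not real usage data or current prices. It also assumes a single blended token price; many providers, OpenAI included, charge different rates for input and output tokens, so in practice you would cost the two separately and add them.

```swift
// Back-of-envelope model cost per active user per month.
// Assumes one blended price per million tokens; all example inputs are placeholders.
struct UsageEstimate {
    var avgTokensPerRequest: Double
    var avgRequestsPerUserPerMonth: Double
    var usdPerMillionTokens: Double

    /// tokens/request × requests/user/month × price per million tokens ÷ 1,000,000
    var monthlyCostPerUser: Double {
        avgTokensPerRequest * avgRequestsPerUserPerMonth * usdPerMillionTokens / 1_000_000
    }

    /// Rule of thumb from the text: model cost should stay within 30% of the price.
    func isSustainable(monthlyPriceUSD: Double, maxShare: Double = 0.30) -> Bool {
        monthlyCostPerUser <= maxShare * monthlyPriceUSD
    }
}

let estimate = UsageEstimate(avgTokensPerRequest: 1_500,
                             avgRequestsPerUserPerMonth: 60,
                             usdPerMillionTokens: 5)
print(estimate.monthlyCostPerUser)                     // 0.45
print(estimate.isSustainable(monthlyPriceUSD: 4.99))   // true
print(estimate.isSustainable(monthlyPriceUSD: 0.99))   // false
```

With those placeholder inputs, 1,500 × 60 × $5 ÷ 1,000,000 comes to $0.45 per active user per month, comfortably under 30% of $4.99 (about $1.50), but well over 30% of a $0.99 price.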
For Antique Identifier, we got the per-call cost down by ~70% across the first three months by:
- Cropping images on-device before sending.
- Tightening the system prompt by hundreds of tokens.
- Caching common-object recognitions on the server.
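The caching step can be as simple as memoising model results behind a key. A minimal sketch, assuming a caller-supplied string key: for example a hash of the cropped image bytes (which only matches byte-identical inputs) or a normalised label from an on-device pre-classification. `ResultCache` is hypothetical, and it keeps state in memory; on a Worker you would back it with shared storage.

```swift
// Hypothetical bounded LRU cache for model results.
// Swift has no built-in LRU type; this minimal version is O(n) per access.
final class ResultCache<Value> {
    private let capacity: Int
    private var storage: [String: Value] = [:]
    private var recency: [String] = []   // least recently used first
    private(set) var hits = 0
    private(set) var misses = 0

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    /// Returns the cached value for `key`, or runs `compute` (the paid model
    /// call), stores its result, and evicts the least recently used entry if full.
    func value(for key: String, compute: () throws -> Value) rethrows -> Value {
        if let cached = storage[key] {
            hits += 1
            touch(key)
            return cached
        }
        misses += 1
        let fresh = try compute()
        storage[key] = fresh
        touch(key)
        if storage.count > capacity, let oldest = recency.first {
            recency.removeFirst()
            storage[oldest] = nil
        }
        return fresh
    }

    // Moves `key` to the most-recently-used end of the list.
    private func touch(_ key: String) {
        recency.removeAll { $0 == key }
        recency.append(key)
    }
}

var modelCalls = 0
let cache = ResultCache<String>(capacity: 2)
for key in ["teapot", "teapot", "clock", "teapot"] {
    let result = cache.value(for: key) {
        modelCalls += 1   // stands in for the paid model call
        return "identified: \(key)"
    }
    print(result)
}
print(modelCalls, cache.hits, cache.misses)   // 2 2 2
```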
4. Handle the slow path
A typical LLM call takes 1–6 seconds, and your UI must handle that gracefully: stream responses where applicable, show a genuine, reassuring loading state, and offer a clean retry on failure. Do not show a hung spinner; it loses users every time.
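One way to make those behaviours hard to forget is to model each request as a small state machine that the view simply renders. This is a hypothetical sketch: the states, the 15-second timeout, and the placeholder copy are assumptions. Streamed chunks could come from `URLSession.bytes(for:)` if your proxy streams its response, and a one-second timer would drive the `.tick` events.

```swift
// Hypothetical state machine for one AI request; a view renders `RequestState`.
enum RequestState: Equatable {
    case idle
    case waiting(seconds: Int)       // request sent, no output yet
    case streaming(text: String)     // partial output is already on screen
    case finished(text: String)
    case failed(message: String)     // always shown together with a Retry button
}

enum RequestEvent {
    case start, tick, chunk(String), complete, error, retry
}

func reduce(_ state: RequestState, _ event: RequestEvent,
            timeoutSeconds: Int = 15) -> RequestState {
    switch (state, event) {
    case (.idle, .start), (.failed, .retry):
        return .waiting(seconds: 0)
    case (.waiting(let seconds), .tick):
        // Give up with a retry instead of spinning forever.
        return seconds + 1 >= timeoutSeconds
            ? .failed(message: "This is taking longer than usual. Tap to try again.")
            : .waiting(seconds: seconds + 1)
    case (.waiting, .chunk(let piece)):
        return .streaming(text: piece)
    case (.streaming(let text), .chunk(let piece)):
        return .streaming(text: text + piece)
    case (.streaming(let text), .complete):
        return .finished(text: text)
    case (.waiting, .error), (.streaming, .error):
        return .failed(message: "Something went wrong. Tap to try again.")
    default:
        return state   // ignore events that do not apply in the current state
    }
}

/// Reassuring copy for the loading state, changing as the wait grows.
func statusText(for state: RequestState) -> String? {
    guard case .waiting(let seconds) = state else { return nil }
    return seconds < 3 ? "Analysing your photo…" : "Still working. This can take a few seconds."
}

var state = RequestState.idle
let events: [RequestEvent] = [.start, .tick, .chunk("A Victorian "), .chunk("brass clock"), .complete]
for event in events {
    state = reduce(state, event)
}
print(state == .finished(text: "A Victorian brass clock"))   // true
```

The timeout guarantees the initial spinner cannot run forever; a stream that stalls midway would need a similar idle timeout on the `.streaming` state.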
5. App Review will look hard at AI apps
Apple's reviewers are now well-versed in AI apps. They look for:
- Clear disclosure that AI is involved.
- A reasonable content moderation policy.
- Age-appropriate content for the rating.
- A working refund / report mechanism.
We got rejected on Minutelore once for not explaining the content filter clearly enough in the App Review notes. We resubmitted with a one-page memo and were approved within a day. Write that memo proactively, before your first submission.
6. Set up the kill switch
From day one, build a remote kill switch that disables the AI features without requiring an app update. If a model behaves badly, costs spike, or a regulatory issue comes up, you want to be able to turn it off in five minutes, not five days.
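A minimal client-side sketch, assuming the flags are a small JSON document served by your own backend. `AIFeatureFlags`, its fields, and `resolveFlags` are hypothetical names. The design choice worth making deliberately is what happens when the fetch fails: this version keeps the last known value rather than failing open or closed. Because every model call already goes through the proxy, the proxy can also refuse AI requests on its own, which covers app versions that never fetch the flag.

```swift
import Foundation

// Hypothetical remote flags document served by your own backend.
struct AIFeatureFlags: Codable, Equatable {
    var aiEnabled: Bool
    var disabledMessage: String?   // optional copy shown while AI is off

    /// Used before the first successful fetch.
    static let defaults = AIFeatureFlags(aiEnabled: true, disabledMessage: nil)
}

/// Decodes freshly fetched flags; if the device is offline or the payload is
/// malformed, keeps the last known state instead of silently flipping the feature.
func resolveFlags(fetched data: Data?, lastKnown: AIFeatureFlags) -> AIFeatureFlags {
    guard let data = data,
          let decoded = try? JSONDecoder().decode(AIFeatureFlags.self, from: data)
    else {
        return lastKnown
    }
    return decoded
}

let payload = Data(#"{"aiEnabled": false, "disabledMessage": "Identification is paused."}"#.utf8)
let flags = resolveFlags(fetched: payload, lastKnown: AIFeatureFlags.defaults)
print(flags.aiEnabled)                                          // false
print(resolveFlags(fetched: nil, lastKnown: flags).aiEnabled)   // false: offline keeps last state
```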
7. Ship it
The first version will not be perfect. Ship it. The data you get from real users in the first two weeks is worth more than another month of internal testing.
If you want to discuss a specific AI app idea, write us. Happy to give an honest engineering opinion.