If you ship an iOS app with AI in it in 2026, the question is no longer "can we?" but "can we do it at a cost that lets us charge a fair price?" Here are the working economics from a studio that ships four AI-assisted apps.
The 2026 model lineup
The cheap-and-good frontier moves every quarter. As of this writing, the practical options are:
- GPT-4o-mini and 4o for general-purpose text, multimodal vision, and reasoning at moderate quality.
- Claude 3.5 Haiku for fast, accurate classification and short-form generation.
- Gemini Flash for high-throughput, low-cost text tasks.
- Apple's on-device Foundation Models for genuinely small tasks — summarisation, simple Q&A — at zero marginal cost.
- Open-weights models on Replicate or Together AI for image generation and speech synthesis at predictable per-call cost.
The right choice is not a single model. It is a routing layer that picks the cheapest model that meets the quality bar for each task.
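A routing layer like this can be very small. The following is a minimal server-side sketch in Python, not our production code: the `ModelOption` type, the `pick_model` function, the model names, and every cost and quality number are hypothetical placeholders. It assumes you already have per-task quality scores from your own offline evaluations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_call_usd: float   # assumed average cost, not a quoted price
    quality: dict[str, float]  # task -> score (0..1) from your own offline evals


def pick_model(task: str, quality_bar: float, options: list[ModelOption]) -> ModelOption:
    """Return the cheapest option whose eval score for `task` meets the bar."""
    eligible = [m for m in options if m.quality.get(task, 0.0) >= quality_bar]
    if not eligible:
        raise LookupError(f"no model meets quality bar {quality_bar} for task {task!r}")
    return min(eligible, key=lambda m: m.cost_per_call_usd)


# Placeholder numbers for illustration only.
OPTIONS = [
    ModelOption("on-device", 0.0, {"summarise": 0.80, "appraise_photo": 0.30}),
    ModelOption("small-hosted", 0.001, {"summarise": 0.90, "appraise_photo": 0.70}),
    ModelOption("large-hosted", 0.012, {"summarise": 0.95, "appraise_photo": 0.92}),
]

print(pick_model("summarise", 0.75, OPTIONS).name)       # on-device
print(pick_model("appraise_photo", 0.90, OPTIONS).name)  # large-hosted
```

The quality bar is set per task, so the expensive model only gets the requests that actually need it.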
A real cost calculation
For Antique Identifier, the per-photo cost in production breaks down approximately as follows:
- On-device crop, classification, hash → free.
- Server-side: GPT-4o vision call → ~$0.012 per request.
- Comp lookup against our cached database → ~$0.001.
- Logging, infra → ~$0.001.
- Total: ~$0.014 per appraisal.
At our average of ~30 appraisals per active user per month, that is ~$0.42 per user per month in variable cost. Against a $4.99 monthly subscription, that is roughly 8% of the list price, so the gross margin stays healthy even after the App Store's commission.
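The arithmetic is simple enough to keep in a script and rerun whenever a price changes. Here is a small Python sketch using the figures from the breakdown above. The function names are ours for illustration, and the 30% store commission in the last line is an assumption; substitute whatever rate applies to you.

```python
# Per-appraisal variable costs in USD, taken from the breakdown above.
COSTS_PER_APPRAISAL = {
    "on_device_preprocessing": 0.0,
    "vision_call": 0.012,
    "comp_lookup": 0.001,
    "logging_infra": 0.001,
}


def monthly_variable_cost(appraisals_per_user: int) -> float:
    """Variable cost per active user per month, in USD."""
    return sum(COSTS_PER_APPRAISAL.values()) * appraisals_per_user


def cost_share(price: float, appraisals_per_user: int, store_commission: float = 0.0) -> float:
    """Variable cost as a fraction of net revenue after the store's cut."""
    net_revenue = price * (1.0 - store_commission)
    return monthly_variable_cost(appraisals_per_user) / net_revenue


cost = monthly_variable_cost(30)
print(f"${cost:.2f} per user per month")               # $0.42 per user per month
print(f"{cost_share(4.99, 30):.1%} of list price")     # 8.4% of list price
# Assumed 30% commission, for illustration only.
print(f"{cost_share(4.99, 30, store_commission=0.30):.1%} of net revenue")  # 12.0% of net revenue
```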
What is now cheap enough
- Image classification with explanation: under $0.02 per call.
- Short text generation (1–3 paragraphs): under $0.005 per call.
- Speech-to-text for short clips: under $0.01 per minute.
- Custom image generation: $0.03–$0.15 per image depending on quality and provider.
- Long-form text generation: cost scales with output length, so keep it in check with output token limits.
What is still too expensive for indie pricing
- Frequent multi-turn agentic flows with large context windows.
- Video generation at meaningful quality.
- High-quality voice synthesis at scale (still ~$0.10–0.30 per minute on the best providers).
If your app needs any of these, the math only works at a higher price point or with a low usage cap.
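To see how tight that constraint is, you can turn it into two one-line formulas. This Python sketch assumes you budget a fixed share of the subscription price for variable cost. The 20% share and the function names are illustrative assumptions; the per-minute voice prices are the range quoted above.

```python
def max_units_per_month(price: float, cost_budget_share: float, cost_per_unit: float) -> float:
    """Largest monthly usage that keeps variable cost within the budgeted share of price."""
    if cost_per_unit <= 0:
        raise ValueError("cost_per_unit must be positive")
    return price * cost_budget_share / cost_per_unit


def min_price(units_per_month: float, cost_per_unit: float, cost_budget_share: float) -> float:
    """Lowest monthly price at which the given usage stays within the budgeted share."""
    if not 0 < cost_budget_share <= 1:
        raise ValueError("cost_budget_share must be in (0, 1]")
    return units_per_month * cost_per_unit / cost_budget_share


# Voice synthesis at $0.30 per minute, $4.99 subscription, assumed 20% cost budget.
print(f"{max_units_per_month(4.99, 0.20, 0.30):.1f} minutes per month")  # 3.3 minutes per month
# Price needed to offer 60 minutes per month at $0.10 per minute.
print(f"${min_price(60, 0.10, 0.20):.2f} per month")                     # $30.00 per month
```

At those numbers a $4.99 plan covers only a few minutes of top-tier voice per user, which is why the cap or the price has to move.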
Three rules we follow
- Always have a cheaper model in the routing layer, such as GPT-4o-mini, for requests where the primary model is overkill. Even a small percentage of requests routed to the cheap model is real money saved.
- Cache aggressively. If a user photographs the same teacup three times, the second and third photos should be served from cache instead of paying for a fresh model call.
- Cap usage with a soft limit and a paywall, not a hard error. Let the user understand what they hit and what unlocking gets them.
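The caching and soft-limit rules can be sketched together in a few lines. This is a hedged, in-memory Python illustration rather than our actual service: `AppraisalService`, `call_model`, and the default limit of 10 are hypothetical. It also keys the cache on a SHA-256 of the image bytes, which only catches byte-identical uploads; matching three slightly different photos of the same teacup would need a perceptual hash, which this sketch does not implement.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AppraisalService:
    call_model: Callable[[bytes], str]  # hypothetical wrapper around the hosted model
    free_monthly_limit: int = 10        # assumed cap, tune for your own pricing
    _cache: dict[str, str] = field(default_factory=dict, init=False)
    _paid_calls: dict[str, int] = field(default_factory=dict, init=False)

    def appraise(self, user_id: str, image: bytes, subscribed: bool = False) -> dict:
        key = hashlib.sha256(image).hexdigest()
        if key in self._cache:
            # Repeat upload: serve the stored result, no model call, no quota used.
            return {"status": "ok", "result": self._cache[key], "cached": True}

        used = self._paid_calls.get(user_id, 0)
        if not subscribed and used >= self.free_monthly_limit:
            # Soft limit: a structured response the app can render as a paywall,
            # telling the user what they hit and what unlocking gets them.
            return {
                "status": "limit_reached",
                "used": used,
                "limit": self.free_monthly_limit,
                "unlock": "subscription",
            }

        result = self.call_model(image)
        self._cache[key] = result
        self._paid_calls[user_id] = used + 1
        return {"status": "ok", "result": result, "cached": False}
```

A real deployment would persist both dictionaries and reset the per-user counts each billing period. The point here is the order of checks: cache first, then quota, then the paid call.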
A note on Apple Intelligence
Apple's on-device Foundation Models are genuinely useful for a narrow band of tasks: summarisation, simple Q&A, structured generation. They cost nothing per call. The trade-offs are lower quality than the hosted models and the need for a recent, Apple Intelligence-capable device. We use them where they fit and route to a hosted model where they do not.
Building AI into an iOS app is no longer exotic. It is just engineering with a new cost variable. Watch the cost variable like you would any other line item, and the math works.