AI inside your iOS app — what is cheap enough to ship in 2026

If you ship an iOS app with AI in it in 2026, the question is no longer "can we?" but "can we do it for a cost that lets us charge a fair price?" Here are the working economics from a studio that ships four AI-assisted apps.

The 2026 model lineup

The cheap-and-good frontier moves every quarter. As of this writing, the practical options are:

GPT-4o-mini and GPT-4o for general-purpose text, multimodal vision, and reasoning at moderate quality.
Claude 3.5 Haiku for fast, accurate classification and short-form generation.
Gemini Flash for high-throughput, low-cost text tasks.
Apple's on-device Foundation Models for genuinely small tasks — summarisation, simple Q&A — at zero marginal cost.
Open-weights models on Replicate or Together AI for image generation and speech synthesis at predictable per-call cost.

The right choice is not a single model. It is a routing layer that picks the cheapest model that meets the quality bar for each task.
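A minimal sketch of such a routing layer, assuming each candidate model has an average per-call cost and a quality score from your own offline evaluation. The model labels, costs, scores, and quality bars below are illustrative placeholders, not measured or published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                 # label only, not a real API model identifier
    cost_per_call_usd: float  # assumed average cost per request
    quality: int              # assumed score (0-100) from your own evals


# Hypothetical candidates; replace with numbers you have measured.
CANDIDATES = [
    ModelOption("on-device", 0.0, 55),
    ModelOption("small-hosted", 0.002, 70),
    ModelOption("large-hosted", 0.012, 90),
]

# Hypothetical minimum acceptable quality per task.
QUALITY_BAR = {"summarise": 50, "classify": 65, "appraise_photo": 85}


def route(task: str, candidates=CANDIDATES) -> ModelOption:
    """Return the cheapest candidate whose quality meets the task's bar."""
    bar = QUALITY_BAR[task]
    eligible = [m for m in candidates if m.quality >= bar]
    if not eligible:
        raise ValueError(f"no model meets the quality bar for {task!r}")
    return min(eligible, key=lambda m: m.cost_per_call_usd)


print(route("summarise").name)       # on-device
print(route("appraise_photo").name)  # large-hosted
```

Keeping the quality bar as data rather than code means a new, cheaper model can be added to the candidate list and start taking traffic without touching the call sites.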

A real cost calculation

For Antique Identifier, the per-photo cost in production breaks down roughly as follows:

On-device crop, classification, hash → free.
Server-side: GPT-4o vision call → ~$0.012 per request.
Comp lookup against our cached database → ~$0.001.
Logging, infra → ~$0.001.
Total: ~$0.014 per appraisal.

At our average of ~30 appraisals per active user per month, that is ~$0.42 per user per month in variable cost. At a $4.99 monthly subscription, variable cost comes to roughly 8% of the price before Apple's commission, which leaves a healthy gross margin.
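The arithmetic above, written out so the inputs can be swapped for your own. The figures are the ones quoted in this section; the variable names are illustrative, and store commission and fixed costs are deliberately left out.

```python
# Per-appraisal variable costs in USD, as listed above.
PER_APPRAISAL_USD = {
    "on_device_preprocessing": 0.0,
    "vision_call": 0.012,
    "comp_lookup": 0.001,
    "logging_infra": 0.001,
}
APPRAISALS_PER_USER_MONTH = 30
SUBSCRIPTION_USD = 4.99

cost_per_appraisal = sum(PER_APPRAISAL_USD.values())
cost_per_user_month = cost_per_appraisal * APPRAISALS_PER_USER_MONTH
variable_cost_share = cost_per_user_month / SUBSCRIPTION_USD

print(f"per appraisal: ${cost_per_appraisal:.3f}")            # $0.014
print(f"per user per month: ${cost_per_user_month:.2f}")      # $0.42
print(f"share of subscription: {variable_cost_share:.1%}")    # 8.4%
```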

What is now cheap enough

Image classification with explanation — under $0.02 per call.
Short text generation (1–3 paragraphs) — under $0.005 per call.
Speech-to-text for short clips — under $0.01 per minute.
Custom image generation — $0.03–$0.15 per image depending on quality and provider.
Long-form text generation, provided you cap output tokens: cost scales with length, so the limit sets the ceiling.
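To see why an output token limit bounds the bill, here is a sketch of the worst-case cost of one generation call under per-token pricing. The prices and token counts in the example are hypothetical placeholders, not any provider's actual rates.

```python
def max_generation_cost_usd(
    input_tokens: int,
    max_output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Upper bound on one call's cost when output is capped at max_output_tokens.

    Prices are in USD per million tokens.
    """
    total = (input_tokens * input_price_per_mtok
             + max_output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical prices: $0.50 per million input tokens, $2.00 per million output.
ceiling = max_generation_cost_usd(1_500, 800, 0.50, 2.00)
print(f"worst case per call: ${ceiling:.5f}")  # $0.00235
```

Because the prompt length is usually known in advance, the output cap is the one lever that turns a variable cost into a fixed ceiling per call.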

What is still too expensive for indie pricing

Frequent multi-turn agentic flows with large context windows.
Video generation at meaningful quality.
High-quality voice synthesis at scale (still ~$0.10–$0.30 per minute on the best providers).

If your app needs any of these, the math only works at a higher price point or a low usage cap.
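A rough way to put numbers on that trade-off, using the voice synthesis range quoted above. The 30% ceiling on variable cost as a share of price is an assumption for illustration, as are the function names.

```python
import math


def monthly_usage_cap(price_usd: float, max_cost_share: float,
                      unit_cost_usd: float) -> int:
    """Whole units per month that fit inside the variable-cost budget."""
    budget = price_usd * max_cost_share
    return math.floor(budget / unit_cost_usd)


def required_price(units_per_month: float, unit_cost_usd: float,
                   max_cost_share: float) -> float:
    """Price at which the given usage stays within the cost share."""
    return units_per_month * unit_cost_usd / max_cost_share


# Voice synthesis at $0.10 and $0.30 per minute on a $4.99 subscription.
print(monthly_usage_cap(4.99, 0.30, 0.10))  # 14 minutes
print(monthly_usage_cap(4.99, 0.30, 0.30))  # 4 minutes

# Price needed to cover 30 minutes per month at $0.30 per minute.
print(f"${required_price(30, 0.30, 0.30):.2f}")  # $30.00
```

At the top of that range, a $4.99 subscription covers only a few minutes of audio per month, which is why the options reduce to a tight cap or a much higher price.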

Three rules we follow

Always have a cheaper model in the routing layer, for example a GPT-4o-mini fallback for requests where the primary model is overkill. Even a small percentage of requests routed to the cheap model is real money saved.
Cache aggressively. If the user photographs the same teacup three times, the second and third photos should not pay for a fresh model call.
Cap usage with a soft limit and a paywall, not a hard error. Let the user understand what they hit and what unlocking gets them.
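A sketch of the caching and soft-limit rules, under stated assumptions. The cache keys on an exact SHA-256 hash of the image bytes, which only catches byte-identical uploads; recognising three separate photos of the same teacup would need a perceptual hash or embedding similarity, which is not shown. `call_model`, `AppraisalCache`, `QuotaDecision`, and `check_quota` are hypothetical names, and the paywall message is placeholder copy.

```python
import hashlib
from dataclasses import dataclass


class AppraisalCache:
    """Memoises model results by content hash so repeats cost nothing."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical callable: bytes -> str
        self._results = {}
        self.model_calls = 0           # counts paid calls, for monitoring

    def appraise(self, image_bytes: bytes) -> str:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._results:
            self.model_calls += 1
            self._results[key] = self._call_model(image_bytes)
        return self._results[key]


@dataclass(frozen=True)
class QuotaDecision:
    allowed: bool
    show_paywall: bool
    message: str


def check_quota(used: int, free_limit: int, is_subscriber: bool) -> QuotaDecision:
    """Return a decision the UI can render, instead of raising an error."""
    if is_subscriber or used < free_limit:
        return QuotaDecision(True, False, "")
    return QuotaDecision(
        False,
        True,
        f"You have used all {free_limit} free appraisals this month. "
        "Subscribe to keep going.",
    )


cache = AppraisalCache(lambda data: f"appraisal for {len(data)} bytes")
for _ in range(3):
    cache.appraise(b"same teacup photo")
print(cache.model_calls)  # 1
```

Returning a decision object rather than throwing keeps the limit a product moment: the client gets the message and the paywall flag in one response and decides how to present them.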

A note on Apple Intelligence

Apple's on-device Foundation Models are genuinely useful for a narrow band of tasks: summarisation, simple Q&A, structured generation. The per-call cost is zero. The trade-off is lower quality and the need for a recent device. We use them where they fit and route to a hosted model where they do not.

Building AI into an iOS app is no longer exotic. It is just engineering with a new cost variable. Watch the cost variable like you would any other line item, and the math works.