I have a backend I call Free Advice. The name is a joke that’s also a spec: it cycles through free LLM tiers to do its work, and the advice it returns is, structurally, free. It powers a small family of iOS reflection apps — you speak a thought aloud, and you get back a single relevant passage: a Bible verse in Freely Spoken, a line from the Dhammapada in Idle Ashes.

It’s a deliberately humble service. But the constraints I built it under — no budget, no user data, no room to be reckless about people in distress — made it a more interesting piece of engineering than a fatter version would have been.

Constraint 1: it should cost nothing

These apps are non-commercial, and a non-commercial app should not run up an API bill. So instead of one paid provider, Free Advice is a resilient multi-provider chain over free tiers: it tries one provider, and on a rate limit, error, or timeout it falls through to the next, and then the next. Free tiers are flaky by nature (that’s the price of free), so the architecture treats flakiness as the default case rather than the exception. The reliability doesn’t come from any one provider being good; it comes from never depending on one.
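Here is a minimal sketch of that fall-through chain. It is in Python because the post doesn’t say what the backend is written in. `Provider`, `ProviderError`, `AllProvidersFailed`, and `complete_with_failover` are hypothetical names. Each provider’s `complete` callable stands in for a real free-tier client, which is assumed to enforce its own timeout and to raise `ProviderError` (or `TimeoutError`) on a rate limit or failure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical: raised by a client on a rate limit, error, or bad response."""


class AllProvidersFailed(Exception):
    """Raised only when every provider in the chain has failed."""


@dataclass(frozen=True)
class Provider:
    name: str
    complete: Callable[[str], str]  # hypothetical client call: prompt -> completion


def complete_with_failover(providers: Sequence[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in order and return (provider name, completion) from the first success."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except (ProviderError, TimeoutError) as exc:
            # Expected on free tiers: record the failure and fall through to the next provider.
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Usage with stub providers: the first is rate-limited, the second answers.
def rate_limited(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [Provider("tier-a", rate_limited), Provider("tier-b", healthy)]
print(complete_with_failover(chain, "perseverance"))  # ('tier-b', 'echo: perseverance')
```

The sketch collects failures instead of discarding them. When the whole chain is exhausted, the final error then says which tier failed and how.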

This is a pattern worth internalizing well beyond hobby apps: resilience is cheaper than reliability. You don’t need one provider with five nines of availability if you can fail over, quickly and silently, across three providers with two nines each, provided their outages are mostly independent.
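The arithmetic behind that claim can be checked quickly. It rests on one assumption the numbers can’t verify: that providers fail independently. A chain only goes down when every provider is down at once, so its unavailability is the product of the individual unavailabilities.

```python
def chain_availability(per_provider: float, providers: int) -> float:
    """Availability of a fail-over chain of identical providers, assuming independent outages."""
    # The chain fails only if every provider fails at the same time.
    return 1 - (1 - per_provider) ** providers


three_two_nines = chain_availability(0.99, 3)
print(f"{three_two_nines:.4%}")  # 99.9999%
print(three_two_nines > 0.99999)  # True: better than a single five-nines provider
```

In practice, free tiers can fail together, for example during a shared upstream outage or a traffic spike that trips several rate limits at once. The real figure therefore sits somewhere below this bound. Each fall-through also adds latency, which is why failing over fast matters.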

Constraint 2: it should know almost nothing about you

The thing you say to a reflection app is about as personal as input gets. So the design rule was: the backend should be structurally incapable of holding your secrets.

The apps do the sensitive work on-device. Speech is transcribed locally with Apple’s Speech framework. Sentiment analysis and anonymization run locally through Apple’s Foundation Models framework. Only sanitized text plus a little emotion metadata ever crosses the network to Free Advice: no raw audio, no full transcript, no device identifiers, no session history. Each request is single-turn and consent-based; the backend has no memory of you because it was never given the materials to build one.
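On the backend side, one way to make “structurally incapable” concrete is a request type that has nowhere to put anything else. This sketch assumes the payload is a JSON-style object with two hypothetical fields: `text` (the sanitized text) and `emotion` (a coarse label). The post doesn’t specify the real wire format. Unknown fields are rejected outright rather than ignored, so an identifier can’t slip in unnoticed.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ReflectionRequest:
    """Hypothetical wire format: everything the backend is allowed to receive."""
    text: str     # sanitized, anonymized text produced on-device
    emotion: str  # coarse emotion metadata, e.g. "discouraged"


ALLOWED_KEYS = frozenset(f.name for f in fields(ReflectionRequest))


def parse_request(payload: Mapping[str, Any]) -> ReflectionRequest:
    """Validate a single-turn request; there is no session or device field to read."""
    extra = set(payload) - ALLOWED_KEYS
    if extra:
        # Refuse instead of silently dropping: a client sending more is a privacy bug to surface.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    missing = ALLOWED_KEYS - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not all(isinstance(payload[key], str) for key in ALLOWED_KEYS):
        raise ValueError("all fields must be strings")
    return ReflectionRequest(**{key: payload[key] for key in ALLOWED_KEYS})


print(parse_request({"text": "I keep putting things off", "emotion": "discouraged"}))
```

Rejecting rather than dropping is a judgment call. Dropping extra fields would also keep them out of the backend, but rejecting them makes a misbehaving client fail loudly during testing.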

Privacy-by-architecture, not privacy-by-policy. A promise in a policy can be broken quietly. A backend that never receives the data can’t leak what it doesn’t have.

Constraint 3: it shouldn’t make things up, or make things worse

Two failure modes matter more than latency here.

The first is hallucinated scripture. An LLM asked for “a verse about perseverance” can cheerfully invent one that sounds right. That’s unacceptable when the whole point is to return a real passage. So the model’s job is to select, not to author, and its selection is then verified against the canonical source text, fetched via an API, before anything is returned. The LLM picks; the canon confirms.
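Here is a sketch of the select-then-verify step. It assumes the model is prompted to return a passage reference along with the text it claims that reference contains. `fetch_canonical` is a hypothetical stand-in for the real scripture API (the post doesn’t name it) and returns `None` for references that don’t exist. The function rejects both invented references and misquotes, and when it does confirm a pick, it returns the canonical wording, never the model’s.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Passage:
    reference: str
    text: str


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so formatting differences don't count."""
    return " ".join(re.sub(r"[^\w\s]", "", text).lower().split())


def verify_selection(
    reference: str,
    quoted_text: str,
    fetch_canonical: Callable[[str], Optional[str]],  # hypothetical API client
) -> Optional[Passage]:
    """Confirm the model's pick against the canon; return None if it can't be confirmed."""
    canonical = fetch_canonical(reference)
    if canonical is None:
        return None  # the model named a passage that doesn't exist
    if normalize(quoted_text) != normalize(canonical):
        return None  # the reference is real, but the quoted text isn't what it says
    return Passage(reference, canonical)  # always the canonical wording


# Made-up stand-in for the canonical source; not real scripture.
CANON = {"Book 1:1": "Placeholder canonical text, for illustration."}
print(verify_selection("Book 1:1", "placeholder canonical text for illustration", CANON.get))
print(verify_selection("Book 9:9", "A verse that sounds right.", CANON.get))  # None
```

On a `None`, the caller could ask the model for another pick. Exact matching also assumes the model quotes the same translation the API serves.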

The second failure mode is the one that actually keeps me honest: someone reaching for this app in a genuine crisis. A verse is not the right response to “I want to hurt myself.” So there’s a crisis gate in front of the reflection flow: when input reads as crisis content, the app returns no passage at all and points toward real support instead. It’s a small amount of code carrying a large amount of responsibility, and it’s the part I’d insist on in any product that invites people to say how they’re really feeling.
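The shape of that gate can be sketched under a few stated assumptions. The post doesn’t say how crisis content is detected or where the check runs. Here, `is_crisis` is a hypothetical injected classifier, and `reflect` stands in for the whole passage pipeline. A keyword list would be a poor real-world detector and isn’t implied here. Two properties are the point. First, the gate runs before the reflection flow, so crisis input never reaches the passage pipeline. Second, it fails closed (a design choice of this sketch), so a broken classifier diverts to support rather than returning a verse.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class SupportReferral:
    message: str


@dataclass(frozen=True)
class Reflection:
    passage: str


Response = Union[SupportReferral, Reflection]

# Placeholder wording; the real message and the resources it points to are product decisions.
SUPPORT = SupportReferral("This sounds serious. Please reach out to someone who can help right now.")


def respond(
    text: str,
    is_crisis: Callable[[str], bool],  # hypothetical classifier
    reflect: Callable[[str], str],     # hypothetical passage pipeline
) -> Response:
    """Run the crisis gate first; only non-crisis input reaches the reflection flow."""
    try:
        crisis = is_crisis(text)
    except Exception:
        crisis = True  # fail closed: if the check can't run, don't risk returning a verse
    if crisis:
        return SUPPORT  # reflect() is never called
    return Reflection(reflect(text))


# Usage with stub classifiers and a stub pipeline.
def never_crisis(text: str) -> bool:
    return False


def always_crisis(text: str) -> bool:
    return True


def stub_reflect(text: str) -> str:
    return "a verified passage"


print(respond("I keep putting things off", never_crisis, stub_reflect))
print(respond("any input", always_crisis, stub_reflect) is SUPPORT)  # True
```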

What the constraints bought

No budget forced a fail-over architecture that’s genuinely more robust than a single paid endpoint. No user data forced the sensitive computation onto the device, which is where it belonged anyway. And taking seriously that someone might be in pain forced a gate that any app in this space should have.

“Free Advice” started as a pun about LLM pricing. It ended up describing the whole posture of the thing: give something useful, ask for as little as possible in return, and be careful with the person on the other end.