Mar 2026

Grounding AI in the Real World

How our team at Foursquare built a conversational place search that actually works.

Most AI demos look magical. You type a natural language question, and you get a natural language answer. But the moment you point that same model at messy, real-world data, the magic breaks. Millions of places, each with incomplete attributes, ambiguous names, and constantly shifting hours. The model hallucinates confidently. It returns places that closed two years ago. It conflates a taco truck and a fine dining restaurant in Austin, Texas that happen to share a name.

This is the core problem our team set out to solve at Foursquare: how do you let someone ask "find me a romantic Italian restaurant in Chicago" and return real, verified, relevant places instead of a plausible-sounding fabrication?

The answer isn't just "use a better model." It's an architecture problem.

Place data is not static. A restaurant changes its hours seasonally. A cafe closes permanently on a Tuesday and nobody updates the listing for weeks. A bar adds a rooftop patio, starts hosting live music on Thursdays, raises its prices. The real world is in constant motion, and any system that tries to answer questions about it needs to be anchored to data that reflects that motion, not to a language model's frozen training snapshot. This is what makes place search fundamentally different from general question answering. The facts change every day.

So grounding is the whole game. The model can never be the source of truth. It has to be the interface to a source of truth. Every answer has to trace back to verified, continuously updated place data, not to whatever the model learned during training.

We built a system with two execution paths. Simple queries, the ones that map cleanly to a known concept like "coffee shops" or "sushi near me," take a fast path. The system classifies the query, matches it against a structured index of place concepts, and executes a direct search. No LLM reasoning required. These resolve in milliseconds.
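The routing decision can be sketched roughly like this. The concept list and function names are illustrative assumptions, not Foursquare's actual interfaces; the point is only that a recognized concept skips the LLM entirely.

```python
# Minimal sketch of the two-path router: known concepts take a direct
# index lookup; everything else is handed to the reasoning agent.
# KNOWN_CONCEPTS and route() are hypothetical names for illustration.

KNOWN_CONCEPTS = {"coffee shops", "sushi", "pizza", "bars"}

def route(query: str) -> str:
    """Return which execution path a query should take."""
    normalized = query.lower().replace(" near me", "").strip()
    if normalized in KNOWN_CONCEPTS:
        return "fast"   # direct structured search, resolves in milliseconds
    return "agent"      # multi-step reasoning over tools

print(route("sushi near me"))                       # fast
print(route("quiet place to work with good wifi"))  # agent
```

In production the classification step would itself be a trained model rather than a set lookup, but the shape of the decision is the same.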

Complex queries like "a quiet place to work with good wifi and outdoor seating" take a different path. Here, an AI agent reasons over the query step by step, calling into Foursquare's structured place data using tools for geocoding, concept lookup, and venue search. The key insight is that the model never answers from its own knowledge. It's grounded. Every claim maps back to indexed, verified data.
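A grounded agent loop can be sketched as below. The tool names, stub data, and `answer` flow are assumptions made up for illustration; the property being demonstrated is that the model only decides *which* tools to call, while every fact in the result comes from tool output, never from model memory.

```python
# Hypothetical sketch of the grounded agent path. In production the
# planning steps are LLM calls and the tools hit real indexes; here
# both are stubbed so the control flow is visible.

def geocode(place: str) -> dict:
    # Stub: resolve a place name to coordinates via a geocoding tool.
    return {"lat": 41.88, "lng": -87.63}

def venue_search(location: dict, filters: dict) -> list:
    # Stub: query the verified venue index with structured filters.
    return [{"name": "Trattoria Example", "rating": 9.1}]

def answer(query: str) -> list:
    # Step 1: the agent extracts a location and constraints from the query.
    location = geocode("Chicago")
    filters = {"category": "italian", "attribute": "romantic"}
    # Step 2: candidates come only from the index; the model adds nothing.
    return venue_search(location, filters)

print(answer("find me a romantic Italian restaurant in Chicago"))
```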

We built a concept index, a searchable layer of place attributes like "outdoor seating," "good for dates," or "live music," each represented as a high-dimensional vector embedding. When a user says "romantic dinner spot," the system doesn't just keyword-match. It finds that the query is semantically close to a curated concept like "Date Night," which carries its own structured filters: Italian or French or Japanese restaurants, with a romantic or intimate atmosphere, boosted by wine selection. The AI doesn't invent this. Humans curate the concept definitions. The AI just finds the right one.
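The concept lookup is, at its core, a nearest-neighbor search over embeddings. The vectors and filter definitions below are toy values invented for illustration (real embeddings have hundreds of dimensions), but the mechanism matches the description: the query embeds near a curated concept, and the concept carries its human-defined filters.

```python
import math

# Toy concept index: each curated concept has an embedding vector and
# structured filters. Vectors and filters here are made-up examples.
CONCEPTS = {
    "Date Night": {
        "vector": [0.9, 0.1, 0.0],
        "filters": {"categories": ["italian", "french", "japanese"],
                    "atmosphere": ["romantic", "intimate"],
                    "boost": "wine_selection"},
    },
    "Work Friendly": {
        "vector": [0.0, 0.2, 0.9],
        "filters": {"attributes": ["wifi", "quiet"]},
    },
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest_concept(query_vec):
    # Find the curated concept whose embedding is closest to the query's.
    return max(CONCEPTS, key=lambda c: cosine(query_vec, CONCEPTS[c]["vector"]))

# "romantic dinner spot" would embed near the Date Night vector:
print(nearest_concept([0.85, 0.15, 0.05]))  # Date Night
```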

This hybrid of semantic understanding and structured retrieval is what makes it work. Vector search finds the meaning. Structured filters enforce the constraints. The model reasons about how to combine them, but never fabricates a venue, never guesses at a rating, never makes up an address.
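The enforcement half of that hybrid is deliberately boring: once a concept is chosen, plain structured filtering decides which venues qualify, with no model in the loop. The venue records and filter shape below are illustrative assumptions.

```python
# Sketch of constraint enforcement after semantic matching: the filters
# (not the model) decide which venues pass. Data is invented for the example.

venues = [
    {"name": "Osteria A", "category": "italian", "atmosphere": "romantic"},
    {"name": "Dive Bar B", "category": "bar", "atmosphere": "loud"},
]

date_night_filters = {
    "categories": {"italian", "french", "japanese"},
    "atmospheres": {"romantic", "intimate"},
}

def apply_filters(candidates, f):
    # Pure set-membership checks: no venue can be fabricated here,
    # only excluded.
    return [v for v in candidates
            if v["category"] in f["categories"]
            and v["atmosphere"] in f["atmospheres"]]

print([v["name"] for v in apply_filters(venues, date_night_filters)])  # ['Osteria A']
```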

We also had to solve for speed. An API that takes five seconds to answer a simple question isn't useful, no matter how smart it is. So we built a classification layer that looks at the query first and decides whether it can be answered with a direct index lookup or needs multi-step reasoning. Most queries take the fast path. The slow path is reserved for genuinely complex requests, and even there, we cap the number of reasoning steps to prevent runaway latency.
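Capping reasoning steps is a one-line guard in the agent loop. The `MAX_STEPS` value and planner interface below are assumptions for illustration; the point is that latency is bounded even if the planner never decides it is done.

```python
# Sketch of a step-capped agent loop: the hard cap bounds worst-case
# latency regardless of what the planner asks for. Names are hypothetical.

MAX_STEPS = 4

def run_agent(query, tools, plan_next_step):
    results = []
    for _ in range(MAX_STEPS):             # hard cap prevents runaway loops
        action = plan_next_step(query, results)
        if action is None:                 # planner decided it is done
            break
        results.append(tools[action["tool"]](**action["args"]))
    return results

# Demo planner that would loop forever without the cap:
def greedy_planner(query, results):
    return {"tool": "echo", "args": {"x": len(results)}}

out = run_agent("q", {"echo": lambda x: x}, greedy_planner)
print(len(out))  # 4: the cap, not the planner, ended the loop
```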

Underneath all of this is what we call the taste system. Foursquare has decades of place data: tips, reviews, menu items, photos, check-ins. From all of that we extract structured "tastes": attributes like "good cocktails" or "cozy atmosphere," tied to specific venues with affinity scores. These aren't keywords. They're weighted, provenance-tracked signals that tell you not just that a place has outdoor seating, but how strongly that attribute is associated with it and where that signal came from. A mention in 50 user tips is a stronger signal than a single menu item description. And because these signals are recomputed regularly from fresh data, they stay current as places evolve.
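One way to picture a provenance-weighted affinity score is below. The source weights and squashing function are invented for illustration; the behavior they demonstrate is the one described above: many user tips outweigh a single menu mention, and scores stay bounded.

```python
# Toy provenance-weighted affinity score for one taste at one venue.
# SOURCE_WEIGHTS and the squashing constant are illustrative assumptions.

SOURCE_WEIGHTS = {"user_tip": 1.0, "menu_item": 0.3, "photo_caption": 0.5}

def affinity(signals):
    """signals: list of (source, count) pairs for one taste at one venue.
    Returns a score in [0, 1) that grows with weighted evidence."""
    raw = sum(SOURCE_WEIGHTS[src] * count for src, count in signals)
    return raw / (raw + 10)  # more evidence pushes the score toward 1

tips_heavy = affinity([("user_tip", 50)])   # ~0.83: strong signal
menu_only = affinity([("menu_item", 1)])    # ~0.03: weak signal
print(tips_heavy > menu_only)  # True
```

Recomputing these scores on a schedule from fresh tips and reviews is what keeps them current as venues change.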

The result is an API where AI does what it's good at (understanding natural language, resolving ambiguity, reasoning about intent) and the structured data layer does what it's good at: being correct and being current. The model is the interface. The data is the source of truth. Neither works without the other.
