This post was originally published on the VaporLens Patreon.
Hey folks!
Last week, I asked you which feature you wanted next for VaporLens. The winner, by a landslide, was the MTX Index (aka the "Casino Score").
The goal sounded simple: Scan 100,000+ Steam reviews and tell me if a game is a rip-off.
I thought this would be a weekend project. I was wrong. It turns out that teaching a computer to tell the difference between "This game is a gem" and "This game costs 500 gems" is a bit of a technical nightmare.
Here is the deep dive into how I built it - and the specific "JSON trick" I used to make a cheap AI model outperform a human analyst.
For a game like ARC Raiders, there are 150,000+ reviews. Maybe 1% of them actually talk about monetization.
If I sent all 150k reviews to an LLM, it would cost me ~$50 per game even on the cheapest models. Since I’m running this on a weekend budget, that wasn’t happening. I needed a "sieve" - a cheap, fast way to filter 150,000 reviews down to a ~thousand that actually matter, before sending them to the LLM.
I started by just searching for keywords like money, pay, greedy, and scam across all languages.
The results were... hilarious. And useless.
I realized I couldn't search for words. I had to search for context.
I ended up building a "Polyglot Compound Dictionary." I stopped looking for "Greedy" and started looking exclusively for "Greedy Devs," "Greedy Company," or "Greedy Pricing." I did this across 14 languages.
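Here's a minimal sketch of how the sieve works - the phrase lists below are illustrative stand-ins, not the real dictionary, and language detection is assumed to happen upstream:

```python
import re

# Illustrative slice of the compound dictionary. The real one covers
# 14 languages; the point is that "greedy" alone matches junk, while
# "greedy devs" / "greedy pricing" almost always signals monetization.
COMPOUND_PHRASES = {
    "en": ["greedy devs", "greedy company", "greedy pricing",
           "pay to win", "cash grab"],
    "es": ["devs codiciosos", "pay to win"],   # illustrative entries
    "pt": ["empresa gananciosa", "caça-níqueis"],  # illustrative entries
}

# One pre-compiled alternation per language keeps a full scan of
# 150k reviews fast.
PATTERNS = {
    lang: re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    for lang, phrases in COMPOUND_PHRASES.items()
}

def sieve(reviews):
    """Keep only reviews containing a compound monetization phrase.

    Each review is assumed to be a dict like {"lang": "en", "text": "..."}.
    """
    return [
        r for r in reviews
        if (p := PATTERNS.get(r["lang"])) and p.search(r["text"])
    ]
```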
The result: 150,000 reviews -> filtered down to a couple thousand high-signal complaints in under a second.
Just when I thought I had the filtering solved, I hit a second, weirder wall: AI doesn't understand gamer slang.

To test the system, I ran it on Lobotomy Corporation, a cult-classic single-player management game with absolutely zero microtransactions. The AI gave it a "Predatory Score" of 85/100.

Why? Because the AI takes everything literally. The reviews were full of players saying things like "The creature spawns are pure gacha" or "This game is a slot machine of suffering." To a human, that means "The RNG is brutal." To the AI, it meant "This is an online casino." It couldn't distinguish between Gacha as a Gameplay Mechanic (randomness) and Gacha as a Business Model (gambling).

I had to patch in a "Real Money Gate" - a logic layer that forces the AI to ask: "Can I use a credit card to solve this?" before flagging a keyword. If the answer is No, the AI now knows it's just a Roguelike, not a scam.
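In practice, the gate is a rule injected into the analysis prompt. Here's a sketch of the idea (the wording is illustrative, not the exact production prompt):

```python
# Sketch of the "Real Money Gate" as a prompt-level rule. Every slang
# hit must pass the credit-card test before it is allowed to count as
# monetization evidence.
REAL_MONEY_GATE = """\
Before treating any flagged term (gacha, casino, slot machine, lootbox,
whale, ...) as monetization evidence, ask one question:

  "Can the player spend real money to resolve this complaint?"

- YES -> real-money monetization. Keep it as evidence.
- NO  -> it's a gameplay mechanic (RNG, roguelike randomness, difficulty).
         Discard it. It must not affect the Predatory Score.
"""
```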
Now that I had a few thousand valid complaints, I fed them into a standard, cost-effective LLM to get a 0-100 "Predatory Score."
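The call itself is nothing fancy. Here's roughly what the scoring step looks like - a sketch assuming OpenAI's Python SDK and JSON mode; the model name and prompt wording are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

SCORING_PROMPT = (
    "You audit Steam reviews for monetization complaints. "
    "Apply the Real Money Gate, then return a JSON object with a "
    "0-100 'predatory_score' and your reasoning."  # illustrative wording
)

def score_game(complaints: list[str]) -> dict:
    """Ask a cheap model for a 0-100 Predatory Score over the sieved reviews."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any cheap JSON-mode model works
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SCORING_PROMPT},
            {"role": "user", "content": "\n\n".join(complaints)},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```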
But I hit a third wall: The AI was lazy.
When I asked it for a score, it would just hallucinate a generic number like "70/100" and give a vague reason like "Some users complained about price." It wasn't actually reading the reviews; it was skimming them and guessing.
This is a known issue with non-reasoning models. They try to complete the task as fast as possible. If you ask for the score first, they guess the score before they've analyzed the evidence.
I fixed this using a trick I call "Structural Reasoning."
LLMs generate text linearly, token by token. They can't go back and change what they wrote. If the score field is at the top of your JSON object, the AI has to guess the number immediately.
But... if you force the AI to write the evidence before the score, you force it to "think."
I re-architected the output structure to force the AI to do the homework first:
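(The field names and values below are a simplified illustration - the real schema has more fields, but the ordering is the whole trick.)

Before - score first, which invites a guess:

```json
{
  "predatory_score": 70,
  "reasoning": "Some users complained about price."
}
```

After - evidence first, score last:

```json
{
  "evidence_quotes": ["$20 for one skin in a game I already paid for", "..."],
  "complaint_summary": "Premium-priced game with F2P-style cosmetic pricing.",
  "severity": "High",
  "predatory_score": 78
}
```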
By the time the AI reaches the score field, it has already "forced itself" to read the evidence and write a summary. It literally cannot hallucinate a "Safe" score if it just spent 500 tokens writing about "$20 Skins."
Once I swapped the JSON order, the quality skyrocketed.
In ARC Raiders, the "Lazy AI" suddenly became a detective. It didn't just say "Pricing is bad." It successfully identified a specific launch-day pricing bug where the Premium Edition currency was miscalculated in the Asian market - a detail hiding in Chinese reviews ("364 vs 3.64") that I never would have found by reading English reviews alone.
Here is the actual result from the new engine:
It even caught the nuance that while the game isn't technically Pay-to-Win (the engine flags that as "Positive" severity), the pricing model still registers as "High" severity with users because it's a paid game with F2P-style pricing.
The "MTX Index" was just the proof of concept. Once I realized this "Audit Engine" actually worked, I went a little overboard.
If I can train an AI to detect Hidden Costs, surely I can train it to detect other hidden headaches?
So, I’m not just launching the MTX Index today. I’m launching the entire "Hidden Cost" Suite.
Starting right now, every game on VaporLens has three new indices:
The goal is simple: No more buying a game only to realize the "Real Price" (in money, time, or mods) is higher than the sticker price.
All four features are live on VaporLens right now. Go break them.