This post was originally published on the VaporLens Patreon.
Hey folks!
Last week, I asked you which feature you wanted next for VaporLens. The winner, by a landslide, was the MTX Index (aka the "Casino Score").
The goal sounded simple: Scan 100,000+ Steam reviews and tell me if a game is a rip-off.
I thought this would be a weekend project. I was wrong. It turns out that teaching a computer to tell the difference between "This game is a gem" and "This game costs 500 gems" is a bit of a technical nightmare.
Here is the deep dive into how I built it - and the specific "JSON trick" I used to make a cheap AI model outperform a human analyst.
For a game like ARC Raiders, there are 150,000+ reviews. Maybe 1% of them actually talk about monetization.
If I sent all 150k reviews to an LLM, it would cost me ~$50 per game even on the cheapest models. Since I’m running this on a weekend budget, that wasn’t happening. I needed a "sieve" - a cheap, fast way to filter 150,000 reviews down to a ~thousand that actually matter, before sending them to the LLM.
I started by just searching for keywords like money, pay, greedy, and scam across all languages.
The results were... hilarious. And useless.
I realized I couldn't search for words. I had to search for context.
I ended up building a "Polyglot Compound Dictionary." I stopped looking for "Greedy" and started looking exclusively for "Greedy Devs," "Greedy Company," or "Greedy Pricing." I did this across 14 languages.
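Here's a minimal sketch of how the sieve works - the phrase lists below are illustrative stand-ins, not the real dictionary, and language detection is assumed to happen upstream:

```python
import re

# Illustrative slice of the compound dictionary. The real one covers
# 14 languages; the point is that "greedy" alone matches junk, while
# "greedy devs" / "greedy pricing" almost always signals monetization.
COMPOUND_PHRASES = {
    "en": ["greedy devs", "greedy company", "greedy pricing",
           "pay to win", "cash grab"],
    "es": ["devs codiciosos", "pay to win"],   # illustrative entries
    "pt": ["empresa gananciosa", "caça-níqueis"],  # illustrative entries
}

# One pre-compiled alternation per language keeps a full scan of
# 150k reviews fast.
PATTERNS = {
    lang: re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    for lang, phrases in COMPOUND_PHRASES.items()
}

def sieve(reviews):
    """Keep only reviews containing a compound monetization phrase.

    Each review is assumed to be a dict like {"lang": "en", "text": "..."}.
    """
    return [
        r for r in reviews
        if (p := PATTERNS.get(r["lang"])) and p.search(r["text"])
    ]
```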
The result: 150,000 reviews -> filtered down to a couple thousand high-signal complaints in under a second.
Just when I thought I had the filtering solved, I hit a second, weirder wall: AI doesn't understand gamer slang.

To test the system, I ran it on Lobotomy Corporation, a cult-classic single-player management game with absolutely zero microtransactions. The AI gave it a "Predatory Score" of 85/100.

Why? Because the AI takes everything literally. The reviews were full of players saying things like "The creature spawns are pure gacha" or "This game is a slot machine of suffering." To a human, that means "The RNG is brutal." To the AI, it meant "This is an online casino." It couldn't distinguish between Gacha as a Gameplay Mechanic (randomness) and Gacha as a Business Model (gambling).

I had to patch in a "Real Money Gate" - a logic layer that forces the AI to ask: "Can I use a credit card to solve this?" before flagging a keyword. If the answer is No, the AI now knows it's just a Roguelike, not a scam.
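In practice, the gate is a rule injected into the analysis prompt. Here's a sketch of the idea (the wording is illustrative, not the exact production prompt):

```python
# Sketch of the "Real Money Gate" as a prompt-level rule. Every slang
# hit must pass the credit-card test before it is allowed to count as
# monetization evidence.
REAL_MONEY_GATE = """\
Before treating any flagged term (gacha, casino, slot machine, lootbox,
whale, ...) as monetization evidence, ask one question:

  "Can the player spend real money to resolve this complaint?"

- YES -> real-money monetization. Keep it as evidence.
- NO  -> it's a gameplay mechanic (RNG, roguelike randomness, difficulty).
         Discard it. It must not affect the Predatory Score.
"""
```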
Now that I had a few thousand valid complaints, I fed them into a standard, cost-effective LLM to get a 0-100 "Predatory Score."
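The call itself is nothing fancy. Here's roughly what the scoring step looks like - a sketch assuming OpenAI's Python SDK and JSON mode; the model name and prompt wording are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

SCORING_PROMPT = (
    "You audit Steam reviews for monetization complaints. "
    "Apply the Real Money Gate, then return a JSON object with a "
    "0-100 'predatory_score' and your reasoning."  # illustrative wording
)

def score_game(complaints: list[str]) -> dict:
    """Ask a cheap model for a 0-100 Predatory Score over the sieved reviews."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any cheap JSON-mode model works
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SCORING_PROMPT},
            {"role": "user", "content": "\n\n".join(complaints)},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```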
But I hit a third wall: The AI was lazy.
When I asked it for a score, it would just hallucinate a generic number like "70/100" and give a vague reason like "Some users complained about price." It wasn't actually reading the reviews; it was skimming them and guessing.
This is a known issue with non-reasoning models. They try to complete the task as fast as possible. If you ask for the score first, they guess the score before they've analyzed the evidence.
I fixed this using a trick I call "Structural Reasoning."
LLMs generate text linearly, token by token. They can't go back and change what they wrote. If the score field is at the top of your JSON object, the AI has to guess the number immediately.
But... if you force the AI to write the evidence before the score, you force it to "think."
I re-architected the output structure to force the AI to do the homework first:
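(The field names and values below are a simplified illustration - the real schema has more fields, but the ordering is the whole trick.)

Before - score first, which invites a guess:

```json
{
  "predatory_score": 70,
  "reasoning": "Some users complained about price."
}
```

After - evidence first, score last:

```json
{
  "evidence_quotes": ["$20 for one skin in a game I already paid for", "..."],
  "complaint_summary": "Premium-priced game with F2P-style cosmetic pricing.",
  "severity": "High",
  "predatory_score": 78
}
```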
By the time the AI reaches the score field, it has already "forced itself" to read the evidence and write a summary. It literally cannot hallucinate a "Safe" score if it just spent 500 tokens writing about "$20 Skins."
Once I swapped the JSON order, the quality skyrocketed.
In ARC Raiders, the "Lazy AI" suddenly became a detective. It didn't just say "Pricing is bad." It successfully identified a specific launch-day pricing bug where the Premium Edition currency was miscalculated in the Asian market - a detail hiding in Chinese reviews ("364 vs 3.64") that I never would have found by reading English reviews alone.
Here is the actual result from the new engine:
It even caught the nuance that while the game isn't technically Pay-to-Win (the engine flags that as "Positive" severity), the pricing model still registers as "High" severity with users because it's a paid game with F2P-style pricing.
The "MTX Index" was just the proof of concept. Once I realized this "Audit Engine" actually worked, I went a little overboard.
If I can train an AI to detect Hidden Costs, surely I can train it to detect other hidden headaches?
So, I’m not just launching the MTX Index today. I’m launching the entire "Hidden Cost" Suite.
Starting right now, every game on VaporLens has three new indices:
The goal is simple: No more buying a game only to realize the "Real Price" (in money, time, or mods) is higher than the sticker price.
All four features are live on VaporLens right now. Go break them.