This post originally appeared on the Vaporlens Patreon.
Hey folks!
Last update shipped the new extraction layer (price, playtime, time-to-fun, archetypes), and I wanted to do a proper deep dive into how this thing actually works under the hood.
Short version: I didn't build "one more score". I built a mini pipeline that turns chaotic Steam review text into structured signals you can actually use.
Long version below.
A single 0-100 number is useful for scanning. But when you're deciding whether to buy a game, you usually want answers like:

- Is it worth the price?
- How long until it actually gets fun?
- How much playtime will I realistically get out of it?
- Is it for players like me?
So I needed to move from "is this generally good/bad" to "extract concrete player-facing facts".
That sounds easy until you remember the input is 100k+ multilingual reviews containing memes, sarcasm, build guides, and 17 ways to write "this game is mid".
Before I can ask an LLM anything, I need to find the few hundred lines that actually talk about the target topic.
So the extraction flow starts with a keyword miner that scans reviews for topic vocabulary and structured patterns like durations (20h, 100hrs, 30min).

This stage is cheap and fast, and it turns the entire Steam review firehose into a topic-specific candidate set.
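A minimal sketch of what that pre-filter could look like. The keyword list here is invented for illustration; the real per-topic vocabulary is Vaporlens-specific:

```python
import re

# Hypothetical topic vocabulary for the playtime extraction;
# the real lists are per-extraction and much larger.
PLAYTIME_KEYWORDS = {"hours", "playtime", "grind", "beat the game"}

# Matches duration mentions like "20h", "100hrs", "30min".
DURATION_RE = re.compile(r"\b\d+\s*(?:h|hr|hrs|hours|min|mins)\b", re.IGNORECASE)

def is_candidate(review: str) -> bool:
    """Cheap lexical pass: keep a review only if it mentions a topic
    keyword or a duration pattern. No LLM involved at this stage."""
    text = review.lower()
    if any(kw in text for kw in PLAYTIME_KEYWORDS):
        return True
    return bool(DURATION_RE.search(review))

reviews = [
    "Dropped 100hrs into this and still finding builds.",
    "Great soundtrack, mid story.",
]
candidates = [r for r in reviews if is_candidate(r)]
```

Because it is pure string matching, this pass can chew through the whole review corpus before a single model token is spent.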
The key change in this release is that scans and extractions now share the same review cleanup layer.
That means before anything hits the model, every review goes through the same normalization and filtering pass.

The practical effect: less garbage in, less weird stuff out.
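A minimal sketch of a shared cleanup layer. The specific steps here (stripping Steam-style BBCode, collapsing whitespace, dropping near-empty and duplicate reviews) are my assumptions about what such a pass might include, not the actual Vaporlens implementation:

```python
import re

def clean_review(text: str) -> str:
    """Normalize one review before it can reach the model."""
    # Strip BBCode-style markup like [h1]...[/h1] (assumed step).
    text = re.sub(r"\[/?\w+[^\]]*\]", " ", text)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text

def clean_batch(reviews: list[str], min_chars: int = 20) -> list[str]:
    """Dedupe and drop near-empty reviews after normalization."""
    seen: set[str] = set()
    out: list[str] = []
    for r in reviews:
        cleaned = clean_review(r)
        key = cleaned.lower()
        if len(cleaned) >= min_chars and key not in seen:
            seen.add(key)
            out.append(cleaned)
    return out
```

The important design point is the sharing: if scans and extractions run the same `clean_batch`, a fix to the cleanup layer improves both at once.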
Each extraction has its own schema and hard constraints. For price, that means priceRangeUsd.min/max, evidence quotes, reasoning, and a confidence score.

Important detail: if there isn't enough evidence, the extraction returns null instead of making something up.
So "I don't know" is a valid output. That is a feature, not a bug.
Archetypes are my favorite in this batch because they answer a question review scores never can:
"Who is this actually for?"
Instead of "mixed reviews," you now get something closer to:
That's way more actionable than a single aggregate sentiment number.
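To make the difference concrete, here is a hypothetical archetype breakdown. The labels, verdicts, and shares are all invented for illustration; only the overall shape (segments instead of one aggregate number) reflects the feature:

```python
# Invented example output; real archetype labels come from the extraction.
archetypes = [
    {"who": "build-crafters / theorycrafters", "verdict": "loves it", "share": 0.41},
    {"who": "story-first players", "verdict": "bounces off early", "share": 0.27},
    {"who": "casual co-op groups", "verdict": "mixed", "share": 0.18},
]

def top_audience(results: list[dict]) -> str:
    """Answer 'who is this actually for?' with the largest positive segment."""
    positives = [a for a in results if a["verdict"] == "loves it"]
    return max(positives, key=lambda a: a["share"])["who"]
```

A flat 71% positive score hides all of this; the segmented view tells you whether *you* are in the 71% or the 29%.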
I also started prototyping a community toxicity scan.
This worked exactly like the other feature scans. Same basic scan pipeline, just a new vocabulary + audit prompt focused on community behavior.
Then I tested it on my default set of test games (stuff like Warframe, BG3, Hades, etc.).
And yes, horny Baldur's Gate 3 fans broke my approach.
The scan kept mixing genuine toxicity signals with giant volumes of "kinky in-game chaos" chatter, which is not the same thing at all. So this is mostly a vocabulary/context disambiguation problem for that domain, not a core pipeline problem.
Toxicity scan is postponed for now. It's still coming, but I'd rather delay than ship a dumb classifier that labels thirsty roleplay threads as "community abuse."