
I tried to teach VaporLens to answer the hardest Steam question: "Will I like this?"

This post was originally published on the VaporLens Patreon.

Hey folks!

Last update shipped the new extraction layer (price, playtime, time-to-fun, archetypes), and I wanted to do a proper deep dive into how this thing actually works under the hood.

Short version: I didn't build "one more score". I built a mini pipeline that turns chaotic Steam review text into structured signals you can actually use.

Long version below.

The Problem: Scores are cute, decisions are hard

A single 0-100 number is useful for scanning. But when you're deciding whether to buy a game, you usually want answers like:

So I needed to move from "is this generally good/bad" to "extract concrete player-facing facts".

That sounds easy until you remember the input is 100k+ multilingual reviews containing memes, sarcasm, build guides, and 17 ways to write "this game is mid".

Step 1: Build a multilingual "sentence miner" first

Before I can ask an LLM anything, I need to find the few hundred lines that actually talk about the target topic.

So the extraction flow starts with a keyword miner that:

This stage is cheap and fast, and it turns the entire Steam review firehose into a topic-specific candidate set.
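To make the idea concrete, here's a minimal sketch of what a keyword-based sentence miner like this could look like. Everything here is an assumption for illustration: the function name, the tiny keyword lists (the real miner is multilingual and far larger), and the naive sentence splitting.

```python
import re

# Hypothetical topic vocabularies -- illustrative only, not the real
# VaporLens keyword lists (those are much larger and multilingual).
TOPIC_KEYWORDS = {
    "price": [r"\bworth (it|the money)\b", r"\bprice\b", r"\bsale\b", r"\brefund"],
    "playtime": [r"\bhours?\b", r"\bplaytime\b", r"\bhrs\b"],
}

def mine_sentences(reviews, topic, max_candidates=500):
    """Cheap first pass: keep only sentences that mention the target topic."""
    patterns = [re.compile(p, re.IGNORECASE) for p in TOPIC_KEYWORDS[topic]]
    candidates = []
    for review in reviews:
        # Naive sentence split; good enough for a coarse pre-filter.
        for sentence in re.split(r"(?<=[.!?])\s+|\n+", review):
            if any(p.search(sentence) for p in patterns):
                candidates.append(sentence.strip())
                if len(candidates) >= max_candidates:
                    return candidates
    return candidates
```

The point is that this pass never needs a model: a pile of regexes over split sentences is enough to shrink 100k reviews down to a few hundred candidate lines per topic.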

Step 2: Run the same cleanup gauntlet everywhere

The key change in this release is that scans and extractions now share the same review cleanup layer.

That means before anything hits the model, I run:

The practical effect: less garbage in, less weird stuff out.
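The kind of cleanup pass described above might look roughly like this. The specific steps here (BBCode stripping, whitespace collapsing, length filtering, exact-duplicate removal) are my assumed examples of a "cleanup gauntlet", not the exact production list:

```python
import hashlib
import re

def clean_reviews(reviews):
    """Illustrative cleanup pass: strip Steam BBCode markup, collapse
    whitespace, and drop near-empty or copy-pasted duplicate reviews
    before anything reaches the model."""
    seen = set()
    cleaned = []
    for text in reviews:
        text = re.sub(r"\[/?[a-z]+.*?\]", " ", text)   # Steam BBCode like [h1], [spoiler]
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) < 10:                             # near-empty / meme one-liners
            continue
        fingerprint = hashlib.md5(text.lower().encode()).hexdigest()
        if fingerprint in seen:                        # exact duplicate (copypasta)
            continue
        seen.add(fingerprint)
        cleaned.append(text)
    return cleaned
```

Because scans and extractions now share this layer, a fix here (say, a new copypasta pattern) improves every downstream feature at once.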

Step 3: Force structure with strict schemas (not vibes)

Each extraction has its own schema and hard constraints:

Important detail: if there isn't enough evidence, extraction returns null instead of making something up.

So "I don't know" is a valid output. That is a feature, not a bug.
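Here's a sketch of that "null over guessing" contract. The field names, thresholds, and the time-to-fun example are all invented for illustration; the real schemas and constraints will differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimeToFunResult:
    """Assumed shape of one extraction -- field names are illustrative."""
    hours: float
    evidence_count: int

def validate_extraction(raw: dict, min_evidence: int = 3) -> Optional[TimeToFunResult]:
    """Hard constraints: well-typed fields and a minimum evidence count,
    otherwise return None ("I don't know") instead of a guess."""
    try:
        result = TimeToFunResult(
            hours=float(raw["hours"]),
            evidence_count=int(raw["evidence_count"]),
        )
    except (KeyError, TypeError, ValueError):
        return None                      # malformed model output -> no answer
    if result.evidence_count < min_evidence or not (0 < result.hours < 1000):
        return None                      # too little evidence or implausible value
    return result
```

The useful property: every downstream consumer handles exactly two cases, a well-typed result or an honest null, and nothing in between.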

Why archetypes were the fun one

Archetypes are my favorite in this batch because they answer a question review scores never can:

"Who is this actually for?"

Instead of "mixed reviews," you now get something closer to:

That's way more actionable than a single aggregate sentiment number.
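To show what I mean by actionable, here's a hypothetical sketch of an archetype result. The archetype names, verdicts, and evidence strings are made up for illustration, not real VaporLens output:

```python
# Hypothetical shape of an archetype result -- values invented for illustration.
archetypes = [
    {"archetype": "min-max theorycrafter", "fit": "strong",
     "evidence": "build-guide and damage-formula chatter"},
    {"archetype": "casual story player", "fit": "weak",
     "evidence": "complaints about grind before the plot opens up"},
]

def best_fit(results):
    """Pick the archetypes a game clearly serves well."""
    return [r["archetype"] for r in results if r["fit"] == "strong"]
```

A "mixed" aggregate score flattens those two rows into one number; keeping them separate is what lets the answer be "yes for you, no for your friend".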

The feature I tried to ship, but got absolutely body-slammed by Baldur's Gate 3 horny fans

I also started prototyping a community toxicity scan.

This worked exactly like the other feature scans. Same basic scan pipeline, just a new vocabulary + audit prompt focused on community behavior.

Then I tested it on my default set of test games (stuff like Warframe, BG3, Hades, etc.).

And yes, horny Baldur's Gate 3 fans broke my approach.

The scan kept mixing genuine toxicity signals with giant volumes of "kinky in-game chaos" chatter, which is not the same thing at all. So this is mostly a vocabulary/context disambiguation problem for that domain, not a core pipeline problem.
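One way to frame that disambiguation problem: a toxicity keyword should only count when it targets people in the community, not in-game content. This is a toy sketch of that idea with invented word lists, not the actual fix:

```python
# Assumed word lists for illustration -- a real version would be
# multilingual and phrase-aware, not bag-of-words.
COMMUNITY_TARGETS = {"players", "community", "devs", "forums", "chat", "people"}
INGAME_MARKERS = {"quest", "romance", "companion", "boss", "build", "scene"}

def is_community_signal(sentence):
    """Count a hit only if it mentions the community and not in-game content."""
    words = set(sentence.lower().split())
    return bool(words & COMMUNITY_TARGETS) and not (words & INGAME_MARKERS)
```

Even this crude version would let "the chat is full of hostile people" through while filtering out "the romance scene gets chaotic", which is roughly the BG3 failure mode described above.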

Toxicity scan is postponed for now. It's still coming, but I'd rather delay than ship a dumb classifier that labels thirsty roleplay threads as "community abuse."
