
I tried to teach VaporLens to answer the hardest Steam question: "Will I like this?"

This post was originally published on the VaporLens Patreon.

Hey folks!

Last update shipped the new extraction layer (price, playtime, time-to-fun, archetypes), and I wanted to do a proper deep dive into how this thing actually works under the hood.

Short version: I didn't build "one more score". I built a mini pipeline that turns chaotic Steam review text into structured signals you can actually use.

Long version below.

The Problem: Scores are cute, decisions are hard

A single 0-100 number is useful for scanning. But when you're deciding whether to buy a game, you usually want answers like:

So I needed to move from "is this generally good/bad" to "extract concrete player-facing facts".

That sounds easy until you remember the input is 100k+ multilingual reviews containing memes, sarcasm, build guides, and 17 ways to write "this game is mid".

Step 1: Build a multilingual "sentence miner" first

Before I can ask an LLM anything, I need to find the few hundred lines that actually talk about the target topic.

So the extraction flow starts with a keyword miner that:

This stage is cheap and fast, and it turns the entire Steam review firehose into a topic-specific candidate set.
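To make the idea concrete, here's a minimal sketch of what a keyword-based sentence miner like this could look like. Everything here is an assumption for illustration: the function name, the tiny keyword lists (the real miner is multilingual and far larger), and the naive sentence splitting.

```python
import re

# Hypothetical topic vocabularies -- illustrative only, not the real
# VaporLens keyword lists (those are much larger and multilingual).
TOPIC_KEYWORDS = {
    "price": [r"\bworth (it|the money)\b", r"\bprice\b", r"\bsale\b", r"\brefund"],
    "playtime": [r"\bhours?\b", r"\bplaytime\b", r"\bhrs\b"],
}

def mine_sentences(reviews, topic, max_candidates=500):
    """Cheap first pass: keep only sentences that mention the target topic."""
    patterns = [re.compile(p, re.IGNORECASE) for p in TOPIC_KEYWORDS[topic]]
    candidates = []
    for review in reviews:
        # Naive sentence split; good enough for a coarse pre-filter.
        for sentence in re.split(r"(?<=[.!?])\s+|\n+", review):
            if any(p.search(sentence) for p in patterns):
                candidates.append(sentence.strip())
                if len(candidates) >= max_candidates:
                    return candidates
    return candidates
```

The point is that this pass never needs a model: a pile of regexes over split sentences is enough to shrink 100k reviews down to a few hundred candidate lines per topic.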

Step 2: Run the same cleanup gauntlet everywhere

The key change in this release is that scans and extractions now share the same review cleanup layer.

That means before anything hits the model, I run:

The practical effect: less garbage in, less weird stuff out.
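The kind of cleanup pass described above might look roughly like this. The specific steps here (BBCode stripping, whitespace collapsing, length filtering, exact-duplicate removal) are my assumed examples of a "cleanup gauntlet", not the exact production list:

```python
import hashlib
import re

def clean_reviews(reviews):
    """Illustrative cleanup pass: strip Steam BBCode markup, collapse
    whitespace, and drop near-empty or copy-pasted duplicate reviews
    before anything reaches the model."""
    seen = set()
    cleaned = []
    for text in reviews:
        text = re.sub(r"\[/?[a-z]+.*?\]", " ", text)   # Steam BBCode like [h1], [spoiler]
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) < 10:                             # near-empty / meme one-liners
            continue
        fingerprint = hashlib.md5(text.lower().encode()).hexdigest()
        if fingerprint in seen:                        # exact duplicate (copypasta)
            continue
        seen.add(fingerprint)
        cleaned.append(text)
    return cleaned
```

Because scans and extractions now share this layer, a fix here (say, a new copypasta pattern) improves every downstream feature at once.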

Step 3: Force structure with strict schemas (not vibes)

Each extraction has its own schema and hard constraints:

Important detail: if there isn't enough evidence, extraction returns null instead of making something up.

So "I don't know" is a valid output. That is a feature, not a bug.
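Here's a sketch of that "null over guessing" contract. The field names, thresholds, and the time-to-fun example are all invented for illustration; the real schemas and constraints will differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimeToFunResult:
    """Assumed shape of one extraction -- field names are illustrative."""
    hours: float
    evidence_count: int

def validate_extraction(raw: dict, min_evidence: int = 3) -> Optional[TimeToFunResult]:
    """Hard constraints: well-typed fields and a minimum evidence count,
    otherwise return None ("I don't know") instead of a guess."""
    try:
        result = TimeToFunResult(
            hours=float(raw["hours"]),
            evidence_count=int(raw["evidence_count"]),
        )
    except (KeyError, TypeError, ValueError):
        return None                      # malformed model output -> no answer
    if result.evidence_count < min_evidence or not (0 < result.hours < 1000):
        return None                      # too little evidence or implausible value
    return result
```

The useful property: every downstream consumer handles exactly two cases, a well-typed result or an honest null, and nothing in between.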

Why archetypes were the fun one

Archetypes are my favorite in this batch because they answer a question review scores never can:

"Who is this actually for?"

Instead of "mixed reviews," you now get something closer to:

That's way more actionable than a single aggregate sentiment number.
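To show what I mean by actionable, here's a hypothetical sketch of an archetype result. The archetype names, verdicts, and evidence strings are made up for illustration, not real VaporLens output:

```python
# Hypothetical shape of an archetype result -- values invented for illustration.
archetypes = [
    {"archetype": "min-max theorycrafter", "fit": "strong",
     "evidence": "build-guide and damage-formula chatter"},
    {"archetype": "casual story player", "fit": "weak",
     "evidence": "complaints about grind before the plot opens up"},
]

def best_fit(results):
    """Pick the archetypes a game clearly serves well."""
    return [r["archetype"] for r in results if r["fit"] == "strong"]
```

A "mixed" aggregate score flattens those two rows into one number; keeping them separate is what lets the answer be "yes for you, no for your friend".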

The feature I tried to ship, but got absolutely body-slammed by Baldur's Gate 3 horny fans

I also started prototyping a community toxicity scan.

This worked exactly like the other feature scans. Same basic scan pipeline, just a new vocabulary + audit prompt focused on community behavior.

Then I tested it on my default set of test games (stuff like Warframe, BG3, Hades, etc.).

And yes, horny Baldur's Gate 3 fans broke my approach.

The scan kept mixing genuine toxicity signals with giant volumes of "kinky in-game chaos" chatter, which is not the same thing at all. So this is mostly a vocabulary/context disambiguation problem for that domain, not a core pipeline problem.
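One way to frame that disambiguation problem: a toxicity keyword should only count when it targets people in the community, not in-game content. This is a toy sketch of that idea with invented word lists, not the actual fix:

```python
# Assumed word lists for illustration -- a real version would be
# multilingual and phrase-aware, not bag-of-words.
COMMUNITY_TARGETS = {"players", "community", "devs", "forums", "chat", "people"}
INGAME_MARKERS = {"quest", "romance", "companion", "boss", "build", "scene"}

def is_community_signal(sentence):
    """Count a hit only if it mentions the community and not in-game content."""
    words = set(sentence.lower().split())
    return bool(words & COMMUNITY_TARGETS) and not (words & INGAME_MARKERS)
```

Even this crude version would let "the chat is full of hostile people" through while filtering out "the romance scene gets chaotic", which is roughly the BG3 failure mode described above.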

Toxicity scan is postponed for now. It's still coming, but I'd rather delay than ship a dumb classifier that labels thirsty roleplay threads as "community abuse."
