
How I Broke (and Fixed) Game Discovery on Vaporlens: A Devlog

This post was originally published on the Vaporlens Patreon.

As you might've noticed, I recently upgraded the recommendation engine for Vaporlens. Until now, it relied on standard 768-dimension embeddings from Google's text-embedding-004 model to calculate game similarity.

The old logic was simple: I took five categories from each game's review summary (Positive Points, Negative Points, Gameplay, Recommendations, and Misc; omitting Performance, as it's not really relevant for similarity), generated an embedding for each of them, and then concatenated them into one massive vector with 768 × 5 = 3840 dimensions.
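In simplified form, it was basically this (a rough sketch, not the actual Vaporlens code: embed() stands in for the call to text-embedding-004, and the category keys are illustrative names):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for the actual call to text-embedding-004 (768 dims per chunk)."""
    raise NotImplementedError

# Illustrative category keys for the five summary sections.
CATEGORIES = ["positive_points", "negative_points", "gameplay", "recommendations", "misc"]

def game_vector(summary: dict[str, str]) -> np.ndarray:
    """Embed each category separately, then concatenate: 5 x 768 = 3840 dims."""
    parts = [embed(summary[category]) for category in CATEGORIES]
    return np.concatenate(parts)  # shape: (3840,)
```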

It worked well enough, but Google decided to deprecate their model, so I needed to change something. At first, I played around with local (ONNX) embedding models to see how far those could take me. For their size, the results were honestly pretty solid, but not quite as good as I wanted them to be.

So I decided to switch to Mistral embeddings. They are 1024-dimensional, offer a much larger context window, and generally "understand" nuance better than older models.

But there was a catch: 1024 × 5 = 5120 dimensions. If I stuck to my old concatenation strategy, my vectors would blow past the limits of pgvector indexes (HNSW caps out at 2,000 dimensions for regular vectors and 4,000 for halfvec). I literally couldn't index them efficiently.

Here is how I tried to fix it, how I broke it, and how I finally got it right.

Attempt 1: The "Let the LLM Handle It" Approach

My first thought was: Why am I still chopping this up manually?

Mistral has a huge context window. Instead of generating 5 separate vectors, I could just format the entire game summary into one structured Markdown document and generate a single 1024-dimension embedding.

I wrote code that produced a Markdown document with headers plus content text (e.g. "Positives: [point] - [point description], ...").

The logic seemed sound. The vector would be smaller (1024 dims), fit comfortably within pgvector's index limits, and the model would theoretically understand the relationship between "Good Graphics" and "Bad UI" holistically.
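Boiled down, attempt 1 looked something like this (again a sketch, reusing a stand-in embed() that now represents a single 1024-dim Mistral embedding call; section titles match the headers I actually generated):

```python
def summary_to_markdown(summary: dict[str, str]) -> str:
    """Flatten the whole review summary into one Markdown document."""
    sections = {
        "Positive Aspects": summary["positive_points"],
        "Negative Aspects": summary["negative_points"],
        "Gameplay": summary["gameplay"],
        "Recommendations": summary["recommendations"],
        "Misc": summary["misc"],
    }
    return "\n\n".join(f"## {title}\n{text}" for title, text in sections.items())

# One call, one 1024-dim vector for the whole game:
# vector = embed(summary_to_markdown(summary))
```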

I deployed it. The vectors were generated. I checked games I've played to see if the new similarity results were any good.

The results were... weird.

Games that had nothing to do with each other were suddenly 95% similar. ARC Raiders was matching with Rocket League (I could see where it was coming from: both are online, session-based multiplayer games with complaints about toxicity, but the gameplay isn't even close), and Elden Ring was matching with distinctly non-Soulslike indie games. The recommendations felt off.

The Silent Killer: Common Header Noise

It turns out, I had introduced a massive amount of noise into the dataset.

By structuring every single game with the exact same headers ("## Positive Aspects", "## Negative Aspects", etc.), I was forcing the model to embed those specific tokens over and over again.

When you compare Game A and Game B, the model sees:

Game A: "## Positive Aspects... ## Negative Aspects..."

Game B: "## Positive Aspects... ## Negative Aspects..."

The embeddings were clustering based on the structure of my prompt, not the content of the game. I had inadvertently trained the Vaporlens similarity engine to find "text formatted like a list," which, unfortunately, described every single game in my database.
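You don't even need real embeddings to see the mechanism. With purely synthetic vectors, a big shared "structure" component plus small, unrelated "content" components already produces suspiciously high cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

shared_structure = rng.normal(size=1024) * 3.0  # identical headers/formatting, heavy weight
content_a = rng.normal(size=1024)               # game A's actual content
content_b = rng.normal(size=1024)               # game B's content, unrelated to A

doc_a = shared_structure + content_a
doc_b = shared_structure + content_b

print(cosine(content_a, content_b))  # ~0.0: the content has nothing in common
print(cosine(doc_a, doc_b))          # ~0.9: the shared structure dominates
```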

On top of that, there was "information smearing." A game with a 2,000-word "Story" section and a 50-word "Gameplay" section produced a vector dominated entirely by the story. I lost the ability to enforce that gameplay is always more important than everything else.

Attempt 2: Back to Math (The Fix)

I needed to go back to the original idea - treating categories as distinct signals - while keeping the vector size at 1024.

The solution was Weighted Averaging.

High-dimensional vectors are additive. If you take a vector representing "Good Combat" and average it with a vector representing "Bad Story," you get a vector that points to the region of space containing games with both those traits. More importantly, if you average five 1024-dim vectors, you still get a 1024-dim vector.

So, I rewrote the pipeline to:

  1. Isolate the signals: Take the text for each category ("Positive Points" and so on) - and only the text, no headers this time around.
  2. Batch embed: Send all 5 categories to Mistral in a single API call (which is also much faster than before).
  3. Weighted mean: Combine the 5 resulting vectors mathematically into a single 1024-dim vector.

This allows me to explicitly tell the engine: "Gameplay gets a 1.5x weight, so it counts more than everything else, and Recommendations get 0.8x, so they count a bit less."
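In simplified form, the new pipeline looks something like this (a sketch using the mistralai Python SDK; the 1.5 and 0.8 weights are the ones mentioned above, the 1.0s for the other categories are just for illustration, and the real service code differs):

```python
import os

import numpy as np
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Per-category weights: gameplay matters most, recommendations a bit less.
WEIGHTS = {
    "positive_points": 1.0,
    "negative_points": 1.0,
    "gameplay": 1.5,
    "recommendations": 0.8,
    "misc": 1.0,
}

def game_vector(summary: dict[str, str]) -> np.ndarray:
    categories = list(WEIGHTS)
    texts = [summary[cat] for cat in categories]  # plain text, no headers

    # One batched API call covers all five categories.
    response = client.embeddings.create(model="mistral-embed", inputs=texts)
    vectors = np.array([item.embedding for item in response.data])  # shape: (5, 1024)

    # Weighted mean: still a 1024-dim vector.
    weights = np.array([WEIGHTS[cat] for cat in categories])
    combined = (weights[:, None] * vectors).sum(axis=0) / weights.sum()

    # L2-normalize so cosine similarity behaves consistently downstream.
    return combined / np.linalg.norm(combined)
```

The normalized 1024-dim result drops straight into a regular pgvector column with an HNSW index, which was the whole point.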

The Result

The "odd" similarity results vanished immediately.

So, it turns out the "smarter" way (shoving everything into a massive context window) isn't always better. Sometimes you just need a little bit of linear algebra.

And now, back to work on that most-upvoted MTX index feature (expect a write-up about it as well, because the "smarter" way is too damn expensive) 👀
