Color memory test: what your score actually says about how you remember color

Search for “color memory test” and the first page mixes three completely different things. A grid of coloured tiles you flip over in pairs. A row of hue chips you have to drag into order. And the kind of test we run here, where a colour flashes for a few seconds and you have to recreate it on a slider. All three get called a color memory test, but they measure different abilities, and a strong score on one says almost nothing about the others.

This piece sorts the three apart: what each one actually tests, what counts as a good score on each, and where the version we built fits in the picture.

The three things people mean by “color memory test”

Tile-match memory. The familiar concentration grid you grew up with — pairs of coloured tiles face-down, you flip two at a time and try to remember where each colour is. The colour itself is almost irrelevant. What you are training is spatial working memory: where things are, not what shade they are. The colour just gives each pair a unique label that is faster to recognise than a number or a letter would be.

Hue ordering. The clinical version. The Farnsworth-Munsell 100 Hue test, used since 1943, gives you a row of small colour chips that step gradually around the hue circle. You drag them into the correct order. This is a test of discrimination, not memory — the chips stay on screen while you sort them. The score (the “total error”) tells you which parts of the spectrum your eyes are weakest at, and it is sensitive enough that opticians and colourists use it professionally. It cannot tell you anything about memory because nothing is ever hidden from view.

Exact-colour reproduction. A colour appears, then disappears, and you have to dial it back in. This is the test format used in cognitive-psychology studies of visual working memory, and it is what our Solo game and Daily challenge measure. The colour is shown for a fixed window (four seconds in our default), then the slider screen replaces it, and your guess is compared to the original with a perceptual distance formula. This is the only one of the three that actually tests colour memory in the psychological sense of the word.

What “exact-colour reproduction” actually measures

Visual working memory has a small budget — three to four items at a time, decaying within seconds — and the contents are not a photograph. They are a compressed sketch. When you try to recreate the colour you saw, you are not pulling up a raw pixel value. You are reaching for a label-plus-gist (“a fairly bright teal, leaning blue”) and decoding that label back into a slider position.

The reason this format makes a sharper test than the other two is that it forces you to commit to a specific colour rather than choose between a small set of visible options. There is no recognition shortcut. If you are off by twenty units of hue, your guess will show it, and the scoring formula will price it accordingly. We dug into the cognitive machinery behind this in more detail elsewhere, but the short version is this: the format is deliberately uncomfortable, because the discomfort is the test.

How the score is computed, and what numbers to expect

Most casual colour games score guesses with RGB distance, which looks objective but gets the perceptual maths wrong: the same numerical gap can feel much larger in a green region than in a blue one. The industry-standard replacement is CIEDE2000, a formula published by the International Commission on Illumination in 2001 and built so that a distance of n in greens feels like a distance of n in blues. We use CIEDE2000 to measure the colour difference, then convert that distance into a 0–10 score for each round.
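For readers who want to see the difference between the two distances, here is a minimal Python sketch of the published CIEDE2000 formula, following the implementation notes in Sharma et al. (2005). It is not our production scoring code. The sRGB-to-Lab conversion (D65 white point), the unit weighting factors, and the function names `srgb_to_lab`, `ciede2000` and `rgb_distance` are all assumptions made for illustration.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB, assuming a D65 white point."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> XYZ, each axis normalised by the D65 reference white.
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = (0.2126729 * r + 0.7151522 * g + 0.0721750 * b) / 1.00000
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883

    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def ciede2000(lab1, lab2):
    """CIEDE2000 colour difference with weighting factors kL = kC = kH = 1."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25 ** 7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    d_l = L2 - L1
    d_c = c2p - c1p
    achromatic = c1p * c2p == 0
    d_h_angle = 0.0 if achromatic else h2p - h1p
    if d_h_angle > 180:
        d_h_angle -= 360
    elif d_h_angle < -180:
        d_h_angle += 360
    d_h = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_angle / 2))

    l_bar = (L1 + L2) / 2
    cp_bar = (c1p + c2p) / 2
    if achromatic:
        hp_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hp_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hp_bar = (h1p + h2p + 360) / 2
    else:
        hp_bar = (h1p + h2p - 360) / 2

    t = (1
         - 0.17 * math.cos(math.radians(hp_bar - 30))
         + 0.24 * math.cos(math.radians(2 * hp_bar))
         + 0.32 * math.cos(math.radians(3 * hp_bar + 6))
         - 0.20 * math.cos(math.radians(4 * hp_bar - 63)))
    d_theta = 30 * math.exp(-(((hp_bar - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(cp_bar ** 7 / (cp_bar ** 7 + 25 ** 7))
    s_l = 1 + 0.015 * (l_bar - 50) ** 2 / math.sqrt(20 + (l_bar - 50) ** 2)
    s_c = 1 + 0.045 * cp_bar
    s_h = 1 + 0.015 * cp_bar * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c

    return math.sqrt((d_l / s_l) ** 2 + (d_c / s_c) ** 2 + (d_h / s_h) ** 2
                     + r_t * (d_c / s_c) * (d_h / s_h))


def rgb_distance(rgb1, rgb2):
    """Plain Euclidean distance in 8-bit RGB, the naive metric."""
    return math.dist(rgb1, rgb2)


if __name__ == "__main__":
    # Two pairs with the same RGB distance (30) but different perceptual distance.
    for target, guess in [((0, 200, 0), (0, 230, 0)), ((0, 0, 200), (0, 0, 230))]:
        delta_e = ciede2000(srgb_to_lab(target), srgb_to_lab(guess))
        print(target, guess, rgb_distance(target, guess), round(delta_e, 2))
```

The first test pair in Sharma et al.'s supplementary data, Lab (50, 2.6772, -79.7751) against Lab (50, 0, -82.7485), should come out at about 2.0425, which is a quick way to check any implementation.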

Some rough benchmarks for what a CIEDE2000 distance feels like to a human observer:

ΔE ≈ 1. A trained observer can just barely tell the two colours apart side-by-side. This is the perceptual just-noticeable-difference threshold most graphics references quote. From memory, after a four-second delay, almost no one is ever this close.

ΔE ≈ 2–3. The two colours are clearly different in a direct comparison but feel like the same colour family. On our scoring this comes out as a 9.6–9.8 per round and is what very strong players hit on their best rounds.

ΔE ≈ 5. A confident, attentive guess after a four-second flash. The colour is the right family, the saturation is in the right region, and the brightness is roughly right. A little over 9 per round; the curve is deliberately flat near the top, so the last few tenths are the hard-won ones.

ΔE ≈ 10. Right hue family, wrong saturation or brightness. Around 8 per round, and the most common pattern for new players who have not yet learned to commit to a label during the flash.

ΔE ≈ 20+. Wrong hue family. The colour name you gave it during the flash was off, and from there the slider could not recover. Below 6 per round, falling fast as the distance grows.
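To make the shape of that curve concrete, the benchmarks above can be joined into a rough piecewise-linear mapping from ΔE to a per-round score. This is only an illustrative sketch, not the curve our game actually uses: the anchor values approximate the figures quoted above, and the endpoints (a perfect 10 at ΔE 0, a score of 0 at ΔE 50), the exact 9.1 and 6.0 values, and the names `ANCHORS` and `round_score` are assumptions.

```python
from bisect import bisect_right

# (delta_e, score) anchors approximating the benchmarks quoted in the text.
# The first and last anchors, and the exact 9.1 and 6.0 values, are assumed.
ANCHORS = [
    (0.0, 10.0),
    (2.5, 9.7),
    (5.0, 9.1),
    (10.0, 8.0),
    (20.0, 6.0),
    (50.0, 0.0),
]


def round_score(delta_e):
    """Map a CIEDE2000 distance to a 0-10 score by linear interpolation."""
    if delta_e <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if delta_e >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    xs = [x for x, _ in ANCHORS]
    i = bisect_right(xs, delta_e)  # index of the first anchor beyond delta_e
    (x0, y0), (x1, y1) = ANCHORS[i - 1], ANCHORS[i]
    return y0 + (y1 - y0) * (delta_e - x0) / (x1 - x0)


if __name__ == "__main__":
    for delta_e in (1, 2.5, 5, 10, 20, 35):
        print(delta_e, round(round_score(delta_e), 2))
```

A real scoring curve would more likely be a smooth function than a set of straight segments, but the interpolation is enough to see how slowly the score drops near the top and how much faster it falls once the hue family is wrong.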

What a “good score” looks like across the three tests

Because the three test types measure different abilities, “a good score” is a different number on each:

Tile-match. Good performance is measured in moves or time. Clearing a grid of twelve pairs in under about twenty-five moves is a strong run, and scores improve quickly with practice: it is a working-memory test, and the working-memory store is genuinely trainable up to its biological ceiling.

Farnsworth-Munsell 100 Hue. Total error scores below 16 are classified as superior discrimination, 16–100 as average, and 100+ as low discrimination. Most untrained observers fall in the 40–80 range. Designers, photographers, and dyers who handle colour daily tend to land under 30 (Farnsworth, 1943). This is largely about your eyes, not your practice.
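As a rough illustration of how a total error score comes about, the sketch below uses the commonly described scoring rule: each cap scores the sum of the absolute differences between its number and its two neighbours, minus 2, so a perfectly ordered row totals zero. It is simplified to a single row with fixed end caps (the real test uses four trays), the function names are hypothetical, and treating exactly 100 as "average" is an assumption, since the brackets above leave that boundary open. It is not a clinical tool.

```python
def cap_errors(order):
    """Per-cap error for one row; order includes the two fixed end caps."""
    errors = []
    for i in range(1, len(order) - 1):
        cap_score = abs(order[i] - order[i - 1]) + abs(order[i] - order[i + 1])
        errors.append(cap_score - 2)  # a correctly placed cap scores 2, so error 0
    return errors


def classify_total_error(total):
    """Apply the brackets quoted above (boundary at 100 is an assumption)."""
    if total < 16:
        return "superior"
    if total <= 100:
        return "average"
    return "low"


if __name__ == "__main__":
    perfect = [0, 1, 2, 3, 4, 5]
    one_swap = [0, 1, 3, 2, 4, 5]  # caps 2 and 3 transposed
    for row in (perfect, one_swap):
        total = sum(cap_errors(row))
        print(row, total, classify_total_error(total))
```

Under this rule a single swap of two adjacent caps costs 4 points, which gives a feel for how many small misplacements it takes to leave the superior bracket.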

Exact-colour reproduction. On our 0–10 per-round scale, an average run lands around 5–6 per round for new players. Experienced players settle at 7–8. The top of the global Daily leaderboard sits consistently above 8.5 per round. Nobody scores 10s in any volume, because that would require sub-JND recall from memory, which is biologically out of reach.

Why the three tests don’t correlate as much as you’d expect

A strong tile-match player is good at spatial working memory and at recognising paired colour labels — neither of which requires precise colour perception. A strong Farnsworth-Munsell scorer has excellent retinal hue discrimination but does not necessarily have good working memory; the chips stay visible throughout. A strong exact-reproduction scorer is good at building a verbal gist of a colour quickly, at retaining that gist for a few seconds, and at decoding it back into slider values — which is partly perception, partly memory, and partly translation between the two.

The result is that someone who tops the leaderboard on one of the three formats can be mediocre at the other two. The closest thing to a single “colour memory ability” that survives across all three is verbal colour vocabulary. Knowing the difference between teal and seafoam, between magenta and fuchsia, between burnt orange and rust, helps every one of them: it gives you faster, more discriminating labels, and labels are how colour survives the trip into and out of memory.

What you can train, and what you can’t

Retinal hue discrimination, the thing Farnsworth-Munsell measures, is mostly fixed by your cone biology, with some marginal improvement from prolonged exposure to a wide colour palette. Spatial working memory has a fixed ceiling, but you reach that ceiling quickly with practice. The thing that genuinely trains is the third one: building a richer colour vocabulary and getting faster at attaching a confident label to a colour during the brief window you have to look at it.

In practice, this is what improves over a few weeks of daily play. The hue calls become more reliable, the saturation estimates become less drifty, and the gap between “the colour you saw” and “the colour you reproduced” narrows. That gap, closing, is the entire skill the format trains.

Where to take each test

For tile-match: any free online concentration game does the job. It is the least interesting of the three and the most widely available.

For Farnsworth-Munsell: the official test is paywalled, but several research-grade web reproductions exist and are useful for a rough screen. Treat low scores as suggestive, not diagnostic — a proper optician’s setup uses calibrated chips under controlled lighting.

For exact-colour reproduction: this is the one we built. The Daily challenge gives you five rounds with the same target colours every other player sees that day, so scores are directly comparable. Solo lets you run unlimited rounds for practice, and the Memory Stack variant pushes the working-memory limit by chaining four colours into one slider screen.

References

Farnsworth, D. (1943). The Farnsworth-Munsell 100-Hue and Dichotomous Tests for Color Vision. Journal of the Optical Society of America, 33(10), 568–578. The original publication describing the hue ordering test and its scoring brackets, still the basis for the modern clinical version.
Sharma, G., Wu, W., & Dalal, E. N. (2005). The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Research & Application, 30(1), 21–30. The reference implementation of the CIEDE2000 formula used in the scoring of the exact-reproduction format.
Bae, G.-Y., Olkkonen, M., Allred, S. R., & Flombaum, J. I. (2015). Why some colors appear more memorable than others: A model combining categories and particulars in color working memory. Journal of Experimental Psychology: General, 144(4), 744–763. Source for the verbal-label gist mechanism that dominates exact-colour recall performance.