"Good Article. 4 out of 5 stars" or What is wrong with the star rating system


Imagine you want to buy a new vacuum cleaner. You go to Amazon, search for “vacuum cleaner”, set some filters such as brand, price, color, and shape, and you get… tens, if not hundreds, of results. How do you pick the one that will satisfy you the most?

Of course, the good old “star rating” system. It’s everywhere. TripAdvisor has it for restaurants and attractions; IMDB has it for movies; Amazon for consumer products; Goodreads for books. It’s the baseline by which we can compare one product to another.

But the star rating system is not about the product; it’s more about the people.

Let’s start with a question

Is product A with 2/5 stars always worse than product B with 5/5 stars (assuming products A and B belong to the same product category, for the sake of the science behind my words)?

Well, the answer, as always, is: it depends. If the same people who rated product A with 2 stars rated product B with 5 stars, then yes, we can assume that product B is better than A *to them*.

But very few of us have the hobby of purchasing two different products just for the sake of finding the best one. Yes, there are people who get paid to do this, and they usually write articles that can be summarized in the following pseudo-quote:

Here is product A with its cons and pros. Here is product B with its cons and pros. Which one should you buy? Well that depends..

So most of the time we find ourselves comparing product A, rated 2/5 by 3629 people, with product B, rated 4/5 by 79 people. And while we can read 3–5 short reviews about each product, it is almost impossible to read tens, hundreds or even thousands of reviews about each product we are comparing. This is why the star rating system was invented. We ain’t got no time to read those reviews. 2/5 versus 4/5? 4/5 is clearly better. Done.

Take up my money

But.. Wait! I’ve got a story to tell.

Every idea starts with a personal experience (or just pops up randomly while you are showering), and the idea of how broken the star rating system is came to me a long time ago, when I was shopping for a beard trimmer. I had my budget set, I had my needs outlined, and I went to the battleground in the lands of the internet reviewers. I read reviews, watched reviews, asked friends for opinions and eventually found the one!

So I went to my local Amazon-like website to see where I could buy that awesome beard trimmer, the best of the best in its class based on my needs and budget, and I was shocked to see that it had only 3/5 stars!

I was mentally broken. “How come?!” I wondered. “Why?!” I asked myself. “How is it possible that after hours of research, the best I can get is only 3/5 stars?!” So I read the reviews; there weren’t too many of them. I concentrated on the negative ones: I already knew that my product was the best of the best, so I wanted to know why some people found it bad. And there was one guy who broke the concept of the star rating system for me. His review went like this (I’m paraphrasing, it was a long time ago):

Very good trimmer. Trims the beard nicely, battery time is long enough and the provided stand is good. Unfortunately can not be used as hair cutting machine. 2/5

“Can not be used as hair cutting machine”, says a random guy on the internet about a beard trimmer.

This was the most negative review; the others were good. I was baffled for a few minutes. Then I went to the store, and years later I still have this beard trimmer, which still works perfectly.

The star rating system is broken

Humans like visual things. This is why we prefer PowerPoint presentations with pie charts and graphs to boring talks with numbers. And this is why we prefer 5/5 stars to reading 3681 reviews. And statistically, the more people rate a product highly, the better the chances that we will be satisfied with it. But statistics works well on objective data, e.g. “If you move at 10 km/h you will arrive in one hour, so in order to arrive in 30 minutes you need to move at 20 km/h”, and works very badly on subjective data: “This restaurant sucks. They have only spicy food and I’m allergic to spicy food. 1/5”.

Of course such reviews are the minority, and we can treat them as measurement errors because they are subjective and influenced by the human factor: personal preferences, emotional state at the time of the review, high expectations of the product prior to purchasing it, and so on. And the more people review the product, the less important those measurement-error reviews become, because statistics! But this brings me to the first problem.
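To put rough numbers on how quickly a single “measurement error” review stops mattering, here is a small Python sketch. The ratings are made up for illustration: every reviewer gives 5 stars except one outlier who gives 1 star.

```python
def mean_rating(ratings):
    """Plain arithmetic mean of a list of star ratings."""
    return sum(ratings) / len(ratings)


def rating_with_one_outlier(total_reviews, typical=5, outlier=1):
    """Mean rating when everyone gives `typical` stars except one reviewer."""
    ratings = [typical] * (total_reviews - 1) + [outlier]
    return mean_rating(ratings)


# Hypothetical review counts, chosen only to show the trend.
for n in (5, 50, 500, 5000):
    print(f"{n:>5} reviews -> {rating_with_one_outlier(n):.3f} stars")
```

With five reviews, the one grumpy reviewer drags the average down to 4.2. With 500 reviews, the same 1-star review leaves the average above 4.99.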

1. Most reviews are subjective and simple

For example: “This vacuum cleaner is good, because it cleans the vacuum. Is it the best? No idea, this is the only one I’ve owned”. There is no objective data in this review. I don’t know the size of your apartment, how dirty it is, how frequently you use your vacuum cleaner, and so on. It is not possible to choose between two vacuum cleaners that way, because chances are low that the same group who reviewed the first one also reviewed the second one, which renders the reviews useless for comparison.

But honestly, you don’t have to make that decision. Chances are, if both of them are rated 5/5 by hundreds of people, and you’ve outlined your needs correctly, you will be satisfied with either of them. That is, assuming you are able to avoid the afterthought of “did I choose the right one?”, which brings me to the second problem.

2. “Let’s go watch The Movie!” “But it only got 5.4 on IMDB”

The star rating system ruins our experience. Ever checked the rating of a movie on IMDB prior to watching it? Or the rating of a book on Goodreads prior to reading it? If it’s low, you won’t like it, no matter how good it is. If the rating is high, you will come with high expectations and most likely will be disappointed, because human anticipation is always bigger than reality. Which in turn brings me to the third problem.

3. What does 5 stars mean?

When I tell you that this restaurant is 1/5 stars, what do you imagine? That it’s dirty, the waiter takes hours to bring you your food, and when your food arrives it’s cold and stale, and by the time you finish eating it you want to puke, and on top of all that, they make a mistake and charge your credit card 3 times with a 25% tip even though you did not want to leave a tip at all?!

And what if I tell you it’s 5/5 stars? You imagine that the food is fresh, the waiter brings you your order even before you’ve finished thinking about what you want to order, and they don’t charge you any money at all?

We like to exaggerate our expectations. If it’s bad, it must be the absolute worst, and if it’s good, it must be the absolute best. There is no in between. The in between is reserved for 2, 3 and 4 stars. 1 is the absolute worst, 5 is the absolute best. But is it?

For me, as a reviewer, it is very hard to give either 1 star or 5 stars on a 1-to-5 star scale to a product / place / item, because I always have this question of “1 star compared to what? Am I sure that this is the absolute worst vacuum cleaner I’ve ever tried? And what happens if the next one is worse than this one? Should I just give this one 2 stars and reserve 1 star for the absolute worst?”. Which in turn brings me to the fourth problem.

4. Reviews are damn hard

Reviewing things is a hard process. If you don’t do it professionally, it’s hard to avoid subjectivity in the review. After all, you’ve spent your time and your money, so making a bad decision about a particular product is painful and sometimes even expensive.

Most reviews are written shortly after the purchase date, and in that time you either feel good about your fresh new product and unconsciously deny any negative facts, or, the opposite, you are so dissatisfied with your product that you keep searching for the bad things in it and just want to bash it all over the internet.

But even if you were satisfied with the product and left a positive review, and then it suddenly broke after a few months of careful use, you just throw it away and buy a new one. Very few of us go back to our original review to alter it: “Was good, but broke pretty fast”. So the review stays there, no longer providing the real picture. And star rating systems do not reflect that information. They only reflect the emotional state and satisfaction level of the individual person at the time of the review.

To sum up

If you don’t have access to high-volume stores like Amazon, where products can get thousands or tens of thousands of amateur and professional reviews, you are most likely exposed to those “measurement error reviews”, where four people are satisfied with their vacuum cleaner and the fifth person expected it to make him coffee every morning, so “1/5”, and the rating no longer provides the real picture. In addition, people tend to ignore good things and concentrate on the bad ones, so if you are satisfied with a product, you rarely go and review it. But if it was terrible, boy, you are going to bash it in every possible corner of the internet, because the company that made this piece of shit must be burnt.

The star rating system was introduced to help us, as consumers, make better choices, whether it is what movie to watch, what book to read, what restaurant to visit, which hotel to stay in or what vacuum cleaner to buy. But the truth is that star rating systems are based on the subjectivity of each individual towards the product, rather than on the objective qualities of the product itself.

And the subjectivity of each individual towards a specific product or service is not something that we can map onto a mathematical model, because it depends on so many things: emotional state, personal preferences towards things like shape / color / size / manufacturer / taste / smell, the usage environment (robot vacuum cleaners are probably bad for very small, dense apartments, for example), prior-to-purchase expectations, and so on.

So what can be done?

Honestly? Nothing. Like many things in our world, it’s not perfect, but it works most of the time. Chances are that a product with a rating of 3.8/5 from 3780 people is a better choice than a product with a rating of 5/5 from one person. There are exceptions; in the end it’s all statistics, and people have different needs and tastes.

One possible solution is what YouTube does. If you remember, back in the day, YouTube had ratings for its videos, then it switched to Like / Dislike buttons. It is still based on statistics, but less prone to measurement errors. However, you know what they say:
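One common way to put that intuition into numbers is a damped (Bayesian) average, which pulls a product’s rating toward a neutral prior until enough people have rated it. The sketch below is only an illustration, not what any particular store does; the prior mean of 3.0 stars and the prior weight of 10 “virtual” ratings are assumptions I picked for the example.

```python
def damped_rating(avg, count, prior_mean=3.0, prior_weight=10):
    """Average rating shrunk toward `prior_mean`.

    `prior_weight` acts like that many imaginary reviewers who all
    voted `prior_mean`. Both values are assumptions for this sketch.
    """
    return (prior_weight * prior_mean + count * avg) / (prior_weight + count)


popular = damped_rating(avg=3.8, count=3780)   # rated by thousands
newcomer = damped_rating(avg=5.0, count=1)     # rated by one person

print(f"3.8/5 from 3780 people -> {popular:.2f}")
print(f"5/5 from 1 person      -> {newcomer:.2f}")
```

Under these assumptions the popular product stays at about 3.80, while the single 5-star rating shrinks to about 3.18, so the product with thousands of ratings ranks higher.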

The greater the number of people who find an idea correct, the more the idea will seem to be correct to us.

- Influence: The Psychology of Persuasion, Robert Cialdini
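As for the statistics behind Like / Dislike buttons: a widely used way to rank items by such votes is the lower bound of the Wilson score interval, which accounts for how many votes there are and not just their ratio. The sketch below is a generic illustration and not a description of how YouTube ranks anything; the function name and the choice of z = 1.96 (roughly 95% confidence) are my assumptions.

```python
import math


def wilson_lower_bound(likes, dislikes, z=1.96):
    """Lower bound of the Wilson score interval for the share of likes.

    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    n = likes + dislikes
    if n == 0:
        return 0.0  # no votes yet, so no evidence either way
    p = likes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


print(wilson_lower_bound(1, 0))    # one like, nothing else
print(wilson_lower_bound(90, 10))  # 90% likes out of 100 votes
```

Sorting by this bound instead of by the raw share of likes means that an item with 1 like and 0 dislikes does not outrank one with 90 likes and 10 dislikes.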

Another possible solution would be to split the 5-star rating system into categories. Let’s continue with our vacuum cleaner example. Instead of rating an individual vacuum cleaner on a single scale of 1 to 5, we can rate it on a scale of 1 to 5 in different categories: how noisy it is, how light it is, how compact it is, how well it cleans, how portable it is, and so on. Each category is still exposed to the problems I’ve outlined, but since you split the eggs into different baskets, there is less room for measurement errors.

But the most important thing is to remember that sometimes it doesn’t really matter. If you are having difficulty deciding between a 4.5 and a 4.8 star product, and both have hundreds of reviewers, chances are you will be satisfied with either.
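A minimal sketch of what per-category ratings could look like in Python. The category names and the three reviews are hypothetical, and a reviewer is allowed to skip a category they have no opinion on.

```python
from collections import defaultdict

# Hypothetical reviews: each one rates some categories from 1 to 5.
reviews = [
    {"noise": 2, "weight": 4, "cleaning": 5},
    {"noise": 3, "weight": 5, "cleaning": 5},
    {"noise": 2, "cleaning": 4},  # this reviewer skipped "weight"
]


def category_averages(reviews):
    """Average each category separately, ignoring skipped categories."""
    scores = defaultdict(list)
    for review in reviews:
        for category, stars in review.items():
            scores[category].append(stars)
    return {category: sum(s) / len(s) for category, s in scores.items()}


for category, avg in category_averages(reviews).items():
    print(f"{category}: {avg:.1f}")
```

Here a buyer who cares mostly about cleaning can see that this category scores well even though the noise ratings are poor, which a single combined star rating would hide.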
