The above headline seems relatively straightforward: if I eat more fish, I will become smarter and sleep better.
However, headlines like this are missing two important words – ‘on average’.
When scientists do research, even when the research is done very well (e.g. randomly assigned participants, double-blind protocols, proper controls, etc.), they are still only looking at averages. They examine how people behaved without the treatment (in this case, those who did not eat fish) and compare it to how people behaved with the treatment (those who ate fish). Then, if the people who received the treatment, on average, became smarter and slept better, one can reasonably conclude that eating fish makes you smarter and helps you sleep better…on average.
The ‘on average’ is very important, as the above finding tells you nothing about any individual person eating fish. For example, if 40% of people did get smarter and sleep better, 50% of people showed no effect, and 10% of people actually got less smart and slept worse, the conclusion that eating fish makes you smarter and helps you sleep better (on average) would still be true (or at least supported by the evidence)[i].
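To make the arithmetic concrete, here is a minimal sketch of that 40/50/10 split. The effect sizes (plus or minus 5 points on some test score) are assumptions chosen for illustration, not numbers from any study.

```python
# Hypothetical breakdown of how people respond to eating fish.
# The shares come from the example above; the effect sizes are assumed.
groups = [
    (0.40, +5.0),  # 40% improve by 5 points
    (0.50, 0.0),   # 50% show no effect
    (0.10, -5.0),  # 10% get worse by 5 points
]


def average_effect(groups):
    """Weighted mean of the per-group effects."""
    return sum(share * effect for share, effect in groups)


def share_not_helped(groups):
    """Fraction of people whose individual effect is zero or negative."""
    return sum(share for share, effect in groups if effect <= 0)


print(f"average effect: {average_effect(groups):+.1f} points")
print(f"share of people not helped: {share_not_helped(groups):.0%}")
```

Under these assumed numbers the average effect is positive even though most people are not helped at all, which is exactly what the headline hides.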
You may think this is due to the finding/research I chose – and that it was correlational – but the same would be true of many findings: “New drug treats chronic pain”, “Going outside increases concentration in children”, “Women show higher agreeableness than men”.
All of these need to be qualified by the words ‘on average’. That is, a drug like Tylenol might actually increase pain in a small subset of people, but on average, it reduces pain.
However, it becomes even more problematic than the above fish example. While the new drug, for example, may suffer from the same issues as the fish example (most people get better, but some show no effect, and some may actually have a negative response), even if the new drug were 100% effective and everyone who took it improved by the exact same amount, it still wouldn’t tell you anything about an individual person. The person who took the drug, but was experiencing a lot of pain before, would still likely be experiencing more pain than someone who didn’t take the drug but was initially experiencing little to no pain. Similarly, going outside may help everyone’s concentration – but people who can naturally concentrate extremely well are still likely going to be better concentrators than most, even if they didn’t go outside. Lastly, women may be, on average, higher in agreeableness than men, but men who are extremely high in agreeableness are going to be more agreeable than most women.
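A tiny sketch of the drug case, under the assumption that pain is scored from 0 to 10 and that the drug lowers everyone's score by the same fixed amount. The scale, the relief size, and the two baseline scores are all made up for illustration.

```python
# Hypothetical: the drug works for everyone, by exactly the same amount.
UNIFORM_RELIEF = 2.0  # assumed drop in pain score for every treated person


def pain_after(baseline, treated):
    """Pain score after the study period; scores cannot go below zero."""
    if treated:
        return max(0.0, baseline - UNIFORM_RELIEF)
    return baseline


# Two hypothetical individuals with very different starting points.
severe_treated = pain_after(baseline=8.0, treated=True)
mild_untreated = pain_after(baseline=1.0, treated=False)

print(f"treated, severe baseline: {severe_treated}")
print(f"untreated, mild baseline: {mild_untreated}")
```

Even with a perfectly uniform effect, the treated person here is still in more pain than the untreated one, because where each person started matters more than the treatment.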
This last example, and ones like it, is often where people get heated and may even seek to dismiss the findings. One possible issue is that, unlike the drug or the going-outside example, being a man or a woman cannot be randomly assigned in a double-blind experiment with controls. However, this only becomes a problem when we want to know why there are differences (which is going to be Part 2 of this blog post). Does higher agreeableness in women reflect, even in part, some inherent genetic disposition, or is it exclusively the product of societal norms? These are good and important questions, but regardless of the answer, it doesn’t change the descriptive reality. As I’ll detail in Part 2, the answer also provides important information on what to do about it.
Nevertheless, the descriptive reality of statistics like this is just as valid as the descriptive reality of the pain drug or going-outside example. Similarly, it tells you nothing about an individual man or an individual woman – it only tells you about averages.
Further, comments like the one above are no different than saying that men are stronger than women, on average. Are all men stronger than all women? Of course not. There are relatively weak men and relatively strong women. If I needed to hire someone for a construction job that demanded a lot of physical labour and I only had two applicants, a very strong woman and a very weak man, it would of course be sexist and discriminatory of me to hire the very weak man (assuming that virtually the only real attribute of interest was physical strength and that they didn’t differ in other attributes). But it’s not unlikely that my construction site is mostly or even all men, given the fact that, on average, men are stronger than women.
The problem is that between-group differences are not the same as within-group differences (and are often much smaller). The difference between individuals on the two tails of a distribution almost always dwarfs the difference between the averages of the two groups.
The graph above explores the difference in height between a sample of men and women. Clearly, men are taller than women (on average). The graph suggests that men have an average height of around 69 inches (5′9″ or 175 cm) and women have an average height of approximately 64 inches (5′4″ or 162.5 cm) – an average difference of 5 inches. And yet men range (in this sample) from 57 inches (4′9″ or 145 cm) to 80 inches (6′8″ or 203 cm) – a difference of 23 inches (58 cm) from the shortest to the tallest, while women range from about 53 inches (4′5″ or 135 cm) to about 74 inches (6′2″ or 188 cm) – a difference of 21 inches (53 cm). Importantly, many of the women in this sample are taller than many of the men. The finding that men are taller than women (on average) tells you virtually nothing about any individual man or woman.
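One rough way to see how much the two groups overlap is to model each as a bell curve. The sketch below uses the two averages read off the graph (69 and 64 inches) and assumes a standard deviation of 3 inches for both groups; that spread is my assumption, not a value taken from the sample.

```python
from statistics import NormalDist

SD = 3.0  # assumed spread in inches; not read from the graph
men = NormalDist(mu=69, sigma=SD)
women = NormalDist(mu=64, sigma=SD)

# The height difference (woman minus man) for a random pair is itself
# normal, with the means subtracted and the variances added.
diff = NormalDist(
    mu=women.mean - men.mean,
    sigma=(women.variance + men.variance) ** 0.5,
)
p_woman_taller = 1 - diff.cdf(0)

# Shared area under the two curves (0 = no overlap, 1 = identical).
overlap = men.overlap(women)

print(f"P(random woman taller than random man): {p_woman_taller:.2f}")
print(f"overlap between the two distributions:  {overlap:.2f}")
```

Under these assumptions, a randomly chosen woman is taller than a randomly chosen man in roughly one pairing out of eight, despite the 5-inch gap in averages. A wider assumed spread would push that number higher.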
Further, I did not and do not want this to be a men vs. women argument/blog post. I use men and women because it is something that virtually everyone can identify with, but any individual differences can be used to the same effect.
For a less controversial example: Children born into rich families end up better off financially as adults than children born into poor families (on average). Do some poor children rise above and do very well for themselves? Of course! Do some children born into rich families end up relatively poor? Of course! However, the tendency for people to stay in their socioeconomic status (particularly in the United States, but in general, everywhere) is strong (and something I wrote about here).
For a more controversial example – black men are more likely to be incarcerated than white men (on average). Does that mean that all black men are criminals? Of course not! Does that mean that no white men are criminals? Of course not! Does this stat say anything of note about an individual or even about race in general? Of course not. Is the statistic true? Of course it is.
Ultimately, these descriptive statistics are important because they tell us something about our world. But, equally importantly, they don’t tell us why they exist – Why does the drug reduce pain? Why do children concentrate better after being outside? Why does fish make you smarter? Why are women more agreeable than men? Why do rich children do better financially than poor children? Lastly, they don’t tell us what to do about it and, perhaps most importantly, they don’t tell us anything about an individual person.
Despite this, people will hear these statistics and they will either deny that they are true (e.g. women and men are not different), or if accepted, they may assert their own inferred causation (e.g. differences between men and women are due solely to genetics or due solely to discrimination), they may misinterpret the statistics and categorize all individuals in the same box (e.g. all women are X and all men are Y) and/or then they may confidently assert what to do about it.
At this point, you may be thinking, ‘this is all obvious’ or ‘you needed a PhD to learn this?’ While it may be obvious, there are a few reasons to reiterate it, not to mention a few places where the obvious intuition breaks down.
First, while you may consciously be able to reason and agree with the above, our implicit (e.g. automatic, associative) processes can’t. We inherently categorize people (or things, or any concept). As such, our natural tendency is to automatically associate people into a single group or category. It’s what helps breed stereotyping, tribalism, and us vs. them thinking. It’s not until you realise you’re doing it, that you can take a step back and apply the above reasoning. Constantly reminding yourself of these ideas helps prevent these dangerous human tendencies.
Second, it is clear from our political landscape that people do this all the time. On the left, you have individuals (sometimes with PhDs themselves) arguing that someone deserves certain benefits or advantages simply because they come from a group that is typically less privileged than other groups. And while it’s true that, on average, some groups of people have a better start to life, more advantages, etc., it is not true of everyone in those groups. In other words, while it’s true that, on average, white people are more privileged than black people, you cannot make any strong predictions about one’s privilege based only on the color of their skin. Programs designed to target and assist the less privileged based on a specific ethnicity may succeed on average, but they are a blunt instrument, allowing some who are already advantaged to be unfairly helped, while leaving others who are more disadvantaged behind.
Similarly, on the right, you have people advocating and supporting policies based on group affiliation – such as the ‘Muslim ban’ – whereby people from specific countries were not permitted to come into the United States. It didn’t matter if you had been living there for the past 5 years and had recently gone home for a visit, or if you had been invited to fly in for a job interview – everyone from these countries was off-limits. Now, while it is likely the case that, on average, people from these countries were more likely to be terrorists than people from other countries, you cannot make any strong predictions about a person based only on where they were born.
One of those two examples might have ‘ruffled your feathers’ and you may begin to rationalize it with something like, ‘it’s better than having terrorists in our country’ or ‘it’s to help end/mitigate systemic oppression’. And while these utilitarian arguments may allow you to feel justified in stereotyping and discriminating based on a group, you still have to acknowledge and accept that it is stereotyping and discriminating. To put this in perspective, consider someone suggesting that we should put all men in jail because that would minimize crime. The argument is the same (although you may not agree with the utilitarian cost-benefit analysis, particularly if it applies to you).
Lastly, there is at least one way in which these ‘obvious ideas’ are not as obvious as they may seem. Consider for a moment the above height graph. If I were to tell you that someone was 66 inches (5′6″), it would basically be a coin flip whether they were a man or a woman. However, as you move to the extremes, the percentages begin to change. In this sample, while there are a number of men who exceed 74 inches, none of the women do, and so you could be certain that anyone in this sample above 6′2″ is a man.
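The same point can be sketched with a simple model: given only a height, how likely is it that the person is a man? The averages (69 and 64 inches) come from the graph; the 3-inch standard deviation and the equal numbers of men and women are assumptions made for illustration. Note that a bell-curve model only ever gets close to certainty, whereas the sample above has a hard cut-off.

```python
from statistics import NormalDist

SD = 3.0  # assumed spread in inches; not read from the graph
men = NormalDist(mu=69, sigma=SD)
women = NormalDist(mu=64, sigma=SD)


def prob_man(height_in, prior_man=0.5):
    """Bayes' rule: P(man | height), assuming `prior_man` of people are men."""
    m = prior_man * men.pdf(height_in)
    w = (1 - prior_man) * women.pdf(height_in)
    return m / (m + w)


for height in (60, 66.5, 72, 76):
    print(f"{height:>5} in -> P(man) = {prob_man(height):.3f}")
```

Near the middle of the two averages the model is close to a coin flip; towards either extreme, the same 5-inch gap in averages makes one group overwhelmingly more likely.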
The same is true of traits like aggression and impulsivity. The difference between men and women may not be huge, and the within-group variance is much higher than the between-group variance. However, those at the tail end are the people most likely to end up in jail – and about 90% of inmates are men[ii].
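To see how a small average difference grows at the tail, here is a sketch with two bell curves that differ by 0.3 standard deviations. That gap, the equal spreads, and the cut-offs are all assumptions for illustration; the sketch is not meant to reproduce the 90% figure, which also reflects the other factors mentioned in the endnote.

```python
from statistics import NormalDist

D = 0.3  # assumed gap between group averages, in standard deviations
group_a = NormalDist(mu=D, sigma=1.0)
group_b = NormalDist(mu=0.0, sigma=1.0)


def tail_ratio(cutoff):
    """How many members of group A exceed `cutoff` for each member of group B."""
    return (1 - group_a.cdf(cutoff)) / (1 - group_b.cdf(cutoff))


def share_a_above(cutoff):
    """Group A's share of everyone above `cutoff`, for equally sized groups."""
    ratio = tail_ratio(cutoff)
    return ratio / (1 + ratio)


for cutoff in (0, 1, 2, 3):
    print(
        f"cutoff {cutoff} SD: ratio = {tail_ratio(cutoff):.2f}, "
        f"share from group A = {share_a_above(cutoff):.0%}"
    )
```

The ratio starts only a little above 1 at the average and keeps climbing as the cut-off moves further out, so the group with the slightly higher average becomes increasingly over-represented at the extreme.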
As with the height and personality examples, whenever there is a difference between groups, even a small one, it is at the extremes where we will see large deviations. Recently, James Damore was fired from Google for outlining differences between men and women that could influence why there were more men at Google. The critics who at least agreed with the science on the differences argued that the magnitude of the differences he cited couldn’t explain the gap at Google.
But I think it’s fair to say that Google is at the extreme end of the computer-science and tech job world, if not the job world altogether, and thus people who work there are likely to be at the extreme end of certain traits as well. Thus, if we can agree that these differences exist, and if we can agree that the differences predict both interest and performance, we would expect large differences at the tail end. Further, if we look at tech-startup founders (arguably another example of being at the extreme end of the tech-job world), women make up 17% of founders. Tech-startup founders provide a window into the numbers of highly motivated people who are interested in tech, while eliminating discrimination at the organization/firm level. And while I concede that there are likely some systemic factors that push women out of tech throughout their lives, this number is the same as the share of women in tech at Google. This equal representation suggests no discrimination on the part of Google.
As noted, this does not provide evidence that there is no systemic bias in society, but I’d argue that whether it is the responsibility of Google or any company to help rectify the systemic biases in our society is an open question. These issues need to be solved earlier on in life, whether that be through marketing towards younger women, programming camps for girls, etc. (things Google could help provide). Otherwise, it is like putting a weight on a runner at the beginning of the race and then taking time off his time at the end of it. Rather, the correct solution to this problem is to take the weight off the runner at the beginning to ensure a fair and equitable race.
Clearly, there is important nuance in any average finding. However, for whatever reason, many people tend to use the averages as ultimate sources of differences, particularly in political discourse. It is sometimes hard to resist, even when you think about these statistical issues all the time. A few years ago, I heard Milo Yiannopoulos on Real Time with Bill Maher make the argument that we shouldn’t let trans people use their preferred bathrooms because they are more likely than non-trans people to be involved in sex crimes. I remember quickly seeking to fact-check it, implicitly feeling that, if true, it would be a reason to not allow these preferred bathrooms. However, it is important to note that even if true (it is not, unless you creatively use ‘involved’ to also include being the victim of a sex crime), the idea that trans people commit more sex crimes (which, to reiterate, is false) tells you nothing about an individual trans person. The notion that we should fear or discriminate against trans people because of some general average is ludicrous. Likewise, the stat that men are 10x more likely to commit homicide than women doesn’t tell you that you ought to think men are murderers.
The immediate inclination to take averages and assume they reflect a truth that applies to everyone is something we need to be very careful about. Whether it be the latest scientific finding that a new drug, exercise, or food helps achieve some desired result, or that one group of people is higher or lower on some specified attribute – these findings only tell us about averages and may or may not apply to you or the people you know.
If you enjoyed (or hated) this post, feel free to share it, leave a comment, and/or subscribe. You can also email me personally at email@example.com – I engage with every thoughtful comment.
[i] It is challenging to be able to study whether something has an effect for an individual person. For example, even if you used a repeated measures design at two different times and some people showed less sleep after eating fish, you could not draw the conclusion that it was due to the fish. People will vary considerably between the times (e.g. people who did not get the fish will also show more/less sleep between the two sessions), and thus, any individual who showed less sleep may have done so for a variety of different reasons.
[ii] Obviously, personality traits are not the only predictor. Cultural norms, systemic bias, etc. all play a part. This is a question of ‘why’ the statistics are the way they are and will be addressed in an upcoming post.