## Wednesday, November 11, 2009

### Mutual Independence

Suppose you did a statistical survey, and you found out that heart disease and diabetes had nothing to do with each other.

That's not publishable, so suppose you did another. To your despair, you found out that diabetes and overweight had no relation either.

In desperation you tried looking for correlations between overweight and heart disease.

But there was no correlation.

According to your data, diabetes and heart disease are independent, diabetes and overweight are independent, and heart disease and overweight are independent.

How surprised would you be if your research assistant later showed, from the same data, that the three were never found together?

I.e. that heart disease, diabetes and overweight are not contracted independently. If they were, you'd expect some poor sod to get all three!

What do we mean by independence?

If you take 100 people at random out of the phone book and ask them whether they've got diabetes, you'll get a certain number, let's say it's 33.

Now let's take 100 people with heart disease and ask them. The answer's also 33.

It looks as though heart disease occurs independently of diabetes. Diabetes attacks one third of people, and whether you've got heart disease makes no difference as to whether you've got diabetes.
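Here is a minimal sketch of that check in Python, using the figures above. The function name `looks_independent` is made up for illustration, and demanding exact equality is an idealization: with real survey data you'd only expect the two proportions to be close.

```python
from fractions import Fraction

# Figures from the text: 33 of 100 random people have diabetes,
# and 33 of 100 people with heart disease have diabetes.
p_diabetes = Fraction(33, 100)
p_diabetes_given_heart = Fraction(33, 100)

def looks_independent(p_b, p_b_given_a):
    """True when knowing about A leaves the chance of B unchanged."""
    return p_b_given_a == p_b

print(looks_independent(p_diabetes, p_diabetes_given_heart))  # True
```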

Same for overweight vs diabetes, and same for heart disease vs overweight.

Now if A is independent of B, B is independent of C, and C is independent of A, then surely the things are about as unconnected as they can get.

So how many people would you expect to have all three?
Well, a third of people get diabetes, a third of those are overweight, and a third of those again have heart disease.

A third of a third of a third is one twenty-seventh, so you'd imagine that one in every twenty-seven people is unlucky enough to have all three.
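Spelled out as a quick Python sketch, the estimate looks like this. The variable names are only for illustration, and the comments mark which step leans on something the surveys never measured.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Step 1: a third of people have diabetes.
# Step 2: a third of those are overweight
#         (backed by the diabetes vs overweight survey).
# Step 3: a third of *those* have heart disease
#         (an extra assumption: no survey looked at all three at once).
naive_all_three = third * third * third

print(naive_all_three)  # 1/27
```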

But the real answer is that you have no idea.

Imagine three lights: red, green and blue.

Imagine that there's a mechanism that turns them on and off.

It decides, quite at random and with equal chances, whether it is going to light one, two, or no lights.

Once it decides that, then it decides, again completely at random, which light is going to be the odd one out: the only one lit, or the only one left dark.

One third of the time, the red light is on.
One third of the time the red light is on, the blue light is also on.

And since the situation is symmetrical, that's true for blue vs red, red vs green, and green vs blue.

Which is exactly the situation we had above.

But the three lights are never all on at the same time.
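If you don't trust the argument, you can list every case. The sketch below assumes the mechanism picks "none, one or two lights" with equal chances and then picks the odd one out with equal chances, which gives nine equally likely outcomes. The helper names `lit` and `prob` are invented for this example.

```python
from fractions import Fraction
from itertools import combinations, product

LIGHTS = ("red", "green", "blue")

def lit(count, odd):
    """Set of lights that are on, given how many to light and the odd one out."""
    if count == 1:
        return {odd}                # the odd one out is the only light on
    if count == 2:
        return set(LIGHTS) - {odd}  # the odd one out is the only light off
    return set()                    # count == 0: nothing is lit

# Nine equally likely outcomes: (0, 1 or 2 lights) x (odd one out).
outcomes = [lit(count, odd) for count, odd in product((0, 1, 2), LIGHTS)]

def prob(*colours):
    """Probability that all the named lights are on together."""
    hits = sum(1 for on in outcomes if set(colours) <= on)
    return Fraction(hits, len(outcomes))

for colour in LIGHTS:
    print(colour, prob(colour))                 # each light: 1/3
for a, b in combinations(LIGHTS, 2):
    print(a, b, prob(a, b), prob(a) * prob(b))  # each pair: 1/9 and 1/9
print("all three:", prob(*LIGHTS))              # all three: 0
```

Every pair passes the independence test, since the chance of both being on equals the product of the separate chances, yet the chance of all three is zero rather than one in twenty-seven.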