Monday, 10 October 2011

Mean, median, mode... geometric average?

From Acadametrics: They helpfully add a table which solemnly lists the following:
Mean price (i.e. 'average' in common parlance): £230,950
Median price: £188,300
Mode price: £150,000
Geometric average: £196,300
Minimum price: £37,250
Maximum price: £8.9 million


According to Wiki, the correct term is geometric mean: "It is similar to the arithmetic mean, except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken". I didn't realise that anybody had a calculator big enough to work that out, but helpfully, Wiki adds that "the log of the geometric mean is the arithmetic mean of the logs of the numbers", which is probably not too difficult: you average out the logarithms of all selling prices and then raise ten to the power of the result (assuming you took base ten logarithms in the first place; it doesn't matter which base you use, as long as you use the same one in both directions).
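The log trick is easy enough to try out. Here is a minimal Python sketch, assuming three made-up selling prices (they are purely illustrative, not Acadametrics data) and a helper name, `geometric_mean`, of my own choosing. It averages the logs and raises the base to the result, once in base ten and once in base e, and compares both with the multiply-and-take-the-nth-root version.

```python
import math

def geometric_mean(values, base=10.0):
    """Average the logs of the values, then raise the base to that average."""
    if not values:
        raise ValueError("need at least one value")
    mean_log = sum(math.log(v, base) for v in values) / len(values)
    return base ** mean_log

# Hypothetical selling prices, for illustration only.
prices = [100_000, 200_000, 400_000]

gm_base10 = geometric_mean(prices, base=10.0)   # base ten logs
gm_base_e = geometric_mean(prices, base=math.e)  # natural logs
# Direct definition: multiply everything, then take the nth root.
gm_direct = math.prod(prices) ** (1 / len(prices))

print(round(gm_base10), round(gm_base_e), round(gm_direct))
```

All three routes should agree up to rounding error, which is the point about the base not mattering. The direct version only works here because there are three prices; with a few hundred thousand sales the product would overflow a float, which is presumably why the log route is the practical one.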
--------------------------------------
Either way, I had always assumed that the 'average' UK house price was more in the region of £160,000 to £180,000. If it is indeed £230,000, then we can promptly revise down my fag packet estimate of the full-on LVT rate we'd need to replace all existing taxes from 'about 8%' to 'about 6%' of current selling prices, and that's assuming that pensioners' main residences are quite simply exempted completely (because I'm sick and tired of hearing about Poor Widows In Mansions, I'd rather just cave in and move on, and frankly, I'd rather the Poor Widows enjoyed the rental income than all these Homey-BTL-banker types).

Without such a political compromise, the rate would be more like 4.4% (assuming that we double Business Rates and that a council house or Housing Association house is worth on average half as much as a privately owned one etc etc).

27 comments:

James Higham said...

Long time since I did mean, median, mode. In fact it was in university doing eco stats. I understood that but not accounting. In my accounting exam, I drew up a balance sheet and put a debit column on the left and a credit on the right.

In the left I put all I knew about accounting and in the right - all I should have known. In the balance I wrote - doesn't balance.

Mark Wadsworth said...

JH, this whole accounting and bookkeeping malarkey is much easier if you forget about "debit" and "credit" and focus on "to" and "from".

That deals with your trial balance and regular bookkeeping, then after that you sort it into "money that went out or came in during the year but we're not sure what there is to show for it" and "assets and liabilities still existing at the end of the year".

With a bit of luck, the surplus "money in minus money out" is equal to "increase in net assets" and Bob's your uncle.

Anonymous said...

Pretty chart; but is that the prices paid for homes sold or just the asking price for homes for sale?

I rather doubt that there are many million pound homes being sold.

Gary K.

Mark Wadsworth said...

GK, they just recycle prices paid per HMLR:

"The above graph shows the distribution of the prices paid for all houses purchased in England and
Wales for the period April – June 2010, as recorded by the Land Registry. "

dearieme said...

What purpose might be served by calculating the geometric mean price? Why not calculate the Harmonic Mean while they're at it? Or the logarithmic mean of the median and the mode? Endless fun for obsessives, but to what end?

Woodsy42 said...

I think using any of mean, median or mode is potentially misleading because house prices do not conform to any sort of simple or symmetrical distribution. There is a floor price, most houses are within (say) 3 times that floor price, but there is effectively no maximum, so some houses are hundreds of times more than the floor. These are of no interest to most buyers, but nevertheless skew the results significantly.
An instructive example was once given to me of what happens when there is a large low data bunching and a high outlier.
Fred runs a business and employs 9 people. He pays all his employees 11k per year each and pays himself 100k.
Mean salary is 19,900
Modal salary is 11k
Median salary is virtually impossible to calculate - there is a complex formula for gaps with uneven weightings and it would give about 17.5 if I remember it correctly
My point being that none of those numbers are helpful, the mean is especially useless because 90% of the people barely get half that.
You really need to confine your calculations to only the lower price range of housing stock, excluding gross outliers.

Richard Allan said...

Woodsy42, I calculate the geometric mean of the numbers you've given as 13.72 which means the geometric mean is significantly better than the other examples you've cited.

If you're lucky, the geometric mean will be mentioned at least once during your economics course, when they tell you that price indices are bollocks because they use the arithmetic and not the geometric mean, except Tornqvist/Divisia which no-one uses anyway.

Richard Allan said...

Actually I hate to make lazy posts like this but basically everything Woodsy42 has said is totally wrong. For instance the median of the numbers he gave isn't "virtually impossible to calculate... about 17.5", it's just 11. Arrange the points in order and pick the middle one, or the mean of the middle two. Even if you treat his numbers as a sample of a wider probability distribution, say for example a lognormal distribution, the median is £13717 and the mean is £17079. I mean your figure for the median is higher than the figure for the mean, in a distribution which is massively skewed in the opposite direction, sanity test right there!
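For anyone who wants to check these figures, here is a rough Python sketch using only the standard library. It takes Woodsy42's ten salaries (nine at 11k, one at 100k) and works out the mean, median, mode and geometric mean. The lognormal part is my assumption about how the £13717 and £17079 figures were arrived at: fit a lognormal by taking the mean and the population variance of the logged salaries.

```python
import math
import statistics

# Woodsy42's example: nine employees on 11k, Fred on 100k (figures in £k).
salaries = [11] * 9 + [100]

mean = statistics.mean(salaries)
median = statistics.median(salaries)  # mean of the 5th and 6th values in order
mode = statistics.mode(salaries)
geo_mean = statistics.geometric_mean(salaries)

# Assumed lognormal fit: mu and sigma^2 are the mean and the
# population variance of the natural logs of the salaries.
logs = [math.log(s) for s in salaries]
mu = statistics.fmean(logs)
sigma2 = statistics.pvariance(logs)
lognormal_median = math.exp(mu)
lognormal_mean = math.exp(mu + sigma2 / 2)

print(f"mean {mean:.2f}, median {median:.2f}, mode {mode}, geometric mean {geo_mean:.2f}")
print(f"lognormal median {lognormal_median:.3f}, lognormal mean {lognormal_mean:.3f}")
```

Under that assumption the fitted lognormal median is the same number as the geometric mean, which is no accident: both are e to the power of the average log.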

Anonymous said...

Woodsy: Median salary is virtually impossible to calculate - there is a complex formula for gaps with uneven weightings and it would give about 17.5 if I remember it correctly

??? The Median is the value in the middle (person number 5 and 1/2 in this instance - house number 78811 and 1/2 in the house price sample) if you put them all in order which makes it 11k. I have no idea what calculation you're thinking of.

My point being that none of those numbers are helpful, the mean is especially useless because 90% of the people barely get half that.

The mode looks pretty useful to me...in any case, the answer is not to simply ignore data that is 'troublesome'.

You really need to confine your calculations to only the lower price range of housing stock, excluding gross outliers.

You need some methodological basis to start excising data.

Dearieme: What purpose might be served by calculating the geometric mean price?

From scanning the article, they mention that the Land Registry uses the geometric mean. Why the LR think it's more useful, I don't know.

Mark Wadsworth said...

D, see what RA says.

W42, what I'm really interested in is the total value of all UK housing, because that is the tax base if you are looking at LVT (subject to adjustments).

It's no different with income tax, they don't decide the income tax rate based on the average income and ignore outliers, they decide how much tax they want to raise (£x), tot up all incomes to find the tax base (£y) and then divide x by y to give the tax rate.

RA, F, ta for back up.

We appear to be dealing with a Pavlovian reflex here: the post is mainly about means, but I give a mention to LVT at the end, so people pile in and tell me that all these ways of calculating means are a load of rubbish.

Rational Anarchist said...

I was always told that if you want to reasonably exclude outliers, just take the full data set, work out the mean and standard deviation, then exclude any values that are more than two standard deviations from the mean and recalculate averages.

For the data given by Woodsy above, this would give an adjusted mean, median and mode of £11k (as you'd only exclude the person on £100k) which thus represents the bulk of the data.
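As a sketch of that recipe in Python, run on Woodsy's ten salaries: the function name `drop_outliers` is made up for illustration, and I've assumed the population standard deviation (the sample version drops the same single value for this data).

```python
import statistics

def drop_outliers(values, k=2.0):
    """Keep only the values within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [v for v in values if abs(v - mean) <= k * sd]

# Woodsy's example: nine employees on 11k, Fred on 100k (figures in £k).
salaries = [11] * 9 + [100]

trimmed = drop_outliers(salaries, k=2.0)
adjusted_mean = statistics.mean(trimmed)
adjusted_median = statistics.median(trimmed)
adjusted_mode = statistics.mode(trimmed)

print(trimmed)
print(adjusted_mean, adjusted_median, adjusted_mode)
```

Only the 100k salary sits more than two standard deviations from the mean, so it is the only one dropped, and all three adjusted averages land on 11k.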

Anyone know where we can find source data for the house prices so we can try the same?

(not that I'm entirely sure what we aim to achieve...) :-P

Ian B said...

So, you're actually advocating a Property Value Tax, not a Land Value Tax, then?

Robin Smith said...

Nice work. But once again you are showing an effect in the current context not the new enhanced one. You are discounting your own good work. . .

You forgot to mention that productivity will increase in unison with the rent for tax swap.

That will increase rent up to a new equilibrium. I have no science for this except to say it will increase for certain.

What's more, wages will also increase in proportion instead of decrease as the rent rises.

This means folks will suddenly find they can start work for themselves instead of an employer if they choose.

Those today on welfare, willing and able to work, yet unable to due to high tax and rent, will start working and stop welfare.

Further still, employers will be competing with each other for staff, instead of staff competing with staff and driving all wages towards welfare levels. Wages would rise further still.

Thus proving the fallacy that jobs are created by entrepreneurs or government.

Thank you for providing the data behind this Good News. It's not a pipe dream. Discounting it, though, is a sign of submission to modern day slavery.

Mark Wadsworth said...

RA, try emailing Acadametrics or HMLR. I'm not too fussed as to what the 'average' is, I'm interested in the total.

IanB, nope. I used to say that, but what I actually said above was: "that is the tax base if you are looking at LVT (subject to adjustments)", I've explained how to adjust available data (house prices, rebuild costs, plot sizes, existing Council Tax bills etc) to arrive at a fair stab at relative land rental values for each smaller area (such as each postcode sector).

See also my reply to RS.

RS, amen to all that, but that's not the point here.

The point is that people often ask me "That's all well and good, but how much would my LVT be?" and because I don't know the precise answer for 27 million different houses or flats, I think it's only fair to express it as an approx. percentage of current selling prices to give people a rough idea.

So if I say 'about 6%' and your house is currently worth £400,000, then your LVT bill will be 'about £24,000 a year' (plus/minus twenty per cent - it might be a splendid mansion on a small plot in a rough area, in which case it'll be less; it might be a tiny bungalow on a massive plot overlooking the park and near the station etc, in which case it'll be more).
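The rule of thumb in the last paragraph can be written down as a tiny Python sketch. The function name `lvt_estimate` is hypothetical; the 6% rate and the plus/minus twenty per cent margin are the rough figures from this thread, not anything precise.

```python
def lvt_estimate(selling_price, rate=0.06, margin=0.20):
    """Return (low, central, high) rough annual LVT bills for a selling price."""
    central = selling_price * rate
    return central * (1 - margin), central, central * (1 + margin)

low, central, high = lvt_estimate(400_000)
print(f"about £{central:,.0f} a year, somewhere between £{low:,.0f} and £{high:,.0f}")
```

For a £400,000 house that gives a central figure of about £24,000, with the range covering the mansion-on-a-small-plot and bungalow-on-a-big-plot cases.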

Either way, we know perfectly well that any tax on (or subsidy to) housing (however calculated) always primarily reduces (or increases) the selling price or rental value of land and has hardly any impact on the construction cost or rental value of the actual bricks and mortar (why would it?).

Woodsy42 said...

"in order and pick the middle one, or the mean of the middle two."

No it isn't Richard, that simplification only strictly applies if the median is in a 'symmetrical' data gap.
When one side has multiple values and the other side has a single value there are weighting rules, I honestly can't even remember them, I researched it years ago to write a program that satisfied a statistician, and it was a bastard job.

, sanity test right there!
Exactly! The numbers don't demonstrate what you expect.

I also know that in some research circles when dealing with spread data they sometimes throw away data that lies more than 3 standard deviations from the simple mean as 'artifact' and then recalculate. Don't blame me, I didn't make the rules, but have written the software to do it. That would do what I suggested - disregard very high value properties. While those do affect Mark's tax valuations, they don't affect most people saving for a deposit, hence different treatments for different purposes.

Mark Wadsworth said...

W42: " in some research circles when dealing with spread data they sometimes throw away data that lies more than 3 standard deviations from the simple mean as 'artifact' and then recalculate."

For sure, when we are trying to work out the average rental value in a smaller area, i.e. a postcode sector, a shopping centre, whatever, we'd ignore unusually expensive/high rent or unusually cheap/low rent houses/shops. That is a separate topic: there's all sorts of skullduggery going on with mortgage fraud, over- or underdeclared rents (because you are either trying to con the bank or con HM Revenue & Customs), some houses are sold quite legitimately for much less than their market value between connected parties etc.

But what struck me was the £230,000 figure.

Old BE said...

£230k sounds about plausible. When I was contemplating appealing my Council Tax band when I first bought my flat I did some research into what average property values were in 1991 (wasn't it?) compared with now and guesstimated what my flat would have been worth. It turned out that my flat sold at about that time for about the same as the 1991 mean and again in 2007 (to me) at about the 2007 mean and my current fag-packet guess is that I could sell it for about £230k.

Whew that was a long sentence.

That would give me an annual tax bill of just under £14k which is about as much as I pay now under PAYE and NI but would save me loads because CT and VAT would be gone!

Bring it on.

Richard Allan said...

Woodsy42: "No it isn't Richard, that simplification only strictly applies if the median is in a 'symmetrical' data gap."

Whatever, anyone can check the wikipedia article on "median" and see for himself, so I won't continue to argue this with you. I mean, that has to be another sanity test for you, "Is my knowledge of the subject less than the average idiot who contributes to wikipedia?"

And don't be the "exactly" man, like in a Dilbert cartoon. If I point out that your numbers fail a simple sanity test, you don't get to say "exactly" to try and make it look like you've proved me wrong and not vice-versa.

Also Mark, I am planning to get back to you with worked examples of VAT vs. corporation tax.

Mark Wadsworth said...

B, splendid, assuming £230,000 is the true average, under a full-on LVT with exemptions for pensioners, your LVT bill would be +/- £14,000 a year, from which knock off your Citizen's Income of about £3,500 a year.

RA, W42, this is all getting a bit heated :-(

It's more of a philosophical point, which is the 'truest' average, it all depends on the data we are looking at.

In other words, for house prices, "mode" is probably nonsense, as it depends how you band prices, but "mode" is very important when looking at how many feet people have and deciding "how many socks to include in one packet of socks".

Anonymous said...

RA: "in order and pick the middle one, or the mean of the middle two."
W42: No it isn't Richard, that simplification only strictly applies if the median is in a 'symmetrical' data gap.

The 'simplification' strictly applies whenever you have a finite sample size. You only need to calculate the median (indeed, it's only *meaningful* to calculate the median) when your sample size is infinite. If your statistician friend had a finite sample size whatever he asked you to calculate was NOT the median.

Derek said...

Fraggle is right about the finite set simplification. My guess is that Woodsy was being asked to use a finite number of samples drawn from a continuum to effectively estimate the shape of its curve and then base the median on that.

That would make sense for something like median daily temperature where you measured the temperature several times during the day with varying intervals between the measurements instead of measuring the temperature continuously. If you measure every 50ms then you can use the simplification. If you measure 8 times spread out through the day at random intervals, you're going to need the more complex method.
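To make the temperature idea concrete, here is one possible sketch in Python. It is only a guess at the kind of method meant, not Woodsy's actual weighting rules: join the irregular readings with straight lines, read the resulting curve off at many evenly spaced times, and take the ordinary median of those. The readings are invented, and the names `interpolate` and `time_weighted_median` are made up for the example. It assumes no two readings share a timestamp.

```python
import statistics

def interpolate(samples, t):
    """Linearly interpolate sorted (time, value) samples at time t."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")

def time_weighted_median(samples, steps=10_000):
    """Median of the interpolated curve, read off at evenly spaced midpoints."""
    samples = sorted(samples)
    start, end = samples[0][0], samples[-1][0]
    grid = [start + (end - start) * (i + 0.5) / steps for i in range(steps)]
    return statistics.median(interpolate(samples, t) for t in grid)

# Hypothetical readings as (hour, degrees C): four bunched up overnight,
# then one at midday and one at midnight.
readings = [(0, 4.0), (1, 4.0), (2, 4.0), (3, 4.0), (12, 14.0), (24, 4.0)]

naive = statistics.median(value for _, value in readings)
weighted = time_weighted_median(readings)
print(f"naive median {naive:.2f}, time-weighted median {weighted:.2f}")
```

The naive median just counts readings, so the four overnight ones dominate; the time-weighted version gives each stretch of the day its due and comes out noticeably higher for this data.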

Mark Wadsworth said...

F, D, I'm not sure you could measure temperature "continuously", or certainly you could, but how would you record the data? You'd have literally infinite data points for every single second of the day. There's no spreadsheet big enough to cope with an infinite amount of numbers (I don't think, anyway).

Unless there is indeed such a thing as a 'shortest unit of time' which I once read about somewhere, in which case would the readings still be continuous?

Derek said...

Well, fair enough you can't measure "continuously", which is why I gave an example of measuring every 50ms which is about as near to continuously as is practically possible. The point is that you are trying to track the curve as closely as possible and if you only have a few points at odd intervals through the day, you may need to do more maths to estimate the curve before calculating the median, depending upon just how accurately and/or precisely you want to estimate it.

Anonymous said...

But houses aren't ALL land there's commercial property too.

AC1

Anonymous said...

'shortest unit of time' is the Plank time. The time it takes light to travel the Plank Distance.

AC1

Anonymous said...

Planck DOH!

Mark Wadsworth said...

AC1: "But houses aren't ALL land there's commercial property too"

Correct, my draft budget requires about £300 billion from land values, Business Rates is already £25 billion so let's double that for luck and raise £250 billion from the 3/4 of housing which isn't pensioners' sole residences, call it £4,000 billion at current selling prices = about 6%, job done. This would give us much the same £/sq yard for commercial and residential.
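Spelled out as arithmetic, in a short Python sketch where all figures are the round numbers from the paragraph above, in £ billion, and the variable names are just for illustration:

```python
# All figures in £ billion, taken from the rough numbers in this thread.
total_needed = 300            # draft budget requirement from land values
business_rates_now = 25       # current Business Rates take
business_rates_doubled = 2 * business_rates_now

# Whatever doubled Business Rates don't cover has to come from housing.
from_housing = total_needed - business_rates_doubled

# The 3/4 of housing that isn't pensioners' sole residences, at selling prices.
taxable_housing_value = 4_000

# Tax rate = amount to raise divided by the tax base.
rate = from_housing / taxable_housing_value
print(f"£{from_housing}bn from housing = {rate:.2%} of selling prices")
```

That comes to 6.25%, hence 'about 6%'.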