There are lies, damned lies, and statistics: the 1000 pitfalls of statistical analysis in the markets (4/2005)

To many people statistics seems dull and incomprehensible. That is a matter of opinion, but it is undisputed that in the markets you have probabilities and nothing but probabilities, which is why an understanding of statistical probabilities is essential for everyone. Thinking in statistical categories is rather difficult and requires hard training and a lot of discipline, yet it is crucial for your market success!

Sometimes I get inquiries from people (especially beginning or would-be financial astrologers) who are looking for the 100% method. My answer is always the same: someone who wants 100% profits (and no losses) will almost always lose 100% of his capital. The reason is obvious: if you expect only profits and never losses, you will not "waste" a thought on money management, yet without money management it is only a question of time until the capital is completely gone.

I wrote this article because most of my colleagues (financial professionals) make appalling errors (as I did at the beginning, too). I would like to explain these errors so that next time you are able to recognize them at a glance and understand the actual significance of a statistical analysis. Finally, I would like to touch on the wide field of statistical manipulation. I try to keep the explanations as simple as possible so that they are understood by as many readers as possible.

The basic problem: human perception and psychology

Good statistical analyses are absolutely necessary because the human psyche is simply not objective and is subject to enormous distortions. Human perception is mainly anecdotal, not statistical, so anecdotal "proofs" are normally given far too much weight even if they are the result of pure chance or of many distortions. Numerous social psychology studies confirm this.

Example: Suppose you want to buy a car and have read in a magazine that after 10,000 tests the brand XY was by far the winner in the category that interests you. You are ready to buy the car, but then your neighbor tells you about problems with this vehicle. It would be rational to conclude that the sample size has risen to 10,001 cars, which changes practically nothing. However, most people react "irrationally" in such situations and give too much weight to little stories from their personal environment (and don't buy the car).

Some of my colleagues produce quite good and useful statistical analyses, yet they justify other forecasts only with their experience. When I examined such forecasts with a quantitative approach, I mostly arrived at the disappointing result that the claimed correlation did not exist at all! It makes no difference whether this happens in a market or an astrological environment: deceptive human perception is a phenomenon deeply rooted in each of us, and even frequent statistical work is no guarantee of overcoming its basic problems.

Historical (observed) and predictive probabilities

Analyst statements often read roughly like this: "The indicator X caused the outcome Y in 4 of 5 times, which translates into a probability of 80%." This number is nearly always given without further comment, so the reader implicitly gets the impression that he can expect this indicator to predict the market with a probability P=80%. Sometimes it is even explicitly claimed that the probability of this outcome in the future (the predictive probability) equals 80%, which is sheer nonsense.

With such a small sample (in the example N=5) one has to factor in the strong influence of random errors, which is why the actual predictive probability is almost certainly much smaller than the observed probability. You can't say exactly how large this divergence is, as that depends on many factors, but the predictive probability could be as low as 20-30%! Needless to say, it makes a huge difference for your trading and your purse whether the odds are 80% or 20%.
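To give a rough feel for how little a 4-of-5 record pins down, here is a minimal sketch that computes an approximate 95% Wilson score interval for a hit rate. The function name `wilson_interval` and the choice of the Wilson interval are assumptions made for illustration and do not come from the analyses discussed here. The interval also reflects sampling error only, not other distortions such as an indicator being picked because it happened to look good.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for an observed hit rate hits/n.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    p_hat = hits / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same observed hit rate of 80%, very different sample sizes.
small = wilson_interval(4, 5)        # 4 hits in 5 cases
large = wilson_interval(800, 1000)   # 800 hits in 1000 cases

print(f"N=5:    {small[0]:.2f} to {small[1]:.2f}")
print(f"N=1000: {large[0]:.2f} to {large[1]:.2f}")
```

With N=5 the plausible range runs from roughly 38% to 96%, while the same 80% hit rate over 1000 cases narrows it to roughly 77% to 82%.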

That so many analysts commit this error (even some prominent and respected ones) is alarming and embarrassing for the whole profession. Admittedly, as the number of cases (N) rises, the divergence between the predictive and the observed probability becomes smaller and smaller: for example, if the indicator X predicted Y in 800 of 1000 cases, then the predictive probability of this indicator will not be much less than 80%. However, such large samples are very rare as normally one works with 10

Is 1+1 really 2? The connectedness of the universe

The arguments above are still in full agreement with conventional statistical theory, but the next point is much harder to digest. I again begin with an example: suppose an astrological and a technical indicator forecast the same event at the same time, each with a predictive probability of P=70%. Since logic says that both indicators "must" be independent because they are derived in completely different ways (technically and astrologically), the error probability (the chance that both are wrong) according to the Bernoulli experiment is P = 0.3 * 0.3 = 0.09 = 9%. In reality, however, I have found after 1000-2000 hours of research that the observed error probability is clearly higher, perhaps about P=20%.

I could not understand that divergence for a long time and at first tried to find the error in my calculations, until I realized that it is the fundamental assumptions of our causalistic-mechanistic world view that are incorrect! The typical example of the Bernoulli experiment is the coin toss, i.e. the probability of throwing heads or tails is - theoretically - always 50%, and it doesn't matter whether heads or tails came up 10 times in a row before. The same applies to the numbers in the casino. However, these assumptions do not consider that the universe is one pulsing organism where everything is connected with everything else, and that the independence of events is a construct that doesn't actually exist.

That's also the reason why astrology and numerology work. A well-known European astrologer made a fortune in the casino because he knew that certain constellations and "energies" (for example, a quarrel at the table) are correlated with certain numbers (when I was a student I managed to earn an additional income with similar methods).

The consequences of this philosophical insight are unusually far-reaching. They mean that simply following the textbook is wrong and that one must try to quantify this "universal interconnectedness effect", for if you don't, you get unrealistically high (wrong) odds.

The Amanita models, such as the polarity model, factor this in with the aid of estimated parameters (as the exact predictive probabilities can't be calculated from past observations alone).
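As a purely illustrative sketch, and not the Amanita method, the arithmetic of the two-indicator example can be written out with the dependence between the two errors summarized by a single correlation coefficient. The function names and the parameter `rho` are hypothetical, and treating the effect as a simple correlation between two yes/no errors is an assumption of this sketch.

```python
import math

def joint_error_probability(q1, q2, rho=0.0):
    """Probability that two yes/no indicators are both wrong.

    q1, q2: individual error probabilities.
    rho: assumed correlation between the two error events
         (rho=0 reproduces the textbook independence case).
    """
    covariance = rho * math.sqrt(q1 * (1 - q1) * q2 * (1 - q2))
    return q1 * q2 + covariance

def implied_correlation(q1, q2, joint):
    """Correlation between the errors implied by an observed joint error rate."""
    return (joint - q1 * q2) / math.sqrt(q1 * (1 - q1) * q2 * (1 - q2))

independent = joint_error_probability(0.3, 0.3)   # textbook case: 0.3 * 0.3
rho_needed = implied_correlation(0.3, 0.3, 0.20)  # dependence matching about 20%

print(f"independent: {independent:.2f}")
print(f"correlation implied by a 20% joint error rate: {rho_needed:.2f}")
```

Under these assumptions, an observed joint error rate of about 20% instead of 9% corresponds to an error correlation of roughly 0.52 between the two indicators.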

Soft and hard statistical manipulations

As we are exposed to statistical manipulations very frequently, I want to explain which manipulations are still acceptable and which are not.

In science, manipulations should not occur at all; nevertheless, soft manipulations still happen quite often. In an economic and business context they are a legitimate way to support arguments (e.g. in marketing), and sometimes they are even necessary. The selection of the database for statistical analyses is the most frequent form of soft manipulation, e.g. nobody will protest if a commercial mentions only the - statistically provable - advantages of a product and does not come up with a "total product analysis in relation to competitor products" (which would be impossible).

An example from my own entrepreneurial practice: the Amanita trades were officially started on 11/23/2001, which led to questions about whether this date had been selected arbitrarily to make the performance look better. So I changed the official start date to "since 1/1/2002", as this date looks "unsuspicious", and since then I have never received similar feedback.

Hard manipulations are all methods that amount to pure fraud, e.g. many official economic figures, particularly from the USA. Hard manipulations, even if cleverly hidden and hard to detect, are never acceptable.