Thursday, February 26, 2009

Information Theory and TA Indicators

Information theory is a branch of math that deals with quantifying how random (or how predictable) something is. As such, it is useful to think about information theory when trying to find patterns in mostly random data (like stock prices).

In finance, the basic problem is determining what the next value in a time series is. With technical analysis, this problem is solved by looking for historical patterns that are similar to the current time series and seeing what the next value was after the match. In other words, TA is basically the practice of putting each day into a bin of quantitatively similar days and then predicting that the next return will be close to the mean value of returns for that bin.
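As a rough sketch of that "bin similar days, then predict the bin's mean return" idea, here is a small Python example. The function names, the bin edges, and the sample numbers are all made up for illustration; they are not taken from any real indicator or data set.

```python
from collections import defaultdict
from statistics import mean

def bin_index(value, edges):
    """Return the index of the bin that `value` falls into, given sorted bin edges."""
    return sum(value >= e for e in edges)

def mean_next_return_by_bin(indicator, next_returns, edges):
    """Group each day's next-day return by the bin of that day's indicator value,
    then return the mean next-day return for each bin."""
    groups = defaultdict(list)
    for value, ret in zip(indicator, next_returns):
        groups[bin_index(value, edges)].append(ret)
    return {b: mean(rets) for b, rets in groups.items()}

# Hypothetical data: an indicator scaled 0-1 and the return on the following day.
indicator = [0.10, 0.20, 0.50, 0.55, 0.90, 0.85]
next_returns = [0.02, 0.01, 0.00, 0.002, -0.01, -0.02]
edges = [0.3, 0.7]  # assumed cutoffs: low / neutral / high

bin_means = mean_next_return_by_bin(indicator, next_returns, edges)

# The "prediction" for a new day is just the mean of the bin it lands in.
prediction = bin_means[bin_index(0.15, edges)]
```

With only two observations per bin, these means are far too noisy to trust; the example only shows the mechanics.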

An important question to consider is how many bins to sort your data into. If you take the RSI indicator, it produces a number from 0-1. You can calculate that number to a fairly large number of decimal places. But realistically it won't matter past at most 2 decimal places (which is one reason it makes sense to scale it to 0-100). And it mostly won't matter beyond high, low, and neutral. You might be able to add very high and very low. But you don't actually have enough reliable information to split it into more than maybe 5 or 6 bins.

In information theory, a common way to express the amount of information you have is in bits. A bit basically represents a 50% chance of guessing what the information is when you haven't actually received it yet. More bits of information represent a lower probability of randomly guessing correctly. So with 2 bits of information, you would have a 25% chance of guessing correctly. Note that in information theory, bits are not discrete, so it is perfectly valid to say you have 2.5 bits of information.
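The relationship between bits and the chance of a blind guess being right can be written out in a couple of lines. This sketch assumes every outcome is equally likely, which is the simplest case; the function names are just illustrative.

```python
import math

def guess_probability(bits):
    """Chance of guessing correctly at random, assuming equally likely outcomes."""
    return 2.0 ** -bits

def bits_from_probability(p):
    """Inverse of guess_probability: bits implied by a random-guess probability p."""
    return -math.log2(p)

one_bit = guess_probability(1)      # 0.5
two_bits = guess_probability(2)     # 0.25
fractional = guess_probability(2.5) # a bit under 18%, fractional bits are fine
```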

My feeling is that most technical analysis indicators give you at most 2-3 bits of information. That is equivalent to saying that you could divide the numbers into 4-8 bins where the different bins actually mean something, but that if you try to go further than that you aren't actually improving your sorting.

Furthermore, most TA indicators are fairly highly correlated, which means you can't improve your predictions much by adding more indicators. So if you have 3 oscillators operating on the same time scale which all provide 2 bits of information, you will probably get about 2.2-3 bits by using 2 of them and 2.5-3.5 bits by using all three of them. The reason for this is that if one of your oscillators is oversold, it is likely that the other two are as well (and so on). I would guess that the most information you can get from technical analysis is probably on the order of 4-5 bits. With 4 bits of information, you would expect to get the market direction right at least 80% of the time, which is more than enough to make money.
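To see why correlated indicators add so little, it can help to compute the entropy of two indicators taken together. The sketch below uses made-up readings with four equally likely states (2 bits each) and compares the two extremes: a second indicator that always agrees with the first, and one that is completely independent of it. Note that this measures the information in the indicator readings themselves, which is only an upper limit on how much they could tell you about returns.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical indicator readings, already sorted into 4 bins (0..3).
a = [0, 1, 2, 3] * 4                    # first oscillator: 2 bits
b_same = list(a)                        # second oscillator always agrees with the first
b_indep = [i // 4 for i in range(16)]   # second oscillator unrelated to the first

h_single = entropy_bits(a)                       # 2 bits
h_correlated = entropy_bits(list(zip(a, b_same)))    # still 2 bits: nothing new
h_independent = entropy_bits(list(zip(a, b_indep)))  # 4 bits: the most two could give
```

Real oscillators on the same time scale fall somewhere between these two cases, and usually much closer to the first.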

One problem that many system developers run into is that they pretend to have more information than they really have. One example is curve-fitting. Let's say you have an oscillator and you are choosing the period, overbought, and oversold levels. You are introducing about 5-10 bits of information based on what levels you choose. That makes your indicator appear to produce substantially more information than it really does. The best setting will produce phenomenal results, but they aren't likely to continue out of sample.

Another way to produce imagined information is by using overly complicated indicators. For example, it has been widely reported that you can make a decent trading system based on looking at whether today was an up day or a down day. This produces almost 1 bit of information (slightly less because it isn't a 50/50 split). If you look at 2 day patterns of up or down, you get about 1.8 bits, and you can make a better prediction. But if you keep adding more days, eventually you aren't adding useful information any more. If you go out to 20 days, you have about 16-19 bits of information, but most of it won't help you predict what will happen the next day. In fact I would expect the returns from the strategy to start dropping as you add more days past a certain point (probably 3-5 days).
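The "slightly less than 1 bit" figure comes from the binary entropy formula. Here is a small sketch; the 53% up-day frequency is an assumed number for illustration, not a measured one, and the multi-day figure is only an upper bound because it treats each day as independent (any dependence between days lowers it).

```python
import math

def binary_entropy(p_up):
    """Entropy in bits of a single up/down outcome with P(up) = p_up."""
    if p_up in (0.0, 1.0):
        return 0.0
    q = 1.0 - p_up
    return -(p_up * math.log2(p_up) + q * math.log2(q))

def pattern_entropy_upper_bound(p_up, days):
    """Upper bound on the entropy of a `days`-long up/down pattern,
    reached only if every day is independent of the others."""
    return days * binary_entropy(p_up)

P_UP = 0.53  # assumed fraction of up days

one_day = binary_entropy(P_UP)                      # just under 1 bit
twenty_days = pattern_entropy_upper_bound(P_UP, 20) # just under 20 bits at most
```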

One way of knowing when you have overestimated your information level is by looking at your sample sizes. 5 bits of information gives you 32 bins. You probably want at least 50 samples per bin, so you need about 1600 days to reliably get 5 bits of predictive information (and that's under ideal conditions). If you look at 20 days of up/down data, you have just over 1 million bins, so you need about 50 million days of data to expect your information to be reliable. When a system is curve-fit, varying the parameters just a little bit will move only a small number of observations from one bin to another. In terms of information, the two systems are different by only a small fraction of a bit. If that produces large changes in your system's performance, then you are using unreliable information to produce the extra performance. Larger sample sizes would increase the number of observations sorted differently, which would increase the information difference and increase your confidence that the difference is real.
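The sample-size arithmetic above is easy to reproduce. In this sketch the 50-samples-per-bin figure is the rule of thumb from the paragraph above, and the function names are just illustrative.

```python
import math

def bins_for_bits(bits):
    """Number of distinct bins needed to carry `bits` of information."""
    return math.ceil(2 ** bits)

def days_needed(bits, samples_per_bin=50):
    """Days of history needed to put `samples_per_bin` observations in every bin,
    assuming observations spread evenly across bins (the ideal case)."""
    return bins_for_bits(bits) * samples_per_bin

five_bits = days_needed(5)      # 32 bins * 50 = 1600 days
twenty_bits = days_needed(20)   # 1,048,576 bins * 50 = 52,428,800 days
```

In practice the bins will not fill evenly, so the real requirement is higher than these numbers.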


Tuesday, February 10, 2009

The Economy

For at least the past 20 years, the US and world economies have been based on borrowing money in order to spend. GDP growth has been based on the consumer class buying enough to keep everyone employed, but that required spending more money than the consumers were earning. Borrowing was done to make up the difference. This meant that the world economy could grow as long as bankers could come up with ways to lend out more and more money to consumers.

But eventually a debt-based economy will collapse. I keep seeing people say that the economy collapsed because of housing. But the housing market isn't the source of this trouble and fixing housing won't help. The housing bubble was created as a means of justifying writing such huge loans to consumers - the loans were considered safe because they were backed by houses. The mortgages didn't go bad because the housing market collapsed. The housing market collapsed because the banks couldn't come up with loans creative enough to push house prices higher while keeping the monthly payments low enough. In other words, the banks had pushed lending as far as it could go and even their insanely low lending standards were too restrictive to allow further lending.

So now we have an economy that relies on consumer spending for growth, but consumers are too scared of the future to spend any more than they have to. You don't want to see what the equilibrium point is for that. It involves everyone learning how to grow their own vegetables. And having enough land to grow their own vegetables. Which doesn't work very well for a society that has many people living in cities and suburbs.

This isn't a problem that can be fixed by getting the banks lending money again, although that is probably necessary in the short term while the structural problems are fixed. And this isn't a problem that can be fixed with tax cuts. Tax cuts allow people who are earning money to keep more of it. But the problem is the people who aren't currently earning money, and the people who are concerned that sometime soon they won't be earning money. Cutting taxes doesn't help someone who is laid off. And people know that, so tax cuts don't even have the psychological benefit of convincing people to continue spending while they still have a job.

If the government spends enough money to get us back towards full employment, then people will go back to spending. But there has been a large psychological shift now. People are not going to load up on debt the way they did before. And even if they were willing to, the banks aren't going to write loans as freely. We need to fix the income distribution problem. It is actually likely that raising taxes (particularly on the wealthy) will help solve the problem, by pulling money away from people who aren't spending it and transferring it to people who will spend it. Raising taxes on the wealthy will almost definitely be part of the long-term solution, but the reasons for that deserve a separate post.

The Senate has removed most of the stimulus from Obama's stimulus bill and replaced it with useless tax cuts. Hopefully the House will have the sense to insist on the stimulus being put back in (the tax cuts aren't terribly relevant right now, just a waste of money).

And for longer term solutions, we need to get to work on the income distribution problem. Which for the most part hasn't even been mentioned yet.
