Information Theory and TA Indicators
Information theory is a branch of mathematics that quantifies uncertainty, that is, how random something is. That makes it a useful lens when you are trying to find patterns in mostly random data (like stock prices).
In finance, the basic problem is determining the next value in a time series. Technical analysis tackles this problem by looking for historical patterns that resemble the current time series and seeing what the next value was after each match. In other words, TA is basically the practice of putting each day into a bin of quantitatively similar days and then predicting that the next return will be close to the mean return for that bin.
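To make the binning idea concrete, here is a minimal Python sketch. The bin labels and returns are made-up toy data, and `fit_bin_means` and `predict` are hypothetical helper names, not from any library. In practice the bin label would come from an indicator reading.

```python
from collections import defaultdict
from statistics import mean


def fit_bin_means(bins, next_returns):
    """Map each bin label to the mean of the next-day returns observed in that bin."""
    grouped = defaultdict(list)
    for label, ret in zip(bins, next_returns):
        grouped[label].append(ret)
    return {label: mean(rets) for label, rets in grouped.items()}


def predict(bin_means, todays_bin, default=0.0):
    """Predict tomorrow's return as the historical mean for today's bin."""
    return bin_means.get(todays_bin, default)


# Toy data (made up): the bin each day fell into and the return on the following day.
bins = ["low", "high", "low", "neutral", "high", "low"]
next_returns = [0.01, -0.02, 0.03, 0.0, -0.01, 0.02]

model = fit_bin_means(bins, next_returns)
print(predict(model, "low"))  # mean of 0.01, 0.03 and 0.02
```

Falling back to a default of 0.0 for an unseen bin is an arbitrary choice; with no history for that bin there is nothing better to predict.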
An important question is how many bins to sort your data into. Take the RSI indicator: at its core it produces a number from 0 to 1. You can calculate that number to a large number of decimal places, but realistically nothing past the second decimal place matters (which is one reason it makes sense to scale it to 0-100). Mostly it won't matter beyond high, low, and neutral. You might be able to add very high and very low, but you don't actually have enough reliable information to split it into more than maybe 5 or 6 bins.
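A minimal Python sketch of coarse RSI binning follows. It computes RSI as a 0-1 ratio (average gain over average gain plus average loss) using simple averages rather than Wilder's smoothing, which is a simplifying assumption. The five bin edges are hypothetical, and `rsi_ratio` and `rsi_bin` are illustrative names of my own.

```python
from bisect import bisect_right


def rsi_ratio(closes, period=14):
    """RSI as a 0-1 ratio: average gain / (average gain + average loss) over the
    last `period` price changes. Uses simple averages instead of Wilder's smoothing."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain + avg_loss == 0:
        return 0.5  # no movement at all: call it neutral
    return avg_gain / (avg_gain + avg_loss)


# Hypothetical bin edges: five coarse bins instead of many decimal places.
EDGES = (0.2, 0.4, 0.6, 0.8)
LABELS = ("very low", "low", "neutral", "high", "very high")


def rsi_bin(ratio):
    """Collapse a 0-1 RSI ratio into one of five labels."""
    return LABELS[bisect_right(EDGES, ratio)]


print(rsi_bin(rsi_ratio(list(range(1, 20)))))  # steadily rising prices
```

Everything after the bin lookup throws away the extra decimal places, which is the point: only the label is used downstream.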
In information theory, the usual unit for measuring information is the bit. One bit is the information in a fair coin flip: before you receive it, you have a 50% chance of guessing it correctly. Each additional bit halves the chance of guessing correctly at random, so with 2 bits of information you would have a 25% chance (assuming all outcomes are equally likely). Note that information does not have to come in whole bits, so it is perfectly valid to say you have 2.5 bits of information.
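The relationship between bits, blind-guess probability and bin counts can be sketched in a few lines of Python. This assumes equally likely outcomes for the guessing function; `entropy_bits` is the standard Shannon entropy H = -Σ p·log2(p), and the function names are my own.

```python
import math


def guess_probability(bits):
    """Chance of blindly guessing an outcome worth `bits` of information,
    assuming all 2**bits outcomes are equally likely."""
    return 2.0 ** -bits


def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for bits in (1, 2, 2.5):
    print(f"{bits} bits -> {guess_probability(bits):.3f} chance of a blind guess")

# Four equally likely bins carry 2 bits; eight carry 3 bits.
print(entropy_bits([0.25] * 4), entropy_bits([0.125] * 8))
```

The last line is the bins-to-bits equivalence used throughout this post: n equally likely bins hold log2(n) bits.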
My feeling is that most technical analysis indicators give you at most 2-3 bits of information. That is equivalent to saying that you could divide an indicator's values into 4-8 bins (2^2 to 2^3) where the different bins actually mean something, but that if you go further than that you aren't actually improving your sorting.
Furthermore, most TA indicators are fairly highly correlated, which means you can't improve your predictions much by adding more indicators. So if you have 3 oscillators operating on the same time scale, each providing 2 bits of information, you will probably get about 2.2-3 bits by using 2 of them and 2.5-3.5 bits by using all three. The reason is that if one of your oscillators is oversold, the other two probably are as well (and so on). I would guess that the most information you can get from technical analysis is on the order of 4-5 bits. With 4 bits of genuinely predictive information, I would expect to get the market direction right at least 80% of the time, which is more than enough to make money.
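One way to see the effect of correlation is to simulate three oscillators that share a common driver, bin each into quartiles (2 bits apiece), and measure the joint entropy of the bin labels. This Python sketch treats label entropy as a stand-in for "information", which is an assumption; the noise level (0.5 against a driver with standard deviation 1) and all names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy_bits(labels):
    """Empirical Shannon entropy (bits) of a sequence of hashable labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def quartile_bins(values):
    """Label each value 0-3 by its quartile within the sample (4 bins = 2 bits max)."""
    ranked = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(ranked):
        labels[idx] = rank * 4 // len(values)
    return labels


rng = random.Random(42)
n = 10_000
common = [rng.gauss(0, 1) for _ in range(n)]  # shared market driver
# Three hypothetical oscillators: the common driver plus some independent noise.
oscs = [quartile_bins([c + rng.gauss(0, 0.5) for c in common]) for _ in range(3)]

h1 = entropy_bits(oscs[0])
h2 = entropy_bits(list(zip(oscs[0], oscs[1])))
h3 = entropy_bits(list(zip(*oscs)))
print(f"one: {h1:.2f} bits, two: {h2:.2f} bits, three: {h3:.2f} bits")
```

Each oscillator alone carries exactly 2 bits here, but the pair and the triple land well short of 4 and 6 bits because the bins mostly agree with each other.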
One problem that many system developers run into is pretending to have more information than they really have. One example is curve-fitting. Say you have an oscillator and you are choosing its period and its overbought and oversold levels. Depending on how many combinations you try, picking the best one introduces about 5-10 bits of information. That makes your indicator appear to produce substantially more information than it really does. The best setting will produce phenomenal results, but they aren't likely to continue out of sample.
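A rough way to put a number on this (my own framing, not a standard formula from any library) is that picking the best of N parameter combinations can smuggle in up to log2(N) bits. The grid sizes below are hypothetical.

```python
import math


def search_bits(*grid_sizes):
    """Upper bound on the bits introduced by choosing the best point of a
    parameter grid: log2 of the number of combinations tried."""
    return math.log2(math.prod(grid_sizes))


# Hypothetical grid: 10 lookback periods, 5 overbought levels, 5 oversold levels.
print(round(search_bits(10, 5, 5), 2))
# 32 combinations is 5 bits; 1024 combinations is 10 bits.
print(search_bits(4, 4, 2), search_bits(16, 8, 8))
```

Even a modest grid of a few hundred combinations lands in the 5-10 bit range, which is more than the indicator itself is likely to carry.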
Another way to produce imagined information is by using overly complicated indicators. For example, it has been widely reported that you can make a decent trading system based on whether today was an up day or a down day. This produces almost 1 bit of information (slightly less because up and down days don't split exactly 50/50). If you look at 2-day patterns of up and down days, you get about 1.8 bits, and you can make a better prediction. But if you keep adding more days, eventually you aren't adding useful information anymore. If you go out to 20 days, you have about 16-19 bits of information, but most of it won't help you predict what will happen the next day. In fact, I would expect the strategy's returns to start dropping as you add days past a certain point (probably 3-5 days).
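The entropy of n-day up/down patterns can be measured directly. This Python sketch uses simulated, independent days with a hypothetical 53% chance of an up day, so the figures are illustrative only; `binary_entropy` and `pattern_entropy` are my own helper names. It also prints the ceiling log2(number of windows): with finite history, long patterns quickly become unique and the measured "information" just tracks the sample size.

```python
import math
import random
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a single up/down day with P(up) = p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pattern_entropy(ups, n):
    """Empirical entropy (bits) of the overlapping n-day up/down patterns in `ups`."""
    patterns = [tuple(ups[i:i + n]) for i in range(len(ups) - n + 1)]
    counts = Counter(patterns)
    total = len(patterns)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(round(binary_entropy(0.53), 4))  # a little less than 1 bit

# Simulated history: 500 independent days with a hypothetical 53% chance of an up day.
rng = random.Random(0)
ups = [rng.random() < 0.53 for _ in range(500)]
for n in (1, 2, 3, 20):
    ceiling = math.log2(len(ups) - n + 1)  # max entropy possible from this many windows
    print(f"{n:>2} days: {pattern_entropy(ups, n):.2f} bits (ceiling {ceiling:.2f})")
```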
One way of knowing when you have overestimated your information level is by looking at your sample sizes. 5 bits of information gives you 32 bins. You probably want at least 50 samples per bin, so you need about 1600 days to reliably get 5 bits of predictive information (and that's under ideal conditions). If you look at 20 days of up/down data, you have just over 1 million bins, so you need about 50 million days of data to expect your information to be reliable. When a system is curve-fit, varying the parameters just a little bit will move only a small number of observations from one bin to another. In terms of information, the two systems are different by only a small fraction of a bit. If that produces large changes in your system's performance, then you are using unreliable information to produce the extra performance. Larger sample sizes would increase the number of observations sorted differently, which would increase the information difference and increase your confidence that the difference is real.
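The sample-size arithmetic above, plus the "how many observations actually move" check for a parameter tweak, fit in a short Python sketch. It assumes days spread evenly across bins (the ideal case), and both function names are hypothetical.

```python
import math


def days_needed(bits, samples_per_bin=50):
    """Days of history needed to put `samples_per_bin` observations in each of
    2**bits bins, assuming (ideally) that days spread evenly across bins."""
    return math.ceil(2 ** bits * samples_per_bin)


def moved_observations(values, old_edge, new_edge):
    """How many observations switch bins when one threshold moves from old_edge to new_edge."""
    lo, hi = sorted((old_edge, new_edge))
    return sum(lo <= v < hi for v in values)


print(days_needed(5))   # 32 bins * 50
print(days_needed(20))  # 2**20 bins * 50

# Nudging a threshold from 0.30 to 0.31 only reclassifies values in [0.30, 0.31).
print(moved_observations([0.1, 0.3, 0.305, 0.5], 0.30, 0.31))
```

If a tweak like the last line changes only a handful of observations but swings the backtest substantially, that swing is resting on very little data.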
Labels: indicators, information theory