From: John Conover <john@email.johncon.com>
Subject: Re: entropy spreadsheet
Date: 18 Dec 2000 07:43:44 -0000
Hi Jeff. Good question. The answer is that P, and thus G, have fractal characteristics, and measuring them has to take this issue into account. Any time one runs a metric on a fractal system, data set size is an important consideration; it's not so much having a lot of data as it is having data over a long enough time, (although the concept of self-similarity is how data over short time intervals and long time intervals are related, an extrapolation that is commonly used.) That's what the tsshannoneffective program is all about: it defines the minimum time interval for measuring a financial variable, and, additionally, it can make a quantitative statement about the risk of using a shorter interval, (the tsshannoneffective program is a cut-and-stick from the tsinvest sources, BTW, where their usage is controlled by the -c and -C options.)

In some sense, it is kind of a statistical estimation technique, (which is actually the default used in tsinvest, and can be disabled with the -c argument,) and a similar method which deals with run lengths of "bubbles," (which can be enabled with the -C argument.) Both methods use the error function, and I'll give an example of how it works: the run lengths of bull or bear times have a chance of continuing past n many days of erf (1 / sqrt (n)), which for n >> 1 is about 1 / sqrt (n). What this means is that if a bull, (or bear,) market has run 15 days, the expectation of it continuing at least one more day is about 25%. For 24 days, about 20%, and so on. Fractals are made up of "bubbles" with these kinds of statistics, (at all scales, too; it works for minutes, days, years, decades, etc.; kind of "bubbles" made up of "minibubbles," which in turn are made up of "microbubbles," and so on,) so one has to be concerned, as you are, judging by the question you ask, about making a measurement of P, and, by serendipity, the measurement being misleading since it was made in a "bubble."

I suppose you are considering a long term investment, (i.e., using P = ((avg / rms) + 1) / 2, e.g., the -d1 option to tsinvest, and not the "trader" arguments, -d4 and -d5.) Note that the chance of the "bubble" continuing at 350 days is also the chance one would take by betting on the value of P measured at 350 days, (it's a subtle concept; think of it as how many times you would lose, doing the same "bet" in an iterated game, and how P would have to be modified to accommodate the times you lost due to data set size considerations,) so I can multiply the two probabilities together to get a compensated, or effective, value of P. In other words, the value of P = 0.526, measured with a data set size of 350, would be known only to a factor of 1 +/- 1 / sqrt (350), i.e., 0.946547752 to 1.05345224838, so the compensated, or effective, value of P would be between 0.497884117 and 0.554115885. (And tsinvest, which requires P > 0.5, would not bet on that unless overridden with the -D option; i.e., other stocks with a higher P, or a larger data set, or both, would be more desirable.) Note that, in some sense, it is kind of like a low-pass filter to keep tsinvest from "betting" on things where the metrics may have been distorted by being measured during a "bubble."
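If you want to sanity check that arithmetic, the formulas in this note reduce to a few lines of C. (A sketch of the formulas only, not the tsinvest/tsshannoneffective sources; erf () is C99, so link with the math library, i.e., cc ... -lm.)

    #include <stdio.h>
    #include <math.h>

    int main (void)
    {
        /* Chance of a bull, (or bear,) run continuing at least one
           more day after running n days: erf (1 / sqrt (n)), which is
           about 1 / sqrt (n) for n >> 1. */
        int days[] = {15, 24};
        int i;

        for (i = 0; i < 2; i++)
        {
            int n = days[i];

            printf ("n = %2d: erf (1 / sqrt (n)) = %.3f, 1 / sqrt (n) = %.3f\n",
                    n, erf (1.0 / sqrt ((double) n)),
                    1.0 / sqrt ((double) n));
        }

        /* Compensate a measured P for data set size: P is known only
           to a factor of 1 +/- 1 / sqrt (n), as in the 350 day
           example above. */
        {
            double P = 0.526;
            double e = 1.0 / sqrt (350.0);

            printf ("P = %.3f, n = 350: effective P between %.9f and %.9f\n",
                    P, P * (1.0 - e), P * (1.0 + e));
        }

        return (0);
    }

Running it prints both the erf () value and the 1 / sqrt (n) approximation side by side, and reproduces the 0.497884 to 0.554116 interval for P = 0.526 at 350 days.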
Or, from tsshannoneffective, (using avg = 0.0016, and rms = 0.04, for a value of P = 0.52, with 350 days):

    john@john:~ 685% tsshannoneffective 0.0016 0.04 350
    For P = (sqrt (avg) + 1) / 2:
        P = 0.520000
        Peff = 0.401709
    For P = (rms + 1) / 2:
        P = 0.520000
        Peff = 0.518002
    For P = (avg / rms + 1) / 2:
        P = 0.520000
        Peff = 0.479763

and the last number is close, (about 18 parts in 500, or so,) to what we did in our head, above. However, note that the minimum time interval requirements for the metrics also depend on the value of P; a larger value of P will permit investing, (i.e., Peff > 0.5,) much quicker. For example, P = 0.6:

    john@john:~ 690% tsshannoneffective 0.04 0.2 40
    For P = (sqrt (avg) + 1) / 2:
        P = 0.600000
        Peff = 0.527700
    For P = (rms + 1) / 2:
        P = 0.600000
        Peff = 0.579606
    For P = (avg / rms + 1) / 2:
        P = 0.600000
        Peff = 0.500024

requires a data set size of only 40 days.

Bottom line: tsinvest, using the -d1 option, didn't get suckered into the dot-com craze, since that is a long term investment command line option, and the numbers just were not there for that style of investment. However, the -d5 option, (which is a trading option that exploits short term market inefficiency at the daily level,) did quite well with the dot-coms because, unlike long term investments, volatility is desirable, and the market can be left quickly when day trading. So, it kind of depends on what one wants to do; it's an engineered solution.

        John

BTW, the above was kind of "watered down" as a tautology. In reality, the compensation techniques used in tsinvest/tsshannoneffective are a little more complicated, since:

    P = ((avg / rms) + 1) / 2

and:

    G = (1 + rms)^P * (1 - rms)^(1 - P)

so not only does P have to be compensated for an effective value, but avg and rms, too, since G is what one wants to bet on. That is why the values of Peff are different for the 3 methods of calculating P in tsinvest/tsshannoneffective.

As a note, I recently added a new last paragraph on http://www.johncon.com/ntropix/ to relate the historical perspective of the compensation techniques used in tsinvest/tsshannoneffective; they are not new, and were in the formalization of the Gaussian/normal bell curve done in the early 1700s. The sample-average in the repeated-trial convergence is a fixed increment fractal, which was the essence of the derivation, (although de Moivre didn't know it.) Whether one utilizes this tidy bit of information to do statistical estimation, or the same thing as a run length phenomenon, is not material; they are both the same. Using the default method in tsinvest is statistical estimation; the -c -C uses the methodology of run lengths, and ends up with the same answer. It's a conceptual issue, only.

If you want to "play" with it, use the tscoins program to generate a time series of about a million days, (use -p 0.51, which is a "typical" value for stocks on the US exchanges, as was used in http://www.johncon.com/john/correspondence/990204020123.28039.html.) Graph that information, and pick a big "bubble" that is about 10X from the average, (i.e., G^n.) Cut that "bubble" into a new time series, and see how the -c, -c -C, and -C options to tsinvest handle it with the -d1 option. Note that the value of P over this interval is quite high, 0.55-0.6, and the duration of the "bubble" will be in years: a simulated dot-com scenario. It's an interesting concept that fractals can go 10X away from where they should be, for years. The bubbles-of-bubbles concept is a useful tautology.
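And, if you want to see how those two formulas fit together numerically, they also reduce to a few lines of C, (again, only a sketch of the formulas in this note, using the avg and rms from the first tsshannoneffective example above, and none of the compensation machinery from the tsinvest sources):

    #include <stdio.h>
    #include <math.h>

    int main (void)
    {
        double avg = 0.0016; /* average of the marginal increments */
        double rms = 0.04;   /* root mean square of the marginal increments */
        double P;            /* Shannon probability */
        double G;            /* gain, per day */

        P = ((avg / rms) + 1.0) / 2.0;
        G = pow (1.0 + rms, P) * pow (1.0 - rms, 1.0 - P);

        printf ("P = %f\n", P); /* prints P = 0.520000 */
        printf ("G = %f\n", G); /* prints G = 1.000800 */

        return (0);
    }

Note how the small edge compounds: G^n over n = 350 days is about 1.32, which is why even a P barely above 0.5 is worth protecting with the effective, or compensated, metrics.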
Jeff Haferman writes:
> Very nice work Ron, and thanks a lot.
>
> Now, I would like to pose a question that I have pondered for
> quite some time. Let me give an example:
>
> Consider symbol "LLTC". If I use data going back 60 days, (e.g.,
> using Ron's spreadsheet, or tsinvest,) I get values of approximately
> P = 0.459 and G = 0.993 for the Shannon probability and gain,
> respectively.
>
> If I go back 350 days for the same symbol, I get P = 0.526 and
> G = 1.001. I know tsinvest can account for uncertainty due
> to data set size, but as a practical matter, which set of
> (P,G) should I "believe" for wagering purposes?
>
> Ronald McEwan wrote:
> >
> > Here is a spreadsheet with the formulas from John's emails. It includes a
> > utility for downloading daily, weekly and monthly data from Yahoo. You
> > will have to manually re-scale the y axis on the chart depending on the
> > price range of what you are looking at. This spreadsheet only looks at 60
> > days worth of data. It should be easy enough to modify it for your own
> > needs.

--
John Conover, john@email.johncon.com, http://www.johncon.com/