From: John Conover <john@email.johncon.com>
Subject: Re: entropy spreadsheet
Date: 18 Dec 2000 08:07:31 -0000
BTW, this stuff is quite useful, and can be done in your head. See:

    http://www.johncon.com/ntropix/FAQs.html#calculation

for 2, (soon to be 3,) years of prognostications about the markets. I
could use no data, no programs, spreadsheets, or a calculator-all I
could do was look at the graphs for the US exchanges on
http://www.stockmaster.com, where I worked at the time, (and the only
time scale I could use on the graphs was the time scale used for the
prognostication.) I had to work out the square root function in my
head, (which I don't have to do any more, since the last thing I did
there was to make the graphs available on log scales.)

There is some magic about fractal systems. For example, the duration
of civilizations follows a fractal agenda, see:

    http://www.johncon.com/john/correspondence/980116192623.28859.html

which was used to predict the daily P and G that we've seen in the US
markets in the last half of the twentieth century. It's rather amazing
when one considers that the 2 minute ticker and daily values for the
variables used to calculate P are related by the square root function,
(most programmed trading programs depend on it,) and the daily values
relate to about 7,000 years of civilization, too, to within reasonable
bounds, (about a +/- 7.5% accuracy.) That's a 10 order-of-magnitude
scalability.

        John

BTW, what are the chances that the "new economy" is a bubble? It's
lasted 8 years, so far, so it has a chance of one in three of
continuing through the end of 2001, (1 / sqrt (9)).

John Conover writes:
>
> Hi Jeff. Good question.
>
> The answer is that P, and thus G, have fractal characteristics, and
> measuring them has to take this issue into account.
>
> Any time one runs a metric on a fractal system, data set size is an
> important consideration-it's not so much having a lot of data as it
> is having data over a long enough time, (although the concept of
> self-similarity is how data over short time intervals and long time
> intervals are related-an extrapolation that is commonly used.)
>
> That's what the tsshannoneffective program is all about-it defines
> the minimum time interval for measuring a financial variable;
> additionally, it can make a quantitative statement about the risk of
> using a shorter interval, (the tsshannoneffective program is a
> cut-and-stick from the tsinvest sources, BTW, where their usage is
> controlled by the -c and -C options.)
>
> In some sense, it is kind of a statistical estimation technique,
> (which is actually the default used in tsinvest-and can be disabled
> with the -c argument,) and a similar method which deals with run
> lengths of "bubbles," (which can be enabled with the -C argument.)
>
> Both methods use the error function, and I'll give an example of how
> it works-the run lengths of bull or bear times have a chance of
> continuing past n many days of erf (1 / sqrt (n)), which, for
> n >> 1, is about 1 / sqrt (n).
>
> What this means is that if a bull, (or bear,) market has run 15
> days, the expectation of it continuing at least one more day is
> about 25%. For 24 days, about 20%, and so on.
>
> Fractals are made up of "bubbles", (at all scales, too-it works for
> minutes, days, years, decades, etc.; kind of "bubbles" made up of
> "minibubbles", which in turn are made up of "microbubbles," and so
> on,) with these kinds of statistics, so one has to be concerned-as
> you are, judging by the question you ask-about making a measurement
> of P, and, by serendipity, the measurement being misleading since it
> was made in a "bubble."
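(A quick check of the run-length arithmetic above-a minimal sketch in
Python, which is just an illustration here; the tsinvest programs are
not involved:

    import math

    # Chance that a bull, (or bear,) run of n days continues at least
    # one more day: erf (1 / sqrt (n)), which is about 1 / sqrt (n)
    # for n >> 1.
    for n in (9, 15, 24, 350):
        exact = math.erf(1.0 / math.sqrt(n))
        approx = 1.0 / math.sqrt(n)
        print("n = %3d: erf = %.3f, 1 / sqrt (n) = %.3f"
              % (n, exact, approx))

The two columns converge for large n, which is the approximation used
in the quoted text.)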
> I suppose you are considering a long term investment, (i.e., using
> P = ((avg / rms) + 1) / 2, e.g., the -d1 option to tsinvest, and not
> the "trader" arguments, -d4 and -d5.) Note that the chance of the
> "bubble" continuing at 350 days is also the chance one would take by
> betting on the value of P measured at 350 days, (it's a subtle
> concept-think of it as how many times you would lose, doing the same
> "bet" in an iterated game-how would P have to be modified to
> accommodate the times you lost due to data set size considerations,)
> so, I can multiply the two probabilities together to get a
> compensated, or effective, value of P.
>
> In other words, the value of P = 0.526, measured with a data set
> size of 350, would be known only to a factor of 1 +/- 1 / sqrt (350),
> i.e., 0.946547752 to 1.05345224838, so the compensated, or
> effective, value of P would be between 0.497884117 and 0.554115885.
> (And tsinvest would not bet on that, unless overridden with the -D
> option, which requires P > 0.5, i.e., other stocks with a higher P,
> or a larger data set, or both, would be more desirable.)
>
> Note that, in some sense, it is kind of like a low-pass filter to
> keep tsinvest from "betting" on things where the metrics may have
> been distorted by being measured during a "bubble".
>
> Or, from tsshannoneffective, (using avg = 0.0016, and rms = 0.04,
> for a value of P = 0.52, for 350 days):
>
>     john@john:~ 685% tsshannoneffective 0.0016 0.04 350
>     For P = (sqrt (avg) + 1) / 2:
>         P = 0.520000
>         Peff = 0.401709
>     For P = (rms + 1) / 2:
>         P = 0.520000
>         Peff = 0.518002
>     For P = (avg / rms + 1) / 2:
>         P = 0.520000
>         Peff = 0.479763
>
> and the last number is close, (about 18 parts in 500, or so,) to
> what we did in our head, above.
>
> However, note that the minimum time interval requirements for the
> metrics also depend on the value of P, too-a larger value of P will
> permit investing, (i.e., Peff > 0.5,) much quicker, for example,
> P = 0.6:
>
>     john@john:~ 690% tsshannoneffective 0.04 0.2 40
>     For P = (sqrt (avg) + 1) / 2:
>         P = 0.600000
>         Peff = 0.527700
>     For P = (rms + 1) / 2:
>         P = 0.600000
>         Peff = 0.579606
>     For P = (avg / rms + 1) / 2:
>         P = 0.600000
>         Peff = 0.500024
>
> requires a data set size of only 40 days.
>
> Bottom line, tsinvest, using the -d1 option, didn't get suckered
> into the dot-com craze, since that is a long term investment command
> line option, and the numbers just were not there for that style of
> investment. However, the -d5 option, (which is a trading option that
> exploits short term market inefficiency at the daily level,) did
> quite well with the dot-coms because, unlike long term investments,
> volatility is desirable, and the market can be left quickly when day
> trading.
>
> So, it kind of depends on what one wants to do-it's an engineered
> solution.
>
>         John
>
> BTW, the above was kind of "watered down" as a tautology. In
> reality, the compensation techniques used in
> tsinvest/tsshannoneffective are a little more complicated since:
>
>     P = ((avg / rms) + 1) / 2
>
> and:
>
>     G = (1 + rms)^P * (1 - rms)^(1 - P)
>
> so not only does P have to be compensated for an effective value,
> but avg and rms, too, since G is what one wants to bet on. That is
> why the values of Peff are different for the 3 methods of
> calculating P in tsinvest/tsshannoneffective.
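(The back-of-the-envelope compensation above is easy to reproduce-a
minimal Python sketch, which glosses over the additional avg and rms
compensation that tsshannoneffective actually does; the rms = 0.04 is
taken from the tsshannoneffective example above, not from LLTC:

    import math

    def peff_range(P, n):
        # A measured P is only known to a factor of 1 +/- 1 / sqrt (n),
        # where n is the data set size, in days.
        band = 1.0 / math.sqrt(n)
        return (P * (1.0 - band), P * (1.0 + band))

    def gain(P, rms):
        # G = (1 + rms)^P * (1 - rms)^(1 - P), the gain one is
        # betting on.
        return (1.0 + rms) ** P * (1.0 - rms) ** (1.0 - P)

    low, high = peff_range(0.526, 350)
    print("Peff between %.9f and %.9f" % (low, high))
    print("G = %.6f" % gain(0.526, 0.04))

which reproduces the compensated interval worked out above.)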
> As a note, I recently added a new last paragraph on
> http://www.johncon.com/ntropix/ to relate the historical perspective
> of the compensation techniques used in
> tsinvest/tsshannoneffective-they are not new, and were in the
> formalization of the Gaussian/normal bell curve done in the early
> 1700's. The sample-average in the repeated trial convergence is a
> fixed increment fractal, which was the essence of the derivation,
> (although de Moivre didn't know it.) Whether one utilizes this tidy
> bit of information to do statistical estimation, or the same thing
> as run length phenomena, is not material-they are both the same.
> Using the default method in tsinvest is statistical estimation; the
> -c -C options use the methodology of run lengths, and end up with
> the same answer. It's a conceptual issue, only.
>
> If you want to "play" with it, use the tscoins program to generate a
> time series, (use -p 0.51, which is a "typical" value for stocks on
> the US exchanges, as was used in
> http://www.johncon.com/john/correspondence/990204020123.28039.html,)
> of about a million days. Graph that information, and pick a big
> "bubble" that is about 10X from the average, (i.e., G^n.) Cut that
> "bubble" into a new time series, and see how the -c, -c -C, and -C
> options to tsinvest handle it with the -d1 option. Note that the
> value of P over this interval is quite high, 0.55-0.6, and the
> duration of the "bubble" will be in years-a simulated dot-com
> scenario.
>
> It's an interesting concept that fractals can go 10X away from where
> they should be, for years. The bubbles-of-bubbles concept is a
> useful tautology.
>
> Jeff Haferman writes:
> > Very nice work Ron, and thanks a lot.
> >
> > Now, I would like to pose a question that I have pondered for
> > quite some time. Let me give an example:
> >
> > Consider symbol "LLTC". If I use data going back 60 days, (e.g.,
> > using Ron's spreadsheet, or tsinvest,) I get values of
> > approximately P = 0.459 and G = 0.993 for the Shannon probability
> > and gain, respectively.
> >
> > If I go back 350 days for the same symbol, I get P = 0.526 and
> > G = 1.001. I know tsinvest can account for uncertainty due to data
> > set size, but as a practical matter, which set of (P, G) should I
> > "believe" for wagering purposes?
> >
> > Ronald McEwan wrote:
> > >
> > > Here is a spreadsheet with the formulas from John's emails. It
> > > includes a utility for downloading daily, weekly and monthly
> > > data from Yahoo. You will have to manually re-scale the y axis
> > > on the chart depending on the price range of what you are
> > > looking at. This spreadsheet only looks at 60 days worth of
> > > data. It should be easy enough to modify it for your own needs.

--
John Conover, john@email.johncon.com, http://www.johncon.com/
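(For "playing" with the tscoins experiment quoted above without the
tscoins program itself: a rough Python stand-in-the 4% daily move size
and the fixed seed are assumptions here; only the -p 0.51 bias and the
million days are from the quoted text:

    import random

    def coin_series(p=0.51, days=1000000, move=0.04, seed=1):
        # Each day the value moves up by a fixed fraction with
        # probability p, and down with probability 1 - p-a fixed
        # increment fractal.
        rng = random.Random(seed)
        value, series = 1.0, []
        for _ in range(days):
            value *= (1.0 + move) if rng.random() < p else (1.0 - move)
            series.append(value)
        return series

    series = coin_series()
    # Graph it, and look for excursions about 10X away from the G^n
    # average, where G = (1 + move)^p * (1 - move)^(1 - p)-the
    # simulated "bubbles."

Cut one of those excursions into a new time series and run tsinvest's
-c, -c -C, and -C options on it with -d1, per the quoted experiment.)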