Go to class

My head is about to explode and I like it. I’m piling on with distance-learning courses and I feel like a kid in college again. Staying up all night, getting through a Shakespeare play while cramming for an Organic Chemistry mid-term. This after having worked a four-hour shift at UPS loading boxes in the back of a truck. Ah yes, the good old days.

Of course I’m older now and have three school-aged kids. And a wife, a full-time job, a retarded dog (no offense, Boethius), a squeaky guinea pig and five small fish in a tank that is weeks past the time it needed to be cleaned. Also, I have to leave room for golfing with the boys and brewing beer with the daughter, which are important matters. (My daughter doesn’t really brew beer)

The courses I'm taking are mostly at Coursera (coursera.org), a massive open online course site founded by a couple of Stanford professors. My current course list includes Andrew Ng's Machine Learning, Eric Zivot's Introduction to Computational Finance and Financial Econometrics and Daphne Koller's Probabilistic Graphical Models. More or less, these cover graduate-level topics, but excellent teaching makes the material approachable.

I’m signed up for Neural Networks for Machine Learning (starting next week) and Financial Engineering and Risk Management, which starts in a few months.

Also, I'm taking some more programming courses to fill out my deficiencies there. Besides the excellent screencasts I subscribe to on PeepCode and Destroy All Software, I'm taking a C course by the guy who likes you to learn it the hard way. I'm also starting to noodle around with Julia (the language). It's pretty much an open-source Matlab. Unlike the likable Octave, it's high-powered with built-in support for C code and native parallel processing.

I’m thinking of exploring J too, but I may run out of time. It’s a fascinating little language that treats math like grammar. I saw a 30-minute presentation on it at Strangeloop and I guess that’s all it takes for me to start wandering off into the recesses of obscure languages.

Based on my current schedule, I'll be shopping for brain coolant. I'm not sure it won't overheat in the crucible of learning. (Now where is that Organic Chemistry book?)

Beach bum for a week

If you want to do anything well, you have to do it often. Whether you’re offering trades, hacking code or flying airplanes. This week I’m working on becoming an expert beach bum.

This expertise, like many perishable skills, requires a sense of self. Do what you love and you will find yourself. There is nobody who is better at being yourself than you.

I'm on Hilton Head Island at the Shipyard plantation, and I am a beach bum. I bike to the beach (which is within walking distance really, but why work so hard?), I read third-rate paperback novels and I consume beach water. Beach water is an adult beverage made by a large conglomerate brewer. A unit of beach water is a 6-pack. Its logo is a number and I'm sure it's marketed to two types of people: those who care about calories (not me) and those who care about Fibonacci numbers (me). You bring beach water to the beach and you leave it in the ocean.

My kids have already started complaining that there is "too much sand at the beach." It gets everywhere, so I understand their discomfort. It's not easy being a bum sometimes.

I'm not sure why the cast of the Sopranos and Cake Boss drive all the way down to lie in the sun and drink subversive margaritas in plastic cups, but they do. I'm not sure why I haven't gotten stung by the mosquito of the sea (jellyfish) yet either. Some things you'll never know. One thing I do hope to learn though is why an eccentric billionaire has hired a CIA agent to investigate Illuminati infiltration of the Vatican, and how ancient art and Soviet gulags fit in with this plot.

Next week I'll return to coding, trading and flying airplanes. That's who (I think) I am; this week is for pretending. But the irony of the time Charlie Chaplin entered a Charlie Chaplin look-alike contest is not lost on me. Of course he didn't win. He came in third. Sometimes the best you can do is to almost become who you are.

You Buy It, You Own It

And if you drop it, you need to clean up the mess. This is what happens when central bankers decide that part of their mandate is to prop up equity markets. Of course it’s all in the name of fighting deflation, but we all know where it all leads.

Markets change their nature over time. As soon as we begin to recognize patterns, we see those patterns evaporate. This is one reason why it's difficult to have any confidence in a trading system that has been "backtested." This is not to say that patterns will not repeat, but it helps to take a panoramic view of the market landscape and realize what higher-level macro forces are at play.

I feel for the central bankers who probably aren’t getting much sleep at night as they worry about the prospect of complete financial disaster, but I also hold them accountable for the moral hazard they had no problem creating. They own this market. The rallies are all theirs. And now the selloffs will be theirs too. Everyone is waiting to see what the owners of this market will do next. Nobody cares about the economy anymore. I’m not sure they ever did, quite frankly, but what’s different now is that everyone does care about the Fed’s next move. And what the next bailout will look like.

When you buy into something, be that an economic theory, a trading system or an idea, you own it. And it owns you. Just remember to clean up the mess when it fails you.

Nasdaq 100 going to 5,500 (in 14 months)

Alright, it's a bold statement. And as you will see below, it's based on the premise that this is a reprise, or a "recurrence," of 1999. But that's what it is. Let's start with Exhibit A:

Nasdaq 1999

Wow, that's a wicked weekly chart. Clearly oversold, right? Well, that was 1999. Let's see where we are now. Stage hands: Exhibit B!

Nasdaq 2012

This is where we are now. Not to say history repeats itself, but if it ever did, what do you suppose would happen next? Well, if you are guessing a great meltdown, think again. Stage hands, please present Exhibit C!

Nasdaq big down

Conclusion: We have room to go.

Get Used To It

This is a brain teaser. You’ve been warned.

The S&P 500 is in a bear market (defined as the 50-day MA being below the 200-day MA) 30.8% of the time. Also, the S&P 500 has experienced single-day 4% declines 0.242% of the time. Of the times we experienced single-day selloffs exceeding 4%, 75.7% of the time we were in a bear market.

How much more likely are we to experience single-day losses exceeding 4% now that we’re in a bear market, compared to the likelihood of such an event in a bull market?

Odds are you stopped reading this puzzle after the first three sentences. In the unlikely event that you're still with me, I've created a link to some code that holds the answer. It's a Prolog script whose knowledge base is derived from data mining in R.
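
If you'd rather see the arithmetic than chase the link, here is a minimal sketch of the Bayes' rule calculation in plain R, using only the three figures quoted above (my own illustration, not the Prolog script):

# Bayes' rule with the three numbers from the puzzle
p_bear          <- 0.308      # P(bear market)
p_drop          <- 0.00242    # P(single-day loss exceeding 4%)
p_bear_given_dd <- 0.757      # P(bear market | single-day loss exceeding 4%)

p_drop_given_bear <- p_bear_given_dd * p_drop / p_bear              # P(big drop | bear)
p_drop_given_bull <- (1 - p_bear_given_dd) * p_drop / (1 - p_bear)  # P(big drop | bull)

p_drop_given_bear / p_drop_given_bull   # roughly 7: such days are about seven times more likely in a bear market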

Trading Model Derby

This past year was the first year I made two models. That's because my youngest son has joined his older brother in Cub Scouts, and the annual ritual of Pinewood Derby car building became twice the fun. I've been "helping" my oldest for two years now, so I've got some experience. I've learned quite a bit from watching how others have built their cars. I look at both winners and losers to form my theories on how a model should be constructed. Losers typically have some piece of Lego attached to them, creating a pretty car with catastrophic drag. The winners tend to be modestly built (not over-engineered, mind you) and properly weighted.

This year as I had both models resting on a table near my kitchen, my neighbor came over and decided to make some comments. This is the neighbor who takes great pleasure in contradicting everything I say. I could say “the sky is blue” and he would invariably respond with “the sky is not blue.” I call him Null Hypothesis. I take pleasure in watching his silly comments laid waste, but sometimes it takes more effort than I’m willing to expend at the moment. This year he proclaimed “those cars will lose, they are too simple.” I could not prove him wrong on the spot, but the die was cast. I responded “they will not lose, but you will lose. These cars will win first place!” It would take racing day results to reject my neighbor’s proclamation.  I couldn’t wait.

While it's true that I'm a Cub-Scout Pinewood Derby car-builder dad by day, I'm also an algorithmic trading-model builder by night. I've got the Bumblebee, Gnat, Monkey, Pelican and a whole host of others sitting on the shelf. These vary in construction from simple trading-rule algos to more sophisticated predictors that implement radial-basis-function support vector machines. I do suffer from an ambitious curiosity as to how others are building their models though, so I tend to look around and survey the landscape often. I've lately glanced over at a group of builders called Econometricians. No, not electricians. I mean practitioners of the black art of Econometrics.

Econometrics is the science, art and voodoo of building financial models. Its practitioners are typically frequentists, so I found it important to familiarize myself with their methods and terms. When they build their models, they save the wood chips for analysis. Yes, they are indeed a serious group.

One of their favorite modeling techniques is called regression. All types, including simple linear regression, polynomial regression and multiple linear regression. Regression is basically the process of fitting a line or curve to describe the relationship between a dependent variable and an independent variable or variables. Kind of like how tomorrow's S&P 500 return might have a relationship with today's VIX range. Something like that.
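
To make that concrete, here is a minimal sketch of a simple linear regression in R along those lines. The symbols, the VIX-range definition and the variable names are my own illustration, not a model from this post:

# Hypothetical illustration: regress tomorrow's S&P 500 return on today's VIX range
require(quantmod)
getSymbols(c("^GSPC", "^VIX"))                         # S&P 500 and VIX from Yahoo

spx_ret   <- dailyReturn(Cl(GSPC))                     # today's S&P 500 return
vix_range <- (Hi(VIX) - Lo(VIX)) / Cl(VIX)             # today's VIX range, scaled by the close

reg_data <- na.omit(merge(Next(spx_ret), vix_range))   # pair tomorrow's return with today's range
colnames(reg_data) <- c("spx_next", "vix_range")

fit <- lm(spx_next ~ vix_range, data = as.data.frame(reg_data))
summary(fit)                                           # slope, R-squared, F-statistic, p-values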

Building the model is not the difficult part. It’s analyzing the model and the wood chips (commonly referred to as disturbances) where things can get a little complicated. If the wood chips are too cozy with each other, as in they are serially correlated, there’s a problem. If the goodness of fit (R-squared) is low, there’s a problem. If the F-statistic isn’t large enough or the p-value isn’t small enough, there’s a problem. You spend most of the time trying to prove that the opposite of what you’re trying to show is false. I can sort of relate to this, with my neighbor being who he is. But I know what my neighbor looks like, a 50-something male with kids already out of the house. They don’t tell you up front who their pesky neighbor is so you need to listen carefully. Actually, you don’t have to listen because I’ll just tell you. It’s the hypothesis that there is no relationship, or that the slope of the regression is zero.
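
Continuing the hypothetical fit from the sketch above, the usual wood-chip checks look something like this (the Durbin-Watson test assumes you have the lmtest package installed):

# Poking at the wood chips of the hypothetical fit above
require(lmtest)                 # for the Durbin-Watson test

dwtest(fit)                     # are the disturbances serially correlated?
summary(fit)$r.squared          # goodness of fit
summary(fit)$fstatistic         # F-statistic
coef(summary(fit))              # slope estimate, its t-value and p-value against the zero-slope null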

It isn't good enough to reject this null hypothesis of course, but that's the minimum. I'm going to take a shot soon at this wily art and build a model or two of my own. I'm expecting the first few attempts to fail miserably, because this process includes a laundry list of assumptions that I can't keep track of, ranging from "the expected mean of the errors is zero" to "the explanatory variables must be non-stochastic." I really can't tell you if anyone ever races one of these models, though, so I can't tell you how well they will perform.

You have to race your model eventually, and on race day back at Cub Scouts I was able to prove my neighbor Null Hypothesis wrong. My boys' cars didn't lose. They won a couple of heats. I wasn't really right to say they would win first place, because they didn't (silly dad forgot to sand off the burrs on the nail axles!), but at least my neighbor Null Hypothesis was rejected. Next year, the Den Leaders have agreed that dads will be able to build their own dad-only cars for a special race. The only rule is no thermonuclear-powered devices. I can't wait.

vRoom vRoom : Speeding up R with C

Many times you don’t want to trouble friends for help with menial tasks like moving furniture. But sometimes you need to step out and ask. Your friends are always happy to help, and after the heavy lifting is done you see how easy it can be. R likes to move furniture. It’s okay with moving a small table across the room, but when you need to bring a large sofa up three flights of stairs, it’s time to ask for help. Your best friend is C.

The basic idea behind speeding up R with a C script is to write a C script, compile it, load it into the environment (ask it over to the house, so to speak) and call it from R with a built-in R function. There are some choices here but for our test case, we’re going to use the .C() function.

A couple of ground rules first. The C function must be of type void (it returns nothing) and needs to accept pointer arguments. You define the argument in R as an object, and it gets passed as a pointer to the C function. R objects are basically pointers anyway, so this is not terribly difficult. Let's illustrate before this becomes a lecture on pointers and objects.

Here is the code for an R function that uses explicit R loops to print out the permutations of a double loop. It's the guts of a brute-force search, but it only prints a string saying that it was able to locate each permutation. You can either copy and paste it into R or save it and source() it.


twoloop <- function(){
  for(i in seq(10, 50, 1))
    for(j in seq(80, 240, 4))
      cat("Parameter one is", i, "and Parameter two is", j, "\n")
}

We are stepping from 10 to 50 in increments of 1 (41 values) and also from 80 to 240 in steps of 4 (41 values again). The permutation total is those two multiplied, which comes to 1,681. The size of a large sofa, basically. R can do it, but it takes some time because of the explicit looping. Here is the tail of the output along with performance statistics. The system.time() function is used to measure performance.

system.time(twoloop())…
Parameter one is 50 and Parameter two is 224 
Parameter one is 50 and Parameter two is 228 
Parameter one is 50 and Parameter two is 232 
Parameter one is 50 and Parameter two is 236 
Parameter one is 50 and Parameter two is 240 
   user  system elapsed 
  0.070   0.031   0.139  

To get some help from C, we write the following script:

#include <R.h>
#include <stdio.h>

void twoloop(int *startOne, int *stopOne, int *stepOne, int *startTwo, int *stopTwo, int *stepTwo)
{
    int i, j;
    for(i = *startOne; i < *stopOne + 1; i = i + *stepOne)
        for(j = *startTwo; j < *stopTwo + 1; j = j + *stepTwo)
            Rprintf("Parameter one is %d and Parameter two is %d\n", i, j);
}

Notice the pointer arguments. Also, notice we include the R.h header file. Save this script as twoloop.c and then, from the command line, use the following instruction to compile it.

R CMD SHLIB twoloop.c

This creates a new file called twoloop.so that will be loaded into R with the following function inside of an R session.

dyn.load("twoloop.so")

Great, now we use the .C() function to call the function. But first we need to create some R objects. There are six variables so here is what we’ll do.

a <- 10
b <- 50
c <- 1
p <- 80
q <- 240
r <- 4

Now we simply pass those objects in as integers. Remember, that is what the C function is expecting so let’s not mess around. It is here to help us after all.

.C("twoloop", as.integer(a), as.integer(b), as.integer(c), as.integer(p), as.integer(q), as.integer(r))

Of course if you're like me, you run the function first because you're so excited to see if it actually works. It does, and then you remember that you were supposed to time it. No worry here: hit the up arrow to prompt R to display the last command (the monstrosity above) and hit Ctrl-A to get to the beginning of the line. Insert system.time() like so:

system.time(.C("twoloop", as.integer(a), as.integer(b), as.integer(c), as.integer(p), as.integer(q), as.integer(r)))

Here is the tail of the C function’s output along with its performance.

Parameter one is 50 and Parameter two is 224
Parameter one is 50 and Parameter two is 228
Parameter one is 50 and Parameter two is 232
Parameter one is 50 and Parameter two is 236
Parameter one is 50 and Parameter two is 240
   user  system elapsed 
  0.002   0.002   0.017 

Nice. Time to crack a beer on the new sofa. With our best friend C.

Chop, Slice and Dice Your Returns in R

I have a knife rack on my kitchen wall with all my kitchen knives easily identifiable and accessible. I also have small scars on my hand where each knife can claim to have left a mark. It’s not the knife’s fault, of course. They hardly like being suddenly dropped and cursed at. They have no control over who gets picked on a given day. The choice is really mine.

What's good in the kitchen is good at the trading desk. We like choice as traders. We choose markets, trading styles and excuses for our sub-optimal performances. On those occasions when we need to crunch some numbers, we also like some choice. More than any other curve-fitting software, R is best suited for trading precisely because of its "diversity."

In fact, I'm sure this is on purpose. The Unix geniuses did a lot of thinking about software design and architecture when they designed their operating system. They even came up with a set of rules, one of which is the Rule of Diversity. It states that one must distrust all claims for "one true way." Many commercial packages cannot do this of course, as they are confined to a monolithic vision of how things get done. R does this well.

This is a double-edged kitchen knife though, and some care must be taken when choosing which tool you want to use for preparing your algorithmic mise en place. Suppose you're interested in calculating price changes for your favorite, useless metal, silver. Price change or percentage change? Already with the choices. We are going to use percentage change over price change for our illustration of R's diversity.

We'll keep it simple and get the daily closing prices of the silver ETF known as SLV, managed by JP Morgan of course, and isn't that ironic? In any case, we'll avoid the crazy split that happened back in 2007 and just get the prices for the year 2010. I'm going to require three packages in this example: Jeff Ryan's quantmod, Joshua Ulrich's TTR and Brian Peterson's PerformanceAnalytics. TTR actually loads automatically with quantmod (as do xts and zoo) so you don't need to specify it. But I'm going to do it anyway. We'll illustrate diversity right from the get-go.

require("quantmod")
require("TTR")
require("PerformanceAnalytics")

Now we get the SLV prices into our environment. Two ways. I usually don't give the function's surname, but I will for now because it adds clarity later on.

quantmod::getSymbols("SLV")   # getSymbols("SLV") is equivalent

Now to simplify our demonstration of returns, let’s index out the closing price only, and look at prices for  2010.

SLV <- SLV[,4]
SLV <- SLV["2010"]

There are several permutations of that approach that are suitable, and probably some that aren't suitable but still work. Here is the head of data that we have as a result:

           SLV.Close
2010-01-04     17.23
2010-01-05     17.51
What?!? Silver was trading at $17? I could have gotten it that cheap and now it's trading what, north of $45? It's like Netflix, Amazon and Lulu. Combined. And I was going to get a gazillion sleeves from the Maple coin makers north of the border. Well, it's too late now, isn't it? But I digress.

Now we set ourselves upon the task at hand: to calculate the percent change from one day to the next. I know everyone is loath to do it this way, and would much rather calculate the actual dollars-and-cents change, but percent returns are more tractable when we decide to get serious about our statistical pursuits. I'm going to use the default settings for four functions for the illustration. And each one will populate its own eponymous column in our matrix.

SLV$Delt             <- quantmod::Delt(Cl(SLV))  
SLV$dailyReturn      <- quantmod::dailyReturn(Cl(SLV))
SLV$ROC              <- TTR::ROC(Cl(SLV))
SLV$CalculateReturns <- PerformanceAnalytics::CalculateReturns(Cl(SLV))

Now, the head of our data again:

            SLV.Close        Delt dailyReturn         ROC CalculateReturns
2010-01-04      17.23          NA 0.000000000          NA               NA
2010-01-05      17.51 0.016250725 0.016250725 0.016120096      0.016120096

Alright, well we certainly have diversity, don't we? Let's take one line at a time. For January 4, 2010, only dailyReturn put a value in that row, and the others returned NA. And for January 5, 2010, we have Delt and dailyReturn with the same number, but different from the number that ROC and CalculateReturns share. Hmmm. As it turns out, there are two schools of thought on this topic. Do you want simple returns, where you take today's price minus yesterday's price and divide by yesterday's price, or do you want the log of today's price divided by yesterday's price? Log, you're kidding, right? No, actually, logarithms are no joking matter. There will be no laughter when logarithms come into the room to perform their magical tricks. Only awe.

Natural logarithms of returns are nice because you can add them up in their log-ness and then un-log the result to get the total percent change from one date to the other. This comes in handy for calculating a monthly or annual return. Try it out yourself; it's quite cool actually. Don't forget to un-log, though, by way of the exp() function, as in exp(my_log_return).

You cannot add simple returns, but must multiply them. If your column has zeroes in it, well we have a problem that the add folk don’t need to deal with.  I like logs better, even though you have to put your returns into a spacesuit on a temporary basis.
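
As a quick check on both rules, here is a small sketch using the 2010 SLV columns built above. Summing the log returns and un-logging gives the same total as compounding the simple returns:

# Total 2010 return two ways, using the columns built above
log_total    <- exp(sum(SLV$ROC, na.rm = TRUE)) - 1     # add the log returns, then un-log
simple_total <- prod(1 + SLV$Delt, na.rm = TRUE) - 1    # multiply the (1 + simple return) terms
c(log_total, simple_total)                              # the two totals agree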

Each function has a default on this issue and an option to change it to the other way of doing things. Here is a truncated parameter list for each function. Notice the diversity in how we define simple returns versus logarithmic returns.

Delt(x1, type = c("arithmetic", "log"))
periodReturn(x, type = 'arithmetic')   # log would be 'log'
ROC(x, type = c("continuous", "discrete"))
CalculateReturns(prices, method = c("compound", "simple"))

This adds up to more thinking on your part in the end, but you’ll be fine. Make yourself a sandwich and begin contemplation at your leisure. Just don’t cut yourself while doing it.

Recursive Trading System in R

I have a trick knee. Normally, it works just fine. But if I stand on my head when it's raining on Tuesdays and Thursdays and pinch my nose, it hurts. Not just a little. It hurts a lot. I went to the doctor and he told me not to stand on my head when it's raining on Tuesdays and Thursdays and pinch my nose. I left the office a little dejected, realizing I have limitations. When I got back to the lab, I confided my story to White Bumblebee. It didn't say anything, but I could sense some empathy. As if it were telling me that sometimes it has bad days too.

Algorithms sometimes need to just sit still, kinda like you and me. They're always sending signals, so it's not an easy task. One way to settle down a wild algorithm is to program in a sit-down-and-relax statement. The equivalent of a "go flat" instruction.

You can determine when to go flat based on current algorithm performance. Look at the equity curve as a stock and determine if it’s in an uptrend or downtrend. If it’s on a good run, take whatever signal is generated. Otherwise, ignore the signal.

I tried this out on several different equities and the results are mixed. Sometimes it helps, other times not so much.  The following chart is for IWM. Notice how it had less severe drawdowns with the filter and realized better returns.

EDITOR NOTE: These charts reflect the results of code that had an error in it. This error was brought to my attention by Rahul Savani in the comments section. I have changed the code, but not the following charts.

iwm

Now a chart for the same system on TLT, a less-volatile ETF. Makes you wonder if standing on your head when it's raining on Tuesdays and Thursdays and pinching your nose is really all that bad after all.

tlt

Here is the R code in about 20 lines. (code edited May 25, 2011 to remedy error caught by reader Rahul Savani)

require("quantmod")

getSymbols("TLT")

# Fast and slow Bollinger bands (tight, half-sigma bands around 10- and 30-day averages)
fast <- BBands(Cl(TLT), n = 10, sd = 0.5)
slow <- BBands(Cl(TLT), n = 30, sd = 0.5)

# Raw signal: long when the fast average clears the slow upper band,
# short when it falls below the slow lower band, otherwise keep the last state
signal <- ifelse(fast$mavg > slow$up, 1,
           ifelse(fast$mavg < slow$dn, -1, NA))
colnames(signal) <- "signal"
signal <- na.locf(signal, na.rm = TRUE)

# Equity curve of the raw system (yesterday's signal applied to today's return)
returns <- na.omit(dailyReturn(Cl(TLT)) * Lag(signal))
equity  <- cumprod(1 + returns)

# Treat the equity curve like a stock: 10-day versus 30-day moving averages
equity_10 <- SMA(equity, n = 10)
equity_30 <- SMA(equity, n = 30)
equity_TA <- na.omit(merge(equity_10, equity_30))
colnames(equity_TA) <- c("equity_10", "equity_30")

# Recursive filter: only take the raw signal when the equity curve itself is trending up
recursion <- na.omit(merge(signal, equity_TA))
SIGNAL    <- ifelse(recursion$equity_10 > recursion$equity_30, recursion$signal, 0)
SIGNAL    <- na.omit(SIGNAL)

# Equity curve of the filtered system
RETURNS <- na.omit(dailyReturn(Cl(TLT)) * Lag(SIGNAL))
EQUITY  <- cumprod(1 + RETURNS)

plot(equity, main="white bumblebee")
plot(EQUITY, main="with equity curve filter")

A Super-Easy, Simple-Dimple Backtester in R

I cut my finger on a paring knife this morning. Don’t use a sharp knife to spread butter on your toast. It’s better to limit yourself to using dull kitchen utensils until the caffeine kicks in. No matter, I still have most of my digits to type in a simple backtesting program in R. Good thing it’s only about 15 lines of code.

require(quantmod)

getSymbols("GLD")

# Grid search over fast (5, 10, 15) and slow (50, 65, 80) moving-average lengths
for(i in seq(5, 15, 5)) {
  for(j in seq(50, 80, 15)) {

    GLD$fast <- SMA(Cl(GLD), n = i)      # fast moving average
    GLD$slow <- SMA(Cl(GLD), n = j)      # slow moving average

    # Long when the fast average is above the slow, short otherwise,
    # lagged one day so today's return uses yesterday's signal
    golden_cross <- Lag(ifelse(GLD$fast > GLD$slow, 1, -1))
    golden_cross <- na.locf(golden_cross, na.rm = TRUE)

    coin       <- ROC(Cl(GLD), type = "discrete") * golden_cross   # daily strategy returns
    best_coin  <- max(coin)                                        # best single day
    worst_coin <- min(coin)                                        # worst single day
    last_coin  <- cumprod(1 + coin)[NROW(coin), ]                  # ending equity

    annual_coin <- round((last_coin - 1) * 100, digits = 2) / (NROW(coin) / 252)   # rough annualized %

    cat(i, j, annual_coin, best_coin, worst_coin, "\n", file = "~/goldcat", append = TRUE)
    cat(i, j, annual_coin, best_coin, worst_coin, "\n")
  }
}