I recently made a discovery about National Sheep Improvement Program (NSIP) numbers that I felt was worth some thought and graphing. I’d been wondering for a long time about the little “acc” numbers that come back with each score you get. It stands for accuracy, of course, and is expressed as a percentage. I asked around: what’s considered a good accuracy value? I could never really get a concrete answer. People would just say, well, higher is better, and your accuracies will get higher the longer you’re in the program. Well, that part is easy enough to deduce. What I wanted to know was, what’s the threshold of tolerable accuracy, below which I might as well just roll the dice to pick my breeding stock? Or just go back to eyeballing them and picking the nice-looking ones?

I couldn’t find a well-articulated answer that satisfied my curiosity. This made me a little nervous, wondering if the whole kit and caboodle is just terribly inaccurate and not worth betting on. What if you need a gigantic flock with hundreds, or thousands, of data points to make the data reliable? But, I had to imagine that the researchers who developed NSIP believed that it was worth doing for most of us. So there had to be some explanation.

I finally found one! Recently, a link was shared to this article, written by Reid Redden at North Dakota State. Inside this article is the magic information I was seeking: a relationship between accuracy values and confidence intervals. These numbers will move around over time as more data accumulates in the system, so the numbers quoted in the article are just a snapshot in time. But it was written in October 2012, so it is fairly recent. It’s good enough to get a general feel for how trustworthy the data really is, at least as it applies to me (your results may vary).

I should note here that I am not yet using the NSIP program to its full advantage. I still have a fairly small number of sheep overall (34 ewes last year). I use too many rams (four on that set of 34 ewes). Ideally, each ram should have a sample size of about a dozen ewe lambs and a dozen ram lambs each season. Sometimes I fall below that. And sometimes I end up splitting the “contemporary groups” even smaller, due to management differences: castrating some of the ram lambs, raising some bottle lambs, weaning some early for sale, separating hermaphrodites into their own groups, etc. I have reasons for doing these things, but have to accept that they lower my accuracy values in NSIP. On the plus side, I do some close line-breeding, and I have some sheep from other NSIP flocks. Both increase the correlations between relatives, thus increasing accuracy.

I’ll skip over the math; you can email me if you want to know how I came up with this. But using the values from Reid’s chart, here is an analysis of six potential breeding rams I had last fall, and their Estimated Breeding Values (EBVs) for Post Weaning Weight (PWWT). I have calculated the 95% confidence case here, so there is only about a 5% chance that a ram’s true value falls outside his spread (any gamblers out there?). I had known for some time that the scores for ram #3 were not looking too hot. But knowing there is some error in the values, how do I decide whether he truly lacks potential compared to the other rams? Well, this box plot makes it pretty straightforward:


What we can see here is that even in the best case, ram #3 is only hovering around the median score of the other rams, and his worst case is lower than all of them except #6, who is his son. But #6 benefits from his dam’s good genes and from his own good growth performance, so he has a higher top-end potential.

Some bands are wider than others: this reflects the amount of data on each ram, and thus, the uncertainty of the prediction. Rams 2 and 4 have less data to bank on (both around 60% accuracy values), and rams 5 and 6 have not yet had any lambs (both have ~55% accuracy values); so the range of prediction is wider. Rams 1 and 3 have >75% accuracy values, because they have the most lambs and relatives in the system.
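For anyone curious how an accuracy value can turn into a band width, here is a minimal sketch. It is not the chart-based calculation behind my graph. It assumes the textbook relationship, where accuracy is treated as the correlation between the estimated and true breeding value, and the standard error of prediction is sqrt(1 - r^2) times the genetic standard deviation of the trait. Some programs define accuracy differently, so treat that as an assumption. The function names, the 2.0 kg genetic standard deviation, and the example EBVs are all made up for illustration.

```python
import math


def standard_error_of_prediction(accuracy_pct, genetic_sd):
    """Standard error of an EBV, ASSUMING accuracy is the correlation (r)
    between estimated and true breeding value, given as a percentage."""
    r = accuracy_pct / 100.0
    if not 0.0 <= r <= 1.0:
        raise ValueError("accuracy must be between 0 and 100 percent")
    return math.sqrt(1.0 - r * r) * genetic_sd


def confidence_interval(ebv, accuracy_pct, genetic_sd, z=1.96):
    """Return (low, high) around the EBV; z=1.96 gives roughly 95%."""
    sep = standard_error_of_prediction(accuracy_pct, genetic_sd)
    return ebv - z * sep, ebv + z * sep


# Hypothetical numbers only: not real NSIP parameters or my rams' scores.
GENETIC_SD_KG = 2.0
examples = [("ram with many lambs", 1.0, 75), ("ram with no lambs yet", 1.0, 55)]

for label, ebv, acc in examples:
    low, high = confidence_interval(ebv, acc, GENETIC_SD_KG)
    print(f"{label} ({acc}% acc): {low:+.2f} to {high:+.2f} kg")
```

The point of the sketch is just the shape of the relationship: the same score with a lower accuracy gets a wider band, and the band only collapses to a single point at 100% accuracy.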

All of the bands are still pretty wide, indeed reflecting that I still have quite a bit of uncertainty due to small sample sizes. The bands will tighten up as these rams are in the system longer. If these are good rams, their scores will also start to rise: sheep with unknowns start out with zero scores, reflecting the assumption that they are average. So already most of these guys are inching upwards, implying they are at least above average. But poor #3, he has been inching downward, showing, over time, an increasing confidence that he is subpar.

If you were to turn this graph on its side and draw a bell curve over each ram’s range of potential scores (with his current score being the median, or highest-confidence case), you get this:


Ram #3 is the green curve furthest to the left. So statistically he is least likely to give me good PWWT lambs in the future, as compared to the other rams.
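The same bell-curve idea can be put into numbers. This is a rough sketch, assuming each ram’s true value is normally distributed around his current score, with a standard error taken from his accuracy. The function name, the example scores, and the standard errors are hypothetical, chosen only to show the calculation.

```python
from statistics import NormalDist


def chance_true_value_exceeds(ebv, sep, threshold):
    """Probability that the true breeding value lies above `threshold`,
    ASSUMING a normal curve centered on the EBV with standard error `sep`."""
    if sep <= 0:
        raise ValueError("standard error must be positive")
    return 1.0 - NormalDist(mu=ebv, sigma=sep).cdf(threshold)


# Hypothetical scores and standard errors, not my rams' real numbers.
rams = {"low-scoring ram": (-0.5, 1.3), "high-scoring ram": (1.5, 1.7)}

for name, (ebv, sep) in rams.items():
    p = chance_true_value_exceeds(ebv, sep, threshold=0.0)
    print(f"{name}: {p:.0%} chance of being truly above average")
```

A threshold of zero stands for breed average here, since sheep with unknowns start at zero. The curve furthest to the left gets the smallest probability, which is the same story the graph tells.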

The bottom line is: when you look at these graphs, if you are picking which rams to keep or let go, it’s clear that #3 is headed for the freezer. And that’s exactly where he wound up. Even with “only” a 75% accuracy value, his PWWT score is low enough to keep him out of the running. Comparing the other rams becomes a little trickier, since their median scores are clustered closer together, but this is OK. Usually when choosing replacement breeding stock, we are only dropping animals from the low end and replacing them with animals on the high end. So the big lump of similar-scoring animals in the middle is status-quo, and not something that needs further analysis.

The interesting thing is, all of these rams are what I would consider “nice looking” rams. They all have desirable pedigrees. If I were picking them at a sheep show, I might have trouble distinguishing them. Even if I tried to look over all the progeny data from them (98 lambs and counting), I would have had difficulty picking out this trend. I would just get confused, comparing lambs out of all different dams, years, and rearing scenarios: singles, twins, triplets. #3’s lambs always seemed nice enough. But the computational math illuminates factors I cannot spot with visual appraisal of the rams themselves, or notice by trying to mentally survey a huge pile of data. It accounts for the variables of dam, dam age, rearing type, lambing year, and sex, just telling me what I need to know: can this guy’s lambs grow??

It’s also worth noting that this is only looking at one trait, and there are multiple traits to consider. But in this case, this ram scored low on a lot of traits, so he was a clear choice for replacement. I should also say that I picked the worst-case metric on which to do this analysis. The bell curves for BWT, WWT, and NLW are much narrower (meaning the prediction error is smaller), so I can be much more confident in those scores.

This little exercise convinced me that, yes, this data really is statistically trustworthy. Even in a small flock with a shepherd who uses too many rams! If I keep making generational breeding selections based on the data, over time, I will shift my bell curves in a positive direction for traits I want to improve. (In fact, the two bell curves furthest to the right are 2011 and 2012 rams, so I can see I’m already making generational progress with only four years’ worth of data in the system.) This data-driven selection will help me outperform other breeders who are not using data. It’s easy to envision how, if this ram had won at some shows, or was especially friendly or had splashy markings, another breeder might have picked him as her top ram, over the others. And subsequently her PWWT bell curve would be going in the wrong direction!

As I figured, upon analysis, data wins!