Surveys & Truth: A Lesson From November 8th

By Steve Fajen

I’m going to catch a bit of flak for this, but here goes…

We all learned a hard lesson on the night of November 8th this year.  Nearly every election poll had Hillary Clinton winning by a sizeable margin.  The gold standard, Nate Silver’s 538 website, which mashes hundreds of state-by-state polls into their algorithm, had her winning with 302 electoral votes, 32 more than needed.  They predicted Michigan, Wisconsin and Pennsylvania would all go her way.  None of them did.  While they rated Florida and North Carolina too close to call, when pressed, they assigned them to Clinton.  Those states went for Trump as well.  Their model projected only a 10.5% chance that Clinton would win the popular vote and lose the Electoral College.  To make matters worse, they had already adjusted for likely voters, so turnout should hardly have affected their conclusions.  538 was as wrong as everyone else this time, yet in 2008 and 2012 their model got every single state and DC right.  Not since Truman beat Dewey in 1948 were the polls this wrong.  For nearly seventy years now we have been trained to believe in surveys, which made the ultimate result even more shocking.

My question is very simple.  If the most sophisticated and redundant survey systems available got a simple binary choice wrong (forget the third-party candidates, please), how can we trust the surveys we use in business, which are much more complicated, to decide where to invest media and marketing money and to predict what the result will be?  Maybe it’s time for a little review on surveys.  Let’s take a deep breath.

We begin by understanding what is at stake when we depend on surveys to give advice to clients and take action on their behalf.   We risk the money we invest, our company’s reputation, our own credibility, client relationships and perhaps even our jobs.

So we start with the measures of risk associated with surveys.  Generally speaking, they are called confidence intervals or, conversely, ranges of error.  These measures assume the following about the samples they measure: the samples are perfectly random yet representative of the population, the questions are perfectly phrased and airtight, and the respondents are truthful.  Nonetheless, some samples we use are overstuffed, overburdened, overblown, oversold, overused, under-sampled and imperfect.  Every single survey we use has one or more of these characteristics, but we choose to ignore them.  I understand why.  I understand that it costs too much to solve many of these problems and that our surveys, in many cases, represent the only game in town.  This is not an indictment of research companies.  It is, however, a hard look at how we use their product.  That said, given what just happened on November 8th, I think it’s time to take a closer look at our understanding of surveys.

Generally speaking, clients want surveys to reflect reality, while agencies that understand the nuances of sampling are content with consistency, and the media just want big numbers.

A number of years ago (late 90s) I took three random measures and compared sample results to actual sales of products.  You can do this in any year, with any products and get the same results – MIXED!

  • Survey said 22.7 million TV sets were sold that year. The actual count was 22.2 million.  Not bad!
  • Survey said 6.3 million computers were sold. Actual sales that year were 11.0 million.
  • Survey said 2.5 million Hondas were sold. Actual sales were 832,000.
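
To put the three comparisons on a common scale, the gap can be expressed as a percentage of actual sales.  Here is a minimal Python sketch using only the figures quoted above; the function and variable names are illustrative.

```python
# Survey estimates vs. actual unit sales, in millions, as quoted above.
comparisons = {
    "TV sets": (22.7, 22.2),
    "Computers": (6.3, 11.0),
    "Hondas": (2.5, 0.832),
}

def percent_error(survey: float, actual: float) -> float:
    """Signed survey error as a percentage of the actual figure."""
    return (survey - actual) / actual * 100

for product, (survey, actual) in comparisons.items():
    print(f"{product}: {percent_error(survey, actual):+.0f}% vs. actual")
```

By this measure the TV estimate was about 2% high, the computer estimate about 43% low and the Honda estimate about 200% high.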

Maybe we should look more closely at margins of error.  The factors that affect the reliability of samples are the quality of the sample, the truthfulness of respondents, market size, demo size and the size of the category being considered.

Even if all these considerations were perfect, sampling error merely tells you that if you went out and re-sampled 100 times you would get a fairly consistent result within the margin of error.  Keeping in mind that error levels measure consistency, which only approximates reality, let’s look at a few results.  We will use a two sigma level of assurance, which is another way of saying that if you went out and sampled another 100 times, 95 times out of 100 you would get a consistent result in the range given.
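
For readers who want to see where such a range comes from, here is a minimal sketch of the textbook margin of error for a percentage.  It assumes a simple random sample, which real rating panels are not, and the sample size of 1,000 is hypothetical rather than a figure from any ratings service.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion p
    measured on a simple random sample of n respondents.
    z = 1.96 is the usual multiplier for about 95% confidence,
    often rounded to "two sigma"."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: a 15% rating measured on 1,000 respondents.
p, n = 0.15, 1000
moe = margin_of_error(p, n)
print(f"{p:.1%} +/- {moe:.1%}")

# Quadrupling the sample only halves the margin.
print(f"With {4 * n} respondents: +/- {margin_of_error(p, 4 * n):.1%}")
```

The square root is the expensive part: to cut the range in half you need four times the sample, which is one reason these problems cost so much to solve.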

On broadcast network TV a huge rating of 15% against a big demo like adults 25-54 is really somewhere between 13.8% and 16.2%.  That’s a swing of plus or minus 1.2 rating points, or a variance of 8% on the original 15% rating (1.2/15.0 = 8%).  Not bad.

However, a 3% rating against a smaller demo like women 18-24 varies from 2.3% to 3.7%.  That’s a 23% variance on the 3% rating (+ or – 0.7/3.0 = 23%).

On lower rated cable stations the variance can be more than 75% of the original rating.  In small TV markets, with low rated shows, the variance can be as much as 99% of the original rating.  Sure, these numbers improve with higher rated programs, dayparts, stations and demos, but remember the only thing reflected here is the statistical theory that the sample approximates reality in a perfect world.
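
Why do low ratings fare so badly?  A rough sketch, again assuming a simple random sample and a hypothetical in-demo sample of 500 respondents, shows the margin growing as a share of the rating while the rating shrinks.  The normal approximation used here gets shaky for very small ratings, so treat the low end as directional only.

```python
import math

def relative_margin(rating: float, n: int, z: float = 1.96) -> float:
    """About-95% margin of error expressed as a fraction of the rating
    itself, assuming a simple random sample of n respondents."""
    return z * math.sqrt((1 - rating) / (rating * n))

# Hypothetical in-demo sample size; real panels vary by market and demo.
n = 500
for rating in (0.15, 0.03, 0.01, 0.005):
    print(f"{rating:.1%} rating: +/- {relative_margin(rating, n):.0%} of the rating")
```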

Put another way, samples do not have the veracity of a complete census.

You can do this error range exercise with CPMs and CPPs.  In a smaller market like Louisville, in late fringe, for example, a cost per point of $45 has an error margin of $45.
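
Here is one hedged way to see how a rating’s error range carries through to cost per point.  The spot cost, rating and margin below are hypothetical, chosen only so that the cost per point comes out to $45; they are not Louisville data.

```python
def cpp_range(spot_cost: float, rating: float, margin: float):
    """Return (low, point, high) cost per point when the true rating
    could lie anywhere in rating +/- margin (all in rating points)."""
    if margin >= rating:
        raise ValueError("margin must be smaller than the rating")
    point = spot_cost / rating
    low = spot_cost / (rating + margin)   # more audience, cheaper point
    high = spot_cost / (rating - margin)  # less audience, dearer point
    return low, point, high

# Hypothetical: a $90 spot with a 2.0 rating, +/- 1.0 rating point.
low, point, high = cpp_range(90.0, 2.0, 1.0)
print(f"CPP ${point:.0f}, plausible range ${low:.0f} to ${high:.0f}")
```

Note that the range is lopsided.  In this made-up case the downside is $15 but the upside is $45, so the risk of overpaying is larger than the chance of a bargain.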

Wow.  Remember, we programmatically negotiate for pennies.

Okay, I know what critics will say.  When you gross up an entire schedule, the confidence level increases and ranges of error decrease.  So too when discussing reach and frequency.  When I discussed this with the late Erwin Ephron (THE MEDIA GURU) he liked to say it was “akin to regressing to the mean.”  It’s like when you love a restaurant the first time you go and then revisit only to find that the experience is not quite as good.  The experience gets watered down.  So it is with large schedules, which reduce the extremes.
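
The critics’ point can be sketched numerically.  Assume, as a simplification, that every spot has the same rating and that each spot’s error is independent of the others.  Then the error on the schedule total grows with the square root of the number of spots while the total itself grows in a straight line.  The figures are hypothetical.

```python
import math

def schedule_relative_error(rating: float, spot_margin: float, spots: int) -> float:
    """Relative margin on total GRPs for `spots` airings, each with the
    same rating and an independent margin of error `spot_margin`."""
    total_grps = rating * spots
    # Independent errors add in quadrature, not in a straight line.
    total_margin = spot_margin * math.sqrt(spots)
    return total_margin / total_grps

# Hypothetical: 4.0-rated spots, each +/- 1.0 rating point.
for spots in (1, 10, 50):
    rel = schedule_relative_error(4.0, 1.0, spots)
    print(f"{spots:>2} spots: +/- {rel:.0%} of total GRPs")
```

In practice the same panel measures every spot, so the errors are not fully independent and the real improvement is likely smaller than this sketch suggests.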

However, to make matters worse, start to think about the small numbers associated with micro-targeting and digital initiatives.  Frustrating, isn’t it?

So Trump is President-Elect and we still use the same ratings systems.  What should we do to avoid the trap of November 8th in the way we use samples?

  1. Talk openly about confidence levels and what they mean. Get a firm understanding of the risks involved with the survey you are using.  Double the one-sigma error margin to reach 95% confidence and add just a bit for the vagaries of reality vs. consistency.
  2. Don’t be afraid to be anecdotal. Sometimes a story is worth a thousand numbers.
  3. Find ways to relate survey results to real life consequences, like sales. Draw relationships to return on investment.
  4. Summarize findings in a readable, interesting and digestible form that senior executives will have the patience to read on a busy day.
  5. Try to confine your surveys to categories and demos that are sizeable enough to offset major error levels.
  6. When you encounter smaller segments, regroup them to form larger units.
  7. Pay careful attention to trends. They tell you a lot.
  8. Remember that digital initiatives generally can be expressed in actions that have already taken place, thus removing the predictive nature of surveys.
  9. Try not to make business decisions based solely on results from a sample. Find other criteria as well.  Try to triangulate your way to a conclusion.
  10. Find an advisor who knows how to use surveys productively and can explain them to you in simple English.
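
To illustrate point 6, here is a rough sketch of what regrouping buys you.  The respondent and viewer counts are hypothetical, the pooling is unweighted, and a simple random sample is assumed, so read it as an illustration rather than a recipe.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """About-95% margin of error for a proportion on a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical (respondents, viewers) in three adjacent demos.
segments = {"W18-24": (150, 6), "W25-34": (250, 9), "W35-44": (200, 7)}

for name, (n, viewers) in segments.items():
    p = viewers / n
    print(f"{name}: {p:.1%} +/- {margin_of_error(p, n):.1%}")

# Pool the three segments into one larger unit.
total_n = sum(n for n, _ in segments.values())
total_viewers = sum(v for _, v in segments.values())
pooled = total_viewers / total_n
pooled_moe = margin_of_error(pooled, total_n)
print(f"W18-44 combined: {pooled:.1%} +/- {pooled_moe:.1%}")
```

The combined group carries a tighter range than any of its parts.  The price is a blunter target, which is the trade-off to weigh each time.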

There is no silver bullet on this subject.  Research houses do a wonderful job of trying to reflect reality with the money made available to them by the people who use their service and within the boundaries of sampling science.  Now it is up to us not to be fooled by the whimsy of chance.

Originally published in Media Village