September 9, 2020 By Sumo Logic

Can We Rely on Data to Predict the Outcome of the 2020 Election?

Data increasingly means “people,” down to the preferences of each consumer and voter. That’s why data is so valuable to political campaigns. With good data, you can predict things that haven’t happened yet.

Campaigns use data to analyze voters and forecast election results at any given stage, so that staff know where to focus their efforts, such as in swing states.

But what happens when people don’t give you any data? That’s part of what transpired in 2016 when nearly everybody, including the side that won, thought the election was going to go the other way based on polling data.

However, it’s not just that polling data didn’t give an accurate view of voters. Decisions were made based on that data, and in a presidential election, decisions built on wrong data are costly.

In this outstanding episode of Masters of Data, we discuss just this subject with political data analyst David Shor, and the episode bears another look as we barrel into November 2020.

Why Everybody’s Looking for a Golden Report

Eight years ago, analyst David Shor found himself in the Cave, a windowless room where he and colleagues crunched numbers for the 2012 Obama re-election campaign. Polling alone wasn’t enough to be certain. His mission was to help create a reliable forecasting system that could aggregate poll results and other voter profile data.

His team produced the Golden Report, which took support, turnout and persuasion scores, vital data points for the campaign, and turned them into insights that staff could take action on.

As Shor describes it, the Obama data-gathering strategy was “basically this big multilevel Bayesian model that took into account all of the information we had available, public polls, our private polling program, the IDs we were getting on the ground, and [it] synthesized all of that to estimate our probability of winning the election, our probability of winning in every state.”
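To make the idea of synthesizing several sources into one win probability concrete, here is a minimal Python sketch. It is not the campaign's model, which Shor describes as a large multilevel Bayesian model. It swaps in a much simpler normal-normal update for a single state, and the function names, the prior, and the poll numbers are all illustrative assumptions.

```python
import math

def pool_polls(prior_mean, prior_sd, polls):
    """Combine a prior belief with poll results via a normal-normal update.

    polls is a list of (share, sample_size) tuples, with 0 < share < 1.
    Returns (posterior_mean, posterior_sd) for the candidate's vote share.
    """
    precision = 1.0 / prior_sd ** 2          # precision = 1 / variance
    weighted_sum = prior_mean * precision
    for share, n in polls:
        var = share * (1.0 - share) / n      # binomial sampling variance
        weighted_sum += share / var          # larger polls get more weight
        precision += 1.0 / var
    return weighted_sum / precision, math.sqrt(1.0 / precision)

def win_probability(mean, sd, threshold=0.5):
    """P(true share > threshold) under a normal posterior."""
    z = (mean - threshold) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs: a vague prior centered on 50% and two made-up polls.
mean, sd = pool_polls(0.50, 0.05, [(0.52, 1000), (0.51, 800)])
print(f"estimated share: {mean:.3f}, win probability: {win_probability(mean, sd):.2f}")
```

The sketch only accounts for sampling variance. A fuller model along the lines Shor describes would also share information across states and data sources.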

The end result was that Obama won, and the Golden Report’s predictions were within a tenth of a percent of the actual election results. “We got every battleground state down to within a point,” Shor adds. “We called so many people, it was basically no sampling error.”

Why Didn’t Data Get the 2016 Presidential Election Right?

Bad polls get most of the blame, but the truth is a little more complicated. It turns out that trust was at the center of the 2016 election. Many voters simply didn’t trust campaigns and were selective in the information they shared with data collectors and polls. Some didn’t respond at all, which led to a non-response bias.

In 2012, Shor’s team had a 12% response rate on phone calls, trending toward older voters who are more likely to answer the phone.

“Going up to 2016,” Shor says, “we saw a decline in response rates from about 12% to about 0.8%, and so when you're at 12%, you can do a bunch of statistical adjustments. You can ask about age and gender and number of people in the household.”

Polls work when the people answering surveys are statistically interchangeable with the voters who don't answer. That assumption is hard to defend with a 0.8% response rate.
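One common form of the statistical adjustments Shor mentions is post-stratification: reweighting each demographic group's answers by that group's share of the electorate. The Python sketch below uses age as the only variable. The group labels, respondent counts, support levels, and population shares are hypothetical numbers chosen for illustration, not campaign data.

```python
def raw_estimate(respondents):
    """Unweighted support among everyone who answered the poll."""
    total = sum(n for n, _ in respondents.values())
    return sum(n * support for n, support in respondents.values()) / total

def poststratified_estimate(respondents, population_share):
    """Weight each group's support by its share of the electorate."""
    return sum(population_share[group] * support
               for group, (_, support) in respondents.items())

# Hypothetical poll: older voters answer the phone far more often.
# Each entry is (number of respondents, share supporting the candidate).
respondents = {"18-44": (200, 0.60), "45+": (800, 0.45)}

# Hypothetical electorate: both age groups are the same size.
population_share = {"18-44": 0.5, "45+": 0.5}

print(f"raw estimate:            {raw_estimate(respondents):.3f}")
print(f"post-stratified estimate: {poststratified_estimate(respondents, population_share):.3f}")
```

The correction only works if respondents within each group resemble the non-respondents in that group, which is the interchangeability assumption described above.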

Looking back at the Golden Report, the reason it was so accurate was that it was the largest political campaign polling effort at the time. “Since that, I haven't seen anything like it,” Shor says. “I think, in total, we polled about a million and a half people in 2012. By the end of the election, we were doing something like 1,000 people per battleground state per day.”
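To see why that volume matters, the Python sketch below computes the textbook 95% margin of error for a poll as the sample grows. It assumes a simple random sample in which everyone contacted responds, which real polls do not achieve, so it measures sampling error only and says nothing about non-response bias. The figure of 1,000 interviews per day comes from Shor's quote; the function name and the day counts are illustrative.

```python
import math

def margin_of_error(n, share=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(share * (1.0 - share) / n)

# Accumulate roughly 1,000 interviews per state per day, as in Shor's description.
for days in (1, 7, 30):
    n = 1000 * days
    print(f"{days:>2} day(s), n={n:>6}: +/- {100 * margin_of_error(n):.1f} points")
```

Because the margin shrinks with the square root of the sample size, pooling many days of interviews narrows it considerably.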

That wasn’t the case in 2016, and the resulting bad data snowballed into what Shor believes was the real reason no one got the election right.

Sound Campaign Decisions Require Sound Data

Unrepresentative data isn’t dangerous just because it’s inaccurate. It also impacts every decision you base on it.

In 2012, Shor says the Golden Report was “used as an input for basically every single decision that the campaign made... This was really the first time I think [for] any political campaign that basically every single decision that got made first went through the analytics team. And that goes for like big questions, like how much money do you invest in a state, or how much are we spending on mail versus television versus digital?”

In 2016, bad data led the Clinton campaign to pass up investing in states like Wisconsin and Michigan, where the money could have made a difference.

How Campaigns Are Using (or Abusing) Data in 2020

The biggest difference this year is that election campaigns are finding ways to get around social media platforms and connect directly with supporters.

The Republicans

Downloaded 1.4 million times, the Trump 2020 campaign app is designed to scoop up huge amounts of data on each user, including phone number, full name, device ID, email address, zip code, and location data by accessing your phone’s Bluetooth.

Users are encouraged to share the app with their contacts. The app’s News and Social tabs let the campaign selectively feed content to voters, making it a source of both legitimate news and misinformation.

The Democrats

Downloaded 64,000 times, the Biden app also gathers data from users — you’re prompted for your contact list right off — but it’s much less data than the Trump app mines. Instead, the Biden app is designed to build “relational organizing” — a less pushy way to get volunteers to reach out to their networks and persuade voters on a friend-to-friend level.

It was a successful strategy in both Obama presidential campaigns, and more recently in the Bernie Sanders campaign, where volunteers sent out personalized SMS messages.


As you’ll hear in the episode, David Shor sees data and the underlying technologies that analyze data as the crystals in the crystal ball, but it still takes a human to decide what to do with the data.

The lesson from 2016 is that a campaign must collect better predictive data in a non-invasive way and from larger groups of voters. If your polling sample is too small or suffers from non-response bias, you can’t get a solid picture of which parts of the electorate are leaning one way or the other.

And if you’re battling in a swing state, representative data will allow you to decide where to best spend money and volunteer effort in getting the vote out for your side.
