
Benefits of Skepticism: Big Data


Big data is the future of design. Big data is the future of marketing. Big data is empirical. Big data is going to make up for the fallibility of the human mind.

Though this data has a lot of potential applications, it is important to see it for what it is: a big pile of information we’re still trying to figure out how to sort.

Big data is the term for large data sets, “typically consisting of billions or trillions of records, that are so vast and complex that they require new and powerful computational resources to process.”[1] Often, this data is accumulated through computational processes: algorithms, machine learning, etc.

It feels significant because it is an enormous amount of information that seems to sidestep the need for research methods and a designed research study. If you can just access and process the data, you have your research right there.

There are, however, many problems with this usage of big data, and with our utopian view of it. Big data, much like other kinds of data sets, can be incredibly biased. It can be misinterpreted. Its quality varies. It is not as sound or as reliable as we would like it to be.


There are four things we need to consider when talking about big data:


1. The cum hoc ergo propter hoc fallacy

Latin for “with this, therefore because of this,” the phrase names a logical fallacy about correlation. If two variables are correlated, we are often tempted to assume that one caused the other. The vast majority of assumptions made using big data are based on correlation: that one thing causes the other, or is in some way related to it.

For example:

“A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.”[1]

Just because two variables are correlated does not necessarily mean one caused the other. Though that is the case some of the time, it is important to understand that it is not the case all of the time.

We used Google Correlate, which is the algorithm responsible for Google Flu Trends, and looked up a few random words. Google Correlate “finds search patterns which correspond with real-world trends” according to their site. Out of curiosity, we looked up “robots,” which correlates with a variety of things, but our favorite is the phrase “being a girl.”

So, at some point in 2005, a bunch of people were Googling the phrase “robots” and the phrase “being a girl.” If we were to make a cum hoc ergo propter hoc assumption about these variables, we would say that there is something about robots that is like being a girl. Or that being a girl caused us to think about robots.

It’s important to look beyond your data to add context. In early March 2005, Robots, the wonderfully whimsical children’s movie starring the voices of Ewan McGregor, Mel Brooks, and Robin Williams, hit theaters. Around the same time, GAP aired a commercial featuring Sarah Jessica Parker singing a song called “I Enjoy Being a Girl.”

It’s a silly example, but it is something to consider. We can’t assume that correlation is causality, especially not without research and context outside of the dataset.


2. Recency Bias

If 90% of the world’s data was created in the last few years, we have an inherent recency bias in our data.

Recency bias is “the tendency to assume that future events will closely resemble recent experience. It’s a version of what is also known as the availability heuristic: the tendency to base your thinking disproportionately on whatever comes most easily to mind. It’s also a universal psychological attribute.”

The present moment is always the largest dataset, so it has a greater influence on our research than anything in the past. If we’re looking to big data for something predictive, something to tell us how things will be in the future, we need to know what is significant in our present data and filter out what isn’t. We also need to include the past. We cannot predict our future based only on what has happened in the last couple of years.
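A small, hypothetical example shows how this plays out in a calculation. If the number of records grows tenfold every couple of years, a pooled average is dominated by the newest data. Averaging within each year first, then across years, is one simple way to give the past an equal say. The years, counts, and values below are invented, and equal weighting per year is an assumption for illustration; the right weighting depends on the question being asked.

```python
from statistics import mean

# Hypothetical observations keyed by year. Volume grows sharply, the way data
# has piled up in recent years. All values here are invented for illustration.
records = {
    2010: [4.0] * 10,
    2012: [5.0] * 100,
    2014: [6.0] * 1000,
    2016: [9.0] * 10000,
}

# Pooling every record lets the most recent, largest year dominate the result.
pooled = mean(v for values in records.values() for v in values)

# Averaging within each year first gives every year an equal say
# (an illustrative weighting choice, not the only reasonable one).
per_year = mean(mean(values) for values in records.values())

print(f"pooled: {pooled:.2f}, per-year: {per_year:.2f}")
```

Here the pooled mean lands near 8.7, pulled toward the 2016 value of 9, while the per-year mean is 6.0.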


3. Confirmation Bias

Another very human psychological attribute that affects our data is confirmation bias: the “seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a hypothesis in hand.” Like recency bias, it is universal. Everyone does it, whether they are aware of it or not.

“Once we have formed a view, we embrace information that confirms that view while ignoring, or rejecting, information that casts doubt on it. Confirmation bias suggests that we don’t perceive circumstances objectively. We pick out those bits of data that make us feel good because they confirm our prejudices. Thus, we may become prisoners of our assumptions.”

The issue here is that we come to the data with questions. Because big data is far too large to yield a single result the way a designed research study might, we approach it with a question. That question presumably has an answer. We, as people, already assume what that answer is going to be, and we tend to look for data that confirms our assumptions.

This is just a truth of human psychology. We all naturally link the things we want to believe to whatever evidence would confirm them. The problem is our general failure to think critically about data, especially when it’s giving us the answer we want.


4. Data Quality

Data, in the past, was a result of research. Now, the majority of our data comes from private companies who are collecting it without a designed study or a specific goal. It is simply being dug up and piled somewhere. Because of this, it’s hard to tell what data we’re missing. We don’t have a good sense of what we have, let alone what the gaps in the information are.

Research is designed for a reason: to work toward an empirical and well-rounded set of data, to know where it comes from, and to be aware of its accuracies and faults. When we use data gathered at random, we don’t account for what we’re missing or for what its faults are.

For example:

“…consider the Twitter data generated by Hurricane Sandy, more than 20 million tweets between October 27 and November 1. A fascinating study combining Sandy-related Twitter and Foursquare data produced some expected findings (grocery shopping peaks the night before the storm) and some surprising ones (nightlife picked up the day after — presumably when cabin fever strikes). But these data don’t represent the whole picture. The greatest number of tweets about Sandy came from Manhattan. This makes sense given the city’s high level of smartphone ownership and Twitter use, but it creates the illusion that Manhattan was the hub of the disaster. Very few messages originated from more severely affected locations, such as Breezy Point, Coney Island and Rockaway. As extended power blackouts drained batteries and limited cellular access, even fewer tweets came from the worst hit areas. In fact, there was much more going on outside the privileged, urban experience of Sandy that Twitter data failed to convey, especially in aggregate. We can think of this as a “signal problem”: Data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities.”

Big data is often not sorted through or thought about critically. Because it’s simply gathered and stockpiled, it’s full of holes, inaccuracies, and misleading correlations. We must learn to read and scrutinize our data thoroughly. It is a form of literacy we have not developed, because our society has an overarching belief that computation is somehow beyond human fallibility. We forget that the data is curated by algorithms we wrote, and is made up of our own information.


Overall, this is not to say big data isn’t a valuable resource. The potential for its application in a variety of fields is significant. We do, however, need to develop these literacies. We need to be skeptical about our data, where it comes from, and what it’s telling us.

– – – – –

[1] Gary Marcus & Ernest Davis, The New York Times.


In Reluctant Praise of Emoji

Oxford Dictionaries’ Word of the Year for 2015 was a surprise for some of us.

Because it was this: 😂.

We even felt weird putting a period at the end of that sentence. Even though emoji are everywhere (Twitter, Instagram, almost every text conversation we have), it had never occurred to us to think of one as a word.


What is a word?

We didn’t know how to answer this question, so we turned to the dictionary, because that’s where the words live.

Here’s what we found: words are carriers of meaning, they can be written or spoken, and they are the smallest units that can be used independently to convey meaning.

A word is made when a group of people settle on a meaning for it. This doesn’t happen around a boardroom table, but more organically. We all know what hangry means. It makes sense. We share a consensus about it, even though none of us were there when it was decided. It developed naturally, and now we all describe ourselves as hangry from time to time. The point is that the dictionary as an entity didn’t decide it; it emerged from our culture.

What Makes a Word Real?

Anne Curzan, in her talk What Makes a Word “Real?”, asks students to challenge dictionaries, saying that they “are human and they are not timeless.” Editors put dictionaries together. These editors are not simply deciding amongst themselves which words are real; they are trying to keep up with what is happening in our society and stay on top of the evolution of our living language.

Unlike other forms of internet language, emoji are not part of a spoken language. We’ve (almost) all adopted the phrase lol, even to the point of saying it out loud.

However, we can’t say “little face with hearts for eyes” as efficiently as we can say “omg I love that.” But once it’s written down, 😍 goes a lot further than “omg I love that.” Though an emoji can’t be said out loud efficiently, it is a symbol for something that wouldn’t be conveyed as well if it were written out. And thus, going back to our dictionary definition, it is the smallest unit of its meaning.

We have a shared agreement on what (most) emoji mean, they can be written down, and they are the smallest unit of their meaning. Sounds like a word to us.


Marketing with Emoji

Some brands take full advantage of this. GE, Taco Bell, and Budweiser, among others, run social media campaigns that have no problem using emoji. You can order a pizza by tweeting 🍕 to Domino’s. GE has an entire website dedicated to the approval and love of emoji. Here are a few reasons why brands have adopted this new language:

Emoji are (mostly) universal.

😍 means the same thing no matter where you are. There are a few exceptions, but generally speaking, emoji transcend language. In his talk How Language Transformed Humanity, Mark Pagel says, “language is a piece of social technology for enhancing the benefits of cooperation.” Though he’s talking about the development of language generally, the concept applies to how global communication is adapting to the internet. Are emoji the first step toward a global language? Are they a means of enhancing cooperation between people online who are separated by language?

Emoji are concise.

You have only 140 characters (for now!) to convey your point, empathize with your reader, and get them to click on whatever it is you want them to click on. There is no space to waste. Honestly, even if you had all the space in the world, you only have a few seconds before someone is bored and skims down to the next post.

Take a look at GE’s tweet regarding electricity in parts of Africa. Not only does the tweet include an informative image, but GE also used its 140 characters to embed a link and keep its headline in full.

National Geographic has fantastic content, but this tweet probably wouldn’t stop someone in their tracks. It doesn’t necessarily need an emoji. They could use an image to draw the eye of people lazily scanning their feed, in which case they would have to condense the tweet.

Images are more memorable than words.

A study published in the Proceedings of the National Academy of Sciences of the United States of America examined why people remember pictures better than words (words, in this instance, being combinations of letters). Though most of the study is scientifically over our heads, the authors state that “the greater activity in medial temporal cortex during encoding of pictures compared with words suggests that pictures more directly or effectively engage these memory-related regions in the brain, thereby resulting in superior recollection of these items.” Basically, images have more of an effect on our memory and are easier to recall. To put it in the simplest terms, a tweet with an image is likely to be more memorable than a tweet without one.

Emoji are empathetic.

Research shows that your brain reacts to the facial expressions of emoji in the same way it reacts to facial expressions in real life. If you’re looking for a way to humanize your brand and make emotional connections with your online consumer base, emoji might be one of the best tools you can use.

But don’t do it wrong.

Don’t recklessly shove a bunch of emoji into your social media marketing. Communicating solely in emoji, or using emoji in an inappropriate context, can be problematic. For example, in August 2015, Hillary Clinton, or at least her marketing team, tweeted “How does your student loan debt make you feel? Tell us in 3 emojis or less.” Some things cannot be boiled down to simple images; they are more complicated and require more nuance than emoji can provide.

Similarly, a tweet made up only of emoji can be confusing, like a cipher people have to figure out. Chevy put out a marketing campaign that asked its readers to decipher an entire page of emoji. Some people might find this fun, but most are not going to put in the time and effort to decode it. Emoji should make what you’re saying easier to understand, more fun to read, and more memorable, without trivializing it.

All of this is about communication, being able to reach people, to get them to a place where they understand and are eager to engage with you. The more you can be on their level, the better. The more you can convey your humanity, the better. Emoji are another vehicle of communication, potentially a silly one, but one that already functions as a successful part of our language.