Data Tells a Story: counting Africa’s elephants; predicting box office success; sewage stories

African Elephant, Loxodonta africana at waterhole in Mapungubwe

Gather around, kiddies, it’s story time (data story time, that is). This week: counting Africa’s elephants; predicting box office success; and the story of sewage.

How exactly do you count Africa’s elephants?

Fifty researchers and a fleet of small aircraft have teamed up to track the number of African savanna elephants. In recent years, poachers have killed about 100,000 of them. Data is already helping to crack down on poachers; now it will help show how well those crackdown efforts are working.

This effort is a combination of human curation and technology: observers record and photograph the elephants they see from the air while a “data logger” automates the capture of this data, “easing burden on the surveyors, improving accuracy and reducing fatigue.”

Afterwards, the data software goes to work: using an algorithm, it “combines the animal observations made in the air along with factors such as flying altitude, which gives researchers a scientifically sound animal count.”

For additional accuracy, ground survey researchers also track the elephants and cross-check their information with that of the aerial team.

Snowplow tracking apps help hold cities accountable

As cities across the country do battle with another tough winter, people are using apps to see if snowplow efforts are really being made.

The apps use data that’s already available (GPS information already collected to direct plows) and either show “skeptics that plow drivers are working hard” and “not just clearing the streets of the wealthy and well-connected,” or give evidence of “snow-cleanup shortfalls.”

How Hollywood Is Using Social Data to Better Reach Audiences — Or Not

Is there a link between social media buzz and box office performance? The short answer is yes — at least according to a company that has developed “a proprietary social-ranking tracker” that tries to predict box office performance by measuring “social-media conversations” pre-release.

Using data provided by Twitter, the company assigns each movie a social media “buzziness” score on a scale of 1 to 100. It found that a movie from last summer that scored a 98 was also a “big winner” at the box office, while movies that scored in the low 30s did poorly in terms of ticket sales.

Fashion tech startups use data science to build virtual dressing rooms

While online clothes shopping is convenient, fit isn’t guaranteed. As a result, return rates are as much as 50% to 80%, while shopping cart abandonment rates are between 60% and 75%.

Tech startups have attempted to address this issue by using data to create “virtual dressing rooms.” One company creates “reference garments” based on the measurements of consumers’ favorite clothes. Another offers a “data-driven recommendation engine” that leverages data from users as well as clothing manufacturers, while a third startup focuses its recommendations on preferences such as a tighter or baggier fit.

What does Cambridge sewage say about residents? MIT plans to find out

Sewage tells a story, at least according to MIT researchers.

The Underworlds project seeks to address an issue that public officials have long been facing: a lack of metrics. The project’s goal is to collect data on the sewage of Cambridge, and as a result, to learn and predict more about the health of Cambridge residents.

Through a software platform, the collected data — specifically, the presence of viruses, bacterial pathogens, and biomarkers — will be correlated with demographic information such as ethnicity and age.

Such data could give multiple insights: it “could predict epidemics or tell when they’re waning”; demonstrate “the impact of shifts in regulations, such as bans on using trans fat in restaurants”; and help public health officials gain information to fight disease and provide health care more effectively.

[Photo via Flickr, “African Elephant, Loxodonta africana at waterhole in Mapungubwe,” CC BY 2.0 by Derek Keats]

Data Tells a Story: helping supermarkets; saving India’s wild tigers; knowing you better than Mom


It’s time again for our favorite data stories. This week: helping supermarkets; saving India’s wild tigers; and knowing you better than Mom.

Why an obscure British data-mining company is worth $3 billion

A data company owned by Tesco, the British supermarket chain? Actually, it makes perfect sense.

Twenty years ago, “long before web firms started collecting data in order to better understand their users” and better target ads, Dunnhumby launched the Tesco Clubcard loyalty program. Through that program, the then-independent company was able to gather and analyze data about customer buying habits, and as a result Tesco was able to stock “its stores with precisely what customers might want in the future.”

By 2010, Tesco owned Dunnhumby outright, and it is now looking to sell the data-mining company for as much as $3 billion.

Plan to Quit? Big Data Might Tell Your Boss Before You Do

New data-gathering software might be able to tell bosses that an employee is about to leave. The program analyzes employee activity “such as hiring, promotions, relocations, raises and performance review data, and enhances these findings with trends in the industry and region like job postings, shifts in worker demand and standard of living.”

The software improves through user feedback and “offers recommended job changes” once it determines an employee might be looking to leave.

Can Big Data Save The Last Of India’s Wild Tigers?

Scientists have discovered a possible way to use animal poachers’ behavior against them.

A team of ecologists with the Nature Conservation Foundation wrote computer code to analyze 25,000 data points “collected since 1972 across 605 districts — on wildlife poaching crimes, including locations of confirmed tiger poaching instances and sites where tiger parts had been seized.” This data analysis allowed the team to pinpoint 73 “hot spots” that have a “high likelihood of tiger poaching and trafficking in tiger parts,” and also may help to “better place informers on the ground and use cell phone interceptions”; determine “where to target field patrols and forest ranger activities”; and predict “where crime patterns have changed as poachers’ tactics change.”

Facebook data know you better than your own mother

Researchers at the University of Cambridge and Stanford University found that your Facebook likes describe your personality more accurately than your friends and family do. The researchers say this could “boost more targeted marketing” and even “revolutionize how people choose ‘whom to marry, hire, or elect as president.’”

As with many data models, the more Facebook Likes the better. With the average number of Likes (227), the algorithm beat all friends and family members except spouses. But even spouses were no match for the computer when people had 300 or more Likes.

Uber to Hand Over Trip Data to Boston

According to the Uber blog, the ride-hailing company and the city of Boston have entered a “partnership” to “help manage urban growth, relieve traffic congestion, expand public transportation, and reduce greenhouse gas emissions.”

However, Boston City Councilors tell a different story: in the past, they have said they wanted to see data from companies like Uber and Lyft in order “to figure out how to best regulate the services.”

[Photo via Flickr, “Pretty tiger,” CC BY 2.0 by Brian Gratwicke]

Data Tells a Story: data as defense; improving housing safety; improving health care


Welcome to our latest installment of data stories! This week: data as defense against cyber attacks; using data to improve student housing safety; and predictive analytics for better health care.

Is data the new weapon against cyber attacks?

Cybersecurity is in the spotlight today more than ever. While properly trained personnel and security tools play important roles, data is fast becoming “the most important layer of defense,” says Federal Times.

In cyber-defense, data can be used to analyze behavior and look for “anomalies and suspicious activity.” Network traffic data, described by security analysts as “pure gold,” is a sort of “phone bill for activity on a network.” Through network forensics, network traffic data can be analyzed to help investigate “what happened while malicious code was present in a network” and to help prevent further attacks in the future.

Using Big Data to predict criminal risk

Kentucky and 20 other states are using data-driven programs to help determine whether or not a defendant gets out on bail and how long their prison sentence should be.

A variety of factors are analyzed, including charges and criminal history. In the first six months of using this evidence-based approach to sentencing, Kentucky has seen a 15% reduction in crime among defendants on pretrial release.

However, some are concerned that if factors such as “socioeconomic status, neighborhood and education” are included, such “data profiling” could “hurt the poor and minority groups.” On the other hand, Kentucky, for one, doesn’t provide such information to judges for bail and sentencing determinations.

Teacher hopefuls go through big data wringer

School districts are increasingly using consulting firms and screening tools to “slice and dice” teacher applicants into data points. The data points are “fed into an algorithm” that produces a score, which aims to predict each candidate’s likely effectiveness as a teacher.

While some school officials hope such a method will take some subjectivity out of the hiring process, educators are wary. Some say “there’s no magic formula to tell which teachers will succeed,” especially taking into consideration the large differences in “classroom resources, student populations and district expectations across the country.”

However, such screening tools may help schools that are too overwhelmed to thoroughly review every applicant, though a developer of one such tool warns that they shouldn’t be used on their own. They may also help schools avoid relying on teaching attitudes rather than tangible skills, and avoid hiring unqualified but well-connected candidates.

Boston zeroes in on overcrowded student housing

For the first time, Boston housing officials are using housing data to identify properties with “potentially dangerous living conditions,” such as the one where Boston University student Binland Lee died in 2013 “after a fire trapped her in an illegal attic apartment with only one way out.”

Analyzing 25,000 off-campus addresses from 31 colleges, city officials found about 580 properties “that appear to have five or more full-time undergraduate students living together in violation of city rules.” If housing is found to be unsafe, city officials will work with students, landlords, and universities to find alternative housing.

In other data improvements, the city’s upgraded computer systems will now keep better track of “problem landlords” and of complaints about landlords and housing.

Predictive analytics, a potent prescription for health care

Predictive analytics has the potential to improve health care in a variety of areas, from “clinical quality outcomes and financial performance, to the consumer experience and drug discovery.”

One predictive analytics company “uses real-time predictive analytics to identify sepsis, a notoriously difficult-to-diagnose infection that affects 1% to 2% of all hospital patients.” Another organization tries to predict how medical devices such as pacemakers will perform over time by combining data about “physics of the human body” with information about materials used in those medical devices. The goal is to “predict potential failures and speed innovation, while reducing the amount of money spent on clinical trials.”

[Photo via Flickr, CC BY 2.0 by Jason Eppink]

Data Tells a Story: Chinese food on Christmas; Netflix categories; data and the needy


After a short holiday break, we’re back with more of our favorite data stories. This week: data proves Chinese food’s popularity on Christmas; how Netflix comes up with those crazy-specific categories; and how data and technology can help those in need.

Using data to prove that Chinese food on Christmas is the perfect combination

Ben Blatt at Slate worked with the online delivery service GrubHub to figure this out. Due to privacy concerns, GrubHub didn’t give Blatt “the raw volume of orders to different cuisines” but did provide “information about the proportion of total sales enjoyed by different cuisines and how those proportions change on any given day.”

So what about Chinese food on Christmas? Last year on December 25, Chinese food orders increased by 152 percent, while other cuisines “all declined by at least 30 percent.” Christmas Eve is also a big day for General Tso.

The worst day for Chinese takeout? Super Bowl Sunday, when pizza rules.

How Netflix Reverse Engineered Hollywood

Ever wonder how Netflix comes up with those super-specific categories (for instance, Cerebral British Crime TV Dramas is one that fits this blogger to a creepy T)? Basically, they paid a bunch of people to watch films and TV shows, and “tag them with all kinds of metadata.”

But not just any metadata: the taggers use a “36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.” As a result, Netflix has “a stockpile of data about Hollywood entertainment that is absolutely unprecedented.”

So what do Americans like, at least according to Netflix? Movies and shows “about marriage,” that are “romantic,” set in 1980s Europe, and that star Raymond Burr.

Sounds like the next Netflix Original Series.

Big data for urban planning

Data from your cellphone doesn’t just tell how many steps you’ve walked or the restaurants you’ve checked into. Real-time data from mobile networks can help governments make decisions about city planning, public fund allocation, and traffic management, says The Daily Star.

In addition, “understanding people’s regular mobility patterns” can help “model the spread of infectious diseases such as dengue or Ebola” so that limited resources can be allocated to areas that need them the most.

Data exchange helps humanitarians act fast and effectively

Sarah Telford, head of the reporting unit at the UN Office for the Coordination of Humanitarian Affairs (Ocha), was tasked with improving its reporting. There was one small problem: she couldn’t find the data to do the analysis.

Telford threw together a quick solution, and what began as a humble spreadsheet has blossomed into a platform with 90 registered organizations and more than 1,350 datasets, all of which are used to collect and assess data points in at-risk and developing nations.

For instance, you can track the number of Ebola cases in specific countries; the number of people made homeless by natural disasters; and the number of internet and radio services in Liberia, Sierra Leone, and Guinea.

The Savvy Plan to Combat Malaria With Mobile Phones

Malaria No More is a not-for-profit seeking to use cellphone data to communicate with and track those in Africa most at risk for catching malaria, a disease that “kills around 400,000 children every year.” With Africa set to “top 1 billion mobile phone subscriptions” by 2015, CEO Martin Edlund and his colleagues have a treasure trove of data to work with.

Malaria No More is also partnering with the Nigerian startup Sproxil to combat “the massive counterfeit drug market in Nigeria.” Sproxil spent five years placing codes on authentic medications, which purchasers can text to Sproxil for free to verify. To date, Sproxil has about 13.4 million such data points, which are invaluable to organizations like Malaria No More because they tell “where illnesses are occurring and how they’re being treated.”

[Photo via Flickr, “Salt and Peppered Pork,” CC BY 2.0 by Samson Loo]

Data Tells a Story: gender gap; 21st century policing; colleges learn from Amazon


Welcome to the latest installment of Data Tells a Story, in which we round up our latest favorite data stories. This week: the data gender gap; using data to grow the perfect Christmas tree; and how colleges are like Amazon.

How Self-Tracking Apps Exclude Women

Apple Health, a “comprehensive” app that lets users track “everything from calories to electrodermal activity to heart rate to blood alcohol content to respiratory rate to daily intake of chromium,” is missing an important component: it doesn’t track menstruation.

Many of the menstruation and fertility apps that exist now are also lacking in that they focus on moods (“men want to know when their girlfriends are going to be grouchy”) and approach getting pregnant “like a level in a video game.” Some women developers are creating tracking apps to meet their own needs and address the gaps seen in apps like Apple Health.

Hillary Clinton spurs ‘gender data revolution’

Hillary Clinton also recently addressed this gender gap in data gathering and proposed a “gender data revolution” in order to paint a fuller “picture of the lives of women and girls” and “make the case for why public policies around the world need to change.”

For instance, in India “only 6% of women were officially counted as employed,” but “after further research, it was discovered that women do six hours of unpaid work on average outside of the traditional economy every day.” Bringing these women “into the paid economy at the same level as men,” says Fortune, would increase India’s GDP by $1.7 trillion.

In 2012, Clinton announced the Data2X initiative, which has the goal of using “data to advance gender equality and women’s empowerment.”

Can Big Data Help Build Trust in the Police?

After the grand jury decision not to indict a police officer in the shooting death of an unarmed black teen, President Obama announced the creation of a Task Force on 21st Century Policing. Newsweek suggests that in order for the approach to be “truly ‘21st Century,’” data must be used to improve performance, and “the task force needs to recommend ways to collect data about various aspects of police operations and provide open access for rigorous public examination.”

Big Data and the Science of the Christmas Tree

What’s the best way to grow a Christmas tree? Let big data tell you.

Researchers at the University of Connecticut and other partner universities are developing software “that will connect genetic, physical, and environmental data housed in more than 15 major plant databases.” The collection and analysis of such data not only benefits crop science but can also help “address important ecological issues like reforestation and climate change.”

Combining this data with information obtained via drone technology can help scientists begin to understand important questions such as the effect of climate change on a forest’s biodiversity.

Here’s the New Way Colleges Are Predicting Student Grades

About 125 colleges and universities around the U.S. are using “the performance data of former students to predict the outcomes of current ones.” Using a process similar to that employed by Amazon and Google to predict consumers’ purchasing behavior, schools have “seen impressive declines in the number of students who drop out, and increases in the proportion who graduate.”

The payoff, says TIME, “goes beyond graduation rates.” Students who stay keep paying tuition, and schools can avoid the cost of recruiting new ones, which is about “$2,433 per undergraduate at private and $457 at four-year public universities.”

[Photo via Flickr, “Power & Equality,” CC BY 2.0 by Steve Snodgrass]

Data Tells a Story: a lefty wage gap; resisting arrest; tracking our feelings


Last week we launched a new series, in which we round up five of our latest favorite data stories. This week: an unexpected wage gap, a troubling police statistic, and tracking data once more, with feelings.

Study: Left-handed people earn 10 percent less than righties

A new study from Joshua Goodman of Harvard’s Kennedy School of Government argues that “left-handed people not only earn significantly less but do so in part due to lower cognitive abilities.”

The data showed that “lefties have annual earnings around 10 to 12 percent lower than those of righties.” Goodman attributes this gap to “observed differences in cognitive skills and emotional or behavioral problems,” and not physical differences, “since lefties tend to do more manual work than right-handers.”

Vox goes on to say that left-handed people “in the UK and US are 3 to 4 percentage points more likely than righties to be in the bottom decile of scores on math and reading tests”; are sometimes “also shown to have a greater likelihood of speech problems and learning disabilities”; and in the U.S., were slightly less likely than their right-handed counterparts to graduate from college.

However, left-handedness isn’t the cause of these issues, says Vox, but might be “a proxy for other issues,” such as “lower birth weight and complications at birth” which have been associated with left-handedness.

The Incredible Shrinking Incomes of Young Americans

According to a recent analysis of the Census Current Population Survey, the median wage for those between 25 and 34 has “fallen in every major industry except for health care” since the Great Recession began in 2007.


The Atlantic surmises a reason is that the Great Recession “devastated demand for hotels, amusement parks, and many restaurants,” and “as the ranks of young unemployed and underemployed Millennials pile up, companies around the country know they can attract applicants without raising starter wages.” In addition, middle-class jobs have been gutted by companies “sending work abroad or replacing it with automation and software.”

As for why health care wages haven’t fallen, The Atlantic asserts that this is because “demand for medical services is dominated by the government,” such as Medicare and Medicaid, and that the government “doesn’t face the same vertiginous up-and-downs as the rest of the economy.”

One Troubling “Resisting Arrest” Statistic Reveals a Major Problem with Police Departments

According to an analysis by WNYC, half of all “resisting arrest” charges in New York are generated by only 15% of New York Police Department officers. An even smaller group, just 5% of officers, accounts for a whopping 40% of all such charges.

A University of Nebraska accountability consultant told WNYC:

There’s a widespread pattern in American policing where resisting arrest charges are used to sort of cover — and that phrase is used — the officer’s use of force. Why did the officer use force? Well, the person was resisting arrest.

Eric Garner is said to have resisted arrest, resulting in the deadly chokehold applied by NYPD officer Daniel Pantaleo.

Fitbit data is being used as evidence in court

Last month saw what appears to be the first case of data from a personal fitness tracker being used in court. A Calgary woman claiming personal injury is using her Fitbit data to “show how her activity levels have declined since the accident,” says the Verge.

The Verge suggests that this “represents a new and unexpected use for personal data,” that the same data could be used “to establish or disprove a defendant’s alibi in a criminal case,” and that data could easily be obtained by subpoena.

Apps Are Getting All Emotional

If tracking your weight, food, exercise, and sleep isn’t enough, you may want to consider tracking your feelings. A new wave of apps does just that, whether by asking you to input data, collecting other data (such as Facebook moods), recognizing facial expressions, or measuring brainwave patterns.

But having our emotions tracked is nothing new, as Fast Company notes. For instance, Spotify “can correlate a user’s mood with the kind of music they listen to,” and Facebook can (creepily) read “the emotional content of users’ news feeds and tweak them accordingly,” which is a whole other data story.

[Photo: “Robot heart goes BEEP!” CC BY 2.0 by Sean McMenemy]

The Vicious Content Cycle and How to Break Out of It

In the past, the business model for publishing was perhaps more straightforward than it is today. Consumers paid for content via subscriptions. Advertisers paid for space according to predictable factors such as size, frequency, and positioning.

However, with the advent of the Internet came a whole new, digitized world. With social media, blogging, and website platforms, almost anybody can create content (whether or not it’s any good), resulting in a bonanza — or glut, some might say — of free content.

With so much free content available, more and more readers have turned away from subscriptions and paywalls, and as a result, traditional publishers have struggled to monetize. Some, like The New York Times, use a “freemium” model, giving away some content for free and charging for full access. Others, however, might find it a challenge to attract readers willing to pay. Hence, the ad revenue model and the Vicious Content Cycle.

The Vicious Cycle


Vicious Content Cycle

Most of us know what a vicious circle is: you go through a series of steps that leads you back to the first step, with unfavorable results. Some might argue an ad revenue model based on clicks is a kind of vicious content circle.

The current content model of new media and digital publishers, says FastCo Labs, plays by “old rules from old media.” In the past, the placement of print ads involved a lot of guesswork, with no way to quantify the effect.

Because of this, “media companies priced units to advertisers using CPMs that had no functional basis in reality,” and now this “process has migrated to digital media, even though in digital there is much more data available about readers and their reading habits.”

Why Publishers Care About CPMs

CPM (sometimes called CPI) stands for cost per mille, mille being Latin for “thousand”: it is the price of 1,000 ad impressions. For print media, audience data is often unavailable, so the number of impressions has to be estimated. For digital media, however, audience data is available, so page views are often used to count impressions.

So why do publishers care about CPMs? Because advertisers pay according to how many times their ads are displayed. Each ad display counts as an impression, so a single view of a page carrying several ads results in several impressions.

Therefore, to make money from advertising, publishers must either glut their pages with ads or get as many page views as possible.
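The arithmetic behind those two levers is simple enough to sketch. The Python below is a minimal illustration, assuming a flat CPM and a fixed number of ads on every page; the $5 CPM and traffic figures are hypothetical, not numbers from FastCo Labs or any publisher. It shows that, under CPM pricing, revenue only grows by adding ads per page or adding page views.

```python
def ad_revenue(page_views: int, ads_per_page: int, cpm: float) -> float:
    """Revenue from CPM-priced ads (simplified model, assumed flat rate).

    Each ad shown on each page view counts as one impression, and
    advertisers pay `cpm` dollars per 1,000 impressions.
    """
    impressions = page_views * ads_per_page
    return impressions / 1000 * cpm


# Hypothetical figures for illustration only.
baseline = ad_revenue(page_views=100_000, ads_per_page=2, cpm=5.0)
more_ads = ad_revenue(page_views=100_000, ads_per_page=4, cpm=5.0)     # glut the page
more_views = ad_revenue(page_views=200_000, ads_per_page=2, cpm=5.0)   # chase clicks

print(baseline, more_ads, more_views)  # 1000.0 2000.0 2000.0
```

Note that nothing in this formula rewards a reader who actually pays attention to the page, which is exactly the gap the rest of this post is about.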

But what do their audience members want to read? Most publishers aren’t quite sure. Through data management platforms (DMPs), they might have only some broad information, such as gender, age, and location. However, not all 45-year-old women in Chicago, Illinois, have the same interests. One might love knitting; another might prefer nunchucks.

As a result, creating content might involve the same type of guesswork that placing print ads did in the past, and creating content by guesswork could result in less engaged audience members.

The Importance of Engagement

Engagement is “the level of an audience member’s interaction with, and attention to, a publisher” and its content. An audience member with high engagement, says FastCo Labs, is more valuable than one with low engagement because the high-engagement audience member is “paying more attention” to whatever is on the page.

Moreover, engagement isn’t just impressions or page views. It also includes dwell time, total content consumed, and the number of times content is shared. The logic is that audience members spend more time with, and share, content they’re really interested in.

Not All Audience Members Are Created Equal

While all audience members are valuable, some, quite frankly, are more valuable than others, depending on their levels of engagement with content.

FastCo Labs suggests three levels of content engagement (high, medium, and casual) based on factors such as the amount of time spent on the content item and other user actions like printing the item, bookmarking it, or sharing it on social channels.

With limited resources, a publisher might want to focus on audience members with the highest level of engagement.
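To make the three tiers concrete, here is a minimal Python sketch of how a publisher might bucket individual visits. The `Visit` record and the thresholds (two minutes of dwell time plus an action for “high,” one minute or any action for “medium”) are hypothetical assumptions chosen for illustration; FastCo Labs does not publish exact cutoffs, and a real system would tune them against its own data.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One reader's interaction with one content item (hypothetical schema)."""
    dwell_seconds: float
    printed: bool = False
    bookmarked: bool = False
    shared: bool = False


def engagement_level(visit: Visit) -> str:
    """Bucket a visit into "high", "medium", or "casual" engagement.

    Thresholds are illustrative assumptions, not FastCo Labs' criteria.
    """
    # Count the active signals: printing, bookmarking, sharing.
    actions = sum([visit.printed, visit.bookmarked, visit.shared])
    if actions >= 1 and visit.dwell_seconds >= 120:
        return "high"      # read at length AND acted on the item
    if actions >= 1 or visit.dwell_seconds >= 60:
        return "medium"    # either read for a while OR acted on it
    return "casual"        # brief visit with no action


visits = [
    Visit(dwell_seconds=300, shared=True),  # long read, then shared
    Visit(dwell_seconds=90),                # read for a while, no action
    Visit(dwell_seconds=10),                # bounced
]
print([engagement_level(v) for v in visits])  # ['high', 'medium', 'casual']
```

Counting how many readers land in the “high” bucket for each topic would then tell a publisher where its most valuable attention is concentrated.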

The Importance of Segmentation

Perhaps more obvious than engagement segmentation is segmentation by interest. But knowing audience members are interested in Fashion, Technology, or Sports may not be enough.

A reader interested in shoes may not care about earrings. Someone into gaming might not give a flying fig about Microsoft, and a football fan may be taken aback to be lumped with basketball lovers.

Knowing the audience’s true interests can help a publisher target its content, not just by interest but amount as well. Why produce 50 articles a day about television when audience members are only reading 10?

The Virtuous Cycle

Virtuous Content Cycle

Publishers can break out of a vicious content cycle and get into a virtuous one by increasing and improving engagement with their audience members. But of course this is easier said than done. In the next several posts, we’ll be exploring how publishers and other content creators are engaging with their audiences.

Can’t wait to find out more? Contact Reverb at feedback AT helloreverb DOT com.