Data Tells a Story: recruiting better talent; surfacing fraud; data mining chicken tikka masala


Thank God it’s Data Stories Friday! This week: recruiting better talent; surfacing fraud; and data mining chicken tikka masala.

Speaking of data stories, publishers and content owners can now see the stories their own data tells via our newest product, Reverb Insights. Find out more!

Recruiting Better Talent With Brain Games And Big Data

While employers may feel they have plenty of data points to judge a potential hire — years of experience, number of previous employers, last salary — in the end they may still be relying on a hunch about the perfect fit.

Some companies, says NPR, are offering the idea “that new behavioral science techniques” — in combination with data analytics — “can give employers more insight into hiring.”

Although intelligence and personality tests have been around for a hundred years, big data allows for the creation of “more nuanced tests” that can possibly better gauge personality traits that may “help increase productivity and reduce turnover.”

In one example, potential hires play an Angry Birds-type game while, behind the scenes, the company collects thousands of data points such as level of pressure, intensity, and level of challenge, since gameplay potentially correlates with how people think and work.

LinkedIn Starts Using Its Data to Sell Ads Across the Web

LinkedIn recently released a new product that allows marketers to target segments of LinkedIn’s audience and serve them ads, not just on LinkedIn but across the web.

LinkedIn follows in the footsteps of other social media companies such as Facebook, which is using login data “to help marketers connect user identities across desktop and mobile,” and Twitter, which recently launched a product “that allows its advertisers to serve targeted, Promoted Tweets to users of Flipboard and Yahoo Japan.”

LinkedIn offers the most precise targeting, “allowing marketers to pick and choose exactly the type of professionals they want to target out of its 347-million-strong network.”

Finally, a decent use for big data: Weeding out crooked City traders

In the banking hub of London, pinpointing malpractice by City traders is a priority. However, according to the Market Practitioner Panel, current methods “of monitoring for illegal trading practices, such as ‘key word surveillance’,” are flawed, and big data technology may provide a longer-term solution.

One company suggests a method called predictive coding, which “goes beyond simply looking for key words” and identifies “irregularities in patterns of behaviour,” such as communication through “unofficial channels”; working unusual hours; or missing mandatory “block leave,” or vacation.

Bringing Big Data to the Fight Against Benefits Fraud

State and city agencies have begun enlisting data companies to help root out fraud. For instance, the NYC Human Resources Administration had “data detectives” run benefit recipients through a “computerized pattern-recognition system,” which surfaced a small percentage of anomalies.

For instance, one individual had received more than $50,000 in Medicaid benefits, which raised a red flag since “most families of similar size and income typically received multiple benefits — like health coverage, food stamps and cash assistance.”

After a multisource data analysis, the data detectives found “that the family had underreported its assets,” including an electrical contracting business, three residential properties, and bank accounts with more than $100,000.

After this kind of systematic “multisource data analysis” was implemented, staff members identified $46.5 million in fraud through almost 30,000 investigations, compared with only $29 million through 48,000 investigations previously.
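The gain is easier to see per investigation. Here is a quick back-of-the-envelope calculation in Python, using only the figures quoted above and treating “almost 30,000” as exactly 30,000, which is an assumption:

```python
# Figures quoted above; "almost 30,000" is rounded to 30,000 (an assumption).
after_dollars, after_cases = 46_500_000, 30_000
before_dollars, before_cases = 29_000_000, 48_000

# Fraud identified per investigation in each period.
after_yield = after_dollars / after_cases
before_yield = before_dollars / before_cases

print(f"With multisource analysis: ${after_yield:,.0f} per investigation")
print(f"Comparison period: ${before_yield:,.0f} per investigation")
print(f"Improvement: {after_yield / before_yield:.1f}x")
```

Under that rounding, each investigation surfaced roughly $1,550 in fraud versus about $604, or around 2.6 times as much from far fewer investigations.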

Data Mining Indian Recipes Reveals New Food Pairing Phenomenon

Some chefs have long believed in the food pairing hypothesis, that “ingredients that share the same flavors ought to combine well in recipes.” But while that practice is common in North American and Western European cuisines, does it apply to food from other parts of the world?

Researchers at the Indian Institute of Technology wanted to find out. By data mining more than 2,500 recipes “from eight sub-cuisines, including Bengali, Gujarati, Punjabi, and South Indian,” they found that the recipes contained 194 different ingredients, with an average of seven but as many as 40 for more “royal” dishes.

What the data told the researchers was that negative food pairings — that is, the combination of ingredients that have dissimilar flavors — dominate Indian cuisine, and that “the strength of this negative correlation is much higher than anything previously reported.” They also found that the addition of certain spices such as coriander, tamarind, ginger, and cinnamon “make the negative food pairing effect more powerful.”
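To get a feel for what “negative food pairing” means, here is a minimal Python sketch. It scores a recipe by the average number of flavor compounds its ingredient pairs share, so a low score means the ingredients have little in common. The compound sets are made-up placeholders, not data from the study, and the scoring function is an illustration rather than the researchers’ actual method.

```python
from itertools import combinations

# Hypothetical flavor-compound sets, invented for illustration only.
FLAVOR_COMPOUNDS = {
    "butter": {"c1", "c2", "c3", "c4"},
    "cream": {"c1", "c2", "c3", "c5"},
    "coriander": {"c6", "c7"},
    "tamarind": {"c8", "c9"},
}

def average_shared_compounds(recipe):
    """Mean number of flavor compounds shared across all ingredient pairs."""
    pairs = list(combinations(recipe, 2))
    if not pairs:
        return 0.0
    shared = [len(FLAVOR_COMPOUNDS[a] & FLAVOR_COMPOUNDS[b]) for a, b in pairs]
    return sum(shared) / len(pairs)

# A "positive pairing" recipe versus a "negative pairing" one.
similar = average_shared_compounds(["butter", "cream"])
dissimilar = average_shared_compounds(["butter", "coriander", "tamarind"])
print(similar, dissimilar)
```

In this toy data the butter-and-cream pairing shares three compounds, while the spiced recipe shares none, which is the kind of pattern the researchers describe as dominating Indian cuisine.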

All of this could potentially lead to the creation of even more novel Indian dishes.

[Photo: “The Host,” CC BY 2.0 by SteFou!]

Reverb Insights is here! User and content analytics at your fingertips

We have exciting news! Our latest product, Reverb Insights, a user and content analytics platform, is now available.

Now large and small content owners alike can take the guesswork out of how much and what type of content to produce for their audiences, and instead can make informed decisions based on data science. As a result, audience members stick around, engagement increases, and more monetization opportunities arise.

“It’s not rocket science,” said Mike Maples, Founding Partner, FLOODGATE. “Less engaged audience members will be more likely to leave (your site) and less likely to return. Giving users content they actually want will keep them on your site.”

So how does Reverb Insights work? “We have developed a unique approach to language,” said Tony Tam, founder and CEO, “allowing us to understand people through what they read, watch, or buy online.”

More specifically, the same technology that brought you the Reverb News App and Reverb for Publishers has analyzed 50 million words and ideas from over 600 million users. As a result, Reverb Insights has a sophisticated understanding of what people are interested in, from broader ideas such as Business & economics to narrower ones like the Startup industry:

In addition, Reverb Insights allows content owners to gauge how their content is performing by showing top performing content by page views; what’s trending, and, more importantly, what’s driving those trends; and by comparing the amount of content produced versus the amount consumed by their readers.

Audience + Content Comparison

A publisher can see, for instance, that while they’re producing a lot of content on National politics, their audience members are actually more interested in Political talk shows.

But Reverb Insights doesn’t just understand what users want now, explained Tony. “Based on the content they’ve already consumed,” he said, “we can predict what they’ll want in the future.”

Want to learn more about Reverb Insights? Check out this short video demo, or you can request a live demo by sending an email to

Data Tells a Story: the Oscar race; call center matchmaking; language data tells a happy story


It’s time once again to explore the stories that data tells us. This week: data analysis and the Oscars; playing matchmaker in call centers; and, in language, data tells a happy story.

For Your Consideration: How Data Analysis Was Used in Oscar Race

A Los Angeles-based company is using data to help film production companies narrowly target subtle messages about their Oscar-nominated movies to voting Academy members, as well as those in their social networks.

The company’s other service, “a fan-made content-publishing platform,” provides them with access to 2.5 million users. From there, the company “creates a larger look-alike segment” which they can target online.

To figure out the best way to target Academy members, the company built “models based on demographic and geographic data known about the approximately 6,000 Academy members.”

Doctors store 1,600 digital hearts for big data study

Scientists in London hope to shed more light on the relationship between genetics and heart disease through the 1,600 digital hearts that they have beating on their computers.

The 3D videos of the hearts, provided by volunteers, along with genetic information should provide much more information than normal clinical trials, “in which relatively small amounts of health information is collected from patients over the course of several years.”

With so much data on so many hearts, scientists and doctors can compare the data “to see what the common factors are that lead to illnesses.”

Big Data: Matching Personalities In The Call Center

Anyone who has called a customer support center knows that the conversations are often recorded “for training and quality purposes,” but a lot more could be done with the data beyond training and quality.

Using information such as behavioral factors, distress signals, and the words callers use, one company has developed “personality-based applications” for call centers to determine if a caller is “outgoing, sarcastic, serious…shy,” etc., and to match them with the most well-suited representative.

For example, if a caller is classified as “difficult,” they might be matched with a rep known for their ability to adeptly handle challenging customers. If a caller is deemed more introverted, they might be paired with a rep accustomed to drawing out those who are more reticent.

City Governments Are Using Yelp to Tell You Where Not to Eat

Yelp is partnering with city government data agencies to better inform the public about restaurant hygiene.

Besides posting hygiene scores near reviews (it was found that “restaurants whose low hygiene ratings are posted on Yelp tend to respond by cleaning up and performing better on their next inspections”), user review platforms can help improve restaurant hygiene in other ways.

For example, all of Yelp’s reviews and ratings — “some of which contain telltale words or phrases such as ‘dirty’ and ‘made me sick’” — can be merged “with the history of hygiene violations” and fed into an algorithm “that can predict the likelihood of finding problems at reviewed restaurants.” As a result, health inspectors can be “allocated more efficiently” and health departments can more efficiently use their resources.
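As a rough illustration of the idea, the Python sketch below combines telltale phrases in reviews with a restaurant’s violation history to decide which places to inspect first. The scoring rule, the restaurant names, and the sample data are all invented for this example; the actual algorithm is not described in the article and is certainly more sophisticated.

```python
# Telltale phrases quoted in the article; a real system would likely learn these.
TELLTALE_PHRASES = ("dirty", "made me sick")

def risk_score(reviews, past_violations):
    """Toy score: telltale-phrase mentions plus prior hygiene violations."""
    mentions = sum(
        review.lower().count(phrase)
        for review in reviews
        for phrase in TELLTALE_PHRASES
    )
    return mentions + past_violations

# Invented example data: (reviews, number of past violations).
restaurants = {
    "Cafe A": (["Great food!", "Lovely staff."], 0),
    "Diner B": (["The tables were dirty.", "It made me sick."], 2),
    "Grill C": (["A bit dirty but tasty."], 0),
}

# Highest-risk restaurants come first in the inspection queue.
inspection_order = sorted(
    restaurants, key=lambda name: risk_score(*restaurants[name]), reverse=True
)
print(inspection_order)
```

Sorting by score is one simple way to put the likeliest problem spots at the top of an inspector’s list.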

Languages Are Mostly Made of Happy Words

While it seems, especially on the internet, that language is dominated by negativity, a recent data analysis has actually shown the opposite. After examining “100,000 words across texts in 10 different languages,” researchers found “a universal positivity bias.”

Universal positivity bias was first posited in a 1969 paper called “The Pollyanna Hypothesis” (a Pollyanna is a “foolishly” optimistic person), which showed that high school boys across different cultures more often rated words as positive than negative.

In this most recent study, researchers analyzed words from a variety of sources including Google Books, Twitter, subtitles from movies and shows, and music lyrics, and across several languages. Native speakers rated “how negative or positive they felt upon hearing those words,” and “in every language, on every platform, the median happiness score was higher than five—five being a totally neutral word.”
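For readers curious what a “median happiness score” looks like in practice, here is a small Python sketch. The words and ratings are invented, and the 1-to-9 scale is an assumption; the article only tells us that five is neutral.

```python
from statistics import median

# Invented ratings on an assumed 1-9 scale, where 5 is neutral.
ratings = {"laughter": 8.5, "food": 7.4, "the": 5.0, "traffic": 3.8, "rain": 5.6}

# The median is the middle value once the scores are sorted.
median_score = median(ratings.values())
print(median_score, median_score > 5)
```

A median above five means more than half of the rated words lean positive, which is what the researchers reported for every language and platform they examined.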

The happiest languages? Spanish and Portuguese.

[Photo via Flickr, “Happiness,” CC BY 2.0 by Caleb Roenigk]

Data Tells a Story: NYC subway; commuting; Valentine’s Day


It’s time again for our weekly series! This week in favorite data stories: don’t think too much about the subway; bettering urban commutes; what data tells us about Valentine’s Day.

NYC subway germs reflect their neighborhoods

Like Cambridge’s sewage system, New York City’s underworld also tells a story, but instead of sewage, the source is the subway.

Researchers from Weill Cornell Medical College have collected 10 billion bits of data in the form of DNA fragments from surfaces in NYC subways, subway stations, parks, and a waterway. The data serves “as a mirror or echo of people who move through that station,” but also tells another story, that of Hurricane Sandy in 2012. In one station that was flooded at that time, researchers found “germs linked to marine life and Antarctic environments,” most likely swept in by the storm.

These findings are important because they “establish the first city baseline of microbial life,” which can help “detect strong changes that may determine if there is anything at all threatening,” such as disease or bioterrorism.

Urban Engines: Using data to power a better commute

Most commuters know the trek in and out of the office can be challenging and unpredictable. This is partly because, says Computer World, cities and counties have only “the barest sense of what routes are in high demand and how outside events affect traffic.” In addition, data is still collected manually (think people with clipboards and clicker counters).

One analytics startup is tackling the commuter challenge by “taking an algorithmic approach to public transit” and using data already generated every day “to deliver a better, deeper view into cities and how people move around in them.”

Based on information provided by transit agencies (for example, data generated by “tap-in/tap-out” cards), the startup can devise a working model of a city “that gives real-time visibility into what’s going on,” surfacing instances of delays, overcrowding, peak times, and more.

Starwood Hotels Using Big Data to Boost Revenue

Starwood Hotels and Resorts have huge stores of data, and one of their challenges has been how best to leverage that data to better target markets and boost revenue.

One area of focus has been “revenue optimization,” which essentially means using data to figure out the best time to push promotional offers, when to raise or lower prices, and how long to hold them. Starwood also uses property and customer data “to help tailor offers to specific customers while they’re on the property,” as well as before and after their stay.

On Valentine’s Day be happy if you have friends, data suggests

Data from the Office for National Statistics in the UK has shown a positive correlation between number of real-life friends and well-being in general, “even after controlling for income, demographic variables and personality differences.”

The Guardian suggests this is especially important during “this of all weeks” as previous ONS data has shown that “unmarried adults were in the majority in England and Wales for the first time.”

Forget Valentine’s Day, we buy chocolate all year

While Valentine’s Day may seem like the quintessential flower-and-chocolates holiday, data tells a different story: neither chocolates nor flowers have peak sales in February.

A data analytics firm’s e-commerce platform, which “tracks the online shopping habits of more than 5 million people,” shows that people crave — and buy — chocolate throughout the year, and that while flower buying spikes for Valentine’s Day, it does even more so on Mother’s Day.

One theory is that while flowers on Mother’s Day have few substitutes, there’s a wider selection of gifts for Valentine’s Day, from stuffed animals to jewelry to romantic getaways.

[Photo via Flickr, “nyc subway mix,” CC BY 2.0 by Colin Mutchler]

Data Tells a Story: parsing parrots; helping hockey fans; the Super Bowl and sales calls


It’s time again to explore how different organizations are using data to learn, improve, and predict. This week: parsing parrots, helping hockey fans, and why salespeople should be glad the Patriots won.

Parrot Pecking Order Hints at Humans’ Social Lives

Who’s top bird, and why? That’s what Elizabeth Hobson, a researcher at the National Institute for Mathematical and Biological Synthesis, wants to find out about parrots, and she’s using data to get the answers.

While understanding why some animal species exhibit “complex social structures” is a long-standing interest in biology, says Hobson, there’s a lack of standardized methods “to define or quantify levels of social complexity.” Hobson wants to help develop those methods.

The advantage collecting and analyzing data provides over observation alone is that “as an observer, it’s almost impossible to pick out the really subtle patterns in the data.” Only upon quantitative analysis do patterns begin to emerge. For instance, while birds at the very top and bottom of the hierarchy are easy to pick out, it’s more difficult to discern the hierarchy within the “middle-ranked birds.”

Quantitative methods can determine “the full rank order” of the entire group, and provide a fuller, more accurate understanding of the group’s social structure as a whole.

NHL, Sportvision test program to track players, puck

A company has developed a way to digitally track the fast-moving game of hockey.

Through tracking chips inside pucks and tracking devices inside players’ jerseys, information can be collected “at a rate of 30 times per second.” This wealth of data can be mined to help coaches better understand the game by closely examining different aspects such as “what happens with the penalty kill against a power play in certain formations.”

It can also provide “greater accuracy” for statistics on concepts such as “puck possession, zone entries and quality of competition.”

Most of all, however, the company wants to help fans at home follow players more easily as they quickly come on and off the ice.

A Head Coach Botched The End Of The Super Bowl, And It Wasn’t Pete Carroll

It was the play that sent Super Bowl fans into a frenzy: instead of giving the ball to Seattle Seahawks running back Marshawn Lynch (aka The Beast Mode), quarterback Russell Wilson “attempted a doomed pass that [Patriots’ cornerback] Malcolm Butler intercepted” — and that decided the game.

It has been called “the worst play call…in the history of football,” and head coach Pete Carroll has been accused of “botching” the Super Bowl for the Seahawks. However, FiveThirtyEightSports begs to differ.

Using the Advanced Football Analytics’ Win Probability Calculator, FiveThirtyEight figured out that “under the most pro-Beast set of assumptions, rushing may have been the better play but by the slimmest of margins (0.3 percentage points).” In addition, while many have thought “the risk of throwing an interception was too great,” this season quarterbacks threw, on the one-yard line, 66 touchdowns with just one interception: Wilson’s.

It was Patriots’ head coach, Bill Belichick, who made the worst play call of the game, according to FiveThirtyEight. Instead of calling a timeout “after Lynch ran 4 yards to set up second-and-goal at the 1,” the Pats let the clock run, “as if head coach Bill Belichick psychically knew the Seahawks would muck it up.”

Luckily for New England — and unluckily for Seattle — they did.

Here’s the data that proves Patriots fans take losses harder than everyone else

Salespeople in New England can thank the football gods that the Patriots won the Super Bowl this year: the last time they lost — in 2012 to the NY Giants — their home region showed a 45% decline in accepting sales calls after the Super Bowl, the biggest “among all losing team regions in the past.”

Losing regions showed on average a 25% decrease while winning regions saw a 32% increase in accepting sales calls. It takes about three weeks for contact rates to return to normal after the Super Bowl.

Tracking 125,000 Incidents Of Global Terrorism

The Global Terrorism Database (GTD) at the University of Maryland at College Park includes information on “58,000 bombings, 15,000 assassinations and 6,000 kidnappings going back to 1970.” This authoritative source “on both the history and dispersion of terrorist attacks” allows researchers “to compare domestic terrorism to transnational attacks and to follow terrorist organizations over time.” Such data is integral to understanding the true scope and frequency of terrorist attacks around the world today.

For instance, the data shows that the vast majority of terrorist attacks are “concentrated in a small number of countries where thousands of people die every year in these incidents,” and that while “half of terrorist attacks claim no lives,” terrorism fatalities still “number in the tens of thousands globally.”

The University of Maryland researchers understand the what of terrorism, but now they want to understand the why. They’ve compiled a set of “more than 1,500 people radicalized to violent and non-violent extremism in the United States since World War II,” and have divided that set into three categories, Islamist, Far Right, and Far Left.

Their preliminary findings suggest that Islamists stand out in some ways — they’re more likely to be young, unmarried, “unassimilated into American society,” and “to be actively recruited to an extremist group” — but in other ways are similar to other extremists. For instance, they “were equally likely to have become radicalized while serving time in prison,” and to have psychological issues, be loners, or “have recently experienced ‘a loss of social standing.’”

Perhaps once researchers understand the why behind such heinous acts, they can help determine the how behind stopping them.

[Photo via Flickr: “Parrots,” CC BY 2.0 by Chris]

Data Tells a Story: Twitter rage and heart health, consumer behavior and creditworthiness, preventing homelessness


Once upon a time, there was some data and the data told a story. This week’s Data Stories: angry tweets and heart disease, consumer behavior and creditworthiness, and preventing homelessness.

Can Angry Tweets Predict Heart-Disease Rates?

A recent study has found that Twitter sentiment is a better predictor of heart disease than “traditional” health and demographic variables such as age, race, and smoking and drinking behavior.

Using “148 million county-mapped tweets across 1,347 counties,” along with mortality statistics from the CDC, researchers discovered that in predicting the prevalence of heart disease mortality in a given county, a model based only on Twitter sentiment “performed slightly but significantly better than a model using all the classic predictors like age, race, and smoking and drinking behavior.”

So what do a 30-year-old’s angry tweets have to do with someone who has died from a heart attack? As New York Magazine says, “communities affect individuals.” Moreover:

Epidemiological studies have found that the aggregated characteristics of communities, such as social cohesion and social capital, account for a significant portion of variation in health outcomes, independently of individual-level characteristics.

Basically, by learning about the sentiments of one tweeter, you’re also learning about her community, including “its older, more vulnerable residents.”

Data From Alibaba’s E-Commerce Sites Is Now Powering A Credit-Scoring Service

Now Alibaba consumer data won’t just be used to predict consumer behavior but also creditworthiness.

An Alibaba affiliate has launched Sesame Credit (“open sesame,” get it?), a credit-scoring system “that uses data from the e-commerce giant’s sites.” Sesame Credit applies customer behavior analytics to data collected from the 300 million registered Alibaba shoppers as well as “37 million vendors that use Alibaba Group’s marketplaces” and payment histories from Alipay, an Alibaba subsidiary and “China’s largest online payments platform.” From there, Sesame Credit determines if an applicant is loan-worthy or not.

While it may seem strange that your shopping transactions could affect whether or not you get a loan, it’s useful for those without a credit history. These shoppers can show they follow through on transactions and interact well with others financially even without having applied for a bank loan or credit card.

Data-led team building: increasing the odds of success

The sports and business worlds alike are using data and technology to recruit “personnel.”

Coaches who rely on “evidence-based decisions” are forced “to consider all of the factors involved in a decision on team selection, to clarify what the evidence is on each factor, and to make explicit the relative importance of different factors.” Data analysis helps teams know their strengths and weaknesses, and clarifies critical — and sometimes surprising — success factors.

However, in both sports and business, performance data isn’t enough. How a team interacts together is “more of a matter of judgement than analysis,” and while data can get the person “in the door,” gut instinct often outweighs data.

Better use of data could help prevent future NHS crises

In recent months, hospitals across the UK have fallen short of the high standards they’ve set for themselves, resulting in longer wait times for patients, canceled operations, and a higher than expected number of patients. These hospitals want to know if they have the right information “to understand emergency care and to plan services properly.”

This is where data analytics comes in. A London-based consultancy has analyzed the available national data, says The Guardian, and based on the consultancy’s findings, hospitals would have known to plan for “a return to higher levels of emergency admissions,” and “could have created more beds or provided more community support to help people stay out of hospital.”

The Guardian says too much of what healthcare workers do “is shaped by looking backwards,” and that the healthcare industry needs to get much better at using the information available to help predict “the impact of changes in primary care, social care and community services – as well as underlying patterns of disease.”

Big Data Could Help Some of the 200,000 NYC Households That Get Eviction Notices This Year

It’s well-known that big data can be used to target customers, but it can also be used to target those most in need.

A problem for social services agencies battling homelessness is “figuring out who’s at risk of imminently becoming homeless amid thousands of eviction notices, and reaching those who need help.” One Brooklyn agency had the arduous task of manually combing through the 5,000 new eviction cases every month in order to figure out the right addresses to send letters describing their eviction prevention counseling services.

An analytics firm created a tool that uses data such as “court records, shelter history, [and] demographic information,” and along with an algorithm, pinpoints those most at risk for homelessness, helping social workers focus their efforts “down to the individual.”

As a result, the social service agency’s process, which usually took days, took only hours, and in the pilot test, they were able to “connect 50 percent more families in the test neighborhood with eviction prevention services compared to demographically similar neighborhoods nearby over the same period of time,” which means about 65 families “avoided ending up in a shelter.”

[Photo via Flickr, “Tweet Me,” CC BY 2.0 by Kate Ter Haar]

Data Tells a Story: counting Africa’s elephants; predicting box office success; sewage stories


Gather around kiddies, it’s story time — data story time that is. This week: counting Africa’s elephants; predicting box office success; and the story of sewage.

How exactly do you count Africa’s elephants?

Fifty researchers and a fleet of small aircraft have teamed up to track the number of African savanna elephants. In recent years, poachers have killed about 100,000. Data is already helping to crack down on poachers; now it will help to show how well those crack-down efforts are working.

This effort is a combination of human curation and technology: observers record and photograph the elephants they see from the air while a “data logger” automates the capture of this data, “easing burden on the surveyors, improving accuracy and reducing fatigue.”

Afterwards, the data software goes to work: using an algorithm, it “combines the animal observations made in the air along with factors such as flying altitude, which gives researchers a scientifically sound animal count.”

For additional accuracy, ground survey researchers also track the elephants and cross check their information with that of the aerial team.

Snowplow tracking apps help hold cities accountable

As cities across the country do battle with another tough winter, people are using apps to see if snowplow efforts are really being made.

The apps use data that’s already available (GPS information already collected to direct plows) and either show “skeptics that plow drivers are working hard” and “not just clearing the streets of the wealthy and well-connected,” or give evidence on “snow-cleanup shortfalls.”

How Hollywood Is Using Social Data to Better Reach Audiences — Or Not

Is there a link between social media buzz and box office performance? The short answer is yes — at least according to a company that has developed “a proprietary social-ranking tracker” that tries to predict box office performance by measuring “social-media conversations” pre-release.

Using data provided by Twitter, the company assigns on a scale of one to 100 a social media “buzziness” score. They found that a movie from last summer that scored a 98 was also a “big winner” at the box office, while movies that scored in the low 30s did poorly in terms of ticket sales.

Fashion tech startups use data science to build virtual dressing rooms

While online clothes shopping is convenient, fit isn’t a guarantee. As a result, return rates are as much as 50% to 80%, while shopping cart abandonment rates are between 60% and 75%.

Tech startups have attempted to address this issue by using data to create “virtual dressing rooms.” One company creates “reference garments” based on the measurements of consumers’ favorite clothes. Another offers a “data-driven recommendation engine” that leverages data from users as well as clothing manufacturers, while a third startup focuses its recommendations on preferences such as a tighter or baggier fit.

What does Cambridge sewage say about residents? MIT plans to find out

Sewage tells a story, at least according to MIT researchers.

The Underworlds project seeks to address an issue that public officials have long been facing: a lack of metrics. The project’s goal is to collect data on the sewage of Cambridge, and as a result, to learn and predict more about the health of Cambridge residents.

Through a software platform, the collected data — specifically, the presence of viruses, bacterial pathogens, and biomarkers — will be correlated with demographic information such as ethnicity and age.

Such data could give multiple insights: it “could predict epidemics or tell when they’re waning”; demonstrate “the impact of shifts in regulations, such as bans on using trans fat in restaurants”; and help public health officials gain information to fight disease and provide health care more effectively.

[Photo via Flickr, “African Elephant, Loxodonta africana at waterhole in Mapungubwe,” CC BY 2.0 by Derek Keats]

Data Tells a Story: helping supermarkets; saving India’s wild tigers; knowing you better than Mom


It’s time again for our favorite data stories. This week: helping supermarkets; saving India’s wild tigers; and knowing you better than Mom.

Why an obscure British data-mining company is worth $3 billion

A data company owned by Tesco, the British supermarket chain? Actually, it makes perfect sense.

Twenty years ago, “long before web firms started collecting data in order to better understand their users” and better target ads, Dunnhumby launched the Tesco Clubcard loyalty program. Through that program, the then independent company was able to gather and analyze data about customer buying habits, and as a result Tesco was able to stock “its stores with precisely what customers might want in the future.”

By 2010, Tesco owned Dunnhumby outright, and it is now looking to sell the data-mining company for as much as $3 billion.

Plan to Quit? Big Data Might Tell Your Boss Before You Do

New data-gathering software might be able to alert bosses that an employee is about to leave. The program analyzes employee activity “such as hiring, promotions, relocations, raises and performance review data, and enhances these findings with trends in the industry and region like job postings, shifts in worker demand and standard of living.”

The software improves through user feedback and “offers recommended job changes” once it determines an employee might be looking to leave.

Can Big Data Save The Last Of India’s Wild Tigers?

Scientists have discovered a possible way to use animal poachers’ behavior against them.

A team of ecologists with the Nature Conservation Foundation wrote computer code to analyze 25,000 data points on wildlife poaching crimes, “collected since 1972 across 605 districts,” including “locations of confirmed tiger poaching instances and sites where tiger parts had been seized.” This analysis allowed the team to pinpoint 73 “hot spots” that have a “high likelihood of tiger poaching and trafficking in tiger parts.” It may also help to “better place informers on the ground and use cell phone interceptions”; determine “where to target field patrols and forest ranger activities”; and predict “where crime patterns have changed as poachers’ tactics change.”

Facebook data know you better than your own mother

Researchers at the University of Cambridge and Stanford University found that your Facebook Likes describe your personality more accurately than your friends and family do. The researchers say this could “boost more targeted marketing” and even “revolutionize how people choose ‘whom to marry, hire, or elect as president.’”

As with many data models, the more Facebook Likes the better. With the average number of Likes, 227, the algorithm beat all friends and family members except spouses. But for people with 300 or more Likes, even significant others were no match for the computer.

Uber to Hand Over Trip Data to Boston

According to the Uber blog, the ride-hailing company and the city of Boston have entered a “partnership” to “help manage urban growth, relieve traffic congestion, expand public transportation, and reduce greenhouse gas emissions.”

However, Boston City Councilors tell a different story: in the past, they have said they wanted to see data from companies like Uber and Lyft in order “to figure out how to best regulate the services.”

[Photo via Flickr, “Pretty tiger,” CC BY 2.0 by Brian Gratwicke]

Data Tells a Story: data as defense; improving housing safety; improving health care


Welcome to our latest installment of data stories! This week: data as defense against cyber attacks; using data to improve student housing safety; and predictive analytics for better health care.

Is data the new weapon against cyber attacks?

Cybersecurity is in the spotlight today more than ever. While properly trained personnel and security tools play important roles, data is fast becoming “the most important layer of defense,” says Federal Times.

In cyber-defense, data can be used to analyze behavior and look for “anomalies and suspicious activity.” Network traffic data, described by security analysts as “pure gold,” is a sort of “phone bill for activity on a network.” Through network forensics, network traffic data can be analyzed to help investigate “what happened while malicious code was present in a network” and to help prevent further attacks in the future.

Using Big Data to predict criminal risk

Kentucky and 20 other states are using data-driven programs to help determine whether or not a defendant gets out on bail and how long their prison sentence should be.

A variety of factors are analyzed, including charges and criminal history. In the first six months of using this evidence-based approach to sentencing, Kentucky has seen a 15% reduction in crime among defendants on pretrial release.

However, some are concerned that if factors such as “socioeconomic status, neighborhood and education” are included, such “data profiling” could “hurt the poor and minority groups.” On the other hand, Kentucky for one doesn’t provide such information to the judge for bail and sentencing determinations.

Teacher hopefuls go through big data wringer

School districts are increasingly using consulting firms and screening tools to “slice and dice” teacher applicants into data points. The data points are “fed into an algorithm” that produces a score, which aims to predict each candidate’s likely effectiveness as a teacher.

While some school officials hope such a method will take some subjectivity out of the hiring process, educators are wary. Some say “there’s no magic formula to tell which teachers will succeed,” especially taking into consideration the large differences in “classroom resources, student populations and district expectations across the country.”

However, such screening tools may help schools that are too overwhelmed to thoroughly review all applicants. They may also help prevent reliance on teaching attitudes rather than tangible skills, as well as the practice of hiring unqualified yet well-connected candidates. A developer of one such tool warns that they shouldn’t be used on their own.

Boston zeroes in on overcrowded student housing

For the first time, Boston housing officials are using housing data to identify properties with “potentially dangerous living conditions,” such as those faced by Boston University student Binland Lee, who died in 2013 “after a fire trapped her in an illegal attic apartment with only one way out.”

Analyzing 25,000 off-campus addresses from 31 colleges, city officials found about 580 properties “that appear to have five or more full-time undergraduate students living together in violation of city rules.” If housing is found to be unsafe, city officials will work with students, landlords, and universities to find alternative housing.

In more data improvements, city officials’ upgraded computer systems will now keep better track of “problem landlords” and landlord and housing complaints.

Predictive analytics, a potent prescription for health care

Predictive analytics has the potential to improve health care in a variety of areas, from “clinical quality outcomes and financial performance, to the consumer experience and drug discovery.”

One predictive analytics company “uses real-time predictive analytics to identify sepsis, a notoriously difficult-to-diagnose infection that affects 1% to 2% of all hospital patients.” Another organization tries to predict how medical devices such as pacemakers will perform over time by combining data about “physics of the human body” with information about materials used in those medical devices. The goal is to “predict potential failures and speed innovation, while reducing the amount of money spent on clinical trials.”

[Photo via Flickr, CC BY 2.0 by Jason Eppink]

Data Tells a Story: Chinese food on Christmas; Netflix categories; data and the needy


After a short holiday break, we’re back with more of our favorite data stories. This week: data proves Chinese food’s popularity on Christmas; how Netflix comes up with those crazy-specific categories; and how data and technology can help those in need.

Using data to prove that Chinese food on Christmas is the perfect combination

Ben Blatt at Slate worked with online delivery service GrubHub to figure this out. Due to privacy concerns, GrubHub didn’t give Blatt “the raw volume of orders to different cuisines” but did provide “information about the proportion of total sales enjoyed by different cuisines and how those proportions change on any given day.”

So what about Chinese food on Christmas? Last year on December 25, those orders increased by 152 percent, compared to other cuisines, which “all declined by at least 30 percent.” Christmas Eve is also a big day for General Tso.

The worst day for Chinese takeout? Super Bowl Sunday, when pizza rules.

How Netflix Reverse Engineered Hollywood

Ever wonder how Netflix comes up with those super-specific categories (for instance, Cerebral British Crime TV Dramas is one that fits this blogger to a creepy T)? Basically, the company paid a bunch of people to watch films and TV shows, and “tag them with all kinds of metadata.”

But not just any metadata: the taggers use a “36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.” As a result, Netflix has “a stockpile of data about Hollywood entertainment that is absolutely unprecedented.”

So what do Americans like, at least according to Netflix? Movies and shows “about marriage,” that are “romantic,” set in 1980s Europe, and that star Raymond Burr.

Sounds like the next Netflix Original Series.

Big data for urban planning

Data from your cellphone doesn’t just tell how many steps you’ve walked or the restaurants you’ve checked into. Real-time data from mobile networks can help governments make decisions about city planning, public fund allocation, and traffic management, says The Daily Star.

In addition, “understanding people’s regular mobility patterns” can help “model the spread of infectious diseases such as dengue or Ebola” so that limited resources can be allocated to areas that need them the most.

Data exchange helps humanitarians act fast and effectively

Sarah Telford, head of the reporting unit at the UN Office for the Coordination of Humanitarian Affairs (OCHA), was tasked with improving its reporting. There was one small problem: she couldn’t find the data to do the analysis.

Telford threw together a quick solution, and what began as a humble spreadsheet has blossomed into a platform with 90 registered organizations and more than 1,350 datasets, all of which are used to collect and assess data points in at-risk and developing nations.

For instance, you can track the number of Ebola cases in specific countries; the number of people made homeless by natural disasters; and the number of internet and radio services in Liberia, Sierra Leone, and Guinea.

The Savvy Plan to Combat Malaria With Mobile Phones

Malaria No More is a not-for-profit seeking to use cellphone data to communicate with and track those in Africa most at risk for catching malaria, a disease that “kills around 400,000 children every year.” With Africa set to “top 1 billion mobile phone subscriptions” by 2015, CEO Martin Edlund and his colleagues have a treasure trove of data to work with.

Malaria No More is also partnering with Nigerian startup Sproxil to combat “the massive counterfeit drug market in Nigeria.” Sproxil spent five years placing codes on authentic medications, which purchasers can text to Sproxil for free to verify that the drugs are genuine. To date, Sproxil has about 13.4 million pieces of such data, which is invaluable to organizations like Malaria No More as it tells “where illnesses are occurring and how they’re being treated.”

[Photo via Flickr, “Salt and Peppered Pork,” CC BY 2.0 by Samson Loo]