Data Tells a Story: Cricket World Cup; personalizing beauty; the story behind Ikea

Welcome to another installment of Data Tells a Story, in which we round up our latest favorite data stories. This week: making cricket more interesting; personalizing beauty; and how Ikea took over the world.

Science: Sunny days make British office workers sad

Contrary to what many might think, happiness is not correlated with sunshine, at least according to a study from the University of Westminster in England.

Researchers compared daily weather patterns from 1991 to 2009 with wellbeing data from the British Household Panel Survey, which began in 1991, and found there “was no significant variation in reported happiness between sunny days and cloudy ones.”

What they did find was a “small but statistically significant correlation between unhappiness at work and sunny days.” In other words, on a nice day people would rather be outside than stuck in the office.

How the ICC is using data analytics to make the Cricket World Cup more interesting

The International Cricket Council (ICC) is mining 40 years’ worth of Cricket World Cup data “to produce insights that enhance the viewer’s experience,” and to “improve team performance and strategies out on the field.”

After analyzing “statistics on scores, player performance, player profiles and more,” the ICC came away with several findings. For instance, they discovered that the team with the best-equipped bowling line-up, and not necessarily “the power-packed batting [line-up],” might be the team most likely to win the World Cup.

They also found the characteristics that make a skilled player, which explains why countries not usually thought of as dominant cricket nations, such as the UAE and Ireland, perform well: they “were found to have strong performing players with similar characteristics.”

Why VCs Are Hungry for Beauty Data Startups

A beauty start-up that began as a give-away site is now using their data to power personalization tools and help brands give their consumers better experiences.

As part of their give-away program, the start-up invited consumers to periodically answer questions, of which there were 1,300. The company has collected over 10 million data points and is using this data for customer personalization and hypertargeting.

For example, they found that almost half of acne sufferers want “overnight results”; a quarter of people are loyal to certain perfume brands and aren’t eager to try new ones; and 37% are “most influenced by bloggers and editors as to what skincare products to purchase.”

Can data analysis reveal the most bigoted corners of Reddit?

When a post on Reddit asked Redditors to nominate the most “toxic communities” on the site, Ben Bell, a data scientist at a text-analytics start-up, thought there should be an objective way to measure toxicity.

To do so, Bell pulled out a sample of comments from the top 250 subreddits and from the forums mentioned in the toxicity thread. Using sentiment analysis, each comment was coded as positive, negative, or neutral, and afterward human annotators examined the negative comments to determine their toxicity.

He found that in some subreddits, “the community is proactive enough at self-policing that the average score for a bigoted comment is negative,” and “at the other end of the spectrum are those communities which seem to deliberately encourage bigotry.” He also found another kind of toxicity, that which is directed outwards — in other words, a subreddit that “focuses on highlighting bad content around the rest of Reddit.”

How Ikea took over the world

It’s not just the meatballs.

Market research is “at the heart of Ikea’s expansion.” For example, the furniture company gathered data about morning routines from over 8,000 people in eight cities. They found that people from Shanghai were “fastest out the door” (56 minutes) while those from Mumbai were the most leisurely, clocking in at two and a half hours before leaving. Those most likely to work in the bathroom? Stockholmers and New Yorkers.

What researchers also found was that regardless of city, women spend more time than men picking out their outfits, “a process many find stressful.” Ikea’s solution? A freestanding mirror called the Knapper onto which one can hang clothes and accessories the night before to decrease morning stress.

To make up for unreliably reported data (in other words, sometimes people lie, whether consciously or not), the company incorporates observed data too, and sometimes finds their items being used in unexpected ways. For instance, via cameras set up in homes, they found that residents in Shenzhen, China often sat on the floor, “using the sofas as a backrest.”

[Photo via Flickr: “cricket 16,” CC BY 2.0 by Barry Skeates]

Swagger + SmartBear!


Since Swagger’s creation in 2011, we’ve seen phenomenal uptake of Swagger in the API community. From startups to enterprises alike, Swagger has become a common word with both REST APIs and integration architecture. We are proud to have brought frictionless development between architects, API devs, client devs, and documentation—even the company of one!

Fast forward a few years and several thousand espressos, we see Swagger actively supported in nearly every programming language, and deployed across tens-of-thousands of servers. Thanks to you, Swagger is far and away the clear leader in the API description landscape. We have accomplished this by being completely open source, transparent about our plans and goals, and most importantly by being vendor neutral. The official Swagger tools are downloaded 7,000 times a day now! And after Microsoft’s recent announcement of Swagger support across their Azure services and tooling (see the March Azure Announcement), this will increase even more rapidly.

Despite our passion and dedication for Swagger over the last several years, Swagger has outgrown Reverb. We need to guide Swagger to its next phase of growth with more resources and focus on the API space (Reverb stays plenty busy with its publisher products!) while staying true to the transparent and open-source nature that has enabled us to grow so quickly. That said, I’m both proud and excited that SmartBear has stepped up to officially lead the Swagger project!

Why SmartBear?

A change has been in the making for the last few months. During that period, I’ve spent a lot of time talking with potential Swagger partners. With such wide industry adoption, there’s intense interest in keeping Swagger a common standard for API descriptions. Being the “glue” between services is certainly a privilege, but comes with a great responsibility to the industry. We had to ensure that the open spirit is accelerated with our partner.

We’ve been working with SmartBear for several years now. In addition to being an important yet neutral vendor in the API space, they have a track record with open source with their industry-leading SoapUI project. Critical to the success of Swagger, both past and future, has been the open attitude to both the spec and the source. SmartBear has not only a solid grasp of APIs and their importance, but with SoapUI, they are connecting to both small shops with their OSS version as well as providing support for enterprises with Pro versions.

This is where Swagger will continue its path. Both the Swagger specification—the connective tissue for the API—and the tooling will remain completely open source. SmartBear is in fact pushing the openness of Swagger forward to the next level by engaging industry leaders to establish an open governance model for the Swagger specification. The benefits of a common and shared standard in API description have proven to be invaluable, and we don’t intend to take that for granted.

Expect great things to come in this next stage of growth for Swagger. We want APIs everywhere, and to enable the developer to focus on making great products, not API plumbing.

If you have any questions for the Swagger team, please reach out! Thank you for all your support, and we look forward to the next stage of Swagger’s growth!

Data Tells a Story: Ferguson; taxis versus Uber; how music helps


Welcome to another installment of Data Tells a Story, in which we round up our favorite data stories of the week. The latest: what data says about Ferguson; when to take a taxi instead of Uber; how music helps.

Ferguson isn’t an anomaly: The real lesson of the Department of Justice’s explosive report

Data from the Justice Department’s recent report on the Ferguson, Missouri police department showed that from 2012 through 2014, African Americans accounted for 93 percent of all arrests in Ferguson although they make up only 67 percent of the population.

But this pattern is evident beyond the city of Ferguson. Another analysis showed that from 2000 to 2013, the stop rate for blacks in St. Louis County increased by 522 percent, compared to the stop rate for whites which increased by 284 percent.

Stop rates have also increased at the state level, showing a 385 percent increase for African Americans and 252 percent for whites, during that same time period.

Computers can now predict violent outbreaks around the world

Political scientists at Yale University have found that while US efforts in Afghanistan “to win villagers’ hearts and minds were successful enough to render their villages Taliban targets,” they weren’t effective enough to encourage villagers to provide useful intelligence about improvised explosive devices (IEDs).

Researchers discovered this by surveying 2,754 men in 204 Afghan villages “about their level of support for the Taliban and the International Security Assistance Force (ISAF),” a NATO-led security mission in Afghanistan, and combining that data with data on “insurgent violence and the locations of military bases and aid projects.”

They fed these factors into a statistical model which showed that villages’ levels of support for ISAF could predict the degree of IED attacks. For instance, a village with “modest” support for ISAF would experience 13 more attacks on average over the following five months than one strongly against the ISAF.

The Early Warning Project is hoping to use such data to be able to predict violent outbreaks before they happen and to provide additional aid in order to stop them.

Data scientists have isolated the exact times a Yellow Taxi is a better deal than an Uber

To Uber or to taxi? That’s the question, and some researchers have figured it out, at least in New York.

Computer scientists from the University of Cambridge and Belgium’s University of Namur compared trip and fare data for “every yellow cab ride taken in 2013” with data from Uber’s system, “which allows anyone to query how much a fare between two points would cost.” What they found was that “Uber is more expensive than a yellow cab for a trip in New York City that costs less than $35.” In other words, if you have a short ride within Manhattan, it’s cheaper to wait for a cab.

Of course there’s an app for all this, which pairs the researchers’ findings with your location and destination, and tells you if you’re better off with a cab or an Uber.

How Big Data Busted Abe Lincoln

While we think of big data and computer technology as going hand-in-hand, data analysis has been used as far back as the mid-1800s to surface surprising stories.

A 19th-century U.S. law provided congressmen “compensation for travel to and from their districts” at 40 cents a mile. New-York Tribune editor Horace Greeley was “shocked” when he saw the sums, considering them “an outrageous waste” of taxpayers’ money. He also thought that the “disbursements were a wasteful relic of an earlier time, when travel to and from the far-flung reaches of the United States would have been a costly, bruising affair.” During Greeley’s time, steamships and trains were becoming more and more available, making travel faster and cheaper.

The editor asked one of his reporters to compare the “shortest path from each congressman’s district to the Capitol” (using a U.S. Post office book of mail routes) with each congressman’s mileage reimbursements. Greeley then printed the findings in his paper.

Honest Abe was one of the worst culprits, having received about $677 in excess mileage, “more than $18,700 today.” Only worse was Jefferson Davis, who received an extra $736.80.

The House was up in arms, claiming that Greeley’s charges were “absolutely false.” However, the representatives eventually passed a bill “to change the computation of mileage to ‘the shortest continuous mail route,'” although the Senate would kill it. Still later, Congress lowered the per-mile rate from 40 cents to 20.
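
For a sense of scale, the dollar figures above can be turned back into miles. This is a rough sketch in Python that assumes the entire excess was billed at the 40-cent rate; the function and variable names are our own, and Lincoln’s figure is the approximate $677 quoted above.

```python
# Rate quoted in the story: 40 cents per mile (kept in cents to avoid float rounding).
RATE_CENTS_PER_MILE = 40


def implied_excess_miles(excess_cents: int, rate_cents: int = RATE_CENTS_PER_MILE) -> float:
    """Miles of travel that a given excess payment corresponds to at the given rate."""
    return excess_cents / rate_cents


# Excess payments from the story, in cents (Lincoln's is approximate).
excess_paid_cents = {"Abraham Lincoln": 67_700, "Jefferson Davis": 73_680}

for name, cents in excess_paid_cents.items():
    print(f"{name}: about {implied_excess_miles(cents):,.1f} excess miles")
```

Under that assumption, each man was reimbursed for well over a thousand miles more than the shortest mail route would have justified.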

Data-driven therapy: How The Sync Project wants to use music as medicine

Using biometric data gathered through various wearables, the Sync Project seeks to understand the psychological effects of music and why some types of music “can enhance our moods, boost concentration, trigger emotional reactions or pump up our energy levels.”

The global collaborative wants to go beyond using their findings as a “potential music recommendation algorithm” — they want to be able to use music as therapy. For instance, the Sync Project’s CEO says “music has helped his son, who suffers from autism, communicate in ways that he hasn’t been able to before,” and “although he has trouble expressing himself, he’s able to sing through an entire Beatles song and feel relaxed after an episode.”

Other research has shown the positive effects of music on Alzheimer’s and dementia patients. In a well-known video, a nearly vegetative patient with Alzheimer’s “awakens” when listening to music from his youth.

[Photo via Flickr: “um som e o mar,” CC BY 2.0 by Raíssa Viza]

Data Tells a Story: gender equality; thinning Arctic sea ice; Twitter was right


Welcome to another installment of Data Tells a Story, in which we round up our favorite data stories of the week. The latest: gender equality; thinning Arctic sea ice; and Twitter was right.

Gender equality report: an example of how big data can address big problems

A report published by the Bill & Melinda Gates Foundation and the Bill, Hillary & Chelsea Clinton Foundation shows that while “the status of women and girls has improved substantially since 1995,” there’s still much work to be done.

After collecting and analyzing 850,000 gender-related data points over a 20-year period from nonprofit organizations such as the United Nations and the World Bank, researchers came up with a multitude of findings. For instance, they found that “almost two-thirds of the world’s illiterate adults, 496 million people, are women.” But they also found some good news: since 1995, the maternal mortality rate has decreased by 42 percent, with South Asia showing the most improvement.

For their next project, the foundations are combining data sets from the UN and other organizations to assess what those organizations have achieved “over the past two decades in the field of gender development.”

Arctic Sea Ice ‘Thinning Dramatically,’ Study Finds

A new study has found that Arctic sea ice is “thinning at a steadier and faster rate than researchers previously thought.”

The researchers acquired data from multiple sources, “making them the first to combine all available observations on Arctic sea-ice thickness into one study.” One data set from 1975 to 2000 showed that Arctic sea ice had thinned 36%. However, a larger data set used in the new study showed that this was “a little less than half” of the actual ice thinning rate, and “that the leveling off of sea ice thinning in the 1990s was only temporary.”

Fukushima data show rise and fall in food radioactivity

Four years after the Fukushima nuclear disaster, researchers have found that “few people are likely to have eaten food that exceeded strict Japanese limits on radioactive contamination.”

The researchers used data provided by a massive food-monitoring program, which sampled “foods before they hit the market for levels of radioactive elements such as caesium-137,” and banned “producers or areas that exceeded regulatory limits.” So not only did the program support food safety, it provided researchers with almost 900,000 samples collected between 2011 and 2014.

The scientists found that “during the first year after the accident, 3.3% of food from the Fukushima region had above-limit contamination” (these foods were prevented from ever reaching the market). This percentage rose slightly in 2012 but by 2014 had fallen to 0.6%.

UK draws billions in unrecorded inflows, much from Russia: study

A Deutsche Bank study has shown that Britain is attracting more than a billion pounds “of capital inflows a month not recorded by official statistics,” and up to 40 percent of this might be from Russia.

The report said that financial institutions are misreporting data and using “tax avoidance and accounting methods.” In addition, Britain has a “perceived ‘safe-haven’ status” for stolen cash, “with tens of thousands of London properties owned by secretive companies,” according to another report.

The Deutsche Bank findings are part of a broader study of “net errors and omission” (NEOs) “across major economies.” Such NEOs “could have big implications for foreign exchange rates.”

Study of TV Viewers Backs Twitter’s Claims to Be Barometer of Public Mood

Twitter has long contended that it’s “a reliable barometer of the public’s changing moods and interests,” and that volume of tweets correlates with the popularity of television programs. Now Nielsen has the data to back those claims.

In their study, researchers measured the brain activity of about 300 people as they watched several TV shows. The researchers then compared those measurements with tweets about those same shows, and found that “number of tweets correlated closely with TV viewers’ depth of engagement with whatever was appearing on the screen at that moment,” and that the more engaged the 300 viewers became with a particular segment, the more intense Twitter activity became.

Such data can be used to predict which shows will be most popular, Nielsen says. For instance, the more a forthcoming show is tweeted about, the more popular its premiere may be.

[Photo via Flickr, “Iceberg,” CC BY 2.0 by NOAA’s National Ocean Service]

Data Tells a Story: online dating; killer whales; the evolution of American music


It’s time once again for our latest batch of favorite data stories. This week: what data tells us about online dating, killer whales, and the evolution of American music.

Data can tell you how to up your online dating game

According to Vox, some data analysis is able to show what works — and what doesn’t — in online dating.

For instance, one study, after analyzing more than 150,000 first messages, discovered that those who used words that focused more on the other person, such as “you,” were more likely to get a response than those who focused more on “me” or “I.”

Overly casual language was another data point that seemed to make a difference. OkCupid researchers analyzed 500,000 first messages and “found that casual spellings like ‘ur’ and ‘wat’ in first messages pushed the reply rate well below average.” However, first messages with “haha” or “lol” resulted in above-average reply rates.

Finally, a 2006 study of 6,500 heterosexual online daters found that 60% of women who reached out to men first received a response compared to just 35% of men who made initial contact.

How Google’s using big data and machine-learning to aid drug discovery

Google, working with Stanford University, is looking at how using data from a variety of sources “can better determine which chemical compounds will serve as ‘effective drug treatments for a variety of diseases,’” says VentureBeat.

To accelerate drug discovery, Google proposes using deep learning, “a system that involves training systems called artificial neural networks on lots of information derived from key data inputs, and then introducing new information to the mix.” Google’s models use data “from many different experiments to increase prediction accuracy across many diseases.” Their results suggest that adding even more data could improve their performance even more.

Drug discovery is normally a long, arduous, and costly process. Google suggests that automating and improving predictive techniques should “not only speed up the drug discovery process but cut the costs.”

After Menopause, Killer Whale Moms Become Pod Leaders

Killer whales are among only a “handful of animals” that live many years after menopause, and scientists at the University of Exeter, the University of York and the Center for Whale Research wanted to find out why.

The research team examined “35 years’ worth of observational data,” including decades’ worth of photographs, and, like all good data researchers, noticed a pattern: “Post-menopausal females, the oldest in the group, typically swam at the front and directed their pods’ movements in a variety of scenarios.”

To try to explain this, the researchers then focused their dataset “to years when killer whales’ primary food supply, salmon, was critically low,” and found that, because the whales have such a specialized diet, “the ability to find fish becomes invaluable to the whales’ survival and reproductive success,” especially when salmon are in short supply.

That’s where “killer whales with years of hunting experience,” such as post-menopausal mothers, come in. These females “may boost the survival of their kin” through “the transfer of ecological knowledge,” which would explain why they’re the leaders of the pack.

Unplanned Births: Another Outcome of Economic Inequality?

In 2008, says The Atlantic, data showed that unplanned pregnancies were five times more likely for women in poverty. A Boston University study attempted to answer why.

The researchers looked at the data of 3,885 single women between the ages of 15 and 44 who weren’t trying to get pregnant. Across five different economic brackets, the team found a couple of similarities. For instance, in every bracket two-thirds had had sex in the past year, with women in the highest income bracket reporting the highest rate of sexual activity. Therefore, frequency of sex wasn’t a factor.

Regarding how upset the women would be if they got pregnant, the numbers were also almost the same, with one in three saying “they would not be all that upset” and two-thirds saying that it would be “very upsetting.”

The areas of discrepancy were in contraception use and occurrence of abortions. Only 11 percent of women in the highest income bracket said they didn’t use contraception while more than twice that percentage reported the same in the poorest group. As for abortions, the wealthiest group was “more than three times as likely to have the procedure than the lowest-income group.”

While different cultural or religious views should also be taken into consideration, such findings might support the idea that poorer women have less access to contraception and safe and affordable abortions. Contraception coverage is required for many federally-backed insurance plans, but abortion, “except in the case of rape, incest, or life-threatening emergency,” is often prohibited. Even some private insurers are not allowed to cover the procedure.

In addition, women in a higher income bracket might know more about more effective contraception, such as IUDs, which are more expensive upfront but cheaper in the long run than less reliable options.

Genetic Data Tools Reveal How Pop Music Evolved In The US

A team of researchers at Queen Mary University of London has applied its number-crunching techniques to study American pop music’s evolution.

The researchers analyzed more than 17,000 songs from the US Billboard Hot 100 from 1960 to 2010. First, they rated each song “in one of 8 different harmonic categories and one of 8 different timbre categories.” They then used an algorithm to “find objective categories of musical genre that depend only on the musical qualities,” which resulted in “13 separate styles of music.” Next they used enrichment analysis, a bioinformatics technique, to search for tags “that were more commonly associated with songs in each music style.”
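
To give a flavor of what an enrichment analysis does, here is a minimal sketch in Python. The story does not say which statistic the researchers used, so this assumes a hypergeometric test, a common choice for enrichment questions; the function name and the song counts in the example are made up for illustration.

```python
from math import comb


def enrichment_p_value(total_songs: int, songs_with_tag: int,
                       songs_in_style: int, tagged_in_style: int) -> float:
    """Chance of seeing at least `tagged_in_style` tagged songs in a style
    if tags were spread across styles purely at random (hypergeometric tail)."""
    upper = min(songs_with_tag, songs_in_style)
    favorable = sum(
        comb(songs_with_tag, k) * comb(total_songs - songs_with_tag, songs_in_style - k)
        for k in range(tagged_in_style, upper + 1)
    )
    return favorable / comb(total_songs, songs_in_style)


# Hypothetical counts: 1,000 songs, 100 carry a given tag,
# and one style holds 50 songs, 20 of which carry that tag.
p = enrichment_p_value(1_000, 100, 50, 20)
print(f"p-value for the tag being over-represented in this style: {p:.3g}")
```

A small p-value suggests the tag turns up in that style more often than chance alone would explain, which is the sense in which a tag is “more commonly associated” with a style.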

They came away with several interesting findings. For example, the frequency of jazz or blues style has been declining since 1960 while rock has always fluctuated. Rap is rare before 1980 but afterwards skyrockets and remains the dominant genre for the next 30 years.

The team also identified three revolutions: “a major one around 1991 and two smaller ones around 1964 and 1983.” The one in 1964 was the most complex, with an increase in soul and rock and a decline in doowop. The 1983 change saw increases in new wave, disco, and hard rock, and a drop in soft rock and country, while 1991 saw a rise in rap-related tags.

Finally, the researchers took on a question heavy on many music lovers’ minds: the Rolling Stones or the Beatles? The answer? Neither. They discovered that the British didn’t start the American music revolution of 1964 at all, and that it was already well underway before the British invaded.

[Photo via Flickr: “Online romance,” CC BY 2.0 by Don Hankins]

Bigger on the inside: the TARDIS, technology, your brain, and beyond


We’ve always said that Reverb is bigger on the inside. Behind all of our products — whether a recommendations plug-in that fits into any site, a news app only as big as the palm of your hand, and now an analytics dashboard that gives publishers user and content data at their fingertips — is big technology embodied by our unique Interest Graph.

How big? Inside our Interest Graph are 50 million unique words modeled, 600 million users connected, and more than 30 billion web pages processed. Billions of API calls are made a month, and into all this technology have gone more than a million people hours — all powered by what has to be tens of thousands of cups of coffee.

All of this got us thinking: what else is bigger on the inside? Let’s take a look.


The TARDIS is one of the most famous examples of dimensional transcendentalism, or the (fictional) idea that an object’s interior can be bigger than its exterior due to something called transdimensional engineering.

A time machine and spacecraft, the TARDIS looks from the outside to be the size of a British police box, but on the inside “is actually infinite in size.” Thanks to augmented reality, this amazing TARDIS replica really is bigger on the inside.

The TARDIS isn’t the only living space that’s roomier than it looks. There are the wizarding tents in the Harry Potter universe, Snoopy’s doghouse (complete with rec room, birdhouse, basement, den, etc.), and Oscar’s trashcan, which “boasts such amenities as a farm, swimming pool, ice-rink, bowling alley, and a piano.”

bag of holding

In the Dungeons & Dragons universe, the bag of holding is a bag capable of holding much more than its small size implies. Similar are Mary Poppins’s carpetbag and Hermione Granger’s beaded handbag, complete with Undetectable Extension Charm, a spell that allows the bag to hold as many items as needed.

Now there’s even a brand of Bag of Holding messenger bags which are “so big, you might think [they’re] bigger on the inside.”


Reverb’s technology was built on words and so of course we believe that books are always bigger on the inside. At an average of less than a pound each, books contain tens of thousands of words, a multitude of characters (or almost 7 million if you’re the world’s longest novel), storylines, and worlds.

Get yourself a Kindle and your universe grows exponentially. Get a TARDIS little library and you have worlds within an infinite world.


Computers started out as big as a room, and now you don’t even need a pocket of holding to hold one.

Slim laptops as light as air aren’t the only compact computers out there. There’s the niftily named matchbox computer, which is, says Computer World, a PC that can be “crammed into a space not much larger than” — you guessed it — “a matchbox.” One of the more popular matchbox computers is the Raspberry Pi.

Finally, one of the tiniest computers out there is the microcontroller. The microcontroller resides on a single integrated circuit, also known as a microchip, and contains “a processor core, memory, and programmable input/output peripherals.” Some microchips, such as those used for animal tracking, are as small as a grain of rice.

your brain

Your brain is much bigger than a grain of rice, but it’s still pretty small compared to all the stuff inside.

The average adult brain, which is only about three pounds, holds about 100 billion neurons, or cells that act as processors and transmitters of information through connectors called synapses. There are between 1,000 and 10,000 synapses per neuron, which means you could have up to — what’s 100 billion times 10,000? A very very large number of connections firing.

That’s a ton of activity for something the size of two fists.
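
For anyone who wants the actual number, the multiplication is easy to check. This is a quick sketch in Python using only the figures quoted in this post; the variable names are ours.

```python
# Figures quoted above: about 100 billion neurons, 1,000 to 10,000 synapses each.
NEURONS = 100_000_000_000
SYNAPSES_PER_NEURON_LOW = 1_000
SYNAPSES_PER_NEURON_HIGH = 10_000

low = NEURONS * SYNAPSES_PER_NEURON_LOW
high = NEURONS * SYNAPSES_PER_NEURON_HIGH

print(f"{low:,} to {high:,} connections")
# prints "100,000,000,000,000 to 1,000,000,000,000,000 connections"
```

That is somewhere between 100 trillion and one quadrillion connections.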


Reverb works a little bit like your brain does. Through our recommendation plug-in and news app, our technology makes connections quickly and gives you recommendations based on your interests. It remembers what you like and don’t like, and gives you more of what you want and less of what you don’t.

Our newest product, Reverb Insights, takes in millions of data points from publishers and other content owners, makes lightning quick connections, and tells stories based on that data. For instance, we found that the topic, Children and Grief, was high-performing in our news app data in connection with a popular news story. This told us that it isn’t just Celebrity our readers are concerned about.

Want to learn more? Check out our website, the announcement on our blog, and this terrific write-up at TechCrunch. You can also request a demo by emailing us.

[Photo via Flickr: “brain,” CC BY 2.0 by Lovelorn Poets]

Data Tells a Story: recruiting better talent; surfacing fraud; data mining chicken tikka masala


Thank God it’s Data Stories Friday! This week: recruiting better talent; surfacing fraud; and data mining chicken tikka masala.

Speaking of data stories, publishers and content owners can now see the stories their own data tells via our newest product, Reverb Insights. Find out more!

Recruiting Better Talent With Brain Games And Big Data

While employers may feel they have plenty of data points to judge a potential hire — years of experience, number of previous employers, last salary — in the end they may still be relying on a hunch about the perfect fit.

Some companies, says NPR, are offering the idea “that new behavioral science techniques” — in combination with data analytics — “can give employers more insight into hiring.”

Although intelligence and personality tests have been around for a hundred years, big data allows for the creation of “more nuanced tests” that can possibly better gauge personality traits that may “help increase productivity and reduce turnover.”

In one example, potential hires play an Angry Birds-type game while behind the scenes, the company collects thousands of data points such as level or pressure, intensity, and level of challenge, since gameplay potentially correlates with how people think and work.

LinkedIn Starts Using Its Data to Sell Ads Across the Web

LinkedIn recently released a new product that allows marketers to target ads at segments of LinkedIn’s audience and serve them ads, not just in LinkedIn but across the web.

LinkedIn follows in the footsteps of other social media companies such as Facebook, which is using login data “to help marketers connect user identities across desktop and mobile,” and Twitter, which recently launched a product “that allows its advertisers to serve targeted, Promoted Tweets to users of Flipboard and Yahoo Japan.”

LinkedIn offers the most precise targeting, “allowing marketers to pick and choose exactly the type of professionals they want to target out of its 347-million-strong network.”

Finally, a decent use for big data: Weeding out crooked City traders

In the banking hub of London, pinpointing malpractice by City traders is a priority. However, according to the Market Practitioner Panel, current methods “of monitoring for illegal trading practices, such as ‘key word surveillance’,” are flawed, and big data technology may provide a longer-term solution.

One company suggests a method called predictive coding, which “goes beyond simply looking for key words” and identifies “irregularities in patterns of behaviour,” such as communication through “unofficial channels”; working unusual hours; or missing mandatory “block leave,” or vacation.

Bringing Big Data to the Fight Against Benefits Fraud

State and city agencies have begun enlisting data companies to help root out fraud. For instance, the NYC Human Resources Administration had “data detectives” run benefit recipients through a “computerized pattern-recognition system,” which surfaced a small percentage of anomalies.

For instance, one individual had received more than $50,000 in Medicaid benefits, which raised a red flag since “most families of similar size and income typically received multiple benefits — like health coverage, food stamps and cash assistance.”

After a multisource data analysis, the data detectives found “that the family had underreported its assets,” including an electrical contracting business, three residential properties, and bank accounts with more than $100,000.

After this kind of systematic “multisource data analysis” was implemented, staff members identified $46.5 million in fraud through almost 30,000 investigations compared with only $29 million through 48,000 investigations.
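To put those totals on the same footing, here is a quick back-of-the-envelope calculation of fraud identified per investigation. It is only a sketch: 30,000 is a rounded stand-in for “almost 30,000,” and the variable names are ours, not the agency’s.

```python
# Totals quoted above. 30_000 approximates "almost 30,000" (an assumption).
with_analysis_dollars, with_analysis_cases = 46_500_000, 30_000
comparison_dollars, comparison_cases = 29_000_000, 48_000


def dollars_per_investigation(dollars, cases):
    """Average fraud identified per investigation."""
    return dollars / cases


with_analysis = dollars_per_investigation(with_analysis_dollars, with_analysis_cases)
comparison = dollars_per_investigation(comparison_dollars, comparison_cases)

print(f"With multisource analysis: ${with_analysis:,.0f} per investigation")
print(f"Comparison figure:         ${comparison:,.0f} per investigation")
print(f"Ratio: {with_analysis / comparison:.1f}x")
```

On these rounded numbers, each investigation surfaced roughly two and a half times as much fraud, even though far fewer investigations were run.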

Data Mining Indian Recipes Reveals New Food Pairing Phenomenon

Some chefs have long believed in the food pairing hypothesis, that “ingredients that share the same flavors ought to combine well in recipes.” But while that practice is common in North American and Western European cuisines, does it apply to food from other parts of the world?

Researchers at the Indian Institute of Technology wanted to find out. By data mining more than 2,500 recipes “from eight sub-cuisines, including Bengali, Gujarati, Punjabi, and South Indian,” they found that the recipes contained 194 different ingredients, with an average of seven but as many as 40 for more “royal” dishes.

What the data told the researchers was that negative food pairings — that is, the combination of ingredients that have dissimilar flavors — dominate Indian cuisine, and that “the strength of this negative correlation is much higher than anything previously reported.” They also found that the addition of certain spices such as coriander, tamarind, ginger, and cinnamon “make the negative food pairing effect more powerful.”

All of this could potentially lead to the creation of even more novel Indian dishes.

[Photo: “The Host,” CC BY 2.0 by SteFou!]

Reverb Insights is here! User and content analytics at your fingertips

We have exciting news! Our latest product, Reverb Insights, a user and content analytics platform, is now available.

Now large and small content owners alike can take the guesswork out of how much and what type of content to produce for their audiences, and instead can make informed decisions based on data science. As a result, audience members stick around, engagement increases, and more monetization opportunities arise.

“It’s not rocket science,” said Mike Maples, Founding Partner, FLOODGATE. “Less engaged audience members will be more likely to leave (your site) and less likely to return. Giving users content they actually want will keep them on your site.”

So how does Reverb Insights work? “We have developed a unique approach to language,” said Tony Tam, founder and CEO, “allowing us to understand people through what they read, watch, or buy online.”

More specifically, the same technology that brought you the Reverb News App and Reverb for Publishers has analyzed 50 million words and ideas from over 600 million users. As a result, Reverb Insights has a sophisticated understanding of what people are interested in, from broader ideas such as Business & economics to narrower ones like the Startup industry.

In addition, Reverb Insights allows content owners to gauge how their content is performing by showing top-performing content by page views; what’s trending and, more importantly, what’s driving those trends; and how the amount of content produced compares with the amount consumed by their readers.

Audience + Content Comparison

A publisher can see, for instance, that while they’re producing a lot of content on National politics, their audience members are actually more interested in Political talk shows.

But Reverb Insights doesn’t just understand what users want now, explained Tony. “Based on the content they’ve already consumed,” he said, “we can predict what they’ll want in the future.”

Want to learn more about Reverb Insights? Check out this short video demo, or you can request a live demo by sending an email to

Data Tells a Story: the Oscar race; call center matchmaking; language data tells a happy story


It’s time once again to explore the stories that data tells us. This week: data analysis and the Oscars; playing matchmaker in call centers; and, in language, data tells a happy story.

For Your Consideration: How Data Analysis Was Used in Oscar Race

A Los Angeles-based company is using data to help film production companies narrowly target subtle messages about their Oscar-nominated movies to voting Academy members, as well as those in their social networks.

The company’s other service, “a fan-made content-publishing platform,” provides them with access to 2.5 million users. From there, the company “creates a larger look-alike segment” which they can target online.

To figure out the best way to target Academy members, the company built “models based on demographic and geographic data known about the approximately 6,000 Academy members.”

Doctors store 1,600 digital hearts for big data study

Scientists in London hope to shed more light on the relationship between genetics and heart disease through the 1,600 digital hearts they have beating on their computers.

The 3D videos of the hearts, provided by volunteers, along with genetic information should provide much more information than normal clinical trials, “in which relatively small amounts of health information is collected from patients over the course of several years.”

With so much data on so many hearts, scientists and doctors can compare the data “to see what the common factors are that lead to illnesses.”

Big Data: Matching Personalities In The Call Center

Anyone who has called a customer support center knows that the conversations are often recorded “for training and quality purposes,” but a lot more could be done with the data beyond training and quality.

Using information such as behavioral factors, distress signals, and the words callers use, one company has developed “personality-based applications” for call centers to determine if a caller is “outgoing, sarcastic, serious…shy,” etc., and to match them with the most well-suited representative.

For example, if a caller is classified as “difficult,” they might be matched with a rep known for their ability to adeptly handle challenging customers. If a caller is deemed more introverted, they might be paired with a rep accustomed to drawing out those who are more reticent.

City Governments Are Using Yelp to Tell You Where Not to Eat

Yelp is partnering with city government data agencies to better inform the public about restaurant hygiene.

Besides posting hygiene scores near reviews (it was found that “restaurants whose low hygiene ratings are posted on Yelp tend to respond by cleaning up and performing better on their next inspections”), user review platforms can help improve restaurant hygiene in other ways.

For example, all of Yelp’s reviews and ratings, “some of which contain telltale words or phrases such as ‘dirty’ and ‘made me sick,’” can be merged “with the history of hygiene violations” and fed into an algorithm “that can predict the likelihood of finding problems at reviewed restaurants.” As a result, health inspectors can be “allocated more efficiently,” letting health departments make better use of their resources.
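As a rough illustration of the idea (not the actual algorithm, whose details the article doesn’t give), here is a toy sketch that combines telltale review phrases with past violations into a single score and ranks restaurants for inspection. The restaurants, weights, and function names are all made up for the example.

```python
# The two telltale phrases quoted above; a real system would use many more.
TELLTALE_PHRASES = ("dirty", "made me sick")


def flagged_reviews(reviews):
    """Count reviews that contain at least one telltale phrase."""
    return sum(
        any(phrase in review.lower() for phrase in TELLTALE_PHRASES)
        for review in reviews
    )


def risk_score(reviews, past_violations, w_reviews=1.0, w_history=2.0):
    """Hypothetical linear score; the weights are illustrative, not fitted."""
    return w_reviews * flagged_reviews(reviews) + w_history * past_violations


# Invented sample data: (reviews, number of past hygiene violations).
restaurants = {
    "Cafe A": (["Great food", "Friendly staff"], 0),
    "Diner B": (["The tables were dirty", "It made me sick"], 1),
}

# Highest score first, so inspectors would visit the riskiest places first.
ranked = sorted(
    restaurants,
    key=lambda name: risk_score(*restaurants[name]),
    reverse=True,
)
print(ranked)
```

A production model would presumably learn its weights from inspection outcomes rather than hard-coding them, but the ranking step would look much the same.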

Languages Are Mostly Made of Happy Words

While it seems, especially on the internet, that language is dominated by negativity, a recent data analysis has actually shown the opposite. After examining “100,000 words across texts in 10 different languages,” researchers found “a universal positivity bias.”

Universal positivity bias was first posited in a 1969 paper called “The Pollyanna Hypothesis” (a Pollyanna is a “foolishly” optimistic person), which showed that high school boys across different cultures more often rated words as positive than negative.

In this most recent study, researchers analyzed words from a variety of sources including Google Books, Twitter, subtitles from movies and shows, and music lyrics, and across several languages. Native speakers rated “how negative or positive they felt upon hearing those words,” and “in every language, on every platform, the median happiness score was higher than five—five being a totally neutral word.”

The happiest languages? Spanish and Portuguese.
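For the curious, the median test the researchers describe is simple to express. The sketch below uses a handful of invented words and ratings purely for illustration; only the neutral point of five comes from the study as reported.

```python
from statistics import median

NEUTRAL = 5  # per the study as reported: five is a totally neutral word

# Invented ratings for illustration only; not data from the study.
ratings = {
    "laughter": 8.2,
    "music": 7.1,
    "table": 5.6,
    "traffic": 4.9,
    "disaster": 2.3,
}

median_score = median(ratings.values())
has_positivity_bias = median_score > NEUTRAL
print(f"Median happiness score: {median_score} (positive bias: {has_positivity_bias})")
```

Using the median rather than the mean keeps a few extremely negative or extremely positive words from skewing the result.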

[Photo via Flickr, “Happiness,” CC BY 2.0 by Caleb Roenigk]

Data Tells a Story: NYC subway; commuting; Valentine’s Day

It’s time again for our weekly series! This week in favorite data stories: don’t think too much about the subway; bettering urban commutes; what data tells us about Valentine’s Day.

NYC subway germs reflect their neighborhoods

Like Cambridge’s sewage system, New York City’s underworld also tells a story, but instead of sewage, the source is the subway.

Researchers from Weill Cornell Medical College have collected 10 billion bits of data in the form of DNA fragments from surfaces in NYC subways, subway stations, parks, and a waterway. The data serves “as a mirror or echo of people who move through that station,” but also tells another story, that of Hurricane Sandy in 2012. In one station that was flooded at that time, researchers found “germs linked to marine life and Antarctic environments,” most likely swept in by the storm.

These findings are important because they “establish the first city baseline of microbial life,” which can help “detect strong changes that may determine if there is anything at all threatening,” such as disease or bioterrorism.

Urban Engines: Using data to power a better commute

Most commuters know the trek in and out of the office can be challenging and unpredictable. This is partly because, says Computer World, cities and counties have only “the barest sense of what routes are in high demand and how outside events affect traffic.” In addition, data is still collected manually (think people with clipboards and clicker counters).

One analytics startup is tackling the commuter challenge by “taking an algorithmic approach to public transit” and using data already generated every day “to deliver a better, deeper view into cities and how people move around in them.”

Based on information provided by transit agencies (for example, data generated by “tap-in/tap-out” cards), the startup can devise a working model of a city “that gives real-time visibility into what’s going on,” surfacing instances of delays, overcrowding, peak times, and more.

Starwood Hotels Using Big Data to Boost Revenue

Starwood Hotels and Resorts have huge stores of data, and one of their challenges has been how best to leverage that data to better target markets and boost revenue.

One area of focus has been “revenue optimization,” which is essentially using data to figure out the best time to push promotional offers, when to raise or lower prices, and how long to hold them. Starwood also uses property and customer data “to help tailor offers to specific customers while they’re on the property,” as well as before and after their stay.

On Valentine’s Day be happy if you have friends, data suggests

Data from the Office for National Statistics in the UK has shown a positive correlation between number of real-life friends and well-being in general, “even after controlling for income, demographic variables and personality differences.”

The Guardian suggests this is especially important during “this of all weeks” as previous ONS data has shown that “unmarried adults were in the majority in England and Wales for the first time.”

Forget Valentine’s Day, we buy chocolate all year

While Valentine’s Day may seem like the quintessential flower-and-chocolates holiday, data tells a different story: neither chocolates nor flowers have peak sales in February.

A data analytics firm’s e-commerce platform, which “tracks the online shopping habits of more than 5 million people,” shows that people crave — and buy — chocolate throughout the year, and that while flower buying spikes for Valentine’s Day, it does even more so on Mother’s Day.

One theory is that while flowers on Mother’s Day have few substitutes, there’s a wider selection of gifts for Valentine’s Day, from stuffed animals to jewelry to romantic getaways.

[Photo via Flickr, “nyc subway mix,” CC BY 2.0 by Colin Mutchler]