Inventing the “Google” for predictive analytics | MIT News
Platform analyzes big data to answer plain-language business queries in minutes instead of months.
Companies often employ number-crunching data scientists to gather insights such as which customers want certain services or where to open new stores and stock products. Analyzing the data to answer one or two of those queries, however, can take weeks or even months.
Now MIT spinout Endor has developed a predictive-analytics platform that lets anyone, tech-savvy or not, upload raw data and input any business question into an interface — similar to using an online search engine — and receive accurate answers in just 15 minutes.
The platform is based on the science of “social physics,” co-developed at the MIT Media Lab by Endor co-founders Alex “Sandy” Pentland, the Toshiba Professor of Media Arts and Sciences, and Yaniv Altshuler, a former MIT postdoc. Social physics uses mathematical models and machine learning to understand and predict crowd behaviors.
Users of the new platform upload data about customers or other individuals, such as records of mobile phone calls, credit card purchases, or web activity. They use Endor’s “query-builder” wizard to ask questions, such as “Where should we open our next store?” or “Who is likely to try product X?” Guided by those questions, the platform identifies patterns of past behavior in the data and uses social physics models to predict future behavior. The platform can also analyze fully encrypted data streams, allowing customers such as banks or credit card operators to maintain data privacy.
“It’s just like Google. You don’t have to spend time thinking, ‘Am I going to spend time asking Google this question?’ You just Google it,” Altshuler says. “It’s as simple as that.”
Financially backed by Innovation Endeavors, the private venture capital firm of Eric Schmidt, executive chairman of Google parent company Alphabet, Inc., the startup has found big-name customers, such as Coca-Cola, Mastercard, and Walmart, among other major retail and banking firms.
Recently, Endor analyzed Twitter data for a defense agency to detect potential terrorists. Endor was given 15 million data points, including 50 example Twitter accounts of known ISIS activists, flagged on the basis of identifiers in the metadata. The agency then asked the startup to find 74 other accounts whose identifiers were extremely well hidden in the metadata. Someone at Endor completed the task on a laptop in 24 minutes, detecting 80 “lookalike” ISIS accounts, 45 of which were among the 74 well-hidden accounts named by the agency. The number of false positives was also low (35 accounts), meaning that analysts could afford to have experts review each flagged account.
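The figures above can be restated in the usual detection metrics. The short Python calculation below is an illustration added for that purpose, not an analysis published by Endor or the agency; it assumes, as the article's numbers imply (80 − 45 = 35), that every flagged account outside the pool of 74 counts as a false positive.

```python
# Illustrative restatement of the reported Twitter results as detection metrics.
# Assumption: every flagged account outside the agency's pool is a false positive.
flagged = 80        # "lookalike" accounts Endor flagged
hidden_pool = 74    # well-hidden accounts named by the agency
true_hits = 45      # flagged accounts that were in the hidden pool

false_positives = flagged - true_hits   # flagged, but not in the pool
missed = hidden_pool - true_hits        # in the pool, but not flagged
precision = true_hits / flagged         # share of flags that were correct
recall = true_hits / hidden_pool        # share of hidden accounts that were found

print(f"false positives: {false_positives}")
print(f"missed: {missed}")
print(f"precision: {precision:.2%}")
print(f"recall: {recall:.2%}")
```

Run as written, it reports 35 false positives and 29 missed accounts, a precision of 56.25 percent (45 of 80) and a recall of about 61 percent (45 of 74).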
Clusters of commonality
Machine learning is used for complex computational problems that are relatively static, such as image recognition and voice recognition. Written and spoken English, for instance, has been essentially unchanged for centuries.
Human behavior, on the other hand, is ever-changing. Predicting human behavior means analyzing a large number of small signals over a short period of time, perhaps days or weeks. Traditional machine-learning algorithms rely mainly on constructed models that analyze data over much longer periods.
“In general, you need a lot of data to build accurate models for human behavior, and that means you have to rely on the past. Because you rely on the past, you cannot detect things that recently happened, and you can’t predict human behavior,” Altshuler says.
Throughout the early- and mid-2000s, Pentland and Altshuler developed “social physics” in the Human Dynamics Lab, with the aim of capturing and analyzing short-term data to understand and predict crowd dynamics. In their research, they found that all big data contain certain mathematical patterns that indicate how social interactions spread and converge, and that those patterns can help predict future behaviors.
Using those mathematical patterns, they built a platform — the core technology of Endor’s platform — that can extract “clusters” of behavioral commonalities from millions of raw data points, much more quickly and accurately than machine-learning algorithms. A cluster may represent families of four, people who buy similar foods, or individuals who visit the same locations. “Most of those data patterns would be indistinguishable from noise with any other technologies,” Altshuler says.
It isn’t immediately clear what a cluster represents, only that its members are strongly correlated. Querying the data, however, provides context. With customer data, for instance, someone might ask which customers are most likely to buy a specific product. Using keywords, the platform matches behavioral traits, such as location and spending habits, of customers who have bought that product with those of customers who haven’t. This overlap yields a list of potential new customers who are likely to buy the product.
In short, uploading data and asking the right question presents the platform with a basic request: Here is an example X, find me more of X. “As long as you can phrase a question in that way, you’ll get an accurate response,” Altshuler says.
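In general terms, “here is an example X, find me more of X” is what marketers call lookalike modeling. The Python sketch below shows one simple, generic way to do it: average the trait vectors of known buyers into a single profile, then rank everyone else by cosine similarity to that profile. It is an illustrative assumption, not Endor's social-physics method, which the article does not describe in enough detail to reproduce; the function names and the trait features in the example are hypothetical.

```python
import math

# Generic lookalike ranking (centroid + cosine similarity).
# An illustrative assumption only, NOT Endor's social-physics method.

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_lookalikes(customers, seed_ids, k):
    """Rank customers outside the seed set by similarity to the seed profile.

    customers: dict mapping a customer id to a list of numeric trait values
    seed_ids:  ids of customers known to have done X (e.g. bought the product)
    k:         how many lookalikes to return
    """
    seeds = [customers[cid] for cid in seed_ids]
    dims = len(seeds[0])
    # Average the seed vectors into one "profile of X".
    centroid = [sum(vec[i] for vec in seeds) / len(seeds) for i in range(dims)]
    # Score every non-seed customer against that profile.
    candidates = [
        (cosine(vec, centroid), cid)
        for cid, vec in customers.items()
        if cid not in seed_ids
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [cid for _, cid in candidates[:k]]

# Hypothetical traits: [store visits per month,
#                       weekly spend in hundreds of dollars,
#                       online orders per month]
customers = {
    "a": [4, 2.0, 1],  # seed: bought product X
    "b": [5, 2.2, 1],  # seed: bought product X
    "c": [4, 1.9, 1],
    "d": [0, 0.5, 9],
    "e": [3, 2.1, 2],
}
print(find_lookalikes(customers, {"a", "b"}, k=2))
```

Here customers c and e come back as the closest matches to the two seed buyers, while d, whose behavior looks nothing like theirs, is left out. In practice the traits would need to be put on comparable scales first (for example, standardized), since otherwise a feature with large raw values dominates the similarity score.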
Endor and Endor-ish
To test the platform, the researchers worked early on with the U.S. Defense Advanced Research Projects Agency (DARPA) to analyze mobile data in certain cities during times of civil unrest, showing how emerging patterns can help predict future riots. Altshuler also spent a couple of months in Singapore analyzing taxi ride data to predict traffic jams in the city.
In 2014, Altshuler connected with Doron Alter, a friend and Stanford University graduate, who at that time was a partner in Innovation Endeavors. The investors asked if the technology could be wrapped “into a product that could be used by anyone,” Altshuler says.
That year, with Innovation Endeavors’ financial support, Altshuler and Pentland, a serial entrepreneur, co-founded Endor to turn the platform into commercial software. The team was joined by Alter and Stav Grinshpon, a tech-industry veteran and former leading technical expert at Unit 8200, an Israeli Intelligence Corps unit.
The company soon entered the Mastercard Start Path program, a global initiative to support technology and fintech companies. The program connected Endor to the Mastercard Advisors team, where they ran a proof-of-concept experiment to answer queries Mastercard would normally ask its data scientists, such as insights into how consumer segments spend.
On a single flight from Tel Aviv, Israel, to New York City, Altshuler crunched anonymized and aggregated mock data points to answer 10 questions. Traditionally, data scientists would need to spend weeks, or months, cleaning the data and designing machine-learning models to answer each question individually. “While the company may have spent two months developing models to answer those questions, I was able to answer 10 on one transatlantic flight,” Altshuler says.
Companies may employ their own analytics-savvy staff to use Endor. Others will set up brief weekly meetings with Endor representatives to determine the best phrasing for questions. “It takes about five minutes to translate their English to what we call ‘Endor-ish,’ meaning the way our system can understand questions,” Altshuler says.
The startup’s webpage offers an example of results and a comparison with traditional machine-learning engines. A marketing department for a bank asks, “Who is going to get a mortgage in the next six months?” Machine-learning engines may detect a pool of, say, 5,000 customers who have a bank credit card and a high credit score, and are married; many of them may be false positives. Endor detects more specific clusters of, say, couples about to get married or going through a divorce, founders who recently sold their startups to Facebook, or customers who recently graduated from a local real-estate course. Results from Endor offer far fewer false positives and dig up far more additional potential customers, according to the startup.
Importantly, Altshuler says, Endor isn’t aimed at replacing data scientists; it’s designed as a tool to empower them. Data scientists, he says, are most familiar with their organization’s business semantics and can incorporate Endor into their workflow. By opening a “bottleneck” — where data input comes in faster than anyone can produce an output — Endor aims to help data scientists improve their companies. “Data scientists understand we can make them heroes,” Altshuler says.
Endor was recently named a “Cool Vendor” by Gartner, a designation reserved for industry disrupters, and was recognized as a “Technology Pioneer” by the World Economic Forum. As word spreads, Endor is gaining customers across the U.S. and has signed its first customers in Europe and Latin America. “It’s exciting times,” Altshuler says.
Reprinted with permission of MIT News