Big Data 50: Hottest Big Data Startups of 2014


From “Fast Data” to visualization software to tools used to track “Social Whales,” the startups in the Big Data 50 have it covered.

The 50 startups in the Big Data 50 are an impressive lot. In fact, the Big Data space in general is so hot that you might start worrying about it overheating – kind of like one of those mid-summer drives through the Mojave Desert. The signs warn you to turn off your AC for a reason.

Personally, I think we’re a long way from any sort of Big Data bubble. Our economy is so used to trusting decision makers who “trust their gut” that we have much to learn before the typical business is even ready for data kindergarten.

In fact, after a few decades following the “voodoo” of supply-side economics, which fetishized the mysterious and elusive “rational consumer,” the strides we’re making towards being a more evidence-based economy still have us pretty much just playing catch-up.

These 50 Big Data startups are working to change that.

Here’s a sampling of the startups in the Big Data 50 report. To access the full report, simply sign up for the Startup50 newsletter, and you’ll get the report delivered to your email inbox.

Big Data Startups Poised for Explosive Growth

These startups are relatively new to the scene, but have made big strides in a short amount of time. The more entrenched startups in this report should watch their backs, since these folks may be creeping up on them.


Entrigna

What they do: Provide analytical tools that help users gain insights and develop predictions from streams of real-time, high-velocity data.

Headquarters: Chicago, IL

CEO: Murali Kashaboina. He previously served as Managing Director, Enterprise Architecture at United Airlines and played a key role in the merger integration of United and Continental Airlines.

Founded: 2012

Funding: The startup is currently financed by ~$1.1 million in client contracts and through the founding team’s own investments. Entrigna says that it is in the process of raising a minimum of $5 million from angel investors and is about to close two more client contracts that will provide an additional $1.2 million.

Why they’re one of the 50: With the tremendous increase in computing power and decrease in memory and storage costs, today’s businesses are weighed down with a deluge of disconnected data, much of it highly relevant to effective decision-making. This data is associated with business applications, processes, transactions, operations, customers, customer insights, products, product insights, policies, systems, business-partnerships, competition – the list goes on and on. Since this data exists in many different formats, gaining a unified view of anything is difficult, if not impossible. Much of this data is also highly volatile and often contains time-sensitive business intelligence.

If detected in real time, such “in-the-moment” business intelligence can be used to dynamically determine an optimal course of action.

To tackle this problem, Entrigna developed a real-time decisions platform called RTES (Real Time Expert System). RTES enables real-time decisions by offering “decision frameworks” packaged together in one system. The system relies on a combination of machine learning, predictive analytics, business rules, complex event processing, optimization, and artificial intelligence in order to derive real-time actionable business decisions.


Essentially, the RTES platform exposes these decision frameworks as built-in, modular services that can be combined and applied to an organization’s business data in real time. Users can then identify patterns that lead to real-time business insights.

Customers include Jisu, Opargo, Pacific Gas & Electric, Sonata Software, and Decision Analytics International.

Competitive Landscape: Entrigna will compete against legacy data mining and BI tools, such as those from IBM and SAS. Most of the Big Data startups leaning on machine learning are focused on specific verticals, such as fraud detection for credit card companies. However, other general purpose, machine-learning based startups include Ayasdi, Feedzai, Skytree, and Sumo Logic.


Feedzai

What they do: Feedzai uses real-time machine learning to help companies prevent fraud.

Headquarters: San Mateo, CA

CEO: Nuno Sebastião. Prior to Feedzai, he led the development of the European Space Agency’s satellite simulation infrastructure.

Founded: 2013

Funding: Feedzai has raised $4.3 million from SAP Ventures, Data Collective, and other international investors.

Why they’re one of the 50: It’s no great revelation that online fraud is a major problem. However, its impact is often underestimated. For instance, the Target breach could end up costing as much as $680 million, according to the Ponemon Institute.

Feedzai claims that it can detect fraud in any commerce transaction, whether the credit card is present or not, in real time. Feedzai uses artificial intelligence (AI) to build more robust predictive models and analyze consumer behavior in a way that mitigates risk, protects consumers and companies from fraud, and preserves consumer trust.

Feedzai’s software attempts to understand the way consumers behave when they make purchases anywhere, online or off. Feedzai says that its fraud detection system aggregates both online and offline purchases for each consumer over a longer time-frame, which results in earlier, more reliable detection rates.


The software uses data to create profiles for each customer, merchant, location, and POS device, with up to a 3-year history of data behind each one. Profiles are updated for each consumer after every transaction. As a result, Feedzai claims to be able to detect fraud up to 10 days earlier than traditional methods and expose up to 60 percent more fraudulent transactions.
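To make the profile idea concrete, here is a minimal sketch in Python. It is not Feedzai’s actual method, which the company doesn’t describe at this level of detail. It assumes a single feature (the transaction amount), one profile per customer, and a simple z-score cutoff. The names `RunningProfile` and `score_and_update` and the threshold of 3.0 are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


class RunningProfile:
    """Running mean/variance of transaction amounts (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, amount):
        # Fold one new transaction into the profile without storing history.
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def zscore(self, amount):
        # With fewer than two observations there is no variance to compare to.
        if self.count < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0:
            return 0.0 if amount == self.mean else math.inf
        return (amount - self.mean) / std


# Hypothetical store: one profile per customer ID.
profiles = defaultdict(RunningProfile)


def score_and_update(customer_id, amount, threshold=3.0):
    """Score a transaction against the customer's history, then update it."""
    profile = profiles[customer_id]
    suspicious = abs(profile.zscore(amount)) > threshold
    profile.update(amount)
    return suspicious
```

Calling `score_and_update` once per transaction keeps each profile current. A production system would presumably track many more features per customer, merchant, location, and device, but the update-after-every-transaction loop is the part this sketch is meant to show.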

Clients include Coca-Cola, Logica, Vodafone, Ericsson, Payment Solutions, and Servebase Credit Card Solutions.

Competitive Landscape: Competitors include SiftScience, Signifyd, Kount, and Retail Decisions (ReD).


Nuevora

What they do: Provide Big Data analytics applications.

Headquarters: San Ramon, CA

CEO: Phani Nagarjuna, who most recently served as EVP of Products and Business Development for OneCommand, which provides a SaaS-based CRM and Loyalty Automation Platform for the auto retail industry.

Founded: 2011

Funding: $3 million in early funding from Fortisure Ventures.

Why they’re one of the 50:  Nuevora has set its sights on one of Big Data’s early growth areas: marketing and customer engagement. Nuevora’s nBAAP (Big Data Analytics & Apps) Platform features purpose-built analytics apps based on best-practices-driven predictive algorithms. nBAAP is based on three key Big Data technologies: Hadoop (data processing), R (predictive analytics), and Tableau (visualizations).


On top of all of this, Nuevora’s algorithms work on disparate sources of data (transactional, social media, mobile, campaigns) to quickly identify patterns and predictors in order to tie specific goals to individual marketing tactics.

The platform includes pre-built apps for the customer marketing business process – acquisition, retention, up-sell, cross-sell, profitability, and customer lifetime value (LTV). With only “last-mile” configurations required for individual customer situations, Nuevora’s apps empower organizations to anticipate their customers’ behaviors.

Nuevora gives end users the ability to continually recalibrate their predictions using a “closed-loop recalibration engine,” which helps organizations keep up with only the most pertinent insights based on the latest data.

Competitive Landscape: When Nuevora assesses the competitive landscape, it zeroes in on big consulting firms, such as Accenture, and other predictive analytics companies, such as Alpine Data Labs.

However, since pretty much every marketing platform under the sun now includes some sort of analytics engine, I also expect them to compete with the major marketing automation providers, such as ExactTarget (which uses Pentaho for its Big Data analytics).

Billion-Dollar Baby

Raise a billion dollars in funding, forge a tight partnership with Intel, and you get your own category.


Cloudera

What they do: Provide a Hadoop-based Big Data platform.

Headquarters: Palo Alto, CA

CEO: Tom Reilly. Prior to joining Cloudera in 2013, he was VP and GM of enterprise security at HP. He also served as CEO of enterprise security company ArcSight, which HP acquired in 2010.

Founded: 2008

Funding: Cloudera has raised over $1 billion in venture capital to date. (Yep, that’s not a typo; that’s $1 billion with a “B.” I’m wondering the same thing you probably are: “Why dilute yourself that much?” But founder Mike Olson has given a pretty concise answer: a close relationship with Intel, which provided much of the new funding.) Other investors include Accel Partners, Google Ventures, Greylock Partners, Ignition Partners, In-Q-Tel, Meritech Capital Partners, and T. Rowe Price.

Why they’re one of the 50: Big Data is hot, and Cloudera pioneered the Hadoop-based Big Data space.

Cloudera lets users query all of their structured and unstructured data to gain a view beyond what’s available from relational databases. Cloudera recently released Impala, a new open-source interactive query engine for Hadoop that enables interactive querying on massive data sets in real time.

Moreover, they’re sitting on a giant pile of VC cash and have a top-notch management team. Cloudera is also the first major Big Data vendor to start investing heavily in a Big Data Achilles’ heel: security.

In June 2014, Cloudera acquired Gazzang, a startup specializing in encryption software for Big Data environments. And earlier this month, Cloudera entered into a partnership with home automation company Vivint to start targeting the Internet of Things (IoT) market.

Frankly, I thought long and hard about leaving Cloudera off this list – not because they don’t belong, but because they’ve been doing well enough for long enough that I’m not sure that the label “startup” really fits that well anymore.

However, they pretty much proved the business case for Hadoop, and they’re moving the space forward with the Gazzang acquisition, so, for the time being, anyway, I’d be foolish to exclude them.

Customers include Experian, FICO, National Cancer Institute, Nokia, Western Union, and Vivint (to see how Vivint is using Cloudera, check out my latest story in Datamation, “5 Big Data Apps with Effective Use Cases”).

Competitive Landscape: Cloudera clearly has first-mover advantage, but competitors include EMC, Pivotal, Hortonworks and MapR. One of their earlier competitors, Intel, which had its own home-grown distribution, dropped it and adopted Cloudera instead.


Hadoop Darlings


Continuuity

What they do: Provide a Hadoop-based Big Data application hosting platform.

Headquarters: Palo Alto, CA

CEO: Jonathan Gray, who was previously an HBase software engineer at Facebook.

Founded: 2011

Funding: $12.5 million from Battery Ventures, Ignition Partners, Andreessen Horowitz, Data Collective and Amplify Partners.

Why they’re one of the 50: Continuuity has come up with a clever way to get around the dearth of Hadoop experts: they offer an application developer platform targeted at Java developers. The lower-level infrastructure is all abstracted away by the Continuuity platform.

The startup’s flagship product, Reactor, is a Java-based integrated data and application framework that layers on top of Apache Hadoop, HBase, and other Hadoop ecosystem components. It surfaces capabilities of the infrastructure through simple Java and REST APIs, shielding end users from unnecessary complexity. Continuuity describes Reactor as a “Big Data Application Server for Hadoop.” It abstracts all the complexities of Hadoop and enables any developer to build Big Data applications.

Continuuity’s Loom service is a cluster management solution. Clusters created with Continuuity Loom utilize templates of any hardware and software stack, from simple standalone LAMP-stack servers and traditional application servers like JBoss to full Apache Hadoop clusters comprised of thousands of nodes. Clusters can be deployed across many cloud providers (Rackspace, Joyent, OpenStack) while utilizing common SCM tools (Chef and scripts).

In June, Continuuity entered into a partnership with AT&T Labs to develop and release into open source a new real-time data processing framework that will provide streaming analytics capabilities. Initially code-named jetStream, it will be made available to the market via open source in the third quarter of 2014.

Competitive Landscape: As of now, Continuuity is uniquely positioned. Indirect competitors come from the Hadoop-as-a-Service camp (AWS EMR, Altiscale, Infochimps, Mortar Data, etc.). One thing to keep an eye on is the CEO situation. Founding CEO Todd Papaioannou, who was previously VP and chief cloud architect at Yahoo!, left the company last year. Co-founder and previous CTO Jonathan Gray has taken over the CEO role. This is Gray’s first role as a business leader.

MapR Technologies

What they do: Provide a Hadoop distribution/NoSQL Big Data platform.

Headquarters: San Jose, CA

CEO: John Schroeder. He previously served as CEO of Calista Technologies, which was acquired by Microsoft. Before that, he was CEO of Rainfinity, which EMC purchased.

Founded: 2009

Funding: In June 2014, MapR Technologies raised $110 million in financing in a round led by Google Capital, with participation from Qualcomm Ventures and existing investors Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures.

Why they’re one of the 50: MapR argues that Hadoop suffers from an insufficient high availability design that results in downtime and an inability to protect against the application and user errors that lead to lost data. Hadoop’s distributed file system is designed to be “append only,” which forces interactive applications to spend excessive time writing new files and results in a 150M file cluster limit.

MapR was founded to address Hadoop’s limitations, transforming it into an enterprise-grade system that more organizations can actually use.

The new architecture is a high-performance data platform that supports full random read/write data access and real-time streaming, and that can scale to 1 trillion files. MapR argues that this is a new distributed architecture that provides enterprise storage capabilities, such as snapshots and mirroring, but does not have the NameNode scaling drawbacks that plague clustered file systems. MapR also innovated at the shuffle layer to provide high-speed analytic processing.

Additionally, the MapR platform provides self-healing, automated failover, and an integrated in-Hadoop database for real-time capabilities.

Named customers include Rubicon and comScore.

Competitive Landscape: Competitors include Cloudera, Pivotal, and Hortonworks.


Xplenty

What they do: Provide Hadoop-as-a-Service.

Headquarters: Tel Aviv, Israel

CEO: Yaniv Mor, who previously managed the NSW SQL Services practice at Red Rock Consulting.

Founded: 2012

Funding: Xplenty is backed by an undisclosed amount of seed funding from Magma Venture Capital, and it is currently in the midst of locking down a Series A round.

Why they’re one of the 50: While Hadoop is being hyped like crazy these days, not all the hype is empty. It has indeed become the de facto infrastructure technology for Big Data. The trouble is that the development, implementation, and maintenance of Hadoop require a very specialized skill set.

Xplenty’s technology provides Hadoop processing in the cloud via a coding-free design environment, so businesses can quickly and easily benefit from the opportunities offered by Big Data without having to invest in hardware, software, or highly specialized personnel.

According to Xplenty, competing services still target developers, whereas Xplenty targets the data and Business Intelligence (BI) users who do not know how to write code, but who need to move data to a Big Data platform.

A drag-and-drop interface eliminates the need to write complex scripts or code of any kind. With its automatic server configuration feature, users can simply point to a data source, configure the data transformation tasks, and tell the platform where to write the results. Xplenty’s platform uses SQL terminology. Thus, for data analysts, the learning curve should be minimal.

Customers include Fiverr, WalkMe, Dealply Technologies, and Travel Global Systems.

Competitive Landscape: The main competition comes from Amazon’s Elastic MapReduce (EMR). Other Hadoop-as-a-Service competitors include Altiscale, Mortar Data, Qubole, and recently Microsoft with Hadoop on Azure. Rackspace is about to launch its own Hadoop-as-a-Service offering based on Hortonworks’ distribution.

Machine Learning Mavens

Sumo Logic

What they do: Apply machine learning to data center operations, using data analysis to pinpoint anomalies, predict and uncover potentially disruptive events, and identify vulnerabilities.

Headquarters: Redwood City, CA

CEO: Vance Loiselle, formerly VP of Global Services at BMC. He joined BMC via the acquisition of BladeLogic, which he co-founded. BMC acquired BladeLogic for $800 million.

Founded: 2010

Funding: $80 million in funding from Sequoia Capital, Accel Partners, Greylock Partners, and Sutter Hill Ventures.

Why they’re one of the 50: Sumo Logic claims to address the “unknown” problem of machine data: how do you get insights about data that you don’t know anything about, or, worse, what do you do when you don’t even know what you should be looking for?

Sumo Logic argues that managing machine data – the output of every application, website, server, and supporting IT infrastructure component in the enterprise – is the starting point for IT data analysis. Many IT departments hope they will be able to improve system or application availability, prevent downtime, detect fraud, and identify important changes in customer and application behavior by studying machine logs. However, traditional log management tools rely on pre-determined rules and thus fail to help users proactively discover events they don’t anticipate.

Sumo Logic’s Anomaly Detection attempts to solve this pain point by enabling enterprises to automatically detect events in streams of machine data, generating previously undiscoverable insights within a company’s entire IT and security infrastructure and allowing remediation before an issue impacts key business services.

Sumo Logic uses pattern-recognition technology to distill hundreds of thousands of log messages into a page or two of patterns, dramatically reducing the time it takes to find a root cause of an operational or security issue.
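Here’s a rough illustration, in Python, of how thousands of log lines can collapse into a handful of patterns. This is a generic template-masking sketch, not Sumo Logic’s actual technology. It assumes that the variable parts of a message are IP addresses, hex values, and integers, and the names `to_template` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Replace variable-looking tokens with placeholders so that messages which
# differ only in their parameters collapse to the same template.
# Order matters: IPs and hex values are masked before plain integers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Mask the variable parts of a single log message."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def summarize(messages, top=10):
    """Return the most common templates with their counts."""
    return Counter(to_template(m) for m in messages).most_common(top)
```

For example, `Connection from 10.0.0.1 port 5123` and `Connection from 10.0.0.7 port 80` both reduce to `Connection from <IP> port <NUM>`, so a reader scans one pattern with a count of two instead of two separate lines.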

Customers include Netflix, McGraw-Hill, Orange, PagerDuty, and Medallia.

Competitive Landscape: Sumo Logic will compete with the likes of CloudPhysics, Splunk, and open-source alternatives like Elasticsearch and Kibana.

Built for Speed


ParStream

What they do: Develop database technologies to enable real-time Big Data analytics.

Headquarters: Cupertino, CA

CEO: Peter Jensen. Before joining ParStream, Jensen was the CEO of StopTheHacker, which was acquired in 2013 by Cloudflare. Before StopTheHacker, Jensen was VP of worldwide sales for Pancetera (acquired by Quantum) and Thinstall (acquired by VMware).

Founded: 2008

Funding: ParStream has secured $13.6 million in Series A and B funding from Khosla, Baker Capital, CrunchFund, Tola Capital and Data Collective.

Why they’re one of the 50: Traditional databases just weren’t designed for Big-Data-scale analytics, and they certainly aren’t able to deliver those insights in real time. Traditional databases analyze data sequentially and aren’t able to take advantage of advances in multi-core processing.

When I spoke with co-founder Michael Hummel at CTIA last year, he noted that memory is a big bottleneck for traditional databases. Meanwhile, the Big Data database darling, Hadoop, has trouble scaling efficiently.

Hummel argues that ParStream’s database was purpose-built for speed. Whereas many database platforms exist for the purpose of storing and analyzing large quantities of data, ParStream was designed to deliver faster response times and to reduce Big Data storage infrastructure costs in the process.

ParStream enables “Fast Data” by using a distributed architecture that processes data in parallel. ParStream was specifically engineered to deliver both big data and fast data, enabled by a unique High Performance Compressed Index (HPCI). This removes the extra step and time required for decompression of data.

ParStream claims to provide sub-second response times on billions of data records while continuously importing new data.

Customers include CAKE, MPREIS, bd4travel, INRA, Searchmetrics, Ellisphere, the German Ministry of Economics, and DERTour.

Competitive Landscape: Competitors include SAP HANA, Apache platforms and Vertica Systems (HP).


Tidemark

What they do: Provide a cloud-based business planning and analytics solution.

Headquarters: Redwood City, CA

CEO: Christian Gheorghe. He previously founded Tian Software, which was acquired by OutlookSoft. After SAP acquired OutlookSoft, Christian served as SVP and CTO.

Founded: 2010

Funding: Tidemark has raised approximately $80 million in funding from Greylock Partners, Andreessen Horowitz, Redpoint Ventures, Tenaya Capital, and Silicon Valley Bank.

Why they’re one of the 50: Most companies already have large volumes of data, and it just continues to pile up. However, most companies also struggle to understand how the business is performing, what’s driving the performance, and what they can do about it. Much of the data people use to make decisions is no longer housed inside the walls of the business. Instead, it comes from external, unstructured sources like news outlets, Twitter, and other streams.

Legacy systems weren’t designed to handle this type of data, so companies are stuck trying to manually assemble disparate data into a view that can be used by the business to steer the company. Unfortunately, this data becomes outdated almost instantly, and it isn’t actionable.

To tackle this problem, Tidemark designed its system by focusing on actual end user pain points. Okay, every vendor says they have a customer focus, but that’s often empty rhetoric. Prior to writing a single line of code for Tidemark’s applications, CEO Christian Gheorghe and his co-founder, Tony Rizzo, dedicated nine months to surveying Fortune 1,000 companies to find out where they were struggling with their existing business intelligence and analytics solutions.

Armed with this information, Tidemark designed its cloud-based solution to bring financial and operational business planning, consolidation, and analytics to the entire enterprise. Tidemark does this by providing a series of intuitive analytical apps that replace manual, spreadsheet-based planning processes or inflexible legacy solutions. Unlike these legacy solutions, Tidemark gives enterprises access to real-time, in-context data about the company’s financials. As a result, they improve business performance, reduce risk, and accelerate decision making.

Tidemark helps companies “run in the now” by giving every user access to a computational grid that is free of cubes and doesn’t limit data use by constraining volumes or dimensions. This allows data-driven decisions to be processed up to 10 times faster than typical cube-based approaches, which helps eliminate lag time and confusion over outdated information.

The solution also adds context to numbers, and helps business users manage data by “processes, not cubes.”

Customers include Acxiom, Brown University, Chiquita, Chuck E. Cheese, Hostess Brands, HubSpot, Netflix, ServiceSource, and University of Miami.

Competitive Landscape: Tidemark divides its competitors into two camps. The first consists of legacy on-premises vendors, such as SAP BPC, Oracle Hyperion, and IBM Cognos. The second consists of cloud-hosted providers, which include the likes of Adaptive Insights, Anaplan, and Host Analytics.

Analytics for Sales, Marketing, and Social Media


PlaceIQ

What they do: Provide a location-based insights and consumer-targeting platform.

Headquarters: New York, NY

CEO: Duncan McCall, who previously founded PublicEarth.

Founded: 2010

Funding: The startup is backed by $27 million raised in three rounds of funding from IA Ventures, Social Leverage, kbs+ Ventures, Neu Venture Capital, US Venture Partners, Valhalla Partners, Harmony Partners, and Iris Capital.

Why they’re one of the 50: Mobile advertising and marketing present a unique challenge. The typical way companies try to understand consumer behavior online is through cookies. On smartphones and tablets, cookies don’t have as much traction. Even if cookies are enabled in mobile browsers, they aren’t terribly useful, since browsers are giving way to apps.

However, a potentially better replacement is location. Just as cookies track your journeys through the Web, marketers can glean demographic information from the actual physical locations you’ve visited.

PlaceIQ says that it “provides a multidimensional depiction of consumers across location and time.” This allows brands to define audiences and intelligently communicate with those audiences to support greater ROI. PlaceIQ’s platform analyzes customers based on where they have been, adding relevancy to a brand’s marketing strategy and providing demographic insights that can help improve business strategy.

Customers include Mazda, Darden Restaurants, and Montana Tourism.

Competitive Landscape: The competition includes Verve Mobile, xAd, Placed, Sense Networks, jiWire, 4INFO, and Millennial Media.


Pursway

What they do: Pursway uses big data analytics and proprietary algorithms to help companies identify the customers who are most likely to influence how people in their social networks shop.

Headquarters: Herzliya, Israel; U.S. HQ: Waltham, MA

CEO: Dave Ellenberger, who previously served as CEO of 170 Systems.

Founded: 2009

Funding: $17 million from Battery Ventures and Globespan Capital Partners.

Why they’re one of the 50: In an era of social-savvy, data-driven marketing initiatives, marketers are increasingly looking for ways to unlock the power of relationship-based marketing. Most consumer behavior is influenced by the opinions of people we know and trust – family, friends, and colleagues. While marketers have known this for quite a while, they have trouble acting on it.

Pursway’s software is intended to improve customer acquisition, cross-selling opportunities, and retention. By imprinting a social graph onto existing customer and prospect data, identifying actual relationships between buyers, and identifying target customers who have a demonstrated influence over others’ purchasing decisions, Pursway argues that it can help consumer-facing organizations close the gap between how businesses market and how people actually buy.

The core of Pursway’s various services is Connect, a Big Data database that maps current relationships among more than 120 million U.S. consumers. Drawing from thousands of open data sources, Connect matches entities to create a single network in which real human connections are identified. Connection types include places of residence, schools attended, places of work, professional activities, travel, and social media sharing.

MyPIVO, Pursway’s dashboard, gives marketers a view into their customers’ real-life relationships and their potential sales influence. Pursway’s subscription-based scoring tools give marketers insight into the friendships, connections, and potential influence of existing customer and prospect pools to better target marketing messages, drive sales, and increase the ROI of any data-driven marketing campaign.

Customers include Sony, Orange, and Comcast.

Competitive Landscape: Competitors include Angoss, IBM, and SAS.

To read the rest of the Big Data 50 report, sign up for the Startup50 newsletter to get the report sent to your inbox.