Big Data 50 – the hottest Big Data startups of 2014

From “Fast Data” to visualization software to tools used to track “Social Whales,” the Big Data 50 has it covered.

The 50 startups in the Big Data 50 are an impressive lot. In fact, the Big Data space in general is so hot that you might start worrying about it overheating – kind of like one of those mid-summer drives through the Mojave Desert. The signs warn you to turn off your AC for a reason.

Personally, I think we’re a long way away from any sort of Big Data bubble. Our economy is so used to trusting decision makers who “trust their gut” that we have much to learn before the typical business is even ready for data Kindergarten.

In fact, after a few decades following the “voodoo” of supply-side economics, which fetishized the mysterious and elusive “rational consumer,” the strides we’re making towards being a more evidence-based economy still have us pretty much just playing catch-up.

These 50 Big Data startups are working to change that.

Here’s a sampling of the startups in the Big Data 50 report. To access the full report, simply sign up for the Startup50 newsletter, and you’ll get the report delivered to your email inbox.

Big Data Startups Poised for Explosive Growth

These startups are relatively new to the scene, but have made big strides in a short amount of time. The more entrenched startups in this report should watch their backs, since these folks may be creeping up on them.

Entrigna

What they do: Provide analytical tools that help users gain insights and develop predictions from streams of real-time, high-velocity data.

Headquarters: Chicago, IL

CEO: Murali Kashaboina. He previously served as Managing Director, Enterprise Architecture at United Airlines and played a key role in the merger integration of United and Continental Airlines.

Founded: 2012

Funding: The startup is currently financed by ~$1.1 million in client contracts and through the founding team’s own investments. Entrigna says that it is in the process of raising a minimum of $5 million from angel investors and is about to close two more client contracts that will provide an additional $1.2 million.

Why they’re one of the 50: With the tremendous increase in computing power and decrease in memory and storage costs, today’s businesses are weighed down with a deluge of disconnected data, much of it highly relevant to effective decision-making. This data is associated with business applications, processes, transactions, operations, customers, customer insights, products, product insights, policies, systems, business-partnerships, competition – the list goes on and on. Since this data exists in many different formats, gaining a unified view of anything is difficult, if not impossible. Much of this data is also highly volatile and often contains time-sensitive business intelligence.

If detected in real time, such “in-the-moment” business intelligence can be used to dynamically determine an optimal course of action.

To tackle this problem, Entrigna developed a real-time decisions platform called RTES (Real Time Expert System). RTES enables real-time decisions by offering “decision frameworks” packaged together in one system. The system relies on a combination of machine learning, predictive analytics, business rules, complex event processing, optimization, and artificial intelligence in order to derive real-time actionable business decisions.

Essentially, the RTES platform exposes such decision frameworks as built-in modularized services that can be combined and applied on an organization’s business data on a real-time basis. Then, users can identify intelligent patterns that can lead to real-time business insights.

Customers include Jisu, Opargo, Pacific Gas & Electric, Sonata Software, and Decision Analytics International.

Competitive Landscape: Entrigna will compete against legacy data mining and BI tools, such as those from IBM and SAS. Most of the Big Data startups leaning on machine learning are focused on specific verticals, such as fraud detection for credit card companies. However, other general purpose, machine-learning based startups include Ayasdi, Feedzai, Skytree, and Sumo Logic.

Nuevora

What they do: Provide Big Data analytics applications.

Headquarters: San Ramon, CA

CEO: Phani Nagarjuna, who most recently served as EVP of Products and Business Development for OneCommand, which provides a SaaS-based CRM and Loyalty Automation Platform for the auto retail industry.

Founded: 2011

Funding: $3 million in early funding from Fortisure Ventures.

Why they’re one of the 50: Nuevora has set its sights on one of Big Data’s early growth areas: marketing and customer engagement. Nuevora’s nBAAP (Big Data Analytics & Apps) Platform features purpose-built analytics apps based on best-practices-driven predictive algorithms. nBAAP is based on three key Big Data technologies: Hadoop (data processing), R (predictive analytics), and Tableau (visualizations).

On top of all of this, Nuevora’s algorithms work on disparate sources of data (transactional, social media, mobile, campaigns) to quickly identify patterns and predictors in order to tie specific goals to individual marketing tactics.

The platform includes pre-built apps for the customer marketing business process – acquisition, retention, up-sell, cross-sell, profitability, and customer lifetime value (LTV). With only “last-mile” configurations required for individual customer situations, Nuevora’s apps empower organizations to anticipate their customers’ behaviors.

Nuevora gives end users the ability to continually recalibrate their predictions using a “closed-loop recalibration engine,” which helps organizations keep up with only the most pertinent insights based on the latest data.

Competitive Landscape: When Nuevora assesses the competitive landscape, it zeroes in on big consulting firms, such as Accenture, and other predictive analytics companies, such as Alpine Data Labs.

However, since pretty much every marketing platform under the sun now includes some sort of analytics engine, I also expect them to compete with the major marketing automation providers, such as ExactTarget (which uses Pentaho for its Big Data analytics).

Billion-Dollar Baby

Raise a billion dollars in funding, forge a tight partnership with Intel, and you get your own category.

Cloudera

What they do: Provide a Hadoop-based Big Data platform.

Headquarters: Palo Alto, CA

CEO: Tom Reilly. Prior to joining Cloudera in 2013, he was VP and GM of enterprise security at HP. He also served as CEO of enterprise security company ArcSight, which HP acquired in 2010.

Founded: 2008

Funding: Cloudera has raised over $1 billion in venture capital to date. (Yep, that’s not a typo; that’s $1 billion with a “B.” I’m wondering the same thing you are: “Why dilute yourself that much?” Founder Mike Olson has given a pretty concise answer: a close relationship with Intel, which provided much of the new funding.) Other investors include Accel Partners, Google Ventures, Greylock Partners, Ignition Partners, In-Q-Tel, Meritech Capital Partners, and T. Rowe Price.

Why they’re one of the 50: Big Data is hot, and Cloudera pioneered the Hadoop-based Big Data space.

Cloudera lets users query all of their structured and unstructured data to gain a view beyond what’s available from relational databases. Cloudera recently released Impala, a new open-source interactive query engine for Hadoop that enables interactive querying on massive data sets in real time.

Moreover, they’re sitting on a giant pile of VC cash and have a top-notch management team. Cloudera is also the first major Big Data vendor to start investing heavily in a Big Data Achilles’ heel: security.

In June 2014, Cloudera acquired Gazzang, a startup specializing in encryption software for Big Data environments. And earlier this month, Cloudera entered into a partnership with home automation company Vivint to start targeting the Internet of Things (IoT) market.

Frankly, I thought long and hard about leaving Cloudera off this list – not because they don’t belong, but because they’ve been doing well enough for long enough that I’m not sure that the label “startup” really fits that well anymore.

However, they pretty much proved the business case for Hadoop, and they’re moving the space forward with the Gazzang acquisition, so, for the time being, anyway, I’d be foolish to exclude them.

Customers include Experian, FICO, National Cancer Institute, Nokia, Western Union, and Vivint (to see how Vivint is using Cloudera, check out my latest story in Datamation, “5 Big Data Apps with Effective Use Cases”).

Competitive Landscape: Cloudera clearly has first-mover advantage, but competitors include EMC, Pivotal, Hortonworks and MapR. One of their earlier competitors, Intel, which had its own home-grown distribution, dropped it and adopted Cloudera instead.

Hadoop Darlings

Hadoop is practically synonymous with Big Data these days. The startups in this category are battling it out for the Hadoop lead.

Continuuity

What they do: Provide a Hadoop-based Big Data application hosting platform.

Headquarters: Palo Alto, CA

CEO: Jonathan Gray, who was previously an HBase software engineer at Facebook.

Founded: 2011

Funding: $12.5 million from Battery Ventures, Ignition Partners, Andreessen Horowitz, Data Collective and Amplify Partners.

Why they’re one of the 50: Continuuity has come up with a clever way to get around the dearth of Hadoop experts: they offer an application developer platform targeted at Java developers. The lower-level infrastructure is all abstracted away by the Continuuity platform.

The startup’s flagship product, Reactor, is a Java-based integrated data and application framework that layers on top of Apache Hadoop, HBase, and other Hadoop ecosystem components. It surfaces capabilities of the infrastructure through simple Java and REST APIs, shielding end users from unnecessary complexity. Continuuity describes Reactor as a “Big Data Application Server for Hadoop.” It abstracts all the complexities of Hadoop and enables any developer to build Big Data applications.

Continuuity’s Loom service is a cluster management solution. Clusters created with Continuuity Loom utilize templates of any hardware and software stack, from simple standalone LAMP-stack servers and traditional application servers like JBoss to full Apache Hadoop clusters comprised of thousands of nodes. Clusters can be deployed across many cloud providers (Rackspace, Joyent, OpenStack) while utilizing common SCM tools (Chef and scripts).

In June, Continuuity entered into a partnership with AT&T Labs to develop and release into open source a new real-time data processing framework that will provide streaming analytics capabilities. Initially code-named jetStream, it will be made available to the market via open source in the third quarter of 2014.

Competitive Landscape: As of now, Continuuity is uniquely positioned. Indirect competitors come from the Hadoop-as-a-Service camp (AWS EMR, Altiscale, Infochimps, Mortar Data, etc.). One thing to keep an eye on is the CEO situation. Founding CEO Todd Papaioannou, who was previously VP and chief cloud architect at Yahoo!, left the company last year. Co-founder and previous CTO Jonathan Gray has taken over the CEO role. This is Gray’s first role as a business leader.

MapR Technologies

What they do: Provide a Hadoop distribution/NoSQL Big Data platform.

Headquarters: San Jose, CA

CEO: John Schroeder. He previously served as CEO of Calista Technologies, which was acquired by Microsoft. Before that, he was CEO of Rainfinity, which EMC purchased.

Founded: 2009

Funding: In June 2014, MapR Technologies raised $110 million in financing in a round led by Google Capital, with participation from Qualcomm Ventures and existing investors Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures.

Why they’re one of the 50: MapR argues that Hadoop suffers from an insufficient high availability design that results in downtime and an inability to protect against the application and user errors that lead to lost data. Hadoop’s distributed file system is designed to be “append only,” which forces interactive applications to spend excessive time writing new files and results in a 150M file cluster limit.
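To make the “append only” complaint concrete, here is a toy in-memory model in Python. It is only a sketch under stated assumptions: `AppendOnlyFile` and `update_record` are hypothetical names invented for illustration, and nothing here is HDFS or MapR code. It shows why changing a few bytes means writing out a whole new file when in-place writes are not allowed.

```python
class AppendOnlyFile:
    """Toy model (hypothetical): bytes can be added at the end, never changed in place."""

    def __init__(self, data=b""):
        self._data = bytearray(data)

    def append(self, chunk):
        self._data.extend(chunk)

    def read(self):
        return bytes(self._data)


def update_record(store, name, offset, new_bytes):
    """Change bytes at `offset` by writing a complete new file.

    Because the existing file cannot be modified in place, the whole
    content is copied into a new file with the patch applied.
    Returns the new file's name and how many bytes had to be written.
    """
    old = store[name].read()
    patched = old[:offset] + new_bytes + old[offset + len(new_bytes):]
    new_name = name + ".v2"
    store[new_name] = AppendOnlyFile()
    store[new_name].append(patched)
    return new_name, len(patched)


if __name__ == "__main__":
    store = {"log": AppendOnlyFile(b"AAAABBBBCCCC")}
    new_name, written = update_record(store, "log", 4, b"XXXX")
    print(new_name, written, len(store))
```

In this toy model, a 4-byte change costs a full 12-byte rewrite and leaves a second file behind, which is the kind of overhead and file-count growth the paragraph above describes.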

MapR was founded to address Hadoop’s limitations, transforming it into an enterprise-grade system that more organizations can actually use.

The new architecture is a high-performance data platform that supports full random read/write data access and real-time streaming, and can scale to 1 trillion files. MapR argues that this is a new distributed architecture that provides enterprise storage capabilities, such as snapshots and mirroring, but does not have the NameNode scaling drawbacks that plague clustered file systems. MapR also innovated at the shuffle layer to provide high-speed analytic processing.

Additionally, the MapR platform provides self-healing, automated failover, and an integrated in-Hadoop database for real-time capabilities.

Named customers include Ancestry.com, Rubicon, and comScore.

Competitive Landscape: Competitors include Cloudera, Pivotal, and Hortonworks.

Xplenty

What they do: Provide Hadoop-as-a-Service.

Headquarters: Tel Aviv, Israel

CEO: Yaniv Mor, who previously managed the NSW SQL Services practice at Red Rock Consulting.

Founded: 2012

Funding: Xplenty is backed by an undisclosed amount of seed funding from Magma Venture Capital, and it is currently in the midst of locking down a Series A round.

Why they’re one of the 50: While Hadoop is being hyped like crazy these days, not all the hype is empty. It has indeed become the de facto infrastructure technology for Big Data. The trouble is that the development, implementation, and maintenance of Hadoop require a very specialized skill set.

Xplenty technology provides Hadoop processing on the cloud via a coding-free design environment, so businesses can quickly and easily benefit from the opportunities offered by Big Data without having to invest in hardware, software, or highly specialized personnel.

According to Xplenty, competing services still target developers, whereas Xplenty targets the data and Business Intelligence (BI) users who do not know how to write code, but who need to move data to a Big Data platform.

A drag-and-drop interface eliminates the need to write complex scripts or code of any kind. With its automatic server configuration feature, users can simply point to a data source, configure the data transformation tasks, and tell the platform where to write the results. Xplenty’s platform uses SQL terminology. Thus, for data analysts, the learning curve should be minimal.
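For readers who do write code, the point-configure-write flow described above can be pictured as a three-stage pipeline. The Python below is only an illustrative sketch: the function names (`read_source`, `transform`, `write_sink`, `run_pipeline`) and the sample data are hypothetical, and none of this is Xplenty’s product or API.

```python
import csv
import io

# Hypothetical sample data standing in for a real data source.
SAMPLE_CSV = "customer,amount\nalice,30\nbob,5\nalice,20\n"


def read_source(text):
    """Stage 1: point to a data source (here, CSV text held in memory)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, min_amount):
    """Stage 2: SQL-like steps: filter rows (WHERE), then total per customer (GROUP BY)."""
    totals = {}
    for row in rows:
        amount = int(row["amount"])
        if amount >= min_amount:
            totals[row["customer"]] = totals.get(row["customer"], 0) + amount
    return totals


def write_sink(totals):
    """Stage 3: say where the results go (here, a CSV string)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer", "total"])
    for customer in sorted(totals):
        writer.writerow([customer, totals[customer]])
    return out.getvalue()


def run_pipeline(text, min_amount=10):
    """Chain the three stages: source, transformation, destination."""
    return write_sink(transform(read_source(text), min_amount))


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_CSV))
```

A drag-and-drop tool of the kind described would let an analyst assemble the same three stages visually, using terms such as filter and group by, rather than writing functions like these.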

Customers include Fiverr, WalkMe, Dealply Technologies, and Travel Global Systems.

Competitive Landscape: The main competition comes from Amazon’s Elastic MapReduce (EMR). Other Hadoop-as-a-Service competitors include Altiscale, Mortar Data, Qubole, and recently Microsoft with Hadoop on Azure. Rackspace is about to launch its own Hadoop-as-a-Service offering based on Hortonworks’ distribution.

To read the rest of the Big Data 50 report, sign up for the Startup50 newsletter to get the report sent to your inbox.