Big Data Companies

This section of CTOvision, designed and optimized for CTOvision Pro members, provides a fast overview of the Big Data companies we believe are poised to cause the most positive disruption in the enterprise. Remember, CTOvision Pro members can always use our Ask The CTO feature to task us to refine and assess these firms further.

Find our other company assessments at: Cloud Computing Companies | Artificial Intelligence Companies | Mobility Companies | Robotics Companies | Internet of Things Companies | Cybersecurity Companies | Infrastructure and Comms Companies

Below we dive deeper into the real technologies, tools and companies moving the world of Big Data forward.

More information is available on each of these firms; click a title for more.

  • Actifio: Decoupling Data From Infrastructure For Radically Simple Copy Data Virtualization

    For years we have had Actifio on our list of the most disruptive IT firms to watch in enterprise IT. They simplify and improve the way enterprises store, retrieve and manage data and in doing so improve things immensely. They have proven/low-risk approaches that provide more agility, provide paths to modernize the entirety of architectures, […]

    Aerospike: Real-time NoSQL database designed for multi-core, multi-processor and SSD

    Aerospike is the only real-time NoSQL database built from the ground up to take advantage of today’s multi-core, multi-processor architectures and solid-state drives (SSDs). That’s what makes it fast – 500k TPS per node with sub-millisecond latency. Hybrid architecture with indexes in-memory and data either in-memory or on flash. Strong consistency, tightly coupled clusters (auto […]

    Alluxio: A virtual distributed storage system orders of magnitude faster than others

    The world’s first memory-centric distributed storage system bridges applications and underlying storage systems providing unified data access orders of magnitude faster than existing solutions. Alluxio is a memory speed virtual distributed storage system. Open source software is critical to the modern enterprise software landscape. Alluxio is open source under the Apache license, and we are […]

    Ancelus: Next Generation Database and Analytics

    The Ancelus Database and Analytics solution is an emerging technology that addresses the never-ending problem of information overload. As sensors produce more data and we continue to consolidate data-driven systems, we are quickly approaching serious problems with the traditional database approach that we see with current products. We continue to increase our storage […]

    Basho: The Makers of Riak

    Basho Technologies, Inc. was founded in January 2008 by a core group of Software Architects, Engineers and Executives from Akamai Technologies, Inc. (NASDAQ:AKAM). Basho is the developer of Riak, the leading distributed database that delivers elastic scalability, high performance, high reliability, and high availability with a reduced total cost of operations (TCO). Applications built using […]

    Basis Technology: Extracting Meaningful Intelligence from Multilingual Text

    Basis Technology provides software solutions for extracting meaningful intelligence from unstructured text in Asian, European and Middle Eastern languages. They help technology companies and government organizations improve the accuracy of information retrieval, text mining and other applications through advanced linguistics. Their Rosette Linguistics Platform uses state of the art Natural Language Processing techniques to improve […]

    Cloudant: Multi-petabyte data sets now analyzable in the cloud

    Cloudant was acquired by IBM in March 2014. Cloudant was founded in Cambridge, Massachusetts in 2008 by three MIT physicists who at the time were moving multi-petabyte data sets around from the Large Hadron Collider. Frustrated by the available tools for managing and analyzing big data in their research, the founders built a distributed, fault-tolerant, […]

    Cloudera: Extract Benefit From All Your Data

    Cloudera offers enterprises a powerful new data platform built on the popular Apache Hadoop open-source software package. Cloudera enhances the storage and processing technologies originally developed by the world’s biggest Web companies, allowing a growing list of global customers to use Hadoop to solve problems and achieve their particular business goals. For more on Cloudera […]

    Cognika: Automated Monitoring and Prediction

    Cognika Intelligence & Defense Solution (CIDS) was created with a mission to bring leading edge Analytic tools to the Military market. CIDS breakthrough technology combines the best of Artificial Intelligence (AI) techniques with unique algorithms that emulate human cognition and serve as the foundation for the Smart Analytics engine on which Cognika ESP™ was created […]

    CohoData: freeing enterprise data from today’s monolithic, low-performance storage architectures

    Led by a team of XenSource/Citrix virtualization and storage industry veterans, Coho Data is a stealth mode startup with the mission of freeing enterprise data from today’s monolithic, low-performance storage architectures. Inspired by the highly scalable, commodity-hardware based approaches of public clouds, the company is developing the first high performance flash-tuned scale-out storage architecture that delivers […]

    Couchbase: The NoSQL Database Leader

    Couchbase is the NoSQL database market share leader, with production deployments at AOL, Deutsche Post, NTT Docomo, Salesforce.com, Starbucks, Turner Broadcasting Systems, Vimeo, Zynga, and hundreds of other global enterprises. Couchbase Server, our NoSQL database offering, delivers a more scalable, high-performance and cost-effective approach to data management than relational database technology. It is particularly well suited […]

    Data Direct Networks: Massively Scalable Storage Made Simple

    DDN provides the backbone for the world’s most content-rich organizations whose requirements rival the limitless ambition that evolve the digital world. DDN is recognized by Gartner for its leadership in object storage and as a leader in scalable storage infrastructure. For an overview see the video at this link and embedded below: DataDirect Networks (DDN) […]

    DataBricks: SaaS data platform known for enterprise grade Spark

    Databricks provides an enterprise-ready SaaS data platform. Databricks is widely known for their work with Spark. Spin up and scale out clusters to hundreds of nodes and beyond with just a few clicks, without IT or DevOps. Easily harness the power of Spark for streaming, machine learning, graph processing, and more. For an overview see the […]

    DataStax: Solutions for Apache Cassandra

    DataStax delivers Apache Cassandra to the enterprise and powers online applications for 400+ customers and more than 20 of the Fortune 100. DataStax Enterprise is tailor made to manage Big Data. DataStax Enterprise’s built-for-scale architecture – based on Apache Cassandra – enables it to handle huge volumes of all types of data in a rapid […]

    Denodo: an Enterprise Data Services Platform based on Data Virtualization and Data Federation technologies

    Denodo Technologies, Inc. has redefined data integration to make the delivery of data to the corporate business applications simple. The Denodo Data Services Platform is an enterprise Data Virtualization, Data Federation and Cloud Data Integration middleware that uses a declarative approach to abstract, unify, federate and understand disparate data sources and systems, supporting multiple acquisition […]

    Digital Reasoning: Automated Understanding for Big Data

    Synthesys is the flagship product from Digital Reasoning that delivers Automated Understanding for Big Data. For an overview see the video at this link and embedded below: Enterprise and Government customers are awash with too much data. This data has three demanding characteristics – it is too big (volume), it is accumulating too fast (velocity) […]

    Domino Data Lab: Run, scale, track, and deploy your models in the cloud or on-premise

    Domino is a workbench that accelerates the entire analytical lifecycle, from early exploratory work all the way to deploying your models, allowing you to track and share your work along the way. It works alongside the tools and languages you already use, including R, Python, Julia and more. A workbench for enterprise data science, facilitating […]

    EverLaw: Another Useful Artificial Intelligence Capability

    Our list of Truly Useful Artificial Intelligence Tools You Can Use Today was out of date the minute we published it. We knew that would happen and are absolutely thrilled when we discover new capabilities that belong on this list.  One we just learned about is EverLaw, provider of perhaps the world’s most advanced litigation platform, […]

    Factual: Global Data, Local Context

    Factual’s location platform enriches mobile location signals with definitive global data, enabling personalized and contextually relevant mobile experiences. Built from billions of inputs, the data is constantly updated by Factual’s real-time data stack. Meet founder and CEO of Factual Gil Elbaz in this video:  

    FMS: Data Visualization, Link Analysis, and Social Network Analysis

    The FMS Advanced Systems Group is a division of FMS, Inc., an award-winning small minority-owned business with over 25 years of experience delivering state-of-the-art solutions to a wide range of customers. For an overview of their solution see the video at this link and embedded below: With tens of thousands of customers worldwide, FMS […]

    geoIQ: Changing the Way Organizations Visualize and Analyze Real-Time Data

    geoIQ is now part of esri. In 2005, devastating worldwide events such as the London bombings and Hurricane Katrina proved that legacy data analysis tools and techniques which used dated, static location information were no longer an effective means for data sharing, risk mitigation or real-time analysis. To answer this need, we created GeoIQ, the […]

    Geosemble Technologies: Content Search and Discovery

    Geosemble is now part of TerragoTech. Founded in December of 2004, Geosemble Technologies is a spin-off from the University of Southern California (USC). Geosemble’s founders were faculty members in Computer Science when they developed the company’s core Artificial Intelligence (AI) algorithms. Since that time, the technology has been strengthened and refined to apply to a […]

    Hortonworks: An Apache Hadoop stack with management tools

    Hortonworks Data Platform (HDP) is an open source Apache Hadoop distribution. It is ideal for organizations that want to combine the power and cost-effectiveness of Apache Hadoop with the advanced services and reliability required for enterprise deployments. Hortonworks Data Platform (HDP) is the only 100% open source data management platform based on Apache […]

    Immuta: Managing the chaos of big data systems

    Editor’s note: The highly respected venture capital firms Blu Venture, Sequoia, and Conversion Capital have announced their support and funding of Immuta, a next-gen enterprise data management startup. We have had the pleasure of working with Immuta and have known their founding team for years and are very excited to see this. -bg From Sequoia’s 9 […]

    Informatica: the Data Integration Company, Provides Innovation Towards a More Data-Centric World

    Informatica provides data integration software. Their products help with data integration, replication, virtualization, masking, and quality. Their flagship product is Vibe, a virtual data machine which is an “embeddable data management engine that powers Informatica’s Intelligent Data Platform and can access, aggregate, and manage any type of data.” Watch this quick introduction on what Vibe […]

    Karmasphere: Powering full fidelity analytics on Hadoop

    Karmasphere powers full-fidelity analytics on Hadoop with the most streamlined, open and enterprise-ready approach to Big Data analytics on the market today. The Karmasphere Workspace for Big Data Analytics is uniquely designed to natively extract value from Big Data without the need for abstraction or replication, which significantly reduces total cost of ownership and complexity. […]

    KeyLines – HTML5 graph visualization toolkit

    KeyLines is a commercial JavaScript toolkit for visualizing networks. It works in all major browsers, and on all platforms, including the iPad. It uses HTML5 but also works on old versions of Internet Explorer. It could be used for tackling crime or terrorism, tracking patterns in customer behaviour or following the spread of news on […]

    LucidWorks: Trusted Lucene/Solr Solutions and Support

    LucidWorks, the trusted name in Search, Discovery and Analytics, transforms the way people access information to enable data-driven decisions. Leveraging both structured and unstructured data built on the power of Apache Lucene/Solr open source search, LucidWorks delivers unmatched stability, scalability, and time-to-delivery for search applications. LucidWorks Search provides ease of use development to access up […]

    MapR: Seeking To Improve Apache Hadoop

    MapR develops and sells Apache Hadoop-derived software. They contribute to many Apache Hadoop-related projects including HBase, Pig, Hive and ZooKeeper. From Wikipedia: MapR entered a technology licensing agreement with EMC Corporation on 25 May 2011, supporting an EMC-specific distribution of Apache Hadoop. In addition, MapR was selected by Amazon to provide an upgraded version of Amazon’s […]

    MemSQL: Incredible (fastest in world) performance

    MemSQL, The Real-Time Analytics Platform. For a great overview see the video embedded below and at this link: MemSQL’s real-time analytics platform is built on the world’s fastest, most scalable in-memory database, capable of simultaneously handling real-time transactions and analytic workloads. MemSQL unleashes the full potential of Big Data by consuming and returning data instantly. […]

    MetaScale: Hadoop and NoSQL Platform Services

    We met MetaScale last year and have written about them on our CTOvision.com site, and have had the pleasure of interviewing their senior leadership. We believe they are worth tracking very closely because of their proven past performance. When they say they can do something it is because they have done it and that is […]

    MongoDB (10gen): Offers production support, training, and consulting for MongoDB

    MongoDB (once named 10gen) provides a comprehensive range of services to enable you to get the most out of commercial-grade deployments of MongoDB. They develop MongoDB and offer production support, training, and consulting for the open source database. MongoDB was founded by former DoubleClick Founder and CTO Dwight Merriman and former DoubleClick engineer and ShopWiki […]

    Nexenta: Enterprise Class Storage for Everyone

    Nexenta Systems is the world’s leading provider of enterprise-class OpenStorage solutions. The company’s flagship software-only platform, NexentaStor, delivers high-performance, ultra-scalable, cloud- and virtualization-optimized storage management. Privately held, Nexenta is headquartered in Mountain View, California. The core storage platform, NexentaStor, is based on OpenSolaris / Open Storage ZFS technology. Nexenta products, built on top of NexentaStor […]

    Object Video: The Leader in Intelligent Video

    Reston VA based ObjectVideo is the world’s leading provider of intelligent video software for security, public safety, business intelligence, process improvement and other applications. For a quick video overview see this link or the embed below: Founded in 1998 by world-renowned scientists and program managers from the Defense Advanced Research Projects Agency (DARPA) and headquartered […]

    OpenGov: On Track To Lead Market In Financial Transparency and BI for Governments

    With this post we are initiating coverage of OpenGov, the award-winning, web-based platform for governments that enables governments and citizens to easily access, explore, and share finance and budget information. There are many other players in this field, but we believe OpenGov now has what it takes to be the market leader and strongly believe […]

    Oracle Corporation: Addressing enterprise IT infrastructure

    Oracle (NYSE:ORCL) is one of the most widely known Tech Titans, and at a $169B market cap has plenty on hand to continue innovating and acquiring new capabilities. They will be around. Here is how they describe themselves: Oracle Corporation (Oracle) provides products and services that address all aspects of corporate information technology (IT) environments, including […]

    Paxata: Adaptive Data Preparation

    Paxata developed the first Adaptive Data Preparation™ platform built for the business analyst. The company’s technology dramatically reduces the most painful and manual steps of any analytic exercise, turning raw data into ready data for analytics, and empowering analysts to drive greater value for the business. With seamless connections to BI tools like Tableau, QlikView, […]

    Planet OS: A platform for real-world sensor data integration

    Planet OS is a platform for real-world sensor data integration for ocean, land, air and space sensors. Planet OS has developed a powerful suite specifically focused on indexing sensor and machine data that combines data mining, integration, search and discovery, and data exchange. The data types can be virtually anything from overhead imagery, to sonar […]

    Rescale: Cloud Engineering Simulation Platform

    Editor’s note: If you do research or analysis in a company that is looking for competitive advantage or an academic institution looking to optimize research efforts this capability could be critical to your efforts. It is one of the most virtuous applications of cloud computing I have evaluated, with great potential to improve how models […]

    SAP NS2: Supporting US national security and critical infrastructure customers

    SAP National Security Services (NS2) is a US-based provider of software, services and support for US national security and critical infrastructure customers. They focus on the missions of national security, leveraging technology that is optimized to deliver precision outcomes at mission speeds. Products include SAP HANA, an innovative in-memory platform that runs analytics applications smarter, […]

    SAS: Giving us the power to know since 1976

    With this post CTOvision is initiating coverage of SAS. Widely known in enterprise technology circles for their advanced analytics, business intelligence and enterprise data management capabilities, SAS is continuing to invest in innovation, partnering and mission-focused application development. SAS is highly regarded for their mission-focused applications in domains like: Fraud and Security Intelligence Big Data Solutions […]

    Scality: Object Storage With Full Scale-Out File System Support

    With this post we are initiating coverage of Scality, a firm we are hearing quite a bit of buzz about lately. We first heard of Scality in news of their VC funding rounds in early 2011, and soon thereafter began to get questions from government infrastructure professionals asking how Scality compares to Cleversafe. Cleversafe has […]

    SearchBlox: Leading provider of enterprise search solutions based on Apache Lucene

    SearchBlox is a leading provider of enterprise search solutions based on Apache Lucene. Over 300 customers in 30 countries use SearchBlox to power their website, intranet and custom search. SearchBlox Software, Inc. was founded in 2003 with the aim to develop commercial search products based on Apache Lucene. SearchBlox provides web based administration of your search […]

    Semantic Research: Providing netcentric intelligence for good guys

    For an overview of Semantic Research we recommend starting with a look at the video embedded below and at this link: From their website: Based on the early, ground-breaking work on knowledge representation for education by our founders, Semantic Research has been redefining the way users visualize, interact with, and understand data and information for […]

    Signal Innovations Group: Innovative Technology to Help Interpret Complex Data

    Dr. Paul Runkle and Professor Larry Carin founded SIG in 2004 to continue and extend applied defense research that originated in an academic setting at Duke University. SIG’s initial work focused on radar signal processing for combat target identification, sonar-based automatic target recognition (ATR), and airborne minefield detection. SIG’s application areas quickly grew to […]

    Sitscape: Award-Winning Situational Software

    SitScape has generated incredible excitement in the analytical community in government and Fortune 100 companies. This post will provide insight on why. SitScape is an enterprise software company headquartered in Tyson’s Corner, Northern Virginia, right outside of Washington D.C. SitScape was founded with a single vision to empower business and mission users to visually access […]

    Tamr: Connect and enrich all your data for analytics and decision making

    With this post we are initiating coverage of Tamr. Tamr was founded in 2013 by a distinguished cadre of database industry veterans including Andy Palmer and Turing Award winner Mike Stonebraker. Tamr enables enterprises to make use of 100% of available data by unifying and enriching data holdings. Tamr’s data unification platform catalogues, connects and curates […]

    VoltDB for faster, scalable relational databases

    VoltDB is a blazingly fast relational database system. It is specifically designed for modern software applications that are pushed beyond their limits by high velocity data sources. This new generation of systems – real-time feeds, machine-generated data, micro-transactions, high performance content serving – requires database throughput that can reach millions of operations per second. What’s […]

    Xplenty: Offering Coding-free Hadoop to All

    Xplenty puts Big Data within reach for companies of all sizes. The innovative platform is an easy-to-use cloud service that takes the complexity out of Apache Hadoop, so you can get started right away. For more see the video at this link and embedded below:   Even for companies that don’t have really “big” data […]