Data is everywhere. As the volume of data and the number of data sources continue to explode, so do the opportunities for modern businesses to create and act on insights. That is, if they are equipped with the right analytics technology. Historically, many businesses have settled for “good enough” analytics tools, putting up with lackluster bundles from full-stack vendors in an attempt to minimize cost or risk.
Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal has been operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as with automating, delivering, and optimizing secure data pipelines.
“Water, water, everywhere, nor any drop to drink.” The famous line from Samuel Taylor Coleridge’s epic poem “The Rime of the Ancient Mariner” applies neatly to today’s data problem. Enterprises are deluged with data, but they often have no way to leverage it. According to most experts, only a small percentage of data is ever put to use; the rest stays in the dark, hence the term “dark data.”