
How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).

New Connector: YouTube Analytics

The value of YouTube has grown significantly for companies looking to bolster their brands with video content. The YouTube API is report-based, and its prebuilt reports fall into one of two categories: channel reporting and content owner reporting. Channel reports refer to the videos on a specific YouTube channel, while content owner reports contain data on all the channels owned by a particular individual.

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu, and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing because it caters to a much larger community than the widely used Java and Scala APIs, which are aimed at the data engineering crowd.

A Message To You Kafka - The Advantages of Real-time Data Streaming

In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. The need for fast access to data as it changes has only grown. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real time.

Managing ML Projects - Allegro Trains vs GitHub

The resurrection of AI due to the drastic increase in computing power has allowed its loyal enthusiasts, casual spectators, and experts alike to experiment with ideas that were pure fantasies a mere two decades ago. The biggest beneficiary of this explosion in computing power and the ungodly number of datasets (thank you, internet!) is none other than deep learning, the sub-field of machine learning (ML) tasked with extracting underlying features, finding patterns, and identifying cat images.

Welcome and Introduction to DataOps.NEXT

DataOps matters, especially in today’s uncertain times. Data management and analytics are crucial to responding faster and driving results for your business, your customers, and society. That’s why we built DataOps.NEXT to help you get from now to what’s next, with data. We’ll bring out Dr. Jennifer Hall, the chief of data science for the American Heart Association (AHA), to discuss how Hitachi Vantara and AHA have worked together to support research for COVID-19. Tune in for Pedro Alves, Hitachi Vantara’s head of product design and designated “Community Guy.” He’ll provide our vision and strategy for DataOps, including an update on Pentaho Open Source and Enterprise Edition.