The CDP Operational Database (COD) builds on the foundation of existing operational database capabilities that were available with Apache HBase and/or Apache Phoenix in legacy CDH and HDP deployments.
Spark is known for being extremely difficult to debug. But this is not all Spark’s fault. Problems in running a Spark job can be the result of problems with the infrastructure Spark is running on, inappropriate configuration of Spark, Spark issues, the currently running Spark job, other Spark jobs running at the same time – or interactions among these layers.
When you build a data warehouse, the important question is how to ingest data from the source system to the data warehouse. If the table is small you can fully reload a table on a regular basis, however, if the table is large a common technique is to perform incremental table updates. This post demonstrates how you can enhance incremental pipeline performance when you ingest data into BigQuery.