How policies, processes, roles and technology come together to ensure data integrity, data quality and access control.
First, we collect data from an existing Kafka stream into an Iguazio time series table: a Nuclio serverless function “listens” to the Kafka stream and ingests its events into the table. Next, we visualize the stream with a Grafana dashboard. Finally, we access the data in a Jupyter notebook using Python code. Iguazio gets you started with a Kafka-to-time-series template.
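To make the ingestion step concrete, here is a minimal sketch of what such a function might look like. It is not the Iguazio template. It assumes each Kafka message is a JSON object with an optional `timestamp` field in epoch seconds, and the helper name `to_tsdb_record` and the record layout are hypothetical. The `handler(context, event)` entry point follows the usual Nuclio Python convention. The write to the time series table is deliberately left out.

```python
import json
from datetime import datetime, timezone


def to_tsdb_record(raw_body):
    """Turn one Kafka message payload into a flat time series record.

    Hypothetical helper: assumes a JSON object whose numeric fields are
    metrics and whose string fields are labels.
    """
    if isinstance(raw_body, (bytes, bytearray)):
        raw_body = raw_body.decode("utf-8")
    payload = json.loads(raw_body) if isinstance(raw_body, str) else dict(raw_body)

    # Assumption: "timestamp" is epoch seconds; fall back to the current time.
    ts = payload.get("timestamp")
    if ts is None:
        when = datetime.now(timezone.utc)
    else:
        when = datetime.fromtimestamp(float(ts), tz=timezone.utc)

    # Numeric fields (excluding booleans and the timestamp) become metrics.
    metrics = {
        key: float(value)
        for key, value in payload.items()
        if key != "timestamp"
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }
    # String fields become labels that identify the series.
    labels = {key: value for key, value in payload.items() if isinstance(value, str)}

    return {"time": when.isoformat(), "metrics": metrics, "labels": labels}


def handler(context, event):
    """Entry point invoked once per Kafka event."""
    record = to_tsdb_record(event.body)
    # A real function would write `record` to the time series table here;
    # that client call is omitted because it depends on the platform setup.
    return json.dumps(record)
```

Keeping the parsing in a plain function means it can be tried out in a Jupyter notebook before the function is deployed.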
At the DataOps Unleashed 2022 virtual conference, AWS Principal Solutions Architect Angelo Carvalho presented “How AWS & Unravel help customers modernize their Big Data workloads with Amazon EMR.” The full session recording is available on demand, but here are some of the highlights.
Without a central place to manage models, those responsible for operationalizing ML models have no way of knowing the overall status of trained models and data. This lack of manageability can impact the review and release process of models into production, which often requires offline reviews with many stakeholders.
At Mercado Libre, we are obsessed with unlocking the power and potential of data. One of our key cultural principles is to have a Beta Mindset. This means that we operate in a “state of beta”, constantly asking new questions of our data, experimenting with technologies and iterating our business operations in service of creating the best experiences for our customers.
At Talend, we tend to describe poorly organized, unhealthy data as “digital landfills.” But we don’t often talk about actual landfills. That’s right, the ones filled with trash. As anyone watching real estate prices will know, land is a finite resource. It’s crazy to think that we’re still dedicating land to storing our garbage, where it will sit releasing pollutants and greenhouse gases for decades to come.