ClearML-Data Lemonade: getting local datasets quickly and easily

Congratulations on creating a clean(ish) dataset to use for training! The dataset may be stored where everyone can access it, but distributing it is still a hassle. Local workstations, local GPU machines, and cloud machines (which may be spun up and down without disk persistence) all need to get the data. To say it is annoying is an understatement!

Data management is ALL THE RAGE!

Everyone wants to manage their data, and if it’s a feature store, even better! But for optimal data management, we must first discuss how ClearML-data keeps things lightweight, with zero upfront setup cost and maximum utility. ClearML-data mimics the lightweight feel of git for data (who doesn’t know git?) and gives it a spin. It is an open-source dataset management tool that is extremely efficient and conveys how we view DataOps and its distinction from git-like solutions, including.

DataStore vs FeatureStore

I think it’s safe to say that one of the worst things in Machine Learning is the terminology. The maths and statistics are definitely part of the learning curve, but more than that, it feels like you are learning a new language. In some ways, you are. DataStore and FeatureStore are two of the current buzzwords that people are trying to understand. To be fair, DataStore and FeatureStore feel like family rather than strangers.

ClearML hits 1.0

May 3rd 2021 – After more than 11 man-years of working and tinkering long into the night, I am pleased to announce that we have hit version 1.0. Following quickly after the release of ClearML 0.17.5, we added the last remaining features we felt 1.0 needed, namely multi-model support and improved batch operations. With these in place, the choice was clear: the next version released should be the baseline moving forward.

Construction feat. TF2 Object Detection API

Although the title might sound like a collaboration between two music bands with really bad names, this blog is all about understanding how computer vision and machine learning can be used to improve safety and security in the harsh and dangerous environment of a construction site. The construction industry is one of the most dangerous industries, according to commonly cited OSHA statistics.

Stacking up against the Competition

One of the most common questions we receive is, “How does ClearML compare to..”. I am sure this is the same for any open-source product. People always want to find the best. The sad truth is, of course, that there usually is no “right answer”. What one person needs, another may not. I am sure that, whichever language you speak natively, there is some saying for this. In English it would be “one man’s rubbish is another man’s gold”.

Good Testing Data is All You Need - Guest Post

Building machine learning (ML) and deep learning (DL) models obviously requires plenty of data, both as a training-set and as a test-set against which the model is evaluated. Best practices for setting up train-sets and test-sets have evolved in academic circles; however, within the context of applied data science, organizations need to take into consideration a very different set of requirements and goals. Ultimately, any model that a company builds aims to address a business problem.