Data-Centric AI: Is it Real? For Everyone? Are We Ready?

Check out this deep dive into Data-Centric AI.



Image by Author (created on Canva)

 

TL;DR

  • The rise of modern deep learning algorithms has pushed the world towards a data-centric approach.
  • The synergy of data science and AI is being put into practice by industry giants.
  • Data-centric AI is becoming the next big thing, and we should gear up for it!

Earlier this year, the growing AI community began to seriously consider shifting from a model-centric approach to data-centric AI development. To keep pace with the momentum around data-centric approaches to AI projects, we have gathered insights from industry leaders and think tanks across the globe.

 

The Data-Centric AI Concept

 
Over the last several decades, the dominant paradigm for AI development has been a model-centric, or software-centric, approach. To build a machine learning system, you need to both write code that implements your algorithm or model and train that code on data to extract meaningful insights. For the last several decades, most of us downloaded a data set, held it fixed, and then modified the software code to understand the data.

Thanks to this paradigm of machine learning research, the last few years have seen tremendous progress in neural networks and other algorithms. As a result, open-source code is now available for many applications, for instance on GitHub, and it proves handy for some problems but not for others. You may be wondering why that is so. Well, the application of the system comes into play here!

 

Is the Data-Centric Approach for Everyone?

 
For many of us, it is now more fruitful to consider a data-centric approach, in which we can even hold the code fixed. Instead of changing the code, we should shift our focus to generating or creating the right data to feed to the learning algorithm. Landing AI, founded by the celebrated computer scientist and technology entrepreneur Andrew Ng, has been working on a data-centric MLOps platform for computer vision for a couple of years. Ng is a strong believer in the data-centric concept and a deep learning enthusiast. The concepts backed by Ng have gained significant traction in the technical community. The synergy of data science and AI is being put into practice by industry giants.
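To make the idea of holding the code fixed and improving the data concrete, here is a minimal sketch. It is not Landing AI's or Ng's method: the nearest-centroid "model", the function names, and the toy numbers are all assumptions invented for illustration. The model code stays identical between the two runs, and only one training label is corrected.

```python
from statistics import mean

def fit_centroids(xs, labels):
    """Fixed 'model': one centroid (mean of the inputs) per class."""
    classes = sorted(set(labels))
    return {c: mean(x for x, y in zip(xs, labels) if y == c) for c in classes}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, xs, labels):
    """Fraction of points whose predicted class matches the label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, labels))
    return hits / len(xs)

# Toy data, invented for illustration only.
train_x = [1, 2, 3, 7, 8, 9]
noisy_y = [0, 0, 1, 1, 1, 1]  # the point x=3 is mislabelled
clean_y = [0, 0, 0, 1, 1, 1]  # same inputs, label corrected

test_x = [0, 4.5, 5.5, 10]
test_y = [0, 0, 1, 1]

# Same code, two versions of the data.
acc_noisy = accuracy(fit_centroids(train_x, noisy_y), test_x, test_y)
acc_clean = accuracy(fit_centroids(train_x, clean_y), test_x, test_y)
print(f"noisy labels: {acc_noisy:.2f}, cleaned labels: {acc_clean:.2f}")
```

On this toy set, correcting a single label moves the decision boundary enough to change a test prediction, without touching the model code. Real data sets will not behave this neatly.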

 

Is Data-Centric AI for Real?

 
Many already see data-centric AI as the thing of the future! The community support has been tremendous, and the number of early adopters is rising. What's more, supporters of the concept have started encouraging others to jump in and adopt it in practice.

Whenever the AI field adopts or comes up with a new approach, it usually starts with a handful of experts carrying out the practice intuitively.

For example, take the rise of deep learning. For a long time, there were a handful of people coding up neural networks in C++ in a very raw and manual way. Eventually, the ideas behind neural networks became more widespread and many more people began coding them up in C++. Around 2015 and 2016, deep learning frameworks like TensorFlow and PyTorch were introduced, making the application of these ideas more systematic and less error-prone. In the case of data-centric AI, there have been a bunch of people doing it intuitively for years.

Acceptance of and support for the concept are expected to keep growing until it dominates the machine learning ecosystem. Tech geeks are now able to engineer speech and NLP data intuitively. Data-centric AI is becoming the next big thing, and we should gear up for it!

Something once considered a wild goose chase is now becoming the real thing!

 

Is the World Ready for the Data-Centric Approach?

 
The rise of modern deep learning algorithms has pushed the world to embrace this approach. For experts, more sophisticated ways of understanding algorithms have made it easier to focus on data. The code is now more mature and is expected to keep evolving rapidly.

As the concept evolves rapidly, putting it into practice is becoming crucial to business survival. The time is close when we will have hands-on experience with mature tools that make the application of these ideas much more repeatable, smooth, and systematic.

Machine Learning, AI, and Humans by Freepik.com

 

How to Get Ready for the Data-Centric Approach

 
It is wise to start preparing for the data-centric approach today! Ng shares the top five tips for data-centric AI development poised to help the technical community deal with the onslaught of technological advancements:

  • Label consistency
  • Noisy label impact
  • Spot inconsistencies
  • Clarify label instructions
  • Structured error analysis
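As one way to act on the "spot inconsistencies" tip above, the following sketch flags items on which several labellers disagree. It is a hedged illustration rather than Ng's procedure: the function name `find_inconsistent_labels`, the `min_agreement` threshold, and the sample annotations are all hypothetical.

```python
from collections import Counter

def find_inconsistent_labels(annotations, min_agreement=1.0):
    """Return items whose labellers agree less than `min_agreement`.

    `annotations` maps an item id to the list of labels it received.
    The result maps each flagged item id to (majority_label, agreement).
    """
    flagged = {}
    for item_id, labels in annotations.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement < min_agreement:
            flagged[item_id] = (top_label, agreement)
    return flagged

# Hypothetical annotations from three labellers.
annotations = {
    "img_001": ["scratch", "scratch", "scratch"],
    "img_002": ["scratch", "dent", "scratch"],
    "img_003": ["dent", "dent", "dent"],
}

flagged = find_inconsistent_labels(annotations)
for item_id, (majority, agreement) in flagged.items():
    print(f"{item_id}: majority={majority!r}, agreement={agreement:.2f}")
```

Items flagged this way are candidates for review, and recurring disagreements on the same kind of item usually suggest that the labelling instructions need clarifying.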

Sometimes, working on the data, or, simply put, data cleaning, is considered a pre-processing step. You may have heard people say things like that! But with a data-centric AI approach, improving the data is not a pre-processing step that you do once. It is real, repeated work that goes hand in hand with training the algorithm.

Data improvement is a core part of the iterative process of model development. Experts believe it is best practice to keep monitoring and maintaining the model after development and deployment. Continuing to improve the data systematically is also a core part of deployment, monitoring, and maintenance.
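One way to keep this loop systematic is the structured error analysis listed among the tips above: tag each evaluation example with a slice, measure the error rate per slice, and collect or fix data for the worst slice first. The sketch below assumes a hypothetical speech task; the tags, records, and the `error_rate_by_slice` helper are invented for illustration.

```python
from collections import defaultdict

def error_rate_by_slice(records):
    """Compute the error rate per tag, worst slice first.

    Each record is a (tag, true_label, predicted_label) tuple.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for tag, y_true, y_pred in records:
        totals[tag] += 1
        errors[tag] += int(y_true != y_pred)
    rates = {tag: errors[tag] / totals[tag] for tag in totals}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical evaluation records for a speech model.
records = [
    ("quiet", "yes", "yes"),
    ("quiet", "no", "no"),
    ("car_noise", "yes", "no"),
    ("car_noise", "no", "no"),
    ("car_noise", "yes", "no"),
    ("cafe_noise", "no", "no"),
    ("cafe_noise", "yes", "no"),
]

ranked = error_rate_by_slice(records)
for tag, rate in ranked:
    print(f"{tag}: error rate {rate:.2f}")
```

In this made-up example the `car_noise` slice ranks worst, so the next iteration would focus on improving data for that condition before evaluating again.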


Training is just a small portion of the machine learning model lifecycle. Acquiring and preparing quality data is often said to take up about 80% of the ML process, while training takes the remaining 20%. But that does not mean the training part is any less significant than the data part.

In academic research and industrial applications, the training aspect of machine learning holds significant importance. Surveys and research point to the importance of labelling and data annotation for accurate and innovative AI, along with acquiring complementary or external data, or even generating new, synthetic data.

 

Where Are We Headed?

 
The top names in machine learning are steering our attention towards the significance of working on data systematically. In the coming years, the world will witness a paradigm shift in the machine learning ecosystem. And we are sure about one thing: the 2010s were all about model improvement, and the 2020s will be all about data.

 

How Do I Get Started?

 
Data-centric AI is shaping the future of AI development to the point where even limited data sets can deliver the operational and business value of AI, from concept to production. We believe it is the real thing, and we believe it is time to begin learning and adopting the concept by engaging with the data-centric AI community, its resources, industry leaders, think tanks and thought leaders.

 
 
Gonçalo Martins Ribeiro is a product builder who is passionate about startups and technology. He has a computer science background and later developed his career in the management field. He has built products in startups and enterprises, led development teams, and is now creating new standards for data science by helping companies become data-centric and adopt AI that works.