- Dealing with Data Leakage - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
Cross-validation, Data Science, Datasets, Machine Learning, Modeling, Training Data
- Budgeting For Your AI Training Data: Consider These 3 Factors - May 26, 2021.
Before you even plan to procure the data, one of the most important considerations is determining how much you should spend on your AI training data. In this article, we will give you insights to develop an effective budget for AI training data.
AI, Data Preparation, Training Data
- Continuous Training for Machine Learning – a Framework for a Successful Strategy - Apr 14, 2021.
Anyone who builds machine learning models appreciates that a model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with production-level continuous training that is guided by the data.
Machine Learning, MLOps, Model Performance, Production, Real-time, Training Data
- 5 Essential Papers on AI Training Data - Jun 4, 2020.
Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.
AI, Data Preparation, Data Preprocessing, Research, Training Data
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data, you need to keep some things in mind. This discussion of 3 best practices for doing so includes a demonstration of how to implement them in Python.
Datasets, Python, scikit-learn, Training Data, Validation
- Achieving Accuracy with your Training Dataset - Mar 5, 2020.
How do we make sure our training data is more accurate than the rest? Partners like Supahands eliminate the headache that comes with labeling work by providing end-to-end managed labeling solutions, completed by a fully managed workforce that is trained to work on your model specifics.
Accuracy, Data Labeling, Data Preparation, Training Data
- Hand labeling is the past. The future is #NoLabel AI - Feb 19, 2020.
Data labeling is so hot right now… but could this rapidly emerging market face disruption from a small team at Stanford and the Snorkel open source project, which enables highly efficient programmatic labeling that is 10 to 1,000x as efficient as hand labeling?
AI, Data Labeling, Data Preparation, Training Data
- Why are Machine Learning Projects so Hard to Manage? - Feb 3, 2020.
What makes deploying a machine learning project so difficult? Is it the expectations? The people? The tech? There are common threads to these challenges, and best practices exist to deal with them.
Deployment, Kaggle, Lukas Biewald, Machine Learning, Project Fail, Training Data
- The Ultimate Guide to Model Retraining - Dec 16, 2019.
Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.
Deployment, Machine Learning, Model Drift, Model Performance, Monitoring, Production, Training Data
- Generalization in Neural Networks - Nov 18, 2019.
When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.
Complexity, Deep Learning, Dropout, Neural Networks, Overfitting, Regularization, Training Data
- 5 Fundamental AI Principles - Oct 3, 2019.
While AI may appear magical at times, these five principles will help guide you to avoid pitfalls when leveraging this tech.
AI, Data Cleaning, Deployment, Training Data
- 6 Tips for Building a Training Data Strategy for Machine Learning - Sep 2, 2019.
Without a well-defined approach for collecting and structuring training data, launching an AI initiative becomes an uphill battle. These six recommendations will help you craft a successful strategy.
Advice, Machine Learning, Training Data
- 6 Key Concepts in Andrew Ng’s “Machine Learning Yearning” - Aug 12, 2019.
If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts covered to better understand how to use these tools from one of the field's best practitioners and teachers.
AI, Andrew Ng, Best Practices, Deployment, Machine Learning, Metrics, Training Data
- Overview of Different Approaches to Deploying Machine Learning Models in Production - Jun 12, 2019.
Learn the different methods for putting machine learning models into production, and how to determine which method is best for which use case.
Deployment, Jupyter, Machine Learning, Production, Training Data
- How the Lottery Ticket Hypothesis is Challenging Everything we Knew About Training Neural Networks - May 30, 2019.
The training of machine learning models is often compared to winning the lottery by buying every possible ticket. But if we know what winning the lottery looks like, couldn’t we be smarter about selecting the tickets?
Deep Learning, Lottery, Machine Learning, Neural Networks, Training Data
- What to do when your training and testing data come from different distributions - Jan 4, 2019.
Sometimes only a limited amount of data from the target distribution can be collected. It may not be sufficient to build the needed train/dev/test sets. What to do in such a case? Let us discuss some ideas!
Distribution, Machine Learning, Training Data
- How (dis)similar are my train and test data? - Jun 7, 2018.
This article examines a scenario where your machine learning model can fail.
Data Science, Datasets, Feature Selection, Machine Learning, Training Data
- How to Organize Data Labeling for Machine Learning: Approaches and Tools - May 16, 2018.
The main challenge for a data science team is to decide who will be responsible for labeling, estimate how much time it will take, and choose which tools to use.
Altexsoft, Crowdsourcing, Data Labeling, Data Preparation, Image Recognition, Machine Learning, Training Data
- Learning Curves for Machine Learning - Jan 17, 2018.
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
Bias, Machine Learning, Metrics, Training Data, Variance
- How (and Why) to Create a Good Validation Set - Nov 24, 2017.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
Cross-validation, Datasets, Rachel Thomas, Training Data, Validation
- How to squeeze the most from your training data - Jul 27, 2017.
In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach which uses SVM to get the most from training data.
Data Analysis, Data Preparation, Machine Learning, Support Vector Machines, SVM, Training Data
- 7 Ways to Get High-Quality Labeled Training Data at Low Cost - Jun 13, 2017.
Labeled training data is needed for machine learning, but getting such data is not simple or cheap. We review 7 approaches, including repurposing, harvesting free sources, retraining models on progressively higher quality data, and more.
Crowdsourcing, Data Preparation, Gamification, Machine Learning, Training Data
- Do We Need More Training Data or More Complex Models? - Mar 23, 2015.
Do we need more training data? Which models will suffer from performance saturation as data grows large? Do we need larger models or more complicated models, and what is the difference?
Big Data, convnet, Generalized Linear Models, K-nearest neighbors, Training Data, Zachary Lipton