- Four Basic Steps in Data Preparation - Oct 26, 2021.
This post introduces four basic, general steps in data preparation for machine learning algorithms, and describes how and why to apply each transformation within a specific example.
Data Preparation, Data Preprocessing, Data Science, Missing Values, Normalization, Sampling
- 10 Must-Know Statistical Concepts for Data Scientists - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
Bayes Theorem, Correlation, Normal Distribution, P-value, Sampling, Statistics, Variance
- 10 Statistical Concepts You Should Know For Data Science Interviews - Feb 23, 2021.
Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.
Bayes Theorem, Interview Questions, Linear Regression, Logistic Regression, P-value, Sampling, Statistics
- Adversarial generation of extreme samples - Feb 2, 2021.
In order to mitigate risks when modelling extreme events, it is vital to be able to generate a wide range of extreme, and realistic, scenarios. Researchers from the National University of Singapore and IIT Bombay have developed an approach to do just that.
AI, GANs, Generative Adversarial Network, Generative Models, Sampling
- Resampling Imbalanced Data and Its Limits - Dec 22, 2020.
Can resampling tackle the problem of too few fraudulent transactions in credit card fraud detection?
Balancing Classes, Bootstrap sampling, Fraud Detection, Knime, Sampling, Unbalanced
- Undersampling Will Change the Base Rates of Your Model’s Predictions - Dec 17, 2020.
In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore, if you use sampling techniques that change this proportion, there is a good chance you will want to rescale or calibrate your predictions before using them in the wild.
Classification, Modeling, Predictions, R, Sampling
- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post explains the various techniques you can use to handle imbalanced datasets.
Balancing Classes, Datasets, Metrics, Python, Sampling, Unbalanced
- The 5 Sampling Algorithms every Data Scientist need to know - Sep 18, 2019.
Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques used, so you can select the best approach while working with your data.
Algorithms, Sampling
- A Gentle Introduction to Noise Contrastive Estimation - Jul 25, 2019.
Find out how to use randomness to learn your data by using Noise Contrastive Estimation with this guide that works through the particulars of its implementation.
Deep Learning, Logistic Regression, Neural Networks, Noise, Random, Sampling, word2vec
- 4 Myths of Big Data and 4 Ways to Improve with Deep Data - Jan 9, 2019.
There is a fundamental misconception that bigger data produces better machine learning results. However, bigger data lakes and warehouses won’t necessarily help to discover more profound insights. It is better to focus on data quality, value, and diversity, not just size. "Deep Data" is better than Big Data.
Big Data, Data Lakes, Data Warehouse, Hype, Machine Learning, Sampling
- Iterative Initial Centroid Search via Sampling for k-Means Clustering - Sep 12, 2018.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
Clustering, K-means, Python, Sampling, scikit-learn
- Scalable Select of Random Rows in SQL - Apr 5, 2018.
Selecting random rows, that is, sampling, can deliver performance boosts. Let’s learn how to select random rows in SQL.
Sampling, SQL, Statsbot
- Sampling: A Primer - Aug 8, 2017.
Though it doesn’t get a lot of buzz, sampling is fundamental to any field of science. Marketing scientist Kevin Gray asks Dr. Stas Kolenikov, Senior Scientist at Abt Associates, what marketing researchers and data scientists most need to know about it.
Marketing, Sampling
- Learning from Imbalanced Classes - Aug 31, 2016.
Imbalanced classes can cause trouble for classification. Not all hope is lost, however. Check out this article for methods of dealing with such a situation.
Balancing Classes, Bayesian, Learning from Data, Sampling, Tom Fawcett
- New Hybrid Rare-Event Sampling Technique for Fraud Detection - Apr 26, 2015.
The proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.
Bootstrap sampling, Fraud Detection, Sampling