- Top Resources for Learning Statistics for Data Science - Dec 16, 2021.
Let’s take a look at the current state of statistics in data science, and what you can do to accelerate your learning.
Courses, Data Science, Springboard, Statistics
- Feature Selection: Where Science Meets Art - Dec 14, 2021.
From heuristic to algorithmic feature selection techniques for data science projects.
Data Preprocessing, Feature Selection, Machine Learning, Statistics
- How to Use Permutation Tests - Dec 2, 2021.
A walkthrough of permutation tests and how they can be applied to time series data.
Statistics
- Find the Best-Matching Distribution for Your Data Effortlessly - Oct 22, 2021.
How to find the best-matching statistical distributions for your data points in an automated and easy way, and then how to extend the utility further.
Distribution, Python, Statistics, Synthetic Data
- How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method - Oct 15, 2021.
Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?
Machine Learning, Metrics, Statistics
- How to do “Limitless” Math in Python - Oct 7, 2021.
How to perform arbitrary-precision computation, and much more math than Python's built-in math library allows, and do it fast.
Linear Algebra, Mathematics, Probability, Python, Statistics
- Advanced Statistical Concepts in Data Science - Sep 30, 2021.
This article covers some of the most commonly used advanced statistical concepts, along with their Python implementations.
Career Advice, Data Science, Distribution, Probability, Statistics
- Important Statistics Data Scientists Need to Know - Sep 29, 2021.
Every data scientist, from the enthusiast to the professional, must understand several fundamental statistical concepts. Here, we provide Python code snippets that build your understanding of key tools for gaining early insight into your data.
Bayes Theorem, Data Science, Probability, Statistics
- Real-Time Histogram Plots on Unbounded Data - Sep 24, 2021.
Most popular data science libraries do not support histograms on real-time data. In this article, you will learn how to dynamically compute and display a histogram within a Python notebook.
Data Visualization, Histogram, Real-time, Statistics
- How to Find Weaknesses in your Machine Learning Models - Sep 20, 2021.
FreaAI: a new method from researchers at IBM.
Interpretability, Machine Learning, Modeling, Statistics
- Paradoxes in Data Science - Sep 17, 2021.
Take a look at some of the main paradoxes associated with Data Science and its statistical foundations.
Data Science, Statistics
- KDnuggets™ News 21:n34, Sep 8: Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained - Sep 8, 2021.
Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained; Data Science Cheat Sheet 2.0; 6 Cool Python Libraries That I Came Across Recently; Best Resources to Learn Natural Language Processing in 2021
AI, Cheat Sheet, Data Science, Excel, Hypothesis Testing, Machine Learning, Python, Statistics
- Learning Data Science and Machine Learning: First Steps After The Roadmap - Aug 24, 2021.
Getting started with learning data science can seem as daunting as (if not more daunting than) landing your first job in the field. With so many options and resources to consider, both online and in traditional academia, these prerequisites and pre-work are recommended before diving deep into data science and AI/ML.
Data Science, Machine Learning, Mathematics, Python, Roadmap, Statistics
- Introduction to Statistical Learning Second Edition - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely available as a PDF on the book's website.
Books, Data Science, Machine Learning, R, Statistical Learning, Statistics
- Be Wary of Automated Feature Selection — Chi Square Test of Independence Example - Aug 5, 2021.
When data scientists use the chi-square test for feature selection, they often merely follow the ritualistic “If your p-value is low, the null hypothesis must go.” The automated functions they use behave no differently.
Automated Data Science, Automated Machine Learning, Feature Selection, Statistics
- A Brief Introduction to the Concept of Data - Jul 29, 2021.
Every aspiring data scientist must know the concept of data and the kinds of analysis they can run. This article introduces the concept of data (quantitative and qualitative) and the types of analysis.
Beginners, Data Analytics, Data Science, Qualitative Analytics, Quantitative Analytics, Statistics
- The Lost Art of Decile Analysis - Jul 22, 2021.
Classification is a primary and widely used application of machine learning algorithms. However, unless you carefully analyze the subtleties in the results of even an apparently straightforward binary classifier, the deeper meaning of your predictions may be obscured.
Lift charts, Predictive Models, Statistics
- WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know - Jul 21, 2021.
The fast Walsh-Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. Because of its efficiency, this approach deserves to be more widely appreciated and applied.
Algorithms, Statistics, Time Series
- 11 Important Probability Distributions Explained - Jul 20, 2021.
There are many distribution functions considered in statistics and machine learning, which can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate what these distributions communicate about observed data.
Explained, Probability, Statistics
- This Data Visualization is the First Step for Effective Feature Selection - Jun 8, 2021.
Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.
Data Visualization, Feature Selection, Statistics, Stocks
- 10 Must-Know Statistical Concepts for Data Scientists - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
Bayes Theorem, Correlation, Normal Distribution, P-value, Sampling, Statistics, Variance
- Data Science 101: Normalization, Standardization, and Regularization - Apr 20, 2021.
Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.
Data Preprocessing, Feature Engineering, Normalization, Regression, Regularization, Statistics
- Top 3 Statistical Paradoxes in Data Science - Apr 15, 2021.
Observation bias and sub-group differences generate statistical paradoxes.
Bias, Data Science, Simpson's Paradox, Statistics
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career, which is buried in spreadsheet analysis, into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map to guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Must Know for Data Scientists and Data Analysts: Causal Design Patterns - Mar 12, 2021.
Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.
Causality, Data Science, Design, Design of Experiments, Statistics
- The Inferential Statistics Data Scientists Should Know - Mar 11, 2021.
The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientists you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.
Data Science Education, Statistics
- 10 Statistical Concepts You Should Know For Data Science Interviews - Feb 23, 2021.
Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.
Bayes Theorem, Interview Questions, Linear Regression, Logistic Regression, P-value, Sampling, Statistics
- Want to Be a Data Scientist? Don’t Start With Machine Learning - Jan 26, 2021.
Machine learning may seem like the go-to topic for the aspiring data scientist to start learning. But thinking these techniques are the key aspects of the role is the biggest misconception. So much more goes into becoming a successful data scientist, and machine learning is only one component of broader skills around processing, managing, and understanding the science behind the data.
Career Advice, Data Scientist, Machine Learning, Statistics
- Null Hypothesis Significance Testing is Still Useful - Jan 25, 2021.
Even in the aftermath of the replication crisis, statistical significance lingers as an important concept for Data Scientists to understand.
Hypothesis Testing, P-value, Statistical Significance, Statistics
- Comprehensive Guide to the Normal Distribution - Jan 18, 2021.
Drop in for some tips on how this fundamental statistics concept can improve your data science.
Distribution, Normal Distribution, Python, SciPy, Statistics
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
- Monte Carlo integration in Python - Dec 24, 2020.
A famous casino-inspired trick for data science, statistics, and all of science. How do you do it in Python?
Monte Carlo, Python, Simulation, Statistics
- 5 Free Books to Learn Statistics for Data Science - Dec 8, 2020.
Learn all the statistics you need for data science for free.
Data Science, Free ebook, Statistics
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this article, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and how they link to histograms.
Data Science, Mathematics, Probability, Statistics
- 10 Principles of Practical Statistical Reasoning - Nov 3, 2020.
Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.
Data Analysis, Data Quality, Data Science, Statistical Analysis, Statistics
- The Best Free Data Science eBooks: 2020 Update - Sep 30, 2020.
The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.
Books, Data Science, Free ebook, Probability, Programming, Statistics
- Causal Inference: The Free eBook - Sep 25, 2020.
Here's another free eBook for those looking to up their skills. If you are seeking a resource that exhaustively treats the topic of causal inference, this book has you covered.
Books, Free ebook, Inference, Statistics
- What is Simpson’s Paradox and How to Automatically Detect it - Sep 18, 2020.
Looking at data one way can tell one story, but sometimes looking at it another way will tell the opposite story. Understanding this paradox and why it happens is essential, and new tools are available to help automatically detect this tricky issue in your datasets.
Simpson's Paradox, Statistics
- Statistics with Julia: The Free eBook - Sep 14, 2020.
This free eBook is a draft copy of the upcoming Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Interested in learning Julia for data science? This might be the best intro out there.
Books, Data Science, Free ebook, Julia, Statistics
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that data scientists want to learn.
Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- Book Chapter: The Art of Statistics: Learning from Data - Sep 3, 2020.
Get a free book chapter from "The Art of Statistics: Learning from Data" by leading researcher Sir David John Spiegelhalter. This excerpt takes a forensic look at data surrounding the victims of the UK’s most prolific serial killer and shows how a simple search for patterns reveals critical details.
Book, Crime, JMP, Statistics
- Which methods should be used for solving linear regression? - Sep 2, 2020.
As a foundational set of algorithms in any machine learning toolbox, linear regression can be solved with a variety of approaches. Here, we discuss four methods, with code examples, and demonstrate how they should be used.
Gradient Descent, Linear Regression, numpy, Python, Statistics, SVD
- These Data Science Skills will be your Superpower - Aug 20, 2020.
Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.
Communication, Data Preparation, Data Science Skills, Data Visualization, Mathematics, Statistics
- Hypothesis Test for Real Problems - Aug 14, 2020.
Hypothesis tests are important for evaluating answers to questions about samples of data.
Hypothesis Testing, P-value, Statistics
- Introduction to Statistics for Data Science - Aug 12, 2020.
Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews, with examples, the fundamental concepts of inferential statistics by illustrating the differences between Point Estimators and Confidence Interval Estimates.
Beginners, Data Science, Statistics
- R squared Does Not Measure Predictive Capacity or Statistical Adequacy - Jul 31, 2020.
The fact that R-squared shouldn't be used for deciding whether you have an adequate model is counter-intuitive and is rarely explained clearly. This demonstration overviews how R-squared goodness-of-fit works in regression analysis and correlations, while showing why it is not a measure of statistical adequacy and therefore should not be taken to indicate anything about future predictive performance.
Predictive Analytics, Regression, Statistics
- A Complete Guide To Survival Analysis In Python, part 3 - Jul 30, 2020.
Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.
Jupyter, Python, Regression, Statistics, Survival Analysis
- Essential Resources to Learn Bayesian Statistics - Jul 28, 2020.
If you are interested in becoming better at statistics and machine learning, then you should invest some time in diving deeper into Bayesian Statistics. While the topic is more advanced, applying these fundamentals to your work will advance your understanding and success as an ML expert.
Bayesian, Machine Learning, Markov Chain, Statistics
- Demystifying Statistical Significance - Jul 17, 2020.
With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.
P-value, Statistical Significance, Statistics
- Before Probability Distributions - Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
Distribution, Probability, Statistics
- A Complete Guide To Survival Analysis In Python, part 2 - Jul 14, 2020.
Continuing with the second of this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter theory as well as the Nelson-Aalen fitter theory, both with examples and shared code.
Python, Statistics, Survival Analysis
- A Complete Guide To Survival Analysis In Python, part 1 - Jul 7, 2020.
This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.
Python, Statistics, Survival Analysis
- 4 Free Math Courses to do and Level up your Data Science Skills - Jun 22, 2020.
Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational math skills will level you up as a data scientist and enable you to perform with greater expertise.
Bayesian, Coursera, edX, Inference, Linear Algebra, Mathematics, Online Education, Principal component analysis, Probability, Python, Statistics
- Overview of data distributions - Jun 10, 2020.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
Binomial, Distribution, Normal Distribution, Poisson Distribution, Probability, Statistics
- If you had to start statistics all over again, where would you start? - Jun 5, 2020.
If you are just diving into learning statistics, then where do you begin? Find insight from those who have trodden these waters before, and see what they might have done differently along their personal journeys in statistics.
Advanced Statistics, Advice, Bayesian, Career Advice, Statistician, Statistics
- Appropriately Handling Missing Values for Statistical Modelling and Prediction - May 22, 2020.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move and should be avoided without first understanding why the data is missing in the first place.
Advice, Analytics, Business Analytics, Data Preparation, Data Science, Data Scientist, Missing Values, Statistics
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
Data Visualization, Distribution, Normal Distribution, Probability, Statistics
- Evidence Counterfactuals for explaining predictive models on Big Data - May 18, 2020.
Big Data generated by people, such as social media posts, mobile phone GPS locations, and browsing history, provides enormous predictive value for AI systems. However, explaining how these models make predictions from the data remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.
Big Data, Explainability, Predictive Modeling, Predictive Models, Statistics
- A Concise Course in Statistical Inference: The Free eBook - Apr 27, 2020.
Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.
Book, Free ebook, Mathematics, Statistics
- Should Data Scientists Model COVID19 and other Biological Events - Apr 22, 2020.
Biostatisticians use statistical techniques that everyday data scientists have probably never heard of. This is a great example of where a lack of domain knowledge exposes you as someone who does not know what they are doing and is merely hopping on a trend.
Advice, COVID-19, Data Science, Data Scientist, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - Apr 9, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- Data Science Curriculum for self-study - Feb 26, 2020.
Are you asking the question, "how do I become a Data Scientist?" This list recommends the best essential topics to gain an introductory understanding for getting started in the field. After learning these basics, keep in mind that doing real data science projects through internships or competitions is crucial to acquiring the core skills necessary for the job.
Advice, Data Science, Data Science Education, Data Visualization, Mathematics, Probability, Programming, Statistics
- An Eight-Step Checklist for An Analytics Project - Nov 6, 2019.
Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.
Analytics, Checklist, Deployment, Feature Selection, Statistics
- Probability Learning: Maximum Likelihood - Nov 5, 2019.
The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
Learning, Probability, Statistics
- 5 Statistical Traps Data Scientists Should Avoid - Oct 30, 2019.
Here are five statistical fallacies — data traps — which data scientists should be aware of and definitely avoid.
Bias, Fallacies, Simpson's Paradox, Statistics
- How to Become a (Good) Data Scientist – Beginner Guide - Oct 16, 2019.
A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.
Beginners, BI, Data Scientist, Sciforce, Statistics
- An Overview of Density Estimation - Oct 14, 2019.
Density estimation is the task of estimating the probability density function of a population from a sample. This post examines and compares a number of approaches to density estimation.
Generative Adversarial Network, Probability, Statistics
- 6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Follow this advice in your work to be diligent and more impactful for your organization.
Advice, Data Cleaning, Data Scientist, Metrics, Overfitting, Statistics
- Beta Distribution: What, When & How - Sep 25, 2019.
This article covers the beta distribution, and explains it using baseball batting averages.
Distribution, Probability, Statistics
- Which Data Science Skills are core and which are hot/emerging ones? - Sep 17, 2019.
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
- How Bad is Multicollinearity? - Sep 17, 2019.
For some people, any correlation below 60% is acceptable; for others, even a correlation of 30% to 40% is considered too high, because one variable may end up exaggerating the performance of the model or completely messing up parameter estimates.
Analytics, Multicollinearity, Regression, Statistics
- What’s the difference between analytics and statistics? - Sep 6, 2019.
From asking the best questions about data to answering those questions with certainty, understanding the value of these two seemingly different professions is clarified when you see how they should work together.
Analytics, Explained, Statistics
- Statistical Modelling vs Machine Learning - Aug 14, 2019.
At times it may seem that Machine Learning can be done these days without a sound statistical background, but people who think so are missing important nuances. Code written to make it easier does not negate the need for an in-depth understanding of the problem.
Advice, Data Science, Machine Learning, Statistics
- What is Poisson Distribution? - Aug 14, 2019.
A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up against the binomial distribution, deriving its formula mathematically, and more.
Distribution, Poisson Distribution, Probability, Statistics
- P-values Explained By Data Scientist - Jul 30, 2019.
This article is designed to give you a full picture, from constructing a hypothesis test to understanding p-values and using them to guide our decision-making process.
Data Science, Data Scientist, Hypothesis Testing, P-value, Statistics
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color to communicate a value to the reader. When you have a large volume of data, it is a great tool for directing the audience toward the areas that matter most.
Data Visualization, Python, Statistics
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
Data Science, Multicollinearity, Python, Regression, Statistics
- 5 Useful Statistics Data Scientists Need to Know - Jun 14, 2019.
A data scientist should know how to effectively use statistics to gain insights from data. Here are five useful and practical statistical concepts that every data scientist must know.
Data Science, Data Scientist, Statistics
- All Models Are Wrong – What Does It Mean? - Jun 12, 2019.
During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.
Advice, Linear Regression, Modeling, Statistics
- Top 10 Statistics Mistakes Made by Data Scientists - Jun 7, 2019.
The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.
Data Science, Data Scientist, GitHub, Mistakes, Statistics
- Separating signal from noise - Jun 4, 2019.
When we are building a model, we are making the assumption that our data has two parts, signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. The noise is everything else that gets in the way of that.
Noise, Regression, Statistics, Time Series
- What Does a Lady Tasting Tea Have to Do with Science? - May 31, 2019.
Design of Experiments (DOE) is a statistical concept used to find cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.
Design of Experiments, Randomization, Statistics
- Probability Mass and Density Functions - May 21, 2019.
This content is part of a series about chapter 3, on probability, of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code on mathematical theories, and reflects my own understanding of these concepts.
Mathematics, Probability, Statistics
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use pandas to apply Bayes’ Theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
- Spatio-Temporal Statistics: A Primer - Apr 5, 2019.
Marketing scientist Kevin Gray asks University of Missouri Professor Chris Wikle about Spatio-Temporal Statistics and how it can be used in science and business.
Interview, Spatio-Temporal, Statistics
- Beating the Bookies with Machine Learning - Mar 8, 2019.
We investigate how to use a custom loss function to identify fair odds, including a detailed example using machine learning to bet on the results of a darts match and how this can assist you in beating the bookmaker.
Machine Learning, PyTorch, Sports, Statistics
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
Correlation, Data Science, Python, Statistics
- The Essential Data Science Venn Diagram - Feb 4, 2019.
A deeper examination of the interdisciplinary interplay involved in data science, focusing on automation, validity and intuition.
Analytics, Data Science, Machine Learning, Statistics, Venn Diagram
- Introduction to Statistics for Data Science - Dec 17, 2018.
This tutorial explains the central limit theorem, covering populations and samples, sampling distributions, and intuition, and includes a useful video so you can continue your learning.
Data Science, Statistics
- A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more - Dec 7, 2018.
A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.
Cheat Sheet, Data Science Education, Deep Learning, Machine Learning, Mathematics, Open Source, Reinforcement Learning, Resources, Statistics
- The 5 Basic Statistics Concepts Data Scientists Need to Know - Nov 13, 2018.
Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how they can be applied most effectively!
Data Science, Data Scientist, Statistics
- Unfolding Naive Bayes From Scratch - Sep 25, 2018.
Whether you are a beginner in Machine Learning or you have been trying hard to understand the Super Natural Machine Learning Algorithms and you still feel that the dots do not connect somehow, this post is definitely for you!
Bayesian, Classification, Naive Bayes, Probability, Statistics
- Machine Learning Cheat Sheets - Sep 11, 2018.
Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.
Cheat Sheet, Deep Learning, Machine Learning, Mathematics, Neural Networks, Probability, Statistics, Supervised Learning, Tips, Unsupervised Learning
- 5 Things to Know About A/B Testing - Sep 7, 2018.
This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.
A/B Testing, Applied Statistics, Psychology, Statistics
- What on earth is data science? - Sep 4, 2018.
An overview and discussion around data science, covering the history behind the term, data mining, statistical inference, machine learning, data engineering and more.
Data Mining, Data Science, Decision Making, Statistics
- Basic Statistics in Python: Probability - Aug 21, 2018.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
Normal Distribution, Probability, Python, Statistics
- Interpreting a data set, beginning to end - Aug 20, 2018.
Detailed knowledge of your data is key to understanding it! We review several important methods for understanding the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
Analytics, Big Data, Data Science, Data Visualization, Machine Learning, SAS, Statistics, t-SNE
- Basic Statistics in Python: Descriptive Statistics - Aug 1, 2018.
This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.
Descriptive Analytics, Python, Statistics
- Causation in a Nutshell - Jul 20, 2018.
Every move we make, every breath we take, and every heartbeat is an effect that is caused. Even apparent randomness may just be something we cannot explain.
Causality, Causation, Statistics
- Explaining the 68-95-99.7 rule for a Normal Distribution - Jul 19, 2018.
This post explains how those numbers were derived in the hope that they can be more interpretable for your future endeavors.
Data Analysis, Data Science, Normal Distribution, Python, Statistics
- Why Data Scientists Love Gaussian - Jun 26, 2018.
The Gaussian distribution, often identified by its iconic bell-shaped curve and also referred to as the normal distribution, is popular mainly for three reasons.
Distribution, Probability, Statistics
- Every time someone runs a correlation coefficient on two time series, an angel loses their wings - Jun 18, 2018.
We all know correlation doesn’t equal causality at this point, but when working with time series data, correlation can lead you to come to the wrong conclusion.
Correlation, Data Mining, Statistics, Time Series
- Statistics, Causality, and What Claims are Difficult to Swallow: Judea Pearl debates Kevin Gray - Jun 15, 2018.
While KDnuggets takes no side, we present the informative and respectful back and forth as we believe it has value for our readers. We hope that you agree.
AI, Computer Science, Data Science, Judea Pearl, Statistics
- The Book of Why - Jun 1, 2018.
Judea Pearl has made noteworthy contributions to artificial intelligence, Bayesian networks, and causal analysis. These achievements notwithstanding, Pearl holds some views many statisticians may find odd or exaggerated.
Bayesian Networks, Causality, Data Science, Judea Pearl, Simpson's Paradox, Statistics
- Skewness vs Kurtosis – The Robust Duo - May 4, 2018.
Kurtosis and skewness are very close relatives in the family of standardized statistical moments (skewness is the third moment and kurtosis the fourth), and yet they are often used to detect very different phenomena in data. At the same time, it is usually advisable to analyse the outputs of both together to gain more insight and better understand the nature of the data.
Data Science, Descriptive Analytics, Statistics
- Key Algorithms and Statistical Models for Aspiring Data Scientists - Apr 16, 2018.
This article provides a summary of key algorithms and statistical techniques commonly used in industry, along with a short resource related to these techniques.
Algorithms, Data Science, Machine Learning, Online Education, Statistics
- Descriptive Statistics: The Mighty Dwarf of Data Science – Crest Factor - Apr 6, 2018.
No other means of data description is more comprehensive than descriptive statistics, and with ever-increasing volumes of data and the need for low-latency decision-making, its relevance will only continue to grow.
Data Science, Descriptive Analytics, Statistics
- Descriptive Statistics: The Mighty Dwarf of Data Science - Mar 20, 2018.
No other means of data description is more comprehensive than descriptive statistics, and with ever-increasing volumes of data and the need for low-latency decision-making, its relevance will only continue to grow.
Data Science, Descriptive Analytics, Statistics
- Multiscale Methods and Machine Learning - Mar 19, 2018.
We highlight recent developments in machine learning and Deep Learning related to multiscale methods, which analyze data at a variety of scales to capture a wider range of relevant features. We give a general overview of multiscale methods, examine recent successes, and compare with similar approaches.
Algorithms, Data Science, Deep Learning, Machine Learning, Statistics
- Histogram 202: Tips and Tricks for Better Data Science - Feb 15, 2018.
We show how to make an ideal histogram, share some tips, and give examples. Let's dive into the world of binning.
Data Science, Histogram, Statistics
- Propensity Score Matching in R - Jan 18, 2018.
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
Bias, R, Statistics
- How Not To Lie With Statistics - Jan 11, 2018.
Darrell Huff's classic How to Lie with Statistics is perhaps more relevant than ever. In this short article, I revisit this theme from some different angles.
Statistics, Trust
- You have created your first Linear Regression Model. Have you validated the assumptions? - Nov 15, 2017.
Linear Regression is an excellent starting point for Machine Learning, but it is a common mistake to focus only on the p-values and R-squared values when determining the validity of a model. Here we examine the underlying assumptions of Linear Regression, which need to be validated before applying the model.
Data Science, Linear Regression, Machine Learning, Multicollinearity, Statistics
- The 10 Statistical Techniques Data Scientists Need to Master - Nov 15, 2017.
The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.
Algorithms, Data Science, Data Scientist, Machine Learning, Statistical Learning, Statistics
- How Bayesian Networks Are Superior in Understanding Effects of Variables - Nov 9, 2017.
Bayes Nets have remarkable properties that make them better than many traditional methods at determining variables’ effects. This article explains the principal advantages.
Bayesian, Bayesian Networks, Predictive Models, Probability, Regression, Statistics
- Conjoint Analysis: A Primer - Nov 1, 2017.
Conjoint is another of those things everyone talks about but many are confused about…
Statistical Analysis, Statistics
- Statistical Mistakes Even Scientists Make - Oct 3, 2017.
Scientists are all experts in statistics, right? Wrong.
Scientist, Statistician, Statistics
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Cheat Sheet, Data Science, Deep Learning, Machine Learning, Neural Networks, Probability, Python, R, SQL, Statistics
- How To Lie With Numbers - Sep 21, 2017.
It takes less effort to lie without numbers, but there are now more numbers and more ways to lie with them than ever before. Poor Reverend Bayes, who understood the true meaning of "evidence".
Quantitative Analytics, Statistics
- Vital Statistics You Never Learned… Because They’re Never Taught - Aug 29, 2017.
Marketing scientist Kevin Gray asks Professor Frank Harrell about some important things we often get wrong about statistics.
Bayesian, Data Science, Machine Learning, Statistics
- Machine Learning vs. Statistics: The Texas Death Match of Data Science - Aug 23, 2017.
Throughout its history, Machine Learning (ML) has coexisted with Statistics uneasily, like an ex-boyfriend accidentally seated with the groom’s family at a wedding reception: both uncertain where to lead the conversation, but painfully aware of the potential for awkwardness.
Machine Learning, Statistics
- Data Science Primer: Basic Concepts for Beginners - Aug 11, 2017.
This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.
Bias, Data Mining, Data Science, Distribution, Ensemble Methods, Statistics
- Is Regression Analysis Really Machine Learning? - Jun 5, 2017.
What separates "traditional" applied statistics from machine learning? Is statistics the foundation on top of which machine learning is built? Is machine learning a superset of "traditional" statistics? Do these two concepts have a third unifying concept in common? So, in that vein... is regression analysis actually a form of machine learning?
Applied Statistics, Linear Regression, Machine Learning, Regression, Statistics
- Propensity Scores: A Primer - May 16, 2017.
Propensity scores are used in quasi-experimental and non-experimental research when the researcher must make causal inferences, for example, that exposure to a chemical increases the risk of cancer.
Customer Experience, Statistics
- Stuff Happens: A Statistical Guide to the “Impossible” - Apr 6, 2017.
Why are some people struck by lightning multiple times or, more encouragingly, how could anyone possibly win the lottery more than once? The odds against these sorts of things are enormous.
Probability, Statistics
- How to think like a data scientist to become one - Mar 23, 2017.
The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.
Amazon, Data Science Skills, Data Scientist, SQL, Statistics
- What Top Firms Ask: 100+ Data Science Interview Questions - Mar 22, 2017.
Check this out: a topic-wise collection of 100+ data science interview questions from top companies.
Algorithms, Data Science, Google, Hadoop, Interview Questions, Machine Learning, Microsoft, Statistics, Uber
- Analytics 101: Comparing KPIs - Mar 20, 2017.
Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?
KPI, Metrics, Statistics
- 17 More Must-Know Data Science Interview Questions and Answers, Part 3 - Mar 15, 2017.
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
3Vs of Big Data, A/B Testing, Big Data, Data Quality, Data Science, Data Visualization, Influencers, Interview Questions, Statistics, Twitter
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
Beginners, Correlation, Datascience.com, Pandas, Python, Statistics