- How to solve machine learning problems in the real world - Sep 2, 2021.
Is becoming a professional machine learning engineer your goal? Online ML courses and Kaggle-style competitions are great resources for learning the basics. However, the daily job of an ML engineer requires an additional layer of skills that you won’t master through these approaches.
Advice, Business, Data Quality, Machine Learning, SQL, Tips, XGBoost
- Data Validation in Machine Learning is Imperative, Not Optional - May 24, 2021.
Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.
Data Quality, Machine Learning, Production, Validation
- How to get started managing data quality with SQL and scale - May 4, 2021.
Silent data quality issues are the biggest problem facing today's data teams, which are flying blind with no systems or processes in place to monitor and detect bad data before it has a downstream impact.
Data Preparation, Data Quality, Scalability, SQL
- Data Validation and Data Verification – From Dictionary to Machine Learning - Mar 16, 2021.
In this article, we will understand the difference between data verification and data validation, two terms which are often used interchangeably when we talk about data quality. However, these two terms are distinct.
Data Quality, Machine Learning, Validation
- Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL - Feb 23, 2021.
Using schema and lineage to understand the root cause of your data anomalies.
Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
- Inside the Architecture Powering Data Quality Management at Uber - Feb 22, 2021.
Data Quality Monitor implements novel statistical methods for anomaly detection and quality management in large data infrastructures.
Architecture, Data Quality, Uber
- Data Observability: Building Data Quality Monitors Using SQL - Feb 16, 2021.
To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.
Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
- My machine learning model does not learn. What should I do? - Feb 10, 2021.
This article presents 7 hints on how to get out of the quicksand.
Algorithms, Business Context, Data Quality, Hyperparameter, Machine Learning, Modeling, Tips
- 10 Principles of Practical Statistical Reasoning - Nov 3, 2020.
Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.
Data Analysis, Data Quality, Data Science, Statistical Analysis, Statistics
- 6 Common Mistakes in Data Science and How To Avoid Them - Sep 10, 2020.
As a novice or seasoned Data Scientist, your work depends on the data, which is rarely perfect. Properly handling the typical issues with data quality and completeness is crucial, and we review how to avoid six of these common scenarios.
Advice, Data Quality, Data Science, Hyperparameter, Mistakes, Overfitting
- How A Single Source of Truth Can Benefit Your Organization - Aug 7, 2020.
A single source of truth provides stakeholders with a clear picture of the enterprise assets and the potential complications that can disrupt the data strategy. Find out how you can implement this single source of truth in your enterprise ecosystem.
Business Intelligence, Data Management, Data Quality, Decision Making
- How Bad Data is Affecting Your Organization’s Operational Efficiency - Mar 5, 2020.
Despite recognizing the importance of data quality, many companies still fail to implement a data quality framework that could protect them from making costly mistakes. Poor data does not just cause revenue loss – it’s the reason your company could lose employees, customers and reputation!
Business, Data Management, Data Operations, Data Quality, Efficiency
- Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of? - Sep 24, 2019.
Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.
Challenges, Data Quality
- YouTube videos on database management, SQL, Datawarehousing, Business Intelligence, OLAP, Big Data, NoSQL databases, data quality, data governance and Analytics – free - May 18, 2018.
Watch over 20 hours of YouTube videos on databases and database design, Physical Data Storage, Transaction Management and Database Access, and Data Warehousing, Data Governance and (Big) Data Analytics - all free.
Analytics, Bart Baesens, Big Data, Business Intelligence, Data Governance, Data Quality, Data Warehousing, Databases, NoSQL, SQL, Youtube
- Must-Know: What are common data quality issues for Big Data and how to handle them? - May 16, 2017.
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
3Vs of Big Data, Big Data, Data Quality, Interview Questions
- 17 More Must-Know Data Science Interview Questions and Answers, Part 3 - Mar 15, 2017.
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
3Vs of Big Data, A/B Testing, Big Data, Data Quality, Data Science, Data Visualization, Influencers, Interview Questions, Statistics, Twitter
- Bad Data + Good Models = Bad Results - Jan 26, 2017.
No matter how advanced your machine learning algorithm is, the results will be bad if the input data is bad. We examine one popular IMDB dataset and discuss how an analyst can deal with such data.
Data Quality, Face Recognition, IMDb, Kaggle, Movies
- Ten Simple Rules for Effective Statistical Practice: An Overview - Jun 23, 2016.
An overview of 10 simple rules to follow to ensure proper, effective statistical data analysis.
Advice, Data Quality, Noise, Replication, Reproducibility, Statistical Analysis
- In Machine Learning, What is Better: More Data or better Algorithms - Jun 17, 2015.
The gross over-generalization that “more data gives better results” is misleading. Here we explain in which scenarios more data or more features are helpful and in which they are not, and how the choice of algorithm affects the end result.
Big Data Hype, Data Quality, IMDb, Machine Learning, Quora, Xavier Amatriain