- The Prefect Way to Automate & Orchestrate Data Pipelines - Sep 13, 2021.
I am migrating all my ETL work from Airflow to this super-cool framework.
Airflow, Data Workflow, Pipeline, Prefect, Python
- How to build a DAG Factory on Airflow - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
Data Engineering, Data Workflow, Graphs, Python, Workflow
- 6 Web Scraping Tools That Make Collecting Data A Breeze - Feb 25, 2021.
The first step of any data science project is data collection. While it can be the most tedious and time-consuming step in your workflow, there is no project without that data. If you are scraping information from the web, several great tools exist that can save you a lot of time, money, and effort.
Data Curation, Data Preparation, Data Workflow, Web Scraping
- A Layman’s Guide to Data Science. Part 3: Data Science Workflow - Jul 6, 2020.
Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.
Beginners, Data Science, Data Workflow, Sciforce, Workflow
- Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools - Jan 29, 2020.
Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.
Collaboration, Comet.ml, Data Operations, Data Workflow, DataOps, MLflow, MLOps, Pipeline, Reproducibility
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
Data Workflow, Pipeline, Python, Workflow
- How A Data Scientist Can Improve Productivity - May 25, 2017.
Data science projects involve iterative processes and may need changes to the data at every iteration. Data versioning, data pipelines, and data workflows make a data scientist’s life easier; let’s see how.
CRISP-DM, Data Scientist, Data Workflow, DVC, GitHub, Version Control
- Dataiku: The Complete Data Sheet - Apr 20, 2017.
Whether your everyday tool is Scala, Python, R, or Excel, you can now use one tool, Dataiku, to transform raw data into predictions without the hassle. Discover the platform!
Automated Data Science, Data Science Platform, Data Workflow, Dataiku