- Data Science Process Lifecycle - Sep 29, 2021.
How would it feel to know, without a doubt, that the data projects you are working on will create true ROI for your organization? Stick around until the end to get my data science process lifecycle framework so that each data project you run is a smashing success.
Analytics, Data Science, Data Scientist, Workflow
- Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV - Sep 16, 2021.
This article documents the authors' experience building their custom MLOps approach.
GitHub, Machine Learning, MLOps, Pipeline, Python, Workflow
- Writing Your First Distributed Python Application with Ray - Aug 16, 2021.
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.
Distributed Computing, Parallelism, Python, Workflow
- Workflow Orchestration with Prefect and Coiled - Jun 23, 2021.
Coiled helps data scientists use Python for ambitious problems, scaling to the cloud for computing power, ease, and speed—all tuned for the needs of teams and enterprises. In this demo example, see how to spin up a Coiled cluster to execute Prefect jobs during runtime.
Coiled.io, Modern Data Stack, Orchestration, Prefect, Python, Workflow
- How to build a DAG Factory on Airflow - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
Data Engineering, Data Workflow, Graphs, Python, Workflow
- Kedro-Airflow: Orchestrating Kedro Pipelines with Airflow - Mar 12, 2021.
The Kedro team and Astronomer have released Kedro-Airflow 0.4.0 to help you develop modular, maintainable & reproducible code with orchestration superpowers!
Data Science, Interview, Pipeline, Python, Workflow
- How to Speed Up Pandas with Modin - Mar 10, 2021.
The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
Data Science, Distributed Systems, Modin, Pandas, Python, Workflow
- AI Is More Than a Model: Four Steps to Complete Workflow Success - Nov 17, 2020.
The key element for success in practical AI implementation is uncovering any issues early on and knowing what aspects of the workflow to focus time and resources on for the best results—and it’s not always the most obvious steps.
AI, Data Preparation, Data Science Process, Deployment, MathWorks, Simulation, Workflow
- How to be a 10x data scientist - Oct 12, 2020.
If you are a Data Scientist looking to make it to the next level, then there are many opportunities to up your game and your efficiency to stand out from the others. Some of these recommendations that you can follow are straightforward, and others are rarely followed, but they will all pay back in dividends of time and effectiveness for your career.
Advice, Data Scientist, IDE, Jupyter, Uncertainty, Workflow
- Machine Learning Model Deployment - Sep 30, 2020.
Read this article on machine learning model deployment using serverless deployment. Serverless compute abstracts away provisioning and managing servers and configuring software, simplifying model deployment.
Cloud, Deployment, Machine Learning, Modeling, Workflow
- Implementing MLOps on an Edge Device - Aug 4, 2020.
This article introduces developers to MLOps and strategies for implementing MLOps on edge devices.
Edge Analytics, Machine Learning, MLOps, Speech Recognition, Workflow
- A Tour of End-to-End Machine Learning Platforms - Jul 29, 2020.
An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!
AirBnB, Data Science Platform, Google, Machine Learning, MLOps, Netflix, Pipeline, Uber, Workflow
- A Layman’s Guide to Data Science. Part 3: Data Science Workflow - Jul 6, 2020.
Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.
Beginners, Data Science, Data Workflow, Sciforce, Workflow
- How to Extend Scikit-learn and Bring Sanity to Your Machine Learning Workflow - Oct 29, 2019.
In this post, learn how to extend Scikit-learn code to make your experiments easier to maintain and reproduce.
Machine Learning, Python, scikit-learn, Software Engineering, Workflow
- 5 Step Guide to Scalable Deep Learning Pipelines with d6tflow - Sep 16, 2019.
How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research & development.
Deep Learning, Pipeline, Python, PyTorch, Workflow
- Data Science with Optimus Part 2: Setting your DataOps Environment - Apr 16, 2019.
Breaking down data science with Python, Spark and Optimus. Today: Data Operations for Data Science. Here we’ll learn to set up Git, Travis CI and DVC for our project.
Apache Spark, Data Operations, Data Science, Python, Workflow
- Data Science with Optimus Part 1: Intro - Apr 15, 2019.
With Optimus you can clean your data, prepare it, analyze it, create profilers and plots, and perform machine learning and deep learning, all in a distributed fashion, because on the back-end we have Spark, TensorFlow, Sparkling Water and Keras. It’s super easy to use.
Apache Spark, Data Science, Python, Workflow
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
Data Workflow, Pipeline, Python, Workflow
- 4 Reasons Why Your Machine Learning Code is Probably Bad - Feb 26, 2019.
Your current ML workflow probably chains together several functions executed linearly. Instead of linearly chaining functions, data science code is better written as a set of tasks with dependencies between them. That is, your data science workflow should be a DAG.
Data Science, Machine Learning, Programming, Python, Workflow
- Data Science Project Flow for Startups - Jan 24, 2019.
The aim of this post, then, is to present the characteristic project flow that I have identified in the working process of both my colleagues and myself in recent years. Hopefully, this can help both data scientists and the people working with them to structure data science projects in a way that reflects their uniqueness.
Data Science, Startups, Workflow
- End To End Guide For Machine Learning Projects - Jan 14, 2019.
Let’s imagine you are attempting to work on a machine learning project. This article will provide you with a step-by-step guide to the process that you can follow to implement a successful project.
Machine Learning, Workflow
- The Machine Learning Project Checklist - Dec 7, 2018.
In an effort to further refine our internal models, this post will present an overview of Aurélien Géron's Machine Learning Project Checklist, as seen in his bestselling book, "Hands-On Machine Learning with Scikit-Learn & TensorFlow."
Checklist, Machine Learning, Process, Workflow
- Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP - Aug 24, 2018.
Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!
NLP, Text Analytics, Workflow
- Named Entity Recognition: A Practitioner’s Guide to NLP - Aug 17, 2018.
Named entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.
NLP, Text Analytics, Workflow
- Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP - Aug 10, 2018.
Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.
NLP, Text Analytics, Workflow
- Text Wrangling & Pre-processing: A Practitioner’s Guide to NLP - Aug 3, 2018.
I will highlight some of the most important steps used heavily in Natural Language Processing (NLP) pipelines, which I frequently use in my own NLP projects.
Data Preprocessing, Data Wrangling, NLP, Text Analytics, Workflow
- Data Retrieval with Web Scraping: A Practitioner’s Guide to NLP - Jul 26, 2018.
Proven and tested hands-on strategies to tackle NLP tasks.
Data Preprocessing, NLP, Text Analytics, Workflow
- The Keras 4 Step Workflow - Jun 4, 2018.
In his book "Deep Learning with Python," Francois Chollet outlines a process for developing neural networks with Keras in 4 steps. Let's take a look at this process with a simple example.
Francois Chollet, Keras, Neural Networks, Python, Workflow
- Principles of Guided Analytics - Mar 27, 2018.
KNIME outline their guided analytics system and explain how it can help data scientists predict future outcomes.
Analytics, Data Preparation, Knime, Michael Berthold, Workflow
- How to do Machine Learning Efficiently - Mar 13, 2018.
I now believe that there is an art, or craftsmanship, to structuring machine learning work, and none of the math-heavy books I tended to binge on seem to mention this.
Architecture, fast.ai, Machine Learning, Validation, Workflow
- Using AutoML to Generate Machine Learning Pipelines with TPOT - Jan 29, 2018.
This post will take a different approach to constructing pipelines. Certainly the title gives away this difference: instead of hand-crafting pipelines and hyperparameter optimization, and performing model selection ourselves, we will instead automate these processes.
Automated Machine Learning, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches - Jan 24, 2018.
In this post, we will be using grid search to optimize models built from a number of different types of estimators, which we will then compare, properly evaluating the best hyperparameters that each model has to offer.
Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search - Jan 19, 2018.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.
Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.
Data Preprocessing, Pipeline, Python, scikit-learn, Workflow
- Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering - Jun 7, 2017.
The second post in this series of tutorials for implementing machine learning workflows in Python from scratch covers implementing the k-means clustering algorithm.
Clustering, K-means, Machine Learning, Python, Workflow
- Machine Learning Workflows in Python from Scratch Part 1: Data Preparation - May 29, 2017.
This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.
Data Preparation, Machine Learning, Python, Workflow