- Parallelizing Python Code - Oct 4, 2021.
This article reviews some common options for parallelizing Python code, including process-based parallelism, specialized libraries, IPython Parallel, and Ray.
Distributed Computing, Parallelism, Programming, Python, Ray
- Writing Your First Distributed Python Application with Ray - Aug 16, 2021.
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.
Distributed Computing, Parallelism, Python, Workflow
- How to speed up a Deep Learning Language model by almost 50X at half the cost - Jun 9, 2021.
In this blog post, we show how to accelerate fine-tuning of the ALBERT language model while also reducing costs, using Determined’s built-in support for distributed training on AWS spot instances.
AWS, Deep Learning, Distributed Computing, Hugging Face, NLP
- Speeding up Scikit-Learn Model Training - Mar 5, 2021.
If your scikit-learn models are slow to train, there are several techniques you can use to make training more efficient. From optimizing your model configuration to leveraging libraries that speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
Distributed Computing, Machine Learning, Optimization, scikit-learn
- Dask and Pandas: No Such Thing as Too Much Data - Mar 4, 2021.
Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.
Dask, Distributed Computing, Pandas
- Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science - Aug 16, 2018.
An introduction to Project Hydrogen: how it can assist machine learning and AI frameworks on Apache Spark and what distinguishes it from other open source projects.
AI, Apache Spark, Data Science, Databricks, Distributed Computing, Production
- Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn - May 12, 2017.
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
Dask, Distributed Computing, Distributed Systems, Machine Learning, Optimization, scikit-learn
- Introducing Dask for Parallel Programming: An Interview with Project Lead Developer - Sep 7, 2016.
This interview introduces Dask, a flexible parallel computing library for analytics. Learn more about this project, built with interactive data science in mind, from its lead developer.
Analytics, Continuum Analytics, Dask, Data Science, Distributed Computing, Parallelism, Python, Scientific Computing