- Data Engineering Technologies 2021 - Sep 21, 2021.
Emerging technologies supporting the field of data engineering are growing at a rapid clip. This curated list includes the most important offerings available in 2021.
Abacus.ai, Dask, Data Engineering, Databricks, Dataiku, DataRobot, dbt, Fivetran, Pachyderm
- Model Experiments, Tracking and Registration using MLflow on Databricks - Jan 5, 2021.
This post covers how StreamSets can help expedite operations at some of the most crucial stages of the Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow.
Data Science, Databricks, DataOps, Experimentation, MLflow, MLOps, Modeling, StreamSets
- Working with Spark, Python or SQL on Azure Databricks - Aug 27, 2020.
Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.
Apache Spark, Databricks, Microsoft Azure, Python, SQL
- Leaders, Changes, and Trends in Gartner 2020 Magic Quadrant for Data Science and Machine Learning Platforms - Feb 24, 2020.
The Gartner 2020 Magic Quadrant for Data Science and Machine Learning Platforms has the largest number of leaders ever. We examine the leaders, changes, and trends vs. previous years.
Alteryx, Data Science Platform, Databricks, Dataiku, DataRobot, Domino, Gartner, Google, H2O, IBM, Knime, Machine Learning, Magic Quadrant, MathWorks, Microsoft Azure, RapidMiner, SAS, TIBCO
- [eBook] Standardizing the Machine Learning Lifecycle - Mar 15, 2019.
We explore what makes the machine learning lifecycle so challenging compared to regular software, and share the Databricks approach.
Databricks, ebook, Life Cycle, Machine Learning, MLflow
- [Download] Real-Life ML Examples + Notebooks - Nov 13, 2018.
In this eBook, we will walk you through four Machine Learning use cases on Databricks: Loan Risk Use Case; Advertising Analytics & Prediction Use Case; Market Basket Analysis Problem at Scale; Suspicious Behavior Identification in Video Use Case. Get your copy now!
Databricks, ebook, Jupyter, Machine Learning, Use Cases
- Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science - Aug 16, 2018.
An introduction to Project Hydrogen: how it can assist machine learning and AI frameworks on Apache Spark and what distinguishes it from other open source projects.
AI, Apache Spark, Data Science, Databricks, Distributed Computing, Production
- Deep Learning With Apache Spark: Part 1 - Apr 18, 2018.
The first part of a full discussion of how to do Distributed Deep Learning with Apache Spark. This part covers what Spark is, the basics of Spark + DL, and a little more.
Apache Spark, Databricks, Deep Learning, Pipeline
- [ebook] 7 Steps for a Developer to Learn Apache Spark - Apr 17, 2018.
We offer a step-by-step guide to technical content and related assets to help you learn Apache Spark, whether you're getting started with Spark or are an accomplished developer.
Apache Spark, Databricks, Developer, ebook, Spark SQL
- Apache Spark Key Terms, Explained - Jun 13, 2016.
An overview of 13 core Apache Spark concepts, presented with focus and clarity in mind. A great beginner's overview of essential Spark terminology.
Apache Spark, Databricks, Dataset, Explained, Key Terms, RDD, Tungsten
- Introducing GraphFrames, a Graph Processing Library for Apache Spark - Mar 7, 2016.
An overview of Spark's new GraphFrames, a graph processing library based on DataFrames, built in a collaboration between Databricks, UC Berkeley's AMPLab, and MIT.
Apache Spark, Databricks, Graph Analytics
- Top Spark Ecosystem Projects - Mar 2, 2016.
Apache Spark has developed a rich ecosystem, including both official and third-party tools. We look at 5 third-party projects that complement Spark in 5 different ways.
Apache Mesos, Apache Spark, Cassandra, Databricks, Distributed Systems
- Auto-Scaling scikit-learn with Spark - Feb 11, 2016.
Databricks gives us an overview of the spark-sklearn library, which automatically and seamlessly distributes model tuning on a Spark cluster, without impacting workflow.
Apache Spark, Databricks, Open Source, scikit-learn
- Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020 - May 22, 2015.
Apache Spark is one of the hottest Big Data technologies in 2015. KDnuggets talks to Matei Zaharia, creator of Apache Spark, about key things to know about it, why it is not a replacement for Hadoop, how it is better than Flink, and his vision for Big Data in 2020.
Apache Spark, Big Data, Databricks, Flink, Hadoop, Matei Zaharia, MLlib, Spark SQL
- Apache Spark: O’Reilly Certification, EU Training, University Program - Sep 26, 2014.
Recent news on Apache Spark includes developer certification from O'Reilly, upcoming training workshops in the EU by Databricks, and Spark tutorial events at major universities.
Academics, Apache Spark, Big Data, Certification, Databricks, Paco Nathan, Strata, Training