Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; almost 50% have used Deep Learning tools; SQL is steady; consolidation continues.
The 20th annual KDnuggets Software Poll had over 1,800 participants.
The average voter chose 6.1 different tools, so voters with just one choice stood out. We removed about 180 such "lone" votes (2/3 were from one vendor), because even if they represented legitimate users of that tool, their experience was not representative of what Data Scientists do in 2019.
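The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual processing used for the poll: the `ballots` data, the respondent ids, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical ballots: respondent id -> set of tools selected.
ballots = {
    "r1": {"Python", "SQL Language", "Keras"},
    "r2": {"RapidMiner"},  # a "lone" vote: exactly one tool
    "r3": {"R Language", "Excel"},
    "r4": {"Python", "Tensorflow", "Keras", "scikit-learn"},
}


def drop_lone_voters(ballots):
    """Keep only respondents who selected more than one tool."""
    return {rid: tools for rid, tools in ballots.items() if len(tools) > 1}


def average_tools(ballots):
    """Average number of tools selected per respondent."""
    return mean(len(tools) for tools in ballots.values())


kept = drop_lone_voters(ballots)
print(len(ballots) - len(kept), "lone ballot(s) removed")
print(f"average tools before: {average_tools(ballots):.2f}, "
      f"after: {average_tools(kept):.2f}")
```

Removing single-tool ballots raises the average number of tools per respondent, since only ballots with the minimum count are dropped.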
Here is my initial analysis based on remaining participants, after "lone" voters were removed. More detailed association analysis and anonymized data will be published later.
Top Analytics, Data Science, Machine Learning Software
Fig 1: KDnuggets Analytics/Data Science 2019 Software Poll: top tools in 2019, and their share in the 2017, 2018 polls
Interestingly, we see the same group of top 11 tools (each with at least 20% share) in 2019 as in 2018.
Table 1: Top Analytics/Data Science/ML Software in 2019 KDnuggets Poll
Software | 2019 % share | 2018 % share | 2017 % share |
---|---|---|---|
Python | 65.8% | 65.6% | 59.0% |
RapidMiner | 51.2% | 52.7% | 31.9% |
R Language | 46.6% | 48.5% | 56.6% |
Excel | 34.8% | 39.1% | 31.5% |
Anaconda | 33.9% | 33.4% | 24.3% |
SQL Language | 32.8% | 39.6% | 39.2% |
Tensorflow | 31.7% | 29.9% | 22.7% |
Keras | 26.6% | 22.2% | 10.7% |
scikit-learn | 25.5% | 24.4% | 21.9% |
Tableau | 22.1% | 26.4% | 21.8% |
Apache Spark | 21.0% | 21.5% | 25.5% |
Here 201N % share is % of voters who used this software in year 201N.
The average number of tools per respondent was 6.7, very consistent with 7.0 in the 2018 poll and 6.75 in the 2017 poll.
Here are some observations on 3-year trends for top tools.
Python stayed at the top, with almost the same share (65.8% vs 65.6%) of respondents as in 2018.
RapidMiner kept its share at around 51%, which was a reflection of both a large user base and a successful campaign to motivate its users. I note that RapidMiner is not a current advertiser on KDnuggets.
R language share has declined 2 years in a row, but less this year than in the previous year. Several users commented that RStudio should be included, and we will include it in the next poll.
The shares for Deep Learning platforms Tensorflow and especially Keras have grown each year, reflecting the growing usage of Deep Learning in many applications.
SQL is steady, with a share above 30% for many years. So, if you are an aspiring Data Scientist, learn not only TensorFlow but also SQL - it will likely be useful for many more years.
Trends
In 2019 we added a number of new entries, and eight of them received at least 25 votes:
- XGBoost, 12.7%
- Javascript, 6.8%
- Apache Kafka, 6.0%
- Google Bigquery, 5.2%
- LightGBM, 3.1%
- fastai library, 2.4%
- Apache Storm, 1.9%
- CatBoost, 1.8%
The table below lists the tools that were included in the KDnuggets Poll in 2018, grew 20% or more in share, and reached at least 25 voters in 2019.
Table 2: Major Analytics/Data Science/ML Software with the largest increase in usage
Software | 2019 % share | 2018 % share | % change |
---|---|---|---|
BigML | 2.6% | 0.9% | 199% |
Julia | 1.7% | 0.7% | 150% |
Databricks Unified Analytics Platform | 2.6% | 1.2% | 115% |
PyTorch | 11.3% | 6.4% | 76% |
Microsoft other ML/Data Science tools | 1.8% | 1.3% | 35% |
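The % change column is the relative change in share between the two years. A minimal sketch of that calculation is below, with `pct_change` as a hypothetical helper name. It uses the rounded shares printed in Table 2, so the results differ somewhat from the published column (for example, about 189% rather than 199% for BigML). That gap is presumably because the published figures were computed from unrounded shares, which is an assumption here.

```python
def pct_change(share_2019, share_2018):
    """Relative change in share from 2018 to 2019, in percent."""
    return (share_2019 - share_2018) / share_2018 * 100


# Rounded shares copied from Table 2: tool -> (2019 %, 2018 %).
table2 = {
    "BigML": (2.6, 0.9),
    "Julia": (1.7, 0.7),
    "PyTorch": (11.3, 6.4),
}

for tool, (s2019, s2018) in table2.items():
    print(f"{tool}: {pct_change(s2019, s2018):.0f}%")
```

Note that a small absolute gain on a small base produces a large relative change, which is why low-share tools such as BigML and Julia top this table.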
Continuing Consolidation?
There were 48 tools with 2% or higher share in 2018, and among them 14 (less than one third) have increased share in 2019, while 34 have decreased their share. This trend, which also existed in 2018, suggests continuing consolidation of Data Science / Machine Learning platforms.
Tools that had at least 2% share in 2018 and declined 25% or more in their share in 2019 are in the next table.
Table 3: Major Analytics/Data Science Platforms with the largest decline in usage
Platform | 2019 % share | 2018 % share | % change |
---|---|---|---|
Dataiku | 2.0% | 6.3% | -68.2% |
TIBCO Spotfire | 1.2% | 3.1% | -62.2% |
IBM DSX/Watson Studio | 1.9% | 4.5% | -58.3% |
IBM SPSS Modeler | 2.4% | 4.9% | -51.2% |
Microsoft Machine Learning Server | 1.2% | 2.1% | -41.8% |
Weka | 6.7% | 11.4% | -41.4% |
MATLAB | 6.1% | 9.3% | -34.5% |
IBM SPSS Statistics | 5.3% | 8.0% | -33.6% |
Some of the decline may be due to the lack of a vendor campaign to vote in the KDnuggets Poll, and some may reflect a decline in the popularity of the platform, as is probably the case for IBM.
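The selection rule for the table above (at least 2% share in 2018 and a decline of 25% or more) can be sketched as a simple filter. The rows below are rounded shares taken from tables in this article; the constant and function names are hypothetical, and the sketch covers only a handful of tools rather than the full poll data.

```python
# Rounded shares from the tables in this article: tool -> (2019 %, 2018 %).
shares = {
    "Weka": (6.7, 11.4),
    "MATLAB": (6.1, 9.3),
    "Tableau": (22.1, 26.4),
    "Apache Spark": (21.0, 21.5),
    "Caffe": (0.6, 1.5),
}

MIN_2018_SHARE = 2.0  # at least 2% share in 2018
MAX_CHANGE = -25.0    # declined 25% or more


def major_decliners(shares):
    """Return (tool, % change) pairs meeting both criteria, largest decline first."""
    rows = []
    for tool, (s2019, s2018) in shares.items():
        change = (s2019 - s2018) / s2018 * 100
        if s2018 >= MIN_2018_SHARE and change <= MAX_CHANGE:
            rows.append((tool, round(change, 1)))
    return sorted(rows, key=lambda row: row[1])


for tool, change in major_decliners(shares):
    print(f"{tool}: {change}%")
```

In this sample, Caffe is excluded despite a steep relative drop because its 2018 share was below the 2% threshold, while Tableau and Apache Spark are excluded because their declines are smaller than 25%.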
Deep Learning Tools
The share of users of Deep Learning tools jumped to 49.8% (!!), from 33% of voters in 2018 and 32% in 2017.
Tensorflow remains the dominant platform, and Keras continues to grow as a very popular wrapper on top of Tensorflow. PyTorch has also significantly increased its share. Share of most of the other Deep Learning tools (except for MXnet) has declined.
Table 4: Major Deep Learning Platforms
Platform | 2019 % share | 2018 % share | % change |
---|---|---|---|
Tensorflow | 31.7% | 29.9% | 5.8% |
Keras | 26.6% | 22.2% | 19.7% |
PyTorch | 11.3% | 6.4% | 75.5% |
Other Deep Learning Tools | 5.6% | 4.9% | 15.2% |
DeepLearning4J | 2.5% | 3.4% | -25.6% |
Apache MXnet | 1.7% | 1.5% | 13.1% |
Microsoft Cognitive Toolkit | 1.6% | 3.0% | -45.5% |
Theano | 1.6% | 4.9% | -67.4% |
Torch | 0.9% | 1.0% | -6.1% |
TFLearn | 0.7% | 1.1% | -34.7% |
Caffe | 0.6% | 1.5% | -58.3% |
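The overall Deep Learning share (49.8%) is lower than the sum of the rows in the table because one respondent can select several tools. A minimal sketch of the difference follows, assuming the overall figure counts each respondent who picked at least one Deep Learning tool once. The ballots and function names are hypothetical.

```python
# A few Deep Learning tools from the table; the set is only for illustration.
DEEP_LEARNING = {"Tensorflow", "Keras", "PyTorch"}

# Hypothetical ballots: one set of selected tools per respondent.
ballots = [
    {"Python", "Tensorflow", "Keras"},
    {"R Language", "Excel"},
    {"Python", "PyTorch"},
    {"SQL Language", "Tableau"},
]


def tool_share(ballots, tool):
    """Percent of respondents who selected a given tool."""
    return sum(tool in b for b in ballots) / len(ballots) * 100


def category_share(ballots, category):
    """Percent of respondents who selected at least one tool in the category."""
    return sum(bool(b & category) for b in ballots) / len(ballots) * 100


print("category share:", category_share(ballots, DEEP_LEARNING))
print("sum of tool shares:", sum(tool_share(ballots, t) for t in DEEP_LEARNING))
```

Here the first respondent uses both Tensorflow and Keras, so that ballot counts twice in the per-tool sum but only once in the category share.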
Big Data Tools
In 2019, 37% used Big Data tools vs 33% in 2018. Apache Spark continues to be ahead of Hadoop, and we see the emergence of streaming Big Data platforms such as Apache Storm, Flink, and WSO2 Stream Processor. The table below shows the details, with na indicating that the software was not included in the 2018 poll.
Platform | 2019 % share | 2018 % share | % change |
---|---|---|---|
Apache Spark | 21.0% | 21.5% | -2.3% |
Hadoop: Open Source Tools | 12.1% | 11.0% | 10.2% |
SQL on Hadoop tools | 8.4% | 10.2% | -17.3% |
Apache Kafka | 6.0% | na | na |
Google Bigquery | 5.2% | na | na |
Hadoop: Commercial Tools | 4.5% | 5.7% | -20.1% |
Apache Storm | 1.9% | na | na |
Flink | 0.8% | na | na |
WSO2 Stream Processor | 0.5% | na | na |
Programming Languages
Python and R continue to dominate. The new entry this year was Javascript, which got a respectable 6.8% share. Julia's share has increased, while most other languages have declined.
Here are the main programming languages sorted by popularity.
Platform | 2019 % share | 2018 % share | % change |
---|---|---|---|
Python | 65.8% | 65.6% | 0.2% |
R Language | 46.6% | 48.5% | -4.0% |
SQL Language | 32.8% | 39.6% | -17.2% |
Java | 12.4% | 15.1% | -17.7% |
Unix shell/awk | 7.9% | 9.2% | -13.4% |
C/C++ | 7.1% | 6.8% | 3.7% |
Javascript | 6.8% | na | na |
Other programming and data languages | 5.7% | 6.9% | -17.1% |
Scala | 3.5% | 5.9% | -41.0% |
Julia | 1.7% | 0.7% | 150.4% |
Perl | 1.3% | 1.0% | 25.2% |
Lisp | 0.4% | 0.3% | 46.1% |
The next page shows regional participation and results for the last 3 years.