Python overtakes R, becomes the leader in Data Science, Machine Learning platforms

While Python did not "swallow" R, in 2017 the Python ecosystem overtook R as the leading platform for Analytics, Data Science, and Machine Learning, and it is pulling users from other platforms.



The latest KDnuggets Poll asked:

Did you use R, Python (along with their packages), both, or other tools for Analytics, Data Science, Machine Learning work in 2016 and 2017?

Python did not quite "swallow" R, but the results, based on 954 voters, show that in 2017 the Python ecosystem overtook R as the leading platform for Analytics, Data Science, and Machine Learning.
See also my follow-up post: Python vs R – Who Is Really Ahead in Data Science, Machine Learning?


While in 2016 Python was in 2nd place ("Mainly Python" had 34% share vs 42% for "Mainly R"), in 2017 Python had 41% vs 36% for R.

The share of KDnuggets readers who used both R and Python in significant ways also increased from 8.5% to 12% in 2017, while the share who mainly used other tools dropped from 16% to 11%.

Fig. 1: Share of Python, R, Both, or Other platforms usage for Analytics, Data Science, Machine Learning, 2016 vs 2017


Next, we examine the transitions between the different platforms.

Fig. 2: Analytics, Data Science, Machine Learning platforms: transitions between R, Python, Both, and Other from 2016 to 2017


This chart looks complicated, but we see two key aspects, and Python wins on both:
  • Loyalty: Python users are more loyal, with 91% of 2016 Python users staying with Python. Only 74% of R users stayed with R, and only 60% of users of other platforms stayed with them.
  • Switching: Only 5% of Python users moved to R, while twice that share, 10% of R users, moved to Python. Among those who used both in 2016, only 49% kept using both, 38% moved to Python, and 11% moved to R.
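
To put the switching rates in terms of all voters, here is a rough sketch. It assumes each rate applies to the 2016 "Mainly Python" and "Mainly R" shares quoted earlier (34% and 42%); the variable and function names are illustrative only and are not part of the poll.

```python
# 2016 shares (% of all voters) and switching rates, as quoted in the text.
# Assumption: the switching rates apply to these "Mainly Python" / "Mainly R" shares.
share_2016 = {"python": 34.0, "r": 42.0}
python_to_r_rate = 0.05  # 5% of 2016 Python users moved to R
r_to_python_rate = 0.10  # 10% of 2016 R users moved to Python


def points_moved(share, rate):
    """Percentage points of all voters represented by a switching rate."""
    return share * rate


python_to_r = points_moved(share_2016["python"], python_to_r_rate)
r_to_python = points_moved(share_2016["r"], r_to_python_rate)
net_gain_python = r_to_python - python_to_r

print(f"Python -> R: {python_to_r:.1f} points")
print(f"R -> Python: {r_to_python:.1f} points")
print(f"Net gain for Python: {net_gain_python:.1f} points")
```

Under these assumptions, about 4.2 points of voters moved from R to Python versus about 1.7 points in the other direction, a net gain of roughly 2.5 points for Python from direct R/Python switching alone.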
Next, we look at trends across multiple years.

In our 2015 Poll on R vs Python we did not offer an option for "Both Python and R", so to compare trends across 4 years, we replace the shares of Python and R in 2016 and 2017 with:
Python* = (Python share) + 50% of (Both Python and R)
R* = (R share) + 50% of (Both Python and R)
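
As a small worked example, the adjustment can be applied to the rounded shares quoted earlier in this post (34%, 42%, 8.5% for 2016 and 41%, 36%, 12% for 2017). The dictionary layout and function name below are illustrative assumptions, not part of the poll.

```python
def adjusted_share(main_share, both_share):
    """Add half of the 'Both Python and R' share to a platform's own share."""
    return main_share + 0.5 * both_share


# Rounded shares (% of voters) as quoted in the text.
shares = {
    2016: {"python": 34.0, "r": 42.0, "both": 8.5},
    2017: {"python": 41.0, "r": 36.0, "both": 12.0},
}

adjusted = {
    year: {
        "python*": adjusted_share(s["python"], s["both"]),
        "r*": adjusted_share(s["r"], s["both"]),
    }
    for year, s in shares.items()
}

for year, values in adjusted.items():
    print(year, f"Python* = {values['python*']:.2f}%", f"R* = {values['r*']:.2f}%")
```

With these rounded inputs, Python* comes out at 38.25% for 2016 and 47% for 2017, and R* at 46.25% and 42%.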

We see that the share of R usage is slowly declining (from about 50% in 2015 to 36% in 2017 for "Mainly R", or 42% as R* once half of the "Both" share is added), while the Python* share is steadily growing, from 23% in 2014 to 47% in 2017. The share of other platforms is also steadily declining.

Fig. 3: Python vs R vs Other platforms for Analytics, Data Science, and Machine Learning, 2014-17


Finally, we look at trends and patterns by region. The regional participation was:
  • US/Canada, 40%
  • Europe, 35%
  • Asia, 12.5%
  • Latin America, 6.2%
  • Africa/Middle East, 3.6%
  • Australia/NZ, 3.1%

To simplify the chart, we split "Both" votes between R and Python, as above, and combine the 4 regions with smaller participation (Asia, AU/NZ, Latin America, and Africa/Middle East) into one "Rest" region.
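
The regional grouping can be sketched as follows, using the participation percentages listed above. This only illustrates how the four smaller regions fold into "Rest"; the variable names are hypothetical, and the per-region platform shares are not reproduced here because the text does not list them.

```python
# Regional participation (% of voters) as listed in the text.
participation = {
    "US/Canada": 40.0,
    "Europe": 35.0,
    "Asia": 12.5,
    "Latin America": 6.2,
    "Africa/Middle East": 3.6,
    "Australia/NZ": 3.1,
}

# Regions kept separate in the chart; every other region is folded into "Rest".
major_regions = {"US/Canada", "Europe"}

combined = {r: pct for r, pct in participation.items() if r in major_regions}
combined["Rest"] = sum(
    pct for r, pct in participation.items() if r not in major_regions
)

for region, pct in combined.items():
    print(f"{region}: {pct:.1f}%")
```

The combined "Rest" region accounts for about 25.4% of voters. Because the listed percentages are rounded, the three groups add up to slightly more than 100%.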

Fig. 4: Python* vs R* vs Rest by Region, 2016 vs 2017


We observe the same pattern across all regions:
  • increase in Python share, by 8-10%
  • decline in R share, by about 2-4%
  • decline in other platforms, by 5-7%
The future looks bright for Python users, but we expect that R and other platforms will retain some share in the foreseeable future because of their large embedded base.

Comments

Bill Winkler, computational speed, etc.
We are a SAS shop with ~30 new Ph.D.s and others who primarily use R. Some people are starting to work with Python because we are starting to move to a cloud-based environment. For working with national files using software based on rigorous theoretical models, we use very highly optimized C and FORTRAN routines. SAS, R, and even sometimes Python are 100+ times too slow.