Top Languages for analytics, data mining, data science
The most popular languages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined. We also find a small affinity between R and Python users.
By Gregory Piatetsky, Aug 27, 2013.
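Previous KDnuggets polls looked at high-level analytics and data mining software, but sometimes a full-power programming language is needed. That was the focus of the latest KDnuggets Poll, which asked: What programming/statistics languages you used for an analytics / data mining / data science work in 2013?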
Based on a very high response of over 700 voters, the most popular languages continue to be R (now used by 61% of respondents), Python (39%), and SQL (37%). On average, each respondent used 2.3 languages.
For trends, we compared the 2013 results with similar polls from previous years:
- 2012 Poll: languages used for analytics / data mining
- 2011 Poll: languages used for data mining / data analysis
The language with the highest relative growth (2013 vs 2012) was Julia, which doubled in popularity but was still used by only 0.7% of respondents in 2013.
Among more common languages, the largest relative increases in share of usage from 2012 to 2013 were for
- Pig Latin/Hive/other Hadoop-based languages, 19% growth, from 6.7% in 2012 to 8.0% in 2013
- R, 16% growth
- SQL, 14% growth (perhaps the result of increasing number of SQL interfaces to Hadoop and other Big Data systems?)
The languages with the largest declines in share of usage were
- Lisp/Clojure, 77% down
- Perl, 50% down
- Ruby, 41% down
- C/C++, 35% down
- Unix shell/awk/sed, 25% down
- Java, 22% down
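The growth and decline figures above are relative changes in share, not percentage-point differences. A minimal sketch of that arithmetic, using the rounded 2012 and 2013 shares published in the full results table (the function name `relative_change` is only illustrative). Because the inputs are rounded to one decimal, a result can differ by about a point from the figure quoted in the text, which was presumably computed from unrounded shares.

```python
# (share in 2012, share in 2013), in percent, as published in the results table
shares = {
    "Pig Latin/Hive/other Hadoop-based languages": (6.7, 8.0),
    "R": (52.5, 60.9),
    "SQL": (32.1, 36.6),
    "Lisp/Clojure": (4.3, 1.0),
    "Perl": (9.0, 4.5),
    "Ruby": (3.8, 2.2),
    "C/C++": (14.3, 9.3),
    "Unix shell/awk/sed": (14.7, 11.1),
    "Java": (21.2, 16.5),
}


def relative_change(old, new):
    """Relative change from old to new, in percent of the old value."""
    return (new - old) / old * 100


for language, (share_2012, share_2013) in shares.items():
    change = relative_change(share_2012, share_2013)
    print(f"{language}: {change:+.0f}%")
```

For example, Pig Latin/Hive rose by only 1.3 percentage points (6.7% to 8.0%), but that is about 19% of its 2012 share.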
Is there an affinity between R and Python? Yes, people who use R are about 13% more likely to use Python than the overall population. Here are the languages that R users are more likely to use than the overall population:
- Julia, 64% more
- Lisp/Clojure, 41% more
- GNU Octave, 27% more
- Pig Latin/Hive/other Hadoop-based languages, 27% more
- Unix shell/awk/sed, 23% more
- Python, 13% more
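One way to read these affinity figures is as a lift: the share of R users who also use a language, divided by that language's share among all voters, minus one. The article does not state its exact formula, so this definition is an assumption, and the helper `lift` below is hypothetical. The poll does not publish co-occurrence counts either, so the sketch works backwards from the reported 13% to the overlap it would imply.

```python
def lift(n_both, n_a, n_b, n_total):
    """How much more likely B is among users of A than overall (0.13 means 13% more)."""
    p_b_given_a = n_both / n_a   # share of A users who also use B
    p_b = n_b / n_total          # share of all voters who use B
    return p_b_given_a / p_b - 1


# Counts published in the poll results
N_TOTAL = 713
N_R = 434
N_PYTHON = 277
REPORTED_LIFT = 0.13  # "about 13% more likely", as stated in the text

# Implied (not published) share of R users who also use Python
implied_share = (1 + REPORTED_LIFT) * N_PYTHON / N_TOTAL
implied_overlap = implied_share * N_R

print(f"Implied share of R users who also use Python: {implied_share:.1%}")
print(f"Implied number of voters using both: about {implied_overlap:.0f}")
```

Under this assumed definition, a lift of 13% corresponds to roughly 44% of R users also using Python, compared with 39% of all voters.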
Here are the full results:
What programming/statistics languages you used for an analytics / data mining / data science work in 2013? [713 votes total]

| Language (voters in 2013) | % users in 2013 | % users in 2012 | % users in 2011 |
|---|---|---|---|
| R (434) | 60.9% | 52.5% | 45.1% |
| Python (277) | 38.8% | 36.1% | 24.6% |
| SQL (261) | 36.6% | 32.1% | 32.3% |
| SAS (148) | 20.8% | 19.7% | 21.2% |
| Java (118) | 16.5% | 21.2% | 24.4% |
| MATLAB (89) | 12.5% | 13.1% | 14.6% |
| High-level data mining suite (80) | 11.2% | not asked in 2012 | |
| Unix shell/awk/sed (79) | 11.1% | 14.7% | |
| C/C++ (66) | 9.3% | 14.3% | |
| Pig Latin/Hive/other Hadoop-based languages (57) | 8.0% | 6.7% | |
| Other low-level language (42) | 5.9% | 11.4% | |
| GNU Octave (40) | 5.6% | 5.9% | |
| Perl (32) | 4.5% | 9.0% | |
| Ruby (16) | 2.2% | 3.8% | |
| Scala (16) | 2.2% | 2.4% | |
| F# (12) | 1.7% | not asked in 2012 | |
| Lisp/Clojure (7) | 1.0% | 4.3% | |
| Julia (5) | 0.7% | 0.3% | |
| None (2) | 0.3% | 0.7% | |
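Each 2013 percentage in the results is the number of voters who selected that language divided by the 713 total voters. A short sketch for the top five rows, with the illustrative helper name `share`:

```python
TOTAL_VOTERS = 713  # total votes reported for the 2013 poll

# Voter counts for the five most-used languages, from the results
votes = {"R": 434, "Python": 277, "SQL": 261, "SAS": 148, "Java": 118}


def share(count, total=TOTAL_VOTERS):
    """Percentage of all voters who selected a language, rounded to one decimal."""
    return round(100 * count / total, 1)


for language, count in votes.items():
    print(f"{language}: {share(count)}%")
```

Because voters could select more than one language, the percentages add up to well over 100%.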
Comments
A number of comments, such as the one below, pointed out that SPSS also has its own language, similar to SAS; we will include it in the next poll.
Ralph Winters, SPSS Language
It seems odd to exclude SPSS based upon a definition of what is or what is not a language. Especially for a language which has such legacy roots, and is backed by IBM. I could argue that Matlab and R are both not true programming languages, and SAS, as flexible as it is, I would not consider a standardized programming language as well.
Regional participation was:
- US/Canada: 50.8%
- Europe: 25.7%
- Asia: 11.8%
- Latin America: 6.7%
- AU/NZ: 3.2%
- Africa/Middle East: 1.5%