- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 2 - Jul 26, 2016.
This is part 2 of a 3 part introductory series on machine learning in Python, using the Titanic dataset.
Machine Learning, Python, Titanic
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 1 - Jul 25, 2016.
Check out the first of a 3 part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
Machine Learning, Python, scikit-learn, Titanic
- SAS vs R vs Python: Which Tool Do Analytics Pros Prefer? - Jul 22, 2016.
There are lots of flame wars involving different data science and analytics tools... but this isn't one of them. Check out the quantitative results and analysis of a Burtch Works survey on the subject.
Burtch Works, Python, R, SAS, Survey
- Building a Data Science Portfolio: Machine Learning Project Part 1 - Jul 20, 2016.
Dataquest's founder has put together a fantastic resource on building a data science portfolio. This first of three parts lays the groundwork, with subsequent posts over the following 2 days. Very comprehensive!
Advice, Career, Data Science, Data Scientist, Dataquest, Machine Learning, Portfolio, Project, Python
- Statistical Data Analysis in Python - Jul 18, 2016.
This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects, taking the form of a set of IPython notebooks.
IPython, Jupyter, Pandas, Python, Statistical Analysis
- America’s Next Topic Model - Jul 15, 2016.
Topic modeling is a great way to get a bird's eye view of a large document collection using machine learning. Here are 3 ways to use the open source Python tool Gensim to choose the best topic model.
LDA, NLP, Python, Text Mining, Topic Modeling, Unsupervised Learning
- 5 Deep Learning Projects You Can No Longer Overlook - Jul 12, 2016.
There are a number of "mainstream" deep learning projects out there, but many more niche projects flying under the radar. Have a look at 5 such projects worth checking out.
C++, Deep Learning, Javascript, Machine Learning, Neural Networks, Overlook, Python
- Interview: Florian Douetteau, Dataiku Founder, on Empowering Data Scientists - Jul 7, 2016.
Here is an interview with Florian Douetteau, founder of Dataiku, on how their tools empower data scientists, and how data science itself is evolving.
Ajay Ohri, API, Data Science Tools, Dataiku, Florian Douetteau, Python, R
- Deep Residual Networks for Image Classification with Python + NumPy - Jul 7, 2016.
This post outlines the results of an innovative Deep Residual Network implementation for Image Classification using Python and NumPy.
Deep Learning, Neural Networks, numpy, Python
- Mining Twitter Data with Python Part 7: Geolocation and Interactive Maps - Jul 6, 2016.
The final part of this 7 part series explores using geolocation and interactive maps with Twitter data.
Data Visualization, Geo-Localization, Javascript, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 6: Sentiment Analysis Basics - Jul 5, 2016.
Part 6 of this series builds on the previous installments by exploring the basics of sentiment analysis on Twitter data.
Python, Sentiment Analysis, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 5: Data Visualisation Basics - Jun 29, 2016.
Part 5 of this series takes on data visualization, as we look to make sense of our data and highlight interesting insights.
D3.js, Data Visualization, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- 5 More Machine Learning Projects You Can No Longer Overlook - Jun 28, 2016.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects.
Computer Vision, Data Preparation, Data Preprocessing, Javascript, Machine Learning, Natural Language Processing, NLP, Overlook, Python
- Mining Twitter Data with Python Part 4: Rugby and Term Co-occurrences - Jun 27, 2016.
Part 4 of this series employs some of the lessons learned thus far to analyze tweets related to rugby matches and term co-occurrences.
Python, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 1: Collecting Data - Jun 15, 2016.
Part 1 of a 7 part series focusing on mining Twitter data for a variety of use cases. This first post lays the groundwork, and focuses on data collection.
Python, Social Media, Social Media Analytics, Twitter
- R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results - Jun 6, 2016.
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.
Data Mining Software, Data Science Platform, Poll, Python, Python vs R, R, RapidMiner, SQL
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano
- Top 10 IPython Notebook Tutorials for Data Science and Machine Learning - Apr 22, 2016.
A list of 10 useful GitHub repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but general principles are transferable.
Data Science, Deep Learning, GitHub, IPython, Machine Learning, Python, Sebastian Raschka, TensorFlow
- Comprehensive Guide to Learning Python for Data Analysis and Data Science - Apr 20, 2016.
Want to make a career change to Data Science using Python? Learning anything on your own can be a challenge, and a little guidance can be a great help. That is exactly what this article provides.
Data Analysis, Data Science Education, DataCamp, Python
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.
Data Cleaning, Data Preparation, Kaggle, Pandas, Python
- New KDnuggets Tutorials Page: Learn R, Python, Data Visualization, Data Science, and more - Mar 16, 2016.
Introducing new KDnuggets Tutorials page with useful resources for learning about Business Analytics, Big Data, Data Science, Data Mining, R, Python, Data Visualization, Spark, Deep Learning and more.
Data Science Education, Online Education, Python, R
- scikit-feature: Open-Source Feature Selection Repository in Python - Mar 3, 2016.
scikit-feature is an open-source feature selection repository in Python, with around 40 popular algorithms from feature selection research. It is developed by the Data Mining and Machine Learning Lab at Arizona State University.
Data Mining, Data Science, Feature Extraction, Feature Selection, Machine Learning, Python
- Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn - Feb 12, 2016.
Scikit Flow is a new easy-to-use interface for TensorFlow from Google, based on the Scikit-learn fit/predict model. Does it succeed in making deep learning more accessible?
Deep Learning, Google, Matthew Mayo, Python, scikit-learn, TensorFlow
- Data Science Skills for 2016 - Feb 12, 2016.
As demand for the hottest job grows in the new year, the skill set it requires is growing too. Here, we discuss the skills that will be in high demand for data scientists, including data visualization, Apache Spark, R, Python, and many more.
Apache Spark, CrowdFlower, Data Science, Python, Skills, SQL
- Python Data Science with Pandas vs Spark DataFrame: Key Differences - Jan 29, 2016.
A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.
Apache Spark, Pandas, Python
- Useful Data Science: Feature Hashing - Jan 28, 2016.
Feature engineering plays a major role in solving data science problems. Here, we learn about feature hashing, or the hashing trick, a method for turning arbitrary features into a sparse binary vector.
Feature Engineering, Hashing, Python, Will McGinnis
- Implementing Your Own k-Nearest Neighbor Algorithm Using Python - Jan 27, 2016.
A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python. Enhance your algorithmic understanding with this hands-on coding exercise.
K-nearest neighbors, Python, Python Tutorial
- Top New Features in Orange 3 Data Mining Platform - Dec 10, 2015.
The main technical advantage of Orange 3 is its integration with NumPy and SciPy libraries. Other improvements include reading online data, working through queries for SQL and pre-processing.
Data Mining, Data Visualization, numpy, Orange, Python, scikit-learn
- Using Python and R together: 3 main approaches - Dec 10, 2015.
If Data Science and Data Scientists cannot decide what data to use to help them choose a language, here is an article on using BOTH.
Ajay Ohri, Jupyter, Python, Python vs R, R
- Beyond One-Hot: an exploration of categorical variables - Dec 8, 2015.
Categorical variables are often coded into numbers by assigning an integer to each category (ordinal coding) for use by machine learning algorithms. Here, we explore different ways of converting a categorical variable and their effects on the dimensionality of data.
Data Exploration, Machine Learning, Python, Will McGinnis
- 7 Steps to Mastering Machine Learning With Python - Nov 19, 2015.
There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!
7 Steps, Anaconda, Caffe, Deep Learning, Machine Learning, Matthew Mayo, Python, scikit-learn, Theano
- Getting started with Python and Apache Flink - Nov 13, 2015.
Apache Flink is built on top of a distributed streaming dataflow architecture, which helps it crunch high-velocity, high-volume data sets. With version 1.0 it provides a Python API; learn how to write a simple Flink application in Python.
Flink, Python, Realtime Analytics, Streaming Analytics, Will McGinnis
- Topological Data Analysis – Open Source Implementations - Nov 6, 2015.
Topological Data Analysis (TDA) is making waves in the analytics community lately, but are there open source options available?
C++, Java, Matthew Mayo, Open Source, Python, R, Topological Data Analysis
- Overview of Python Visualization Tools - Nov 3, 2015.
An overview and comparison of the leading data visualization packages and tools for Python, including Pandas, Seaborn, ggplot, Bokeh, pygal, and Plotly.
Data Visualization, ggplot2, Pandas, Plotly, Python
- Integrating Python and R, Part 2: Executing R from Python and Vice Versa - Oct 30, 2015.
The second in a series of blog posts that outline the basic strategy for integrating Python and R. Here we concentrate on how the two scripts can be linked together by getting R to call Python and vice versa.
Python, Python vs R, R
- Integrating Python and R into a Data Analysis Pipeline, Part 1 - Oct 29, 2015.
The first in a series of blog posts that outline the basic strategy for integrating Python and R, run through the different steps involved in this process, and give a real example of how and why you would want to do this.
Data Analysis, Mango Solutions, Python, Python vs R, R
- Top /r/MachineLearning Posts, September: Implement a neural network from scratch in C++ - Oct 6, 2015.
Neural network in C++ for beginners, Chinese character handwriting recognition beats humans, a handy machine learning algorithm cheat sheet, neural nets versus functional programming, and a neural nets paper repository.
C++, Deep Learning, Matthew Mayo, Neural Networks, Python, R, Reddit
- 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more - Sep 4, 2015.
Here is a great collection of eBooks written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.
Book, Brendan Martin, Data Mining, Data Science, Free ebook, Machine Learning, Python, R, SQL
- How to become a Data Scientist for Free - Aug 28, 2015.
Here are the most required skills for a data scientist position, based on ReSkill’s analyses of thousands of job posts, along with free resources to learn each skill.
Data Science Education, Data Scientist, Java, Online Education, Python, R, SQL, Statistics
- TheWalnut.io: An Easy Way to Create Algorithm Visualizations - Jul 29, 2015.
Google's DeepDream project, which makes it possible to visualize deep learning neural networks, has gone viral. It highlights the need for a generalized algorithm visualization tool. In this post we introduce one such effort.
Algorithms, Data Visualization, Javascript, Python
- 50+ Data Science and Machine Learning Cheat Sheets - Jul 14, 2015.
Get up to speed and keep Data Science & Data Mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, and machine learning algorithms.
Cheat Sheet, Data Science, Django, Hadoop, Machine Learning, Python, R
- Popular Deep Learning Tools – a review - Jun 18, 2015.
Deep Learning is the hottest trend now in AI and Machine Learning. We review the popular software for Deep Learning, including Caffe, Cuda-convnet, Deeplearning4j, Pylearn2, Theano, and Torch.
Convolutional Neural Networks, CUDA, Deep Learning, GPU, Pylearn2, Python, Ran Bi, Theano, Torch
- Which Big Data, Data Mining, and Data Science Tools go together? - Jun 11, 2015.
We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of 2015 KDnuggets Software Poll. Download anonymized data and analyze it yourself.
Apache Spark, Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SQL
- Top 20 Python Machine Learning Open Source Projects - Jun 1, 2015.
We examine the top Python machine learning open source projects on GitHub, in terms of both contributors and commits, and identify the most popular and most active ones.
GitHub, Machine Learning, Open Source, Python, scikit-learn
- R vs Python for Data Science: The Winner is … - May 26, 2015.
In the battle of "best" data science tools, Python and R both have their pros and cons. Selecting one over the other will depend on the use cases, the cost of learning, and other common tools required.
Data Science Tools, DataCamp, Python, Python vs R, R
- R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites - May 25, 2015.
R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.
Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL
- Algorithmia Tested: Human vs Automated Tag Generation - Apr 21, 2015.
Algorithmia, the marketplace for algorithms, can be a platform for hosting APIs to do a plethora of text analytics and information retrieval tasks. Automatic post tagging is done in this case study to demonstrate the effectiveness and ease-of-use of the platform.
Algorithmia, API, Grant Marshall, Information Retrieval, Python, Text Analytics
- Top KDnuggets tweets, Mar 16-18: 87 Studies shown that accurate numbers aren’t more useful than the ones you make up (Dilbert) - Mar 19, 2015.
Also Sirius - a free, open-source version of Siri; #PI art: the first 13,689 digits of pi; Great tutorial + #Python code: 1-Layer Neural Networks.
Cartoon, Data Preparation, Deep Learning, Dilbert, Excel, Neural Networks, pi, Python, Siri
- Machine Learning Table of Elements Decoded - Mar 11, 2015.
Machine learning packages for Python, Java, Big Data, Lua/JS/Clojure, Scala, C/C++, CV/NLP, and R/Julia are represented using a cute but ill-fitting metaphor of a periodic table. We extract the useful links.
Big Data Software, Java, Julia, Machine Learning, NLP, Python, R, Scala, scikit-learn, Weka
- Most Demanded Data Science and Data Mining Skills - Dec 15, 2014.
Our analysis of the most demanded data scientist skills shows that Data Science is a team effort focused on business analytics, with the top 5 platform skills being SQL, Python, R, SAS, and Hadoop.
Data Science Skills, Data Scientist, Hadoop, New York-NY, Python, R, SAS, Skills, SQL
- Most Popular Slideshare Presentations on Data Science - Nov 25, 2014.
Top SlideShare data science presentations provide a unique view on topics like data science management, using Python and NumPy in your data science project, and leveraging data science for enterprise big data.
API, Big Data, Data Science Skills, Data Science Tutorial, Python, SlideShare
- Top KDnuggets tweets, Nov 17-18: Keep this #Python Cheat Sheet handy; Is #BigData The Most Hyped Technology Ever? - Nov 19, 2014.
Keep this #Python Cheat Sheet handy when learning to code; Is #BigData The Most Hyped Technology Ever? No (at least not yet); How to become a data scientist in 8 (not so) easy steps; R and Hadoop make Machine Learning Possible for Everyone.
Big Data Hype, Cheat Sheet, Data Scientist, Data Visualization, Python
- Most Popular Slideshare Presentations on Data Mining - Nov 13, 2014.
SlideShare data mining presentations cover many topics, offering a unique way of consuming data mining content and exploring a variety of slideshows, both narrow and broad in scope.
API, Data Mining Training, Python, SlideShare
- Four main languages for Analytics, Data Mining, Data Science - Aug 18, 2014.
A new KDnuggets Poll shows the growing dominance of four main languages for Analytics, Data Mining, and Data Science: R, SAS, Python, and SQL, used by 91% of data scientists, and a decline in popularity of other languages, except for Julia and Scala.
Analytics Languages, Data Mining, Data Science, Julia, Poll, Python, R, SAS, Scala, SQL
- KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead - Jun 7, 2014.
With over 3,000 data miners taking part in the KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside the US, and Hadoop usage grows fastest in Asia.
Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SAS, SQL, SQL Server, Weka
- Guide to Data Science Cheat Sheets - May 12, 2014.
Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.
Cheat Sheet, Data Science, Python, R, SQL
- Anaconda: Free enterprise-ready Python for Big data, Predictive Analytics - Feb 15, 2014.
125+ cross-platform tested and optimized Python packages for advanced analytics, totally free, even for commercial use.
Anaconda, Cross-Platform, Free Enterprise-Ready, Python