- Everything a Data Scientist Should Know About Data Management - Oct 22, 2019.
For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.
Data Management, Data Scientist, Hadoop
- The Death of Big Data and the Emergence of the Multi-Cloud Era - Jul 11, 2019.
The Era of Big Data is coming to an end as the focus shifts from how we collect data to processing that data in real-time. Big Data is now a business asset supporting the next eras of multi-cloud support, machine learning, and real-time analytics.
Big Data, Cloudera, Hadoop, Multi-cloud, Realtime Analytics
- Apache Spark Introduction for Beginners - Oct 18, 2018.
An extensive introduction to Apache Spark, including a look at the evolution of the product, use cases, architecture, ecosystem components, core concepts and more.
Apache Spark, Beginners, Hadoop, R
- Hadoop for Beginners - Sep 12, 2018.
An introduction to Hadoop, a framework that enables you to store and process large data sets in a parallel and distributed fashion.
Beginners, Big Data, Hadoop
- Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis - May 22, 2018.
Python continues to eat away at R, RapidMiner gains, SQL is steady, TensorFlow advances pulling along Keras, Hadoop drops, Data Science platforms consolidate, and more.
Anaconda, Data Mining Software, Data Science Platform, Hadoop, Keras, Poll, Python, R, RapidMiner, SQL, TensorFlow, Trends
- Ranking Popular Distributed Computing Packages for Data Science - Mar 20, 2018.
We examined 140 frameworks and distributed programming packages and came up with a list of the top 20 distributed computing packages useful for Data Science, based on a combination of GitHub, Stack Overflow, and Google results.
Apache Spark, Data Science, Distributed Systems, GitHub, Hadoop
- Updates & Upserts in Hadoop Ecosystem with Apache Kudu - Oct 27, 2017.
A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
Apache, Big Data, Data Management, Hadoop, Java, NoSQL
- Are Data Lakes Fake News? - Sep 6, 2017.
The quick answer is yes, and the biggest problem is that the term “Data Lakes” has been overloaded by vendors and analysts with different meanings, resulting in an ill-defined and blurry concept.
Data Lakes, Data Warehouse, ETL, Fake News, Hadoop
- Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains - May 18, 2017.
Moving to Hadoop is not without its challenges—there are so many options, from tools to approaches, that can have a significant impact on the future success of a business’ strategy. Data management and data pipelining can be particularly difficult.
Data Management, Data Platform, Hadoop, SVDS
- HDFS vs. HBase : All you need to know - May 15, 2017.
Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase with real-life use cases where each is the best fit.
Big Data, Hadoop, HBase, HDFS
- Key Takeaways from Strata + Hadoop World 2017 San Jose, Day 1 - Mar 24, 2017.
The focus is increasingly shifting from storing and processing Big Data in an efficient way, to applying traditional and new machine learning techniques to drive higher value from the data at hand.
CA, Cloudera, Coursera, Hadoop, MapR, Pinterest, San Jose, Strata
- What Top Firms Ask: 100+ Data Science Interview Questions - Mar 22, 2017.
Check this out: a topic-wise collection of 100+ data science interview questions from top companies.
Algorithms, Data Science, Google, Hadoop, Interview Questions, Machine Learning, Microsoft, Statistics, Uber
- 50+ Data Science, Machine Learning Cheat Sheets, updated - Dec 14, 2016.
Get up to speed and keep concepts and commands handy in Data Science, Data Mining, and Machine Learning algorithms with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, Matlab, and Java.
Cheat Sheet, Data Science, Django, Hadoop, Java, Machine Learning, MATLAB, Python, R
- 5 EBooks to Read Before Getting into A Data Science or Big Data Career - Aug 11, 2016.
A short, carefully curated list of 5 free ebooks to help you better understand what Data Science is all about and how you can best prepare for a career in data science, big data, and data analysis.
Big Data, Free ebook, Hadoop, Programming Languages, Simplilearn, Tableau
- Big Data Key Terms, Explained - Aug 11, 2016.
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.
3Vs of Big Data, Apache Spark, Big Data, Business Intelligence, Cloud Computing, Data Warehouse, Explained, Hadoop, Key Terms, Predictive Analytics
- 100 Active Blogs on Analytics, Big Data, Data Mining, Data Science, Machine Learning - Mar 29, 2016.
Stay on top of your data science skills game! Here’s a list of about 100 of the most active and interesting blogs on Big Data, Data Science, Data Mining, Machine Learning, and Artificial Intelligence.
Big Data, Blogs, Data Science, Deep Learning, Hadoop, Machine Learning
- Top Big Data Processing Frameworks - Mar 3, 2016.
A discussion of 5 Big Data processing frameworks: Hadoop, Spark, Flink, Storm, and Samza. An overview of each is given and comparative insights are provided, along with links to external resources on particular related topics.
Apache Samza, Apache Spark, Apache Storm, Flink, Hadoop
- Data Lake Plumbers: Operationalizing the Data Lake - Feb 18, 2016.
Gain insight into data lakes, their benefits, when they are appropriate, and how to operationalize them. How do they compare to the data warehouse?
Data Lake, Data Warehouse, ETL, Hadoop
- Career path explained: Big Data Hadoop DEVELOPER to ARCHITECT - Nov 24, 2015.
The path to becoming a Big Data and Hadoop Architect is fraught with major challenges and responsibilities, but here is a handy infographic to help you chart out your path.
Big Data, Big Data Architect, Developer, Hadoop, Simplilearn
- Interview: Thanigai Vellore, Art.com on Delivering Contextually Relevant Search Experience - Jul 23, 2015.
We discuss the role of Analytics at Art.com, the polyglot data architecture at Art.com, the use cases for Hadoop, vendor selection, supporting semantic search and experience with Avro.
Architecture, Art.com, Avro, Hadoop, HBase, Interview, Semantic Analysis, Solr, Thanigai Vellore
- 50+ Data Science and Machine Learning Cheat Sheets - Jul 14, 2015.
Get up to speed and keep Data Science and Data Mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, and Machine Learning algorithms.
Cheat Sheet, Data Science, Django, Hadoop, Machine Learning, Python, R
- Which Big Data, Data Mining, and Data Science Tools go together? - Jun 11, 2015.
We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of the 2015 KDnuggets Software Poll. Download anonymized data and analyze it yourself.
Apache Spark, Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SQL
- Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020 - May 22, 2015.
Apache Spark is one of the hottest Big Data technologies in 2015. KDnuggets talks to Matei Zaharia, creator of Apache Spark, about key things to know about it, why it is not a replacement for Hadoop, how it is better than Flink, and his vision for Big Data in 2020.
Apache Spark, Big Data, Databricks, Flink, Hadoop, Matei Zaharia, MLlib, Spark SQL
- Most Viewed Big Data Videos on YouTube - May 9, 2015.
The top Big Data YouTube videos, from sources such as Hortonworks and Kirk D. Borne, cover diverse topics including Hadoop, Big Data Trends, Deep Learning, and Big Data Leadership.
Big Data, Cloudera, Deep Learning, Google, Grant Marshall, Hadoop, IBM, Kirk D. Borne, TED, Youtube
- Hadoop as a Service: 18 Cloud Options - Apr 2, 2015.
Hadoop as a service in the cloud makes big data applications and projects easier to approach and these 18 platforms each provide their own unique solutions.
AWS, Big Data Services, Cloud, Cloudera, Hadoop, Hortonworks, Information Management, MapR, Microsoft Azure
- 16 NoSQL, NewSQL Databases To Watch - Dec 15, 2014.
NoSQL and NewSQL databases have become much more important with the proliferation of big, mobile, and networked data, and these sixteen database solutions are some of the biggest up-and-comers.
Hadoop, InformationWeek, MongoDB, NoSQL, Oracle, VoltDB
- Most Demanded Data Science and Data Mining Skills - Dec 15, 2014.
Our analysis of the most demanded data scientist skills shows that Data Science is a team effort focused on business analytics, with the top 5 platform skills being SQL, Python, R, SAS, and Hadoop.
Data Science Skills, Data Scientist, Hadoop, New York-NY, Python, R, SAS, Skills, SQL
- Interview: Daqing Zhao, Macys.com on Building Effective Data Models for Marketing - Dec 11, 2014.
We discuss the challenges in identifying the fair price of ad media, recommendations for building effective models for online marketing, the unique challenges of the mobile channel, the selection of Big Data tools, and more.
Daqing Zhao, Data Models, Data Science Skills, Hadoop, Interview, Macy's, Marketing, Mobile, Tools
- Why Azure ML is the Next Big Thing for Machine Learning? - Nov 17, 2014.
With advanced capabilities, free access, strong support for R, cloud hosting benefits, drag-and-drop development and many more features, Azure ML is ready to take the consumerization of ML to the next level.
Azure ML, Cloud Computing, Hadoop, Machine Learning, Marketplace, Microsoft Azure, Nate Silver, Predictive Analytics, Strata
- R and Hadoop make Machine Learning Possible for Everyone - Nov 16, 2014.
R and Hadoop make machine learning approachable enough for inexperienced users to begin analyzing and visualizing interesting data to start down the path in this lucrative field.
Data Science Skills, Hadoop, Hadoop 2.0, Joel Horwitz, LinkedIn, Machine Learning, R
- 18 essential Hadoop tools - Aug 1, 2014.
Hadoop tools develop at a rapid rate, and keeping up with the latest can be difficult. Here we detail 18 of the most essential tools that work well with Hadoop.
Apache Spark, Data Infrastructure, Hadoop
- Interview: Sastry Malladi, StubHub on Designing Big Data Architecture for the Unknown Future - Jul 28, 2014.
We discuss the Big Data architecture at StubHub, important factors in architecture design, the hybrid approach of using Big Data along with traditional data warehouses, challenges, the importance of metadata, and more.
Architecture, Challenges, Design, Hadoop, Interview, Metadata, Personalization, Recommendation, Sastry Malladi, StubHub
- Containers: The Enabler of YARN - Jul 28, 2014.
The evolution of a data-center operating system is discussed along with the underlying challenges and approaches being followed. Containers play a big role in enabling the required abstraction and deliver additional benefits.
Altiscale, Applications, Containers, Docker, Hadoop, MapReduce, Mesos, Virtualization, YARN
- KDnuggets Analytics, Data Mining, Data Science Software Poll – Analyzed - Jun 17, 2014.
We analyze the results of the KDnuggets Software Poll, including correlations between tools, and relationships between commercial, free, and Hadoop/Big Data tools. We identify a potential capability gap. Download anonymized data and analyze it yourself.
Data Mining Software, Hadoop, Poll, R, RapidMiner
- KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead - Jun 7, 2014.
With over 3,000 data miners taking part in the KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside the US, and Hadoop usage grows fastest in Asia.
Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SAS, SQL, SQL Server, Weka
- Poll Results: Data Types/Sources Analyzed - May 17, 2014.
Trends in data sources for data mining include: table data dominates, followed by time series and text; audio and JSON grow in popularity, while itemsets decline; 70% access DB engines, but only 20% access NoSQL stores; Hadoop and MongoDB are used more for text; Europe is lagging in NoSQL usage.
Data types, Hadoop, NoSQL, Poll, Relational Databases
- Cartoon: Data Scientist Salary Negotiation - Apr 29, 2014.
A new KDnuggets cartoon looks at a data scientist salary negotiation.
Cartoon, Data Scientist, Hadoop, Salary
- Interactive Big Data Timeline - Apr 8, 2014.
A very interesting interactive Big Data timeline takes you from the beginning of information overload in the 1880s to Business Intelligence, World Wide Web, Hadoop, Cloud, and more.
3Vs of Big Data, ERP, Gil Press, Hadoop, IBM, Information Overload, Timeline
- Is Data Scientist the right career path for you? Candid advice - Mar 28, 2014.
Candid advice from an industry veteran reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.
Advice, Career, Data Science, Data Scientist, Hadoop, Paco Nathan, Recommendation, Visualization