- Data Labeling for Machine Learning: Market Overview, Approaches, and Tools - Dec 13, 2021.
So much of data science and machine learning is founded on having clean and well-understood data sources that it is unsurprising that the data labeling market is growing faster than ever. Here, we highlight many of the top players in this industry and the techniques they use to help you consider which might make a good partner for your needs.
Big Data, Crowdsourcing, Data Classification, Data Labeling, Data Mining, Data Platform
- Essential Features of An Efficient Data Integration Solution - Aug 24, 2021.
This blog highlights the essential features of a data integration solution that help an organization generate consistent and accurate data to keep the business running smoothly.
Big Data, Data Analytics, Data Integration, Data Processing
- Model Drift in Machine Learning – How To Handle It In Big Data - Aug 17, 2021.
Rendezvous Architecture helps you run and choose outputs from a Champion model and many Challenger models running in parallel without much overhead. The original approach works well for smaller data sets, so how can this idea adapt to big data pipelines?
Big Data, Data Engineering, Data Preparation, Machine Learning, Model Drift
- Querying the Most Granular Demographics Dataset - Aug 13, 2021.
Having access to broad and detailed population data can potentially offer enormous value to any organization looking to interact with specific demographics. However, access alone is not sufficient without being able to leverage advanced techniques to explore and visualize the data.
Big Data, Data Visualization, Geolocation, Neo4j, Open Source
- Data Monetization 101 - Jul 30, 2021.
The evolving marketplace of data now includes many firms that support a variety of needs from organizations looking to grow with data. This listing of the key players categorized by target market provides an interesting picture of this exciting industry sector.
Big Data, Business, Business Intelligence, Data Monetization, Monetizing
- Awesome list of datasets in 100+ categories - May 20, 2021.
With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.
Big Data, Data Science, Datasets
- Vaex: Pandas but 1000x faster - May 17, 2021.
If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.
Big Data, Data Preprocessing, Pandas, Scalability, Vaex
- ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation - Apr 15, 2021.
Today, organizations are increasingly implementing cloud ETL tools to handle large data sets. With data sets becoming larger by the day, unified ETL tools have become crucial for the data integration needs of enterprises.
Automation, Big Data, Big Data Analytics, Cloud, Data Analytics, Data Warehouse, ETL
- Are You Still Using Pandas to Process Big Data in 2021? Here are two better options - Mar 1, 2021.
When it's time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?
Big Data, Dask, Data Preparation, Pandas, Python, Vaex
- KDnuggets™ News 20:n41, Oct 28: Difference Between Junior and Senior Data Scientists; Ain’t No Such a Thing as a Citizen Data Scientist - Oct 28, 2020.
The unspoken difference between junior and senior data scientists; Ain't No Such a Thing as a Citizen Data Scientist; How to become a Data Scientist: a step-by-step guide; Good-bye Big Data. Hello, Massive Data!; DeepMind Relies on this Old Statistical Method to Build Fair Machine Learning Models
Big Data, Career Advice, Citizen Data Scientist, Computer Vision, Data Science, Data Scientist, DeepMind, Statistical Modeling
- Good-bye Big Data. Hello, Massive Data! - Oct 22, 2020.
Join the Massive Data Revolution with SQream. Shorten query times from days to hours or minutes, and speed up data preparation by analyzing the raw data directly.
Big Data, GPU, SQream
- Let’s Be Honest: We’re Drowning in Data - Sep 10, 2020.
The fields of Big Data, Data Analytics/Science, and Data Integration need to face a new truth: We are drowning in data, more and more so every second of every day.
Big Data, Data Analytics, Data Science
- Performance Testing on Big Data Applications - Aug 21, 2020.
You can use performance testing in any application you’re working on but it’s especially useful for big data applications. Let’s see why.
Applications, Big Data, Performance
- 10 Steps for Tackling Data Privacy and Security Laws in 2020 - Jul 22, 2020.
Data privacy laws, such as the CCPA, GDPR, and HIPAA, are here to stay and significantly impact everyone in the digital era. These steps will guide organizations to prepare for compliance and ensure they support the fundamental privacy rights of their customers and users.
Advice, Big Data, CCPA, GDPR, Privacy, Security
- New Poll: What was the largest dataset you analyzed / data mined? - Jun 9, 2020.
Take part in KDnuggets' latest survey to have your voice heard, and let the community know the largest dataset size you have worked with.
Big Data, Datasets, Largest, Poll
- 3 Key Data Science Questions to Ask Your Big Data - Jun 3, 2020.
The process of understanding your data begins by asking 3 questions at the highest level, and then iteratively asking hundreds of cascading questions to get deeper insights.
Big Data, Business, Customer Analytics, Data Science, Metrics
- Evidence Counterfactuals for explaining predictive models on Big Data - May 18, 2020.
Big Data generated by people -- such as social media posts, mobile phone GPS locations, and browsing history -- provides enormous prediction value for AI systems. However, explaining how these models predict with the data remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.
Big Data, Explainability, Predictive Modeling, Predictive Models, Statistics
- Why and How to Use Dask with Big Data - Apr 15, 2020.
The Pandas library for Python is a game-changer for data preparation. But when the data gets big, really big, your computer needs more help to efficiently handle all that data. Learn more about how to use Dask and follow a demo to scale up your Pandas to work with Big Data.
Big Data, Dask, Data Engineering
- The Data Science Puzzle — 2020 Edition - Feb 7, 2020.
The data science puzzle is once again re-examined through the relationship between several key concepts of the landscape, incorporating updates and observations since last time. Check out the results here.
AI, Big Data, Data Mining, Data Science, Deep Learning, Machine Learning
- 7 Resources to Becoming a Data Engineer - Jan 7, 2020.
An estimated 8,650% growth in the volume of data to 175 zettabytes from 2010 to 2025 has created an enormous need for Data Engineers to build an organization's big data platform to be fast, efficient and scalable.
Advice, Big Data, Cloud Computing, Data Engineering, Data Science, MOOC, SQL
- Alternative Cloud Hosted Data Science Environments - Dec 19, 2019.
Over the years, new alternative providers have risen to provide a solitary data science environment hosted on the cloud for data scientists to analyze, host and share their work.
Big Data, Cloud Computing, Data Science, Jupyter, Saturn Cloud
- How to Make an Agile Team Work for Big Data Analytics - Oct 31, 2019.
Learn how to approach the challenges when merging an agile methodology into a data science team to bring out the best value for your Big Data products.
Agile, Big Data, Big Data Analytics, Data Science Team
- Data Sources 101 - Oct 28, 2019.
Data collection is one of the first steps of the data lifecycle — you need to get all the data you require in the first place. To collect the right data, you need to know where to find it and determine the effort involved in collecting it. This article answers the most basic question: where does all the data you need (or might need) come from?
Big Data, Data Science, Datasets, Unstructured data
- The Hidden Risk of AI and Big Data - Sep 20, 2019.
With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?
AI, Big Data, Causation, Correlation, Overfitting, Risks
- How to count Big Data: Probabilistic data structures and algorithms - Aug 26, 2019.
Learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.
Algorithms, Big Data, Probability
- Automate Stacking In Python: How to Boost Your Performance While Saving Time - Aug 21, 2019.
Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.
Algorithms, Big Data, Data Science, Python
- An Overview of Python’s Datatable package - Aug 20, 2019.
Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.
Big Data, Data Science, Python
- Learn how to use PySpark in under 5 minutes (Installation + Tutorial) - Aug 13, 2019.
Apache Spark is one of the hottest and largest open source projects in data processing, a framework with rich high-level APIs for programming languages like Scala, Python, Java and R. It realizes the potential of bringing together both Big Data and machine learning.
Apache Spark, Big Data, Data Science, Python
- Here’s how you can accelerate your Data Science on GPU - Jul 30, 2019.
Data Scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with Numpy, you’ll need a powerful machine to get the job done in a reasonable amount of time.
Big Data, Data Science, DBSCAN, Deep Learning, GPU, NVIDIA, Python
- Easy, One-Click Jupyter Notebooks - Jul 24, 2019.
All of the setup for software, networking, security, and libraries is automatically taken care of by the Saturn Cloud system. Data Scientists can then focus on the actual Data Science and not the tedious infrastructure work that falls around it.
Big Data, Cloud, Data Science, Data Scientist, DevOps, Jupyter, Python, Saturn Cloud
- Big Data for Insurance - Jul 18, 2019.
The insurance industry has always been quite conservative; however, the adoption of new technologies is not just a modern trend but a necessity to maintain the competitive pace. In the modern digital era, Big Data technologies help to process vast amounts of information, increase workflow efficiency, and reduce operational costs. Learn more about the benefits of Big Data for insurance from our material.
Analytics, Big Data, Insurance, Predictive Analytics
- The Death of Big Data and the Emergence of the Multi-Cloud Era - Jul 11, 2019.
The Era of Big Data is coming to an end as the focus shifts from how we collect data to processing that data in real-time. Big Data is now a business asset supporting the next eras of multi-cloud support, machine learning, and real-time analytics.
Big Data, Cloudera, Hadoop, Multi-cloud, Realtime Analytics
- An Overview of Outlier Detection Methods from PyOD – Part 1 - Jun 27, 2019.
PyOD is an outlier detection package developed with a comprehensive API to support multiple techniques. This post will showcase Part 1 of an overview of techniques that can be used to analyze anomalies in data.
Algorithms, Big Data, Outliers, Python
- One Simple Trick for Speeding up your Python Code with Numpy - Jun 19, 2019.
Looping over Python arrays, lists, or dictionaries can be slow. Vectorized operations in Numpy, by contrast, are mapped to highly optimized C code, making them much faster than their standard Python counterparts.
Big Data, numpy, Python
- Scalable Python Code with Pandas UDFs: A Data Science Application - Jun 13, 2019.
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.
Apache Spark, Big Data, Pandas, Python
- Mongo DB Basics - Jun 5, 2019.
MongoDB is a document-oriented NoSQL database, unlike HBase, which is a wide-column store. The advantage of the document-oriented model over the relational one is that the columns can be changed as and when required for each case, as opposed to having the same column names for all the rows.
Big Data, Data Engineering, Data Science, MongoDB
- Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint - May 24, 2019.
Social media has been gold for studying the way people communicate and behave. In this article, I’ll show you the easiest way of analyzing tweets without the Twitter API, in a way that scales to Big Data.
Apache Spark, Big Data, Deep Learning, Machine Learning, NLP, Optimus, Python, Twint
- What’s Going to Happen this Year in the Data World - May 14, 2019.
"If we wish to foresee the future of mathematics, our proper course is to study the history and present condition of the science." Henri Poincaré.
Advice, AI, Big Data, Data Science, Deep Learning
- 2019 KDnuggets Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? - May 7, 2019.
Vote in KDnuggets 20th Annual Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? We will publish the anonymized data, results, and trends here.
Big Data, Data Mining Software, Data Science, Deep Learning, Machine Learning, Poll, Programming Languages
- 3 Big Problems with Big Data and How to Solve Them - Apr 18, 2019.
We discuss some of the negatives of using big data, including false equivalences and bias, vulnerability to security breaches, protecting against unauthorized access and the lack of international standards for data privacy regulations.
Advice, Bias, Big Data, Privacy, Security
- Best Data Visualization Techniques for small and large data - Apr 17, 2019.
Data visualization is used in many areas to model complex events and visualize phenomena that cannot be observed directly, such as weather patterns, medical conditions or mathematical relationships. Here we review basic data visualization tools and techniques.
Big Data, Charts, Data Visualization, Histogram, Sciforce
- 7 Qualities Your Big Data Visualization Tools Absolutely Must Have and 10 Tools That Have Them - Apr 2, 2019.
Without the right visualization tools, raw data is of little use. Data visualization helps present the data in an interactive visual format. Here are the qualities to look for in a data visualization tool.
Big Data, Data Visualization, Domo, Plotly, Power BI, QlikView, Sisense, Tableau
- How to Capture Data to Make Business Impact - Mar 21, 2019.
We take a look at the formula for calculating the efficiency of a data capturing method, before going on to explain the concept of Smart Data.
Analytics, Big Data, Data Science, ROI, Smart Data
- Top Active Blogs on AI, Analytics, Big Data, Data Science, Machine Learning – updated - Jan 14, 2019.
Stay up-to-date with the latest technological advancements using our extensive list of active blogs; this is a list of 100 recently active blogs on Big Data, Data Science, Data Mining, Machine Learning, and Artificial Intelligence.
AI, Analytics, Big Data, Blogs, Data Mining, Data Science, Data Visualization, Machine Learning
- 4 Myths of Big Data and 4 Ways to Improve with Deep Data - Jan 9, 2019.
There is a fundamental misconception that bigger data produces better machine learning results. However, bigger data lakes / warehouses won’t necessarily help to discover more profound insights. It is better to focus on data quality, value and diversity, not just size. "Deep Data" is better than Big Data.
Big Data, Data Lakes, Data Warehouse, Hype, Machine Learning, Sampling
- 10 More Must-See Free Courses for Machine Learning and Data Science - Dec 20, 2018.
Have a look at this follow-up collection of free machine learning and data science courses to give you some winter study ideas.
AI, Algorithms, Big Data, Data Science, Deep Learning, Machine Learning, MIT, NLP, Reinforcement Learning, U. of Washington, UC Berkeley, Yandex
- Best Machine Learning Languages, Data Visualization Tools, DL Frameworks, and Big Data Tools - Dec 3, 2018.
We cover a variety of topics, from machine learning to deep learning, from data visualization to data tools, with comments and explanations from experts in the relevant fields.
Big Data, Data Visualization, Deep Learning, Jupyter, Machine Learning, Python, R, Tableau
- Top 5 domains Big Data analytics helps to transform - Nov 23, 2018.
Big data analytics gives a competitive advantage to companies across many industries, especially financial services, e-commerce, aviation, transportation, logistics, and energy. It enables them to reduce downtime, mitigate risks, cut costs, and improve performance.
Aviation, Big Data, Big Data Analytics, Credit Risk, Data Analytics, Ecommerce, Finance, Security
- The Big Data Game Board™ - Nov 19, 2018.
Move aside “Monopoly,” “Risk,” and “Snail Race!” Time to teach the youth of the world an important, career-advancing game: how to leverage data and analytics to change your life! Introducing the “Big Data Game Board™”!
Big Data, Data Science, Games
- Data Science “Paint by the Numbers” with the Hypothesis Development Canvas - Nov 2, 2018.
Now you are ready to take the next step from a Big Data MBA perspective by building off of the Business Model Canvas to flesh out the business use cases – or hypotheses – which is where we can become more effective at leveraging data and analytics to optimize the business.
Big Data, Business, Data Science
- Cartoon: Halloween Costume for Big Data. - Oct 31, 2018.
We revisit the KDnuggets cartoon looking at the appropriate Halloween costume for Big Data and its companion, No Privacy.
Big Data, Cartoon, Halloween, Privacy
- BIG, small or Right Data: Which is the proper focus? - Oct 8, 2018.
For most businesses, having and using big data is either impossible, impractical, costly to justify, or difficult to outsource due to the excess demand for qualified resources. So, what are the benefits of using small data?
Big Data, Big Data Analytics, Data Analytics, Small Data
- Things you should know when traveling via the Big Data Engineering hype-train - Oct 8, 2018.
Maybe you want to join the Big Data world? Or maybe you are already there and want to validate your knowledge? Or maybe you just want to know what Big Data Engineers do and what skills they use? If so, you may find the following article quite useful.
Big Data, Big Data Hype, Data Engineering, Hype
- Hadoop for Beginners - Sep 12, 2018.
An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.
Beginners, Big Data, Hadoop
- Interpreting a data set, beginning to end - Aug 20, 2018.
Detailed knowledge of your data is key to understanding it! We review several important methods that help to understand the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
Analytics, Big Data, Data Science, Data Visualization, Machine Learning, SAS, Statistics, t-SNE
- Big Data a $4.7 Billion opportunity in the healthcare and pharmaceutical industry - Jul 31, 2018.
This post contains some of the key findings from SNS Telecom & IT's latest report, which indicates that Big Data investments in the healthcare and pharmaceutical industry are expected to reach nearly $4.7 Billion by the end of 2018.
Big Data, Healthcare, Pharma
- 5 reasons data analytics are falling short - Jul 30, 2018.
When it comes to big data, possession is not enough. Comprehensive intelligence is the key. But traditional data analytics paradigms simply cannot deliver on the promise of data-driven insights. Here’s why.
Big Data, Data Analytics, Failure, SQream
- The What, Where and How of Data for Data Science - Jun 12, 2018.
Here we will take data science apart and build it back up to a coherent and manageable concept. Bear with us!
Big Data, Data Science
- Event Processing: Three Important Open Problems - May 28, 2018.
This article summarizes the three most important problems to be solved in event processing. The facts in this article are supported by a recent survey and an analysis conducted on the industry trends.
Big Data, Data Analytics, Insights, Real-time, SQL, Streaming Analytics
- YouTube videos on database management, SQL, Datawarehousing, Business Intelligence, OLAP, Big Data, NoSQL databases, data quality, data governance and Analytics – free - May 18, 2018.
Watch over 20 hours of YouTube videos on databases and database design, Physical Data Storage, Transaction Management and Database Access, and Data Warehousing, Data Governance and (Big) Data Analytics - all free.
Analytics, Bart Baesens, Big Data, Business Intelligence, Data Governance, Data Quality, Data Warehousing, Databases, NoSQL, SQL, Youtube
- The Executive Guide to Data Science and Machine Learning - May 10, 2018.
This article provides a short introductory guide for executives curious about data science or commonly used terms they may encounter when working with their data team. It may also be of interest to other business professionals who are collaborating with data teams or trying to learn data science within their unit.
Big Data, Business, Data Science, Machine Learning
- Presto for Data Scientists – SQL on anything - Apr 19, 2018.
Presto enables data scientists to run interactive SQL across multiple data sources. This open source engine supports querying anything, anywhere, and at large scale.
Big Data, Database, Presto, SQL
- 5 Things You Need to Know about Big Data - Mar 16, 2018.
We take a look at five things you need to know about Big Data.
3Vs of Big Data, Big Data, Careers, Education, Industry
- 18 Inspiring Women In AI, Big Data, Data Science, Machine Learning - Mar 8, 2018.
For International Women's Day 2018, we profile 18 inspiring women who lead the field in AI, Analytics, Big Data, Data Science, and Machine Learning.
AI, Big Data, Carla Gentry, Data Science, Fei-Fei Li, Hilary Mason, Jill Dyche, Meta Brown, Monica Rogati, Women
- Resurgence of AI During 1983-2010 - Feb 16, 2018.
We discuss supervised learning, unsupervised learning and reinforcement learning, neural networks, and 6 reasons that helped AI Research and Development to move ahead.
AI, Big Data, History, Machine Learning, Neural Networks, Reinforcement Learning, Trends
- Upcoming Meetings in AI, Analytics, Big Data, Data Science, Deep Learning, Machine Learning: February and Beyond - Feb 2, 2018.
Coming soon: TDWI Las Vegas, BI + Analytics Huntington Beach, Strata San Jose, IBM Think Las Vegas, Big Data & Analytics Singapore, KNIME Berlin, Nvidia GPU, and more.
AI, Analytics, Big Data, Las Vegas, London, Meetings
- Exclusive Interview: Doug Laney on Big Data and Infonomics - Jan 25, 2018.
We discuss the 3Vs of Big Data; Infonomics and many aspects of monetizing information, including promising analytics methods, successful companies, and main challenges; information marketplaces and why the data ownership concept is misguided; and more.
3Vs of Big Data, Big Data, Doug Laney, Infonomics, Marketplace, Privacy
- Four Big Data Trends for 2018 - Jan 25, 2018.
Curious about the future of Big Data and AI? Here’s what the trends have in store for innovation in 2018.
2018 Predictions, AI, Big Data, Chatbot, Explainable AI, IoT, Trends
- Supercharging Visualization with Apache Arrow - Jan 5, 2018.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
Apache Arrow, Big Data, Data Analytics, Data Visualization, Dremio, GPU, Graphistry, Open Source
- How Nonprofits Can Benefit from the Power of Data Science - Jan 3, 2018.
Nonprofits can use analytics to boost their fundraising efforts, measure and monitor the impact of their activities, build predictive models, optimize allocation of funds, and more.
Big Data, Data Science, Social Good
- Simple Ways Of Working With Medium To Big Data Locally - Dec 27, 2017.
An overview of the installation and implementation of simple techniques for working with large datasets on your machine.
Big Data, iPhone, Python, R, SAS
- 70 Amazing Free Data Sources You Should Know - Dec 20, 2017.
70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more to start working on your data projects.
Big Data, Business, Crime, Datasets, Finance, Government, Health, Journalism, Octoparse, Social Media
- Big Data: Main Developments in 2017 and Key Trends in 2018 - Dec 5, 2017.
As we bid farewell to one year and look to ring in another, KDnuggets has solicited opinions from numerous Big Data experts as to the most important developments of 2017 and their 2018 key trend predictions.
2018 Predictions, Big Data, Bill Inmon, Bill Schmarzo, Doug Laney, James Kobielus, Matei Zaharia, Meta Brown, Predictions, Ronald van Loon, Trends, Yves Mulkers
- Graph Analytics Using Big Data - Dec 4, 2017.
An overview and a small tutorial showing how to analyze a dataset using Apache Spark, graphframes, and Java.
Apache Spark, Big Data, Graph Analytics, India, Java
- PySpark SQL Cheat Sheet: Big Data in Python - Nov 16, 2017.
PySpark is a Spark Python API that exposes the Spark programming model to Python. With it, you can speed up analytic applications. With Spark, you can get started with big data processing, as it has built-in modules for streaming, SQL, machine learning and graph processing.
Apache Spark, Big Data, DataCamp, Python, SQL
- Updates & Upserts in Hadoop Ecosystem with Apache Kudu - Oct 27, 2017.
A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
Apache, Big Data, Data Management, Hadoop, Java, NoSQL
- Introduction to Blockchains & What It Means to Big Data - Sep 27, 2017.
Perhaps the most significant development in IT over the past few years, blockchain has the potential to change the way that the world approaches big data, with enhanced security and data quality.
Big Data, Big Data Analytics, Bitcoin, Blockchain, Monetizing
- Top 10 Active Big Data, Data Science, Machine Learning Influencers on LinkedIn, Updated - Sep 26, 2017.
Looking for advice? Guidance? Stories? We’ve put together a list of the top ten LinkedIn influencers of the last three months. Follow them and stay up-to-date with the latest news in Big Data, Data Science, Analytics, Machine Learning and AI.
About Gregory Piatetsky, Bernard Marr, Big Data, Carla Gentry, Data Science, DJ Patil, Influencers, Kirk D. Borne, LinkedIn, Machine Learning, Tom Davenport, Trends
- Big Data Architecture: A Complete and Detailed Overview - Sep 19, 2017.
Data scientists may not be as educated or experienced in computer science, programming concepts, devops, site reliability engineering, non-functional requirements, software solution infrastructure, or general software architecture as compared to well-trained or experienced software architects and engineers.
Analytics, Big Data, Big Data Architecture, Cloud, Cloud Computing, Scalability, Software, Software Engineering
- The Rise of GPU Databases - Aug 17, 2017.
The recent but noticeable shift from CPUs to GPUs is mainly due to the unique benefits they bring to sectors like AdTech, finance, telco, retail, or security/IT. We examine where GPU databases shine.
Big Data, Database, GPU, Predictive Analytics, SQL, SQream
- Global Big Data Conference, Santa Clara, Aug 29-31 – KDnuggets Offer - Aug 14, 2017.
Global Big Data Conference, a leading vendor-agnostic conference for the Big Data community, will hold its 5th conference in Santa Clara. Use code KDnuggets to save.
Big Data, CA, Finance, Global Big Data Conference, Industry, Santa Clara
- Why Apache Arrow is the future for open source-columnar memory analytics - Aug 7, 2017.
Apache Arrow is a de facto standard for columnar in-memory analytics. In the coming years, we can expect all the big data platforms to adopt Apache Arrow as their columnar in-memory layer.
Analytics, Apache, Apache Arrow, Big Data, In-Memory Computing, Open Source
- Machine Learning Applied to Big Data, Explained - Jul 17, 2017.
Machine learning with Big Data is, in many ways, different than "regular" machine learning. This informative image is helpful in identifying the steps in machine learning with Big Data, and how they fit together into a process of their own.
Big Data, Explained, Machine Learning, Rubens Zimbres
- Marketing Analytics for Data Rich Environments - Jul 14, 2017.
A lot is changing in the world of marketing analytics. Marketing scientist Kevin Gray asks Professor Michel Wedel, a leading authority on this topic from the Robert H. Smith School of Business at the University of Maryland, what marketing researchers and data scientists most need to know about it.
Analytics, Big Data, Marketing Analytics
- Apache Flink: The Next Distributed Data Processing Revolution? - Jul 5, 2017.
Will Apache Flink displace Apache Spark as the new champion of Big Data Processing? We compare Spark and Apache Flink performance for batch processing and stream processing.
Apache Spark, Big Data, Flink, Streaming Analytics
- How HR Managers Use Data Science to Manage Talent for Their Companies - Jun 7, 2017.
Data science can also be used by HR managers to create several estimates, such as investment in the talent pool, cost per hire, cost of training, and cost per employee. It provides better techniques for optimization, forecasting, and reporting.
Big Data, Data Science, Decision Management, HR
- Must-Know: What are common data quality issues for Big Data and how to handle them? - May 16, 2017.
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
3Vs of Big Data, Big Data, Data Quality, Interview Questions
- HDFS vs. HBase : All you need to know - May 15, 2017.
Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase, with real-life use cases where each is the best fit.
Big Data, Hadoop, HBase, HDFS
- Machine Learning overtaking Big Data? - May 4, 2017.
Is Machine Learning overtaking Big Data? We also examine trends for several more related and popular buzzwords, and see how BD, ML, Artificial Intelligence, Data Science, and Deep Learning rank.
Big Data, Big Data Hype, Gartner, Google Trends, Machine Learning
- Did you know cavemen were already dealing with “Big Data” issues? - May 3, 2017.
We think of Big Data and Analytics as new, cutting-edge technologies, but humans actually started using data and analytics techniques 5000 years ago. Let’s take a look.
Big Data, Big Data Analytics, Data Analysis, Data Science, History
- Deep Learning – Past, Present, and Future - May 2, 2017.
There is a lot of buzz around deep learning technology. First developed in the 1940s, deep learning was meant to simulate neural networks found in brains, but in the last decade 3 key developments have unleashed its potential.
Andrew Ng, Big Data, Deep Learning, Geoff Hinton, Google, GPU, History, Neural Networks, NVIDIA
- What Do Frameworks Offer Data Scientists that Programming Languages Lack? - May 2, 2017.
While programming languages will never be completely obsolete, a growing number of programmers (and data scientists) prefer working with frameworks and view them as the more modern and cutting-edge option for a number of reasons.
Big Data, Data Science, Programming Languages
- Difference Between Big Data and Internet of Things - Apr 21, 2017.
If you cannot manage real-time streaming data and make real-time analytics and real-time decisions at the edge, then you are not doing IOT or IOT analytics, in my humble opinion. So what is required to support these IOT data management and analytic requirements?
Big Data, Internet of Things, IoT
- How Big Data Helps Today’s Airlines Operate - Apr 19, 2017.
Companies all over the world have placed a lot of value on getting more insights from big data analytics. That’s not without good reason.
Airlines, Big Data
- 17 More Must-Know Data Science Interview Questions and Answers, Part 3 - Mar 15, 2017.
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
3Vs of Big Data, A/B Testing, Big Data, Data Quality, Data Science, Data Visualization, Influencers, Interview Questions, Statistics, Twitter
- Big Data Desperately Needs Transparency - Mar 6, 2017.
If Big Data is to realize its potential, people need to understand what it is capable of, what information is out there and where every piece of data comes from. Without such transparency and understanding, it will be difficult to persuade people to rely on the findings.
Big Data, Interpretability, Transparency, Trust
- The Origins of Big Data - Feb 21, 2017.
Big Data truly came of age in 2013, when the OED introduced the term “Big Data” for the first time. But when was the term first used, and why? Here are the results of our investigation.
Big Data, Doug Laney, History, Tim O'Reilly
- So What is Big Data? - Feb 9, 2017.
We examine what experts say about Big Data – is it like teenage sex? Is it more than just a large and complex collection of data? And how many Vs are there?
3Vs of Big Data, Big Data, Forrester, Gartner, IBM, McKinsey, O'Reilly
- 5 Career Paths in Big Data and Data Science, Explained - Feb 6, 2017.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
Big Data, Career, Data Analyst, Data Engineering, Data Infrastructure, Data Science, Explained, Machine Learning
- The Data Science Puzzle, Revisited - Jan 20, 2017.
The data science puzzle is re-examined through the relationship between several key concepts in the realm, and incorporates important updates and observations from the past year. The result is a modified explanatory graphic and rationale.
AI, Big Data, Data Mining, Data Science, Deep Learning, Machine Learning
- The big data ecosystem for science: X-ray crystallography - Jan 19, 2017.
Diffract-and-destroy experiments to accurately determine three-dimensional structures of nano-scale systems can produce 150 TB of data per sample. We review how such Big Data is processed.
Big Data, Science, Strata, X-ray crystallography
- More Data or Better Algorithms: The Sweet Spot - Jan 17, 2017.
We examine the sweet spot for data-driven Machine Learning companies, where it is neither too easy nor too hard to collect the needed data.
Algorithms, Big Data, Data, Datasets, Machine Learning
- 90 Active Blogs on Analytics, Big Data, Data Mining, Data Science, Machine Learning (updated) - Jan 17, 2017.
Stay up to date in data science with active blogs. This is a list of 90 recently active blogs on Big Data, Data Science, Data Mining, Machine Learning, and Artificial Intelligence.
Big Data, Blogs, Data Mining, Data Science, Machine Learning
- A Funny Look at Big Data and Data Science - Dec 27, 2016.
A less-than-serious look at Big Data and Data Science. If you can laugh at all the cartoons, then your Data Science skills are in good shape.
Big Data, Cartoon, Humor, SQL
- The big data ecosystem for science: Climate Science and Climate Change - Dec 22, 2016.
Climate change is one of the most pressing challenges for human society in the 21st century. We review the Big Data ecosystem for studying climate change.
Big Data, Climate Change, Science, Strata
- Smart Data Platform – The Future of Big Data Technology - Dec 2, 2016.
Data processing and analytical modelling are major bottlenecks in today’s big data world because human intelligence is needed to decide the relationships between data, the required data engineering tasks, and the analytical models and their parameters. This article discusses how a Smart Data Platform can help solve such problems.
Big Data, Big Data Analytics, China, Data Processing, Modeling, TalkingData
- Top Reasons Why Big Data, Data Science, Analytics Initiatives Fail - Dec 1, 2016.
We examine the main reasons for failure in Big Data, Data Science, and Analytics projects which include lack of clear mandate, resistance to change, and not asking the right questions, and what can be done to address these problems.
Big Data, Data Science, Failure, Project Fail
- Top 10 Facebook Groups for Big Data, Data Science, and Machine Learning - Nov 23, 2016.
Social media is no longer just for sharing friendship connections or “selfies”; it now spreads everything from political news to scientific information. Social network members are increasingly eager to learn about big data, data science, and machine learning through groups. We review the ten largest Facebook groups in this area.
Big Data, Data Science, Facebook, Machine Learning
- Cartoon: Thanksgiving, Big Data, and Turkey Data Science. - Nov 23, 2016.
We revisit the KDnuggets Thanksgiving cartoon, which examines the predicament of one group of fowl Data Scientists.
Big Data, Cartoon, Thanksgiving
- Data Science and Big Data, Explained - Nov 14, 2016.
This article is meant to give the non-data scientist a solid overview of the many concepts and terms behind data science and big data. While related terms will be mentioned at a very high level, the reader is encouraged to explore the references and other resources for additional detail.
Beginners, Big Data, Data Science, Explained
- Practical Data Science: Building Minimum Viable Models - Nov 8, 2016.
Data science for data-based startups: the Minimum Valuable Model (MVM), a new concept for avoiding a full-scale, 95% accurate data science model. Want to know more about MVM? Have a look at this interesting article.
Big Data, Data Science, Startups
- Largest Dataset Analyzed Poll shows surprising stability, more junior Data Scientists - Nov 8, 2016.
The majority (57%) of respondents worked only with gigabyte-range data. More junior Data Scientists are entering the market, but Petabyte Big Data Scientists still stand apart.
Asia, Big Data, Datasets, Europe, Largest, Poll, USA
- Evaluating HTAP Databases for Machine Learning Applications - Nov 2, 2016.
Businesses are producing a greater number of intelligent applications, which traditional databases are unable to support. A new class of databases, Hybrid Transactional and Analytical Processing (HTAP) databases, offers a variety of capabilities with specific strengths and weaknesses to consider. This article aims to give application developers and data scientists a better understanding of the HTAP database ecosystem so they can make the right choice for their intelligent application.
Big Data, Data Processing, HTAP, Oracle, SAP, Splice Machine, SQL
- Cartoon: Scary Big Data - Oct 29, 2016.
What do Halloween and Big Data have in common? Both can be scary, as the new KDnuggets cartoon shows.
Big Data, Cartoon, Halloween, Healthcare, Privacy
- Learn Data Science in 8 (Easy) Steps - Oct 27, 2016.
Want to learn data science? Check out these 8 (easy) steps to set out in the right direction!
Big Data, Data Science, DataCamp, Machine Learning
- Big Data Science: Expectation vs. Reality - Oct 27, 2016.
The path to success and happiness for a data science team working on a big data project is not always clear from the beginning. It depends on the maturity of the underlying platform, the team's cross skills, and the DevOps process around its day-to-day operations.
Big Data, Big Data Engineer, Data Science, Data Science Team, DevOps
- Top 12 Interesting Careers to Explore in Big Data - Oct 12, 2016.
From data-driven strategies to decision making, the true worth of Big Data has been realized, and it has opened up amazing career choices. Check out these 12 interesting careers to explore in Big Data.
Analyst, Big Data, Big Data Engineer, Business Analytics, Data Science, Data Scientist, Machine Learning Scientist, Simplilearn, Statistician
- Here’s How IT Departments are Using Big Data - Oct 10, 2016.
The use cases for big data are clear when it comes to areas like marketing, healthcare, and retail, but IT’s use of big data is a little less clear. Recently, however, some IT departments have been finding ways to use big data to improve their own operations along with those of the entire organization.
Big Data, Business
- Top 16 Active Big Data, Data Science Leaders on LinkedIn - Sep 23, 2016.
Who are the most active Big Data, Data Science Influencers and Leaders on LinkedIn? We analyze the data and bring you the list of key people to follow.
About Gregory Piatetsky, Bernard Marr, Big Data, Big Data Influencers, Carla Gentry, Data Science, DJ Patil, Influencers, LinkedIn, Tom Davenport
- The top 5 Big Data courses to help you break into the industry - Aug 25, 2016.
Here is an updated and in-depth review of the top 5 providers of Big Data and Data Science courses: Simplilearn, Cloudera, Big Data University, Hortonworks, and Coursera.
Big Data, Cloudera, Coursera, Data Science Education, Hortonworks, Online Education, Simplilearn
- 5 EBooks to Read Before Getting into A Data Science or Big Data Career - Aug 11, 2016.
A short, carefully curated list of 5 free ebooks to help you better understand what Data Science is all about and how you can best prepare for a career in data science, big data, and data analysis.
Big Data, Free ebook, Hadoop, Programming Languages, Simplilearn, Tableau
- Big Data Key Terms, Explained - Aug 11, 2016.
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.
3Vs of Big Data, Apache Spark, Big Data, Business Intelligence, Cloud Computing, Data Warehouse, Explained, Hadoop, Key Terms, Predictive Analytics
- Why Big Data is in Trouble: They Forgot About Applied Statistics - Jul 18, 2016.
This "classic" (but very topical and certainly relevant) post discusses issues that Big Data can face when it forgets, or ignores, applied statistics. As great a discussion today as it was 2 years ago.
Applied Statistics, Big Data, Google, Statistics
- 10 Algorithm Categories for AI, Big Data, and Data Science - Jul 14, 2016.
With a focus on leveraging algorithms and balancing human and AI capital, here are the top 10 algorithm categories used to implement A.I., Big Data, and Data Science.
AI, Algorithms, Big Data, Data Science
- Big Data, Bible Codes, and Bonferroni - Jul 8, 2016.
This discussion focuses on 2 particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data, with real-world examples emphasizing the importance of statistical processes in practice.
Bible, Big Data, Bonferroni, Probability, Statistics, Terrorism
- 3 Key Ethics Principles for Big Data and Data Science - Jul 6, 2016.
If ethics in general are important, should ethics training be a crucial element of the data science field?
Big Data, Data Science, Ethics, Hui Xiong
- The Big Data Ecosystem is Too Damn Big - Jun 28, 2016.
The Big Data ecosystem is just too damn big! It's complex, redundant, and confusing. There are too many layers in the technology stack, too many standards, and too many engines. Vendors? Too many. What is the user to do?
Analytics, Big Data, Business Analytics
- 5 Best Practices for Big Data Security - Jun 9, 2016.
Lack of data security can not only result in financial losses, but may also damage the reputation of organizations. Take a look at some of the most important data security best practices that can reduce the risks associated with analyzing a massive amount of data.
Best Practices, Big Data, Security
- Data Science of Variable Selection: A Review - Jun 7, 2016.
There are as many approaches to selecting features as there are statisticians since every statistician and their sibling has a POV or a paper on the subject. This is an overview of some of these approaches.
Algorithms, Big Data, Feature Selection, Statistics
- Big Data Business Model Maturity Index and the Internet of Things (IoT) - Jun 7, 2016.
This post explores how organizations could use the Big Data Business Model Maturity Index (BDBMMI) to exploit the Internet of Things (IoT).
Big Data, Internet of Things, IoT, Maturity Model