- What Comes After HDF5? Seeking a Data Storage Format for Deep Learning - Nov 9, 2021.
HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.
Data Management, Deep Learning, Python
- CSV Files for Storage? No Thanks. There’s a Better Option - Aug 31, 2021.
Saving data to CSVs is costing you both money and disk space. It’s time to end it.
Data Management, Pandas, Parquet, Python
- The Best Tool for Data Blending is KNIME - Jan 13, 2021.
These are the lessons and best practices I learned over many years of experience in data blending, along with the software that became my most important tool in my day-to-day work.
Data Exploration, Data Management, ETL, Knime
- How A Single Source of Truth Can Benefit Your Organization - Aug 7, 2020.
A single source of truth provides stakeholders with a clear picture of the enterprise assets and the potential complications that can disrupt the data strategy. Find out how you can implement this single source of truth in your enterprise ecosystem.
Business Intelligence, Data Management, Data Quality, Decision Making
- A Holistic Framework for Managing Data Analytics Projects - May 22, 2020.
Agile project management for Data Science development continues to be an effective framework that enables flexibility and productivity in a field that can experience continuous changes in data and evolving stakeholder expectations. Learn more about the leading approaches for developing Data Science models, and apply them to your next project.
Agile, CRISP-DM, Data Analytics, Data Management, Data Mining, Decision Management, Development, Software Engineering
- The Benefits & Examples of Using Apache Spark with PySpark - Apr 21, 2020.
Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.
Apache Spark, Data Management, Python, SQL
- How Bad Data is Affecting Your Organization’s Operational Efficiency - Mar 5, 2020.
Despite recognizing the importance of data quality, many companies still fail to implement a data quality framework that could protect them from making costly mistakes. Poor data does not just cause revenue loss; it can also cost your company employees, customers and reputation!
Business, Data Management, Data Operations, Data Quality, Efficiency
- Everything a Data Scientist Should Know About Data Management - Oct 22, 2019.
For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.
Data Management, Data Scientist, Hadoop
- How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions - Aug 22, 2019.
As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast-growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.
AirBnB, Data Management, LinkedIn, Machine Learning, Netflix, Uber
- Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data? - Aug 19, 2019.
What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out which approach best fits your organization’s needs and the factors that influence it.
Advice, Data Integration, Data Management, Data Science, Data Science Platform, ETL
- Updates & Upserts in Hadoop Ecosystem with Apache Kudu - Oct 27, 2017.
A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
Apache, Big Data, Data Management, Hadoop, Java, NoSQL
- Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains - May 18, 2017.
Moving to Hadoop is not without its challenges—there are so many options, from tools to approaches, that can have a significant impact on the future success of a business’ strategy. Data management and data pipelining can be particularly difficult.
Data Management, Data Platform, Hadoop, SVDS
- The dynamics between AI and IoT - Apr 18, 2017.
We see the need for a new type of engineer who combines knowledge of electronics and IoT with machine learning, AI, robotics, cloud and data management (DevOps).
AI, Cloud Computing, Data Management, DevOps, Engineer, IoT, Robots
- How To Stay Competitive In Machine Learning Business - Jan 4, 2017.
To stay competitive in the machine learning business, you have to be superior to your rivals, not the best possible, says one of the leading machine learning experts. This article lays out simple rules to make that happen.
Business, Business Analytics, Data Management, Machine Learning, Research
- 5 Ways in Which Big Data Can Help Leverage Customer Data - May 25, 2016.
Every business enterprise realizes the importance of big data but rarely puts the customer data it possesses to good use. Here are a few ways enterprises can leverage customer data.
Analytics, Big Data, Data Management, Data Mining
- Data Science Data Architecture - Sep 10, 2015.
Data scientists are a rare breed who juggle data science, business and IT. However, they understand less IT than an IT person and less business than a business person, which demands a specific workflow and data architecture.
Big Data Architecture, Data Management, Data Science, Olav Laudy
- Data Hierarchy of Needs - Aug 28, 2015.
The Data Hierarchy of Needs helps clarify the steps in big data processing. Before moving on to advanced data modeling (the top of the pyramid), organizations need to fill the large gaps they frequently have at its base, such as the lack of a reliable, complete data flow.
Data Management, Data-Driven Business, Yanir Seroussi