KDnuggets 30th Anniversary Interview with Founder Gregory Piatetsky-Shapiro

Gregory Piatetsky-Shapiro founded KDnuggets 30 years ago, after organizing early workshops on knowledge discovery. In this retrospective interview, he reflects on KDnuggets' growth, key innovations like deep learning, and concerns about AI's societal impact.




 

Happy anniversary KDnuggets!

This website — the very one you are reading right now — started life 30 years ago as a modest newsletter, and has since morphed into one of the oldest and longest-enduring data science resources available today. We are celebrating this achievement all month long, starting rather appropriately by sharing our recent discussion with KDnuggets founder Gregory Piatetsky-Shapiro.

Gregory is the mastermind behind KDnuggets, and ran the site for 28+ years, up until very recently. Known for coining the term "knowledge discovery in databases" and founding the KDD conference series, Gregory started the Knowledge Discovery Nuggets (KDnuggets) newsletter in 1993 to connect researchers in the fields of data mining and knowledge discovery. Until his retirement in 2022, KDnuggets grew into an influential publication in data science, machine learning, AI and analytics under Gregory's stewardship.

Though he is enjoying his hard-earned retirement, we managed to coax him back into the fray for a wide-ranging discussion on KDnuggets' history, its current state, the future, and even some reminiscing.

 
Questions for this interview were posed by KDnuggets editors Matthew Mayo, Abid Ali Awan, and Nisha Arya. The editor posing each question is noted along the way.

 
KDnuggets: Happy 30th anniversary, Gregory! For the few people out there who may not know who you are, can you give us the 30,000 foot abridged version? (asked by Matthew)

Gregory: Matt, thank you, and it is a pleasure to work with you and write for KDnuggets again!

I am probably most known as the founder of KDnuggets — this publication — and a co-founder of KDD Conferences, a leading conference in data science and data mining. I started my scientific career as a researcher in AI and Databases; my Ph.D. thesis in 1984 was on the topic of self-organizing database systems. I then worked for a dozen years at GTE Laboratories in the Boston area, doing research, and building applied systems at the intersection of AI and databases. In 1989 I started the first project in the world called "Knowledge Discovery in Databases". Our project produced interesting applications to healthcare (KEFIR system), fraud detection, churn (customer attrition) prediction, and other areas.

In 1997 the dot-com boom was in its early stages and I left GTE to join a startup that was applying data mining to the financial area. We worked with some of the largest banks and insurance companies in the world, developing models for customer segmentation, attrition, cross-sell, and so on. In 2000 the first startup was bought by a larger startup for $50 million, but before any of us could cash our stock options, the dot-com bubble burst and the second startup went out of business. The value of all the hard-earned stock options was zero.

 

Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" for the first workshop on the same topic (KDD-1989) and this term became more popular in the AI and machine learning communities. However, the term data mining became more popular in the business and press communities. Currently, the terms data mining and knowledge discovery are used interchangeably.
"Data mining" Wikipedia entry

 

So, in 2001 I decided to go on my own, publishing KDnuggets and doing consulting.

I have done a large variety of interesting consulting projects, from searching for biomarkers for Alzheimer's to detecting counterfeit jewelry on eBay to analyzing software usage. But as KDnuggets became more popular it demanded more time, so I stopped consulting and focused on KDnuggets full time.

With data science and machine learning becoming hot fields around 2012 (as evidenced by the article, among many, titled "Data Scientist: The Sexiest Job of the 21st Century"), KDnuggets grew significantly and achieved wide recognition in the industry. KDnuggets was frequently named among the top publications in AI, big data, data science, and machine learning.

I was very honored to be named a LinkedIn Top Voice in data science and analytics in 2018.

Of course, whatever success with KDnuggets I have achieved is shared with many other people who helped me and worked with me along the way. I cannot name all, but I want to mention especially Chris Matheus and Michael Beddows who worked with me at GTE on the early KDnuggets website; Usama Fayyad, Sam Uthurusamy, and Won Kim with whom I worked on KDD conferences and organization; and Anmol Rajpurohit for helping with KDnuggets in 2013-15.

Finally, and most importantly, I want to thank Matthew Mayo, who joined the KDnuggets team in 2016, helped KDnuggets reach its current success, and took over when I retired in 2022.

 

From LinkedIn Top Voices 2018: Data Science & Analytics

 

Can you tell us about the inspiration behind starting your publication? (Nisha)

In 1989 I organized the first workshop on Knowledge Discovery in Databases at IJCAI-89. That workshop was repeated in 1991 and 1993, and in July of 1993, to connect researchers working in this area, I started a newsletter which I then called Knowledge Discovery Nuggets. I used the term "knowledge discovery" because the term "data mining" used at that time seemed imprecise — it was not clear what we were mining for. "Nuggets" because we published mainly short but relevant and interesting items. Think "gold nuggets" found in the ore of data.

The workshop became the KDD-95 conference in 1995 (ably organized by Usama Fayyad and Sam Uthurusamy), and KDD conferences have been going strong ever since as the premier data science conference in the world. I served as chair of the ACM KDD organization from 2005 to 2009 and on the KDD executive committee until 2013.

The very first issue of KDnuggets was sent to about 50 researchers who attended the KDD-93 workshop. The amount of information in this area was growing, and as the workshop organizer I was well-positioned to assemble and organize it. In 1994, soon after the appearance of the World Wide Web, we started what was then the second site in the world on data mining and knowledge discovery. It was called "Knowledge Discovery Mine", but it resided on the GTE Labs domain and is no longer available.

When I left GTE Labs in 1997, I copied the information to a new website called KDnuggets, an abbreviation for Knowledge Discovery Nuggets. This website still exists today... and you are reading it!!!

 
Do you feel you have achieved your goal with KDnuggets? (Nisha)

The goal is the journey!

But KDnuggets' success and longevity have far exceeded my expectations.

My initial goal in creating the KDnuggets newsletter was to connect researchers working in this area more frequently than at a once-a-year workshop. My goal for the first KDnuggets-connected website, created in 1994 at GTE Labs and called "Knowledge Discovery Mine", was mainly to organize the then-existing information about data mining, chiefly software and datasets, and make it available to all. Those two sections, Software and Datasets, were the most popular sections for many years.

In the 1990s, KDnuggets had a very comprehensive directory of then-available software, datasets, meetings, and other relevant information, so it was a very useful resource.

As the field grew, it became impossible to maintain a hand-curated directory of everything related to data mining and data science, so KDnuggets refocused on practical and educational content that was useful to practitioners. We were also fortunate in timing, as the interest in data mining and data science grew dramatically in the 2010s and 2020s. As a result, the number of subscribers and website visitors grew significantly.

 
Do you feel that KDnuggets made a positive impact on the data field along the way? (Abid)

I certainly hope so! In the early days, the KDnuggets newsletter and website were useful resources for connecting the research community, and later they were a useful educational resource for practitioners and data scientists in the beginning stages of their careers.

Some of our readers really enjoyed KDnuggets, as demonstrated in this cartoon:

 

From Cartoon: KDnuggets Addiction

 

What do you feel is the biggest advancement in data science to have come along during your publication career? (Matt)

Clearly, deep learning. Although research in neural networks had been going on since the 1960s, the big breakthrough was the deep learning approach, developed mainly by Geoff Hinton, Yann LeCun, and Yoshua Bengio in the early 2000s. The first notable success of deep learning is usually dated to October of 2012, when AlexNet, created by Geoff Hinton and his students, won the ImageNet competition by an unprecedentedly large margin.

Soon thereafter, many researchers and practitioners began using deep learning and KDnuggets started covering it. Deep learning was already the top KDnuggets news item in December 2012.

Deep learning and all the later technologies derived from it, like ChatGPT, remain among the most popular topics now.

 
What was important to you while working on KDnuggets (for example: money, experience, or spreading knowledge)? (Abid)

Of course, money was important, because I had been self-employed since 2001 and had to support my family and pay the mortgage, but it was not the most important thing. Probably the main motivation for me when I started KDnuggets was building a community and interacting with smart people. From 1993 to 2000, I ran the KDnuggets newsletter and website without any revenue or ads, as a purely volunteer service for the community. Running KDnuggets was a natural complement to helping organize KDD workshops and conferences, and an unpaid but very rewarding volunteer activity.

I think that KDnuggets played a positive role in spreading the knowledge of data mining and data science, as judged by very large numbers of visitors and subscribers.

 
How did you ensure that KDnuggets stood out in the competitive media landscape? (Nisha)

There is no magic formula. This required, first and foremost, a lot of hard work. But if I were to find some "nuggets" of KDnuggets' enduring success, that would be quality content, synergy, and attention.

First, we tried hard to find or write good quality content. Second, we relied on positive synergy between different channels: emails helped bring visitors to the site, and the site helped bring more email subscribers. KDnuggets' successful accounts on Twitter (now X), LinkedIn, and Facebook also reinforced each other.

Finally, attention. I paid a lot of attention both to the site's internal behavior, periodically modifying it to improve important metrics, and to external trends, adapting our content to what was interesting and hot in the field.

 
Can you share a particularly impactful or memorable story that KDnuggets covered early on, and the effect it had? (Nisha)

One early story from the 1990s was about foster children. One of the useful things KDnuggets did was posting queries from researchers, and around 1995 one person posted a query about a problem he had while working on a foster children payment database. Many names were spelled slightly differently, and to get payments to the right person you had to unify the different spellings. Another researcher saw that query in KDnuggets and was able to apply their name-matching algorithm to solve the foster children problem. This helped to get payments to more children and improved their lives.

 
Even though you have stepped away, where would you like to see KDnuggets in the next 10 years? (Nisha)

I hope it will still have some content written by humans and have human readers!

 
How do you feel about AI eventually taking over content creation? (Abid)

On one hand, I feel very excited that sci-fi stories about AI and robots I was reading as a child are getting close to reality, and in some cases the reality is already exceeding the sci-fi. On the other hand, I feel sad for human content creators.

Social networks have already shown the dangers of optimizing for attention, and AI is extremely good at optimizing. I can imagine in a few years (or even a few months) AI will excel at creating addictive content that many humans would want to watch non-stop.

Perhaps AI is already generating a lot of content on TikTok.

But is it good for society if so many people become addicted to a digital drug?

AI's promise and threat are of course much broader than content creation: AI can potentially take over most jobs.

In the short term, I think there will be a period of collaboration, when human + AI can do better in many tasks than a human or an AI alone. Taking chess as an example, after Deep Blue defeated world champion Garry Kasparov in 1997, there were tournaments where human + computer teams did better than computers or humans alone. However, that period was short and now the best chess programs are much, much better than even the world champion.

In the longer term, I am very concerned about AI-caused job losses and increased income inequality, which can destabilize societies and destroy democracies. This will not happen this year, but the current technology trends are pointing towards such scenarios. A possible long-term solution to AI-caused unemployment could be some form of universal basic income, and focusing on developing human creativity.

Such a solution will be hard to adopt and will require political activism and civic engagement, so if you, the reader, are concerned about risks of AI, then learn about it, engage, and vote!

 
Thanks, Gregory! Your participation in this is appreciated, and celebrating such a milestone for KDnuggets wouldn't be the same without it.

 
 
Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.