10 Pandas One-Liners for Data Access, Manipulation, and Management
These 10 one-liners will help you start accessing, manipulating, and managing data with Pandas.
Pandas one-liners... get it? Image created with Midjourney
Python is known for being a language that is easy to read, write, and understand. Its syntax is also expressive and flexible, meaning that what might take several lines of code in other languages can often be accomplished much more concisely in Python. A lot of power can be packed into a single line of Python.
Pandas is a popular open-source Python library for data analysis, manipulation, and cleaning. It provides data structures for storing datasets, chiefly the Series and the DataFrame, along with tools for working with them. These tools are wide-ranging, and all sorts of data processing can be accomplished with the library.
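This article shares 10 simple Python one-liners for the Pandas library to get you accessing, manipulating, and managing data right away. The examples assume Pandas is imported as pd and that df is a DataFrame; names such as data.csv, col_1, and group_col are placeholders for your own files and columns.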
1. Read data from a CSV
This one-liner reads data from a CSV file into a Pandas DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
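To try this without a file on disk, here is a minimal sketch that passes an in-memory buffer instead, since read_csv accepts any file-like object as well as a path. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'data.csv'
csv_text = """name,score,team
Ann,90,A
Bob,,B
Cid,75,A
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 3)
# The empty score for Bob is read as a missing value (NaN)
print(df["score"].isna().sum())  # 1
```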
2. Remove columns with null values
This one-liner removes every column that contains at least one null (missing) value, modifying df in place.
df.drop(df.columns[df.isnull().sum() > 0], axis=1, inplace=True)
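Pandas also has a built-in method for exactly this: dropna(axis=1) drops every column containing at least one missing value. A small sketch with a made-up frame, checking that both approaches agree (inplace is left out here so the original frame stays available for comparison):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: column 'b' has one missing value
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4.0, np.nan, 6.0],
    "c": ["x", "y", "z"],
})

# The approach above: drop the columns whose null count is above zero
via_drop = df.drop(df.columns[df.isnull().sum() > 0], axis=1)

# dropna(axis=1) drops any column with at least one null (how='any' is the default)
via_dropna = df.dropna(axis=1)

print(list(via_dropna.columns))  # ['a', 'c']
```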
3. Create a new column based on existing columns
This line of Python creates a new column from existing ones; here, each row's new_col value is the product of its col_1 and col_2 values.
df['new_col'] = df.apply(lambda x: x['col_1'] * x['col_2'], axis=1)
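Because apply with axis=1 calls a Python function once per row, it can be slow on large frames. For simple arithmetic like this, a vectorized expression on whole columns gives the same result. A minimal sketch, with sample data invented for illustration:

```python
import pandas as pd

# Hypothetical columns 'col_1' and 'col_2', matching the names used above
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30]})

# Row-wise apply: calls the lambda once per row (simple, but slow on large frames)
df["new_col"] = df.apply(lambda x: x["col_1"] * x["col_2"], axis=1)

# Vectorized alternative: multiplies the two columns element by element in one step
df["new_col_vec"] = df["col_1"] * df["col_2"]

print(df["new_col_vec"].tolist())  # [10, 40, 90]
```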
4. Group and calculate the mean of columns
Here's a one-liner that groups rows by the values in group_col and calculates the mean of every numeric column within each group.
df.groupby('group_col').mean(numeric_only=True)
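If the frame contains non-numeric columns, calling mean() on the groups without numeric_only=True raises a TypeError in Pandas 2.0 and later (earlier versions dropped such columns automatically). A sketch with a hypothetical text column:

```python
import pandas as pd

# Hypothetical data: 'group_col' is the grouping key, 'label' is a text column
df = pd.DataFrame({
    "group_col": ["a", "a", "b"],
    "x": [1.0, 3.0, 5.0],
    "label": ["p", "q", "r"],
})

# numeric_only=True skips 'label' instead of failing on it
means = df.groupby("group_col").mean(numeric_only=True)

print(means["x"].to_dict())  # {'a': 2.0, 'b': 5.0}
```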
5. Filter rows based on specific values
This line of code keeps only the rows where col equals 'value', returning them as a new DataFrame.
df.loc[df['col'] == 'value']
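Conditions can also be combined or broadened. A sketch with made-up data, showing & for "and" and isin for matching any of several values; each comparison needs its own parentheses because & binds more tightly than == in Python:

```python
import pandas as pd

# Hypothetical columns 'col' and 'score'
df = pd.DataFrame({"col": ["value", "other", "value"], "score": [5, 9, 1]})

# Single condition, as in the one-liner above
single = df.loc[df["col"] == "value"]

# Combine conditions with & (and) or | (or), parenthesizing each one
combined = df.loc[(df["col"] == "value") & (df["score"] > 2)]

# Keep rows whose 'col' matches any value in the list
several = df.loc[df["col"].isin(["value", "other"])]

print(len(single), len(combined), len(several))  # 2 1 3
```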
6. Sort a DataFrame by a specific column
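This Python one-liner sorts the DataFrame by a specific column in descending order (drop ascending=False to sort in ascending order). Because sort_values returns a new DataFrame, the result is assigned back to df.
df = df.sort_values(by='col_name', ascending=False)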
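sort_values also accepts lists, so you can sort on several columns with a separate direction for each. A small sketch using invented data, where the second column breaks ties in the first:

```python
import pandas as pd

# Hypothetical data with a tie in 'col_name'
df = pd.DataFrame({"col_name": [2, 3, 2], "other": ["b", "c", "a"]})

# Descending on 'col_name', then ascending on 'other' to break ties
ordered = df.sort_values(by=["col_name", "other"], ascending=[False, True])

print(ordered["other"].tolist())  # ['c', 'a', 'b']
```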
7. Fill all null values
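This fills every null value in a DataFrame with 0. Like most Pandas methods, fillna returns a new DataFrame rather than changing the original, so the result is assigned back to df.
df = df.fillna(0)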
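A single fill value is not always appropriate for every column. fillna also accepts a dict mapping column names to their own fill values; the columns and values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in a numeric and a text column
df = pd.DataFrame({"qty": [1.0, np.nan, 3.0], "city": ["Oslo", None, "Rome"]})

# Each column gets its own replacement for missing values
filled = df.fillna({"qty": 0, "city": "unknown"})

print(filled["qty"].tolist())   # [1.0, 0.0, 3.0]
print(filled["city"].tolist())  # ['Oslo', 'unknown', 'Rome']
```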
8. Remove duplicate rows
This line of code removes duplicate rows (rows identical across every column) from your DataFrame, keeping the first occurrence of each.
df = df.drop_duplicates()
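By default a row only counts as a duplicate if every column matches. The subset and keep parameters change that behavior; a sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are exact duplicates; row 2 repeats only 'id'
df = pd.DataFrame({"id": [1, 1, 1, 2], "val": ["x", "x", "y", "z"]})

# Default: compare all columns and keep the first copy of each duplicate
exact = df.drop_duplicates()

# subset= compares only the listed columns; keep='last' keeps the final occurrence
by_id = df.drop_duplicates(subset=["id"], keep="last")

print(len(exact))             # 3
print(by_id["val"].tolist())  # ['y', 'z']
```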
9. Create a pivot table
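This one-liner creates a pivot table with the values of col_1 as rows, the values of col_2 as columns, and, in each cell, the mean of col_3 for that combination (mean is pivot_table's default aggregation).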
df.pivot_table(index='col_1', columns='col_2', values='col_3')
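To make the default mean aggregation concrete, here is a minimal sketch with invented long-format data. The aggfunc parameter swaps in a different aggregation, and fill_value fills row/column combinations that have no data:

```python
import pandas as pd

# Hypothetical long-format data using the column names from the one-liner
df = pd.DataFrame({
    "col_1": ["r1", "r1", "r1", "r2"],
    "col_2": ["c1", "c1", "c2", "c1"],
    "col_3": [1, 3, 5, 7],
})

# Default aggregation is the mean: cell (r1, c1) averages 1 and 3
means = df.pivot_table(index="col_1", columns="col_2", values="col_3")

# Sum instead of mean; (r2, c2) has no data, so fill_value puts 0 there
sums = df.pivot_table(index="col_1", columns="col_2", values="col_3",
                      aggfunc="sum", fill_value=0)

print(means.loc["r1", "c1"])  # 2.0
print(sums)
```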
10. Save to CSV file
And finally, this Python code saves the manipulated DataFrame to a new CSV file; index=False stops Pandas from writing the row index as an extra column.
df.to_csv('new_data.csv', index=False)
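Calling to_csv without a path returns the CSV text as a string, which makes it easy to see what index=False changes. A minimal sketch with made-up data:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# No path given, so to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()              # first line is ',a,b' (unnamed index column)
without_index = df.to_csv(index=False)

print(without_index)
# a,b
# 1,x
# 2,y

# Reading the index=False output back yields only the original columns
restored = pd.read_csv(io.StringIO(without_index))
```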
This article has presented 10 simple Python one-liners for accessing, manipulating, and managing data with the Pandas library. Did we forget any? Drop some interesting Pandas one-liners in the comments below.
Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.