How To Speed Up SQL Queries Using Indexes [Python Edition]
Learn to work with SQLite databases using Python’s built-in sqlite3 module. Also learn how to create indexes to speed up queries.
Suppose you're sifting through the pages of a book, trying to find a particular piece of information quickly. How would you do that? You'd probably look up the index of terms and then jump to the pages that reference the term you need. Indexes in SQL work much like the index of a book.
In most real-world systems, you'll run queries against database tables with large numbers of rows (think millions). Queries that require a full-table scan to retrieve results will be quite slow. If you know you'll often have to query based on certain columns, you can create database indexes on those columns. This speeds up those queries significantly.
So what'll we learn today? We'll learn to connect to and query a SQLite database in Python using the built-in sqlite3 module. We'll also learn how to add indexes and see how they improve query performance.
To code along with this tutorial, you should have Python 3.7+ and SQLite installed in your working environment.
Note: The examples and sample output in this tutorial are for Python 3.10 and SQLite3 (version 3.37.2) on Ubuntu 22.04 LTS.
Connecting to a Database in Python
We’ll use the built-in sqlite3 module. Before we start running queries, we need to:
- connect to the database
- create a database cursor to run queries
To connect to the database, we'll use the connect() function from the sqlite3 module. Once we've established a connection, we can call cursor() on the connection object to create a database cursor, as shown:
import sqlite3
# connect to the db
db_conn = sqlite3.connect('people_db.db')
db_cursor = db_conn.cursor()
Here we connect to the database people_db. If the database doesn't exist, running the above snippet will create the SQLite database for us.
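As an aside, sqlite3 connections can also be used as context managers: the with block wraps your statements in a transaction that commits on success and rolls back on error (note that it does not close the connection; contextlib.closing handles that). A minimal sketch:
import sqlite3
from contextlib import closing
# closing() guarantees the connection is closed when we're done;
# the inner `with db_conn` commits on success and rolls back on error
with closing(sqlite3.connect('people_db.db')) as db_conn:
    with db_conn:
        db_conn.execute('SELECT 1')  # any statements go here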
Creating a Table and Inserting Records
Now, we’ll create a table in the database and populate it with records.
Let's create a table named people in the people_db database with the following fields:
- name
- email
- job
# main.py
...
# create table
db_cursor.execute('''CREATE TABLE people (
id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
job TEXT)''')
...
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
Synthetic Data Generation with Faker
We now have to insert records into the table. To do this, we'll use Faker, a Python package for synthetic data generation, installable through pip:
$ pip install faker
After installing faker, you can import the Faker class into the Python script:
# main.py
...
from faker import Faker
...
The next step is to generate and insert records into the people table. So that we can see how indexes speed up queries, let's insert a large number of records. Here, we'll insert 100K records; set the num_records variable to 100000.
Then, we do the following:
- Instantiate a Faker object fake and set the seed so we get reproducible results.
- Build the name string from the first and last names, obtained by calling first_name() and last_name() on the fake object.
- Generate a fake domain by calling domain_name().
- Use the first and last names and the domain to generate the email field.
- Get a job for each record by calling job().
We then generate and insert the records into the people table:
# create and insert records
fake = Faker() # be sure to import: from faker import Faker
Faker.seed(42)
num_records = 100000
for _ in range(num_records):
first = fake.first_name()
last = fake.last_name()
name = f"{first} {last}"
domain = fake.domain_name()
email = f"{first}.{last}@{domain}"
job = fake.job()
db_cursor.execute('INSERT INTO people (name, email, job) VALUES (?, ?, ?)', (name, email, job))
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
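As an aside, if you'd rather not call execute() once per row, sqlite3 also provides executemany(), which accepts a sequence of parameter tuples. A minimal sketch of the same insert, assuming fake, db_cursor, and db_conn are set up as above:
# build all the rows first...
records = []
for _ in range(num_records):
    first = fake.first_name()
    last = fake.last_name()
    domain = fake.domain_name()
    records.append((f"{first} {last}", f"{first}.{last}@{domain}", fake.job()))
# ...then insert them with a single executemany() call
db_cursor.executemany('INSERT INTO people (name, email, job) VALUES (?, ?, ?)', records)
db_conn.commit()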
Now the main.py file has the following code:
# main.py
# imports
import sqlite3
from faker import Faker
# connect to the db
db_conn = sqlite3.connect('people_db.db')
db_cursor = db_conn.cursor()
# create table
db_cursor.execute('''CREATE TABLE people (
id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
job TEXT)''')
# create and insert records
fake = Faker()
Faker.seed(42)
num_records = 100000
for _ in range(num_records):
first = fake.first_name()
last = fake.last_name()
name = f"{first} {last}"
domain = fake.domain_name()
email = f"{first}.{last}@{domain}"
job = fake.job()
db_cursor.execute('INSERT INTO people (name, email, job) VALUES (?, ?, ?)', (name, email, job))
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
Run this script once to populate the table with num_records records. (Running it a second time will fail, since the CREATE TABLE statement errors out if the table already exists.)
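If you want the script to be safe to re-run, one option (a sketch, not part of the script above) is to create the table with CREATE TABLE IF NOT EXISTS and skip the inserts when the table is already populated:
# create the table only if it doesn't exist yet
db_cursor.execute('''CREATE TABLE IF NOT EXISTS people (
id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
job TEXT)''')
# skip the inserts if the table already has rows
(count,) = db_cursor.execute('SELECT COUNT(*) FROM people').fetchone()
if count == 0:
    ...  # generate and insert the records as above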
Querying the Database
Now that we have a table with 100K records, let's run a sample query on the people table.
Let’s run a query to:
- get the names and emails of the records where the job title is ‘Product manager’, and
- limit the query results to 10 records.
We'll use time.perf_counter_ns() from the time module to get the approximate execution time of the query.
# sample_query.py
import sqlite3
import time
db_conn = sqlite3.connect("people_db.db")
db_cursor = db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
res = db_cursor.fetchall()
t2 = time.perf_counter_ns()
print(res)
print(f"Query time without index: {(t2-t1)/1000} us")
Here’s the output:
Output >>
[
("Tina Woods", "Tina.Woods@smith.com"),
("Toni Jackson", "Toni.Jackson@underwood.com"),
("Lisa Miller", "Lisa.Miller@solis-west.info"),
("Katherine Guerrero", "Katherine.Guerrero@schmidt-price.org"),
("Michelle Lane", "Michelle.Lane@carr-hardy.com"),
("Jane Johnson", "Jane.Johnson@graham.com"),
("Matthew Odom", "Matthew.Odom@willis.biz"),
("Isaac Daniel", "Isaac.Daniel@peck.com"),
("Jay Byrd", "Jay.Byrd@bailey.info"),
("Thomas Kirby", "Thomas.Kirby@west.com"),
]
Query time without index: 448.275 us
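A single measurement like this can be noisy; caching and other processes on your machine can skew it. If you want a steadier number, you can repeat the query and take the median; a minimal sketch, reusing db_cursor and the time import from sample_query.py:
import statistics
timings = []
for _ in range(100):
    t1 = time.perf_counter_ns()
    db_cursor.execute("SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
    db_cursor.fetchall()
    t2 = time.perf_counter_ns()
    timings.append((t2 - t1) / 1000)  # microseconds
print(f"Median query time: {statistics.median(timings)} us")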
You can also invoke the SQLite command-line client by running sqlite3 db_name at the command line:
$ sqlite3 people_db.db
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
To get the list of indexes, you can run .index:
sqlite> .index
As there are no indexes currently, no index will be listed.
You can also check the query plan like so:
sqlite> EXPLAIN QUERY PLAN SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;
QUERY PLAN
`--SCAN people
Here the query plan is to scan all the rows of the table, which is inefficient.
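You don't have to switch to the command-line client for this; EXPLAIN QUERY PLAN is an ordinary statement, so you can run it from Python as well. A minimal sketch, reusing db_cursor from above:
db_cursor.execute("EXPLAIN QUERY PLAN SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
for row in db_cursor.fetchall():
    print(row)  # each row describes one step of the plan, ending in a detail string such as 'SCAN people'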
Creating an Index on a Specific Column
To create a database index on a particular column, you can use the following syntax:
CREATE INDEX index_name ON table_name (column(s));
Say we need to frequently look up the records of individuals with a particular job title. It'd help to create an index people_job_index on the job column:
# create_index.py
import time
import sqlite3
db_conn = sqlite3.connect('people_db.db')
db_cursor = db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("CREATE INDEX people_job_index ON people (job)")
t2 = time.perf_counter_ns()
db_conn.commit()
print(f"Time to create index: {(t2 - t1)/1000} us")
Output >>
Time to create index: 338298.6 us
Creating the index takes about 338 ms here, but it's a one-time operation; you'll still get a substantial speed-up across all the queries that filter on the job column afterwards.
Now if you run .index at the SQLite command-line client, you'll see the new index:
sqlite> .index
people_job_index
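If you ever want to remove the index, say, to re-run the before-and-after comparison, DROP INDEX does it:
sqlite> DROP INDEX people_job_index;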
Querying the Database with Index
If you now look at the query plan, you should see that the people table is searched using the index people_job_index on the job column:
sqlite> EXPLAIN QUERY PLAN SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;
QUERY PLAN
`--SEARCH people USING INDEX people_job_index (job=?)
You can re-run sample_query.py, modifying only the print() statement, and see how long the query takes now:
# sample_query.py
import sqlite3
import time
db_conn = sqlite3.connect("people_db.db")
db_cursor = db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
res = db_cursor.fetchall()
t2 = time.perf_counter_ns()
print(res)
print(f"Query time with index: {(t2-t1)/1000} us")
Here’s the output:
Output >>
[
("Tina Woods", "Tina.Woods@smith.com"),
("Toni Jackson", "Toni.Jackson@underwood.com"),
("Lisa Miller", "Lisa.Miller@solis-west.info"),
("Katherine Guerrero", "Katherine.Guerrero@schmidt-price.org"),
("Michelle Lane", "Michelle.Lane@carr-hardy.com"),
("Jane Johnson", "Jane.Johnson@graham.com"),
("Matthew Odom", "Matthew.Odom@willis.biz"),
("Isaac Daniel", "Isaac.Daniel@peck.com"),
("Jay Byrd", "Jay.Byrd@bailey.info"),
("Thomas Kirby", "Thomas.Kirby@west.com"),
]
Query time with index: 167.179 us
We see that the query now takes about 167.179 microseconds to execute.
Performance Improvement
For our sample query, querying with the index is about 2.68 times faster, a 62.71% reduction in execution time.
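For reference, here's the arithmetic behind those numbers, computed from the two timings above:
t_without, t_with = 448.275, 167.179  # microseconds, from the runs above
print(f"Speedup factor: {t_without / t_with:.2f}x")  # ~2.68x
print(f"Time reduction: {1 - t_with / t_without:.2%}")  # ~62.71%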
You can also try running a few more queries that filter on the job column and observe the performance improvement.
Also note: because we created an index only on the job column, queries that filter on other columns will not run any faster than they did without the index.
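You can verify this with the query plan: a query that filters on email (which has no index in our setup) should still show a full scan, something like:
sqlite> EXPLAIN QUERY PLAN SELECT name FROM people WHERE email='Tina.Woods@smith.com';
QUERY PLAN
`--SCAN people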
Wrap-Up and Next Steps
I hope this guide helped you understand how creating database indexes—on frequently queried columns—can significantly speed up queries. This is an introduction to database indexes. You can also create multi-column indexes, multiple indexes for the same column, and much more.
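For example, if you often filter on job and then narrow down by name within a job, a multi-column index (sketched below with an illustrative name) could help; note that column order matters, since the index only helps queries that constrain its leftmost column(s):
CREATE INDEX people_job_name_index ON people (job, name);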
You can find all the code used in this tutorial in this GitHub repository. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more.