SQL Query Optimization Techniques
Learn how to optimize SQL queries so that they run faster and use memory more efficiently.
At the beginner level, we focus only on writing and running SQL queries. We do not worry about how long a query takes to execute or whether it can handle millions of records. At the intermediate level, however, people expect your queries to be optimized and to take as little time as possible to execute.
Writing optimized queries is imperative in large applications with millions of records, such as e-commerce platforms or banking systems. Suppose you own an e-commerce company with more than a million products, and a customer wants to search for a product. What if the query you wrote in the backend takes more than a minute to fetch that product from the database? Do you think customers will keep buying products from your website?
This is why SQL query optimization matters. In this tutorial, I will show you some tips and tricks to optimize your SQL queries and make them execute faster. The only prerequisite is a basic knowledge of SQL.
1. Use EXISTS instead of COUNT() to Find a Specific Element in the Table
To check whether a specific element is present in the table, use the EXISTS operator instead of COUNT(). The query will run in a more optimized way.
With COUNT(), the query needs to count all the occurrences of that particular element, which may be inefficient when the table is large. EXISTS, on the other hand, can stop as soon as it finds the first occurrence of that element. This can save a lot of time.
Also, you are only interested in whether a particular element is present, not in the number of occurrences. That is another reason why EXISTS is the better fit.
SELECT
  EXISTS(
    SELECT
      *
    FROM
      myTable
    WHERE
      myColumn = 'val'
  );
The above query returns 1 if at least one row of the table has a value equal to val in the column named myColumn. Otherwise, it returns 0. Some databases, such as PostgreSQL, return true or false instead.
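The difference between the two checks can be tried out with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real server; the table contents and the helper names value_exists and count_matches are hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical table and data, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, myColumn TEXT)")
conn.executemany(
    "INSERT INTO myTable (myColumn) VALUES (?)",
    [("val",), ("other",), ("val",)],
)


def value_exists(value):
    """Return True if at least one row has myColumn equal to value."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM myTable WHERE myColumn = ?)", (value,)
    ).fetchone()
    return bool(row[0])


def count_matches(value):
    """Count every matching row, which is more work than a presence check needs."""
    row = conn.execute(
        "SELECT COUNT(*) FROM myTable WHERE myColumn = ?", (value,)
    ).fetchone()
    return row[0]


found = value_exists("val")
missing = value_exists("absent")
matches = count_matches("val")
print(found, missing, matches)
```

On a table this small the two approaches take the same time; the gap only shows up when many rows match the condition.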
2. Use VARCHAR instead of CHAR
Both the char and varchar data types are used to store character strings in a table, but varchar is usually more memory efficient than char.
The char data type stores character strings of a fixed, defined length. If a string is shorter than that length, it is padded with blank spaces up to the defined length, and this padding wastes memory. For example, with a single-byte character set, CHAR(100) takes 100 bytes of memory even if a single character is stored.
The varchar data type, on the other hand, stores character strings of variable length, up to the maximum length specified. It does not pad the string with blank spaces and only takes memory in proportion to the string's actual length. For example, VARCHAR(100) needs only 1 byte for the data when storing a single character, plus a small length prefix whose size depends on the database.
CREATE TABLE myTable (
  id INT PRIMARY KEY,
  charCol CHAR(10),
  varcharCol VARCHAR(10)
);
In the above example, a table named myTable is created with two columns, charCol and varcharCol, of the char and varchar data types respectively. charCol always takes 10 bytes of memory (assuming a single-byte character set). In contrast, varcharCol takes memory in proportion to the actual size of the string stored in it.
3. Avoid Subqueries in WHERE Clause
To optimize an SQL query, try to avoid subqueries inside the WHERE clause, because they can be expensive to execute when they return a large number of rows.
Instead of using the subquery, you can often get the same result with a join operation. A correlated subquery, in which the inner query depends on the outer query, is another alternative, for example with EXISTS. Whether these are more efficient than a non-correlated subquery depends on the database's optimizer, so compare the execution plans on your own data.
Below is an example to understand the difference between a subquery and a join.
-- Using a subquery
SELECT
  *
FROM
  orders
WHERE
  customer_id IN (
    SELECT
      id
    FROM
      customers
    WHERE
      country = 'INDIA'
  );
-- Using a join operation
SELECT
  orders.*
FROM
  orders
  JOIN customers ON orders.customer_id = customers.id
WHERE
  customers.country = 'INDIA';
In the first example, the subquery first collects the ids of all customers that belong to INDIA, and then the outer query gets all the orders of the selected customer ids. In the second example, we achieve the same result by joining the customers and orders tables and then selecting only the orders whose customers belong to INDIA.
In this way, we can often optimize the query by avoiding subqueries inside the WHERE clause, and the result can also be easier to read and understand.
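One way to check that the two forms really are interchangeable, and to see how an engine plans each of them, is the sketch below. It uses Python's built-in sqlite3 module as a stand-in database; the sample rows are hypothetical, and the printed plans are specific to SQLite, so other engines will report something different.

```python
import sqlite3

# Hypothetical schema and rows, mirroring the orders/customers example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id, country) VALUES (1, 'INDIA'), (2, 'USA'), (3, 'INDIA');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2), (12, 3), (13, 1);
""")

subquery_sql = """
    SELECT orders.* FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE country = 'INDIA')
    ORDER BY orders.id
"""
join_sql = """
    SELECT orders.* FROM orders
    JOIN customers ON orders.customer_id = customers.id
    WHERE customers.country = 'INDIA'
    ORDER BY orders.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()

# Print the plan SQLite chooses for each form; the last column is the description.
for label, sql in (("subquery", subquery_sql), ("join", join_sql)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step[-1])
```

The two queries return the same rows here because customers.id is unique; if the joined column could repeat, the join would return duplicate orders while the IN subquery would not.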
4. Ordering JOINs from a Larger Table to a Smaller Table
Applying the JOIN operation from a larger table to a smaller table is a common SQL optimization technique, because it can make your query execute faster. If we apply a JOIN operation from a smaller table to a larger table, the SQL engine has to search the larger table for matching rows, which is more resource-intensive and time-consuming. If the JOIN is applied from a larger table to a smaller table, the SQL engine only has to search the smaller table for matching rows. Keep in mind that many database engines have a cost-based optimizer that chooses the join order for inner joins on its own, so the order in which you write the tables may make no difference; check the execution plan to confirm.
Here is an example for your better understanding.
-- The orders table is larger than the customers table
-- (assumes orders.customer_id references customers.id)
-- Join from a larger table to a smaller table
SELECT
  *
FROM
  orders
  JOIN customers ON customers.id = orders.customer_id;
-- Join from a smaller table to a larger table
SELECT
  *
FROM
  customers
  JOIN orders ON customers.id = orders.customer_id;
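How much the written join order matters depends on the engine, so it is worth inspecting the plan rather than assuming. The following sketch uses Python's built-in sqlite3 module as a stand-in database; the table names, the customer_id column, the row counts, and the query_plan helper are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical data: 10 customers and 1000 orders, so orders is the larger table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
    [(i, (i % 10) + 1) for i in range(1, 1001)],
)

large_first = """
    SELECT orders.id, customers.name
    FROM orders JOIN customers ON customers.id = orders.customer_id
    ORDER BY orders.id
"""
small_first = """
    SELECT orders.id, customers.name
    FROM customers JOIN orders ON customers.id = orders.customer_id
    ORDER BY orders.id
"""


def query_plan(sql):
    """Return SQLite's plan description for a query, one string per step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


large_first_rows = conn.execute(large_first).fetchall()
small_first_rows = conn.execute(small_first).fetchall()
print(query_plan(large_first))
print(query_plan(small_first))
```

Both queries return the same rows; comparing the two printed plans shows whether this particular engine treats the two orderings differently.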
5. Use `regexp_like` instead of `LIKE` Clause
Like the LIKE clause, regexp_like is used for pattern searching. The LIKE clause is a basic pattern-matching operator that supports only the wildcards _ and %, which match a single character and any number of characters respectively. A LIKE pattern that starts with a wildcard, such as '%abc', generally cannot use an ordinary index, so the database must scan the complete table to find the pattern, which is slow for large tables.
On the other hand, regexp_like is a more powerful pattern-searching technique. It uses regular expressions to find specific patterns in a character string. Regular expressions are more specific than simple wildcard matching because they allow you to describe the exact pattern you are looking for, and a single expression can replace several LIKE conditions. Whether the query also executes faster depends on the database: a LIKE pattern with a fixed prefix, such as 'A%', can often use an index on the column, while a regular expression usually cannot, so measure both on your own data.
Please note that regexp_like may not be present in all database management systems. Its syntax and functionality may vary in other systems.
Here is an example for your better understanding.
-- Query using the LIKE clause
SELECT
  *
FROM
  mytable
WHERE
  (
    name LIKE 'A%'
    OR name LIKE 'B%'
  );
-- Query using regexp_like clause
SELECT
  *
FROM
  mytable
WHERE
  regexp_like(name, '^[AB].*');
The above queries find the rows whose name starts with A or B. In the first example, LIKE is used to search for all the names that start with A or B. A% means that the first character is A, followed by any number of characters. In the second example, regexp_like is used. In the pattern ^[AB].*, ^ anchors the match at the beginning of the string, [AB] means that the first character can be A or B, and .* matches all the characters after that.
Using regexp_like, the database can filter out the rows that don't match the pattern with a single condition. Check the execution plan to see whether this also improves performance and reduces resource usage on your system.
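The equivalence of the two filters can be checked with a short sketch. It uses Python's built-in sqlite3 module as a stand-in database. SQLite has no regexp_like, so the sketch registers a Python function named regexp to back SQLite's REGEXP operator instead; that function and the sample names are assumptions for the illustration, not part of the systems discussed above.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


# SQLite evaluates "X REGEXP Y" by calling a user function regexp(Y, X),
# which has to be supplied by the application.
def regexp(pattern, value):
    return value is not None and re.search(pattern, value) is not None


conn.create_function("regexp", 2, regexp)

# Hypothetical sample data.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Carol",), ("Aaron",), ("Dave",)],
)

like_rows = conn.execute(
    "SELECT name FROM mytable WHERE name LIKE 'A%' OR name LIKE 'B%' ORDER BY name"
).fetchall()
regex_rows = conn.execute(
    "SELECT name FROM mytable WHERE name REGEXP '^[AB]' ORDER BY name"
).fetchall()
print(like_rows, regex_rows)
```

Note that LIKE is case-insensitive for ASCII letters in SQLite by default, while the regular expression here is case-sensitive, so the two filters agree only because every sample name starts with a capital letter.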
Conclusion
In this article, we have discussed various methods and tips for optimizing SQL queries. You should now have a clearer understanding of how to write efficient SQL queries and why optimizing them matters. There are many more ways to optimize queries, such as preferring integer values over characters, or using UNION ALL instead of UNION when the combined results cannot contain duplicates.
Aryan Garg is a B.Tech. Electrical Engineering student, currently in the final year of his undergrad. His interest lies in the field of Web Development and Machine Learning. He has pursued this interest and is eager to work more in these directions.