Top 10 Advanced Data Science SQL Interview Questions You Must Know How to Answer
In this article, we will give a list of commonly asked SQL interview questions to help you prepare for your coming technical interview.
Image by AuthorÂ
Introduction
SQL( Structured Query Language) is a standard programming language used for managing and manipulating databases. It's an essential skill for any data professional, as it allows them to effectively retrieve and analyze data stored in a database. As a result, SQL is a common topic in technical interviews for positions that involve working with data, similar to data analysts, data engineers, and database administrators.Â
Question 01
SQL query to find nth highest salary/payment or 3rd Highest salary/payment
To find the nth highest salary, you can use a subquery with the DENSE_RANK() function to calculate the dense rank of each salary, and then filter the results to only include the row with the ranking equal to n.
SELECT
*
FROM
(
SELECT
name,
salary,
DENSE_RANK() OVER (
ORDER BY
salary DESC
) as salary_rank
FROM
employees
) subquery
WHERE
salary_rank = n;
You can also use the LIMIT and OFFSET clauses to find the nth highest salary, as follows:
SELECT
name,
salary
FROM
employees
ORDER BY
salary DESC
LIMIT
1 OFFSET (n - 1);
For example, to find the 3rd highest salary, you would use the following query:
SELECT
name,
salary
FROM
employees
ORDER BY
salary DESC
LIMIT
1 OFFSET 2;
Question 02
How do you optimize SQL queries for performance?
There are several ways to optimize SQL queries for better performance, including
IndexingÂ
Creating an index on a column or group of columns can significantly improve the speed of queries that filter on those columns.Â
PartitioningÂ
Partitioning a large table into smaller pieces can improve the performance of queries that only need to access a subset of the data.Â
NormalizationÂ
Normalization involves organizing the data in a database so that each piece of data is stored in only one place, reducing redundancy and perfecting the integrity of the data.
Use of appropriate data typesÂ
Using the correct data type for each column can improve the performance of queries that filter or sort on those columns.Â
Use of appropriate JOIN typesÂ
Using the correct JOIN type(e.g., INNER JOIN, OUTER JOIN, CROSS JOIN) can improve the performance of queries that join multiple tables.
Use of appropriate aggregate functionsÂ
Using appropriate aggregate functions(e.g., SUM, AVG, MIN, MAX) can improve the performance of queries that perform calculations on large sets of data. Some aggregate functions, similar to COUNT, are more effective than others, so it's important to choose the applicable function for your query.
Question 03
How do you use the LAG and LEAD functions in SQL? Can you give an example of their use?
The LAG() and LEAD() functions are window functions in SQL that allow you to compare values in a row with values in a preceding or following row, respectively. They're useful for calculating running totals, or for comparing values in a table with values in a previous or subsequent row.
The LAG() function takes two arguments: the column to be returned, and the number of rows to go back. For example
SELECT
name,
salary,
LAG(salary, 1) OVER (
ORDER BY
salary DESC
) as prev_salary
FROM
employees;
The LEAD() function works in a similar way, but goes forward rather than backward. For example
SELECT
name,
salary,
LEAD(salary, 1) OVER (
ORDER BY
salary DESC
) as next_salary
FROM
employees
Question 04
Explain ETL and ELT concept in SQL
ETL( Extract, Transform, Load) is a process used in SQL to extract data from one or more sources, transform the data into a format suitable for analysis or other uses, and then load the data into a target system, such as a data warehouse or data lake.
ELT( Extract, Load, Transform) is similar to ETL, but the Transform phase is performed after the data is loaded into the target system, rather than before. This allows the target system to perform the transformations, which can be more efficient and scalable than performing the transformations in an ETL tool. ELT is often used in modern data infrastructures, which use powerful data processing engines( such as Apache Spark or Apache Flink) to perform the Transform phase.
Question 05
Can you explain the difference between the WHERE and HAVING clauses in SQLÂ
The WHERE and HAVING clauses are both used to filter rows from a SELECT statement. The main difference between the two is that the WHERE clause is used to filter rows before the group by operation, while the HAVING clause is used to filter rows after the group by operation.
SELECT
department,
SUM(salary)
FROM
employees
GROUP BY
department
HAVING
SUM(salary) > 100000;
In this example, the HAVING clause is used to filter out any departments where the sum of the salaries for employees in that department is less than 100000. This is done after the group by operation, so it only affects the rows that represent each department.
SELECT
*
FROM
employees
WHERE
salary > 50000;
In this example, the WHERE clause is used to filter out any employees with a salary of less than 50000. This is done before any group by operation, so it affects all rows in the employees table.
Question 06
Explain the difference between TRUNCATE, DROP, and DELETE operations in SQL
TRUNCATE
The TRUNCATE operation removes all rows from a table, but it doesn't affect the structure of the table. It's faster than DELETE, because it doesn't generate any undo or redo logs and doesn't fire any delete triggers.
Here's an example of using the TRUNCATE statement
TRUNCATE TABLE employees;
This statement removes all rows from the employees table, but the table structure, including column names and data types, remains unchanged.
DROP
The DROP operation removes a table from the database and removes all data in the table. It also removes any indexes, triggers, and constraints associated with the table.
Here's an example of using the DROP statement
DROP
TABLE employees;
This statement removes the employees table from the database and all data in the table is permanently deleted. The table structure is also removed.
DELETE
The DELETE operation removes one or more rows from a table. It allows you to specify a WHERE clause to select the rows to delete. It also generates undo and redo logs, and fires cancel triggers.
Here's an example of using the DELETE statement
DELETE FROM
employees
WHERE
salary & lt;
50000;
This statement removes all rows from the employees table where the salary is less than 50000 The table structure remains unchanged, and the deleted rows can be recovered using the undo logs.
Question 07
Which is more efficient join or subquery?
It's generally more efficient to use a JOIN rather than a subquery when combining data from multiple tables. This is because a JOIN allows the database to execute the query more efficiently by using indices on the joined tables.
For example, consider the following two queries that return the same results:
SELECT
*
FROM
orders o
WHERE
o.customer_id IN (
SELECT
customer_id
FROM
customers
WHERE
country = 'US'
);
SELECT
*
FROM
orders o
WHERE
o.customer_id IN (
SELECT
customer_id
FROM
customers
WHERE
country = 'US'
);
The first query uses a JOIN to combine the orders and customers tables, and then filters the results using a WHERE clause. The second query uses a subquery to select the relevant customer IDs from the customers table, and then uses the IN operator to filter the orders table based on those IDs.
Question 08
How do you use window functions in SQL?
In SQL, a window function is a function that operates on a set of rows, or a" window", defined by a window specification. Window functions are used to perform calculations across rows, and they can be used in SELECT, UPDATE, and DELETE statements, as well as in the WHERE and HAVING clauses of a SELECT statement.
Here's an example of using a window function in a SELECT statement:
SELECT
name,
salary,
AVG(salary) OVER (PARTITION BY department_id) as avg_salary_by_department
FROM
employees
This statement returns a result set with three columns: name, salary, and avg_salary_by_department. The avg_salary_by_department column is calculated using the AVG window function, which calculates the average salary for each department. The PARTITION BY clause specifies that the window is partitioned by department_id, meaning that the average salary is calculated separately for each department.
Question 09
Explain Normalization
Normalization is the process of organizing a database in a way that reduces redundancy and dependency. It is a systematic approach to decomposing tables to eliminate data redundancy and improve data integrity. There are several normal forms that can be used to normalize a database. The most common normal forms are:
First Normal Form (1NF)Â Â Â Â Â Â Â Â Â
- Each cell in the table contains a single value, and not a list of values
- Each column in the table has a unique name
- The table does not contain any repeating groups of columns
Second Normal Form (2NF)
- It is in first normal form
- It does not have any partial dependencies (that is, a non-prime attribute is dependent on a part of a composite primary key)
Third Normal Form (3NF)
- It is in second normal form
- It does not have any transitive dependencies (that is, a non-prime attribute is dependent on another non-prime attribute)
Boyce-Codd Normal Form (BCNF)
- It is in third normal form
- Every determinant (an attribute that determines the value of another attribute) is a candidate key (a column or set of columns that can be used as a primary key)
Question 10
Explain Exclusive Lock and Update Lock in SQL
An exclusive lock is a lock that prevents other transactions from reading or writing to the locked rows. This type of lock is typically used when a transaction needs to modify the data in a table, and it wants to ensure that no other transactions can access the table at the same time.
An update lock is a lock that allows other transactions to read the locked rows, but it prevents them from updating or writing to the locked rows. This type of lock is typically used when a transaction needs to read the data in a table, but it wants to ensure that the data is not modified by other transactions until the current transaction is finished.
Sonia Jamil is currently employed as a Database Analyst at one of Pakistan's largest telecommunications companies. In addition to her full-time job, she also work as a freelancer. Her background includes expertise in database administration and experience with both on-premises and cloud-based SQL Server environments. She is proficient in the latest SQL Server technologies and have a strong interest in data management and data analytics.