Top 10 Advanced Data Science SQL Interview Questions You Must Know How to Answer

In this article, we will give a list of commonly asked SQL interview questions to help you prepare for your coming technical interview.

By Sonia Jamil, Executive Database Admin (DBA) at Ufone on January 27, 2023 in SQL

Top 10 Advanced Data Science SQL Interview Question s You Must Know How to Answer

Image by Author

Introduction

SQL( Structured Query Language) is a standard programming language used for managing and manipulating databases. It's an essential skill for any data professional, as it allows them to effectively retrieve and analyze data stored in a database. As a result, SQL is a common topic in technical interviews for positions that involve working with data, similar to data analysts, data engineers, and database administrators.

Question 01

SQL query to find nth highest salary/payment or 3rd Highest salary/payment

To find the nth highest salary, you can use a subquery with the DENSE_RANK() function to calculate the dense rank of each salary, and then filter the results to only include the row with the ranking equal to n.

SELECT 
  * 
FROM 
  (
    SELECT 
      name, 
      salary, 
      DENSE_RANK() OVER (
        ORDER BY 
          salary DESC
      ) as salary_rank 
    FROM 
      employees
  ) subquery 
WHERE 
  salary_rank = n;

You can also use the LIMIT and OFFSET clauses to find the nth highest salary, as follows:

SELECT 
  name, 
  salary 
FROM 
  employees 
ORDER BY 
  salary DESC 
LIMIT 
  1 OFFSET (n - 1);

For example, to find the 3rd highest salary, you would use the following query:

SELECT 
  name, 
  salary 
FROM 
  employees 
ORDER BY 
  salary DESC 
LIMIT 
  1 OFFSET 2;

Question 02

How do you optimize SQL queries for performance?

There are several ways to optimize SQL queries for better performance, including

Indexing

Creating an index on a column or group of columns can significantly improve the speed of queries that filter on those columns.

Partitioning

Partitioning a large table into smaller pieces can improve the performance of queries that only need to access a subset of the data.

Normalization

Normalization involves organizing the data in a database so that each piece of data is stored in only one place, reducing redundancy and perfecting the integrity of the data.

Use of appropriate data types

Using the correct data type for each column can improve the performance of queries that filter or sort on those columns.

Use of appropriate JOIN types

Using the correct JOIN type(e.g., INNER JOIN, OUTER JOIN, CROSS JOIN) can improve the performance of queries that join multiple tables.

Use of appropriate aggregate functions

Using appropriate aggregate functions(e.g., SUM, AVG, MIN, MAX) can improve the performance of queries that perform calculations on large sets of data. Some aggregate functions, similar to COUNT, are more effective than others, so it's important to choose the applicable function for your query.

Question 03

How do you use the LAG and LEAD functions in SQL? Can you give an example of their use?

The LAG() and LEAD() functions are window functions in SQL that allow you to compare values in a row with values in a preceding or following row, respectively. They're useful for calculating running totals, or for comparing values in a table with values in a previous or subsequent row.

The LAG() function takes two arguments: the column to be returned, and the number of rows to go back. For example

SELECT 
  name, 
  salary, 
  LAG(salary, 1) OVER (
    ORDER BY 
      salary DESC
  ) as prev_salary 
FROM 
  employees;

The LEAD() function works in a similar way, but goes forward rather than backward. For example

SELECT 
  name, 
  salary, 
  LEAD(salary, 1) OVER (
    ORDER BY 
      salary DESC
  ) as next_salary 
FROM 
  employees

Question 04

Explain ETL and ELT concept in SQL

ETL( Extract, Transform, Load) is a process used in SQL to extract data from one or more sources, transform the data into a format suitable for analysis or other uses, and then load the data into a target system, such as a data warehouse or data lake.

ELT( Extract, Load, Transform) is similar to ETL, but the Transform phase is performed after the data is loaded into the target system, rather than before. This allows the target system to perform the transformations, which can be more efficient and scalable than performing the transformations in an ETL tool. ELT is often used in modern data infrastructures, which use powerful data processing engines( such as Apache Spark or Apache Flink) to perform the Transform phase.

Question 05

Can you explain the difference between the WHERE and HAVING clauses in SQL

The WHERE and HAVING clauses are both used to filter rows from a SELECT statement. The main difference between the two is that the WHERE clause is used to filter rows before the group by operation, while the HAVING clause is used to filter rows after the group by operation.

SELECT 
  department, 
  SUM(salary) 
FROM 
  employees 
GROUP BY 
  department 
HAVING 
  SUM(salary) > 100000;

In this example, the HAVING clause is used to filter out any departments where the sum of the salaries for employees in that department is less than 100000. This is done after the group by operation, so it only affects the rows that represent each department.

SELECT 
  * 
FROM 
  employees 
WHERE 
  salary > 50000;

In this example, the WHERE clause is used to filter out any employees with a salary of less than 50000. This is done before any group by operation, so it affects all rows in the employees table.

Question 06

Explain the difference between TRUNCATE, DROP, and DELETE operations in SQL

TRUNCATE

The TRUNCATE operation removes all rows from a table, but it doesn't affect the structure of the table. It's faster than DELETE, because it doesn't generate any undo or redo logs and doesn't fire any delete triggers.

Here's an example of using the TRUNCATE statement

TRUNCATE TABLE employees;

This statement removes all rows from the employees table, but the table structure, including column names and data types, remains unchanged.

DROP

The DROP operation removes a table from the database and removes all data in the table. It also removes any indexes, triggers, and constraints associated with the table.

Here's an example of using the DROP statement

DROP 
  TABLE employees;

This statement removes the employees table from the database and all data in the table is permanently deleted. The table structure is also removed.

DELETE

The DELETE operation removes one or more rows from a table. It allows you to specify a WHERE clause to select the rows to delete. It also generates undo and redo logs, and fires cancel triggers.

Here's an example of using the DELETE statement

DELETE FROM 
  employees 
WHERE 
  salary & lt;
50000;

This statement removes all rows from the employees table where the salary is less than 50000 The table structure remains unchanged, and the deleted rows can be recovered using the undo logs.

Question 07

Which is more efficient join or subquery?

It's generally more efficient to use a JOIN rather than a subquery when combining data from multiple tables. This is because a JOIN allows the database to execute the query more efficiently by using indices on the joined tables.

For example, consider the following two queries that return the same results:

SELECT 
  * 
FROM 
  orders o 
WHERE 
  o.customer_id IN (
    SELECT 
      customer_id 
    FROM 
      customers 
    WHERE 
      country = 'US'
  );

SELECT 
  * 
FROM 
  orders o 
WHERE 
  o.customer_id IN (
    SELECT 
      customer_id 
    FROM 
      customers 
    WHERE 
      country = 'US'
  );

The first query uses a JOIN to combine the orders and customers tables, and then filters the results using a WHERE clause. The second query uses a subquery to select the relevant customer IDs from the customers table, and then uses the IN operator to filter the orders table based on those IDs.

Question 08

How do you use window functions in SQL?

In SQL, a window function is a function that operates on a set of rows, or a" window", defined by a window specification. Window functions are used to perform calculations across rows, and they can be used in SELECT, UPDATE, and DELETE statements, as well as in the WHERE and HAVING clauses of a SELECT statement.

Here's an example of using a window function in a SELECT statement:

SELECT 
  name, 
  salary, 
  AVG(salary) OVER (PARTITION BY department_id) as avg_salary_by_department 
FROM 
  employees

This statement returns a result set with three columns: name, salary, and avg_salary_by_department. The avg_salary_by_department column is calculated using the AVG window function, which calculates the average salary for each department. The PARTITION BY clause specifies that the window is partitioned by department_id, meaning that the average salary is calculated separately for each department.

Question 09

Explain Normalization

Normalization is the process of organizing a database in a way that reduces redundancy and dependency. It is a systematic approach to decomposing tables to eliminate data redundancy and improve data integrity. There are several normal forms that can be used to normalize a database. The most common normal forms are:

First Normal Form (1NF)

Each cell in the table contains a single value, and not a list of values
Each column in the table has a unique name
The table does not contain any repeating groups of columns

Second Normal Form (2NF)

It is in first normal form
It does not have any partial dependencies (that is, a non-prime attribute is dependent on a part of a composite primary key)

Third Normal Form (3NF)

It is in second normal form
It does not have any transitive dependencies (that is, a non-prime attribute is dependent on another non-prime attribute)

Boyce-Codd Normal Form (BCNF)

It is in third normal form
Every determinant (an attribute that determines the value of another attribute) is a candidate key (a column or set of columns that can be used as a primary key)

Question 10

Explain Exclusive Lock and Update Lock in SQL

An exclusive lock is a lock that prevents other transactions from reading or writing to the locked rows. This type of lock is typically used when a transaction needs to modify the data in a table, and it wants to ensure that no other transactions can access the table at the same time.

An update lock is a lock that allows other transactions to read the locked rows, but it prevents them from updating or writing to the locked rows. This type of lock is typically used when a transaction needs to read the data in a table, but it wants to ensure that the data is not modified by other transactions until the current transaction is finished.

Sonia Jamil is currently employed as a Database Analyst at one of Pakistan's largest telecommunications companies. In addition to her full-time job, she also work as a freelancer. Her background includes expertise in database administration and experience with both on-premises and cloud-based SQL Server environments. She is proficient in the latest SQL Server technologies and have a strong interest in data management and data analytics.