Are you preparing for an SQL interview? If so, having a solid understanding of indexes is crucial. Indexes are one of the most important performance-tuning factors in databases, and interviewers often focus on this topic to gauge a candidate’s knowledge and expertise.
In this comprehensive article, we’ll cover the top 25 SQL interview questions and answers related to indexes. Whether you’re a beginner or an experienced professional, these questions will help you strengthen your understanding and prepare you for the challenges of an SQL interview.
1. What is a SQL Server Index?
A SQL Server index is a database object that helps speed up data retrieval and query processing operations from a database table or view. It provides fast access to the table’s rows without scanning the entire table, similar to how a book’s index allows you to find specific information quickly without reading every page.
2. Describe the structure of a SQL Server Index.
A SQL Server index is created using a B-Tree structure, which consists of 8K pages called index nodes. The B-Tree structure has three main levels:
- Root Level: The top node containing a single index page from which SQL Server starts its data search.
- Leaf Level: The bottom level of nodes containing the data pages you’re looking for. The number of leaf pages depends on the amount of data stored in the index.
- Intermediate Level: One or more levels between the root and leaf levels, holding index key values and pointers to the next intermediate level pages or leaf data pages. The number of intermediate levels depends on the amount of data stored in the index.
3. What is the difference between a Heap table and a Clustered table?
A Heap table is a table in which the data rows are not stored in any particular order within each data page, and the data pages are not linked in a linked list. This is because a Heap table contains no clustered index.
On the other hand, a Clustered table is a table with a predefined clustered index on one or more columns that defines the storing order of the rows within the data pages and the order of the pages within the table based on the clustered index key.
4. How can you identify if a table is a Heap table?
You can identify a Heap table by querying the sys.partitions
system object, which has one row per partition with an index_id
value of 0. Alternatively, you can query the sys.indexes
system object, which shows the Heap table index details, with an index ID of 0 and a type of HEAP.
5. Explain the “Forwarding Pointers issue” and how to fix it.
The “Forwarding Pointers issue” occurs when data modification operations are performed on Heap table data pages. Forwarding Pointers are inserted into the Heap to point to the new location of the moved data. Over time, these forwarding pointers can cause performance issues due to the need to visit the old/original location and then the new location specified by the forwarding pointers to get a specific value.
Starting from SQL Server 2008, you can overcome the forwarding pointers performance issue by using the ALTER TABLE REBUILD
command, which rebuilds the Heap table.
6. Explain Index Depth, Density, and Selectivity, and how they affect index performance.
- Index Depth: The number of levels from the index root node to the leaf nodes. A deep index can suffer from performance degradation.
- Index Density: A measure of the lack of uniqueness of data in a table. A dense column has a high number of duplicates.
- Index Selectivity: A measure of how many rows are scanned compared to the total number of rows. An index with high selectivity means a small number of rows are scanned relative to the total number of rows.
These factors affect index performance as follows:
- A deep index can lead to performance degradation.
- A dense index (with many duplicates) can impact performance.
- An index with high selectivity (scanning fewer rows) can improve performance.
7. What is the difference between OLTP and OLAP workloads, and how does it affect index creation decisions?
- Online Transaction Processing (OLTP) databases are used for transactional systems, where most queries are data modification queries.
- Online Analytical Processing (OLAP) databases are used for data warehousing systems, where most queries are data retrieval queries that filter, group, aggregate, and join large data sets quickly.
For OLTP databases with heavily modified tables, you should avoid creating a large number of indexes, as they can impact data modification performance. On the other hand, for OLAP databases with low modification requirements, you can create more indexes to improve data retrieval performance.
8. Why is it not recommended to create indexes on small tables?
Creating indexes on small tables is generally not recommended because the SQL Server Engine can scan the underlying table faster than traversing the index when searching for specific data. Additionally, the index will still negatively affect the performance of data modification operations, as it will need to be adjusted whenever the underlying table’s data is modified.
9. What are the different ways to create an index?
You can create an index in SQL Server using the following methods:
CREATE INDEX
T-SQL statement- SQL Server Management Studio (right-click on the “Indexes” node of a table and choose “New Index”)
- Indirectly by defining
PRIMARY KEY
andUNIQUE
constraints withinCREATE TABLE
orALTER TABLE
statements
10. What are the pros and cons of using the ONLINE
index creation or rebuilding option?
Setting the ONLINE
option to ON
when creating or rebuilding an index allows other data retrieval or modification processes on the underlying table to continue, preventing the index creation process from locking the table. However, the ONLINE
index creation or rebuilding process takes longer than the offline default index creation process.
11. What is the difference between PAD_INDEX
and FILLFACTOR
?
FILLFACTOR
sets the percentage of free space that the SQL Server Engine will leave in the leaf level of each index page during index creation. It should be an integer value from 0 to 100, with 0 or 100 as the default value, where the pages will be filled completely during index creation.PAD_INDEX
applies the free space percentage specified byFILLFACTOR
to the index intermediate level pages during index creation.
12. How many Clustered indexes can be created on a table, and why?
SQL Server allows you to create only one Clustered index per table because the data can be sorted in the table using only one order criteria.
13. Describe the characteristics of an ideal Clustered index key.
An ideal Clustered index key should have the following characteristics:
- Short: Use as few columns as possible, with a maximum key size of 900 bytes.
- Static: Choose columns that are not changed frequently, as changing the Clustered index key values means the entire row will be moved to the new proper page.
- Increasing: Use an increasing (incrementing) column, such as an
IDENTITY
column, to improveINSERT
performance. - Unique: Declare the Clustered index key column(s) as unique to improve query performance.
- Accessed frequently: Rows will be stored in the Clustered index in a sorted order based on the index key, which is used to access the data.
- Used in the
ORDER BY
clause: This avoids the need for the SQL Server Engine to sort the data to display it, as the rows are already sorted based on the index key used in theORDER BY
clause.
14. Why is it not recommended to use GUID
and CHARACTER
columns as Clustered index keys?
- GUID columns: The random nature and large size (16 bytes) of
GUID
values can affect the Clustered index key sorting performance, as they are different from the continuously increasingIDENTITY
integer values. - CHARACTER columns: Challenges include limited sorting performance, large size, non-increasing values, non-static values that often change in business applications, and character comparison mechanisms that depend on the used collation.
15. What is the main difference between a Clustered and Non-Clustered index structure?
A Non-Clustered index differs from a Clustered index in that the underlying table rows are not stored and sorted based on the Non-Clustered key. The leaf level nodes of a Non-Clustered index are made of index pages that contain Non-Clustered index key values with pointers to the storage location of these rows in the underlying Heap table or Clustered index.
16. What is the difference between a Non-Clustered index built over a Heap table and one built over a Clustered table? What is the difference between a RID Lookup and a Key Lookup?
-
If a Non-Clustered index is built over a Heap table or view (which has no Clustered index), the leaf level nodes of that index hold the index key values and Row ID (RID) pointers to the location of the rows in the Heap table. The RID consists of the file identifier, data page number, and the number of rows on that data page.
-
If a Non-Clustered index is created over a Clustered table, the leaf level nodes of that index contain Non-Clustered index key values and clustering keys for the base table, which are the locations of the rows in the Clustered index data pages.
-
A RID Lookup operation retrieves the rest of the columns that are not available in the index from the Heap table based on the ID of each row.
-
A Key Lookup operation retrieves the rest of the columns that are not available in the index from the Clustered index, based on the Clustered key of each row.
17. How can you benefit from the INCLUDE
feature to overcome Non-Clustered index limitations?
Instead of creating a Non-Clustered index with a wide key, you can include large columns that are used to cover the query as non-key columns in the Non-Clustered index using the INCLUDE
clause of the CREATE INDEX
statement. This feature, introduced in SQL Server 2005, allows you to include up to 1023 non-key columns in a Non-Clustered index with a minimum of one key column.
The INCLUDE
feature extends the functionality of Non-Clustered indexes by allowing you to cover more queries by adding columns as non-key columns, which are stored and sorted only in the leaf level of the index. This can help avoid exceeding the Non-Clustered index size limit of 900 bytes and 16 columns in the index key and include columns with data types that are not allowed in the index key, such as VARCHAR(MAX)
, NVARCHAR(MAX)
, text
, ntext
, and image
.
18. Which type of indexes is used to maintain the data integrity of the columns on which it is created?
Unique Indexes are used to maintain the data integrity of the columns on which they are created by ensuring that there are no duplicate values in the index key and the table rows on which that index is created.
19. How can Filtered indexes improve query performance?
Filtered indexes use a filter predicate to improve the performance of queries that retrieve a well-defined subset of rows from the table by indexing only a portion of the table rows. The smaller size of the Filtered index, which consumes less disk space compared to a full-table index, and the more accurate filtered statistics, which cover only the filtered index rows with minimal maintenance cost, help improve query performance by generating a more optimal execution plan.
20. What are the different ways to retrieve the properties of the columns participating in a SQL Server index?
You can retrieve the properties of the columns participating in a SQL Server index using the following methods:
- SQL Server Management Studio (SSMS): Expand the “Indexes” node under a database table, right-click on each index, and choose the “Properties” option.
sp_helpindex
system stored procedure: Provide the name of the table for which you want to list its indexes.sys.indexes
system dynamic management view: Contains one row per index in the table or view. It’s recommended to joinsys.indexes
with other system DMVs, such assys.index_columns
,sys.columns
, andsys.tables
, to return meaningful information about the indexes.
21. How can you get the fragmentation percentage of a database index?
You can get the fragmentation percentage of a database index using the following methods:
- SQL Server Management Studio (SSMS): From the “Fragmentation” tab of the index Properties window.
sys.dm_db_index_physical_stats
dynamic management function: Introduced in SQL Server 2005, this DMF can be joined with thesys.indexes
DMV to return the fragmentation percentage of all indexes under the specified database.
22. When checking the index usage statistics retrieved by querying the sys.dm_db_index_usage_stats
dynamic management view, explain the results of the returned number of seeks, scans, lookups, and updates.
- Seeks: The number of times the index is used to find a specific row.
- Scans: The number of times the leaf pages of the index are scanned.
- Lookups: The number of times a Clustered index is used by a Non-Clustered index to fetch the full row.
- Updates: The number of times the index data has been modified.
23. What is the difference between index REBUILD
and REORGANIZE
operations?
- Index
REBUILD
operation removes fragmentation by dropping the index and creating it again, defragmenting all index levels, compacting the index pages using theFILLFACTOR
values specified in the rebuild command (or using the existing value if not specified), and updating the index statistics using a full scan of all the data. - Index
REORGANIZE
operation physically reorders the leaf level pages of the index to match the logical order of the leaf nodes. It is always performed online.
Microsoft recommends rebuilding the index if the fragmentation percentage exceeds 30% and reorganizing the index if the fragmentation percentage exceeds 5% but is less than 30%.
24. How can you find missing indexes that are needed to potentially improve the performance of your queries?
You can find missing indexes that may improve query performance using the following methods:
- The “Missing Index Details” option in the query execution plan, if available.
- The
sys.dm_db_missing_index_details
dynamic management view, which returns detailed information about missing indexes, excluding spatial indexes. - A combination of the SQL Server Profiler and the Database Engine Tuning Advisor tools.
25. Why is an index described as a “double-edged sword”?
An index is described as a “double-edged sword” because:
- A well-designed index can enhance system performance and speed up data retrieval.
- However, a poorly designed index can cause performance degradation, consume extra disk space, and delay data insertion and modification operations.
It’s always recommended to test the performance of the system before and after adding an index to the development environment, before deploying it to the production environment.
By understanding these key concepts and being prepared to answer questions related to SQL Server indexes, you’ll increase your chances of acing your SQL interview. Remember, practice and hands-on experience are crucial in mastering this topic, so don’t hesitate to explore further and apply your knowledge.
Good luck with your SQL interview preparation!
Top 25 SQL Interview Questions and Answers(The BEST SQL Interview Questions)
FAQ
What is index in SQL interview questions?
What is true about indexes in SQL?
How do I choose a good index in SQL?