T-SQL/MSBI Knowledge Share: Clustered/Non-Clustered Index (Key Lookup/ Book Mark Lookup and RID Lookup)

Index over the table is similar to the index what we see in the books. Index is used to access the content quickly or to make the key word search faster.

Clustered Index:
This is similar to the telephone directory, where the name is arranged in a sorted order and we directly located the name and then the phone number in front of it. When a primary key is created on a table, by default clustered index is created. There can be only one clustered index per table. The column on which index is created will be physically sorted.

Non - Clustered Index:
This is similar to the index behind the text book, where we look for a key word in the index and find the page. And then go to that page and again search manually for the key word. We cannot locate directly like the phone number we got in case of clustered index. When a unique key is created on the table, by default a non-clustered index is created. There can be multiple non-clustered indexes created on a table. The column on which index is created will be logically sorted.

Index is made up of a set of pages called index nodes that are organized in a B-tree structure (Binary tree). This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown in below figure.

When you see the execution plan, you see following few words:

Table Scan
Index Scan
Index Seek
RID Lookup
Key Lookup/ Book Mark lookup

Now lets look into these.

Clustered Index Scan and Table Scan is almost same and expensive, because in case of Clustered index all data resides in the leaf level of clustered index page and in case of Table scan all the data resides in data pages. Suppose you have 100 columns and you are interested in 10 columns and you use only 10 columns in your select query, in case of Clustered index scan even though you are interested in 10 columns the I/O it takes is for all the columns because the leaf level has all 100 columns. In case of Table scan all data resides in the data pages and will take time to I/O 100 columns even though you are interested in 10 columns.
Non Clustered Index scan is better than the clustered index scan. In case of non clustered index scan, the index page will have only those columns which are defined while creating the index. So even though your table has 100 columns the I/O it takes is for only those columns which are defined in the index.
Clustered Index Scan and Table Scan both are same. Suppose in our example below, we have only clustered index on ID column and if we execute the below query, we get the Clustered index scan.Because here we have index on ID but not on City. In the same query instead of city if we use the ID column to filter, you get the clustered index seek.

NOTE: Index works better on a column when the redundancy of data is less i.e. less repeating values.

IF EXISTS(SELECT * FROM sys.objects WHERE name='OneIndex')

DROP TABLE OneIndex

-- Create Table OneIndex with few columns

CREATE TABLE OneIndex (ID INT,

FirstName VARCHAR(100),

LastName VARCHAR(100),

City VARCHAR(100))

-- Insert One Hundred Thousand Records

INSERT INTO OneIndex (ID,FirstName,LastName,City)

SELECT TOP 100000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,

'Bob',

CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'

ELSE 'Brown' END,

CASE

WHEN ROW_NUMBER() OVER (ORDER BY a.name)%1000 = 1 THEN 'Las Vegas'

WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'

WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'

WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'

ELSE 'Houston' END

FROM sys.all_objects a

CROSS JOIN sys.all_objects b

UPDATE OneIndex

SET City='California'

WHERE City='Las Vegas'

UPDATE OneIndex

SET City='Las Vegas'

WHERE ID =52

Table with no Index :

Now create a CLUSTERED index on ID column and search for city

Now create a CLUSTERED index on ID column and search for ID

In the below figure you can see that the clustered index is created on the ID column. The index page is structured in to B-Tree. So the search value 3 falls between 1 to 25 at the leaf node. The search is shown with the arrow in the figure. In case of clustered index the leaf node contains the data.

CASE 1 (RID Lookup): RID Lookup happens when your table has a NONCLUSTERED index without INCLUDING any columns from the SELECT and has no CLUSTERED index. If we take our example below, for RID Lookup don’t have Clustered index to point to the clustered index page.In this case SQL engine will have Row ID for each record in the data page and same is mapped in the index page too. So using Row ID, table is being scanned to get the required columns/data from the data page. This is more expensive than the key Lookup. InCase key lookup is almost nearly equal to RID lookup.

CASE 2(Key Lookup/Bookmark Lookup): Key Lookup happens when your table has a CLUSTERED INDEX and NONCLUSTERED INDEX without including all the columns in the select. If we take our example we have clustered index on ID column and Non Clustered index on City. FirstName is not included in index page and so when a city is found at the Non clustered index page it finds for FirstName but its is not there in the non clustered index page so using Key Lookup ID(which is a primary key value) it points to the Clustered index page and gets the remaining column it needs.

CASE 3(Remove Lookup’s): To remove either of lookup INCLUDE all the required columns in the NONCLUSTERED index. So that all the required columns reside in the index page itself and the data is retrieved at the faster rate. No need to again look into the data page.

Please refer the below you tube link as well.

https://www.youtube.com/watch?v=puTvvDB2XuQ

T-SQL/MSBI Knowledge Share

T-SQL/MSBI Knowledge Share Videos

Clustered/Non-Clustered Index (Key Lookup/ Book Mark Lookup and RID Lookup)

No comments: