What is an Index on SQL Server?

Hello and welcome back to another post.

Have you ever used an app or a report that takes years to return the data you want to see? I have seen this many times over different platforms. This happens because the queries or the data model behind the app, report or whatever method you use to fetch data is not optimized, and it could be because of several things.

In this post, I’m going to talk about one of the many ways that we have to optimize our queries: using indexes.

SO, WHAT IS AN INDEX?

What is an index? Picture the index of a book, if you are trying to find information in a book but you are in a hurry and you need it fast, you go to the index and since it has an order, you go directly to the letter your items starts with and it will tell you in which page is located, instead of going page by page looking for the information you need.

The same happens with an index in a SQL Server table. An index is a physical data structure that enables queries to get to the data fast. SQL Server engine uses indexes to find the physical location of the row you’re looking for.

In SQL Server, we have a component called the Query Optimizer, which is the component in charge of deciding to use an existing index or not. This generally does not happen, though, as SQL Server will try to use the best index for your query (unless you don’t have any). If you are a book reader, you can decide to use the book’s index or not. Another difference is that in a book, the page you’re looking for, will never change. However, the pages where the data is in SQL Server, might change because of different factors.

TYPES OF INDEXES AND TERMINOLOGY

There are several type of indexes, I’m not going to cover all of them or go deep to the internals, I just want you to know what they are, what are they used for and know of its existence so you can use them and make your queries run faster. What I do want you to know is that, the data structure behind the index is called a B-Tree. Which is a balanced tree where we have a root node, intermediate and leaf nodes, like this one below:

HEAP – Basically, a heap is a table without a clustered index. A data structure that stores data with no order at all.

CLUSTERED INDEX – A clustered index is just a copy of the table physically ordered, this is the reason why we can only have one clustered index in our tables. In this type of index, the leaf pages have the data pages where the information actually is.

NON CLUSTERED INDEX – A non clustered index uses the same b-tree as the clustered, however, with this type of index the data does not live in the leaf nodes of the index, but pointers to the pages where the data is. This index does not changes the physical order of the table if a non clustered index is defined. Starting with SQL Sever 2008, we can have up to 999 non clustered indexes in our tables.

But, since they come with a cost, we should not go crazy and start creating indexes everywhere, these will increase the size of the table and you could potentially run out of space, so be careful.

Another thing, we also have the PRIMARY KEY and UNIQUE KEY. When you are designing a table and you add a PRIMARY KEY, by default a CLUSTERED INDEX will be created, this is also why we can only have just one primary key in our table. For UNIQUE KEYS, these are created as non clustered indexes in our tables. You can have one or more UNIQUE KEYS although is not common. There are a few more, but I’ll leave those for some other time. Now, we should focus on these.

SO, HOW DO I CREATE AN INDEX?

The basic syntax to create an index is very easy, let me show you. Suppose you have a table called Orders and gets hit a lot from users, and sometimes there’s a lot of things going on the database and your data takes a while to come back. First, I’m going to show you the Execution Plan of the query and see what is doing.

The engine is doing a Full Table Scan, this happens because we don’t have any index in place. If I hover over the Table Scan operator, I am able to see the properties.

It’s like going through the entire book just to find a word!

Look at Number of Rows Read says that it had to go through all the rows in the table to bring back only the six we need for that customer, as stated in Actual Number of Rows for All Executions. Now, since I know the table is being queried by the CustomerID column, I’m going to create an index there and see how it affects our query.

CREATE INDEX IX_Orders_CustomerID -- Name of the index

ON dbo.Orders -- Table where the index will be

(

CustomerID -- Column to index

)

CREATE INDEX IX_Orders_CustomerID -- Name of the index ON dbo.Orders -- Table where the index will be ( CustomerID -- Column to index ) GO

CREATE INDEX IX_Orders_CustomerID	-- Name of the index
ON dbo.Orders				        -- Table where the index will be
(	
    CustomerID			           -- Column to index
)
GO

I always try to follow that naming convention as a best practice. I’m pretty sure everybody has its own, but as long as you keep your names standard, it’s okay. In this case, as you can see I am just creating an index, but it doesn’t say whether if it’s a Clustered, Non-Clustered, Unique or Primary Key. By default, SQL Server will create this index as Non-Unique and Non-Clustered. If you want a Unique or Clustered index, you will have to specify it.

BUT WAIT, DOES IT REALLY HELP MY QUERY?

Yes it does. Once I have my index created, let’s run the same query again and see the Execution Plan.

In the picture above of the Execution Plan, we see that the Table Scan operator disappeared and now we have an Index Seek operator, which is good for our query, it means that our index works. Buuuuuuuut… there’s something curious .

Do you see that? There’s an RID Lookup Operator also there. Hmm, why? Well, that’s because in this table we don’t have a primary key, so the optimizer has to look up for the rows it needs to return. I’m going to talk more about that later in another post.

Last, I would like to show you how can you add a Primary Key once your table is already created, and the execution plan after we run the query and how the index works. Let’s go.

ALTER TABLE dbo.Orders

ADD CONSTRAINT PK_Orders_OrderID PRIMARY KEY

(

OrderID

)

ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders_OrderID PRIMARY KEY ( OrderID ) GO

ALTER TABLE dbo.Orders
ADD CONSTRAINT PK_Orders_OrderID PRIMARY KEY
(
    OrderID
)
GO

You need to execute an ALTER TABLE command and ADD CONSTRAINT and you have to tell SQL Server that you are creating a PRIMARY KEY, you also need to define the column name you decided will be the primary key, and remember, it can’t contain NULLs. So if you define a column that has NULLs as your primary key, you’ll get an error message.

Now, if we run the query again:

There you go, we got rid of the RID Lookup operator and we have just our nice and well behaved Index Seek. I could go on and on about indexes, but I wanted to keep this post short and concise

… Anyways, I hope you guys liked the post, also remember that you can find the code for this and other posts in the following link: https://github.com/learningsql/sql-code I f you have any questions or comments, shoot me an email, leave a message or tweet me here. Don’t forget to like and share. See you soon. Cheers! If you want to receive updates and new blog posts to your inbox, subscribe to my LearningSQL with Beno email list down here. [mc4wp_form id=”331″]

Discover more from Ben Rodríguez

Subscribe to get the latest posts sent to your email.