SQL Performance Tuning using Indexes

SQL Performance Tuning using Indexes

This article looks at general guidelines to creating effective indexes using short keys, distinct keys, covering indexes and clustered indexes.

Effective indexes are one of the best ways to improve performance in a database application. Without an index, the SQL Server engine is like a reader trying to find a word in a book by examining each page. By using the index in the back of a book, a reader can complete the task in a much shorter time. In database terms, a table scan happens when there is no index available to help a query. In a table scan SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a terrific impact on performance.

One of the most important jobs for the database is finding the best index to use when generating an execution plan. Most major databases ship with tools to show you execution plans for a query and help in optimizing and tuning indexes. This article outlines several good rules of thumb to apply when creating and modifying indexes for your database. First, let’s cover the scenarios where indexes help performance, and when indexes can hurt performance.

Useful Index Queries

Just like the reader searching for a word in a book, an index helps when you are looking for a specific record or set of records with a WHERE clause. This includes queries looking for a range of values, queries designed to match a specific value, and queries performing a join on two tables. For example, both of the queries against the Northwind database below will benefit from an index on the UnitPrice column.

DELETE FROM Products WHERE UnitPrice = 1

SELECT * FROM PRODUCTS
WHERE UnitPrice BETWEEN 14 AND 16

Since index entries are stored in sorted order, indexes also help when processing ORDER BY clauses. Without an index the database has to load the records and sort them during execution. An index on UnitPrice will allow the database to process the following query by simply scanning the index and fetching rows as they are referenced. To order the records in descending order, the database can simply scan the index in reverse.

SELECT * FROM Products ORDER BY UnitPrice ASC

Grouping records with a GROUP BY clause will often require sorting, so a UnitPrice index will also help the following query to count the number of products at each price.

SELECT Count(*), UnitPrice FROM Products
GROUP BY UnitPrice

By retrieving the records in sorted order through the UnitPrice index, the database sees matching prices appear in consecutive index entries, and can easily keep a count of products at each price. Indexes are also useful for maintaining unique values in a column, since the database can easily search the index to see if an incoming value already exists. Primary keys are always indexed for this reason.

Index Drawbacks

Indexes are a performance drag when the time comes to modify records. Any time a query modifies the data in a table the indexes on the data must change also. Achieving the right number of indexes will require testing and monitoring of your database to see where the best balance lies. Static systems, where databases are used heavily for reporting, can afford more indexes to support the read only queries. A database with a heavy number of transactions to modify data will need fewer indexes to allow for higher throughput. Indexes also use disk space. The exact size will depends on the number of records in the table as well as the number and size of the columns in the index. Generally this is not a major concern as disk space is easy to trade for better performance.

Building The Best Index

There are a number of guidelines to building the most effective indexes for your application. From the columns you select to the data values inside them, consider the following points when selecting the indexes for your tables.

Short Keys

Having short index is beneficial for two reasons. First, database work is inherently disk intensive. Larger index keys will cause the database to perform more disk reads, which limits throughput. Secondly, since index entries are often involved in comparisons, smaller entries are easier to compare. A single integer column makes the absolute best index key because an integer is small and easy for the database to compare. Character strings, on the other hand, require a character by character comparison and attention to collation settings.

Distinct Keys

The most effective indexes are the indexes with a small percentage of duplicated values. As an analogy, think of a phone book for a town where almost everyone has the last name of Smith. A phone book in this town is not very useful if sorted in order of last name, because you can only discount a small number of records when you are looking for a Smith.

An index with a high percentage of unique values is a selective index. Obviously, a unique index is highly selective since there are no duplicate entries. Many databases will track statistics about each index so they know how selective each index is. The database uses these statistics when generating an execution plan for a query.


Covering Queries

Indexes generally contain only the data values for the columns they index and a pointer back to the row with the rest of the data. This is similar to the index in a book: the index contains only the key word and then a page reference you can turn to for the rest of the information. Generally the database will have to follow pointers from an index back to a row to gather all the information required for a query. However, if the index contains all of he columns needed for a query, the database can save a disk read by not returning to the table for more information.

Take the index on UnitPrice we discussed earlier. The database could use just the index entries to satisfy the following query.

SELECT Count(*), UnitPrice FROM Products
GROUP BY UnitPrice

We call these types of queries covered queries, because all of the columns requested in the output are covered by a single index. For your most crucial queries, you might consider creating a covering index to give the query the best performance possible. Such an index would probably be a composite index (using more than one column), which appears to go against our first guideline of keeping index entries as short as possible. Obviously this is another tradeoff you can only evaluate with performance testing and monitoring.

Clustered Indexes

Many databases have one special index per table where all of the data from a row exists in the index. SQL Server calls this index a clustered index. Instead of an index at the back of a book, a clustered index is closer in similarity to a phone book because each index entry contains all the information you need, there are no references to follow to pick up additional data values.

As a general rule of thumb, every non-trivial table should have a clustered index. If you only create one index for a table, make the index a clustered index. In SQL Server, creating a primary key will automatically create a clustered index (if none exists) using the primary key column as the index key. Clustered indexes are the most effective indexes (when used, they always cover a query), and in many databases systems will help the database efficiently manage the space required to store the table.

When choosing the column or columns for a clustered index, be careful to choose a column with static data. If you modify a record and change the value of a column in a clustered index, the database might need to move the index entry (to keep the entries in sorted order). Remember, index entries for a clustered index contain all of the column values, so moving an entry is comparable to executing a DELETE statement followed by an INSERT, which can obviously cause performance problems if done often. For this reason, clustered indexes are often found on primary or foreign key columns. Key values will rarely, if ever, change.

Conclusion

Determining the correct indexes to use in a database requires careful analysis, benchmarking, and testing. The rules of thumb presented in this article are general guidelines. After applying these principals you need to retest your specific application in your specific environment of hardware, memory, and concurrent activity. See my previous article: SQL Server Indexes, for a more thorough introduction.

Analyze and Fix Index Fragmentation in SQL Server 2008

Analyze and Fix Index Fragmentation in SQL Server 2008

It is very common that over time SQL Server tables and indexes tend to become fragmented. The fragmentation generally happens when data within the underlying tables on which an index exists is modified. The data modification basically can be an insert, update or a delete operation. The indexes over time become ineffective because they get fragmented. In this article you will see an example of how an index gets fragmented and the steps which database administrator needs to take to fix index fragmentations.

Example to Analyze and Fix Index Fragmentation in SQL Server 2008

Follow the below mentioned steps to see how an index fragmentation occurs on a table which has indexes defined on it. And finally you will see the steps which you need to take to fix index fragmentation issues.

Create AnalyzeFragmentation Database

First let us create a new database named AnalyzeFragmentation for this example. Database can be created by executing the below mentioned TSQL Query.

Use master
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'AnalyzeFragmentation')
DROP DATABASE [AnalyzeFragmentation]
GO
CREATE DATABASE AnalyzeFragmentation
GO

Create FindAndFixFragmentation Table in AnalyzeFragmentation Database

The next step will be to create a new table named FindAndFixFragmentation within the AnalyzeFragmentation database.


USE AnalyzeFragmentation
GO
IF OBJECT_ID (N'dbo.FindAndFixFragmentation', N'U') IS NOT NULL
DROP TABLE dbo.FindAndFixFragmentation;
GO

— Create FindAndFixFragmentation Table–
CREATE TABLE [dbo].[FindAndFixFragmentation]
(

[AddressID] [int] NOT NULL,
[AddressLine1] [nvarchar](60) NOT NULL,
[City] [nvarchar](30) NOT NULL,
[PostalCode] [nvarchar](15) NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
[RowGUID] [UNIQUEIDENTIFIER] NOT NULL

)
ON [PRIMARY]

GO

Populate the FindAndFixFragmentation Table using the below TSQL code

The next step will be to populate the FindAndFixFragmentation table which you have created earlier by executing the below mentioned TSQL code. For this example we will be using the data which is available in Person.Address table available in AdventureWorks database.


USE AnalyzeFragmentation
GO

— Populate FindAndFixFragmentation table with data from AdventureWorks.Person.Address —

INSERT INTO FindAndFixFragmentation
SELECT
AddressID,
AddressLine1,
City,
PostalCode,
ModifiedDate,
RowGUID
FROM AdventureWorks.Person.Address
GO

Create a Clustered Index on FindAndFixFragmentation Table using the below TSQL code

The next step will be to create a clustered index named CL_FindAndFixFragmentation_Index on FindAndFixFragmentation table using the below mentioned TSQL code.

-- Drop the index if it is already existing--
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[FindAndFixFragmentation]') AND name = N'CL_FindAndFixFragmentation_Index')

DROP INDEX [CL_FindAndFixFragmentation_Index] ON [dbo].[FindAndFixFragmentation]
GO
— Create Clustered Index on FindAndFixFragmentation(RowGUID) —
CREATE CLUSTERED INDEX [CL_FindAndFixFragmentation_Index] ON [dbo].[FindAndFixFragmentation]
(
[RowGUID] ASC
)
WITH (FILLFACTOR = 90) ON [PRIMARY]
GO

You can see that we are creating a clustered index on FindAndFixFragmentation table with a Fill Factor 90. The fill factor option is basically provided for fine tuning index data storage and to improve performance. Whenever an index is created or it is rebuilt, the fill factor value basically determines the percentage of space on each leaf level page that needs to be filled with data. Based on the fill factor value a percentage of free space is allocated on every single page. By default the fill factor value is 0 or 100 which means there will be no free space allocated on each leaf level page. The value for fill factor is defined in percentages and this can be any value in between 1 to 100. In this example the fill factor value provide is 90 which mean on every single page there will be a 10 percentage of free space left to accommodate future growth.

Query to Find Existing Fragmentation on FindAndFixFragmentation Table

Next step will be to execute the below mentioned TSQL query to know the existing fragmentation on FindAndFixFragmentation table. The important values which need to be noted by the database administrators are AvgPageFragmentation and PageCounts. The value for AvgPageFragmentation is 0.341296928327645, which means there is a very little fragmentation existing on the table at this point of time. However the value for PageCounts is 293, which mean the data is stored in that many data pages on SQL Server. This query will be executing many a times in this article.


-- Find index fragmentation --
SELECT
DB_NAME(DATABASE_ID) AS [DatabaseName],
OBJECT_NAME(OBJECT_ID) AS TableName,
SI.NAME AS IndexName,
INDEX_TYPE_DESC AS IndexType,
AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation,
PAGE_COUNT AS PageCounts
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS
INNER JOIN sysindexes SI
ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID
GO

Perform Update Operation on FindAndFixFragmentation Table

Next step will be to perform updates on FindAndFixFragmentation table by executing the below mentioned TSQL code. This query will modify all the data for RowGUID column on which we have created clustered index with fill factor as 90.

— Update all the rows within to FindAndFixFragmentation table create index fragmentation —

USE AnalyzeFragmentation
GO
UPDATE FindAndFixFragmentation
SET RowGUID =NEWID()
GO

Execute the query to find existing fragmentation on FindAndFixFragmentation table as shown in the below snippet.

Now you can see that the value for AvgPageFragmentation has changed from 0.341296928327645 to 99.0049751243781, which means index is completely fragmentation. At the same time the value for PageCounts has changed from 293 to 603, which mean more number of data pages are required to store the content. Now the question which comes to your mind is how this can be fixed.

There are two methods to fix index fragmentation issues in SQL Server 2005 and higher versions. The two methods are Reorganize or Rebuild Index. The Reorganize Index is an online operation, however Rebuild Index is not an online operation until you have specified the option ONLINE=ON while performing the Rebuild. Next step will be to perform first REORGANIZE Index option and then finally perform we will perform the REBUILD and see which options is the best.

Perform Reorgainize Index Operation on Clustered Index of FindAndFixFragmentation Table

First let us perform REORGANIZE Index operation on the clustered index, and then execute the query as shown in the snippet to find the fragmentation on FindAndFixFragmentation table.


-- Reorganize [CL_FindAndFixFragmentation_Index] index on FindAndFixFragmentation --

ALTER INDEX [CL_FindAndFixFragmentation_Index] ON FindAndFixFragmentation

REORGANIZE;

GO

Once we have performed the REORGANIZE Index operation you can see that the value for AvgPageFragmentation has changed from 99.0049751243781 to 5.70469798657718, which means index fragmentation is much better that how it was earlier. And at the same time the value for PageCounts has also come down from 603 to 298, this is considerable improvement.

Perform Rebuild Index Operation on Clustered Index of FindAndFixFragmentation Table

Now let us perform REBUILD Index operation on the clustered index, when you are using the Rebuild index operation it basically drops and recreates the index. The important thing what we need to see is does this results in reducing the index fragmentation further down from 5.70469798657718. Once you have performed the Rebuild operation execute the query as shown in the snippet to check the fragmentation on FindAndFixFragmentation table.


-- Rebuild [CL_FindAndFixFragmentation_Index] index on FindAndFixFragmentation --

ALTER INDEX [CL_FindAndFixFragmentation_Index] ON FindAndFixFragmentation
REBUILD WITH (FILLFACTOR = 90, ONLINE=ON)
GO

You can see that the value for AvgPageFragmentation is back to 0.341296928327645, which means the fragmentation is same as it was when we began this exercise. And at the same time the value for PageCounts is back to 293. This proves that using REBUILD Index operation is better than REORGANIZE Index operation.

Reorganize Index

Reorganize Index uses minimal system resources and it is performed online. The biggest advantage is it does not require locks for long time therefore it does not block updates or other user queries. If the index fragmentation ranges in between 5% to 30% then it is better to perform Reorganize Index.

Rebuild Index

Rebuild Index basically drops and recreates the index; this is by far the best approach. If the index fragmentation is greater than 30% then the best strategy will be to use Rebuild Index instead of Reorganize Index.

Conclusion

Database Administrators should always make sure that fragmentation of indexes is handled on time. If the indexes are fragmented then the query response will not only be very slow; the data storage will also require more disk space. In this article you have seen an example where the clustered index gets fragmented over time and the steps which you need to perform to resolve index fragmentation issues.

SQL Tuning or SQL Optimization

SQL Tuning or SQL Optimization

Sql Statements are used to retrieve data from the database. We can get same results by writing different sql queries. But use of the best query is important when performance is considered. So you need to sql query tuning based on the requirement. Here is the list of queries which we use reqularly and how these sql queries can be optimized for better performance.

SQL Tuning/SQL Optimization Techniques:

1) The sql query becomes faster if you use the actual columns names in SELECT statement instead of than ‘*’.

For Example: Write the query as

SELECT id, first_name, last_name, age, subject FROM student_details;

Instead of:

SELECT * FROM student_details;

2) HAVING clause is used to filter the rows after all the rows are selected. It is just like a filter. Do not use HAVING clause for any other purposes.
For Example: Write the query as

SELECT subject, count(subject)
FROM student_details
WHERE subject != 'Science'
AND subject != 'Maths'
GROUP BY subject;

Instead of:

SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject!= 'Vancouver' AND subject!= 'Toronto';

3) Sometimes you may have more than one subqueries in your main query. Try to minimize the number of subquery block in your query.
For Example: Write the query as

SELECT name
FROM employee
WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = 'Electronics';

Instead of:

SELECT name
FROM employee
WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND emp_dept = 'Electronics';

4) Use operator EXISTS, IN and table joins appropriately in your query.
a) Usually IN has the slowest performance.
b) IN is efficient when most of the filter criteria is in the sub-query.
c) EXISTS is efficient when most of the filter criteria is in the main query.

For Example: Write the query as

Select * from product p
where EXISTS (select * from order_items o
where o.product_id = p.product_id)

Instead of:

Select * from product p
where product_id IN
(select product_id from order_items

5) Use EXISTS instead of DISTINCT when using joins which involves tables having one-to-many relationship.
For Example: Write the query as

SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS ( SELECT 'X' FROM employee e WHERE e.dept = d.dept);

Instead of:

SELECT DISTINCT d.dept_id, d.dept
FROM dept d,employee e
WHERE e.dept = e.dept;

6) Try to use UNION ALL in place of UNION.
For Example: Write the query as

SELECT id, first_name
FROM student_details_class10
UNION ALL
SELECT id, first_name
FROM sports_team;

Instead of:

SELECT id, first_name, subject
FROM student_details_class10
UNION
SELECT id, first_name
FROM sports_team;

7) Be careful while using conditions in WHERE clause.
For Example: Write the query as

SELECT id, first_name, age FROM student_details WHERE age > 10;

Instead of:

SELECT id, first_name, age FROM student_details WHERE age != 10;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE 'Chan%';

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE SUBSTR(first_name,1,3) = 'Cha';

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE NVL ( :name, '%');

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE first_name = NVL ( :name, first_name);

Write the query as

SELECT product_id, product_name
FROM product
WHERE unit_price BETWEEN MAX(unit_price) and MIN(unit_price)

Instead of:

SELECT product_id, product_name
FROM product
WHERE unit_price >= MAX(unit_price)
and unit_price <= MIN(unit_price)

Write the query as

SELECT id, name, salary
FROM employee
WHERE dept = 'Electronics'
AND location = 'Bangalore';

Instead of:

SELECT id, name, salary
FROM employee
WHERE dept || location= 'ElectronicsBangalore';

Use non-column expression on one side of the query because it will be processed earlier.

Write the query as

SELECT id, name, salary
FROM employee
WHERE salary < 25000;

Instead of:

SELECT id, name, salary
FROM employee
WHERE salary + 10000 < 35000;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE age > 10;

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE age NOT = 10;

8) Use DECODE to avoid the scanning of same rows or joining the same table repetitively. DECODE can also be made used in place of GROUP BY or ORDER BY clause.
For Example: Write the query as

SELECT id FROM employee
WHERE name LIKE 'Ramesh%'
and location = 'Bangalore';

Instead of:

SELECT DECODE(location,'Bangalore',id,NULL) id FROM employee
WHERE name LIKE 'Ramesh%';

9) To store large binary objects, first place them in the file system and add the file path in the database.

10) To write queries which provide efficient performance follow the general SQL standard rules.

a) Use single case for all SQL verbs
b) Begin all SQL verbs on a new line
c) Separate all words with a single space
d) Right or left aligning verbs within the initial SQL verb