Friday, December 2, 2011

Efficiently Paging Through Large Result Sets in SQL Server 2000


Introduction
When displaying large amounts of data to a user, it's important that the information be presented in digestible chunks rather than slamming it all down their throats at once. For example, searching for "Microsoft" on Google returns approximately 2,190,000,000 results, but thankfully Google only shows me ten of those at a time. When providing paging access to particularly large result sets - tables with tens of thousands, hundreds of thousands, or millions of records - care must be taken in querying the data such that only the particular page of data requested is returned.
Both the DataGrid in ASP.NET 1.x and the GridView in ASP.NET 2.0 provide two paging flavors:
  • Default paging - easy to implement but naively retrieves all records from the database and then trims the results to show only those for the requested page
  • Custom paging - requires the developer to correctly retrieve only the precise subset of records to show for the current page; requires more effort to implement
While default paging is quick and easy to implement, for sufficiently-sized result sets custom paging is a must. In March 2006 I authored a couple of articles here on 4Guys on how to efficiently page through large result sets using custom paging in the GridView with the ObjectDataSource and SQL Server 2005's new ROW_NUMBER() keyword. This new keyword simplifies efficiently retrieving a particular subset of data ordered by row number. After authoring these articles, I received many requests from readers for a look at how to implement this type of efficient custom paging using SQL Server 2000 (which lacks the ROW_NUMBER() keyword).
A previous article on 4Guys, Paging through Records using a Stored Procedure by Daniel Anderson, provides one approach that far outperforms the default paging implementation. However, it has a few areas that can be updated to improve its performance. This article looks at updating Daniel's stored procedure to provide an even more efficient approach. The stored procedure presented at the end of this article can be used from classic ASP applications, for custom paging with the DataGrid in ASP.NET 1.x, or by the ObjectDataSource to provide custom paging for the GridView in ASP.NET 2.0 applications. Read on to learn more!


Improving the Stored Procedure's Performance
Daniel's approach is costly in part because it dumps all of the columns from the table being paged through into the temp table, which requires a complete table scan of the data being paged through. Instead, we can create the table variable to hold only the primary key column(s) of the table being paged through. Since these values likely exist in a clustered index, populating the table variable can likely be done entirely from memory, without needing to hit the disk.
The remaining columns of the table being paged through are then returned by the stored procedure's final query, which joins the requested slice of the table variable back to the source table. After modernizing and optimizing Daniel's original stored procedure we end up with:

CREATE PROCEDURE PagedResults_New
(
   @startRowIndex   int,   -- 1-based index of the first row of the page to return
   @maximumRows     int    -- number of records per page
)
AS


-- Create a table variable to hold just the primary key values
DECLARE @TempItems TABLE
(
   ID int IDENTITY,
   EmployeeID int
)

-- Dump the primary key values from Employees into the table variable
INSERT INTO @TempItems (EmployeeID)
SELECT EmployeeID
FROM Employees

-- Now, return the set of paged records
SELECT e.*, d.[Name] as DepartmentName
FROM @TempItems t
   INNER JOIN Employees e ON
      e.EmployeeID = t.EmployeeID
   INNER JOIN Departments d ON
      d.DepartmentID = e.DepartmentID
WHERE t.ID BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
GO
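
To fetch a given page, the stored procedure is simply invoked with the starting row and the page size. For example, a call along these lines (the values are arbitrary, for illustration) returns the second page of ten employees:

-- Return rows 11 through 20 - the second page of ten records
EXEC PagedResults_New @startRowIndex = 11, @maximumRows = 10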



Comparing the Performance of the Old and New Approach
To ascertain the performance improvement gained by restricting the table variable to only the primary key column(s) of the table being paged through, I used SQL Profiler to compare the two stored procedures against the Employees table at different record counts. Keep in mind that these results are highly unscientific. I ran each query numerous times to "warm it up," then recorded nine test runs and computed the average. But still, I was doing this on my personal machine while running a variety of other programs, and I did not simulate the load that multiple, concurrent visitors to a website would incur. The results follow and are also available at the end of this article as an Excel spreadsheet. In the New stored procedure results, the percentage improvement over the old approach is included for reference:

        Records in Employees   Avg. Reads                    Avg. Duration (in milliseconds)
Old     50,000                 55,360                        425
        100,000                110,492                       850
        200,000                220,726                       1,567
New     50,000                 51,637 (6.7% improvement)     307 (27.7% improvement)
        100,000                103,028 (6.7% improvement)    536 (36.9% improvement)
        200,000                205,876 (6.7% improvement)    1,033 (34.1% improvement)
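
If you'd like a rough feel for the reads and duration on your own server without firing up SQL Profiler, SET STATISTICS IO and SET STATISTICS TIME offer a quick-and-dirty alternative (a sketch; the numbers won't match Profiler's exactly):

-- Report logical reads and elapsed time for a single call
SET STATISTICS IO ON
SET STATISTICS TIME ON

EXEC PagedResults_New @startRowIndex = 11, @maximumRows = 10

SET STATISTICS IO OFF
SET STATISTICS TIME OFF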

During my testing I wondered if specifying the table variable's ID column as a primary key would boost performance. Marking the ID column as a primary key creates a clustered index, meaning that the results would be sorted by the ID column and the index could be used when finding the correct subset of records in the final query. However, the index incurs additional overhead when dumping the contents of the table being paged through into the table variable. The spreadsheet at the end of this article shows the numbers, but in my tests marking the ID column as a primary key exhibited worse performance than not marking it, although it was still better than the "old" approach of dumping the entire table contents into the temp table.
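For reference, the primary key variant of the table variable declaration would look something like the following (only the DECLARE changes; the rest of the stored procedure stays the same):

-- Variant tested: marking ID as the primary key creates a clustered index on the table variable
DECLARE @TempItems TABLE
(
   ID int IDENTITY PRIMARY KEY,
   EmployeeID int
)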
While this approach is, perhaps, acceptable for paging through large result sets in SQL Server 2000, as illustrated in Custom Paging in ASP.NET 2.0 with SQL Server 2005, SQL Server 2005's ROW_NUMBER() technique affords much better performance, taking a mere 3 milliseconds on average to return 10 records from a table of 50,000. Compare that to the approach examined in this article, which consumed 307 milliseconds - two orders of magnitude worse!
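For those curious, the SQL Server 2005 approach numbers the rows with ROW_NUMBER() in a derived table and grabs just the requested slice, roughly along these lines (a sketch; see the Custom Paging in ASP.NET 2.0 with SQL Server 2005 article for the full details):

-- SQL Server 2005 only: number the rows, then return just the requested page
SELECT *
FROM
(
   SELECT e.*, d.[Name] AS DepartmentName,
          ROW_NUMBER() OVER (ORDER BY e.EmployeeID) AS RowNum
   FROM Employees e
      INNER JOIN Departments d ON
         d.DepartmentID = e.DepartmentID
) AS PagedEmployees
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1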
Before you swear off the approach discussed in this article, keep in mind that bringing all of the records back to the web server - as the GridView's default paging does - takes considerably longer than dumping their primary keys into a table variable. From the Custom Paging in ASP.NET 2.0 with SQL Server 2005 article, paging through 50,000 records using the GridView's default paging, which retrieves all records from the Employees table, took 1,411 milliseconds on average. Also, since ROW_NUMBER() is a feature new to SQL Server 2005, if you're still using SQL Server 2000 you're stuck with the table variable technique.
Conclusion
In this article we looked at how to retrieve a specific page of data from a table using a stored procedure. This technique is useful for providing efficient, custom paging through large result sets with the ASP.NET 1.x DataGrid and ASP.NET 2.0 GridView. This technique can also be used in classic ASP applications or any other data-driven application that needs to work with paged data. If you are using SQL Server 2005, you should instead use the ROW_NUMBER() keyword as discussed in Custom Paging in ASP.NET 2.0 with SQL Server 2005.

http://www.4guysfromrolla.com/webtech/041206-1.shtml

