What is an efficient method of paging through very large result sets in SQL Server 2005?


EDIT: I'm still waiting for more answers. Thanks!

Back in the SQL 2000 days, I used the temp table method: create a temp table with a new identity column and primary key, then select the rows where the identity column is between A and B.
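
For readers who have not seen the temp table method, here is a minimal sketch of it against the table from the query further down. The temp table name #PagedPosts, its RowNum column, and the parameter values are made up for illustration; this is one common way to write it, not necessarily what anyone here ran.

```sql
-- Hypothetical paging parameters (1-based row positions).
DECLARE @startRowIndex int, @maximumRows int
SET @startRowIndex = 21
SET @maximumRows = 10

-- Temp table with its own identity column as the primary key.
CREATE TABLE #PagedPosts (
    RowNum int IDENTITY(1, 1) PRIMARY KEY,
    postID int NOT NULL
)

-- Identity values are assigned following the ORDER BY of the INSERT ... SELECT.
INSERT INTO #PagedPosts (postID)
SELECT postID
FROM MyTable
ORDER BY postDate DESC, postID DESC

-- Join back to the base table for the requested page only.
SELECT t.postID, t.postTitle, t.postDate
FROM #PagedPosts p
JOIN MyTable t ON t.postID = p.postID
WHERE p.RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
ORDER BY p.RowNum

DROP TABLE #PagedPosts
```

Note that this copies every key into tempdb on each request, so the cost grows with the size of the whole result set, not with the page size.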

When SQL 2005 came along I found out about Row_Number() and I've been using it ever since...

But now I have found a serious performance issue with Row_Number(). It performs very well when the result set is not too large and the sort is over an identity column. However, it performs very poorly on large result sets (over 10,000 records) sorted by a non-identity column. Even when sorting by an identity column, Row_Number() performs poorly once the result set exceeds 250,000 records. For me, it got to the point where it throws an error: "command timeout!"

What do you use to paginate a large result set on SQL 2005? Is the temp table method still better in this case? I'm not sure whether this method, using a temp table with SET ROWCOUNT, will perform better... And some say it can give wrong row numbers if you have a multi-column primary key.

In my case, I need to be able to sort the result set by a date type column... for my production web app.

Let me know what you use for high-performing pagination in SQL 2005. I'd also like to know a smart way of creating indexes. I suspect that choosing the right primary keys and/or indexes (clustered/non-clustered) will play a big role here.

Thanks in advance.

P.S. Does anyone know what stackoverflow uses?

EDIT: Mine looks something like...

SELECT postID, postTitle, postDate
FROM
   (SELECT postID, postTitle, postDate, 
         ROW_NUMBER() OVER(ORDER BY postDate DESC, postID DESC) as RowNum
    FROM MyTable
   ) as DerivedMyTable
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

postID: Int, Identity (auto-increment), Primary key

postDate: DateTime

EDIT: Is everyone using Row_Number()?

2 answers

#1


0  

Well, for your sample query ROW_NUMBER should be pretty fast with thousands of rows, provided you have an index on your PostDate field. If you don't, the server needs to perform a complete clustered index scan on your PK: practically load every page, fetch your PostDate field, sort by it, determine the rows to extract for the result set, and then fetch those rows again. It is like creating a temp index over and over again (you might see a table/index spool in the plan).

No wonder you get timeouts.

My suggestion: create an index on PostDate DESC, since that is what ROW_NUMBER orders over: (ORDER BY PostDate DESC, ...)
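
As a rough sketch of what such an index could look like for the sample query in the question (the index name is made up, and including postTitle assumes it is a type that is allowed as an included column):

```sql
-- Hypothetical index name. The key order mirrors ORDER BY postDate DESC, postID DESC.
-- INCLUDE (available from SQL Server 2005) adds postTitle at the leaf level,
-- so the sample query can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_MyTable_postDate_postID
ON MyTable (postDate DESC, postID DESC)
INCLUDE (postTitle)
```

With the key in the same order as the ORDER BY, the server can read rows already sorted instead of sorting the whole table for each page. Whether the optimizer actually picks the index is something to confirm in the execution plan.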

As for the article you are referring to: I've done a fair amount of paging with SQL Server 2000 in the past without ROW_NUMBER, and the approach used in the article is the most efficient one. It does not work in all circumstances (you need unique or almost unique values). An overview of some other methods is here.
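
The article itself is not reproduced in this thread, so the following is only an assumption about the kind of technique meant: the SET ROWCOUNT approach mentioned in the question is commonly written as a two-step seek. The sketch adapts it to the two-column sort from the question; the variable names and parameter values are made up.

```sql
-- Hypothetical paging parameters (1-based row positions).
DECLARE @startRowIndex int, @maximumRows int
SET @startRowIndex = 21
SET @maximumRows = 10

DECLARE @firstDate datetime, @firstID int

-- Step 1: read only the first @startRowIndex rows in sort order.
-- The variables are intended to end up holding the last row read,
-- which is the first row of the requested page.
SET ROWCOUNT @startRowIndex
SELECT @firstDate = postDate, @firstID = postID
FROM MyTable
ORDER BY postDate DESC, postID DESC

-- Step 2: return one page starting at that row.
SET ROWCOUNT @maximumRows
SELECT postID, postTitle, postDate
FROM MyTable
WHERE postDate < @firstDate
   OR (postDate = @firstDate AND postID <= @firstID)
ORDER BY postDate DESC, postID DESC

-- Reset so later statements are not limited.
SET ROWCOUNT 0
```

Using postID as a tie-breaker makes the sort key unique, which is the condition mentioned above. Step 1 relies on the variable assignment following the ORDER BY, so it is worth verifying the results on your own data before trusting it.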

#2


7  

The row_number() technique should be quick. I have seen good results for 100,000 rows.

Are you using row_number() in a way similar to the following?

SELECT column_list
FROM
   (SELECT column_list,
         ROW_NUMBER() OVER(ORDER BY OrderByColumnName) as RowNum
    FROM MyTable m
   ) as DerivedTableName
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

...and do you have a covering index for the column_list and/or an index on the 'OrderByColumnName' column?
