Creating an index on a timestamp to optimize a query


I have a query of the following form:

SELECT * FROM MyTable WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime]

I would like to optimize this query, and I am thinking about putting an index on timestamp, but am not sure if this would help. Ideally I would like to make timestamp a clustered index, but MySQL does not support clustered indexes, except for primary keys.

  • MyTable has 4 million+ rows.
  • Timestamp is actually of type INT.
  • Once a row has been inserted, it is never changed.
  • The number of rows with any given Timestamp is on average about 20, but could be as high as 200.
  • Newly inserted rows have a Timestamp that is greater than most of the existing rows, but could be less than some of the more recent rows.

Would an index on Timestamp help me to optimize this query?

4 answers

#1


38  

No question about it. Without the index, your query has to look at every row in the table. With the index, the query will be pretty much instantaneous as far as locating the right rows goes. The price you'll pay is a slight performance decrease in inserts; but that really will be slight.
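
A minimal sketch of what this looks like in practice, assuming the table and column names from the question. The index name idx_mytable_timestamp and the two bounds are placeholders, not values from the original post. If the optimizer uses the index, EXPLAIN will typically report the access type as range with the new index as the key, rather than ALL (a full table scan). It may still choose a full scan when the range covers a large share of the table.

```sql
-- Add a secondary B-tree index on the Timestamp column.
-- idx_mytable_timestamp is a hypothetical name.
CREATE INDEX idx_mytable_timestamp ON MyTable (`Timestamp`);

-- Inspect the plan for the range query; the bounds are placeholder values.
EXPLAIN
SELECT *
FROM MyTable
WHERE `Timestamp` > 1325376000
  AND `Timestamp` < 1325462400;
```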

#2


7  

You should definitely use an index. MySQL has no clue what order those timestamps are in, and in order to find a record for a given timestamp (or timestamp range) it needs to look through every single record. And with 4 million of them, that's quite a bit of time! Indexes are your way of telling MySQL about your data -- "I'm going to look at this field quite often, so keep a list of where I can find the records for each value."

Indexes in general are a good idea for regularly queried fields. The main downsides of defining indexes are that they use extra storage space and add a small cost to writes, so unless you're really tight on space, you should try to use them. If an index doesn't apply to a query, MySQL will simply not use it.
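
One way to see the storage cost is to ask MySQL for it. This is a sketch assuming the table name from the question and that it lives in the currently selected database. The sizes reported by information_schema are approximate, especially for InnoDB.

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM MyTable;

-- Approximate data and index sizes, in bytes, for the table
-- in the currently selected database.
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'MyTable';
```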

#3


4  

I don't disagree with the importance of indexing to improve select query times, but if you can index on other keys (and form your queries around those indexes), you may not need an index on timestamp.

For example, if you have a table with timestamp, category, and userId, it may be better to create an index on userId instead. In a table with many different users this will reduce considerably the remaining set on which to search the timestamp.
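
A sketch of that idea, using a hypothetical events table (the table name, column types, index name, and literal values are all assumptions for illustration). The answer suggests indexing userId alone. The composite index on (userId, timestamp) shown here is a variation, not something the answer proposes: it lets one index serve both the equality test and the range test. Either way, it only helps queries that actually filter on userId, which the query in the question does not.

```sql
-- Hypothetical table for illustration; names and types are assumptions.
CREATE TABLE events (
    id          INT NOT NULL AUTO_INCREMENT
  , userId      INT NOT NULL
  , category    INT NOT NULL
  , `timestamp` INT NOT NULL
  , PRIMARY KEY (id)
  , INDEX idx_events_user_ts (userId, `timestamp`)
) ENGINE=InnoDB;

-- Equality on userId narrows the search first;
-- the timestamp range is then resolved within that user's index entries.
SELECT *
FROM events
WHERE userId = 42
  AND `timestamp` > 1325376000
  AND `timestamp` < 1325462400;
```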

...and if I'm not mistaken, the advantage of this would be to avoid the overhead of updating the timestamp index on each insertion -- in a table with high insertion rates and highly unique timestamps this could be an important consideration.

I'm struggling with the same problems of indexing based on timestamps and other keys. I still have testing to do before I can put proof behind what I say here. I'll try to post back with my results.

A scenario for better explanation:

  1. timestamp 99% unique
  2. userId 80% unique
  3. category 25% unique

    • Indexing on timestamp will quickly reduce query results to 1% of the table size.
    • Indexing on userId will quickly reduce query results to 20% of the table size.
    • Indexing on category will quickly reduce query results to 75% of the table size.
    • Insertion with an index on timestamp will have high overhead. **
    • Although we know our insertions will have incrementing timestamps, I haven't seen any discussion of MySQL optimisations based on incremental keys.
    • Insertion with an index on userId will have reasonably high overhead.
    • Insertion with an index on category will have reasonably low overhead.

** I'm sorry, I don't know the calculated overhead of insertion with indexing.

#4


4  

If your queries are mainly using this timestamp, you could test this design (enlarging the Primary Key with the timestamp as first part):

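CREATE TABLE perf (
    ts    INT NOT NULL         -- the timestamp column
  , oldPK INT NOT NULL         -- type assumed here; keep your existing primary key column's definition
  -- , ... other columns
  , PRIMARY KEY (ts, oldPK)    -- InnoDB clusters rows by primary key, so rows are stored in ts order
  , UNIQUE (oldPK)             -- oldPK alone still identifies a row
) ENGINE=InnoDB ;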

This will ensure that the queries like the one you posted will be using the clustered (primary) key.

The disadvantage is that your inserts will be a bit slower. Also, if you have other indexes on the table, they will use a bit more space (as each of them will include the primary key, which is now 4 bytes wider).

The biggest advantage of such a clustered index is that queries with big range scans (queries that have to read large parts of the table, or the whole table) will find the related rows sequentially and in the wanted order (by timestamp). That will also be useful if you want to group by day, week, month, or year.
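
As an illustration of the grouping case, here is a sketch against the perf table above. It assumes ts holds Unix epoch seconds, which the question does not state (it only says the column is an INT), and the date bounds are placeholders. FROM_UNIXTIME formats according to the session time zone, so day boundaries follow that setting.

```sql
-- Count rows per calendar day for one month.
-- Assumes ts stores Unix epoch seconds; the date bounds are placeholders.
SELECT FROM_UNIXTIME(ts, '%Y-%m-%d') AS ts_day
     , COUNT(*)                      AS row_count
FROM perf
WHERE ts >= UNIX_TIMESTAMP('2012-01-01')
  AND ts <  UNIX_TIMESTAMP('2012-02-01')
GROUP BY ts_day
ORDER BY ts_day;
```

Because the rows are clustered by ts, the range in the WHERE clause is read as one contiguous stretch of the table.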

The old PK can still be used to identify rows by keeping a UNIQUE constraint on it.
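
To try the same layout on an existing table instead of creating a new one, something like the following may work. It is a sketch that assumes MyTable currently has a single-column primary key named id (a hypothetical name, as is uq_mytable_id). The statement rebuilds the whole table, so it is worth testing on a copy first.

```sql
-- Assumes MyTable currently has PRIMARY KEY (id); `id` is a hypothetical column name.
-- Swaps the clustered key to (Timestamp, id) and keeps id unique on its own.
ALTER TABLE MyTable
    DROP PRIMARY KEY
  , ADD PRIMARY KEY (`Timestamp`, id)
  , ADD UNIQUE KEY uq_mytable_id (id);

-- Lookups by the old key can still use the unique index.
SELECT * FROM MyTable WHERE id = 12345;
```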

You may also want to have a look at TokuDB, an open-source storage engine for MySQL that allows multiple clustered indexes.
