Is it possible to optimize these SQL queries further?


I have a Rails app (running on a Heroku account) which is grabbing a bunch of statistics for the home page regarding the number of records which match certain criteria. Each count is displayed as a number on the page. My table (listings) consists of about 22,500 records. On production it takes the page about 350ms to load (still below the threshold but not great for a home page).

Please bear with the number of queries here; I wanted to show the redundancy in what I'm trying to do. This feels like it could be done much more efficiently. Any ideas?

SELECT COUNT(1) FROM listings WHERE (city in ('Syracuse'))
SELECT COUNT(1) FROM listings WHERE (city in ('Syracuse')) AND (created_at >= '2011-01-30 18:28:44.656702')
SELECT COUNT(1) FROM listings WHERE (city in ('Cicero', 'Clay', 'Lysander', 'VanBuren', 'Salina'))
SELECT COUNT(1) FROM listings WHERE (city in ('Cicero', 'Clay', 'Lysander', 'VanBuren', 'Salina')) AND (created_at >= '2011-01-30 18:28:44.811090')
SELECT COUNT(1) FROM listings WHERE (city in ('DeWitt', 'Manlius', 'Pompey'))
SELECT COUNT(1) FROM listings WHERE (city in ('DeWitt', 'Manlius', 'Pompey')) AND (created_at >= '2011-01-30 18:28:44.954442')
SELECT COUNT(1) FROM listings WHERE (city in ('Onondaga', 'Elbridge', 'Geddes', 'Camillus'))
SELECT COUNT(1) FROM listings WHERE (city in ('Onondaga', 'Elbridge', 'Geddes', 'Camillus')) AND (created_at >= '2011-01-30 18:28:45.105438')
SELECT COUNT(1) FROM listings WHERE (city in ('Fabius', 'Lafayette', 'Marcellus', 'Otisco', 'Skaneateles', 'Spafford', 'Tully'))
SELECT COUNT(1) FROM listings WHERE (city in ('Fabius', 'Lafayette', 'Marcellus', 'Otisco', 'Skaneateles', 'Spafford', 'Tully')) AND (created_at >= '2011-01-30 18:28:45.258860')
SELECT COUNT(1) FROM listings WHERE (city in ('West Monroe', 'Hastings', 'Constantia', 'Palermo', 'Mexico', 'Parish', 'Schroeppel'))
SELECT COUNT(1) FROM listings WHERE (city in ('West Monroe', 'Hastings', 'Constantia', 'Palermo', 'Mexico', 'Parish', 'Schroeppel')) AND (created_at >= '2011-01-30 18:28:45.411138') 

One option I considered is using the after_add and after_remove hooks on my Listing model to update a separate table with these statistics. My only concern with this is the maintenance issues involved. However, new listings are only added a few times throughout the day so updating said table shouldn't cause performance issues in itself.

Thanks!

3 answers

#1


4  

Various approaches, not all database-oriented.

You can combine all the selects into a single query like so:

SELECT COUNT(CASE WHEN city = 'Syracuse' THEN 1 END) as syracuse,
       COUNT(CASE WHEN city = 'Syracuse' AND created_at >= '2011-01-30 18:28:44.656702' THEN 1 END) as syracuse_recent,
       /* etc... */
FROM listings

This will be just one scan over the table to collect all the stats.
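
As a rough illustration of the single-query idea, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table contents, the shortened city groups, and the column aliases `north` and `north_recent` are made up for the demo; only the `COUNT(CASE WHEN ... END)` pattern comes from the answer.

```python
import sqlite3

# Hypothetical miniature of the `listings` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (city TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO listings (city, created_at) VALUES (?, ?)",
    [
        ("Syracuse", "2011-01-15 09:00:00"),
        ("Syracuse", "2011-01-31 09:00:00"),
        ("Clay", "2011-01-31 10:00:00"),
        ("Cicero", "2011-01-10 10:00:00"),
        ("Manlius", "2011-02-01 08:00:00"),
    ],
)

cutoff = "2011-01-30 18:28:44"

# COUNT ignores NULLs, and a CASE with no ELSE yields NULL, so each
# column counts only the rows that match its own condition.
row = conn.execute(
    """
    SELECT
      COUNT(CASE WHEN city = 'Syracuse' THEN 1 END) AS syracuse,
      COUNT(CASE WHEN city = 'Syracuse' AND created_at >= :cutoff THEN 1 END) AS syracuse_recent,
      COUNT(CASE WHEN city IN ('Cicero', 'Clay') THEN 1 END) AS north,
      COUNT(CASE WHEN city IN ('Cicero', 'Clay') AND created_at >= :cutoff THEN 1 END) AS north_recent
    FROM listings
    """,
    {"cutoff": cutoff},
).fetchone()

syracuse, syracuse_recent, north, north_recent = row
print(syracuse, syracuse_recent, north, north_recent)
```

Using one shared cutoff for every column also removes the slightly different timestamps that appear in the original twelve queries.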

Alternatively/additionally, cache the statistics extracted from the database in memory in your application, or use something like memcached. If there's no need for the statistics to be up-to-the-minute accurate, this offloads the query from the database completely, after the initial population.
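
The caching idea can be sketched with a small time-to-live cache. This is an illustration in plain Python, not the application's code: `TTLCache`, `load_counts_from_db`, the cache key, and the numbers are all hypothetical. In a Rails app the closest built-in is probably `Rails.cache.fetch` with an `expires_in` option.

```python
import time


class TTLCache:
    """Tiny in-process cache; each entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def fetch(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the database entirely
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


calls = []


def load_counts_from_db():
    # Stand-in for the real statistics query (hypothetical values).
    calls.append(1)
    return {"syracuse": 120, "syracuse_recent": 4}


cache = TTLCache(ttl_seconds=300)
first = cache.fetch("home_page_counts", load_counts_from_db)
second = cache.fetch("home_page_counts", load_counts_from_db)
print(first == second, len(calls))
```

With a five-minute TTL the home page would hit the database at most once per five minutes per process, at the cost of counts that can be up to five minutes stale.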

#2


0  

First, examine which indexes you have on the table. Try adding and removing indexes on the individual columns, as well as composite indexes in both column orders.
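
One way to check whether an index actually helps is to compare the query plan before and after creating it. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` from Python purely for illustration; the index name is hypothetical, the table is empty, and the production database will have its own `EXPLAIN` output and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, city TEXT, created_at TEXT)"
)

query = """
    SELECT COUNT(1) FROM listings
    WHERE city IN ('DeWitt', 'Manlius', 'Pompey')
      AND created_at >= '2011-01-30 18:28:44'
"""


def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)

# Hypothetical index: the equality/IN column first, the range column second.
conn.execute(
    "CREATE INDEX idx_listings_city_created_at ON listings (city, created_at)"
)

plan_after = plan(query)

print(plan_before)
print(plan_after)
```

Swapping the column order in the index and re-running the comparison is a quick way to try the other direction.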

Also make sure to analyze exactly what the 350ms consists of (with Firebug or something like YSlow).

Finally, if updates really are rare and you want to maintain a summary table, hooks are not the only way: you can also write a trigger that does this job for you.
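
A trigger-maintained summary table might look roughly like the following. This is a sketch in SQLite syntax driven from Python; the `listing_counts` table and the trigger names are hypothetical, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE listings (id INTEGER PRIMARY KEY, city TEXT, created_at TEXT);

    -- Hypothetical summary table: one row per city.
    CREATE TABLE listing_counts (city TEXT PRIMARY KEY, total INTEGER NOT NULL);

    CREATE TRIGGER listings_after_insert AFTER INSERT ON listings
    BEGIN
      -- Create the city's row on first sight, then bump its counter.
      INSERT OR IGNORE INTO listing_counts (city, total) VALUES (NEW.city, 0);
      UPDATE listing_counts SET total = total + 1 WHERE city = NEW.city;
    END;

    CREATE TRIGGER listings_after_delete AFTER DELETE ON listings
    BEGIN
      UPDATE listing_counts SET total = total - 1 WHERE city = OLD.city;
    END;
    """
)

conn.executemany(
    "INSERT INTO listings (city, created_at) VALUES (?, ?)",
    [
        ("Syracuse", "2011-01-31 09:00:00"),
        ("Syracuse", "2011-01-31 10:00:00"),
        ("Clay", "2011-01-31 11:00:00"),
    ],
)
conn.execute("DELETE FROM listings WHERE city = 'Clay'")

counts = dict(conn.execute("SELECT city, total FROM listing_counts"))
print(counts)
```

This only covers the all-time totals. The "recent" counts depend on a cutoff that moves with the clock, so they would still need a query (or a periodic refresh), and an update that changes a listing's city would need a third trigger.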

#3


0  

Personally, I would add two new tables: one that contains groups of cities, and a many-to-many link table between the groups and the cities. The first table would need the columns "city_group_id", "city_group_name", and "dt_count_threshold". The second would have "city_group_id" and "city_id". Then you can select against the link table and join the city table with your date/time restriction.

-- unrestricted count
select cg.city_group_name, count(*) as cnt
from dbo.city_group cg
join dbo.city_group_city cgc on cg.city_group_id = cgc.city_group_id
group by cg.city_group_name

-- restricted (the WHERE clause must come before GROUP BY)
select cg.city_group_name, count(*) as cnt
from dbo.city_group cg
join dbo.city_group_city cgc on cg.city_group_id = cgc.city_group_id
join dbo.city c on c.city_id = cgc.city_id
where c.created_at >= cg.dt_count_threshold
group by cg.city_group_name

Keep in mind that these queries are untested, so some minor adjustments might be needed. Also make sure all indexes are set up correctly to avoid table scans.

