Best way to update user rankings without killing the server


I have a website with user ranking as a central feature, but the user count has grown to over 50,000 and looping through all of them to update the rank every 5 minutes is putting a strain on the server. Is there a better method that can update the ranks at least every 5 minutes? It doesn't have to be PHP; it could be a Perl script or something similar if that would do the job better (though I'm not sure why it would, I'm just leaving my options open).

This is what I currently do to update ranks:

$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i=0;
while ($a = mysql_fetch_array($get_users)) {
    $i++;
    mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'");
}

UPDATE (solution):

Here is the solution code, which takes less than half a second to execute and update all 50,000 rows (it relies on userRanks.rank being an auto-incrementing primary key, as suggested by Tom Haigh).

mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.userid");

8 Answers

#1


Make userRanks.rank an auto-incrementing primary key. If you then insert userids into userRanks ordered by score, highest first, the rank column is incremented on every row. This should be extremely fast.

TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.userid;
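
For anyone who wants to try the staging-table idea without a MySQL server, here is a minimal sketch in Python. It is only an approximation under stated assumptions: SQLite (via the standard sqlite3 module) stands in for MySQL, an INTEGER PRIMARY KEY column plays the role of the auto-increment key, DELETE replaces TRUNCATE, and a correlated subquery replaces the multi-table UPDATE. The sample rows and the rebuild_ranks helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        month_score INTEGER NOT NULL,
        month_rank INTEGER
    );
    -- rank is an integer primary key, so SQLite assigns 1, 2, 3, ...
    -- to rows in the order they are inserted into an empty table.
    CREATE TABLE userRanks (
        rank INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL UNIQUE
    );
""")
# Illustrative data only: user 12 is inactive and must not be ranked.
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(10, "1", 50), (11, "1", 90), (12, "0", 99), (13, "1", 70)],
)
conn.commit()


def rebuild_ranks(conn):
    """Rebuild the staging table, then copy ranks back in one statement."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM userRanks")
        conn.execute(
            "INSERT INTO userRanks (userid) "
            "SELECT id FROM users WHERE status = '1' "
            "ORDER BY month_score DESC"
        )
        conn.execute(
            "UPDATE users SET month_rank = "
            "(SELECT rank FROM userRanks WHERE userid = users.id) "
            "WHERE status = '1'"
        )


rebuild_ranks(conn)
for row in conn.execute("SELECT id, month_rank FROM users ORDER BY id"):
    print(row)
```

The point is the same as in the SQL above: three set-based statements replace 50,000 single-row updates.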

#2


My first question would be: why are you doing this polling-type operation every five minutes?

Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.

I'm assuming the "status = '1'" indicates that a user's rank has changed so, rather than setting this when the user triggers a rank change, why don't you calculate the rank at that time?

That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.
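
One possible reading of this suggestion, sketched in Python with SQLite standing in for MySQL: rather than storing month_rank at all, derive a user's rank from the scores whenever it is needed. A score change then touches one row, and nobody else's stored rank has to be rewritten. The rank_of helper, the index name and the sample rows are assumptions for illustration; note that tied users share a rank here, which differs from the loop in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        month_score INTEGER NOT NULL
    );
    -- Hypothetical index so the count below can use (status, month_score).
    CREATE INDEX idx_users_status_score ON users (status, month_score);
""")
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(10, "1", 50), (11, "1", 90), (12, "0", 99), (13, "1", 70), (14, "1", 70)],
)
conn.commit()


def rank_of(conn, user_id):
    """Return 1 + the number of active users with a strictly higher score.

    Returns None for unknown or inactive users.
    """
    me = conn.execute(
        "SELECT month_score FROM users WHERE id = ? AND status = '1'",
        (user_id,),
    ).fetchone()
    if me is None:
        return None
    higher = conn.execute(
        "SELECT COUNT(*) FROM users WHERE status = '1' AND month_score > ?",
        (me[0],),
    ).fetchone()[0]
    return higher + 1


print(rank_of(conn, 11))  # highest active score
print(rank_of(conn, 13))  # tied with user 14
```

Whether this beats a periodic bulk update depends on how often ranks are read compared with how often scores change, which the question does not say.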

Now I may have misunderstood what you meant by ranking in which case feel free to set me straight.

#3


A simple alternative for bulk update might be something like:

SET @rnk = 0;
UPDATE users
SET month_rank = (@rnk := @rnk + 1)
WHERE status = '1'  -- same filter as the loop in the question
ORDER BY month_score DESC;

This code uses a user-defined session variable (@rnk) that is incremented for each updated row. Because MySQL applies a single-table UPDATE in the order given by ORDER BY, each row's month_rank is set to the next value of the counter.

#4


Updating the users table row by row will be a time-consuming task. It would be better if you could reorganise your query so that row-by-row updates are not required.

I'm not 100% sure of the syntax (as I've never used MySQL before) but here's a sample of the syntax used in MS SQL Server 2000

DECLARE @tmp TABLE
(
    -- IDENTITY numbers the rows 1, 2, 3, ... in insert order
    [MonthRank] [INT] IDENTITY(1,1) NOT NULL,
    [UserId] [INT] NOT NULL
)

INSERT INTO @tmp ([UserId])
SELECT [id] 
FROM [users] 
WHERE [status] = '1' 
ORDER BY [month_score] DESC

UPDATE users 
SET month_rank = [tmp].[MonthRank]
FROM @tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]

In MS SQL Server 2005/2008 you would probably use a CTE.
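
As a rough idea of what the CTE version could look like, here is a sketch in Python using SQLite as a stand-in database. It is not SQL Server syntax, and it assumes SQLite 3.25 or later for window functions (MySQL 8.0 supports CTEs and ROW_NUMBER() as well). The CTE name ranked, the tie-break on id and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        month_score INTEGER NOT NULL,
        month_rank INTEGER
    );
""")
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(10, "1", 50), (11, "1", 90), (12, "0", 99), (13, "1", 70), (14, "1", 70)],
)
conn.commit()

# The CTE numbers active users by score (ties broken by id so the result
# is deterministic); the UPDATE then copies each number back in one pass.
RANK_SQL = """
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY month_score DESC, id) AS rn
    FROM users
    WHERE status = '1'
)
UPDATE users
SET month_rank = (SELECT rn FROM ranked WHERE ranked.id = users.id)
WHERE status = '1'
"""

with conn:  # commit on success, roll back on error
    conn.execute(RANK_SQL)

for row in conn.execute("SELECT id, month_rank FROM users ORDER BY id"):
    print(row)
```

This removes the temporary table entirely: the numbering and the update happen in a single statement.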

#5


Any time you have a loop of any significant size that executes queries inside, you've got a very likely antipattern. We could look at the schema and processing requirement with more info, and see if we can do the whole job without a loop.

How much time does it spend calculating the scores, compared with assigning the rankings?

#6


Your problem can be handled in a number of ways. Honestly more details from your server may point you in a totally different direction. But doing it that way you are causing 50,000 little locks on a heavily read table. You might get better performance with a staging table and then some sort of transition. Inserts into a table no one is reading from are probably going to be better.

Consider

mysql_query("delete from month_rank_staging;");
while(bla){
  mysql_query("insert into month_rank_staging values ('$id', '$i');");
}
mysql_query("update month_rank_staging src, users set users.month_rank=src.month_rank where src.id=users.id;");

That'll cause one (bigger) lock on the table, but might improve your situation. But again, that may be way off base depending on the true source of your performance problem. You should probably look deeper at your logs, mysql config, database connections, etc.

#7


Possibly you could use shards by time or other category. But read this carefully before...

#8


You can split the rank processing from the update execution. Run through all the data and compute the ranks, adding each UPDATE statement to a cache instead of executing it. When the processing is complete, run the cached updates. The WHERE clause of each UPDATE should reference a primary key set to auto_increment, as mentioned in other posts. This keeps the updates from interfering with the performance of the processing. It also prevents users later in the processing queue from wrongly picking up values already written for users processed before them (if one user's rank affects another's). Finally, it keeps the writes from clearing out the table caches that your processing code's SELECTs rely on.
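
The two-phase idea can be sketched like this in Python, again with SQLite standing in for MySQL. The function names compute_updates and apply_updates and the sample rows are hypothetical; the "cache" is simply a list of (rank, id) pairs that is applied afterwards in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        month_score INTEGER NOT NULL,
        month_rank INTEGER
    );
""")
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(10, "1", 50), (11, "1", 90), (12, "0", 99), (13, "1", 70)],
)
conn.commit()


def compute_updates(conn):
    """Phase 1: read only. Build the list of (rank, user_id) pairs."""
    rows = conn.execute(
        "SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC"
    ).fetchall()
    return [(rank, user_id) for rank, (user_id,) in enumerate(rows, start=1)]


def apply_updates(conn, updates):
    """Phase 2: write only. Apply every cached update in one transaction."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "UPDATE users SET month_rank = ? WHERE id = ?", updates
        )


pending = compute_updates(conn)
apply_updates(conn, pending)
print(pending)
```

Each UPDATE still hits one row by primary key, so this saves less work than the set-based answers, but all reads finish before any write starts.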
