Best way to update user rankings without killing the server

My website has user rankings as a central feature, but the user count has grown to over 50,000, and looping through all of them to update the ranks every 5 minutes is putting a strain on the server. Is there a better way to update the ranks at least every 5 minutes? It doesn't have to be PHP; it could be a Perl script or something similar if that would do the job better (though I'm not sure why it would, I'm just leaving my options open).


This is what I currently do to update ranks:


$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i = 0; // rank counter
while ($a = mysql_fetch_array($get_users)) {
    $i++;
    mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'"); // one UPDATE per user
}

UPDATE (solution):

Here is the solution code, which takes less than 1/2 of a second to execute and update all 50,000 rows (make rank the primary key as suggested by Tom Haigh).


mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.userid");

8 solutions


Make userRanks.rank an auto-incrementing primary key. If you then insert the user IDs into userRanks in ranking order (highest score first), the rank column is incremented for every row inserted. This should be extremely fast.


TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.userid;
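
For reference, a table that behaves this way might be declared roughly as follows. This is only a sketch: the column names rank and userid are taken from the INSERT above, while the column types and the unique key on userid are assumptions. Two things worth knowing: rank is a reserved word from MySQL 8.0.2 onward, so it is backtick-quoted here, and the TRUNCATE TABLE used in the question's solution resets the AUTO_INCREMENT counter, so numbering restarts at 1 on every run.

```sql
-- Hypothetical definition of the helper table; types are assumed.
CREATE TABLE userRanks (
    `rank`  INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- filled in automatically, 1, 2, 3, ...
    userid  INT UNSIGNED NOT NULL,                 -- references users.id
    PRIMARY KEY (`rank`),
    UNIQUE KEY (userid)                            -- speeds up the join back to users
);
```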


My first question would be: why are you doing this polling-type operation every five minutes?


Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.


I'm assuming the "status = '1'" indicates that a user's rank has changed so, rather than setting this when the user triggers a rank change, why don't you calculate the rank at that time?


That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.


Now I may have misunderstood what you meant by ranking in which case feel free to set me straight.
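
One way to avoid the periodic batch job altogether, if the rank only needs to be known when it is displayed, is to compute a single user's rank on demand. The following is a rough sketch rather than a drop-in replacement: the user id 123 is a placeholder, the index name is made up, and users with equal scores share a rank here, which differs from the batch approaches in this thread.

```sql
-- Rank of one user = 1 + number of active users with a strictly higher score.
SELECT COUNT(*) + 1 AS month_rank
FROM users
WHERE status = '1'
  AND month_score > (SELECT month_score FROM users WHERE id = 123);

-- Assumed supporting index (hypothetical name) so the count can be served
-- from the index instead of scanning the whole table.
CREATE INDEX idx_status_score ON users (status, month_score);
```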



A simple alternative for bulk update might be something like:


set @rnk = 0;
update users 
set month_rank = (@rnk := @rnk + 1)
order by month_score DESC;

This code uses a user-defined variable (@rnk) that is incremented for each updated row. Because the update walks the rows in descending month_score order, each row's month_rank is set to the next value of the counter.
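
On MySQL 8.0 or later, where window functions are available and assigning user variables inside expressions is deprecated, the same numbering could be sketched with ROW_NUMBER() instead. This assumes a server version newer than the one the thread was written for; the aliases u, r and rnk are illustrative, and the status filter is carried over from the question.

```sql
-- Number the active users by score in a derived table, then join it back
-- to users and copy the number into month_rank in one statement.
UPDATE users AS u
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY month_score DESC) AS rnk
    FROM users
    WHERE status = '1'
) AS r ON r.id = u.id
SET u.month_rank = r.rnk;
```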



Updating the users table row by row will be a time-consuming task. It would be better to reorganise your query so that row-by-row updates are not required.


I'm not 100% sure of the syntax (as I've never used MySQL before) but here's a sample of the syntax used in MS SQL Server 2000


DECLARE @tmp TABLE (
    [MonthRank] [INT] IDENTITY(1,1) NOT NULL, -- auto-numbered in insert order
    [UserId] [INT] NOT NULL
)

INSERT INTO @tmp ([UserId])
SELECT [id] 
FROM [users] 
WHERE [status] = '1' 
ORDER BY [month_score] DESC

UPDATE users 
SET month_rank = [tmp].[MonthRank]
FROM @tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]

In MS SQL Server 2005/2008 you would probably use a CTE.
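
A CTE version for SQL Server 2005 or later might look roughly like this. It is a sketch under the same assumed schema as the sample above; the names ranked and new_rank are illustrative. It relies on the fact that a CTE over a single table can be the target of an UPDATE.

```sql
-- Number the active users by score inside the CTE,
-- then write that number back through the CTE.
WITH ranked AS (
    SELECT [month_rank],
           ROW_NUMBER() OVER (ORDER BY [month_score] DESC) AS [new_rank]
    FROM [users]
    WHERE [status] = '1'
)
UPDATE ranked
SET [month_rank] = [new_rank];
```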



Any time you have a loop of any significant size that executes queries inside, you've got a very likely antipattern. We could look at the schema and processing requirement with more info, and see if we can do the whole job without a loop.


How much time does it spend calculating the scores, compared with assigning the rankings?



Your problem can be handled in a number of ways. Honestly, more details about your server may point you in a totally different direction. But doing it that way, you are causing 50,000 little locks on a heavily read table. You might get better performance with a staging table and then some sort of transition. Inserts into a table that no one is reading from are probably going to be cheaper.



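mysql_query("delete from month_rank_staging;");
// $get_users is the ordered SELECT result from the question; $i is the rank counter
for ($i = 1; $a = mysql_fetch_array($get_users); $i++) {
  mysql_query("insert into month_rank_staging values ('$a[id]', '$i');");
}
mysql_query("update month_rank_staging src, users set users.month_rank=src.month_rank where src.id=users.id;");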

That'll cause one (bigger) lock on the table, but might improve your situation. But again, that may be way off base depending on the true source of your performance problem. You should probably look deeper at your logs, mysql config, database connections, etc.



Possibly you could use shards by time or other category. But read this carefully before...



You can split up the rank processing and the update execution. Run through all the data and work out the ranks, adding each UPDATE statement to a cache instead of executing it. When the processing is complete, run the updates. You should have the WHERE portion of each UPDATE reference a primary key set to auto_increment, as mentioned in other posts. This prevents the updates from interfering with the performance of the processing. It also prevents users later in the processing queue from wrongfully taking advantage of the values of users who were processed before them (if one user's rank affects that of another), and it keeps the database from clearing out its table caches between the SELECTs your processing code does.



