Best way to update user rankings without killing the server

I have a website that has user ranking as a central part, but the user count has grown to over 50,000, and looping through all of those users to update their ranks every 5 minutes is putting a strain on the server. Is there a better method that can easily update the ranks at least every 5 minutes? It doesn't have to be PHP; it could be something run as a Perl script or similar, if that would do the job better (though I'm not sure why it would; I'm just leaving my options open here).

This is what I currently do to update ranks:

$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i=0;
while ($a = mysql_fetch_array($get_users)) {
    $i++;
    mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'");
}

UPDATE (solution):

Here is the solution code, which takes less than half a second to update all 50,000 rows (make userRanks.rank an auto-incrementing primary key, as suggested by Tom Haigh).

mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.`rank` WHERE users.id = userRanks.userid");

8 solutions

#1


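Make userRanks.rank an auto-incrementing primary key. If you then insert the user IDs into userRanks in descending order of month_score, each row gets the next rank value, so the rank column numbers the users 1, 2, 3, and so on. TRUNCATE also resets the AUTO_INCREMENT counter, so the numbering starts again at 1 on every run. This should be extremely fast.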

-- `rank` is quoted because RANK is a reserved word as of MySQL 8.0.
TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.`rank` WHERE users.id = userRanks.userid;
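
A minimal sketch of the same three steps, runnable with Python's built-in sqlite3 module standing in for MySQL (an assumption). SQLite has no TRUNCATE, so the sketch empties the table with DELETE and resets its AUTOINCREMENT counter in sqlite_sequence, and it replaces MySQL's multi-table UPDATE with a correlated subquery. The helper name rebuild_ranks and the sample rows are hypothetical:

```python
import sqlite3


def rebuild_ranks(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: recompute users.month_rank via an auto-increment rank table."""
    with conn:  # run the whole rebuild as one transaction
        # SQLite stand-in for TRUNCATE: empty the table and restart its counter at 1.
        conn.execute("DELETE FROM userRanks")
        conn.execute("DELETE FROM sqlite_sequence WHERE name = 'userRanks'")
        # Rows are inserted in score order, so rank is assigned 1, 2, 3, ... in that order.
        conn.execute(
            "INSERT INTO userRanks (userid) "
            "SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC"
        )
        # Copy every rank back to users with a single UPDATE statement.
        conn.execute(
            "UPDATE users SET month_rank = "
            "(SELECT rank FROM userRanks WHERE userRanks.userid = users.id) "
            "WHERE status = '1'"
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT,
                        month_score INTEGER, month_rank INTEGER);
    CREATE TABLE userRanks (rank INTEGER PRIMARY KEY AUTOINCREMENT,
                            userid INTEGER NOT NULL);
    """
)
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(1, "1", 50), (2, "1", 90), (3, "0", 99), (4, "1", 70)],
)
rebuild_ranks(conn)
print(conn.execute("SELECT id, month_rank FROM users ORDER BY id").fetchall())
# → [(1, 3), (2, 1), (3, None), (4, 2)]
```

User 3 is inactive (status '0'), so it is left without a rank, just as the original query skips it.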

#2


My first question would be: why are you doing this polling-type operation every five minutes?

Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.

I'm assuming "status = '1'" indicates that a user's rank has changed. If so, rather than setting that flag when the user triggers a rank change, why not calculate the rank at that time?

That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.
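
The event-driven idea can be sketched as follows, again with Python's sqlite3 standing in for MySQL. This is a minimal sketch under stated assumptions: it uses competition-style ranks (rank = 1 + number of active users with a strictly higher month_score, so tied users share a rank), which differs from the question's loop, where ties get consecutive ranks in arbitrary order; it assumes the user whose score changes is active (status = '1') and that scores only go up. The function names recompute_all and add_score are hypothetical. When a user's score rises from old to new, only the users with old <= month_score < new are overtaken, so only those rows and the user's own row are written:

```python
import sqlite3

SCHEMA = ("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, "
          "month_score INTEGER, month_rank INTEGER)")


def recompute_all(conn: sqlite3.Connection) -> None:
    """Full pass: rank = 1 + number of active users with a strictly higher score."""
    with conn:
        conn.execute(
            "UPDATE users SET month_rank = 1 + (SELECT COUNT(*) FROM users AS other "
            "WHERE other.status = '1' AND other.month_score > users.month_score) "
            "WHERE status = '1'"
        )


def add_score(conn: sqlite3.Connection, user_id: int, delta: int) -> None:
    """Raise one active user's score and rewrite only the ranks that change."""
    if delta <= 0:
        raise ValueError("this sketch only handles score increases")
    (old,) = conn.execute(
        "SELECT month_score FROM users WHERE id = ?", (user_id,)).fetchone()
    new = old + delta
    with conn:
        # Everyone the user has just overtaken drops one place.
        conn.execute(
            "UPDATE users SET month_rank = month_rank + 1 WHERE status = '1' "
            "AND id != ? AND month_score >= ? AND month_score < ?",
            (user_id, old, new),
        )
        # The user's own rank counts only those still strictly ahead.
        conn.execute(
            "UPDATE users SET month_score = ?, month_rank = 1 + (SELECT COUNT(*) "
            "FROM users AS other WHERE other.status = '1' AND other.month_score > ?) "
            "WHERE id = ?",
            (new, new, user_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO users (id, status, month_score) VALUES (?, '1', ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 20)])
recompute_all(conn)
add_score(conn, 1, 15)  # user 1 goes from 10 to 25, overtaking users 2 and 4
print(conn.execute("SELECT id, month_rank FROM users ORDER BY id").fetchall())
# → [(1, 2), (2, 3), (3, 1), (4, 3)]
```

Each score change costs two small UPDATEs that touch only the overtaken band of users, instead of a pass over all 50,000 rows.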

Now, I may have misunderstood what you meant by ranking, in which case feel free to set me straight.

#3


A simple alternative for bulk update might be something like:

set @rnk = 0;
update users
set month_rank = (@rnk := @rnk + 1)
where status = '1'
order by month_score DESC;

This code uses a user-defined session variable (@rnk) that is incremented for each updated row. Because the UPDATE processes the rows in ORDER BY order, each row's month_rank is set to the next value of the counter.

#4


Updating the users table row by row will be a time-consuming task. It would be better if you could reorganise your query so that row-by-row updates are not required.

I'm not 100% sure of the syntax (as I've never used MySQL before), but here's a sample of the syntax used in MS SQL Server 2000:

DECLARE @tmp TABLE
(
    -- IDENTITY numbers the rows 1, 2, 3, ... following the ORDER BY of the INSERT below
    [MonthRank] [INT] IDENTITY(1, 1) NOT NULL,
    [UserId] [INT] NOT NULL
)

INSERT INTO @tmp ([UserId])
SELECT [id]
FROM [users]
WHERE [status] = '1'
ORDER BY [month_score] DESC

UPDATE users
SET month_rank = [tmp].[MonthRank]
FROM @tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]

In MS SQL Server 2005/2008, you would probably use a CTE with the ROW_NUMBER() function instead.
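
A minimal sketch of that CTE approach with the ROW_NUMBER() window function (available in SQL Server 2005 and later and in MySQL 8.0 and later), run through Python's sqlite3 module as a self-contained stand-in; it assumes SQLite 3.25 or newer, which added window functions. The helper name and sample rows are hypothetical, and ties are broken by id so the numbering is deterministic:

```python
import sqlite3


def rank_with_row_number(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: number active users by score in one set-based UPDATE."""
    with conn:
        conn.execute(
            """
            WITH ranked AS (
                SELECT id,
                       ROW_NUMBER() OVER (ORDER BY month_score DESC, id) AS rn
                FROM users
                WHERE status = '1'
            )
            UPDATE users
            SET month_rank = (SELECT rn FROM ranked WHERE ranked.id = users.id)
            WHERE status = '1'
            """
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, "
             "month_score INTEGER, month_rank INTEGER)")
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(1, "1", 50), (2, "1", 90), (3, "0", 99), (4, "1", 70), (5, "1", 70)],
)
rank_with_row_number(conn)
print(conn.execute("SELECT id, month_rank FROM users ORDER BY id").fetchall())
# → [(1, 4), (2, 1), (3, None), (4, 2), (5, 3)]
```

The numbering comes from the window function rather than from row-by-row side effects, which is also the modern replacement for the @rnk variable trick in #3.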

#5


Any time you have a loop of any significant size that executes queries inside it, you very likely have an antipattern. With more information about the schema and the processing requirements, we could see whether the whole job can be done without a loop.

How much time does it spend calculating the scores, compared with assigning the rankings?

#6


Your problem can be handled in a number of ways. Honestly, more details about your server may point you in a totally different direction. But doing it that way, you are causing 50,000 little locks on a heavily read table. You might get better performance by writing the ranks to a staging table and then transferring them to users in one step. Inserts into a table that no one is reading from are probably going to be cheaper.

Consider

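mysql_query("delete from month_rank_staging");
// Same ordered query as before; only the per-row write goes to the staging table.
$get_users = mysql_query("select id from users where status = '1' order by month_score desc");
$i = 0;
while ($a = mysql_fetch_array($get_users)) {
  $i++;
  mysql_query("insert into month_rank_staging (id, month_rank) values ('$a[id]', '$i')");
}
// A single multi-table UPDATE then copies every staged rank into users.
mysql_query("update month_rank_staging src, users set users.month_rank = src.month_rank where src.id = users.id");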

That will cause one (bigger) lock on the table, but it might improve your situation. Then again, that may be way off base, depending on the true source of your performance problem. You should probably look deeper at your logs, MySQL configuration, database connections, etc.
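
A minimal sketch of the staging-table idea, with Python's sqlite3 standing in for MySQL (an assumption). It fills month_rank_staging (its columns id and month_rank are inferred from the answer's UPDATE) with one executemany call inside a single transaction, then copies the ranks into users with one UPDATE; MySQL's multi-table UPDATE is replaced by a correlated subquery. The helper name and sample rows are hypothetical:

```python
import sqlite3


def rank_via_staging(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: stage (id, rank) pairs, then apply them in one UPDATE."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC, id")]
    with conn:  # staging writes and the final UPDATE commit together
        conn.execute("DELETE FROM month_rank_staging")
        conn.executemany(
            "INSERT INTO month_rank_staging (id, month_rank) VALUES (?, ?)",
            ((user_id, rank) for rank, user_id in enumerate(ids, start=1)),
        )
        # One statement touches users, instead of one UPDATE per user.
        conn.execute(
            "UPDATE users SET month_rank = (SELECT s.month_rank "
            "FROM month_rank_staging AS s WHERE s.id = users.id) "
            "WHERE id IN (SELECT id FROM month_rank_staging)"
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT,
                        month_score INTEGER, month_rank INTEGER);
    CREATE TABLE month_rank_staging (id INTEGER PRIMARY KEY, month_rank INTEGER);
    """
)
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(1, "1", 50), (2, "1", 90), (3, "0", 99), (4, "1", 70)],
)
rank_via_staging(conn)
print(conn.execute("SELECT id, month_rank FROM users ORDER BY id").fetchall())
# → [(1, 3), (2, 1), (3, None), (4, 2)]
```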

#7


Possibly you could shard the data by time or by some other category. But read this carefully before...

#8


You can split up the rank processing and the update execution. First, run through all the data and compute the ranks, adding each UPDATE statement to a cache instead of executing it. When the processing is complete, run the cached updates. As mentioned in other posts, the WHERE clause of each UPDATE should reference a primary key set to auto_increment. This prevents the updates from interfering with the performance of the processing. It also prevents users later in the processing queue from wrongly benefiting from the updated values of users processed before them (if one user's rank affects another's). Finally, it prevents the database from clearing out its cached results for the SELECTs your processing code runs.
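
A minimal sketch of that two-phase approach, again with Python's sqlite3 standing in for MySQL (an assumption). Phase one reads the active users once and computes every rank in memory without writing anything; phase two runs the cached UPDATE statements, each keyed on the primary key, together in one transaction. Ranks here are simply positions by score with ties broken by id; the helper name and sample rows are hypothetical:

```python
import sqlite3


def rank_in_two_phases(conn: sqlite3.Connection) -> int:
    """Hypothetical helper: compute all ranks first, then apply the cached updates."""
    # Phase 1: processing only. One SELECT, no writes interleaved with the reads.
    rows = conn.execute(
        "SELECT id, month_score FROM users WHERE status = '1'").fetchall()
    rows.sort(key=lambda r: (-r[1], r[0]))  # highest score first, ties by id
    pending = [(rank, user_id) for rank, (user_id, _score) in enumerate(rows, start=1)]

    # Phase 2: execution. Every UPDATE hits the primary key; one commit at the end.
    with conn:
        conn.executemany("UPDATE users SET month_rank = ? WHERE id = ?", pending)
    return len(pending)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, "
             "month_score INTEGER, month_rank INTEGER)")
conn.executemany(
    "INSERT INTO users (id, status, month_score) VALUES (?, ?, ?)",
    [(1, "1", 50), (2, "1", 90), (3, "0", 99), (4, "1", 70)],
)
print(rank_in_two_phases(conn))  # → 3
print(conn.execute("SELECT id, month_rank FROM users ORDER BY id").fetchall())
# → [(1, 3), (2, 1), (3, None), (4, 2)]
```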


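Note:

This article was translated by this site, which holds the copyright to the translation. Reposting without permission is prohibited; if you repost it, please cite the article's address: https://www.itdaan.com/blog/2009/06/04/2f620452e767780ea253b7bd852a6bc3.html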



 
© 2014-2022 ITdaan.com