MySQL Random Sampling: Optimizing ORDER BY RAND()


The palest ink is better than the best memory. (Notes written up in 2013.)


1. Keywords:

Random sampling, ORDER BY RAND()

2. Business scenario:

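When a new product launches, it has few users at first, and they are not very active. Operations still wants the product to look busy, though. A social app's home feed might randomly surface old posts with their displayed publish times changed, and an e-commerce product list might show items in random order so users don't see the same products every time. It is the "empty fort strategy": an illusion of activity. So random sampling landed on the developers' plate…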

3. Examples:

Anyone who uses MySQL knows that the most direct and convenient way to take a random sample is:

SELECT * FROM product_info ORDER BY RAND() LIMIT 10;

Execution plan:
[Figure: execution plan of the query above]

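However, ORDER BY RAND() is extremely slow: on a table with 2 million rows, a single execution takes more than 17 s. If this went live, the database would never keep up with concurrent user requests.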
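To see why the cost grows with the table rather than with LIMIT 10, here is a conceptual Python model of what ORDER BY RAND() LIMIT n asks the server to do. It is a sketch, not MySQL's actual executor (the real sort strategy may differ, for example by avoiding a full sort when a LIMIT is present); the function name `order_by_rand_limit` and the list-of-ids "table" are hypothetical.

```python
import random


def order_by_rand_limit(rows, limit, rng=random):
    """Conceptual model of: SELECT * FROM t ORDER BY RAND() LIMIT <limit>.

    Every row is read and tagged with its own random key (one RAND() call per
    row), then rows are ordered by that key and the first `limit` are kept.
    The work is proportional to len(rows), no matter how small `limit` is.
    """
    keyed = [(rng.random(), row) for row in rows]  # full scan + RAND() per row
    keyed.sort(key=lambda pair: pair[0])            # sort on a non-indexed key
    return [row for _, row in keyed[:limit]]


# Usage: a hypothetical table represented only by its primary-key values.
table_ids = list(range(1, 100_001))
print(order_by_rand_limit(table_ids, 10))
```

All 100,000 rows receive a random key and take part in the sort even though only 10 are returned, which is why the real query slows down as the table grows.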

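The MySQL reference manual advises against using a column of RAND() values in an ORDER BY clause, because RAND() may be evaluated more than once for the same row. The first paragraph quoted below also hints at why the query is slow: RAND() is not a constant and cannot be used for index optimizations, so MySQL must read every row, give it a random value, and sort on that value just to return 10 rows.
From http://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand: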

RAND() in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join). Thus, for optimizer purposes, RAND() is not a constant value and cannot be used for index optimizations. For more information, see Section 9.2.1.3.5, “Function Call Optimization”.

Use of a column with RAND() values in an ORDER BY or GROUP BY clause may yield unexpected results because for either clause a RAND() expression can be evaluated multiple times for the same row, each time returning a different result.

A Google search turns up mostly solutions that combine MAX()/MIN(), RAND() and FLOOR()/ROUND(), such as SQL1:

SELECT *
FROM `product_info` AS t1
JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `product_info`) - (SELECT MIN(id) FROM `product_info`))
                   + (SELECT MIN(id) FROM `product_info`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 10;

Or, more simply, SQL2:

SELECT * FROM `product_info` AS p
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `product_info`)) AS id) AS p1
WHERE p.id >= p1.id
ORDER BY p.id LIMIT 10;

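These are fast, but the result set is not random enough: the rows usually come from one contiguous range instead of being scattered, so they don't meet the requirement.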
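The clustering is easy to reproduce with a small Python model of SQL1 and SQL2. This is a sketch: `sample_from_random_start` is a hypothetical name, and a sorted list stands in for the primary-key index. Both queries compute one random threshold, seek to the first id at or above it, and read the next 10 ids in order, so the sample is always a run of neighboring rows. When the ids have gaps, the id right after a large gap is also chosen as the starting row far more often than any other.

```python
import bisect
import math
import random


def sample_from_random_start(ids, limit, rng=random, use_min=True):
    """Model of SQL1 (use_min=True) and SQL2 (use_min=False).

    `ids` must be sorted ascending, like a primary-key index.
    SQL1: threshold = ROUND(RAND() * (MAX(id) - MIN(id)) + MIN(id))
    SQL2: threshold = FLOOR(RAND() * MAX(id))
    The threshold is computed once; then the next `limit` ids >= it are read.
    """
    min_id, max_id = ids[0], ids[-1]
    r = rng.random()  # a single RAND() value for the whole query
    if use_min:
        # Tie-breaking differences vs MySQL ROUND() are negligible for random input.
        threshold = round(r * (max_id - min_id) + min_id)
    else:
        threshold = math.floor(r * max_id)
    start = bisect.bisect_left(ids, threshold)  # index seek: first id >= threshold
    return ids[start:start + limit]  # fewer rows if the threshold is near the end


# Usage: ids 1..10 and 1000..1010, i.e. a large gap in the middle.
gappy_ids = list(range(1, 11)) + list(range(1000, 1011))
rng = random.Random(42)
starts = [sample_from_random_start(gappy_ids, 3, rng, use_min=False)[0]
          for _ in range(1000)]
print(starts.count(1000) / len(starts))  # share of samples starting at id 1000
```

With SQL2's threshold FLOOR(RAND() * 1010), 990 of the 1,010 possible threshold values (every value from 11 to 1000) land on id 1000, so about 98% of samples start there.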

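Rewriting SQL2 as SQL3 (the only difference is where the random-number subquery lives: a derived table joined in the FROM clause in SQL2, a scalar subquery in the WHERE clause in SQL3):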

SELECT * FROM `product_info`
WHERE id >= (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `product_info`)))
ORDER BY id LIMIT 10;

A single run takes about 0.09 s. Execution plan:
[Figure: execution plan of SQL3]
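Why would moving the subquery change the result? One plausible explanation, stated here as an assumption: because the subquery contains RAND(), MySQL cannot cache its result and re-evaluates it for every row it examines (EXPLAIN typically reports such a subquery as UNCACHEABLE SUBQUERY), whereas SQL2's derived table is computed only once. The Python sketch below models SQL3 under that assumption; `sample_with_per_row_threshold` is a hypothetical name.

```python
import math
import random


def sample_with_per_row_threshold(ids, limit, rng=random):
    """Model of SQL3, ASSUMING the WHERE subquery is re-run for every row.

    Rows are scanned in primary-key order (ORDER BY id); each row draws a
    fresh threshold FLOOR(RAND() * MAX(id)) and is kept when id >= threshold.
    The scan stops once `limit` rows have been kept.
    """
    max_id = ids[-1]  # `ids` must be sorted ascending
    picked = []
    for row_id in ids:
        threshold = math.floor(rng.random() * max_id)  # new RAND() per row
        if row_id >= threshold:
            picked.append(row_id)
            if len(picked) == limit:
                break
    return picked


# Usage: a hypothetical dense table with ids 1..2,000,000.
table_ids = range(1, 2_000_001)
print(sample_with_per_row_threshold(table_ids, 10, random.Random(7)))
```

Under this model a row with id k survives with probability about k / MAX(id), so the kept ids are scattered rather than consecutive, yet still skewed toward the low end of the table: with 2,000,000 dense ids the scan usually stops within the first several thousand rows.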

Changing SQL3 further as follows makes the results somewhat more scattered, with about the same execution time:

SELECT * FROM `product_info`
WHERE id >= (SELECT FLOOR(RAND() * ((SELECT MAX(id) FROM `product_info`) - (SELECT MIN(id) FROM `product_info`))
                          + (SELECT MIN(id) FROM `product_info`)))
ORDER BY id LIMIT 10;
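The MIN(id) offset matters most when ids do not start near 1. Assuming MySQL re-evaluates the RAND() subquery for every row (a non-deterministic subquery cannot be cached), the Python sketch below compares SQL3's threshold FLOOR(RAND() * MAX(id)) with the last query's FLOOR(RAND() * (MAX(id) - MIN(id)) + MIN(id)) on a hypothetical table whose ids run from 1,000,001 to 1,010,000. The function `sample_per_row` and its `use_min` flag are hypothetical.

```python
import math
import random


def sample_per_row(ids, limit, rng=random, use_min=True):
    """Per-row threshold model, ASSUMING the subquery is re-run for each row.

    use_min=False: threshold = FLOOR(RAND() * MAX(id))                      (SQL3)
    use_min=True:  threshold = FLOOR(RAND() * (MAX(id) - MIN(id)) + MIN(id))
    """
    min_id, max_id = ids[0], ids[-1]  # `ids` must be sorted ascending
    picked = []
    for row_id in ids:  # ORDER BY id: scan in primary-key order
        r = rng.random()  # fresh RAND() for this row
        if use_min:
            threshold = math.floor(r * (max_id - min_id) + min_id)
        else:
            threshold = math.floor(r * max_id)
        if row_id >= threshold:
            picked.append(row_id)
            if len(picked) == limit:
                break
    return picked


# Usage: hypothetical ids that start far from 1.
high_ids = list(range(1_000_001, 1_010_001))
without_min = sample_per_row(high_ids, 10, random.Random(1), use_min=False)
with_min = sample_per_row(high_ids, 10, random.Random(1), use_min=True)
print(without_min)
print(with_min)
```

Without the offset, about 99% of thresholds fall below 1,000,001, so nearly every row passes and the result is essentially the first ten ids of the table. With the offset, row k passes with probability about (k - MIN(id)) / (MAX(id) - MIN(id)), so the picks spread over a few hundred ids.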


