Most efficient way of selecting closest point


I have two (very large) tables of identical structure, holding two types of locations:

LocA

  • Id - INT
  • X - FLOAT (latitude)
  • Y - FLOAT (longitude)

and

LocB

  • Id - INT
  • X - FLOAT (latitude)
  • Y - FLOAT (longitude)

Each of them holds several million rows. I need to select every location in LocA and, for each one, the closest location in LocB.

What would be the most efficient query to do this?

EDIT1: The distance calculation would be a naive one: SQRT(POWER(LocB.X - LocA.X, 2) + POWER(LocB.Y - LocA.Y, 2))

EDIT2: Here is an implementation I've written, though I highly doubt it is optimal:

SELECT  A.Id    AS AId,
(   SELECT TOP 1 B.Id
    FROM    LocB AS B
    ORDER BY SQRT(POWER(B.X - A.X, 2) + POWER(B.Y - A.Y, 2)) ASC
)               AS BId
FROM    LocA AS A

EDIT3: It's common to have "duplicates" in table LocB, but for each location in LocA I want any one of the tied "closest" matches to be returned, not all of them.

5 solutions

#1


0  

How about:

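-- Packs (scaled squared distance, B.id) into one BIGINT so MIN() picks the nearest B row.
-- Assumes 0 <= B.id < 10000000 and that a resolution of 1e-6 on the squared distance is enough.
SELECT id AS AId, x, y, m % 10000000 AS BId
FROM (
    SELECT A.id, A.x, A.y,
           MIN(CAST(((A.x - B.x) * (A.x - B.x) + (A.y - B.y) * (A.y - B.y)) * 1000000 AS BIGINT) * 10000000 + B.id) AS m
    FROM LocA AS A
    CROSS JOIN LocB AS B
    GROUP BY A.id, A.x, A.y
) j;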

#2


2  

This is not likely to be very efficient, but at the moment I can't see a better way:

SELECT  a.ID, a.X, a.Y, n.ID, n.X, n.Y, n.Distance
FROM    LocA a
        CROSS APPLY
        (   SELECT  TOP 1 WITH TIES
                    b.ID, 
                    b.X, 
                    b.Y, 
                    Distance = SQRT(POWER(b.X - a.X, 2) + POWER(b.Y - a.Y, 2))
            FROM    LocB b 
            ORDER BY Distance
        ) n;
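
If only one match per LocA row is wanted (which is what EDIT3 in the question asks for), a variant without WITH TIES might look like the sketch below. This is an untested illustration, not the answerer's query: the names n and DistanceSq are made up here, and it orders by the squared distance, which ranks rows the same way as the true distance.

```sql
-- Sketch: one nearest LocB row per LocA row (any one of the tied rows).
SELECT  a.ID AS AId, a.X, a.Y, n.ID AS BId, n.DistanceSq
FROM    LocA AS a
        CROSS APPLY
        (   SELECT  TOP (1)
                    b.ID,
                    -- Squared distance: same ordering as SQRT(...), less work per row.
                    DistanceSq = (b.X - a.X) * (b.X - a.X) + (b.Y - a.Y) * (b.Y - a.Y)
            FROM    LocB AS b
            ORDER BY DistanceSq, b.ID   -- b.ID only makes the tie-break deterministic
        ) AS n;
```

This still scans LocB once per LocA row, so it does not change the overall cost, only the number of rows returned.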

#3


2  

Have you considered using geography::Point and the STDistance method, and creating a spatial index on those point columns?

If your database structure is fixed, you can add a new persisted computed column.
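
A rough sketch of what that could look like, untested and under several assumptions: LocB has a clustered primary key (a spatial index requires one), X and Y are valid latitude/longitude values, and SRID 4326 is appropriate. The names Geo and SIX_LocB_Geo are invented for this example. Note that STDistance on geography returns a geodesic distance in meters, so the "closest" row may occasionally differ from the one the planar formula in EDIT1 would pick.

```sql
-- Persisted computed point column on LocB (Point takes latitude, longitude, SRID).
ALTER TABLE LocB
    ADD Geo AS geography::Point(X, Y, 4326) PERSISTED;
GO  -- batch separator (SSMS/sqlcmd) so the new column is visible to the next statement

CREATE SPATIAL INDEX SIX_LocB_Geo ON LocB (Geo);
GO

-- Nearest LocB row for each LocA row. TOP (1) plus the IS NOT NULL filter and
-- ORDER BY STDistance is the query shape that lets SQL Server 2012+ consider
-- the spatial index for a nearest-neighbour search.
SELECT  a.Id AS AId, n.Id AS BId
FROM    LocA AS a
        CROSS APPLY
        (   SELECT  TOP (1) b.Id
            FROM    LocB AS b
            WHERE   b.Geo.STDistance(geography::Point(a.X, a.Y, 4326)) IS NOT NULL
            ORDER BY b.Geo.STDistance(geography::Point(a.X, a.Y, 4326))
        ) AS n;
```

Whether the optimizer actually uses the index inside the CROSS APPLY is something to verify in the execution plan.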

#4


1  

The SQRT is not going to change the order; it is just overhead.

SELECT  A.Id AS AId,
(   SELECT TOP 1 B.Id
    FROM    LocB AS B
    ORDER BY POWER(B.X - A.X, 2) + POWER(B.Y - A.Y, 2) ASC
)               AS BId
FROM    LocA AS A
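
Spelled out, the reason this is safe: distances are never negative, and squaring is strictly increasing on non-negative numbers, so comparing squared distances gives the same ordering (d_1 and d_2 are just labels for two candidate distances):

```latex
\[
d_1 < d_2 \iff d_1^2 < d_2^2 \qquad (d_1, d_2 \ge 0)
\]
```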

I think there is a way to do this in two passes.
You know the distance is <= |delta X| + |delta Y|,
and the maximum relative error of that approximation is SQRT(2) - 1.
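
For reference, the bound behind the two-pass idea can be written as follows, with \Delta x and \Delta y standing for the coordinate differences between the two points (notation introduced here for illustration):

```latex
\[
d = \sqrt{\Delta x^2 + \Delta y^2}
\;\le\; |\Delta x| + |\Delta y|
\;\le\; \sqrt{2}\, d
\]
```

So the sum overestimates the true distance by a factor of at most SQRT(2). It follows that any candidate whose sum exceeds SQRT(2) times the smallest sum found for the same point cannot be the nearest one, which is what makes a cheap first pass possible.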

This does not deal with duplicates or ties.

I suspect the reduced number of POWER calculations will not make up for the extra IO, but it might be worth a try.
It is probably only worth trying if you have #temp on SSD.

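-- Candidate pairs with the cheap |dx| + |dy| estimate (distSum); distAct is filled in later.
CREATE TABLE #temp
(
    IDa     INT,
    IDb     INT,
    Xa      FLOAT,
    Ya      FLOAT,
    Xb      FLOAT,
    Yb      FLOAT,
    distSum FLOAT,
    distAct FLOAT NULL
);

-- Pass 1: every LocA/LocB pair with its estimate.
INSERT INTO #temp (IDa, IDb, Xa, Ya, Xb, Yb, distSum)
SELECT a.Id, b.Id, a.X, a.Y, b.X, b.Y, ABS(a.X - b.X) + ABS(a.Y - b.Y)
FROM   LocA AS a
       CROSS JOIN LocB AS b;

-- Drop pairs whose estimate is too large for them to be the nearest match.
DELETE t
FROM   #temp AS t
       JOIN (SELECT IDa, MIN(distSum) AS minDistSum FROM #temp GROUP BY IDa) AS aMin
         ON t.IDa = aMin.IDa
        AND t.distSum > SQRT(2.0) * aMin.minDistSum;

-- Pass 2: exact squared distance for the surviving pairs only.
UPDATE #temp
SET    distAct = POWER(Xa - Xb, 2) + POWER(Ya - Yb, 2);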

#5


0  

This is the code:

WITH S AS (
    SELECT LocA.Id, LocA.X, LocA.Y, C.ID_B, C.D
    FROM LocA
    CROSS APPLY (
        SELECT LocB.Id AS ID_B,
               POWER(LocB.X - LocA.X, 2) + POWER(LocB.Y - LocA.Y, 2) AS D
        FROM LocB
    ) AS C
)
SELECT DISTINCT Id, X, Y, D, ID_B
FROM S
WHERE D = (SELECT MIN(S1.D) FROM S AS S1 WHERE S1.Id = S.Id);
