SQL Server 2005 recursive query with loops in data - is it possible?


I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subordinates, etc.). Unfortunately the real-world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I don't care about the order in which they are selected, just that each of them is selected once.


Added: You want my query? Umm... OK... I thought it was pretty obvious, but here it is:

with
UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl


Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.

10 Answers

#1


2  

I know it has been a while, but I thought I should share my experience, as I tried every single solution. Here is a summary of my findings (and maybe this post?):

  • Adding a column with the current path did work but had a performance hit, so it was not an option for me.
  • I could not find a way to do it using a CTE.
  • I wrote a recursive SQL function which adds employee IDs to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average, which was not good enough.

Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filtering them there using a recursive method. Then I wrote the filtered list of employees to a DataTable and passed it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).

#2


1  

This will work for a loop involving the initial link, but might not work for loops further along the chain.

DECLARE @Table TABLE(
        ID INT,
        PARENTID INT
)

INSERT INTO @Table (ID,PARENTID) SELECT 1, 2

INSERT INTO @Table (ID,PARENTID) SELECT 2, 1

INSERT INTO @Table (ID,PARENTID) SELECT 3, 1

INSERT INTO @Table (ID,PARENTID) SELECT 4, 3

INSERT INTO @Table (ID,PARENTID) SELECT 5, 2


SELECT * FROM @Table

DECLARE @ID INT

SELECT @ID = 1

;WITH boss (ID,PARENTID) AS (
    SELECT  ID,
            PARENTID
    FROM    @Table
    WHERE   PARENTID = @ID
),
 bossChild (ID,PARENTID) AS (
    SELECT  ID,
            PARENTID
    FROM    boss
    UNION ALL
    SELECT  t.ID,
            t.PARENTID
    FROM    @Table t INNER JOIN
            bossChild b ON t.PARENTID = b.ID
    WHERE   t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT  *
FROM    bossChild
OPTION (MAXRECURSION 0)

What I would recommend is to use a WHILE loop, and only insert links into a temp table if the ID does not already exist there, thus avoiding endless loops.

#3


1  

Not a generic solution, but might work for your case: in your select query modify this:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])

to become:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID]) 
   and a.[User_ID] <> @UserID

#4


1  

You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker; at least, it has been for me every time I've timed the two techniques. This sounds inefficient, but it isn't, since the number of loop iterations equals the recursion depth. At each iteration you can check for looping and correct it where it happens. You can also put a constraint on the temporary table to raise an error if looping occurs, though you seem to prefer something that deals with looping more gracefully. You can also raise an error when the WHILE loop iterates past a certain number of levels (to catch an undetected loop; oh boy, it sometimes happens).

The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and to do an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0! Simple, eh?
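
The loop described above can be sketched roughly as follows. This is an illustration, not the answerer's actual code: the table variables @User and @Found, the [Level] column, and the sample rows are all assumptions, and the syntax is kept to what SQL Server 2005 accepts (no inline DECLARE initialisation, no multi-row VALUES).

```sql
DECLARE @UserID INT;
SET @UserID = 1;

-- Assumed stand-in for the real [User] table; users 1 and 2 manage each other.
DECLARE @User TABLE ([User_ID] INT PRIMARY KEY, [Manager_ID] INT);
INSERT INTO @User ([User_ID], [Manager_ID]) SELECT 1, 2;
INSERT INTO @User ([User_ID], [Manager_ID]) SELECT 2, 1;
INSERT INTO @User ([User_ID], [Manager_ID]) SELECT 3, 1;
INSERT INTO @User ([User_ID], [Manager_ID]) SELECT 4, 3;
INSERT INTO @User ([User_ID], [Manager_ID]) SELECT 5, 2;

-- Working table (hypothetical name); the primary key is a safety net
-- that raises an error if a duplicate ever slips through.
DECLARE @Found TABLE ([User_ID] INT PRIMARY KEY, [Level] INT NOT NULL);

DECLARE @Level INT, @Rows INT;
SET @Level = 0;

-- Prime the working table with the root entry.
INSERT INTO @Found ([User_ID], [Level])
SELECT [User_ID], @Level FROM @User WHERE [User_ID] = @UserID;
SET @Rows = @@ROWCOUNT;  -- capture immediately; later statements reset it

WHILE @Rows > 0
BEGIN
    SET @Level = @Level + 1;

    -- Join only against the rows added in the previous iteration,
    -- and skip anyone already collected (this is what breaks the loop).
    INSERT INTO @Found ([User_ID], [Level])
    SELECT u.[User_ID], @Level
    FROM @User u
    INNER JOIN @Found f
        ON u.[Manager_ID] = f.[User_ID]
       AND f.[Level] = @Level - 1
    WHERE NOT EXISTS (SELECT 1 FROM @Found x WHERE x.[User_ID] = u.[User_ID]);

    SET @Rows = @@ROWCOUNT;
END

SELECT [User_ID], [Level] FROM @Found ORDER BY [Level], [User_ID];
```

With the sample rows, each of the five users should come back exactly once despite the 1/2 loop. For the production case, @User would be replaced by the real [User] table.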

#5


1  

I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I build up a path and check in the CTE's recursive condition whether the user ID is already in the path; if it is, that row is not processed again. Hope this helps.

Jose

DECLARE @Table TABLE(
    USER_ID INT,
    MANAGER_ID INT )
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 5, 2

DECLARE @UserID INT
SELECT @UserID = 1

;with
UserTbl as -- Selects an employee and his subordinates.
(
    select 
        '/'+cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    where [User_ID] = @UserID
    union all
    select
        b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    inner join UserTbl b 
        on (a.[Manager_ID]=b.[User_ID])
    where charindex('/'+cast( a.USER_ID as varchar(max))+'/', b.[path]+'/') = 0 -- trailing '/' so the last ID in the path is matched too
)
select * from UserTbl

#6


0  

Basically, if you have loops like this in the data, you'll have to do the retrieval logic yourself. You could use one CTE to get only subordinates and another to get bosses.

Another idea is to add a dummy row as the boss of both company owners, so they wouldn't be each other's boss, which is ridiculous. This is my preferred option.

#7


0  

I can think of two approaches.

1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.

2) Use a string to hold the Users already visited. This is like the NOT IN subquery idea, which didn't work.

Approach 1:

; with TooMuchHierarchy as (
    select "User_ID"
        , Manager_ID 
        , 0 as Depth
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
        , U.Manager_ID
        , M.Depth + 1 as Depth
    from TooMuchHierarchy M
    inner join "User" U 
        on U.Manager_ID = M."user_id"
    where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
    select "User_ID"
        , Manager_id
        , Depth
        , max(depth) over (partition by "User_ID") as MaxDepth
    from TooMuchHierarchy)
select "user_id", Manager_Id 
from AddMaxDepth
where Depth = MaxDepth

The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and fewer records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. It is a bit of a maintenance nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to the whole thing to allow more recursion.
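
For reference, the maxrecursion hint goes at the very end of the outer statement. A minimal sketch, where the Numbers CTE is just an assumed stand-in for the hierarchy query and 500 is an arbitrary illustrative limit:

```sql
-- Counts 0..300, which needs more than the default 100 recursion levels.
WITH Numbers AS (
    SELECT 0 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 300
)
SELECT MAX(n) AS MaxN
FROM Numbers
OPTION (MAXRECURSION 500);  -- a value of 0 removes the limit entirely
```

The depth check inside the CTE and the hint have to be raised together: the check decides how deep the query goes, and the hint decides how deep SQL Server will let it go.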

Approach 2:

; with Hierarchy as (
    select "User_ID"
        , Manager_ID 
        , '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
        , U.Manager_ID
        , M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
    from Hierarchy M
    inner join "User" U 
        on U.Manager_ID = M."user_id"
    where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id 
from Hierarchy

#8


0  

The preferable solution is to clean up the data and to make sure you do not have any loops in the future; that can be accomplished with a trigger or a UDF wrapped in a check constraint.

However, you can use a multi-statement UDF as I demonstrated here: Avoiding infinite loops. Part One

You can add a NOT IN() clause in the join to filter out the cycles.

#9


0  

This is the code I used on a project to chase up and down hierarchical relationship trees.

User-defined function to capture subordinates:

CREATE FUNCTION fn_UserSubordinates(@User_ID INT)
RETURNS @SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
        RETURN

    INSERT INTO @SubordinateUsers (User_ID, Distance) VALUES ( @User_ID, 0)

    DECLARE @Distance INT, @Finished BIT
    SELECT @Distance = 1, @Finished = 0

    WHILE @Finished = 0
    BEGIN
        INSERT INTO @SubordinateUsers
            SELECT S.User_ID, @Distance
                FROM Users AS S
                JOIN @SubordinateUsers AS C
                    ON C.User_ID = S.Manager_ID
                LEFT JOIN @SubordinateUsers AS C2
                    ON C2.User_ID = S.User_ID
                WHERE C2.User_ID IS NULL
        IF @@RowCount = 0
            SET @Finished = 1

        SET @Distance = @Distance + 1
    END

    RETURN
END

User-defined function to capture managers:

CREATE FUNCTION fn_UserManagers(@User_ID INT)
RETURNS @User TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
        RETURN

    DECLARE @Manager_ID INT

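    SELECT @Manager_ID = Manager_ID
    FROM Users WITH (NOLOCK)
    WHERE User_ID = @User_ID

    -- Recurse up the chain. There is no cycle guard here, so a loop in the
    -- manager chain will run into SQL Server's 32-level nesting limit.
    INSERT INTO @User (User_ID, Distance)
        SELECT User_ID, Distance + 1
        FROM dbo.fn_UserManagers(@Manager_ID)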

    INSERT INTO @User (User_ID, Distance) VALUES (@User_ID, 0)

    RETURN
END

#10


0  

You need some method to prevent your recursive query from adding user IDs that are already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van), you need another solution to remove the users already in the list.

The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.

with UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    (
      select a.[User_ID], a.[Manager_ID] 
        from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
        where a.[User_ID] not in (select [User_ID] from UserTbl)
      EXCEPT
        select a.[User_ID], a.[Manager_ID] from UserTbl a 
     )
)
select * from UserTbl;

The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.
