SQL Server 2005 recursive query with loops in data - is it possible?


I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subordinates, etc.). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I don't care about the order in which they are selected, just that each of them is selected once.


Added: You want my query? Umm... OK... I thought it was pretty obvious, but here it is:

with
UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl


Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.

10 solutions

#1


2  

I know it has been a while, but I thought I should share my experience, as I tried every single solution. Here is a summary of my findings (and maybe this post?):

  • Adding a column with the current path did work but had a performance hit, so it was not an option for me.
  • I could not find a way to do it using a CTE.
  • I wrote a recursive SQL function which adds employee IDs to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average, which was not good enough.

Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filtering them there using a recursive method. Then I wrote the filtered list of employees to a DataTable and exported it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).

#2


1  

This will work when the loop leads straight back to the starting ID, but might not work for loops further down the hierarchy.

DECLARE @Table TABLE(
        ID INT,
        PARENTID INT
)

INSERT INTO @Table (ID,PARENTID) SELECT 1, 2

INSERT INTO @Table (ID,PARENTID) SELECT 2, 1

INSERT INTO @Table (ID,PARENTID) SELECT 3, 1

INSERT INTO @Table (ID,PARENTID) SELECT 4, 3

INSERT INTO @Table (ID,PARENTID) SELECT 5, 2


SELECT * FROM @Table

DECLARE @ID INT

SELECT @ID = 1

;WITH boss (ID,PARENTID) AS (
    SELECT  ID,
            PARENTID
    FROM    @Table
    WHERE   PARENTID = @ID
),
 bossChild (ID,PARENTID) AS (
    SELECT  ID,
            PARENTID
    FROM    boss
    UNION ALL
    SELECT  t.ID,
            t.PARENTID
    FROM    @Table t INNER JOIN
            bossChild b ON t.PARENTID = b.ID
    WHERE   t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT  *
FROM    bossChild
OPTION (MAXRECURSION 0)

What I would recommend is to use a WHILE loop, and only insert links into a temp table if the ID does not already exist there, thus removing endless loops.

#3


1  

Not a generic solution, but might work for your case: in your select query modify this:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])

to become:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID]) 
   and a.[User_ID] <> @UserID

#4


1  

You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well, it has been for me every time I've done timings on the two techniques. This sounds inefficient, but it isn't, since the number of loop iterations equals the depth of the hierarchy. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the WHILE loop iterates over a certain number of levels (to catch an undetected loop; oh boy, it sometimes happens).

The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0! Simple eh?
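
The loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the @Users table variable and its sample rows stand in for the real [User] table (the rows are borrowed from the other answers, with users 1 and 2 managing each other), and the names @Found, Lvl and @MaxLevel, as well as the cap of 1000, are made up for the example. The syntax is kept to what SQL Server 2005 accepts.

```sql
-- Stand-in for the real [User] table (assumption: one manager per user).
DECLARE @Users TABLE (User_ID INT PRIMARY KEY, Manager_ID INT NULL);
INSERT INTO @Users (User_ID, Manager_ID)
SELECT 1, 2 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 2;

DECLARE @UserID INT, @Level INT, @MaxLevel INT, @Rows INT;
SET @UserID   = 1;
SET @Level    = 0;
SET @MaxLevel = 1000;  -- hypothetical safety cap against an undetected loop
SET @Rows     = 1;

-- Result table; Lvl records the iteration in which a user was found.
-- The primary key makes an accidental duplicate fail loudly.
DECLARE @Found TABLE (User_ID INT PRIMARY KEY, Manager_ID INT NULL, Lvl INT NOT NULL);

-- Prime the result with the root entry.
INSERT INTO @Found (User_ID, Manager_ID, Lvl)
SELECT User_ID, Manager_ID, 0 FROM @Users WHERE User_ID = @UserID;

WHILE @Rows > 0
BEGIN
    SET @Level = @Level + 1;

    IF @Level > @MaxLevel
    BEGIN
        RAISERROR('Hierarchy is deeper than %d levels, giving up.', 16, 1, @MaxLevel);
        BREAK;
    END

    -- Join only the most recent level to its children,
    -- skipping anyone who has already been collected.
    INSERT INTO @Found (User_ID, Manager_ID, Lvl)
    SELECT u.User_ID, u.Manager_ID, @Level
    FROM @Users u
    JOIN @Found f ON f.User_ID = u.Manager_ID AND f.Lvl = @Level - 1
    WHERE NOT EXISTS (SELECT 1 FROM @Found x WHERE x.User_ID = u.User_ID);

    -- Capture the count immediately; later statements would reset @@ROWCOUNT.
    SET @Rows = @@ROWCOUNT;
END

SELECT User_ID, Manager_ID, Lvl FROM @Found ORDER BY Lvl, User_ID;
```

With the sample rows, user 1 is found at level 0, users 2 and 3 at level 1, and users 4 and 5 at level 2. User 1 is not added again when the loop reaches user 2's subordinates, so the loop stops after the third pass finds nothing new.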

#5


1  

I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and check in the CTE condition whether the user ID is already in the path; if it is, it won't be processed again. Hope this helps.

Jose

DECLARE @Table TABLE(
    USER_ID INT,
    MANAGER_ID INT )
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 5, 2

DECLARE @UserID INT
SELECT @UserID = 1

;with
UserTbl as -- Selects an employee and his subordinates.
(
    select 
        '/'+cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    where [User_ID] = @UserID
    union all
    select
        b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    inner join UserTbl b 
        on (a.[Manager_ID]=b.[User_ID])
    where charindex('/'+cast( a.USER_ID as varchar(max))+'/', b.[path]+'/') = 0 -- trailing '/' lets the last ID in the path match too
)
select * from UserTbl

#6


0  

Basically, if you have loops like this in the data, you'll have to do the retrieval logic yourself. You could use one CTE to get only subordinates and another to get bosses.

Another idea is to have a dummy row as the boss of both company owners, so they wouldn't be each other's bosses, which is ridiculous. This is my preferred option.

#7


0  

I can think of two approaches.

1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.

2) Use a string to hold the Users already visited. Like the NOT IN subquery idea that didn't work.

Approach 1:

; with TooMuchHierarchy as (
    select "User_ID"
        , Manager_ID 
        , 0 as Depth
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
        , U.Manager_ID
        , M.Depth + 1 as Depth
    from TooMuchHierarchy M
    inner join "User" U 
        on U.Manager_ID = M."user_id"
    where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
    select "User_ID"
        , Manager_id
        , Depth
        , max(depth) over (partition by "User_ID") as MaxDepth
    from TooMuchHierarchy)
select "user_id", Manager_Id 
from AddMaxDepth
where Depth = MaxDepth

The line "where Depth < 100" is what keeps you from getting the max recursion error. Make this number smaller, and fewer records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintenance nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to the whole thing to allow more recursion.

Approach 2:

; with Hierarchy as (
    select "User_ID"
        , Manager_ID 
        , '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
        , U.Manager_ID
        , M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
    from Hierarchy M
    inner join "User" U 
        on U.Manager_ID = M."user_id"
    where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id 
from Hierarchy

#8


0  

The preferable solution is to clean up the data and to make sure you do not have any loops in the future. That can be accomplished with a trigger or a UDF wrapped in a check constraint.

However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One

You can add a NOT IN() clause in the join to filter out the cycles.
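
The trigger mentioned at the start of this answer might look roughly like the sketch below. It is not taken from the linked article. dbo.DemoUser, trg_DemoUser_NoLoops and the @Walk table are hypothetical names, and the demo table merely mimics the User_ID/Manager_ID columns from the question. The idea is to walk up the manager chain from every changed row and reject the statement if the walk arrives back where it started.

```sql
-- Hypothetical demo table with the same two columns as the question's [User] table.
CREATE TABLE dbo.DemoUser (
    User_ID    INT NOT NULL PRIMARY KEY,
    Manager_ID INT NULL
);
GO

CREATE TRIGGER dbo.trg_DemoUser_NoLoops ON dbo.DemoUser
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- One row per changed user: where the walk started and where it is now.
    DECLARE @Walk TABLE (Start_ID INT NOT NULL PRIMARY KEY, Current_ID INT NULL);
    DECLARE @Steps INT, @MaxSteps INT;

    INSERT INTO @Walk (Start_ID, Current_ID)
    SELECT User_ID, Manager_ID FROM inserted WHERE Manager_ID IS NOT NULL;

    -- A chain without loops can never be longer than the number of rows,
    -- so this bound also ends walks that run into an older, existing loop.
    SELECT @MaxSteps = COUNT(*) FROM dbo.DemoUser;
    SET @Steps = 0;

    WHILE @Steps < @MaxSteps AND EXISTS (SELECT 1 FROM @Walk)
    BEGIN
        -- Arriving back at the starting user means the change closed a loop.
        IF EXISTS (SELECT 1 FROM @Walk WHERE Current_ID = Start_ID)
        BEGIN
            RAISERROR('Manager assignment would create a loop.', 16, 1);
            ROLLBACK TRANSACTION;
            RETURN;
        END

        -- Move every walk one step up; a walk that reaches the top becomes NULL.
        UPDATE w
        SET Current_ID = u.Manager_ID
        FROM @Walk w
        LEFT JOIN dbo.DemoUser u ON u.User_ID = w.Current_ID;

        DELETE FROM @Walk WHERE Current_ID IS NULL;
        SET @Steps = @Steps + 1;
    END
END
GO
```

A trigger like this only rejects changes that create a new loop. Loops already present in the data would still have to be cleaned up separately.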

#9


0  

This is the code I used on a project to chase up and down hierarchical relationship trees.

User defined function to capture subordinates:

CREATE FUNCTION fn_UserSubordinates(@User_ID INT)
RETURNS @SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
        RETURN

    INSERT INTO @SubordinateUsers (User_ID, Distance) VALUES ( @User_ID, 0)

    DECLARE @Distance INT, @Finished BIT
    SELECT @Distance = 1, @Finished = 0

    WHILE @Finished = 0
    BEGIN
        INSERT INTO @SubordinateUsers
            SELECT S.User_ID, @Distance
                FROM Users AS S
                JOIN @SubordinateUsers AS C
                    ON C.User_ID = S.Manager_ID
                LEFT JOIN @SubordinateUsers AS C2
                    ON C2.User_ID = S.User_ID
                WHERE C2.User_ID IS NULL
        IF @@RowCount = 0
            SET @Finished = 1

        SET @Distance = @Distance + 1
    END

    RETURN
END

User defined function to capture managers:

CREATE FUNCTION fn_UserManagers(@User_ID INT)
RETURNS @User TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
        RETURN

    DECLARE @Manager_ID INT

    -- Look up the direct manager of the current user.
    SELECT @Manager_ID = Manager_ID
    FROM Users WITH (NOLOCK)
    WHERE User_ID = @User_ID

    -- Recurse up the chain; every manager found is one step further away.
    -- There is no loop check here, so cyclic data will hit the function nesting limit.
    INSERT INTO @User (User_ID, Distance)
        SELECT User_ID, Distance + 1
        FROM dbo.fn_UserManagers(@Manager_ID)

    INSERT INTO @User (User_ID, Distance) VALUES (@User_ID, 0)

    RETURN
END

#10


0  

You need some method to prevent your recursive query from adding user IDs that are already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van), you need another solution to remove the users already in the list.

The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.

with UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    (
      select a.[User_ID], a.[Manager_ID] 
        from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
        where a.[User_ID] not in (select [User_ID] from UserTbl)
      EXCEPT
        select a.[User_ID], a.[Manager_ID] from UserTbl a 
     )
)
select * from UserTbl;

The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.

