Using XSLT to remove duplicate entries in a simple XML file


I am new to XSLT and am having trouble removing duplicates from a simple XML file. I have spent a lot of time on it, but the result is never quite right. Here is the source file:

<?xml version="1.0" encoding="UTF-16"?>
<language>
    <lang name="welcome">welcom</lang>
    <lang name="open">Open</lang>
    <lang name="close">Close</lang>
    <lang name="welcome">Welcome</lang>
    <lang name="copy">Copy</lang>
</language>

The desired output is this:

<?xml version="1.0" encoding="UTF-16"?>
<language>
    <lang name="open">Open</lang>
    <lang name="close">Close</lang>
    <lang name="welcome">Welcome</lang>
    <lang name="copy">Copy</lang>
</language>

The actual files are much larger than this, and "lang" and "name" may change to other names later in the file. I only want to keep the last of each set of duplicates: if the tag and its attributes are duplicated, only the last entry should be kept. I hope this is possible with XSLT 1.0. If not, I can always use multiple scripts in case lang does change to something else. Thank you in advance!

2 answers

#1


5  

The following XSLT should answer your question:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="lang[@name=following-sibling::lang/@name]"/>
</xsl:stylesheet>

The identity template copies everything, while the empty template suppresses every lang element that has a following sibling lang element with the same value for the name attribute. Only the last lang for each name is therefore copied to the output.
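The question mentions that "lang" and "name" may change later in the file. A minimal sketch of how the same idea might be extended to any element name follows. It assumes that a duplicate means a later sibling with the same element name and the same value of a `name` attribute, which is my reading of the question rather than something the asker confirmed. Because XSLT 1.0 does not allow `current()` inside a match pattern, the comparison is moved into the template body.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <!-- Identity template: copy every node and attribute by default -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <!-- Assumption: duplicates are siblings that share both the element
         name and the value of @name. Copy an element only when no
         following sibling matches it on both. -->
    <xsl:template match="*[@name]">
        <xsl:if test="not(following-sibling::*[name() = name(current())
                                               and @name = current()/@name])">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

Like the stylesheet above, this still scans the following siblings of every element, so it is best suited to small or medium-sized files.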

#2


1  

Here is a more general and much more efficient (linear) solution. The currently accepted answer compares every lang element with all of its following siblings, which gives it quadratic time complexity (O(N^2)). The difference is especially important when processing a large XML document, which the OP has told us the actual documents are:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kLangByName" match="lang" use="@name"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "lang[not(generate-id()
      =
       generate-id(key('kLangByName', @name)[last()]))]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<language>
    <lang name="welcome">welcom</lang>
    <lang name="open">Open</lang>
    <lang name="close">Close</lang>
    <lang name="welcome">Welcome</lang>
    <lang name="copy">Copy</lang>
</language>

the wanted, correct result is produced:

<language>
   <lang name="open">Open</lang>
   <lang name="close">Close</lang>
   <lang name="welcome">Welcome</lang>
   <lang name="copy">Copy</lang>
</language>

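Explanation:

This uses the Muenchian grouping method. The key kLangByName indexes every lang element by the value of its name attribute. The empty template matches any lang whose generate-id() differs from that of the last node in its key group, so every lang except the last one with a given name is dropped. Note that the key covers the whole document, so duplicates are detected among all lang elements, not only among siblings.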
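Since the question says the element name may vary, the key-based approach might be generalized along the following lines. This is only a sketch under stated assumptions: the key name `kElemByParentAndName` is hypothetical, the attribute is assumed to always be called `name`, and duplicates are assumed to count only among children of the same parent with the same element name.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Hypothetical key: index each element that has @name by its parent's
      id, its own element name, and the value of @name -->
 <xsl:key name="kElemByParentAndName" match="*[@name]"
          use="concat(generate-id(..), '|', name(), '|', @name)"/>

 <!-- Identity template: copy every node and attribute by default -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Drop every element that is not the last one in its key group -->
 <xsl:template match=
 "*[@name][not(generate-id()
       =
        generate-id(key('kElemByParentAndName',
                        concat(generate-id(..), '|', name(), '|', @name))[last()]))]"/>
</xsl:stylesheet>
```

Including `generate-id(..)` in the key value restricts each group to a single parent, which mirrors the sibling-only behaviour of the first answer. Dropping that part of the key would instead treat matching elements anywhere in the document as duplicates.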

