C# - Loading XML file in parts


My task is to load a new set of data (stored in an XML file) and compare it to the 'old' set (also in XML). All the changes are written to another file.

My program loads the new and old files into two datasets, then, row by row, compares the primary key from the new set with the old one. When I find the corresponding row, I check all fields, and if any differ from the old row, I write the row to a third set and then write that set to a file.

Right now I use:

    newDS.ReadXml("data.xml");
    oldDS.ReadXml("old.xml");

and then I just find rows with the corresponding primary key and compare the other fields. This works quite well for small files.
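
For reference, the in-memory comparison described above might look roughly like the following. This is only a sketch under assumptions: the class name `InMemoryDiff`, the method `ChangedRows`, and the `keyColumn` parameter are hypothetical and not taken from the original program, and it assumes each dataset holds one table whose key column is unique. It relies on `DataTable.PrimaryKey` and `DataRowCollection.Find`, which only work once a primary key has been set.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class InMemoryDiff
{
    // Returns the rows of newTable whose key also exists in oldTable
    // but whose fields differ. keyColumn is a hypothetical column name.
    public static List<DataRow> ChangedRows(DataTable oldTable, DataTable newTable, string keyColumn)
    {
        // Rows.Find requires a primary key; note that this modifies oldTable.
        oldTable.PrimaryKey = new[] { oldTable.Columns[keyColumn] };

        var changed = new List<DataRow>();
        foreach (DataRow newRow in newTable.Rows)
        {
            var oldRow = oldTable.Rows.Find(newRow[keyColumn]);
            if (oldRow == null) continue; // no corresponding row in the old set

            foreach (DataColumn col in newTable.Columns)
            {
                // A column missing from the old table, or a differing value, counts as a change.
                if (!oldTable.Columns.Contains(col.ColumnName) ||
                    !object.Equals(newRow[col], oldRow[col.ColumnName]))
                {
                    changed.Add(newRow);
                    break;
                }
            }
        }
        return changed;
    }
}
```

Both tables are fully resident in memory here, which is why the approach stops scaling once the files grow.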

The problem is that my files may be up to about 4GB each. If both the new and old data are that big, loading 8GB of data into memory is quite problematic.

I would like to load my data in parts, but to compare I need the whole old data set (or is there a way to get the specific row with the corresponding primary key from the XML file?).

Another problem is that I don't know the structure of the XML file. It is defined by the user.

What is the best way to work with such big files? I thought about using LINQ to XML, but I don't know if it has options that can help with my problem. Maybe it would be better to abandon XML and use something different?

1 answer

#1


-2  

You are absolutely right that you should leave XML. It is not a good tool for datasets this size, especially if the dataset consists of many 'records' all with the same structure. Not only are 4GB files unwieldy, but almost anything you use to load and parse them will need more memory than the size of the file itself.

I would recommend that you look at solutions involving an SQL database, but I have no idea how it can make sense to be analysing a 4GB file where you "don't know the structure [of the file]" because "it is defined by the user". What meaning do you ascribe to 'rows' and 'primary keys' if you don't understand the structure of the file? What do you know about the XML?

It might make sense, for example, to read one file, store all the records with primary keys in a certain range, do the same for the other file, compare that data, then carry on with the next range. By segmenting the key space you make sure that you always find matches if they exist. It could also make sense to break your files into smaller chunks in the same way (although I still think XML storage this large is usually inappropriate). Can you say a little more about the problem?
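
A minimal sketch of one such key-range pass is shown below, with several assumptions that are not in the question: records are sibling elements with a known name (`recordName`), each record has a child element holding an integer key (`keyName`), and the names `KeyRangeDiff`, `StreamRecords`, `LoadRange` and `ChangedInRange` are hypothetical. It uses `XmlReader` together with `XNode.ReadFrom` so that only one record is materialized at a time. As a small variation on the description above, only the old file's slice is stored in a dictionary, and the new file is streamed against it.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class KeyRangeDiff
{
    // Streams record elements one at a time instead of loading the whole document.
    public static IEnumerable<XElement> StreamRecords(XmlReader reader, string recordName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                // ReadFrom consumes the element and leaves the reader on the node after it.
                yield return (XElement)XNode.ReadFrom(reader);
            else
                reader.Read();
        }
    }

    // Keeps only the records whose key falls in [lo, hi).
    public static Dictionary<long, XElement> LoadRange(
        XmlReader reader, string recordName, string keyName, long lo, long hi)
    {
        var map = new Dictionary<long, XElement>();
        foreach (var rec in StreamRecords(reader, recordName))
        {
            long key = (long)rec.Element(keyName);
            if (key >= lo && key < hi) map[key] = rec;
        }
        return map;
    }

    // One pass for a single key range: returns new records that have a
    // matching old record (same key) with different content.
    public static List<XElement> ChangedInRange(
        Func<XmlReader> openOld, Func<XmlReader> openNew,
        string recordName, string keyName, long lo, long hi)
    {
        Dictionary<long, XElement> oldMap;
        using (var r = openOld())
            oldMap = LoadRange(r, recordName, keyName, lo, hi);

        var changed = new List<XElement>();
        using (var r = openNew())
        {
            foreach (var rec in StreamRecords(r, recordName))
            {
                long key = (long)rec.Element(keyName);
                if (key < lo || key >= hi) continue;
                // DeepEquals also compares whitespace text nodes, so the readers
                // should be created with IgnoreWhitespace = true for formatted files.
                if (oldMap.TryGetValue(key, out var oldRec) && !XNode.DeepEquals(oldRec, rec))
                    changed.Add(rec);
            }
        }
        return changed;
    }
}
```

A caller would invoke `ChangedInRange` once per key range, passing factories such as `() => XmlReader.Create("old.xml", new XmlReaderSettings { IgnoreWhitespace = true })`, and append each result to the output file. Every range costs one full read of both files, so the range width is a trade-off between memory use and the number of passes.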

