C# - Loading an XML file in parts

My task is to load a new set of data (stored in an XML file) and then compare it to the 'old' set (also in XML). All the changes are written to another file.


My program loads the new and old files into two datasets; then, row by row, I compare the primary key from the new set with the old one. When I find the corresponding row, I check all its fields, and if any of them differ from the old row, I write the row to a third set and then write that set to a file.


Right now I use:



and then I find the rows with the corresponding primary key and compare the other fields. It works quite well for small files.
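For reference, the in-memory comparison described above might look roughly like the sketch below. The loading code from the question is not shown, so this assumes the data already sits in two `DataTable`s (for example loaded with `DataSet.ReadXml`), that both tables use the same column names, and that the key column name is known. `InMemoryDiff` and `ChangedRows` are hypothetical names.

```csharp
using System;
using System.Data;

public static class InMemoryDiff
{
    // Returns the rows of newTable whose key also exists in oldTable but whose
    // other fields differ. Assumes both tables share the same column names.
    public static DataTable ChangedRows(DataTable oldTable, DataTable newTable, string keyColumn)
    {
        // DataRowCollection.Find only works once the searched table has a primary key.
        oldTable.PrimaryKey = new[] { oldTable.Columns[keyColumn] };

        // Same schema as newTable but no rows: this plays the role of the "third set".
        DataTable changes = newTable.Clone();

        foreach (DataRow newRow in newTable.Rows)
        {
            DataRow oldRow = oldTable.Rows.Find(newRow[keyColumn]);
            if (oldRow == null) continue; // no corresponding row in the old data

            foreach (DataColumn column in newTable.Columns)
            {
                if (!Equals(newRow[column.ColumnName], oldRow[column.ColumnName]))
                {
                    changes.ImportRow(newRow);
                    break; // one differing field is enough to record the row
                }
            }
        }
        return changes;
    }
}
```

With real files, each table would come from something like `var oldSet = new DataSet(); oldSet.ReadXml("old.xml");`, and that is exactly the step that pulls the whole file into memory.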


The problem is that my files may be up to about 4 GB in size. If both my new and old data are that big, loading 8 GB of data into memory is quite problematic.


I would like to load my data in parts, but to do the comparison I need all of the old data (or is there a way to get the specific row with a matching primary key directly from the XML file?).


Another problem is that I don't know the structure of the XML file in advance; it is defined by the user.


What is the best way to work with such big files? I thought about using LINQ to XML, but I don't know whether it has options that can help with my problem. Maybe it would be better to move away from XML and use something different?
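LINQ to XML can help with the memory side if it is combined with `XmlReader`: the reader moves forward through the file, and `XNode.ReadFrom` materializes one record at a time as a small `XElement`. A minimal sketch, assuming that the user-defined structure is known at least well enough to say that every record is an element with a given name (`recordName` below); `XmlStreaming` and `StreamElements` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Yields each element named recordName as its own XElement, reading the
    // input forward-only so the whole document is never held in memory.
    public static IEnumerable<XElement> StreamElements(TextReader input, string recordName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                // LocalName ignores any namespace prefix on the element.
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == recordName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on
                    // the following node, so Read() must not be called here.
                    if (XNode.ReadFrom(reader) is XElement record)
                        yield return record;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For a real file you would pass a `File.OpenText(...)` reader wrapped in a `using` block; it has to stay open until enumeration finishes, because the iterator reads lazily. This keeps memory flat for one file, but finding a key in the *other* file still needs either an in-memory index or another pass over that file.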

1 answer



You are absolutely right that you should move away from XML. It is not a good tool for datasets of this size, especially if the dataset consists of many 'records' that all share the same structure. Not only are 4 GB files unwieldy, but almost anything that loads and parses a whole file like this will need even more memory than the size of the file itself.

I would recommend that you look at solutions involving an SQL database, but I don't see how it makes sense to analyse a 4 GB file when you "don't know the structure [of the file]" because "it is defined by the user". What meaning do you ascribe to 'rows' and 'primary keys' if you don't understand the structure of the file? What do you know about the XML?


It might make sense, e.g., to read one file and store all the records whose primary keys fall in a certain range, do the same for the other file, compare that data, and then carry on with the next range. Because a record and its counterpart have the same key, they always land in the same segment, so by segmenting the key space you make sure that you always find matches if they exist. It could also make sense to split your files into smaller chunks in the same way (although I still think storing this much data as XML is usually inappropriate). Can you say a little more about the problem?
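The key-space segmentation could be sketched as follows. This version buckets keys by a hash instead of by contiguous ranges, so it does not need to know how the keys are distributed; a range test would slot into the same place. It assumes that each record carries its key in a child element named `keyName`, and that each `Func` re-reads its file from the start every time it is called (in practice by streaming it with an `XmlReader`). `PartitionedDiff`, `ChangedKeys` and `BucketOf` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class PartitionedDiff
{
    // Deterministic FNV-1a hash, so a given key always maps to the same
    // partition for both files.
    static int BucketOf(string key, int partitions)
    {
        uint hash = 2166136261;
        foreach (char c in key) { hash ^= c; hash *= 16777619; }
        return (int)(hash % (uint)partitions);
    }

    // Yields the keys of records present in both inputs whose content differs.
    public static IEnumerable<string> ChangedKeys(
        Func<IEnumerable<XElement>> readOld,
        Func<IEnumerable<XElement>> readNew,
        string keyName,
        int partitions)
    {
        for (int p = 0; p < partitions; p++)
        {
            // Keep only the old records whose key falls in this partition.
            var oldByKey = new Dictionary<string, XElement>();
            foreach (XElement rec in readOld())
            {
                string key = (string)rec.Element(keyName);
                if (key != null && BucketOf(key, partitions) == p)
                    oldByKey[key] = rec;
            }

            // Any old counterpart of a new record in this partition is in oldByKey.
            foreach (XElement rec in readNew())
            {
                string key = (string)rec.Element(keyName);
                if (key == null || BucketOf(key, partitions) != p) continue;
                if (oldByKey.TryGetValue(key, out XElement oldRec) && !XNode.DeepEquals(oldRec, rec))
                    yield return key;
            }
        }
    }
}
```

Only about 1/`partitions` of the old records is held in memory at once, at the cost of reading both files `partitions` times. `XNode.DeepEquals` compares the elements structurally, including whitespace-only text nodes, so records that differ only in formatting may be reported as changed.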




© 2014-2022 ITdaan.com