C# - Loading an XML file in parts

My task is to load a new set of data (stored in an XML file) and compare it with the 'old' set (also in XML). All changes are written to another file.


My program loads the new and old files into two datasets, then goes through the new set row by row and looks up each primary key in the old set. When it finds the corresponding row, it checks all fields, and if any of them differ from the old row, it writes the row to a third set, which is then saved to a file.
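A minimal sketch of this key-based comparison with System.Data follows. It assumes both tables already exist (for example after loading them with DataSet.ReadXml, in which case PrimaryKey may still have to be set by hand, since an inferred schema does not know which field the user considers the key) and that both tables have the same columns. The class and method names are hypothetical.

```csharp
using System.Data;
using System.Linq;

public static class TableDiff
{
    // Returns the rows of newTable whose primary key also exists in oldTable but whose
    // other fields differ. Assumes both tables have the same columns and PrimaryKey set.
    public static DataTable ChangedRows(DataTable oldTable, DataTable newTable)
    {
        DataTable changes = newTable.Clone(); // same schema, no rows

        foreach (DataRow newRow in newTable.Rows)
        {
            // Collect the new row's key values in PrimaryKey order for Rows.Find.
            object[] key = newTable.PrimaryKey.Select(c => newRow[c]).ToArray();
            DataRow oldRow = oldTable.Rows.Find(key);
            if (oldRow == null)
                continue; // no corresponding row in the old set

            // Compare field by field, matching columns by name.
            bool differs = newTable.Columns.Cast<DataColumn>()
                .Any(c => !object.Equals(newRow[c.ColumnName], oldRow[c.ColumnName]));

            if (differs)
                changes.ImportRow(newRow);
        }
        return changes;
    }
}
```

Note that this still keeps both tables fully in memory, which is exactly what breaks down at several gigabytes; Rows.Find only makes each lookup fast.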


Right now I use:



and then I just find rows with the corresponding primary key and compare the other fields. It works quite well for small files.


The problem is that my files may be up to about 4 GB each. If both the new and the old data are that big, loading 8 GB of data into memory is quite problematic.


I would like to load my data in parts, but for the comparison I need the whole old data set (or is there a way to get the specific row with a given primary key directly from the XML file?).
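On the parenthetical question: a single record can be fetched by key without loading the whole file by streaming it with XmlReader and materializing one element at a time with XNode.ReadFrom, which is the usual way of combining LINQ to XML with streaming. This sketch assumes each record is an element named `recordName` whose key is stored in a child element `keyName`; both parameters are hypothetical, since the real structure is user-defined.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlRecordLookup
{
    // Scans the document sequentially and returns the first <recordName> element whose
    // <keyName> child equals keyValue, or null if none matches. Only the record currently
    // being examined is held in memory.
    public static XElement FindByKey(TextReader input, string recordName, string keyName, string keyValue)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                {
                    // ReadFrom builds an XElement for this record only and moves the
                    // reader past it, so no extra Read() is needed here.
                    var record = (XElement)XNode.ReadFrom(reader);
                    if ((string)record.Element(keyName) == keyValue)
                        return record;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return null;
    }
}
```

Each call is a full sequential scan, so calling it once per row of the new file would re-read the old file over and over. It suits occasional lookups rather than the whole comparison.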


Another problem is that I don't know the structure of the XML file in advance; it is defined by the user.
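One way to compare records whose structure is only known at run time is to flatten each record into path/value pairs and compare those. This is a sketch under assumptions: the records have already been matched by key, namespaces are ignored (only local names are used), and the path format (`@attr`, `/child[index]`) is a hypothetical convention chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GenericRecord
{
    // Flattens a record of arbitrary shape into path -> value pairs, e.g. "@id" for an
    // attribute of the record and "/name[0]" for its first <name> child. Repeated child
    // names get increasing indexes so they do not overwrite each other.
    public static Dictionary<string, string> Flatten(XElement record)
    {
        var fields = new Dictionary<string, string>();
        Walk(record, "", fields);
        return fields;
    }

    private static void Walk(XElement element, string path, Dictionary<string, string> fields)
    {
        foreach (XAttribute attribute in element.Attributes())
            fields[path + "@" + attribute.Name.LocalName] = attribute.Value;

        if (!element.HasElements)
        {
            fields[path] = element.Value; // leaf: store its text
            return;
        }

        var seen = new Dictionary<string, int>();
        foreach (XElement child in element.Elements())
        {
            string name = child.Name.LocalName;
            seen.TryGetValue(name, out int index);
            seen[name] = index + 1;
            Walk(child, $"{path}/{name}[{index}]", fields);
        }
    }

    // Paths whose values differ between the two records, including paths present on
    // only one side (the missing side counts as null). Sorted ordinally for stable output.
    public static List<string> ChangedFields(XElement oldRecord, XElement newRecord)
    {
        Dictionary<string, string> a = Flatten(oldRecord);
        Dictionary<string, string> b = Flatten(newRecord);
        return a.Keys.Union(b.Keys)
            .Where(path =>
            {
                a.TryGetValue(path, out string oldValue);
                b.TryGetValue(path, out string newValue);
                return oldValue != newValue;
            })
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToList();
    }
}
```

Because repeated children are indexed by position, reordering them counts as a change; whether that is desired depends on what the user-defined structure means.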


What is the best way to work with such big files? I thought about using LINQ to XML, but I don't know whether it has options that could help with my problem. Maybe it would be better to move away from XML and use something different?

1 answer



You are absolutely right that you should move away from XML. It is not a good tool for data sets of this size, especially when the data consists of many 'records' that all share the same structure. Not only are 4 GB files unwieldy, but almost anything you use to load and parse them will use even more memory than the size of the file itself.

I would recommend looking at solutions involving an SQL database, but I have no idea how it can make sense to analyse a 4 GB file when you "don't know the structure [of the file]" because "it is defined by the user". What meaning do you ascribe to 'rows' and 'primary keys' if you don't understand the structure of the file? What do you know about the XML?


It might make sense, e.g., to read one file and keep only the records whose primary keys fall in a certain range, do the same for the other file, compare that data, and then move on to the next range. Segmenting the key space this way ensures that you always find matches if they exist. It could also make sense to split your files into smaller chunks in the same way (although I still think storing this much data in XML is usually inappropriate). Can you say a little more about the problem?
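The segmented comparison suggested above can be sketched as follows. Instead of literal key ranges, this sketch swaps in hash buckets, which play the same role without requiring any knowledge of how the keys are distributed. Records are passed as (key, serialized record) pairs, for example a key extracted from each element plus its XElement.ToString() text; the class, method, and parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentedDiff
{
    // Compares two record streams in `segments` passes. In pass s, only old records whose
    // key falls into segment s are kept in memory, so peak memory is roughly 1/segments of
    // the old data (if keys are spread evenly). Both streams use the same segment
    // function, so a key and its counterpart always land in the same pass.
    // readOld/readNew must return a fresh enumeration on every call; for real files they
    // would reopen and stream the XML one record at a time.
    public static IEnumerable<string> ChangedKeys(
        Func<IEnumerable<(string Key, string Xml)>> readOld,
        Func<IEnumerable<(string Key, string Xml)>> readNew,
        int segments)
    {
        for (int s = 0; s < segments; s++)
        {
            var old = new Dictionary<string, string>();
            foreach (var (key, xml) in readOld())
                if (Segment(key, segments) == s)
                    old[key] = xml;

            foreach (var (key, xml) in readNew())
            {
                if (Segment(key, segments) != s)
                    continue;
                // Report keys present in both sets whose records differ.
                if (old.TryGetValue(key, out string oldXml) && oldXml != xml)
                    yield return key;
            }
        }
    }

    // Hash buckets stand in for key ranges. string.GetHashCode is randomized per process
    // on .NET Core, but it is stable within one run, which is all this needs.
    private static int Segment(string key, int segments) =>
        (key.GetHashCode() & int.MaxValue) % segments;
}
```

The design trades I/O for memory: each file is read once per segment. Added and deleted records are not reported by this sketch and would need separate handling.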




© 2014-2022 ITdaan.com