C# - Loading XML file in parts


My task is to load a new set of data (written to an XML file) and then compare it to the 'old' set (also in XML). All the changes are written to another file.

My program loads the new and the old file into two DataSets, then row by row I compare the primary key from the new set with the old one. When I find the corresponding row, I check all fields and, if there are differences from the old ones, I write the row to a third set and then write that set to a file.

Right now I use:

    newDS.ReadXml("data.xml");
    oldDS.ReadXml("old.xml");

and then I just find the rows with the corresponding primary key and compare the other fields. This works quite well for small files.
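A minimal sketch of that comparison step, for anyone following along; the key column name `Id` in the usage below and the method shape here are placeholders, since the real schema is user-defined:

```csharp
using System;
using System.Data;

static class XmlDiff
{
    // Copy rows from newT that are absent from oldT, or whose fields differ,
    // into a third table. "keyCol" is whatever column acts as the primary key
    // in the user-defined schema.
    public static DataTable Diff(DataTable newT, DataTable oldT, string keyCol)
    {
        DataTable changes = newT.Clone();                  // same schema, no rows
        oldT.PrimaryKey = new[] { oldT.Columns[keyCol] };  // enables Rows.Find

        foreach (DataRow newRow in newT.Rows)
        {
            DataRow oldRow = oldT.Rows.Find(newRow[keyCol]);
            if (oldRow == null)
            {
                changes.ImportRow(newRow);                 // record is new
                continue;
            }
            foreach (DataColumn col in newT.Columns)
            {
                if (!Equals(newRow[col.ColumnName], oldRow[col.ColumnName]))
                {
                    changes.ImportRow(newRow);             // at least one field differs
                    break;
                }
            }
        }
        return changes;
    }
}
```

The changed rows can then be written out with `changes.WriteXml(...)`. Note this still assumes both tables fit in memory, which is exactly what breaks down at 4 GB.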

The problem is that my files may be up to about 4 GB each. If both my new and old data are that big, it is quite problematic to load 8 GB of data into memory.

I would like to load my data in parts, but for the comparison I need the whole old dataset (or is there a way to get a specific row, by primary key, out of the XML file?).
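One way to load the data in parts is to stream the file with `XmlReader` and materialize one record element at a time, so everything else stays out of memory. A sketch; the record element name (`row` in the test usage) is an assumption, since the real structure is user-defined:

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class XmlStream
{
    // Lazily yield each record element from a (possibly huge) XML file.
    // Memory use is proportional to one record, not to the whole file.
    public static IEnumerable<XElement> Records(XmlReader reader, string elementName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                yield return (XElement)XNode.ReadFrom(reader); // consumes the element
            else
                reader.Read();
        }
    }
}
```

Finding one row by primary key this way still means scanning the file from the start on every lookup, which is why an indexed store tends to be a better fit for repeated random access.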

Another problem is that I don't know the structure of the XML file; it is defined by the user.

What is the best way to work with such big files? I thought about using LINQ to XML, but I don't know whether it has anything that helps with my problem. Maybe it would be better to abandon XML and use something different?

1 Answer

#1


-2  

You are absolutely right that you should move away from XML. It is not a good tool for datasets this size, especially when the dataset consists of many 'records' that all share the same structure. Not only are 4 GB files unwieldy, but almost anything you use to load and parse them will incur even more memory overhead than the size of the file itself.

I would recommend that you look at solutions involving an SQL database, but I have no idea how it can make sense to be analysing a 4 GB file when you "don't know the structure [of the file]" because "it is defined by the user". What meaning do you ascribe to 'rows' and 'primary keys' if you don't understand the structure of the file? What do you know about the XML?

It might make sense, for example, to read one file and store all the records whose primary keys fall within a certain range, do the same for the other file, compare that data, and then carry on with the next range. By segmenting the key space you make sure that you always find matches if they exist. It could also make sense to break your files into smaller chunks in the same way (although I still think XML storage this large is usually inappropriate). Can you say a little more about the problem?
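The core of one such pass can be sketched like this; both dictionaries are assumed to hold only the records whose primary key lies in the current range (how keys and values are extracted from the user-defined records is deliberately left out):

```csharp
using System.Collections.Generic;

static class RangeDiff
{
    // One pass of the segmented comparison. Each pass needs memory for one
    // key-range segment from each file, not for the whole files.
    public static List<int> ChangedKeys(
        IReadOnlyDictionary<int, string> newChunk,
        IReadOnlyDictionary<int, string> oldChunk)
    {
        var changed = new List<int>();
        foreach (var kv in newChunk)
        {
            // Missing from the old chunk -> new record; differing value ->
            // modified record. Either way it belongs in the diff.
            if (!oldChunk.TryGetValue(kv.Key, out var oldValue) || oldValue != kv.Value)
                changed.Add(kv.Key);
        }
        return changed;
    }
}
```

Because the same key range is read from both files in each pass, any match that exists is guaranteed to be found in the pass that covers its key.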

