C++ serialization of complex data using Boost

I have a set of classes whose data I wish to serialize. There is a lot of data, though (we're talking about a std::map with up to a million or more class instances).


Not wishing to optimize my code too early, I thought I'd try a simple and clean XML implementation, so I used TinyXML to save the data out to XML, but it was just far too slow. So I've started looking at using Boost.Serialization to write and read standard ASCII or binary.


It seems to be much better suited to the task as I don't have to allocate all this memory as an overhead before I get started.


My question is essentially how to go about planning an optimal serialization strategy for a file format. I don't particularly want to serialize the whole map if it's not necessary, as it's really only the contents I'm after. Having played around with serialization a little (and looked at the output), I don't understand how loading the data back in could know when it's reached the end of the map for example, if I simply save out all the items one after another. What issues do you need to consider when planning a serialization strategy?



5 answers


There are many advantages to Boost.Serialization. For instance, as you say, just including a method with a specified signature allows the framework to serialize and deserialize your data. Also, Boost.Serialization includes serializers and readers for all the standard STL containers, so you don't have to worry about whether all keys have been stored (they will be) or how to detect the last entry in the map when deserializing (it will be detected automatically).
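To illustrate how a loader can know where a map ends, here is a minimal hand-rolled sketch using only the standard library. It is not Boost's code or its archive format; it only shows the general idea, which Boost.Serialization's container support also relies on, of recording the element count ahead of the elements. The names `save_map`, `load_map` and `demo_round_trip` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: write the element count first, then each key/value pair.
void save_map(std::ostream& os, const std::map<int, std::string>& m) {
    os << m.size() << '\n';
    for (const auto& kv : m) {
        // Length-prefix the string so that values may contain spaces.
        os << kv.first << ' ' << kv.second.size() << ' ' << kv.second << '\n';
    }
}

// Hypothetical helper: read the count, then exactly that many pairs.
// Anything written after the map is left untouched in the stream.
std::map<int, std::string> load_map(std::istream& is) {
    std::map<int, std::string> m;
    std::size_t count = 0;
    is >> count;
    for (std::size_t i = 0; i < count; ++i) {
        int key = 0;
        std::size_t len = 0;
        is >> key >> len;
        is.get();  // skip the single space between the length and the payload
        std::string value(len, '\0');
        is.read(&value[0], static_cast<std::streamsize>(len));
        // Keys arrive in sorted order, so hinting at end() keeps insertion cheap.
        m.emplace_hint(m.end(), key, std::move(value));
    }
    return m;
}

// Usage sketch: round-trip a small map through an in-memory stream.
void demo_round_trip() {
    const std::map<int, std::string> original{{1, "one"}, {2, "two words"}};
    std::stringstream buffer;
    save_map(buffer, original);
    assert(load_map(buffer) == original);
}
```

Because the reader stops after `count` entries, several containers can be written one after another into the same stream without any end marker.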


There are, however, some considerations to make. For example, if your class has a field that is calculated, or that exists only to speed things up, such as an index or a hash table, you don't have to store it, but you do have to reconstruct these structures from the data read from disk.
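A minimal sketch of that idea, assuming a hypothetical `RecordStore` class whose lookup index is derived data: only the records would be written out, and the index is rebuilt once they have been read back. With Boost.Serialization the rebuild would typically go at the end of the load half of a split save/load pair; the sketch leaves the archive out so that it stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Record {
    int id;
    std::string name;
};

// Hypothetical container: records_ is the persistent data,
// by_id_ is a derived lookup index that is never written to disk.
class RecordStore {
public:
    // Called after loading: takes the raw records and rebuilds the index.
    void assign(std::vector<Record> loaded) {
        records_ = std::move(loaded);
        rebuild_index();
    }

    // Returns nullptr when no record has the given id.
    const Record* find(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &records_[it->second];
    }

    // The only part that needs to be serialized.
    const std::vector<Record>& records() const { return records_; }

private:
    void rebuild_index() {
        by_id_.clear();
        by_id_.reserve(records_.size());
        for (std::size_t i = 0; i < records_.size(); ++i) {
            by_id_[records_[i].id] = i;
        }
    }

    std::vector<Record> records_;
    std::unordered_map<int, std::size_t> by_id_;
};

// Usage sketch: pretend these records were just read from disk.
void demo_rebuild() {
    std::vector<Record> loaded;
    loaded.push_back(Record{7, "seven"});
    RecordStore store;
    store.assign(std::move(loaded));
    assert(store.find(7) != nullptr);
}
```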


As for the "file format" you mention, I think sometimes we focus on the format rather than on the data. I mean, the exact format of the file doesn't matter as long as you are able to retrieve the data seamlessly using (say) Boost.Serialization. If you want to share the file with other utilities that don't use serialization, that's another matter. But just for the purposes of (de)serialization, you don't have to care about the internal file format.



Read this FAQ! Does that help to get started?



I don't particularly want to serialize the whole map if it's not necessary, as it's really only the contents I'm after.


Does that mean you don't really need to serialize the whole object? Maybe you should reconsider just using a text-based format. If you really need to serialize only a subset of the key/value pairs in a map then you should probably just write them to a text file and read them in later. You don't necessarily need XML; just one line per map key followed by one line with the value should work.
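A rough sketch of the line-per-key, line-per-value layout, assuming string keys and values that never contain newline characters (the function names are invented for the example):

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Write one line with the key, then one line with the value.
// Assumption: neither keys nor values contain newline characters.
void save_pairs(std::ostream& os, const std::map<std::string, std::string>& m) {
    for (const auto& kv : m) {
        os << kv.first << '\n' << kv.second << '\n';
    }
}

// Read lines two at a time until the stream runs out.
std::map<std::string, std::string> load_pairs(std::istream& is) {
    std::map<std::string, std::string> m;
    std::string key, value;
    while (std::getline(is, key) && std::getline(is, value)) {
        m[key] = value;
    }
    return m;
}

// Usage sketch: round-trip through an in-memory stream.
void demo_pairs() {
    const std::map<std::string, std::string> original{{"alpha", "1 2 3"}};
    std::stringstream buffer;
    save_pairs(buffer, original);
    assert(load_pairs(buffer) == original);
}
```

Here the end of the data is simply the end of the file, so no count is needed, and the output can still be inspected with ordinary text tools.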



If all you want is key/value pairs, then the important thing is the types that the keys and values take; this will colour how you deal with things.


Serialising the map itself would be a poor plan in general since you may wish to change your associative container type later but not invalidate (or have to translate) previous serialised files.


Serialising the container can be useful in certain circumstances if you wish to avoid the cost of rebuilding it (although pre-sizing the container is normally sufficient to avoid the vast majority of this overhead), but this should be a decision based on specific aspects of your application and usage.
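As a sketch of both points, assuming int keys and double values purely for illustration: the file below holds only a count and the pairs, so it can be written from a std::map and later loaded into a std::unordered_map, which is pre-sized from the stored count. `save_flat` and `load_flat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <unordered_map>

// The file holds only a count followed by key/value pairs; nothing in it
// records which container type produced it.
template <typename Container>
void save_flat(std::ostream& os, const Container& c) {
    os << c.size() << '\n';
    for (const auto& kv : c) {
        os << kv.first << ' ' << kv.second << '\n';
    }
}

// Load into a hash map, reserving space up front to avoid repeated rehashing.
// Note: a real loader should sanity-check count before trusting it.
std::unordered_map<int, double> load_flat(std::istream& is) {
    std::unordered_map<int, double> out;
    std::size_t count = 0;
    if (!(is >> count)) return out;
    out.reserve(count);
    int key = 0;
    double value = 0.0;
    for (std::size_t i = 0; i < count && (is >> key >> value); ++i) {
        out.emplace(key, value);
    }
    return out;
}

// Usage sketch: written from a std::map, read back into a std::unordered_map.
void demo_flat() {
    const std::map<int, double> source{{1, 0.5}, {2, 1.25}};
    std::stringstream buffer;
    save_flat(buffer, source);
    const std::unordered_map<int, double> loaded = load_flat(buffer);
    assert(loaded.size() == source.size());
}
```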


If you supply the types of the keys/values we can help more. Without this, here are some general tips:


  • If they are amenable to string representation, a simple CSV file may be sufficient (but use an existing reader/writer library for it; reading and writing legitimate CSV is harder than it looks).
  • If they are fixed width, a simple binary format will make reading and writing very easy (and quick), but take care over the following issues:
    • endianness
    • whether you wish to allow simple catting of such files together or add CRC-like values for integrity (you can do both, but it's harder)
    • you lose the ability to grep the files (this is a real loss; you may end up having to reinvent parts of your toolchain for this)
    • whether changing platform/compiler/size_t will break the format
  • Some structured textual format that is lighter than XML. There are several (JSON, YAML, etc.). These will provide extensibility you quite likely don't require.
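For the fixed-width binary option, one way to sidestep the endianness and size_t concerns is to write fixed-size integers one byte at a time in a byte order chosen by the format. The following is a sketch under that assumption (little-endian, 32-bit values), with invented helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit value as four bytes, least significant first,
// regardless of the host's native byte order.
void write_u32_le(std::ostream& os, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF),
    };
    os.write(bytes, 4);
}

// Read four little-endian bytes back into a 32-bit value.
// Returns false if the stream ended before four bytes were available.
bool read_u32_le(std::istream& is, std::uint32_t& v) {
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), 4)) return false;
    v = static_cast<std::uint32_t>(bytes[0])
      | (static_cast<std::uint32_t>(bytes[1]) << 8)
      | (static_cast<std::uint32_t>(bytes[2]) << 16)
      | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}

// Usage sketch: a fixed-width record is just a key followed by a value.
void demo_u32() {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    write_u32_le(buffer, 42u);      // key
    write_u32_le(buffer, 100000u);  // value
    std::uint32_t key = 0, value = 0;
    assert(read_u32_le(buffer, key) && read_u32_le(buffer, value));
    assert(key == 42u && value == 100000u);
}
```

Since every record has the same size and there is no header, files written this way can still be concatenated; adding a CRC per file would give that up unless the reader is taught to handle it.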


Use Google's Protocol Buffers which is a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.


There are bindings for C++, Java, Python, Perl, C#, and Ruby.


You describe your data in .proto metadata files:


message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

Then you would use it in C++ like this:


Person person;
person.set_id(123);
person.set_name("Bob");

fstream out("person.pb", ios::out | ios::binary | ios::trunc);
person.SerializeToOstream(&out);

And read it back like this:


Person person;
fstream in("person.pb", ios::in | ios::binary);
if (!person.ParseFromIstream(&in)) {
  cerr << "Failed to parse person.pb." << endl;
  exit(1);
}

cout << "ID: " << person.id() << endl;
cout << "name: " << person.name() << endl;
if (person.has_email()) {
  cout << "e-mail: " << person.email() << endl;
}

For a more complete example, see the tutorials.



