Eliminate nesting by creating new objects from JSON


I have a standard nested JSON file that looks like the one below. It is nested several levels deep, and I have to eliminate all the nesting by creating new objects.

Nested JSON file:

{
"persons": [{
    "id": "f4d322fa8f552",
    "address": {
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    },
    "cuisine": "Chinese",
    "grades": [{
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B",
        "score": {
          "x": 3,
          "y": 2
        }
    }, {
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C",
        "score": {
          "x": 1,
          "y": 22
        }
    }],
    "name": "Shash"
}]
}

The new objects that need to be created:

persons 
[
{
"id": "f4d322fa8f552",
"cuisine": "Chinese",
"name": "Shash"
}
]

persons_address
[
{
"id": "f4d322fa8f552",
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
}
]

persons_grade
[
{
"id": "f4d322fa8f552",
"__index": "0",
"date": "2013-03-03T00:00:00.000Z",
"grade": "B"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"date": "2012-11-23T00:00:00.000Z",
"grade": "C"
}
]

persons_grade_score
[
{
"id": "f4d322fa8f552",
"__index": "0",
"x": "3",
"y": "2"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"x": "1",
"y": "22"
}
]

My approach: I used a normalise function to turn all the lists into dicts, then added another function that adds the id to all the nested dicts.

Now I am not able to traverse each level and create the new objects. Is there any way to do this?

The whole idea is that once the new objects are created, we can load them into a database.

3 solutions

#1


6  

Concepts

Here is a generic solution that does what you need. It recursively loops through every value of each object in the top-level "persons" list and decides how to proceed based on the type of each value it finds.

Every value that is neither a dict nor a list is copied into the flat object for the current level.

When it finds a dict or a list, it recursively does the same thing again, finding more plain values, lists, or dicts.

Using collections.defaultdict also lets us easily populate an unknown number of lists, one per key, in a single dictionary, which is how we end up with the 4 top-level objects you want.
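As a small illustration of the defaultdict point only (not the full solution), here is a minimal sketch of grouping records by table name. The (table_name, record) pairs are hypothetical stand-ins for what a recursive walk might emit.

```python
from collections import defaultdict

# Hypothetical (table_name, record) pairs, as a recursive walk might emit them.
emitted = [
    ("persons_address", {"id": "f4d322fa8f552", "building": "710"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 0, "grade": "B"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 1, "grade": "C"}),
]

# A missing key is created as an empty list on first access,
# so no "if name not in collected" check is needed.
collected = defaultdict(list)
for table_name, record in emitted:
    collected[table_name].append(record)

print(sorted(collected))                  # ['persons_address', 'persons_grades']
print(len(collected["persons_grades"]))   # 2
```

Because the set of nested keys is not known up front, this avoids having to declare each result list in advance.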

Code example

from collections import defaultdict


class DictFlattener(object):
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.

        """
        self._object_id_key = object_id_key
        self._object_name = object_name

        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.

        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.

        """
        self._collected_results = defaultdict(list)

        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}

            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy values (0, "") are kept.
                if parsed_value is not None:
                    parsed_object[key] = parsed_value

            self._collected_results[self._object_name].append(parsed_object)

        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name,
                     index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if any.

        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.

        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if any.

        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index

        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            # Compare against None so falsy values (0, "") are kept.
            if parsed_value is not None:
                parsed_dict[key] = parsed_value

        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.

        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )

Then, to use it (test_data is the parsed JSON from the question):

parser = DictFlattener("id", "persons")
results = parser.parse(test_data)

Notes

  1. There were some inconsistencies between your example data and the expected output; for instance, the scores are ints in the input but strings in the expected output. You'll need to adjust for those when you compare the actual results to the expected ones.
  2. There's always more refactoring one could do, and this could be written in a more functional style rather than as a class. Hopefully it still helps you understand the approach.
  3. As @jbernardo said, if you will be inserting these into a relational database, the child tables shouldn't all just use "id" as the key; it should be "person_id".
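To make the last note concrete, here is a minimal sketch of loading two of the flattened lists into a relational database with "person_id" as the foreign key. It uses Python's built-in sqlite3 module with an in-memory database; the table layout and the hand-written records are assumptions for illustration, not output of the class above.

```python
import sqlite3

# Hypothetical flat records shaped like the persons / persons_address lists,
# with the child table keyed by "person_id" instead of a bare "id".
persons = [{"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"}]
persons_address = [{
    "person_id": "f4d322fa8f552",
    "building": "710",
    "street": "Avenue Road",
    "zipcode": "12345",
}]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, cuisine TEXT, name TEXT)")
conn.execute(
    "CREATE TABLE persons_address ("
    "person_id TEXT REFERENCES persons(id), "
    "building TEXT, street TEXT, zipcode TEXT)"
)

# Named placeholders (:key) let each dict be passed directly as a row.
conn.executemany("INSERT INTO persons VALUES (:id, :cuisine, :name)", persons)
conn.executemany(
    "INSERT INTO persons_address VALUES (:person_id, :building, :street, :zipcode)",
    persons_address,
)
conn.commit()

# Join the child table back to its parent through person_id.
rows = conn.execute(
    "SELECT p.name, a.street FROM persons p "
    "JOIN persons_address a ON a.person_id = p.id"
).fetchall()
print(rows)  # [('Shash', 'Avenue Road')]
```

Giving each child table an explicit person_id column keeps the join condition unambiguous once several tables are involved.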

#2


3  

Here is pseudocode to help you out, once you have parsed the JSON file as described in "Parsing values from a JSON file?".

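# data is the dict obtained from parsing the JSON file
top_level = []          # one flat record per person
all_second_level = {}   # entity name -> list of flat records
all_third_level = {}    # entity name -> list of flat records

for person in data['persons']:
    # scalar fields stay on the top-level record
    top_level.append({key: val for key, val in person.items()
                      if not isinstance(val, (dict, list))})

    for key, val in person.items():
        if isinstance(val, dict):
            # nested dict (assumed flat): one record carrying the parent id
            second_level_entity = {'id': person['id']}
            second_level_entity.update(val)
            all_second_level.setdefault(key, []).append(second_level_entity)
        elif isinstance(val, list):
            # list of dicts: keep each item's index alongside the parent id
            for index, item in enumerate(val):
                second_level_entity = {'id': person['id'], '__index': index}
                for key1, val1 in item.items():
                    if not isinstance(val1, dict):
                        second_level_entity[key1] = val1
                    else:
                        # nested dict inside a list item: third-level entity
                        third_level_entity = {'id': person['id'], '__index': index}
                        third_level_entity.update(val1)
                        all_third_level.setdefault(key + '_' + key1, []).append(third_level_entity)
                all_second_level.setdefault(key, []).append(second_level_entity)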

#3


2  

# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []


# go through all your data and put the correct information in each list
for data in yourdict['persons']:
    persons.append({
        'id': data['id'],
        'cuisine': data['cuisine'],
        'name': data['name'],
    })

    _address = data['address'].copy()
    _address['id'] = data['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': data['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(data['grades']))

    # x and y live under each grade's nested "score" dict
    persons_grade_score.extend({
        'id': data['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(data['grades']))
