Why do two identical lists have a different memory footprint?

I created two lists, l1 and l2, each with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with a list comprehension takes up more memory, but the two lists are otherwise identical in Python.

Why is that? Is this a CPython implementation detail, or is there some other explanation?

3 answers

#1


142  

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn't know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn't have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocated enough room for 16 elements and had to reallocate on reaching the 17th.
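
The growth pattern described here can be observed directly by appending one element at a time and recording each point where sys.getsizeof changes. This is only a minimal sketch, assuming CPython; the helper name growth_steps is made up for illustration, and the exact sizes printed depend on the Python version and platform.

```python
import sys

def growth_steps(n):
    """Append n items one at a time; return (length, size) at each reallocation."""
    lst = []
    last_size = sys.getsizeof(lst)
    steps = [(0, last_size)]
    for _ in range(n):
        lst.append(None)
        size = sys.getsizeof(lst)
        if size != last_size:  # the reported size changed, so the list was resized
            steps.append((len(lst), size))
            last_size = size
    return steps

for length, size in growth_steps(32):
    print(f"len={length:2d}  size={size} bytes")
```

The number of lines printed is much smaller than the number of appends, which is the point of over-allocating: most appends reuse room that was reserved earlier.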

#2


43  

As noted in this question, a list comprehension uses list.append under the hood, so it calls the list-resize routine, which over-allocates.

To demonstrate this to yourself, you can use the dis disassembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.
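
The same check can be done programmatically instead of reading the disassembly by eye. The sketch below is an illustration, not part of the original answer: uses_opcode is a hypothetical helper that searches a code object and any code objects nested in its constants, because older CPython versions compile the comprehension into a separate <listcomp> code object while newer ones (3.12 and later) inline it.

```python
import dis
import types

def uses_opcode(code, opname):
    """Return True if code, or any code object nested in its constants, uses opname."""
    if any(ins.opname == opname for ins in dis.get_instructions(code)):
        return True
    return any(
        uses_opcode(const, opname)
        for const in code.co_consts
        if isinstance(const, types.CodeType)
    )

# Compiling does not evaluate the expressions; it only produces bytecode.
comprehension = compile('[None for _ in range(10)]', '<example>', 'eval')
repetition = compile('[None] * 10', '<example>', 'eval')

print('comprehension uses LIST_APPEND:', uses_opcode(comprehension, 'LIST_APPEND'))
print('repetition uses LIST_APPEND:   ', uses_opcode(repetition, 'LIST_APPEND'))
```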

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the function simply fills the array.
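
One way to see the effect of that exact allocation from Python is to compare the two creation methods across several lengths. This is a rough sketch assuming CPython; the names POINTER_SIZE, repeated_size and comprehension_size are introduced here for illustration, and the absolute numbers will differ between Python versions and platforms.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize('P')  # bytes per object reference on this build

def repeated_size(n):
    """Size of a list built by repetition, which is allocated in one step."""
    return sys.getsizeof([None] * n)

def comprehension_size(n):
    """Size of a list built by a comprehension, which grows by appending."""
    return sys.getsizeof([None for _ in range(n)])

print('pointer size:', POINTER_SIZE)
for n in (10, 15, 16, 17):
    print(n, repeated_size(n), comprehension_size(n))
```

With repetition, each extra element should cost exactly one pointer; with the comprehension, the size stays flat for a while and then jumps.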

#3


4  

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

16
16
16
16
16
16
16
16
16
16

These values do not add up to the size of l2; their total is smaller.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.

