### NumPy array of zeros or empty

I am writing code where efficiency is very important. I need a 2D array that I fill with 0s and 1s in a for loop. Which approach is better, and why?

1. Make an empty array and fill it with "0" and "1". (This is pseudocode; my real array will be much bigger.)
2. Make an array filled with zeros, then use an `if` and, only where the value is not zero, put a 1.

So what I need to know is which is more efficient: 1) writing every element, "0" or "1", into an empty array, or 2) evaluating an `if` (and paying its cost) and then writing only the "1" elements.
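A minimal sketch of the two options for a 2D array. The question doesn't say how each value is computed, so `cell_value(i, j)` below is a hypothetical stand-in rule; swap in your own logic.

```python
import numpy as np


def cell_value(i, j):
    """Hypothetical rule deciding whether cell (i, j) is 0 or 1."""
    return 1 if (i + j) % 3 == 0 else 0


def fill_empty(rows, cols):
    # Option 1: np.empty leaves the memory uninitialized,
    # so every cell must be written, zeros included.
    A = np.empty((rows, cols), dtype=np.int8)
    for i in range(rows):
        for j in range(cols):
            A[i, j] = cell_value(i, j)
    return A


def fill_zeros(rows, cols):
    # Option 2: np.zeros already holds 0 everywhere,
    # so only the cells that become 1 need a write.
    A = np.zeros((rows, cols), dtype=np.int8)
    for i in range(rows):
        for j in range(cols):
            if cell_value(i, j):
                A[i, j] = 1
    return A


print(np.array_equal(fill_empty(4, 5), fill_zeros(4, 5)))  # True
```

Both functions produce the same array; they differ only in whether the zeros are written by the loop or come pre-set from `np.zeros`.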

## 3 solutions

### #1

• `empty()` does not initialize the memory, so your array will be filled with garbage and you will have to initialize every cell.
• `zeros()` initializes everything to 0. So if your final result contains lots of zeros, this saves you the time of setting all those cells to zero manually.
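A small illustration of the difference between the two: an `np.empty` array must be initialized explicitly before it can stand in for `np.zeros`. Its initial contents are whatever happened to be in memory, so the sketch never reads them before writing.

```python
import numpy as np

shape = (3, 4)

# np.zeros: every cell is guaranteed to start at 0.
Z = np.zeros(shape, dtype=np.int8)

# np.empty: contents are arbitrary until written, so initialize
# every cell before relying on the values.
E = np.empty(shape, dtype=np.int8)
E.fill(0)

print(np.array_equal(Z, E))  # True

# Starting from zeros, only the 1s need to be written.
Z[0, 1] = 1
Z[2, 3] = 1
print(int(Z.sum()))  # 2
```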

I would go with `zeros()`. The performance bottleneck will be your Python for loop anyway.

Fortunately, there is now a JIT compiler for NumPy code, Numba, which can turn your crummy and slow Python for loop into machine code:

http://numba.pydata.org/
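A minimal sketch of what using Numba might look like, assuming the `numba` package is installed; `numba.njit` compiles the decorated function on its first call. The 0/1 rule is a hypothetical stand-in, and the code falls back to plain, uncompiled Python if Numba is not available so it still runs.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # Numba not installed: run the same code uncompiled
    def njit(func):
        return func


@njit
def fill_ones(rows, cols):
    # Start from zeros and write only the cells that should be 1.
    A = np.zeros((rows, cols), dtype=np.int8)
    for i in range(rows):
        for j in range(cols):
            if (i * j) % 3 == 0:  # hypothetical 0/1 rule
                A[i, j] = 1
    return A


A = fill_ones(500, 500)
print(A.shape)  # (500, 500)
```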

I tried it. It's a bit rough around the edges, but the speedups can be quite spectacular compared to bare Python code. Of course, the best choice is to vectorize using NumPy, but you don't always have that choice.
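For comparison, a vectorized sketch: the whole 0/1 array is produced by single NumPy operations with no Python-level loop. It assumes the values are either random or depend only on the row and column index through a hypothetical rule, `(i * j) % 3 == 0`.

```python
import numpy as np

rows, cols = 500, 500

# Random 0/1 values: one call instead of rows * cols calls.
R = np.random.randint(0, 2, size=(rows, cols))

# Index-based rule (hypothetical): build row and column indices
# as broadcastable arrays and evaluate the condition all at once.
i = np.arange(rows)[:, None]
j = np.arange(cols)[None, :]
A = ((i * j) % 3 == 0).astype(np.int8)

print(R.shape, A.shape)
```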

### #2

```python
import numpy as np

Ae = np.empty(10000)
A0 = np.zeros(10000)
```

differ slightly in how memory is initially allocated. But any differences in time will be minor if you go on and do something like

```python
for i in range(10000):
    Ae[i] = some_calc(i)  # some_calc: placeholder for your own computation
```

or

```python
for i in range(10000):
    val = some_calc(i)  # some_calc: placeholder for your own computation
    if val > 0:
        A0[i] = val
```

If I had to loop like this, I'd go ahead and use `np.zeros`, and also use the unconditional assignment. It keeps the code simpler, and compared to everything else that is going on, the time differences will be minor.

Sample times:

```
In : def foo0(N):
...:     A = np.empty(N,int)
...:     for i in range(N):
...:         A[i] = np.random.randint(0,2)
...:     return A
...:
In : def foo1(N):
...:     A = np.zeros(N,int)
...:     for i in range(N):
...:         val = np.random.randint(0,2)
...:         if val:
...:             A[i] = val
...:     return A
...:
```

Three ways of assigning ten 0/1 values:

```
In : foo0(10)
Out: array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])
In : foo1(10)
Out: array([0, 1, 1, 1, 1, 1, 1, 1, 0, 0])
In : np.random.randint(0,2,10)
Out: array([0, 1, 1, 0, 1, 1, 1, 0, 0, 1])
```

Times:

```
In : timeit foo0(1000)
100 loops, best of 3: 4.06 ms per loop
In : timeit foo1(1000)
100 loops, best of 3: 3.95 ms per loop
In : timeit np.random.randint(0,2,1000)
... cached.
100000 loops, best of 3: 13.6 µs per loop
```
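The `timeit` calls above are IPython magics. A plain-Python sketch of the same comparison uses the standard `timeit` module instead; no timings are shown here because they depend on the machine.

```python
import timeit

import numpy as np


def foo0(N):
    # np.empty + unconditional assignment
    A = np.empty(N, int)
    for i in range(N):
        A[i] = np.random.randint(0, 2)
    return A


def foo1(N):
    # np.zeros + conditional assignment of the 1s only
    A = np.zeros(N, int)
    for i in range(N):
        val = np.random.randint(0, 2)
        if val:
            A[i] = val
    return A


def vectorized(N):
    # One call generates all N values at once.
    return np.random.randint(0, 2, N)


for f in (foo0, foo1, vectorized):
    # Best of 3 repeats, 20 calls each, reported per call in ms.
    best = min(timeit.repeat(lambda: f(1000), number=20, repeat=3))
    print(f"{f.__name__}: {best / 20 * 1e3:.3f} ms per call")
```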

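The two loop times are nearly the same, and both are roughly 300 times slower than the single vectorized `np.random.randint(0,2,1000)` call.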

### #3

It is better to create an array of zeros and fill it using if-else. Even though conditions slow your code down, reshaping an empty array or concatenating new vectors onto it in each iteration of the loop is slower still, because each time a new array of the new size is created and the old array is copied into it, together with the new values, element by element.
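A sketch of the contrast described above: growing an array with `np.concatenate` copies everything collected so far on each iteration, while filling a preallocated `np.zeros` array writes each value in place. The values come from a hypothetical 0/1 rule, `value(i)`.

```python
import numpy as np

N = 1000


def value(i):
    """Hypothetical 0/1 rule for element i."""
    return 1 if i % 4 == 0 else 0


def grow_by_concatenation(n):
    # Each np.concatenate allocates a new array and copies the old
    # contents into it, so total copying grows roughly as n**2.
    A = np.empty(0, dtype=np.int8)
    for i in range(n):
        A = np.concatenate([A, np.array([value(i)], dtype=np.int8)])
    return A


def fill_preallocated(n):
    # One allocation up front; each iteration writes in place.
    A = np.zeros(n, dtype=np.int8)
    for i in range(n):
        if value(i):
            A[i] = 1
    return A


print(np.array_equal(grow_by_concatenation(N), fill_preallocated(N)))  # True
```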