Adding a column of zeroes to a csr_matrix


I have an MxN sparse csr_matrix, and I'd like to add a few columns containing only zeroes to the right of the matrix. In principle, the arrays indptr, indices and data stay the same, so I only want to change the dimensions of the matrix. However, this does not seem to be implemented.

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> A = csr_matrix(np.identity(5), dtype = int)
>>> A.toarray()
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])
>>> A.shape
(5, 5)
>>> A.shape = ((5,7))
NotImplementedError: Reshaping not implemented for csr_matrix.

Horizontally stacking a zero matrix with np.hstack does not seem to work either.

>>> B = csr_matrix(np.zeros([5,2]), dtype = int)
>>> B.toarray()
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])
>>> np.hstack((A,B))
array([ <5x5 sparse matrix of type '<type 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>,
       <5x2 sparse matrix of type '<type 'numpy.int32'>'
    with 0 stored elements in Compressed Sparse Row format>], dtype=object)

The following is what I eventually want to achieve. Is there a quick way to reshape my csr_matrix without copying everything in it?

>>> C = csr_matrix(np.hstack((A.toarray(), B.toarray())))
>>> C.toarray()
array([[1, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0]])

2 Answers

#1


4  

What you want to do isn't really what numpy or scipy understand as a reshape. But for your particular case, you can create a new CSR matrix reusing the data, indices and indptr from your original one, without copying them:

import scipy.sparse as sps

a = sps.rand(10000, 10000, density=0.01, format='csr')

In [19]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
...                             shape=(10000, 10020), copy=True)
100 loops, best of 3: 6.26 ms per loop

In [20]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
...                             shape=(10000, 10020), copy=False)
10000 loops, best of 3: 47.3 us per loop

In [21]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
...                             shape=(10000, 10020))
10000 loops, best of 3: 48.2 us per loop

So if you no longer need your original matrix a, since the default is copy=False, simply do:

a = sps.csr_matrix((a.data, a.indices, a.indptr), shape=(10000, 10020))
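Applied to the 5x5 identity matrix from the question, the same idea might look like the sketch below. The helper name `pad_columns` is hypothetical (it is not part of scipy), and the `np.shares_memory` check is only there to illustrate that the underlying arrays are reused rather than copied.

```python
import numpy as np
import scipy.sparse as sps


def pad_columns(m, extra):
    """Return a CSR matrix with `extra` all-zero columns appended on the right.

    Hypothetical helper: it passes m's data, indices and indptr arrays to the
    csr_matrix constructor with copy=False, so they are reused, not copied.
    """
    rows, cols = m.shape
    return sps.csr_matrix((m.data, m.indices, m.indptr),
                          shape=(rows, cols + extra), copy=False)


A = sps.csr_matrix(np.identity(5, dtype=int))
C = pad_columns(A, 2)

print(C.shape)                            # (5, 7)
print(np.shares_memory(A.data, C.data))   # True when no copy was made
```

Because the arrays are shared, modifying the stored values of one matrix in place would also affect the other, so this is best used when the original is no longer needed.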

#2


5  

You can use scipy.sparse.vstack or scipy.sparse.hstack to do it faster:

from scipy.sparse import csr_matrix, vstack, hstack

B = csr_matrix((5, 2), dtype=int)
C = csr_matrix((5, 2), dtype=int)
D = csr_matrix((10, 10), dtype=int)

B2 = vstack((B, C))
#<10x2 sparse matrix of type '<type 'numpy.int32'>'
#        with 0 stored elements in COOrdinate format>

hstack((B2, D))
#<10x12 sparse matrix of type '<type 'numpy.int32'>'
#        with 0 stored elements in COOrdinate format>

Note that the output is a coo_matrix, which can be efficiently converted to the CSR or CSC formats.
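For the case in the question, a minimal sketch could pass the `format` argument of `scipy.sparse.hstack` to request CSR output directly instead of converting afterwards. The variable names follow the question. Unlike the approach in answer #1, this presumably builds new arrays for the result rather than reusing the original ones.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

A = csr_matrix(np.identity(5, dtype=int))
B = csr_matrix((5, 2), dtype=int)   # empty 5x2 block with no stored elements

# format='csr' asks hstack to return the result in CSR format
C = hstack((A, B), format='csr')

print(C.shape)    # (5, 7)
print(C.format)   # csr
```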
