Is this time complexity actually O(n^2)?


I am working on a problem out of CTCI.

The third problem of chapter 1 has you take a string such as

'Mr John Smith '

and asks you to replace the intermediary spaces with %20:

'Mr%20John%20Smith'

The author offers this solution in Python, calling it O(n):

def urlify(string, length):
    '''function replaces single spaces with %20 and removes trailing spaces'''
    counter = 0
    output = ''
    for char in string:
        counter += 1
        if counter > length:
            return output
        elif char == ' ':
            output = output + '%20'
        elif char != ' ':
            output = output + char
    return output

My question:

I understand that this is O(n) in terms of scanning through the actual string from left to right. But aren't strings in Python immutable? If I have a string and I add another string to it with the + operator, doesn't it allocate the necessary space, copy over the original, and then copy over the appending string?

If I have a collection of n strings each of length 1, then that takes:

1 + 2 + 3 + 4 + 5 + ... + n = n(n+1)/2

or O(n^2) time, yes? Or am I mistaken in how Python handles appending?
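The sum above can be checked with a small model. This is only a sketch of the worst case described in the question: it assumes every concatenation writes out a brand-new string, and the helper name copies_for_naive_concat is made up for illustration.

```python
def copies_for_naive_concat(n):
    """Count characters written when n one-character strings are appended
    one at a time, assuming each + builds a completely new string."""
    copied = 0
    length = 0
    for _ in range(n):
        length += 1        # the new string is one character longer
        copied += length   # and every one of its characters is written afresh
    return copied


for n in (10, 100, 1000):
    # The loop total matches the closed form n(n+1)/2.
    print(n, copies_for_naive_concat(n), n * (n + 1) // 2)
```

Doubling n roughly quadruples the count, which is the signature of quadratic growth under this model.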

Alternatively, if you'd be willing to teach me how to fish: How would I go about finding this out for myself? I've been unsuccessful in my attempts to Google an official source. I found https://wiki.python.org/moin/TimeComplexity but this doesn't have anything on strings.

4 answers

#1


68  

In CPython, the standard implementation of Python, an implementation detail usually makes this O(n). It lives in the code that the bytecode evaluation loop calls for + or += with two string operands: if Python detects that the left operand has no other references, it calls realloc to try to resize the string in place and avoid a copy. You should never rely on this, both because it is an implementation detail and because, if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=.
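One way to observe the "no other references" condition is to time the same loop with and without a second name bound to the string. This is a rough sketch, not a benchmark: the function names are invented for illustration, the timings depend on the interpreter version and the allocator, and other Python implementations may show no difference at all.

```python
import timeit


def concat_single_ref(n):
    s = ''
    for _ in range(n):
        s += 'x'      # only the name s refers to the string here
    return s


def concat_with_alias(n):
    s = ''
    for _ in range(n):
        alias = s     # a second reference to the same string object
        s += 'x'      # the old string is still referenced, so a new one must be built
    return s


def compare(n=50_000, repeats=3):
    """Return best-of-`repeats` timings (seconds) for both variants."""
    t_single = min(timeit.repeat(lambda: concat_single_ref(n), number=1, repeat=repeats))
    t_alias = min(timeit.repeat(lambda: concat_with_alias(n), number=1, repeat=repeats))
    return t_single, t_alias
```

Calling compare() for a few values of n and watching how each timing grows as n doubles gives an empirical hint about which variant is linear and which is quadratic on a given interpreter.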

#2


32  

The author relies on an optimization that happens to apply here but cannot be depended on. In general, strA = strB + strC is O(n), which makes the function O(n^2). However, it is easy to make sure the whole process is O(n) by using an array (a Python list):

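def urlify(string, length):
    output = []  # collect the pieces in a list instead of a string
    for char in string[:length]:
        if char == ' ':
            output.append('%20')
        else:
            output.append(char)
    return ''.join(output)  # build the final string once, after the loop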

In a nutshell, the append operation is amortized O(1) (although you can make it a strict O(1) by pre-allocating the array to the right size), which makes the loop O(n).

And then the join is also O(n), but that's okay because it is outside the loop.
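The pre-allocation mentioned above could look like the following sketch. It assumes length is no larger than len(string), and the name urlify_preallocated is hypothetical; the list gets one slot per input character, so the loop only assigns by index and never resizes.

```python
def urlify_preallocated(string, length):
    """Replace spaces in the first `length` characters with %20,
    writing into a list whose size is fixed up front."""
    output = [''] * length                 # one slot per input character
    for i in range(length):
        char = string[i]
        # Plain index assignment: the list never has to grow.
        output[i] = '%20' if char == ' ' else char
    return ''.join(output)                 # one O(n) pass builds the result


print(urlify_preallocated('Mr John Smith    ', 13))  # Mr%20John%20Smith
```

In everyday Python the plain append version is usually clear enough; this variant mainly illustrates the strict O(1) per-step bound.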

#3


23  

I found this snippet of text on Python Speed > Use the best algorithms and fastest tools:

String concatenation is best done with ''.join(seq) which is an O(n) process. In contrast, using the '+' or '+=' operators can result in an O(n^2) process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, ''.join(seq) remains the best practice
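As a minimal sketch of the ''.join(seq) advice applied to the exercise in the question (the function name urlify_join is an assumption, not something from the book), the pieces can be fed to join directly from a generator expression:

```python
def urlify_join(string, length):
    """Replace spaces in the first `length` characters with %20
    using a single ''.join call."""
    return ''.join('%20' if char == ' ' else char for char in string[:length])


print(urlify_join('Mr John Smith    ', 13))  # Mr%20John%20Smith

# The built-in str.replace gives the same result and serves as a cross-check.
assert urlify_join('Mr John Smith    ', 13) == 'Mr John Smith    '[:13].replace(' ', '%20')
```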

#4


0  

For future visitors: since this is a CTCI question, there is no need to bring in the urllib package here. According to both the OP and the book, the exercise is about arrays and strings.

Here's a more complete solution, inspired by @njzk2's pseudocode:

text = 'Mr John Smith'  # 13 characters
special_str = '%20'

def URLify(text, text_len, special_str):
    url = []
    for i in range(text_len):  # O(n) iterations
        if text[i] == ' ':
            url.append(special_str)  # append() is amortized O(1)
        else:
            url.append(text[i])  # amortized O(1)
    return ''.join(url)  # O(n)


print(URLify(text, 13, special_str))
