### Is this time complexity actually O(n^2)?

I am working on a problem out of CTCI.

The third problem of chapter 1 has you take a string such as

`'Mr John Smith '`

and asks you to replace the intermediary spaces with `%20`:

`'Mr%20John%20Smith'`

The author offers this solution in Python, calling it O(n):

``````
def urlify(string, length):
    '''function replaces single spaces with %20 and removes trailing spaces'''
    counter = 0
    output = ''
    for char in string:
        counter += 1
        if counter > length:
            return output
        elif char == ' ':
            output = output + '%20'
        elif char != ' ':
            output = output + char
    return output
``````

My question:

I understand that this is O(n) in terms of scanning through the actual string from left to right. But aren't strings in Python immutable? If I have a string and I add another string to it with the `+` operator, doesn't it allocate the necessary space, copy over the original, and then copy over the appending string?

If I have a collection of `n` strings each of length 1, then that takes:

`1 + 2 + 3 + 4 + 5 + ... + n = n(n+1)/2`

or O(n^2) time, yes? Or am I mistaken in how Python handles appending?
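To make the arithmetic concrete, here is a small sketch that counts characters copied under the pessimistic model described above, where every `+` allocates a fresh string and copies everything into it. The function name `chars_copied_naive` is made up for illustration, and the model says nothing about what a particular interpreter actually does.

```python
def chars_copied_naive(n):
    """Count characters copied when appending n one-character strings,
    assuming each `+` builds a brand-new string (no in-place resize)."""
    copied = 0
    length = 0
    for _ in range(n):
        length += 1        # the new string is one character longer
        copied += length   # old contents plus the appended character
    return copied

# Compare the simulated count with the closed form n(n+1)/2.
for n in (10, 100, 1000):
    print(n, chars_copied_naive(n), n * (n + 1) // 2)
```

Under that model the count matches `n(n+1)/2` exactly, which is where the quadratic bound comes from.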

Alternatively, if you'd be willing to teach me how to fish: How would I go about finding this out for myself? I've been unsuccessful in my attempts to Google an official source. I found https://wiki.python.org/moin/TimeComplexity but this doesn't have anything on strings.

## 4 Answers

### #1

68

In CPython, the standard implementation of Python, there's an implementation detail that makes this usually O(n), implemented in the code the bytecode evaluation loop calls for `+` or `+=` with two string operands. If Python detects that the left argument has no other references, it calls `realloc` to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if `realloc` ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use `+=`.
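One way to see the effect for yourself is to time the same loop with and without a second reference to the string being built. This is only a sketch: the function names are invented, the extra-reference trick assumes CPython's reference-count check behaves as described above, and the numbers will vary by machine and interpreter version.

```python
import timeit

def build_plain(n):
    """Append one character at a time; `out` is the only reference."""
    out = ''
    for _ in range(n):
        out += 'x'
    return out

def build_aliased(n):
    """Same loop, but a second name still refers to the current string,
    so (on CPython) it should not be eligible for in-place resizing."""
    out = ''
    for _ in range(n):
        keep = out      # extra reference to the string about to be extended
        out += 'x'
    return out

if __name__ == '__main__':
    # Doubling n should roughly double a linear loop and quadruple a quadratic one.
    for n in (5_000, 10_000, 20_000):
        t_plain = timeit.timeit(lambda: build_plain(n), number=3)
        t_alias = timeit.timeit(lambda: build_aliased(n), number=3)
        print(n, round(t_plain, 4), round(t_alias, 4))
```

Watch how each column grows as `n` doubles rather than comparing absolute times; on other Python implementations both columns may behave the same way.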

### #2

32

The author relies on an optimization that happens to apply here but is not guaranteed. `strA = strB + strC` is typically `O(n)`, making the function `O(n^2)`. However, it is easy to make the whole process `O(n)` by collecting the pieces in a list:

``````
output = []
# ... loop thing
output.append('%20')
# ...
output.append(char)
# ...
return ''.join(output)
``````

In a nutshell, the `append` operation is amortized `O(1)` (although you can make it strong `O(1)` by pre-allocating the array to the right size), making the loop `O(n)`.

And then the `join` is also `O(n)`, but that's okay because it is outside the loop.
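The pre-allocation idea can be sketched as follows. This is an illustration rather than the book's solution: `urlify_prealloc` is a made-up name, and it assumes `length` is the number of meaningful characters in `text`, as in the question.

```python
def urlify_prealloc(text, length):
    """Fill a list sized up front, then join once at the end."""
    out = [None] * length              # one slot per input character
    for i in range(length):
        ch = text[i]
        # Each slot holds either the original character or '%20'.
        out[i] = '%20' if ch == ' ' else ch
    return ''.join(out)

print(urlify_prealloc('Mr John Smith    ', 13))
```

Because the list never grows, every write in the loop is a plain index assignment, and the only copying of string data happens in the final `join`.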

### #3

23

I found this snippet of text on Python Speed > Use the best algorithms and fastest tools:

String concatenation is best done with `''.join(seq)` which is an `O(n)` process. In contrast, using the `'+'` or `'+='` operators can result in an `O(n^2)` process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, `''.join(seq)` remains the best practice.
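Applied to the problem in the question, that advice might look like the sketch below. Both function names are hypothetical, and the second variant swaps in `str.replace`, which is a different technique from the character-by-character loop in the book.

```python
def urlify_join(text, length):
    """Produce the pieces lazily and join them in a single pass."""
    return ''.join('%20' if ch == ' ' else ch for ch in text[:length])

def urlify_replace(text, length):
    """Alternative: slice off the padding and let str.replace do the work."""
    return text[:length].replace(' ', '%20')

print(urlify_join('Mr John Smith    ', 13))
print(urlify_replace('Mr John Smith    ', 13))
```

Neither version builds intermediate strings with `+`, so neither depends on the interpreter optimizing repeated concatenation.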

### #4

0

For future visitors: since this is a CTCI question, there is no need to reach for the `urllib` package here. Per the OP and the book, the question is about arrays and strings.

Here's a more complete solution, inspired by @njzk2's pseudocode:

``````
text = 'Mr John Smith'  # 13 characters
special_str = '%20'

def URLify(text, text_len, special_str):
    url = []
    for i in range(text_len):        # O(n) iterations
        if text[i] == ' ':
            url.append(special_str)  # append() is amortized O(1)
        else:
            url.append(text[i])      # amortized O(1)

    print(url)                       # debug: show the pieces before joining
    return ''.join(url)              # O(n)

print(URLify(text, 13, '%20'))
``````
