Python: Most Efficient Way to Split and Analyze String

I know from other answers that str.split() is a lot faster than a regex for short strings. A script I am working on requires me to split a string on a delimiter and analyze each piece in a particular way depending on how many delimiters precede it. For example, my string may look like abd-123-32-few-333-1212, etc. In this case, the delimiter is '-'. Depending on the situation, I will need either a few or all of the pieces (in this example, I may need 32 and 333, and I will need to know that 32 comes after the 2nd dash and 333 after the 4th).

My current solution is to split on the delimiter and iterate through the resulting list for the values I need. What I am curious about is whether there is a better or faster way to do this. I have already confirmed that a regex is slower and does not meet my needs out of the box. Any other suggestions?


1 Solution



This might solve your problem, though since you didn't post your code, I have no idea if it's similar or different or slower or faster:


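s = 'abd-123-32-few-333-1212'
check = {'32', '333'}  # values to look for
parts = s.split('-')
# Map each wanted value to its index, i.e. the number of dashes before it.
print({part: i for i, part in enumerate(parts) if part in check})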

This prints:


{'32': 2, '333': 4}
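
If the positions are known in advance rather than the values, the lookup can run the other way. This is only a minimal sketch, assuming the caller supplies 0-based field indices; the helper name `fields_at` and its parameters are hypothetical and not part of the question:

```python
def fields_at(text, positions, delimiter='-'):
    """Return {index: field} for each requested 0-based index present in the split."""
    parts = text.split(delimiter)
    # Indices outside the split result are skipped instead of raising IndexError.
    return {i: parts[i] for i in positions if 0 <= i < len(parts)}


print(fields_at('abd-123-32-few-333-1212', [2, 4]))
```

Here index 2 is the field after the 2nd dash and index 4 the field after the 4th, so this prints {2: '32', 4: '333'}.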

FWIW, you should probably run code like this inside a function rather than at the top level of a script, because in CPython local variable lookups are faster than global ones.
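
As a rough sketch of that suggestion, the same logic can be wrapped in a function so that the loop variables are locals. The name `find_positions` and its parameters are made up for illustration, and whether this is measurably faster depends on your data:

```python
def find_positions(text, wanted, delimiter='-'):
    """Map each wanted field to its 0-based index in the split.

    The index equals the number of delimiters that precede the field.
    If a wanted value occurs more than once, the last index wins.
    """
    parts = text.split(delimiter)
    return {part: index for index, part in enumerate(parts) if part in wanted}


result = find_positions('abd-123-32-few-333-1212', {'32', '333'})
print(result)
```

Passing `wanted` as a set keeps each membership test at roughly constant cost, which matters more as the number of wanted values grows.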




