Python regex to match multiple times


I'm trying to match a pattern against strings that may contain multiple instances of it, and I need each instance separately. re.findall() should do this, but I can't see what I'm doing wrong.

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')

I need 'http://url.com/123' and 'http://url.com/456', along with the two numbers 123 and 456, to be separate elements of the match list.

I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

3 answers

#1


12  

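Use the pattern below. 'review: ' has to sit outside the capturing group (here it is in an optional non-capturing group), and the group must not be repeated with +, because a repeated capturing group keeps only its last match.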

pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

This gives the output:

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
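
The difference between the repeated group and the non-repeated one can be seen side by side. The following is a minimal sketch using the message from the question; the variable names `repeated` and `per_url` are illustrative, and the dot in `url.com` is escaped here, which the original patterns do not do. It also shows one caveat of the optional prefix: URLs that are not preceded by "review: " still match.

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Repeated capturing group: one overall match, and each group holds only
# what it captured on its final iteration.
repeated = re.compile(r'review: (http://url\.com/(\d+)\s?)+', re.IGNORECASE)
print(repeated.findall(msg))

# Non-repeated group: findall() finds one match per URL and returns
# a (url, number) tuple for each of them.
per_url = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)
print(per_url.findall(msg))

# Because the "review: " prefix is optional, a URL elsewhere in the text
# matches as well.
print(per_url.findall('see http://url.com/789'))
```

If only URLs that follow "review:" should be returned, the optional prefix is not enough on its own and a two-step approach may be a better fit.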

#2


5  

You have extra slashes (/) in the regex. In Python the pattern is just a string, with no /.../ delimiters around it. For example, instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

It should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

Also, in Python you would typically use a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The r prefix on the string stops Python from interpreting backslashes as escape sequences, so you don't have to double every backslash in the pattern.
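
Both points can be checked quickly. The following is a small sketch using the message from the question; the variable names are illustrative and not part of the original answer.

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# The slashes are literal characters in a Python pattern, so this can only
# match text that really contains "/review: ... /". The message does not.
with_slashes = re.compile(r'/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
print(with_slashes.findall(msg))

# Without the slashes the pattern matches the whole "review: ..." run.
without_slashes = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
found = without_slashes.search(msg)
print(found.group(0))

# A raw string keeps the backslash as written; a normal string may treat
# it as the start of an escape sequence ('\b' is a single backspace).
print(len(r'\b'), len('\b'))
```

Removing the slashes makes the pattern match, but the capturing group is still repeated with +, so this fix alone does not return each URL separately.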

#3


0  

Use a two-step approach: first capture everything from "review:" to the end of the line, then extract the individual URLs from that text.

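import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]

# Step 2: pull each URL and its number out of that captured text.
url_pattern = re.compile(r"(http://url.com/(\d+))")
url_pattern.findall(urls)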
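
The two steps can also be wrapped in a function that copes with messages lacking a "review:" marker, where indexing the first findall() result with [0] would raise an IndexError. This is a minimal sketch, not the answerer's code: the function name `extract_review_urls` and the constants are hypothetical, re.search() is swapped in for findall() in step 1, and the dot in `url.com` is escaped.

```python
import re

# Hypothetical module-level patterns, compiled once.
REVIEW_PATTERN = re.compile(r'review:\s*(.*)', re.IGNORECASE)
URL_PATTERN = re.compile(r'(http://url\.com/(\d+))')


def extract_review_urls(message):
    """Return (url, number) tuples for URLs that follow "review:" on the same line."""
    found = REVIEW_PATTERN.search(message)
    if found is None:
        return []  # no "review:" marker, so there is nothing to tokenize
    return URL_PATTERN.findall(found.group(1))


msg = 'this is the message. review: http://url.com/123 http://url.com/456'
print(extract_review_urls(msg))
print(extract_review_urls('no marker here: http://url.com/789'))
```

Unlike a single pattern with an optional prefix, this version ignores URLs that appear before "review:" or in messages without it.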
