Non-greedy regex quantifier gives greedy result

I have a .NET regex that I am testing in Windows PowerShell. The output is as follows:

> [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")


Groups   : {aaa aaa bbb}
Success  : True
Captures : {aaa aaa bbb}
Index    : 0
Length   : 11
Value    : aaa aaa bbb
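For readers without PowerShell, here is a minimal reproduction using Python's re module. It assumes Python's backtracking engine behaves like .NET's for this pattern, which should hold for the constructs involved (a lazy quantifier and a leftmost match). The printed fields mirror the Index, Length and Value properties of the .NET Match object.

```python
import re

# Same pattern and input as the PowerShell call above.
m = re.search(r"aaa.*?bbb", "aaa aaa bbb")

# Mirror the Index / Length / Value fields of the .NET Match object.
print("Index :", m.start())            # Index : 0
print("Length:", m.end() - m.start())  # Length: 11
print("Value :", m.group())            # Value : aaa aaa bbb
```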

I expected the lazy quantifier .*? to make the match aaa bbb, since the second group of a's is enough to satisfy the expression. Is my understanding of non-greedy quantifiers flawed, or am I testing incorrectly?

Note: this is plainly not the same problem as the question "Regular Expression nongreedy is greedy".

4 answers

#1 (score: 5)

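This is a common misunderstanding. Lazy quantifiers do not guarantee the shortest possible match. They only make sure that the current quantifier, from the current position, does not match more characters than needed for an overall match. The engine still returns the leftmost match, so here it commits to the aaa at index 0 before laziness ever comes into play.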

If you truly want to ensure the shortest possible match, you need to make that explicit. In this case, this means that instead of .*?, you want a subpattern that consumes one character at a time, but only at positions where neither aaa nor bbb begins. The resulting regex is therefore

aaa(?:(?!aaa|bbb).)*bbb
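A quick check in Python's re module (assumed to behave like .NET for these constructs) shows that the tempered pattern now starts at the second aaa, while the plain lazy version still starts at index 0. The variable names are only illustrative.

```python
import re

text = "aaa aaa bbb"

# Plain lazy version: leftmost match, starting at index 0.
lazy = re.search(r"aaa.*?bbb", text)

# Tempered token: consume one character at a time, but only where
# neither "aaa" nor "bbb" starts (checked by the negative lookahead).
tempered = re.search(r"aaa(?:(?!aaa|bbb).)*bbb", text)

print(lazy.group(), lazy.span())          # aaa aaa bbb (0, 11)
print(tempered.group(), tempered.span())  # aaa bbb (4, 11)
```

The attempt at index 0 now fails: after the first aaa and the space, the lookahead refuses to consume the second aaa, and bbb cannot match there, so the engine moves on and succeeds at index 4.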

#2 (score: 5)

Compare the result for the string aaa aaa bbb bbb:

regex: aaa.*?bbb 
result: aaa aaa bbb

regex: aaa.*bbb
result: aaa aaa bbb bbb

The regex engine finds the first occurrence of aaa and then, with the lazy .*?, consumes characters only until the first bbb that completes the match. The greedy .* instead consumes everything to the end of the string and then backtracks until bbb can match, so it ends at the last occurrence of bbb.
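The comparison above can be reproduced with Python's re module, assuming it behaves like .NET for these two patterns. Note that both matches start at index 0; only the end point differs.

```python
import re

text = "aaa aaa bbb bbb"

lazy = re.search(r"aaa.*?bbb", text)   # stops at the first bbb that completes the match
greedy = re.search(r"aaa.*bbb", text)  # runs to the end, then backtracks to the last bbb

print(repr(lazy.group()))      # 'aaa aaa bbb'
print(repr(greedy.group()))    # 'aaa aaa bbb bbb'
print(lazy.start(), greedy.start())  # 0 0
```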

#3 (score: 1)

This is not a greedy/lazy problem. The issue is that your string is scanned from left to right, and the engine returns the first match it finds. Once the first aaa is matched, the regex engine adds characters one by one until the rest of the pattern matches.

Note that with greedy behaviour you obtain the same result in your example: the first aaa is matched, then the regex engine takes all the remaining characters and backtracks one character at a time until the complete pattern matches.
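The left-to-right scan can be imitated in Python by calling Pattern.match at each start position in turn. This is only an illustration of the idea, assuming Python's re behaves like .NET here; it is not how either engine is implemented internally.

```python
import re

text = "aaa aaa bbb"
lazy = re.compile(r"aaa.*?bbb")
greedy = re.compile(r"aaa.*bbb")

# Imitate the left-to-right scan: try each start position in turn and
# stop at the first one where the whole pattern matches.
first_start = next(pos for pos in range(len(text)) if lazy.match(text, pos))
print(first_start)                    # 0

# A shorter match exists from index 4, but the scan never gets that far.
print(lazy.match(text, 4).group())    # aaa bbb

# From index 0, lazy and greedy end at the same place: there is only one bbb.
print(lazy.match(text, 0).group())    # aaa aaa bbb
print(greedy.match(text, 0).group())  # aaa aaa bbb
```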

#4 (score: 0)

Well, it's really simple. We have the following string:

aaa aaa bbb

and the regex aaa.*?bbb. The regex engine starts by matching aaa:

aaa aaa bbb    (matched so far: "aaa")

The engine now has .*?bbb left to match. Because .*? is lazy, it first tries bbb at the space; that fails, so .*? consumes the space:

aaa aaa bbb    (matched so far: "aaa ")

But bbb still cannot match, so the engine keeps going: .*? consumes the second set of a's and the following space, one character at a time, trying bbb again before each step:

aaa aaa bbb    (matched so far: "aaa aaa ")

Finally, the regex engine matches bbb:

aaa aaa bbb    (matched: "aaa aaa bbb")
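The walkthrough above can be traced with a small hand-written loop. trace_lazy is a hypothetical helper and a simplified model: it only considers the attempt at index 0, assumes the text starts with the prefix, and ignores the fact that . does not match newlines (there are none here). It is not how a real regex engine is implemented.

```python
def trace_lazy(text, prefix="aaa", suffix="bbb"):
    # Simplified model of prefix.*?suffix tried at index 0 only.
    if not text.startswith(prefix):
        raise ValueError("this sketch only models an attempt at index 0")
    start = len(prefix)
    steps = []
    # .*? tries 0 characters first, then 1, 2, ... checking for suffix each time.
    for taken in range(len(text) - start + 1):
        pos = start + taken
        if text.startswith(suffix, pos):
            steps.append(f".*? took {taken} chars, {suffix!r} matches at index {pos}")
            return steps, text[: pos + len(suffix)]
        steps.append(f".*? took {taken} chars, {suffix!r} fails at index {pos}")
    return steps, None


steps, match = trace_lazy("aaa aaa bbb")
for line in steps:
    print(line)
print("match:", repr(match))  # match: 'aaa aaa bbb'
```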


So if we only want the match to start at the second aaa, we could use the following regex:

(?<!^)aaa.*?bbb: this matches an aaa that is not at the beginning of the string (or of a line, in multiline mode).

We could also use aaa(?= bbb).*?bbb: this matches an aaa that is immediately followed by a space and bbb.
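Both alternatives can be checked with Python's re module, assuming it treats the lookbehind and lookahead the same way .NET does for this input. Keep in mind that both patterns are tailored to this particular string rather than being a general shortest-match technique.

```python
import re

text = "aaa aaa bbb"

not_at_start = re.search(r"(?<!^)aaa.*?bbb", text)  # aaa not at the start of the string
before_bbb = re.search(r"aaa(?= bbb).*?bbb", text)  # aaa immediately followed by " bbb"

print(not_at_start.group(), not_at_start.span())  # aaa bbb (4, 11)
print(before_bbb.group(), before_bbb.span())      # aaa bbb (4, 11)
```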


It just occurred to me: why don't you simply use aaa bbb directly?

© 2014-2018 ITdaan.com, Guangdong ICP registration No. 14056181