Why did boost regex run out of stack space?


#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

This produces:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
  what():  Ran out of stack space trying to match the regular expression.

Compiled with:

g++ regex.cc -lboost_regex

EDIT

My platform:

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So, the latest 64-bit Ubuntu.

2 answers

#1


13  

((\\w+)(::)?)+ is one of the so-called "pathological" regular expressions: it can take exponential time, because the (::)? is optional, so the inner \\w+ sits effectively directly inside the outer +, and the two quantifiers compete for the same characters. That is, it fails due to catastrophic backtracking.

Suppose we follow the example of the link and reduce "something more complicated" to "x". Let's do that with \\w:

  • ((x+)(::)?)+

Let's also assume that our input is never going to contain ::. The optional (::)? only makes the regex more complex, so throwing it out can only make things simpler:

  • (x+)+

Now you've got a textbook nested-quantifier problem, like the one detailed in the link above.
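
To make the "exponential" part concrete, here is a minimal sketch (not part of the original answer) that counts how many ways a run of n identical characters can be cut into one or more non-empty pieces, which is exactly the freedom (x+)+ gives the matcher. The name count_splits is made up for illustration. A backtracking matcher may end up trying every one of these splits before it gives up on an input that cannot match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of ways to cut a run of n 'x' characters
// into one or more consecutive non-empty pieces, i.e. the number of
// distinct ways (x+)+ can consume the whole run.
std::uint64_t count_splits(unsigned n)
{
    assert(n <= 25);  // the recursion is itself exponential; keep n small
    std::uint64_t ways = 0;
    // Choose the length of the first piece, then split whatever is left.
    for (unsigned first = 1; first <= n; ++first)
    {
        ways += (first == n) ? 1 : count_splits(n - first);
    }
    return ways;
}
```

The count doubles with every extra character: count_splits(4) is 8 and count_splits(10) is 512, i.e. 2^(n-1) for a run of n characters.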

There are a few ways to fix this, but the simplest is probably to just disallow backtracking on the inner match using the atomic group modifier "(?>":

  • ((?>\\w+)(::)?)+
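
Another of the possible fixes is to rewrite the name part so that the quantifiers no longer compete: \\w+(::\\w+)* accepts names such as Test or Test::Test, but can split a given name in only one way. The sketch below illustrates that alternative, not the atomic-group fix above. It is written against std::regex (C++11) rather than Boost, because the default ECMAScript grammar of std::regex has no atomic groups, and the names matches_type_name and self_check are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: same overall pattern as in the question, but with
// ((\w+)(::)?)+ replaced by \w+(::\w+)*, so every repeated \w+ run must be
// preceded by a mandatory "::" and a name can be split in only one way.
bool matches_type_name(const std::string& text)
{
    static const std::regex pattern(
        "std::vector<"
            "(std::map<"
                "(std::pair<\\w+(::\\w+)*, \\w+>,?)+"
            ">,?)+"
        ">");
    return std::regex_match(text, pattern);
}

// Small self-check using a shortened version of the question's input.
void self_check()
{
    assert(matches_type_name(
        "std::vector<std::map<"
        "std::pair<Test::Test, int>,std::pair<Test::Test, int>"
        ">>"));
    assert(!matches_type_name("std::vector<int>"));
}
```

Call self_check() from main to try it (compile with -std=c++11 or later). The full two-map string from the question should be accepted as well. Note that, unlike the original pattern, this one rejects a name with a trailing "::".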

#2


1  

I tested it locally and it worked fine; my guess is that your compiler is doing something weird.

Which version of gcc? Which platform? Which version of Boost?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int
