Regular Expression to match outer brackets


I need a regular expression to select all the text between two outer brackets.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

I've been trying for hours; mind you, my regular expression knowledge isn't what I'd like it to be :-) so any help will be gratefully received.

14 answers

#1


110  

Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in this answer to a previous question.
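
The linked answer is not reproduced here. As a minimal sketch, assuming the algorithm is the usual one of counting nesting depth while scanning left to right (not necessarily the author's exact code), extracting every outermost bracketed group could look like this in Python; the function name `outer_groups` is hypothetical:

```python
def outer_groups(text, open_ch="(", close_ch=")"):
    """Return every outermost balanced group in text, delimiters included.

    Sketch: a depth counter stands in for a stack, since only one kind of
    bracket is tracked. Unmatched closing brackets are ignored, and a group
    that is still open at the end of the text is dropped.
    """
    groups = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i  # an outermost group begins here
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])  # outermost group is closed
    return groups


example = "some text(text here(possible text)text(possible text(more text)))end text"
print(outer_groups(example))
# ['(text here(possible text)text(possible text(more text)))']
```

Unlike a single regex match, this returns every top-level group, e.g. both `(a)` and `(b(c))` for the input `(a)(b(c))`.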

#2


66  

You can use regex recursion:

\(([^()]|(?R))*\)

#3


55  

I want to add this answer for quick reference. Feel free to update.


.NET Regex using balancing groups.

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

Where c is used as the depth counter.

Demo at Regexstorm.com


PCRE using a recursive pattern.

\((?>[^)(]+|(?R))*\)

Demo at regex101. Or, without alternation:

\((?>[^)(]*(?R)?)*\)

Demo at regex101. At (?R) the whole pattern is pasted in again (recursion); (?R) is equivalent to (?0).

Works in Perl, PHP, Notepad++, R (with perl=TRUE) and Python (with the regex package, using (?V1) for Perl behaviour).


Ruby using subexpression calls.

With Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\)

Demo at Rubular. Ruby 1.9 only supports recursion into capturing groups:

(\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3).


JavaScript API: XRegExp.matchRecursive

XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

JS, Java and other regex flavors without recursion (up to 2 levels of nesting):

\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

Demo at regex101. Deeper nesting has to be added to the pattern by hand.
To fail faster on unbalanced parentheses, drop the + quantifier.
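
Because each extra level has to be spelled out by hand, the pattern can also be generated. Below is a sketch in Python, whose built-in `re` module has no recursion either. The helper `nested_parens_pattern` is hypothetical, and its `levels` argument counts the outer pair too, so `levels=3` reproduces the 2-levels-of-nesting pattern above:

```python
import re


def nested_parens_pattern(levels):
    """Build a regex for a parenthesised group nested at most `levels` deep."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    pattern = r"\([^)(]*\)"  # innermost level: no parentheses inside
    for _ in range(levels - 1):
        # Wrap the previous level as one more alternative inside a new pair.
        pattern = r"\((?:[^)(]+|" + pattern + r")*\)"
    return pattern


example = "some text(text here(possible text)text(possible text(more text)))end text"
print(re.search(nested_parens_pattern(3), example).group())
# (text here(possible text)text(possible text(more text)))
```

With `levels=2` the outer group is too deep to match, so the first match becomes just `(possible text)`. The pattern grows with every level, which is the price of not having recursion.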


Java: An interesting idea using forward references by @jaytea.


Reference - What does this regex mean?

#4


25  

[^\(]*(\(.*\))[^\)]*

[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring from the first opening bracket to the last closing bracket, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to pair up brackets; a simple parser (see dehmann's answer) would be more suitable for that.
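
A quick check of this pattern with Python's `re` module (a sketch; `first_to_last` is a hypothetical helper name). It reproduces the expected result for the question's example, and the second call shows the limitation just mentioned: two separate groups come back as a single match.

```python
import re

# The answer's pattern: skip to the first "(", capture greedily up to the
# last ")", then consume any trailing characters that are not ")".
OUTER = re.compile(r"[^\(]*(\(.*\))[^\)]*")


def first_to_last(text):
    m = OUTER.match(text)
    return m.group(1) if m else None


print(first_to_last("some text(text here(possible text)text(possible text(more text)))end text"))
# (text here(possible text)text(possible text(more text)))
print(first_to_last("(a) and (b)"))
# (a) and (b)
```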

#5


14  

(?<=\().*(?=\))

If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible(*).

This regex just returns the text between the first opening and the last closing parentheses in your string.
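
In Python's `re` module the same lookbehind/lookahead pattern can be tried directly. A minimal sketch (the helper name is hypothetical), assuming single-line input because `.` does not cross newlines by default. Note that the parentheses themselves are excluded from the match:

```python
import re


def between_first_and_last(text):
    # (?<=\() lets the match start right after the first "(", .* runs
    # greedily, and (?=\)) backs off until the next character is the last ")".
    m = re.search(r"(?<=\().*(?=\))", text)
    return m.group() if m else None


print(between_first_and_last("some text(text here(possible text)text(possible text(more text)))end text"))
# text here(possible text)text(possible text(more text))
```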


(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not commonly available.

#6


11  

It is actually possible to do it using .NET regular expressions, but it is not trivial, so read carefully.

You can read a nice article here. You also may need to read up on .NET regular expressions. You can start reading here.

Angle brackets <> were used because they do not require escaping.

The regular expression looks like this:

<
[^<>]*
(
    (
        (?<Open><)
        [^<>]*
    )+
    (
        (?<Close-Open>>)
        [^<>]*
    )+
)*
(?(Open)(?!))
>

#7


3  

This is the definitive regex:

\(
(?<arguments> 
(  
  ([^\(\)']*) |  
  (\([^\(\)']*\)) |
  '(.*?)'

)*
)
\)

Example:

input: ( arg1, arg2, arg3, (arg4), '(pip' )

output: arg1, arg2, arg3, (arg4), '(pip'

Note that '(pip' is correctly handled as a string. (Tested in Regulator: http://sourceforge.net/projects/regulator/)
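
The regex above uses .NET-style named groups. A rough Python translation follows as a sketch. `(?<name>…)` becomes `(?P<name>…)`, the first alternative uses `+` instead of `*` so the repeated group never matches an empty string, and the quoted-string alternative uses `'[^']*'` instead of `'(.*?)'`. Both of the last two are changes from the original. Like the original, it allows only one level of nested parentheses and treats anything in single quotes as an opaque string:

```python
import re

ARGS = re.compile(r"""
    \(
    (?P<arguments>
        (?:
            [^()']+              # plain text: no parentheses or quotes
          | \( [^()']* \)        # one level of nested parentheses
          | ' [^']* '            # a quoted string, which may contain (
        )*
    )
    \)
""", re.VERBOSE)

m = ARGS.search("( arg1, arg2, arg3, (arg4), '(pip' )")
print(m.group("arguments").strip())
# arg1, arg2, arg3, (arg4), '(pip'
```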

#8


2  

I have written a little JavaScript library called balanced to help with this task; you can accomplish this by doing:

balanced.matches({
    source: source,
    open: '(',
    close: ')'
});

You can even do replacements:

balanced.replacements({
    source: source,
    open: '(',
    close: ')',
    replace: function (source, head, tail) {
        return head + source + tail;
    }
});

Here's a more complex, interactive example: JSFiddle

#9


2  

The regular expression using Ruby (version 1.9.3 or above):

/(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on Rubular

#10


2  

This answer explains the theoretical limitation of why regular expressions are not the right tool for this task.


Regular expressions cannot do this.

Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, an FSA can remember only its current state; it has no information about previous states.

[Figure: FSA with two states, S1 and S2]

In the above diagram, S1 and S2 are the two states, where S1 is both the start state and the final (accepting) state. So if we try the string 0110, the transitions go as follows:

      0     1     1     0
-> S1 -> S2 -> S2 -> S2 -> S1

In the above steps, when we are at the second S2, i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01, as it can only remember the current state and the next input symbol.
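
The diagram itself is missing, so the following is an assumption. The trace above gives S1 on 0 → S2, S2 on 1 → S2 and S2 on 0 → S1, and the sketch additionally assumes S1 on 1 → S1 (the classic two-state automaton that accepts strings with an even number of 0s). Simulating it in Python shows that each step depends only on the current state and the current symbol:

```python
# Transition table for the two-state automaton described above.
# (S1, "1") -> S1 is an assumption; the other entries follow the 0110 trace.
TRANSITIONS = {
    ("S1", "0"): "S2",
    ("S1", "1"): "S1",
    ("S2", "0"): "S1",
    ("S2", "1"): "S2",
}


def run_fsa(word, start="S1", accepting=frozenset({"S1"})):
    """Return (accepted, list of visited states) for the given input."""
    state = start
    visited = [state]  # kept only for display; the automaton itself stores just `state`
    for symbol in word:
        # The next state depends only on the current state and symbol.
        state = TRANSITIONS[(state, symbol)]
        visited.append(state)
    return state in accepting, visited


print(run_fsa("0110"))
# (True, ['S1', 'S2', 'S2', 'S2', 'S1'])
```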

In the above problem, we need to know the number of opening parentheses that are still unmatched, which means that count has to be stored somewhere. The nesting depth is unbounded, while an FSA has only a fixed, finite set of states, so it cannot keep that count; therefore no regular expression can be written for this.

However, an algorithm can be written to achieve the goal. Balanced parentheses can be recognised by a Pushdown Automaton (PDA), which is one level above an FSA: a PDA has an additional stack to store things. PDAs can solve the above problem because we can 'push' each opening parenthesis onto the stack and 'pop' one whenever we encounter a closing parenthesis. If the stack is empty at the end (and we never had to pop from an empty stack), the opening and closing parentheses match; otherwise they do not.
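
A minimal Python sketch of that push/pop idea, handling only round parentheses for brevity. Besides requiring an empty stack at the end, it also has to reject a closing parenthesis that arrives when there is nothing left to pop:

```python
def is_balanced(text):
    """Return True if every "(" in text is closed by a matching ")"."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)  # push the opening parenthesis
        elif ch == ")":
            if not stack:
                return False  # a ")" with no "(" left to match
            stack.pop()  # pop its matching "("
    return not stack  # a leftover "(" means some group was never closed


print(is_balanced("some text(text here(possible text)text(possible text(more text)))end text"))  # True
print(is_balanced("abc ( 123 ( foobar ) def ) xyz ) ghij"))  # False
```

With a single bracket type the stack only ever holds "(", so a counter would do the same job; the stack becomes essential once several bracket types must be paired with each other.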

A detailed discussion can be found here.

#11


1  

The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.

If you need to match nested bracket pairs, then you need something more than regular expressions; see @dehmann.

If it's just first open to last close, see @Zach.

Decide what you want to happen with:

abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code needs to match in this case.
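
To make the choice concrete, here is a Python sketch of what each interpretation returns for that input (the helper name `first_balanced` is hypothetical):

```python
import re

text = "abc ( 123 ( foobar ) def ) xyz ) ghij"

# Interpretation 1: first "(" to last ")", nesting ignored (greedy regex).
greedy = re.search(r"\(.*\)", text).group()


# Interpretation 2: from the first "(" to the ")" that balances it.
def first_balanced(s):
    start = s.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    return None  # the first "(" is never closed


print(greedy)                # ( 123 ( foobar ) def ) xyz )
print(first_balanced(text))  # ( 123 ( foobar ) def )
```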

#12


1  

So you need the first and the last parenthesis. Use something like str.indexOf('('), which gives you the first occurrence, and str.lastIndexOf(')'), which gives you the last one (both return -1 if the character is missing, so check that first).

Then take the string between them (the + 1 keeps the closing parenthesis, because the end index of substring is exclusive):

String searchedString = str.substring(str.indexOf('('), str.lastIndexOf(')') + 1);
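
The same idea in Python, as a sketch (the function name is hypothetical). `str.find` and `str.rfind` play the roles of `indexOf` and `lastIndexOf`, both return -1 when nothing is found, and the slice end is exclusive, hence the `+ 1`. Like the regex answers that stop at the last ")", it does not check nesting:

```python
def first_to_last_slice(s):
    start = s.find("(")   # first "(" or -1
    end = s.rfind(")")    # last ")" or -1
    if start == -1 or end < start:
        return None       # no brackets, or the last ")" comes before the first "("
    return s[start:end + 1]  # + 1 keeps the closing ")"


print(first_to_last_slice("some text(text here(possible text)text(possible text(more text)))end text"))
# (text here(possible text)text(possible text(more text)))
```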

#13


1  

Here is a customizable solution allowing single character literal delimiters in Java:

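import java.util.ArrayList;
import java.util.List;

public class BalancedSubstrings {

    // Returns every top-level substring enclosed by markStart/markEnd.
    // Nested pairs stay inside their outermost match; a closing delimiter
    // with no open group (level 0) is ignored.
    public static List<String> getBalancedSubstrings(String s, char markStart,
                                                     char markEnd, boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<>();
        int level = 0;              // current nesting depth
        int lastOpenDelimiter = -1; // start index of the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    // An outermost group starts here.
                    lastOpenDelimiter = includeMarkers ? i : i + 1;
                }
            } else if (c == markEnd) {
                if (level == 1) {
                    // The outermost group closes here: record it.
                    subTreeList.add(s.substring(lastOpenDelimiter, includeMarkers ? i + 1 : i));
                }
                if (level > 0) level--;
            }
        }
        return subTreeList;
    }
}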

Sample usage:

String s = "some text(text here(possible text)text(possible text(more text)))end text";
List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
System.out.println("Balanced substrings:\n" + balanced);
// => [(text here(possible text)text(possible text(more text)))]

#14


0  

"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.

This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns.  This is where the re package greatly
assists in parsing. 
"""

import re


# The pattern below recognises a sequence consisting of:
#    1. Any characters not in the set of open/close strings.
#    2. One of the open/close strings.
#    3. The remainder of the string.
# 
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included.  However quotes are not ignored inside
# quotes.  More logic is needed for that....


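# A raw string passes the backslash escapes to the regex engine unchanged,
# and re.S lets "." cross newlines so multi-line input still reaches "$".
pat = re.compile(r"""
    ( .*? )
    ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                           \' | \" | BEGIN | END | $ )
    ( .* )
    """, re.X | re.S)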

# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.

matching = { "(" : ")",
             "[" : "]",
             "{" : "}",
             "<" : ">",
             '"' : '"',
             "'" : "'",
             "BEGIN" : "END" }

# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.

def matchnested(s, term=""):
    lst = []
    while True:
        m = pat.match(s)

        if m.group(1) != "":
            lst.append(m.group(1))

        if m.group(2) == term:
            return lst, m.group(3)

        if m.group(2) in matching:
            item, s = matchnested(m.group(3), matching[m.group(2)])
            lst.append(m.group(2))
            lst.append(item)
            lst.append(matching[m.group(2)])
        else:
            raise ValueError("After <<%s %s>> expected %s not %s" %
                             (lst, s, term, m.group(2)))

# Unit test.

if __name__ == "__main__":
    for s in ("simple string",
              """ "double quote" """,
              """ 'single quote' """,
              "one'two'three'four'five'six'seven",
              "one(two(three(four)five)six)seven",
              "one(two(three)four)five(six(seven)eight)nine",
              "one(two)three[four]five{six}seven<eight>nine",
              "one(two[three{four<five>six}seven]eight)nine",
              "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
              "ERROR testing ((( mismatched ))] parens"):
        print("\ninput", s)
        try:
            lst, s = matchnested(s)
            print("output", lst)
        except ValueError as e:
            print(str(e))
    print("done")

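Note: This translated article is published by ITdaan.com, which holds the copyright. Do not reproduce it without permission; if you do, please cite its address: https://www.itdaan.com/blog/2009/02/13/c3eacaaf16673efd35546ce4c2e2fb2a.html

Guangdong ICP registration No. 14056181 © 2014-2020 ITdaan.com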