Using regex to find arbitrary length consecutive blocks


I have a string containing ones and zeroes. I want to determine if there are substrings of 1 or more characters that are repeated at least 3 consecutive times. For example, the string '000' has a length 1 substring consisting of a single zero character that is repeated 3 times. The string '010010010011' actually has 3 such substrings that each are repeated 3 times ('010', '001', and '100').

Is there a regular expression that can find these repeating patterns without knowing either the specific pattern or the pattern's length? I don't care what the pattern is or how long it is, only that the string contains a 3-peat pattern.

3 answers

#1


2  

(.+)\1\1

The \ might be a different character depending on your language choice. The pattern means: match any string of one or more characters, then try to match that same string two more times.

The \1 is a backreference: it matches the same text that the first capture group matched.
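
As an illustration of the pattern above, here is a minimal Python sketch. The helper name has_three_peat is made up for this example, and Python's re module is only one possible regex engine; other engines may spell the backreference differently.

```python
import re

# (.+) captures a block of one or more characters; each \1 must then
# match that same text again, immediately after the previous copy.
THREE_PEAT = re.compile(r"(.+)\1\1")

def has_three_peat(s: str) -> bool:
    """Return True if some block occurs three times in a row in s."""
    return THREE_PEAT.search(s) is not None

print(has_three_peat("000"))           # a single '0' repeated three times
print(has_three_peat("010010010011"))  # e.g. '010' repeated three times
print(has_three_peat("0110"))          # no block repeats three times in a row
```

Because search() scans the whole string, the repeated block may start at any position.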

#2


3  

Here's something that might work. However, it will only tell you whether there is a pattern repeated three times, and (I think) it can't be extended to tell you whether there are others:

     /(.+).*?\1.*?\1/

Breaking that out:

   (.+)          matches any 1 or more characters, starting anywhere in the string
   .*?           allows any length of interposing other characters (0 or more)
   \1            matches whatever was captured by the (.+) parentheses
   .*?           0 or more of anything
   \1            the original pattern, again
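
A small Python sketch can show how loose this pattern is compared with the strictly adjacent form (.+)\1\1. The variable names are illustrative only. Note that for strings made only of 0 and 1, any string of five or more characters contains some character at least three times, so the loose pattern will match it.

```python
import re

loose = re.compile(r"(.+).*?\1.*?\1")  # repeats may be separated by other text
adjacent = re.compile(r"(.+)\1\1")     # repeats must directly follow each other

s = "01011"  # '1' occurs three times, but never three times in a row

print(loose.search(s) is not None)     # the loose pattern finds '1' ... '1' ... '1'
print(adjacent.search(s) is not None)  # the adjacent pattern finds nothing here
```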

If you want the repetitions to occur immediately adjacent, then instead use

     /(.+)\1\1/

… as suggested by @Buh Buh. The \1 vs. $1 notation may vary, depending on your regexp system.
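
This answer doubts that the pattern can report more than one repeated block. One possible workaround, which does not come from any of the answers here, is to wrap the adjacent pattern in a lookahead so that the engine tries every start position. The sketch below assumes Python's re module; the name repeated_blocks is made up for the example.

```python
import re

# A lookahead consumes no characters, so the scan can test every start
# position and report repeats that overlap one another.
OVERLAPPING = re.compile(r"(?=(.+)\1\1)")

def repeated_blocks(s: str) -> list:
    """Return one tripled block per start position where a triple begins."""
    return [m.group(1) for m in OVERLAPPING.finditer(s)]

print(repeated_blocks("010010010011"))  # ['010', '100', '001']
```

This reports at most one block per start position (the longest one found there, since .+ is greedy), so it is a sketch rather than a complete enumeration of every repeated block.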

#3


0  

It looks weird, but this could be a solution:

/000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111/

This lists every 3-character pattern of ones and zeroes, each repeated three times, so it only covers repeated blocks of exactly 3 characters. The regular expression will match these strings, for example:

  1. 10010010011
  2. 00010010011
  3. 10110110110

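But it will not match these, because none of them contains a 3-character block repeated three times (although each does contain a shorter block repeated three times, such as '1' or '10'):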

  1. 101010101010
  2. 001110111110
  3. 111000111000

And it doesn't matter where the sequence appears in the whole string.
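
To make the limitation of the enumerated approach concrete, the following Python sketch builds the same kind of alternation for a chosen block length and compares it with the backreference pattern (.+)\1\1 from the other answers. The function name fixed_length_pattern is hypothetical, and itertools.product is just a convenient way to list the blocks.

```python
import re
from itertools import product

def fixed_length_pattern(n: int) -> str:
    """Alternation of every length-n block of '0'/'1', each written three times."""
    return "|".join("".join(bits) * 3 for bits in product("01", repeat=n))

enumerated = re.compile(fixed_length_pattern(3))  # same 8 alternatives as above
backref = re.compile(r"(.+)\1\1")                 # any block length

for s in ("10010010011", "101010101010"):
    # Print whether each approach finds a tripled block in s.
    print(s, enumerated.search(s) is not None, backref.search(s) is not None)
```

The enumerated pattern misses "101010101010", where the repeated block is "10", while the backreference pattern finds it. Covering every block length by enumeration would need a separate alternation for each length.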

