.NET regex captures not in expected order


In .NET, regex is not organizing captures as I would expect. (I won't call this a bug, because obviously someone intended it. However, it's not how I'd expect it to work nor do I find it helpful.)

This regex is for recipe ingredients (simplified for sake of example):

(?<measurement>           # begin group
  \s*                     # optional beginning space or group separator
  (
    (?<integer>\d+)|      # integer
    (
      (?<numtor>\d+)      # numerator
      /
      (?<dentor>[1-9]\d*) # denominator. 0 not allowed
    )
  )
  \s(?<unit>[a-zA-Z]+)
)+                        # end group. can have multiple

My string: 3 tbsp 1/2 tsp

Resulting groups and captures:

[measurement][0]=3 tbsp
[measurement][1]= 1/2 tsp
[integer][0]=3
[numtor][0]=1
[dentor][0]=2
[unit][0]=tbsp
[unit][1]=tsp

Notice how, even though 1/2 tsp is in the 2nd Capture, its parts are in [0], since these spots were previously unused.
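
A minimal sketch that reproduces the listing above, assuming the pattern is compiled with IgnorePatternWhitespace and ExplicitCapture (the question does not say which options were used). The names CaptureDemo and Describe are made up for illustration. It also prints each Capture's Index property, which shows where in the input a capture came from even though the per-group numbering does not.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // The pattern from the question; the options below are an assumption.
    const string Pattern = @"
        (?<measurement>           # begin group
          \s*                     # optional leading space or group separator
          (
            (?<integer>\d+)|      # integer
            (
              (?<numtor>\d+)      # numerator
              /
              (?<dentor>[1-9]\d*) # denominator. 0 not allowed
            )
          )
          \s(?<unit>[a-zA-Z]+)
        )+                        # end group. can have multiple";

    // Lists every capture of every named group, with its position in the input.
    public static List<string> Describe(string input)
    {
        var regex = new Regex(Pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
        Match m = regex.Match(input);
        var lines = new List<string>();
        foreach (string name in new[] { "measurement", "integer", "numtor", "dentor", "unit" })
        {
            CaptureCollection captures = m.Groups[name].Captures;
            for (int i = 0; i < captures.Count; i++)
            {
                Capture c = captures[i];
                lines.Add($"[{name}][{i}]={c.Value} (index {c.Index})");
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("3 tbsp 1/2 tsp"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Each group keeps its own capture list, so numtor and dentor have a single entry at [0], but their Index values (7 and 9) fall inside the second measurement capture, which starts at index 6.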

Is there any way to get all of the parts to have predictable useful indexes without having to re-run each group through the regex again?

3 answers

#1


1  

Is there any way to get all of the parts to have predictable useful indexes without having to re-run each group through the regex again?

Not with Captures. And if you're going to perform multiple matches anyway, I suggest you remove the + and match each component of the measurement separately, like so:

  string s = @"3 tbsp 1/2 tsp";

  Regex r = new Regex(@"\G\s* # anchor to end of previous match
    (?<measurement>           # begin group
      (
        (?<integer>\d+)       # integer
      |
        (
          (?<numtor>\d+)      # numerator
          /
          (?<dentor>[1-9]\d*) # denominator. 0 not allowed
        )
      )
      \s+(?<unit>[a-zA-Z]+)
    )                         # end group.
  ", RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

  foreach (Match m in r.Matches(s))
  {
    for (int i = 1; i < m.Groups.Count; i++)
    {
      Group g = m.Groups[i];
      if (g.Success)
      {
        Console.WriteLine("[{0}] = {1}", r.GroupNameFromNumber(i), g.Value);
      }
    }
    Console.WriteLine("");
  }

output:

[measurement] = 3 tbsp
[integer] = 3
[unit] = tbsp

[measurement] = 1/2 tsp
[numtor] = 1
[dentor] = 2
[unit] = tsp

The \G at the beginning ensures that matches occur only at the point where the previous match ended (or at the beginning of the input if this is the first match attempt). You can also save the match-end position between calls, then use the two-argument Matches method to resume parsing at that same point (as if that were really the beginning of the input).
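
The resume-from-a-saved-position idea could look roughly like this. It is only a sketch: it uses the two-argument Match overload rather than Matches so that each call consumes one measurement, and the names ResumeDemo and ReadOne are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ResumeDemo
{
    // Same shape as the pattern above; \G pins the match to the start position.
    static readonly Regex Measurement = new Regex(@"\G\s*
        (?<measurement>
          (
            (?<integer>\d+)
          |
            (?<numtor>\d+) / (?<dentor>[1-9]\d*)
          )
          \s+(?<unit>[a-zA-Z]+)
        )", RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    // Tries to read one measurement starting exactly at 'position'.
    // Returns the position just past the match, or -1 if nothing matches there.
    public static int ReadOne(string input, int position, List<string> units)
    {
        Match m = Measurement.Match(input, position);
        if (!m.Success)
        {
            return -1;
        }
        units.Add(m.Groups["unit"].Value);
        return m.Index + m.Length;
    }

    public static void Main()
    {
        string s = "3 tbsp 1/2 tsp";
        var units = new List<string>();
        int position = 0;
        // Other work could happen between calls; only 'position' must be kept.
        while (position >= 0 && position < s.Length)
        {
            position = ReadOne(s, position, units);
        }
        Console.WriteLine(string.Join(", ", units));
    }
}
```

Because of the \G, a call that starts at a position where no measurement begins fails instead of skipping ahead to a later one.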

#2


1  

Seems like you probably need to loop through the input, matching one measurement at a time. Then you would have predictable access to the parts of that measurement, during the loop iteration for that measurement.
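
One way to picture that loop, as a sketch only: match one measurement per iteration with Match and NextMatch, and turn its parts into a value while they are in scope. The names LoopDemo and Parse, the tuple shape, and the conversion of fractions to double are assumptions made for illustration; the pattern is borrowed from the first answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LoopDemo
{
    // One measurement per match; \G keeps successive matches contiguous.
    static readonly Regex OneMeasurement = new Regex(@"\G\s*
        (
          (?<integer>\d+)
        |
          (?<numtor>\d+) / (?<dentor>[1-9]\d*)
        )
        \s+(?<unit>[a-zA-Z]+)",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    // Parses digits captured by the regex (assumes ASCII digits in the input).
    static double Number(Group g)
    {
        return double.Parse(g.Value, CultureInfo.InvariantCulture);
    }

    public static List<(double Amount, string Unit)> Parse(string input)
    {
        var result = new List<(double Amount, string Unit)>();
        for (Match m = OneMeasurement.Match(input); m.Success; m = m.NextMatch())
        {
            // Within one match, each group holds the parts of this measurement only.
            double amount = m.Groups["integer"].Success
                ? Number(m.Groups["integer"])
                : Number(m.Groups["numtor"]) / Number(m.Groups["dentor"]);
            result.Add((amount, m.Groups["unit"].Value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var item in Parse("3 tbsp 1/2 tsp"))
        {
            Console.WriteLine(
                item.Amount.ToString(CultureInfo.InvariantCulture) + " " + item.Unit);
        }
    }
}
```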

#3


-1  

Having looked at this, here are a couple of suggestions that might help improve the regexp:

(?<measurement>           # begin group
  \s*                     # optional beginning space or group separator
  (
    (?<integer>\d+)\.?|   # integer
    (
      (?<numtor>\d+)      # numerator
      /
      (?<dentor>[1-9]\d*) # denominator. 0 not allowed
    )
  )
  \s(?<unit>[a-zA-Z]+)
)+                        # end group. can have multiple
  • The regex is expecting a space at the start, right after the measurement tag.
  • After (?<integer>\d+), I would try \s? instead of \. to capture the whitespace; \. escapes the full stop, so the regex would expect a literal full stop to appear there.
  • Escape the / as \/ to make it a literal.
  • What is the | separator for? It creates two mutually exclusive parts, either an 'integer' or a 'numtor' with a 'dentor'. That part looks confusing.
