What does "(?u)" do in a regex?


I looked into how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b"

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part does?

1 answer

It switches on the re.U (re.UNICODE) flag for this expression.
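A minimal sketch of what that means in practice, assuming Python 3 (the sample sentence is made up for illustration). Note that in Python 3, `str` patterns are already Unicode-aware by default, so `(?u)` is effectively redundant there; it mattered in Python 2, where `\w` and `\b` were ASCII-only unless the flag was set. The `re.ASCII` call below only approximates that older behaviour for comparison.

```python
import re

# The pattern from the question: tokens of two or more word characters.
token_pattern = r"(?u)\b\w\w+\b"
compiled = re.compile(token_pattern)

# The inline (?u) shows up as re.UNICODE on the compiled pattern.
has_unicode_flag = bool(compiled.flags & re.UNICODE)

# Hypothetical sample text containing non-ASCII letters.
text = "na\u00efve caf\u00e9 au lait"

# With Unicode matching, accented letters count as word characters.
tokens = compiled.findall(text)

# For comparison: the same pattern with ASCII-only matching splits
# words at every accented letter.
ascii_tokens = re.findall(r"\b\w\w+\b", text, flags=re.ASCII)

print(has_unicode_flag)
print(tokens)
print(ascii_tokens)
```

With ASCII-only matching the accented letters act as word boundaries, which is why a tokenizer meant for arbitrary text wants the Unicode behaviour.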

From the module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
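To illustrate the last sentence of that quote, here is a short sketch (assuming Python 3; the sample strings are invented) showing that an inline flag group behaves the same as passing the flag to `re.compile()`, and that several letters can be combined in one group.

```python
import re

text = "Hello WORLD"

# Flag embedded in the pattern itself...
inline = re.compile(r"(?i)hello world")
# ...versus the same flag passed as an argument.
explicit = re.compile(r"hello world", re.IGNORECASE)

inline_match = inline.fullmatch(text) is not None
explicit_match = explicit.fullmatch(text) is not None

# Several letters can share one group: (?is) sets IGNORECASE and DOTALL,
# so "." also matches the newline below.
combined = re.compile(r"(?is)hello.world")
combined_match = combined.fullmatch("Hello\nWORLD") is not None
combined_has_both = bool(combined.flags & re.I) and bool(combined.flags & re.S)

print(inline_match, explicit_match, combined_match, combined_has_both)
```

The inline form is handy when the pattern is passed around as a plain string, as with scikit-learn's `token_pattern`, and the caller has no way to supply a flags argument.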
