Regular Expression - Extract subdomain & domain [duplicate]


This question already has an answer here:

I'm trying to write a regular expression (JavaScript/Node.js) that extracts the subdomain and domain part from any given URL. This is what I ended up with:

[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)

Right now I only consider http and https as protocols, and I want to exclude the "www." portion from the subdomain + domain part of a URL. I tested the expression and it almost works. Here is the issue:

Success

'http://mplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)

'http://lplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)

Failure

'http://play.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)

'http://tplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)

I only use the first element of the result array. I can't understand why "play." and "tplay." don't work. Could anyone help me with this?

Do "/p" and "/t" have any special meaning for the regular expression evaluator?

Is there any other way of extracting sub-domain & domain from any given URL using a regular expression?

Edit -

Example:

https://play.google.com/store/apps/details?id=com.skgames.trafficracer => play.google.com

https://mail.google.com/mail/u/0/#inbox => mail.google.com

5 answers

#1


42  

Your regex doesn't seem correct. Try this regex:

/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im
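
A short usage sketch may help here. With this pattern the host ends up in capture group 1, not in element 0 of the match array (element 0 is the whole match, including the protocol). The helper name extractHost is made up for this illustration; the first two sample URLs come from the question and the third is an assumed example with "www." and a port.

```javascript
// Regex from this answer; the host is captured in group 1.
const HOST_RE = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im;

// Hypothetical helper (not part of the original answer).
function extractHost(url) {
  const m = url.match(HOST_RE);
  return m ? m[1] : null;
}

console.log(extractHost('https://play.google.com/store/apps/details?id=com.skgames.trafficracer')); // play.google.com
console.log(extractHost('https://mail.google.com/mail/u/0/#inbox')); // mail.google.com
console.log(extractHost('http://www.example.com:8080/path')); // example.com
```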

#2


5  

You are about the one millionth person to try to parse URLs in JavaScript. I'm a little bit surprised you didn't see any of the existing questions on SO dating back years. The last thing you want to do is write yet another broken regexp, with all due respect to those that provided answers to your question.

There are many well-documented libraries and approaches for handling this. Google it. The simplest way is to create an <a> element in memory, assign it an href, and then read its hostname and other properties. See http://tutorialzine.com/2013/07/quick-tip-parse-urls/. If that does not float your boat, then use a library like uri.js.

If you really don't want to use a library, and insist on reinventing the wheel, then at least do something like the following:

function get_domain_from_url(url) {
    // Let the browser parse the URL via an in-memory <a> element.
    var a = document.createElement('a');
    a.setAttribute('href', url);
    return a.hostname;
}

Essentially, you are delegating the extraction of the subdomain/domain part of the URL to the browser's URL parsing logic, which is MUCH better than anything you will ever write.
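
The function above relies on document, which exists only in a browser, while the question also mentions node.js. A rough equivalent for that case, assuming a runtime that exposes the standard WHATWG URL class as a global (current Node.js versions and modern browsers do), might look like this. The helper name hostnameFromUrl is hypothetical.

```javascript
// Hypothetical helper for environments without a DOM (e.g. Node.js).
// Assumes the WHATWG URL class is available as a global.
function hostnameFromUrl(url) {
  try {
    return new URL(url).hostname;
  } catch (err) {
    // new URL() throws a TypeError when the string is not a valid absolute URL.
    return null;
  }
}

console.log(hostnameFromUrl('https://play.google.com/store/apps/details?id=com.skgames.trafficracer')); // play.google.com
console.log(hostnameFromUrl('https://mail.google.com/mail/u/0/#inbox')); // mail.google.com
console.log(hostnameFromUrl('not a url')); // null
```

Note that this keeps a leading "www." if the URL has one; stripping it would be a separate step.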

Also see "Parse URL with jquery/ javascript?", "Parse URL with Javascript", "How do I parse a URL into hostname and path in javascript?", or "parse URL with JavaScript or jQuery". How did you miss those? Sorry, I have to vote to close this as a duplicate.

#3


3  

Here's a solution ignoring everything before ://

.*\://?([^\/]+)

In case you also want to ignore "www.":

.*\://(?:www\.)?([^\/]+)
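
To try these patterns in JavaScript, the forward slashes have to be escaped inside a regex literal. The sketch below does that and also shows one caveat: because .* is greedy, the match latches onto the last "://" in the string. The lazy, anchored variant and the helper hostOf are assumptions added for illustration, not part of this answer.

```javascript
// The answer's patterns as JavaScript literals (slashes escaped, dot in "www." escaped).
const anyScheme = /.*:\/\/?([^\/]+)/;
const anySchemeNoWww = /.*:\/\/(?:www\.)?([^\/]+)/;
// Illustrative variation: anchored and lazy, so the first "://" wins.
const lazyNoWww = /^.*?:\/\/(?:www\.)?([^\/]+)/;

// Hypothetical helper: return capture group 1 or null.
const hostOf = (re, url) => {
  const m = url.match(re);
  return m ? m[1] : null;
};

console.log(hostOf(anyScheme, 'http://play.google.co.in/sadfask/asdkfals?dk=10')); // play.google.co.in
console.log(hostOf(anySchemeNoWww, 'https://www.google.com/search')); // google.com
// A URL that embeds another URL in its query string:
console.log(hostOf(anySchemeNoWww, 'http://a.com/?next=http://b.com/x')); // b.com
console.log(hostOf(lazyNoWww, 'http://a.com/?next=http://b.com/x')); // a.com
```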

#4


3  

The same RegExp as in anubhava's accepted answer, with added support for protocol-relative URLs like //google.com:

/^(?:https?:)?(?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im
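
A quick comparison of the two patterns on a protocol-relative URL, as a sketch. The variable names and the helper hostOf are made up for this example, and the sample URLs are assumptions.

```javascript
// Pattern from the accepted answer: the "//" is only consumed together with http(s).
const withScheme = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im;
// Pattern from this answer: scheme and "//" are optional independently.
const protocolRelative = /^(?:https?:)?(?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im;

// Hypothetical helper: return capture group 1 or null.
const hostOf = (re, url) => {
  const m = url.match(re);
  return m ? m[1] : null;
};

console.log(hostOf(withScheme, '//google.com/maps')); // null
console.log(hostOf(protocolRelative, '//google.com/maps')); // google.com
console.log(hostOf(protocolRelative, 'https://www.google.com/maps')); // google.com
```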

#5


1  

Your regular expression works pretty well. You only need to remove the square brackets (the version below is also anchored with ^). The final expression is:

^(?:http:\/\/|www\.|https:\/\/)([^\/]+)
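
The brackets matter because [^...] is a negated character class: in the question's pattern it matches any single character that is not one of ( ? : h t p / | w . s ), rather than the strings "http://", "www." or "https://". Since "p" and "t" are in that set, the match for "play." and "tplay." only starts at the "l". A small sketch with the URLs from the question (variable names are made up here):

```javascript
// Pattern from the question: the bracketed part is a negated character class.
const original = /[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i;
// Pattern from this answer: a real non-capturing group, anchored at the start.
const fixed = /^(?:http:\/\/|www\.|https:\/\/)([^\/]+)/;

const ok = 'http://mplay.google.co.in/sadfask/asdkfals?dk=10';
const bad = 'http://play.google.co.in/sadfask/asdkfals?dk=10';

console.log(ok.match(original)[0]);  // mplay.google.co.in ("m" is not in the class)
console.log(bad.match(original)[0]); // lay.google.co.in ("p" is in the class, so it is skipped)
console.log(bad.match(fixed)[1]);    // play.google.co.in
```

One limitation of the fixed pattern: the alternation consumes only one prefix, so for 'http://www.google.com/' group 1 is still 'www.google.com'.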

Hope it's useful!
