Windows Phone Developers

Sunday, June 1, 2008

Extract Ref Links From WebPage using VB.Net Regular Expressions



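' Place this line at the top of the file.
Imports System.Text.RegularExpressions

Sub Extract_Links_From_WebPage()

    Dim oReg As Regex
    Dim oMat As Match
    Dim sInputString As String
    Dim sLink As String

    ' Sample HTML to search; the URLs are placeholders.
    ' To scan a real page, load its HTML into this string first.
    sInputString = "some have <a href=""http://www.example.com/page1.html"">links</a> that direct to some <a href=""http://www.example.com/page2.html"">html files</a>"

    ' The named group "link" captures the href value, quoted or unquoted.
    oReg = New Regex("href\s*=\s*(?:""(?<link>[^""]*)""|(?<link>\S+))", RegexOptions.Compiled Or RegexOptions.IgnoreCase)

    oMat = oReg.Match(sInputString)

    While oMat.Success
        sLink = oMat.Groups("link").ToString
        Console.WriteLine(sLink)
        ' Advance to the next match; without this the loop never ends.
        oMat = oMat.NextMatch()
    End While

End Sub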

The code above uses the Group class, which represents the results from a single capturing group. Because a group can capture zero, one, or more strings in a single match (when quantifiers are used), it contains a collection of Capture objects. Because Group inherits from Capture, the last substring captured can be accessed directly: the Group instance itself is equivalent to the last item of the collection returned by its Captures property.
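The relationship between a Group and its Captures can be seen in a small sketch. The module name, the input string and the pattern below are made up for illustration and are not part of the original sample; the quantifier + lets the group "word" capture more than once within a single match.

```vb
Imports System
Imports System.Text.RegularExpressions

Module CaptureDemo
    Sub Main()
        ' Illustrative input and pattern: the group "word" is repeated by the + quantifier,
        ' so it captures several substrings within one match.
        Dim oMat As Match = Regex.Match("one two three", "(?:(?<word>\w+)\s*)+")
        Dim oGroup As Group = oMat.Groups("word")

        ' Every substring the group captured, in order.
        For Each oCap As Capture In oGroup.Captures
            Console.WriteLine(oCap.Value)
        Next

        ' The Group itself reports only the last capture.
        Console.WriteLine(oGroup.Value)
    End Sub
End Module
```

The loop lists each captured word in turn, while oGroup.Value holds only the final one. In the link example each match captures a single href value, so reading the group directly is enough there.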

Instances of Group are returned by indexing the GroupCollection object returned by the Match.Groups property. The indexer can be a group number or, if the "(?<groupname>)" grouping construct is used, the name of a capture group. For example, in C# you can use Match.Groups[groupnum] or Match.Groups["groupname"], and in Visual Basic you can use Match.Groups(groupnum) or Match.Groups("groupname").
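As a minimal sketch of both ways of indexing, the following uses a made-up input and a simplified pattern (one unnamed group plus the named group "link"); neither comes from the original sample.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GroupIndexDemo
    Sub Main()
        ' Illustrative pattern: (href) is an unnamed group, (?<link>...) is a named one.
        Dim oMat As Match = Regex.Match("href=""index.html""", "(href)\s*=\s*""(?<link>[^""]*)""")

        Console.WriteLine(oMat.Groups(0).Value)       ' group 0 is always the whole match
        Console.WriteLine(oMat.Groups(1).Value)       ' first unnamed group, by number
        Console.WriteLine(oMat.Groups("link").Value)  ' named group, by name
        Console.WriteLine(oMat.Groups(2).Value)       ' named groups are numbered after the unnamed ones
    End Sub
End Module
```

Indexing by name is usually easier to maintain, because the number assigned to a group shifts whenever another group is added to the pattern.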

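In the above example, (?<link>[^""]*) stores the text matched by the [^""]* pattern in the group 'link', which can be accessed with Groups("link"). The doubled quotation marks are how a single " is written inside a Visual Basic string literal.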
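The sample works on a string, so a real page has to be downloaded into a string before the pattern is applied. One possible way to do that is sketched below, assuming System.Net.WebClient is available (newer .NET versions mark it obsolete in favour of HttpClient). The module name, the ExtractLinks helper and the address are hypothetical and only serve the illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module LinkExtractor
    ' Hypothetical helper: returns the href value of every link found in the given HTML text.
    Function ExtractLinks(ByVal sHtml As String) As List(Of String)
        Dim oLinks As New List(Of String)
        Dim oReg As New Regex("href\s*=\s*(?:""(?<link>[^""]*)""|(?<link>\S+))", RegexOptions.IgnoreCase)

        ' Regex.Matches returns all matches at once, so no NextMatch loop is needed.
        For Each oMat As Match In oReg.Matches(sHtml)
            oLinks.Add(oMat.Groups("link").Value)
        Next
        Return oLinks
    End Function

    Sub Main()
        ' Placeholder address; replace it with the page to scan.
        Dim sUrl As String = "http://www.example.com/"
        Dim sHtml As String

        ' Download the page source as text.
        Using oClient As New WebClient()
            sHtml = oClient.DownloadString(sUrl)
        End Using

        For Each sLink As String In ExtractLinks(sHtml)
            Console.WriteLine(sLink)
        Next
    End Sub
End Module
```

A regular expression is only an approximation of HTML parsing: with this pattern, href values in single quotes are captured together with their quotes, and unquoted values run on to the next whitespace character.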

See Also
Remove HTML Tags from String using .NET Regular Expressions
VB.NET Regular Expression to Check URL
VB.NET Regular Expression to Check Email Addresses
VB.NET Regular Expression to Check MAC Address
Regular Expression to Check Zip Code
Validate eMail Addresses using VB.NET Function
Regular Expressions in Dot Net (.NET)


3 comments:

  1. Good article.

    I found a sample script at http://www.biterscripting.com/SS_URLs.html . This script extracts Ref Links from a web page.

    They have other sample scripts for extracting information from the internet automatically.

    The scripts are written in biterScripting (http://www.biterscripting.com for free download). One can take a look at their sample scripts and translate them into other scripting languages if necessary.

    Patrick Mc

  2. Utterly confusing - where is the path to the webpage in this? As is, it returns an exception when executed in vb2010.
