Sanibel Logic LLC

...Scalable Technologies for the Enterprise

Regular Expression Syntax Testing and Validation

Another common question I get from licensees of SSLRedirect, HttpCompressionAgent and nUrlRewriter is how to construct meaningful Regular Expressions which will match an incoming URL under the licensee's intended conditions. I use Regular Expressions within these products because they are an effective tool for testing a broad range of conditions. However, it is not my intention to educate you on Regular Expression syntax. I have found a number of Internet sources for Regular Expression syntax simply by Googling "Regular Expressions".

I have, however, developed a Test Regular Expression web page, which allows you to key in an incoming string followed by a Regular Expression string, to see whether the Regular Expression will match the incoming string. Click HERE to use this Regular Expression test web page.

The Test Regular Expression web page is illustrated below:

[Figure: RETest, the Test Regular Expression web page]
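For readers who prefer to test locally, the same kind of check can be sketched in a few lines of Python. This is only a rough stand-in for the test page, not its actual implementation: the function name `regex_matches` is hypothetical, and Python's `re` module differs from the .NET Regular Expression engine in some details (for example, named groups are written `(?P<name>...)` rather than `(?<name>...)`).

```python
import re


def regex_matches(incoming: str, pattern: str, ignore_case: bool = False) -> bool:
    """Return True if the pattern matches anywhere in the incoming string."""
    flags = re.IGNORECASE if ignore_case else 0
    try:
        return re.search(pattern, incoming, flags) is not None
    except re.error as exc:
        # Report a malformed expression clearly instead of a raw engine error.
        raise ValueError(f"Invalid Regular Expression: {exc}") from exc
```

Calling `regex_matches("http://www.sanibellogic.com/robots.txt", r"^http[s]?://(www\.)?sanibellogic\.com/")` reports a match, while the same expression against an `ftp://` address does not.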

Regular Expression String Encoding

Sanibel Logic has a number of products which utilize Regular Expressions. One common question asked by licensees is how to encode Regular Expression strings within the web.config file. Because web.config is an XML file, any such string must first be HTML encoded, so that characters such as "<", ">" and "&" become entities, as provided by the .NET System.Web.HttpUtility.HtmlEncode method. After the string is HTML encoded, we must then consider Regular Expression special characters, which must be escaped with a preceding backslash whenever they are meant to be matched literally. The special characters are:

  • "\" - Backslash
  • "^" - Caret
  • "." - Period
  • "[" - Left bracket
  • "$" - Dollar sign
  • "(" - Left parenthesis
  • ")" - Right parenthesis
  • "|" - Pipe
  • "*" - Asterisk
  • "+" - Plus
  • "?" - Question mark
  • "{" - Left curly bracket
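The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code behind any Sanibel Logic product: Python's `html.escape` stands in for the .NET `HtmlEncode` method, and the helper names `escape_regex_literal` and `encode_for_web_config` are hypothetical. The escaping helper covers exactly the twelve characters listed above and is meant for literal text only, not for a finished expression.

```python
import html

# The twelve Regular Expression special characters listed above.
REGEX_SPECIAL_CHARS = set("\\^.[$()|*+?{")


def escape_regex_literal(text: str) -> str:
    """Prefix each special character with a backslash so it matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL_CHARS else ch for ch in text)


def encode_for_web_config(expression: str) -> str:
    """Entity-encode a finished expression for use in an XML attribute value."""
    return html.escape(expression, quote=True)
```

For example, `escape_regex_literal("Products.aspx?Cat=1")` yields `Products\.aspx\?Cat=1`, and `encode_for_web_config("^http(?<SSL>[s]?)://")` yields `^http(?&lt;SSL&gt;[s]?)://`, which is the form seen in the configuration exhibits in this article.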

SSLRedirect uses Regular Expression values within the web.config SSLRedirect configuration section, within the urlsIn and urlsOut XML tags.

HttpCompressionAgent also uses Regular Expression values within its web.config HttpCompression configuration section, within the mimeTypes, urls, assemblies and urlReferrers XML tags.

To save you from searching the Internet to determine how to HTML encode a string and properly allow for Regular Expression special characters, I have written a basic web transaction which will transform an input Regular Expression string. Click HERE to try the Regular Expression transformation web transaction.

Sample output from the Regular Expression Transformation web transaction is shown below:

[Figure: Regular Expression Transformation Web Transaction, sample output]

A New Generation of URL Rewriters?

I have always had problems finding a suitable ASP.NET URL Rewriter that fully meets my needs, so about one year ago, after much frustration with attempting to follow third party open-source code and related poor documentation, I decided to write my own. I have since repackaged this URL Rewriter/Redirector under the name nUrlRewriter Version 2. nUrlRewriter has been posted on popular download sites, such as codeplex.com and code.msdn.microsoft.com. The nUrlRewriter Version 2 open source VS.NET 2008 project can also be downloaded from our site by clicking HERE.

nUrlRewriter is an ASP.NET Http Module written in managed C# code. nUrlRewriter examines incoming Http requests and applies user defined criteria which may result in an Http request being redirected or rewritten. Web pages within existing web sites are often archived or retired, yet many Internet based hyperlinks to such web pages may still exist. nUrlRewriter solves this problem by providing a facility which can easily redirect or rewrite such Http requests to other web pages or web applications. For example, a discontinued product web page may be redirected to a general product category web page. nUrlRewriter differentiates itself from other redirectors/rewriters in that it also supports the IIS7 Integrated Pipeline, enabling nUrlRewriter to redirect/rewrite any incoming web application URL supported by the IIS7 web server, such as but not limited to native HTML applications (htm, html), classic ASP applications (asp), PHP applications (php) as well as ASP.NET (aspx) applications.

Incoming Http requests which are redirected are returned to the originating browser with a status code of either 301 (permanent) or 302 (temporary) to indicate that the requested web page has been moved to a new target URL, which is provided to the browser. The browser will then issue a new Http request for the new URL. Http status code 301 indicates that the URL has been permanently moved and the browser should use the new URL in any new Http requests. Http status code 302 indicates that the URL has been temporarily moved and the browser should use the new URL only for the outstanding Http request.
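To make the difference between the two status codes concrete, here is a minimal Python sketch of the response a browser would receive in each case. It is not taken from the nUrlRewriter source; the function name `build_redirect_response` is hypothetical, and the `permanent`/`temporary` keys simply mirror the redirectType values used in the configuration exhibits in this article.

```python
from http import HTTPStatus

# Assumed mapping from the article's redirectType values to Http status codes.
REDIRECT_STATUS = {
    "permanent": HTTPStatus.MOVED_PERMANENTLY,  # 301
    "temporary": HTTPStatus.FOUND,              # 302
}


def build_redirect_response(target_url: str, redirect_type: str) -> str:
    """Build the status line and Location header for a redirect response."""
    status = REDIRECT_STATUS[redirect_type]
    return (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        f"Location: {target_url}\r\n"
        "\r\n"
    )
```

The browser reads the Location header and issues a fresh request for that URL, which is why the address bar changes after a redirect.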

Incoming Http requests which are rewritten are rewritten to a different URL location within IIS. Since the originating browser is not informed of the URL rewrite, the browser URL address bar will continue to display the originating URL before the URL rewrite.

nUrlRewriter works equally well with IIS5 and IIS6.

Background

Clearly, with an all new internal architecture and with new enriched Remote Administration and Feature Delegation enhancements, IIS7 opens up a new generation of web hosting and demands a new generation of support infrastructure tools as well, such as URL Rewriters/Redirectors.

nUrlRewriter not only works well in an IIS5 or IIS6 environment, but nUrlRewriter takes advantage of the new IIS7 Integrated ASP.NET Pipeline to simplify communications between native HTML applications (.htm, .html), classic ASP (.asp), PHP applications (.php) and ASP.NET (.aspx).

nUrlRewriter can be used for the following:

  1. To redirect/rewrite in case of retired or archived web pages.
  2. To rewrite from a friendly URL format to an internal URL format.
  3. To redirect to secondary folder locations; for example language specific web site content may reside in separate folders, or a single IP address may be hosting multiple web sites, with the domain specific content residing in sub-folders.
  4. To provide for URL specific robots.txt files.  For example if a single IP address is supporting multiple web sites, then web site specific robots.txt files can be returned when requested.

 

Redirecting Incoming Http Requests

[Figure: WebRedirection, redirecting incoming Http requests]

 

Rewriting Incoming Http Requests

[Figure: WebRewriting, rewriting incoming Http requests]

 

Using with IIS Version 5 or 6

nUrlRewriter executes in the form of a Http Module, so nUrlRewriter must be defined in the web.config <system.web> <httpModules> configuration section, as illustrated below:

    <system.web>
      <httpModules>
        <add name="nUrlRewriter"
             type="nUrlRewriter.HttpModule,
                    nUrlRewriter,
                    Version=2.0.0.0,
                    Culture=neutral,
                    PublicKeyToken=741b921e11e02781"/>
      </httpModules>
    </system.web>
 

To configure specific nUrlRewriter options, the nUrlRewriter web.config configuration section must now be declared, as illustrated below:

    <configSections>
      <section name="nUrlRewriter"
               type="nUrlRewriter.Configuration2.Configuration,
                      nUrlRewriter,
                      Version=2.0.0.0,
                      Culture=neutral,
                      PublicKeyToken=741b921e11e02781"/>
    </configSections>

Once the nUrlRewriter Http Module is defined and the nUrlRewriter configuration section is declared, the nUrlRewriter specific configuration section must be defined within the web.config.

I am currently supporting two web sites: my corporate web site, www.sanibellogic.com, and my personal web site, www.plippard.com. The entry point for both sites is at the root level (wwwRoot), with a common IP address. Any additional web sites hosted in the future will also share the same IP address. The root level of my web site redirects based on domain name, and then more specific redirect/rewrite logic is applied at the sub-folder (or actual web site sub-folder) level. Essentially, at the wwwRoot level, sanibellogic.com domains are redirected to the “/SL” sub-folder, and plippard.com domains are redirected to the “/PGL” sub-folder. Because these sites are currently being hosted on IIS6, IIS7 specific features are not currently used. IIS7 specific features will be discussed later in this article.

The following exhibit shows the root level nUrlRewriter configuration section for achieving the above discussed domain redirection:

    <?xml version="1.0" encoding="utf-8" ?>
    <nUrlRewriter xmlns="http://schemas.sanibellogic.com/nUrlRewriter/config/2/0/0/0"
                  enabled="true"
                  trace="false">
      <urls>
        <clear/>

        <add name="RuleSanibelLogic"
             action="redirect"
             ignoreCase="true"
             redirectType="temporary"
             transformType="RegExReplace"
             fromScope="absolute"
             from="^http(?&lt;SSL&gt;[s]?)://(www.)?sanibellogic.com/(?&lt;WebPage&gt;(.*))$"
             to="http${SSL}://www.sanibellogic.com/sl/${WebPage}" />

        <add name="RulePLippard"
             action="redirect"
             ignoreCase="true"
             redirectType="temporary"
             transformType="RegExReplace"
             fromScope="absolute"
             from="^http(?&lt;SSL&gt;[s]?)://(www.)?plippard.com/(?&lt;WebPage&gt;(.*))$"
             to="http${SSL}://www.plippard.com/pgl/${WebPage}" />

      </urls>
    </nUrlRewriter>

The above configuration section exhibit primarily examines the incoming domain name, transforms the absence of a sub-domain to "www." and then redirects the incoming request to a defined ASP.NET application sub-folder (either /SL or /PGL) based on whether the domain is "sanibellogic.com" or "plippard.com". All sub-folder or query string content following the domain name is also appended to the redirected request.
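The transformation performed by the RuleSanibelLogic rule can be traced with a short Python sketch. It assumes that transformType="RegExReplace" behaves like a plain Regular Expression replace, which the source does not spell out; it is not the nUrlRewriter implementation. The helper names are hypothetical, and the two `to_python_*` functions exist only because Python writes named groups as `(?P<name>...)` and substitutions as `\g<name>` where .NET uses `(?<name>...)` and `${name}`.

```python
import re
from typing import Optional


def to_python_pattern(dotnet_pattern: str) -> str:
    """Translate .NET named groups (?<name>...) to Python's (?P<name>...)."""
    # The negative lookahead leaves lookbehinds, (?<= and (?<!, untouched.
    return re.sub(r"\(\?<(?![=!])", "(?P<", dotnet_pattern)


def to_python_replacement(dotnet_replacement: str) -> str:
    """Translate .NET ${name} substitutions to Python's \\g<name>."""
    return re.sub(r"\$\{(\w+)\}", r"\\g<\1>", dotnet_replacement)


def apply_rule(url: str, from_pattern: str, to_template: str,
               ignore_case: bool = True) -> Optional[str]:
    """Return the transformed URL, or None when the rule does not match."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(to_python_pattern(from_pattern), flags)
    if pattern.search(url) is None:
        return None
    return pattern.sub(to_python_replacement(to_template), url, count=1)


# The from/to values of RuleSanibelLogic, after XML entity decoding.
FROM = r"^http(?<SSL>[s]?)://(www.)?sanibellogic.com/(?<WebPage>(.*))$"
TO = "http${SSL}://www.sanibellogic.com/sl/${WebPage}"
```

With these values, `apply_rule("https://sanibellogic.com/Products.aspx?Cat=ASP.NET", FROM, TO)` returns `https://www.sanibellogic.com/sl/Products.aspx?Cat=ASP.NET`. Note that the periods in the from expression are unescaped, so each one matches any single character; writing `\.` would restrict the match to a literal period.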

Once the incoming request is intercepted and redirected to the proper ASP.NET application sub-folder, nUrlRewriter (operating in the context of the /SL or /PGL sub-folder) will again have an opportunity to intercept and redirect or rewrite the new incoming request. The exhibit below shows the ASP.NET application folder level configuration section used by nUrlRewriter when operating in the context of the /SL sub-folder:

    <nUrlRewriter xmlns="http://schemas.sanibellogic.com/nUrlRewriter/config/2/0/0/0"
                  enabled="true"
                  trace="false">

      <urls>
        <clear/>

        <!-- Rewrite one time DownloadMe.aspx root level DownloadMe page to new location,
              Note: rewriteType="transferrequest" requires IIS7 Integrated Pipeline mode -->
        <add name="RuleDownloadMe"
             action="rewrite"
             rewriteType="rewritePath"
             ignoreCase="true"
             transformType="RegExReplace"
             from="~/DownloadMe.aspx(?&lt;QStrings&gt;(.*))$"
             to="~/Common/S/DownloadMe.aspx${QStrings}" />

        <!-- Rewrite ECartDownload.aspx web page to new location,
              Note: rewriteType="transferrequest" requires IIS7 Integrated Pipeline mode -->
        <add name="RuleECartDownload"
             action="rewrite"
             rewriteType="rewritePath"
             ignoreCase="true"
             transformType="RegExReplace"
             from="~/Common/ECartDownload.aspx(?&lt;QStrings&gt;(.*))$"
             to="~/Common/S/ECartDownload.aspx${QStrings}" />

        <!-- Rewrite PaypalIPN web page to new location,
              Note: rewriteType="transferrequest" requires IIS7 Integrated Pipeline mode -->
        <add name="RulePaypalIPN"
             action="rewrite"
             rewriteType="rewritePath"
             ignoreCase="true"
             transformType="RegExReplace"
             from="~/Common/PaypalIPN.aspx(?&lt;QStrings&gt;(.*))$"
             to="~/Common/S/PaypalIPN.aspx${QStrings}" />

        <!-- Redirect one time Products.aspx root level Products page to new location -->
        <add name="RuleProducts"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Products.aspx(?&lt;QStrings&gt;(.*))$"
             to="~/Common/Products.aspx${QStrings}" />

        <!-- Redirect discontinued Products.aspx DotNetNuke SSLRedirect queries -->
        <add name="RuleDotNetNuke00200"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00200$"
             to="~/Common/Products.aspx?Cat=ASP.NET&amp;PLong=00401" />

        <!-- Redirect discontinued Products.aspx DotNetNuke SSLRedirect SDK queries -->
        <add name="RuleDotNetNuke00201"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00201$"
             to="~/Common/Products.aspx?Cat=ASP.NET&amp;PLong=00402" />

        <!-- Redirect discontinued Products.aspx DotNetNuke SmartNews queries -->
        <add name="RuleDotNetNuke00202"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00202$"
             to="~/Common/Products.aspx?Cat=ASP.NET" />

        <!-- Redirect discontinued Products.aspx DotNetNuke SmartNews SDK queries -->
        <add name="RuleDotNetNuke00203"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00203$"
             to="~/Common/Products.aspx?Cat=ASP.NET" />

        <!-- Redirect discontinued Products.aspx DotNetNuke HttpCompressionAgent queries -->
        <add name="RuleDotNetNuke00204"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00204$"
             to="~/Common/Products.aspx?Cat=ASP.NET&amp;PLong=00400" />

        <!-- Redirect discontinued Products.aspx DotNetNuke product category -->
        <add name="RuleDotNetNuke"
             action="redirect"
             ignoreCase="true"
             redirectType="permanent"
             transformType="RegExReplace"
             from="~/Common/Products.aspx\?Cat=DotNetNuke$"
             to="~/Common/Products.aspx?Cat=ASP.NET" />

      </urls>

    </nUrlRewriter>

As can be seen from the above nUrlRewriter configuration sections, nUrlRewriter can be easily extended without source code changes, simply by defining the actions required in the configuration section.
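The configuration-driven approach can be illustrated with a small Python sketch that loads two of the rules above from XML and applies them to an application-relative path. It is a simplified model, not the nUrlRewriter code: the function names are hypothetical, the xmlns attribute is omitted for brevity, and trying rules in document order and stopping at the first match is an assumption. It also shows that the XML parser decodes `&amp;` back to `&` before the Regular Expression is compiled.

```python
import re
import xml.etree.ElementTree as ET

# Two rules copied from the exhibit above (namespace omitted for brevity).
CONFIG = r"""
<urls>
  <add name="RuleDotNetNuke00200" action="redirect"
       from="~/Common/Products.aspx\?Cat=DotNetNuke&amp;PLong=00200$"
       to="~/Common/Products.aspx?Cat=ASP.NET&amp;PLong=00401" />
  <add name="RuleDotNetNuke" action="redirect"
       from="~/Common/Products.aspx\?Cat=DotNetNuke$"
       to="~/Common/Products.aspx?Cat=ASP.NET" />
</urls>
"""


def load_rules(xml_text):
    """Parse <add> elements into (name, action, compiled pattern, target) tuples."""
    rules = []
    for node in ET.fromstring(xml_text.strip()).iter("add"):
        pattern = re.compile(node.get("from"), re.IGNORECASE)
        rules.append((node.get("name"), node.get("action"), pattern, node.get("to")))
    return rules


def first_match(rules, path):
    """Apply the first rule whose pattern matches; return None if none do."""
    for name, action, pattern, target in rules:
        if pattern.search(path):
            return name, action, pattern.sub(target, path, count=1)
    return None


RULES = load_rules(CONFIG)
```

Adding a new redirect in this model means adding another `<add>` element to the XML text; the Python code itself does not change, which mirrors the point made above.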

Using with IIS Version 7 (Windows Server 2008 and Vista)

Once Windows Server 2008 is released, I fully expect to take advantage of its much publicized and anticipated IIS7 features. When one plans on utilizing the more advanced IIS7 features, and one’s web application is defined to IIS7 as executing in “Integrated ASP.NET Pipeline” mode, then nUrlRewriter is defined in a slightly different manner within the web.config. The nUrlRewriter Http Module is now defined within the new <system.webServer> <modules> configuration section, illustrated below:

    <system.webServer>
      <modules>
        <add name="nUrlRewriter"
             type="nUrlRewriter.HttpModule,
                    nUrlRewriter,
                    Version=2.0.0.0,
                    Culture=neutral,
                    PublicKeyToken=741b921e11e02781"/>
      </modules>
    </system.webServer>

The nUrlRewriter configuration section continues to be declared and defined as with IIS5/6, illustrated as follows:

    <configSections>
      <section name="nUrlRewriter"
               type="nUrlRewriter.Configuration2.Configuration,
                      nUrlRewriter,
                      Version=2.0.0.0,
                      Culture=neutral,
                      PublicKeyToken=741b921e11e02781"/>
    </configSections>

One of the nUrlRewriter configuration options available with IIS7 is the ability to rewrite/redirect between any web applications supported by the IIS7 web server, such as but not limited to native HTML applications (htm, html), classic ASP applications (asp), PHP applications (php) as well as ASP.NET (aspx) applications. To achieve this objective, one would (for example) configure the nUrlRewriter configuration section as follows:

    <nUrlRewriter xmlns="http://schemas.sanibellogic.com/nUrlRewriter/config/2/0/0/0"
                  enabled="true"
                  trace="false">

      <topLevelExtensions>
        <clear/>
        <add extension="asp" />
        <add extension="aspx" />
        <add extension="htm" />
        <add extension="html" />
        <add extension="php" />
      </topLevelExtensions>

      <urls>
        <clear/>

        <!-- Redirect WordPress blog URL, which was a sub-folder within an ASP.NET App
              to new ASP.NET BlogEngine.Net sub-folder -->
        <add name="RuleBlog"
             action="redirect"
             from="~/support/wpblog/(.+)"
             to="~/support/beblog/" />
      </urls>

    </nUrlRewriter>

Please note that the topLevelExtensions tag defines the candidate web page extensions, required for PHP application visibility in this example.

With IIS7, another common use of nUrlRewriter will be support for multiple robots.txt files.   The robots.txt file is ordinarily required to be at the root level of a web site. The following nUrlRewriter configuration section illustrates rewriting a web request for a root level robots.txt file to a sub-folder resident robots.txt, making its sub-folder location transparent to the invoking search engine. The to attribute MUST be transformed into a relative URL location, which starts with "~/", because both the System.Web.HttpServerUtility.TransferRequest and System.Web.HttpContext.RewritePath methods require a relative URL path. Please note that the "txt" extension must also be included within the topLevelExtensions collection.

    <nUrlRewriter xmlns="http://schemas.sanibellogic.com/nUrlRewriter/config/2/0/0/0"
                  enabled="true"
                  trace="false">

      <topLevelExtensions>
        <clear/>
        <add extension="asp" />
        <add extension="aspx" />
        <add extension="htm" />
        <add extension="html" />
        <add extension="php" />
        <add extension="txt" />
      </topLevelExtensions>

      <urls>
        <clear/>

        <!-- Based on incoming domain of SanibelLogic.com, rewrite to a -->
        <!-- sub-folder location for the robots.txt file -->
        <add name="RuleSanibelLogicRobots"
             action="rewrite"
             rewriteType="transferRequest"
             fromScope="absolute"
             from="^http(?&lt;SSL&gt;[s]?)://(www.)?sanibellogic.com/(?&lt;robotsFile&gt;(robots.txt))$"
             to="~/SL/${robotsFile}" />

        <!-- Based on incoming domain of PLippard.com, rewrite to a -->
        <!-- sub-folder location for the robots.txt file -->
        <add name="RulePLippardRobots"
             action="rewrite"
             rewriteType="transferRequest"
             fromScope="absolute"
             from="^http(?&lt;SSL&gt;[s]?)://(www.)?plippard.com/(?&lt;robotsFile&gt;(robots.txt))$"
             to="~/PGL/${robotsFile}" />

      </urls>
    </nUrlRewriter>
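The effect of this configuration can be modelled in Python as follows. This is a sketch under assumptions, not the nUrlRewriter logic: the names are hypothetical, the expressions are rewritten in Python's named group syntax, and the idea that a request is skipped when its extension is not in the topLevelExtensions collection is my reading of "candidate web page extensions" rather than documented behavior.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlsplit

# Mirrors the topLevelExtensions collection in the exhibit above.
TOP_LEVEL_EXTENSIONS = {"asp", "aspx", "htm", "html", "php", "txt"}

# (compiled "from" expression, relative "to" template) per hosted domain.
ROBOTS_RULES = [
    (re.compile(r"^http(?P<SSL>[s]?)://(www.)?sanibellogic.com/(?P<robotsFile>(robots.txt))$",
                re.IGNORECASE), r"~/SL/\g<robotsFile>"),
    (re.compile(r"^http(?P<SSL>[s]?)://(www.)?plippard.com/(?P<robotsFile>(robots.txt))$",
                re.IGNORECASE), r"~/PGL/\g<robotsFile>"),
]


def extension_of(url: str) -> str:
    """Return the lower-case file extension of the URL path, without the dot."""
    return posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()


def rewrite_robots(url: str) -> Optional[str]:
    """Return the relative rewrite target, or None if no rule applies."""
    if extension_of(url) not in TOP_LEVEL_EXTENSIONS:
        return None  # assumed: extensions outside the collection are ignored
    for pattern, target in ROBOTS_RULES:
        if pattern.search(url):
            return pattern.sub(target, url, count=1)
    return None
```

A request for `http://www.sanibellogic.com/robots.txt` maps to `~/SL/robots.txt`, and the same file requested from plippard.com maps to `~/PGL/robots.txt`. Both results start with "~/", as the rewrite requires.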

A URL rewriting/redirecting web component is essential for effective management of multiple as well as single web sites. nUrlRewriter utilizes new and advanced IIS7 Integrated ASP.NET Pipeline features to ensure that all web applications, such as but not limited to native HTML applications (htm, html), classic ASP applications (asp), PHP applications (php), and ASP.NET applications (aspx), are integrated in an effective manner. nUrlRewriter also utilizes regular expressions and a flexible configuration design to ensure ease of expansion without source code changes. nUrlRewriter documentation in .chm help file form is available with the VS.NET 2008 source code project download.

Click HERE to download the nUrlRewriter open source VS.NET 2008 project.