Monday, October 31, 2011

Apache Output Rewrite/Filter

When performing a migration or code upgrade on a public-facing website, it is common practice to test the changes in a development environment. Occasionally, for various reasons, you are forced to reference the development instance of the site using the public domain name.

For example, if you want to set up a development instance of a public-facing site in your lab, but want to be able to reference the site using the public DNS record, you have essentially two options: (1) modify your DNS resolver, or (2) modify your 'hosts' file.

The DNS option requires more effort but it is a better option if you have multiple developers, all using the same DNS server.

The hosts file trick is often used when you do not have a split-horizon DNS option or do not control your DNS server. It is the easiest and most commonly used approach, but it comes with some risk: a developer may forget to add the entry to his/her hosts file and accidentally make changes to the production instance. One simple way to guard against such accidents is to modify the development site's header image or something similar. However, in applications that make extensive use of caching, this trick sometimes fails or leads to confusion.
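A quick way to catch a missing hosts entry before touching the site is to ask the hosts file directly. The following is a minimal sketch, not part of my original setup: hosts_override is a hypothetical helper name, and www.example.com stands in for whatever public name you are overriding.

```shell
#!/bin/sh
# Sketch: print the address a hosts file pins a name to, or nothing if
# there is no entry. hosts_override is a hypothetical helper name.

hosts_override() {
    name="$1"
    file="${2:-/etc/hosts}"    # optional second argument: alternate hosts file
    awk -v name="$name" '
        { sub(/#.*/, "") }     # strip comments before looking at fields
        {
            # Field 1 is the address; every later field is a host name.
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) { print $1; exit }
        }
    ' "$file"
}
```

Running hosts_override www.example.com prints the overriding address if one exists. Empty output means the name is resolved through DNS, which may well be the production instance.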

Recently, I found myself on one such project (Drupal CMS) where we required not just a development instance, but also a QA instance. To make matters worse, the developers needed to jump between the production, development and QA instances multiple times a day. I wanted to figure out a way to modify the web content to reflect the environment (dev or QA) without modifying the application's PHP code. Drupal has third-party extensions which can address this need, but I wanted a solution that was independent of both the application (Drupal) and the server-side scripting technology (PHP). I needed to make the modification at the Apache level.

After doing some research, I quickly discovered Apache mod_ext_filter, a standard Apache module.  mod_ext_filter met all my criteria:

  1. Web application independent
  2. Server-side scripting technology independent
  3. Flexible and simple
  4. No 3rd party Apache modules or modifications to existing modules
  5. Performance should be acceptable

If you know of a better way to accomplish these goals than the one proposed below, please post a comment.

mod_ext_filter presents a simple and familiar programming model for filters. With this module, a program which reads from stdin and writes to stdout (i.e., a Unix-style filter command) can be a filter for Apache. This filtering mechanism is much slower than using a filter which is specially written for the Apache API and runs inside of the Apache server process, but it does have the following benefits:
  • the programming model is much simpler
  • any programming/scripting language can be used, provided that it allows the program to read from standard input and write to standard output
  • existing programs can be used unmodified as Apache filters
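To make the stdin-to-stdout model concrete, here is a minimal sketch of a filter in shell. It is only an illustration: the function name tag_environment and the comment format are made up for this example. It copies the response body through unchanged and appends an HTML comment naming the environment.

```shell
#!/bin/sh
# Sketch of a Unix-style filter: read the page on stdin, write it to stdout.
# tag_environment is a hypothetical name used only for this illustration.

tag_environment() {
    cat                                                # pass the body through untouched
    printf '<!-- environment: %s -->\n' "${1:-dev}"    # then append a marker
}

# Demo: pipe a tiny page through the filter.
printf '<p>hi</p>\n' | tag_environment qa
```

A real filter script would end with a call such as tag_environment "$@" so that Apache's piped input flows through it.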

To use mod_ext_filter, enable it in your httpd.conf file by adding the following line:
LoadModule ext_filter_module modules/
My goal was to inject a line of text into the header and footer of each page to identify the environment (dev or QA).

Using the ext_filter module, a custom output filter can be as simple as adding a call to 'sed' in the module configuration file (see the "Using sed to replace text in the response" example in the module documentation). The problem with this approach is that you need to restart Apache every time you make a change to your filter, which gets annoying very quickly.
A better approach is to call an external script, which can be written in any language of choice.  In addition to not having to restart Apache, this approach has the advantage of allowing for easier troubleshooting.

Here is my /etc/httpd/conf.d/ext_filter.conf calling an external script.
ExtFilterDefine banner mode=output intype=text/html \
cmd="/bin/sh /var/www/"
<Location />
        SetOutputFilter banner
</Location>

In the script, I decided to perform a simple find/replace on the <body> and </body> tags. To avoid font and background color conflicts, I used uncommon colors for the top and bottom banners. Here is the script:

#!/bin/sh
# Insert a banner after the opening body tag and before the closing body tag.
/bin/sed -r 's/(<body.*>$)/\1\<div align=center\>\<font size=4 color=#00FFFF\>Development Instance\<\/font\>\<\/div\>/1MI' | /bin/sed -r 's/\s*(<\/body.*>)/<div align=center\>\<font size=4 color=#00FF00\>Development Instance\<\/font\>\<\/div\>&/1MI'

Notice that there are two sed find/replace operations: the first adds the header and the second adds the footer. Here is a brief explanation of the script above:

-r: Use extended regular expressions.

(<body.*>$)/\1: Match an opening body tag that ends its line, and keep it in the output via \1.

\<div align=center\>\<font size=4 color=#00FFFF\>Development Instance\<\/font\>\<\/div\> : The banner markup, with '\' for escaping special characters.

&: In the second command, re-insert the matched closing body tag after the footer banner.

1MI: Replace only the first match on each line (1), multi-line mode (M), and case insensitive (I).

Escaping special characters makes the above script hard to read. A better choice might be a Python script, especially if you plan to do something more elaborate, since Python source is compiled to bytecode before it runs.
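Short of switching languages, much of the escaping can also be avoided in shell by keeping the banner markup in variables and using a different delimiter for the sed s command. The following is a sketch under a few assumptions: add_banner and its label argument are hypothetical, GNU sed is required, and the label must not contain |, & or backslashes. Unlike the script above, it matches the body tags anywhere on a line, not only at the end of one.

```shell
#!/bin/sh
# Sketch: the same banner idea with the markup held in variables.
# add_banner and its label argument are illustrative, not from the original script.
# Requires GNU sed (-r and the I flag are GNU extensions).

add_banner() {
    label="${1:-Development Instance}"
    top="<div align=center><font size=4 color=#00FFFF>${label}</font></div>"
    bottom="<div align=center><font size=4 color=#00FF00>${label}</font></div>"

    # With | as the delimiter, the / in closing tags needs no escaping.
    # & stands for the matched tag, so the banner lands right after <body ...>
    # and right before </body>.
    sed -r \
        -e "s|<body[^>]*>|&${top}|I" \
        -e "s|</body[^>]*>|${bottom}&|I"
}
```

A filter script built this way would end with a call such as add_banner 'QA Instance', so one script can serve both the dev and QA instances with a different label.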

With the proposed solution, the impact on page load performance is only noticeable on large pages. Beware: if the script has syntax or other errors, the result is often a blank web page. No amount of logging will reveal anything useful, and the only solution is to run the script independently of mod_ext_filter (e.g. cat test.html | sh ).
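The standalone check can be wrapped in a small helper that flags the two failure modes most likely to produce a blank page: a non-zero exit status and empty output. This is a minimal sketch. check_filter and its messages are hypothetical, and the demo uses cat as a stand-in for the real filter command.

```shell
#!/bin/sh
# Sketch: run a filter command outside Apache against a saved page.
# check_filter is a hypothetical helper; usage: check_filter SAMPLE CMD [ARGS...]

check_filter() {
    sample="$1"
    shift
    # Feed the sample page to the filter exactly as Apache would, on stdin.
    if ! out="$("$@" < "$sample")"; then
        echo "filter exited with a non-zero status" >&2
        return 1
    fi
    if [ -z "$out" ]; then
        echo "filter produced no output" >&2
        return 1
    fi
    printf '%s\n' "$out"
}

# Demo with a throwaway page and a pass-through filter (cat).
sample_page="$(mktemp)"
printf '<html><body>hello</body></html>\n' > "$sample_page"
check_filter "$sample_page" cat
rm -f "$sample_page"
```

In practice the command after the sample file would be /bin/sh followed by the path to your filter script, so syntax errors show up on the terminal and not as an empty page in the browser.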

Hope this helps!
