<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
<channel>
    <title>blogZero - Perl</title>
    <link>http://loadaveragezero.com/app/s9y/</link>
    <description>Web Development Community News, Culture and Commentary, Tools and Techniques</description>
    <dc:language>en</dc:language>
    <admin:errorReportsTo rdf:resource="mailto:" />
    <generator>Serendipity 0.8.3 - http://www.s9y.org/</generator>
    
    <image>
        <url>http://loadaveragezero.com/img/blogzero.gif</url>
        <title>RSS: blogZero - Perl - Web Development Community News, Culture and Commentary, Tools and Techniques</title>
        <link>http://loadaveragezero.com/app/s9y/</link>
        <width>96</width>
        <height>52</height>
    </image>
<item>
    <title>W3C LogValidator</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/151-W3C-LogValidator.html</link>
<category>Perl</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/151-W3C-LogValidator.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=151</wfw:comment>
    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=151</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;
&lt;img src=&quot;http://loadaveragezero.com/img/w3c.png&quot; alt=&quot;w3c&quot; class=&quot;right&quot; title=&quot; W3C: World Wide Web Consortium &quot; /&gt;
&lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/CPAN.gif&quot; class=&quot;icon&quot; alt=&quot;book&quot; title=&quot; CPAN: Comprehensive Perl Archive Network &quot; /&gt;This article documents my experience installing, configuring, and using the &lt;a href=&quot;http://loadaveragezero.com/app/drx/Internet/WWW/Design_and_Development/Standards#w3c:logvalidator&quot; title=&quot; W3C::LogValidator &quot;&gt;W&lt;sub&gt;3&lt;/sub&gt;C LogValidator&lt;/a&gt;. Hopefully it will be useful to anyone new to this, and in particular those who are not comfortable installing &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl&quot; title=&quot; Programming: Languages: Perl &quot;&gt;Perl&lt;/a&gt; &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#CPAN:org&quot; title=&quot; Comprehensive Perl Archive Network &quot;&gt;CPAN&lt;/a&gt; modules.
&lt;/p&gt;

&lt;h3&gt;Goals&lt;/h3&gt;

&lt;p&gt;When I set out on this project I wanted to satisfy the following criteria. First, determine the &lt;strong&gt;N&lt;/strong&gt; most popular documents where &lt;strong&gt;N&lt;/strong&gt; is some configurable number. And second, amongst those pages determine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the markup, be it &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Markup_Languages/HTML&quot; title=&quot; Hypertext Markup Language &quot;&gt;HTML&lt;/a&gt; or &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Markup_Languages/XHTML&quot; title=&quot; Extensible Hypertext Markup Language &quot;&gt;XHTML&lt;/a&gt;, valid?&lt;/li&gt;
&lt;li&gt;Are the &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Style_Sheets/CSS&quot; title=&quot; Cascading Style Sheets &quot;&gt;CSS&lt;/a&gt; stylesheets valid?&lt;/li&gt;
&lt;li&gt;Do any of them have broken links?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As it turns out, all three of these criteria can be met with LogValidator and they correspond with the following modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/~oliviert/W3C-LogValidator-1.3.1/lib/W3C/LogValidator/Basic.pm&quot; title=&quot; Sort Web server log entries by popularity &quot;&gt;LogValidator::Basic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/~oliviert/W3C-LogValidator-1.3.1/lib/W3C/LogValidator/HTMLValidator.pm&quot; title=&quot; Batch HTML validation &quot;&gt;LogValidator::HTMLValidator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/~oliviert/W3C-LogValidator-1.3.1/lib/W3C/LogValidator/CSSValidator.pm&quot; title=&quot; Batch validation of CSS stylesheets &quot;&gt;LogValidator::CSSValidator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/~oliviert/W3C-LogValidator-1.3.1/lib/W3C/LogValidator/LinkChecker.pm&quot; title=&quot; Find broken links in Web pages &quot;&gt;LogValidator::LinkChecker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both the HTML and CSS modules use the W3C validation services that most developers are familiar with, so there is no need to install &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Markup_Languages/HTML#tidy:html&quot;&gt;HTML Tidy&lt;/a&gt; or any complicated &lt;a href=&quot;http://jigsaw.w3.org/css-validator/DOWNLOAD.html&quot; title=&quot; Download and install the CSS validator &quot;&gt;Java packages&lt;/a&gt;. For the LinkChecker module to function, you also need to install &lt;a href=&quot;http://search.cpan.org/~scop/W3C-LinkChecker-4.3/bin/checklink.pod&quot; title=&quot; Check the validity of links in an HTML or XHTML document &quot;&gt;checklink&lt;/a&gt; (which is a useful little utility on its own).&lt;/p&gt;

&lt;p&gt;All this is fine and dandy, but how do you view the results? I'll get to this in more detail below, but the two most popular methods are to have LogValidator generate an HTML document or send you an email, which is handy if you want to set up the system to run automatically using &lt;a href=&quot;http://loadaveragezero.com/app/s9y/index.php?/archives/131-Advanced-crontab-Tutorial.html&quot; title=&quot; Advanced crontab tutorial &quot;&gt;cron&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Installation&lt;/h3&gt;

&lt;p&gt;This is command-line land, folks, so I'm hoping you are comfortable using a shell. If you are, and you have some experience installing Perl CPAN modules, you should be all set. Otherwise you may consult a &lt;a href=&quot;http://loadaveragezero.com/txt/logvalidator.txt&quot; title=&quot; Installing W3C::LogValidator &quot;&gt;complete session&lt;/a&gt; of my own installation experience. Good luck with all that; it's kind of gory.&lt;/p&gt;

&lt;p&gt;First, you need the &lt;a href=&quot;http://search.cpan.org/~oliviert/W3C-LogValidator-1.3.1/lib/W3C/LogValidator.pm&quot;&gt;W3C::LogValidator&lt;/a&gt; module, which may or may not require some other prerequisite modules. Thankfully, the &lt;a href=&quot;http://search.cpan.org/~andk/CPAN-1.9301/lib/CPAN.pm&quot; title=&quot; CPAN -- Query, download and build Perl modules from CPAN sites &quot;&gt;CPAN&lt;/a&gt; module itself is smart enough to determine whether any prerequisites are missing and install them automatically. I'm not going to get into installing this software in non-standard locations, so you are going to have to be root (aka super-user) to do most of what follows. Note that the `#' character prompt indicates the command is run as root. To get started, install LogValidator like so:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# perl -MCPAN -e 'install W3C::LogValidator'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I found that installing LinkChecker did not work using CPAN, so I used the old-school method of downloading the package (aka &quot;tarball&quot;), unpacking it in some temporary directory, and building and installing it from there. Again, only the actual installation step requires you to be root; for all the others you are better off as an ordinary user (as in you). Note the change in shell prompt to the `&amp;gt;' character.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/tmp/CPAN&amp;gt; tar xzvf W3C-LinkChecker-4.3.tar.gz&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;W3C-LinkChecker-4.3/&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;W3C-LinkChecker-4.3/...&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/tmp/CPAN&amp;gt; cd W3C-LinkChecker-4.3&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/tmp/CPAN/W3C-LinkChecker-4.3&amp;gt; perl Makefile.PL&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/tmp/CPAN/W3C-LinkChecker-4.3&amp;gt; make&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/tmp/CPAN/W3C-LinkChecker-4.3&amp;gt; make test&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I'm not showing you the output here because it gets a little messy. Again, consult my &lt;a href=&quot;http://loadaveragezero.com/txt/logvalidator.txt&quot;&gt;session&lt;/a&gt; if you are new to this. The final step is to install LinkChecker as root:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/tmp/CPAN/W3C-LinkChecker-4.3&amp;gt; su&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;Password:&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;# make install&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;...&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;# ^d&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now you should have all the software installed and can proceed to the configuration step. Note that the ^d above means Ctrl+D (EOF), which exits you from the root login (typing &quot;&lt;code&gt;exit&lt;/code&gt;&quot; also works).&lt;/p&gt;

&lt;h3&gt;Configuration&lt;/h3&gt;

&lt;p&gt;LogValidator comes with a sample configuration file that you need to copy to some directory of your choice. The sample will be in root's hidden .cpan directory, but it's readable, so exit from your root login and create a directory to hold your copy.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/tmp/CPAN/W3C-LinkChecker-4.3&amp;gt; cd /var/www/mysite&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/var/www/mysite&amp;gt; mkdir config&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/var/www/mysite&amp;gt; cd config&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;/var/www/mysite/config&amp;gt; cp ~root/.cpan/build/W3C-LogValidator-1.3.1/samples/logprocess.conf .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pay attention to the LogValidator version number because this article can (and will) become obsolete at some point.&lt;/p&gt;

&lt;p&gt;Now you're ready to edit the configuration file and run the script. It is very similar to the &lt;a href=&quot;http://loadaveragezero.com/app/drx/Internet/WWW/Servers/Apache&quot; title=&quot; Apache HTTP Server &quot;&gt;Apache&lt;/a&gt; httpd.conf file (and some of the directives are straight out of it), so if you're familiar with editing that file you're ahead of the game.&lt;/p&gt;

&lt;p&gt;There are several options that you will want to alter, and the file is fully documented. Well almost fully documented. I'll get to that in a minute.&lt;/p&gt;

&lt;pre&gt;
##  [apache] ServerAdmin : e-mail address to send the reports

ServerAdmin doug@example.com

##  MailFrom : From: address for e-mail output
##
## Unless the relevant option is specified when running the LogValidator,
## the mail output will use ServerAdmin (see above) as From: and To:
## This option allows you to override the From: parameter
## DEFAULT  = ServerAdmin

MailFrom logvalidator@example.com

## Title : a more useful Subject: for the Mail Output
##           and &amp;lt;title&amp;gt; for HTML Output
##
## Tell the Mail/HTML Output what this config is all about
## and make them use a better subject than the vanilla default
## DEFAULT = Logvalidator results

Title LogValidator - Results for example.com

##  [apache] DocumentRoot : where the files are located
##
## For some log formats, it is necessary to know where the actual files
## reside on the server

DocumentRoot /var/www/mysite/docroot/

##  [apache] ServerName : full address for the web server
##
## should be of the form host.domain
## NOTE: no need to prepend http://

ServerName example.com

##  [apache] CustomLog : log file and format
##
## Add as many entries as you like.
## The Log Validator will process all log files listed below
## formats: see http://httpd.apache.org/docs/mod/mod_log_config.html
#  NOTE: only the following formats are currently supported:
#                    common, combined, w3, full, plain (list of addresses)
# CustomLog /var/log/apache/access.log.1 combined
# CustomLog /home/me/path/to/list plain

CustomLog /var/www/mysite/logs/access_log combined

## [apache] DirectoryIndex : document equivalent to &quot;/&quot;
##
## See http://httpd.apache.org/docs/mod/mod_dir.html#directoryindex
## Used by the validator to compute the &quot;canonical&quot; URLs for Documents
## DEFAULT = index.html index.htm index

DirectoryIndex index.php index.html
&lt;/pre&gt;

&lt;p&gt;There are a large number of other options to play with, but many of the defaults are sufficient until you get everything running and you want to tweak things. The options that don't seem to be documented very well, and are omitted from the sample configuration file, relate to output. So I took the time to document them here.&lt;/p&gt;

&lt;pre&gt;
# UseOutputModule : method and location to send output

# --email from your shell, or
# UseOutputModule W3C::LogValidator::Output::Mail
# -s|--sendto &amp;lt;address&amp;gt; from your shell, or
# SendTo doug@example.com

# --HTML from your shell, or

UseOutputModule W3C::LogValidator::Output::HTML

# -o|--output &amp;lt;path&amp;gt; from your shell, or

OutputTo /var/www/mysite/www/admin/logvalidator.html

# output will go to console if not specified
&lt;/pre&gt;

&lt;p&gt;Notice I have both options set, but the email method is commented out (a `#' character precedes comments). You can override either option by using command-line switches as described above. Also note that I am placing the HTML output in an admin directory, which I happen to have password protected because it contains other stuff that I don't want just anyone to have access to.&lt;/p&gt;

&lt;p&gt;Once you have everything configured correctly it's time to run the script and view the output. Back at your command prompt, run the script and tell it where to find the configuration file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;~config&amp;gt; logprocess.pl -f logprocess.conf&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This should work if you are in the same directory as the configuration file, otherwise you may have to specify the full (or relative) path to its location. Also, the CPAN installation typically puts the Perl script in /usr/local/bin/, so if that directory isn't in your PATH or depending on which shell you're using, it may not find the script at first. Either try using the full path to the script or run the &lt;code&gt;rehash&lt;/code&gt; shell built-in command to update its database.&lt;/p&gt;

&lt;p&gt;Also, depending on how much traffic you get, the server log file may be quite large and running all four modules on the results will take some time, so be patient. When the script exits and depending on the output option you selected, you will either get an email with the results or you can visit the generated HTML report with your browser.&lt;/p&gt;

&lt;h3&gt;Reports&lt;/h3&gt;

&lt;p&gt;The script took about 20 minutes, running in the background, to generate the report. This was with a full day's worth of requests stored in my Apache access log from yesterday. A quick check with &lt;code&gt;&lt;a href=&quot;http://www.freebsd.org/cgi/man.cgi?query=wc&amp;amp;apropos=0&amp;amp;sektion=1&amp;amp;manpath=FreeBSD+7.0-RELEASE&amp;amp;format=html&quot; title=&quot; wc -- word, line, character, and byte count &quot;&gt;wc -l&lt;/a&gt;&lt;/code&gt; on the log returned around 50,000 total requests. Keep in mind that many of these are for images, CSS, &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/JavaScript&quot; title=&quot; Programming: Languages: JavaScript &quot;&gt;JavaScript&lt;/a&gt;, and other resources, not complete pages. Below is a thumbnail with my results, minus the Basic module, which was configured to list the top 100 most popular pages (the default).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://loadaveragezero.com/img/logresults_full.png&quot; title=&quot; LogValidator - Results for loadaverageZero&quot;&gt;&lt;img src=&quot;http://loadaveragezero.com/img/logresults_thumb.png&quot; alt=&quot;screenshot&quot; style=&quot;border: 1px solid #666;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
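
&lt;p&gt;If you want a rough idea of how many of those requests are for actual pages, something like the following shell sketch would do. It assumes the combined log format, where the request path is the seventh whitespace-separated field. The count_pages name and the list of extensions are my own choices for illustration, and paths with query strings are not treated specially.&lt;/p&gt;

&lt;pre&gt;
#!/bin/sh

# count_pages -- hypothetical helper, not part of LogValidator
# counts requests whose path (field 7 in the combined format) does
# not end in a common image, stylesheet, or script extension

count_pages() {
    awk '$7 !~ /\.(gif|png|jpg|ico|css|js)$/ { n++ } END { print n + 0 }' "$1"
}

# usage: count_pages /var/www/mysite/logs/access_log
&lt;/pre&gt;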

&lt;p&gt;I kept an eye on the process while it ran and by far the longest time was spent during the LinkChecker phase, which is not surprising given all the external &lt;a href=&quot;http://loadaveragezero.com/app/drx/Internet/Protocols/HTTP&quot; title=&quot; Internet: Protocols: HTTP &quot;&gt;HTTP&lt;/a&gt; requests necessary to check each link. And I have a lot of outbound links. I ran checklink manually on my home page, and that in itself took a minute and change.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;LogValidator is a great tool, although it takes some work to get it running. The main idea here is to target those pages on your Web site that get the most traffic and fix problems there first, before moving on. If you have thousands of pages like I do, it can be a pretty daunting task to find and fix every error on your site.&lt;/p&gt;

&lt;p&gt;One final note. I use &lt;a href=&quot;http://www.freebsd.org/cgi/man.cgi?query=newsyslog&amp;amp;sektion=8&amp;amp;apropos=0&amp;amp;manpath=FreeBSD+7.0-RELEASE&quot; title=&quot; Maintain system log files &quot;&gt;newsyslog(8)&lt;/a&gt; to automatically rotate and gzip compress my Apache access logs. I was hoping that LogValidator would have the ability to deal with gzipped log files, but as far as I could determine it does not. Although it really wouldn't be that difficult to modify the code to do so, I opted for using a simple wrapper shell script that uncompresses yesterday's access log, runs the logprocess script, and then recompresses the log file. Below is some sample code if you want to take this route.&lt;/p&gt;

&lt;pre&gt;
# newsyslog.conf -- rotate apache logs
/var/www/mysite/logs/error_log root:staff 640 6 &amp;#42; @T00 BZN
/var/www/mysite/logs/access_log root:staff 640 6 &amp;#42; @T00 BZ /var/run/httpd.pid 30
&lt;/pre&gt;

&lt;pre&gt;
#!/bin/sh

# logprocess -- W3C::LogValidator wrapper script

config=/var/www/mysite/config/logprocess.conf
access=/var/www/mysite/logs/access_log.0

gunzip ${access} &amp;gt; /dev/null 2&amp;gt;&amp;amp;1
logprocess.pl --quiet --config ${config} &amp;gt; /dev/null 2&amp;gt;&amp;amp;1
gzip ${access} &amp;gt; /dev/null 2&amp;gt;&amp;amp;1
&lt;/pre&gt;

&lt;p&gt;In case you're wondering, the &lt;code&gt;&amp;gt; /dev/null 2&amp;gt;&amp;amp;1&lt;/code&gt; at the end of each line above means &quot;really quiet&quot; as in all output, both stdout (console) and stderr (also console, but another file handle), is sent to the &quot;bit bucket&quot; (/dev/null). The order matters: stdout is pointed at /dev/null first, and then stderr is pointed at wherever stdout is going. And the reason I'm doing this is so that I can execute this script via cron and not have a bunch of useless messages in my inbox.&lt;/p&gt;
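
&lt;p&gt;For example, a crontab entry along these lines would run the wrapper half an hour after the midnight log rotation. The schedule and the /usr/local/bin/logprocess path are assumptions, so adjust both to match wherever you saved the script.&lt;/p&gt;

&lt;pre&gt;
# crontab -- run the LogValidator wrapper nightly (time and path are assumptions)
30 0 &amp;#42; &amp;#42; &amp;#42; /usr/local/bin/logprocess
&lt;/pre&gt;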

&lt;p&gt;That's all he wrote, folks! And you thought I was a softy after my last post. &lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/wink.gif&quot; class=&quot;icon&quot; alt=&quot;wink&quot; /&gt;&lt;/p&gt;
    </content:encoded>
    <pubDate>Tue, 04 Nov 2008 02:33:00 -0500</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/151-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>cpan</category>
<category>cron</category>
<category>css</category>
<category>email</category>
<category>html</category>
<category>lwp</category>
<category>module</category>
<category>perl</category>
<category>standards</category>
<category>validation</category>
<category>w3c</category>
<category>xhtml</category>
</item>
<item>
    <title>Authenticating a Googlebot in PHP and Perl</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/135-Authenticating-a-Googlebot-in-PHP-and-Perl.html</link>
<category>PHP</category><category>FreeBSD</category><category>Linux</category><category>Perl</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/135-Authenticating-a-Googlebot-in-PHP-and-Perl.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=135</wfw:comment>
    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=135</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;&lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/pencil.gif&quot; class=&quot;icon&quot; alt=&quot;code&quot; title=&quot; Sample Code &quot; /&gt; Following a &lt;a href=&quot;http://www.maxdesign.com.au/2006/09/23/some-links-97/&quot; title=&quot; Some links for light reading &quot;&gt;tip&lt;/a&gt; from &lt;a href=&quot;http://loadaveragezero.com/drx/author/R#a21&quot; title=&quot; Russ Weakley &quot;&gt;Russ&lt;/a&gt; I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled
&lt;a href=&quot;http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html&quot;&gt;&lt;em&gt;How to verify Googlebot&lt;/em&gt;&lt;/a&gt;. In a nutshell, it explains how to use the &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Operating_Systems/Unix&quot;&gt;Unix&lt;/a&gt; shell program &lt;a href=&quot;http://www.freebsd.org/cgi/man.cgi?query=host&quot;&gt;host&lt;/a&gt; to authenticate that an IP address copied from your Web &lt;a href=&quot;http://loadaveragezero.com/app/drx/Internet/WWW/Servers&quot;&gt;server&lt;/a&gt; log file really is a Googlebot and not some email harvester (or whatever).&lt;/p&gt;
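
&lt;p&gt;For reference, the manual check boils down to a reverse lookup, a look at the domain of the name that comes back, and a forward lookup on that name. Here is a rough shell sketch of the middle step, with the two host commands shown as comments since they need network access. The is_googlebot_name function is a made-up name for illustration, and NAME stands for whatever the reverse lookup prints.&lt;/p&gt;

&lt;pre&gt;
#!/bin/sh

# is_googlebot_name -- hypothetical helper: succeeds only if the
# host name ends in .googlebot.com

is_googlebot_name() {
    case $1 in
        *.googlebot.com) return 0 ;;
        *) return 1 ;;
    esac
}

# manual check with host (requires network access):
#   host 66.249.66.1     reverse lookup, prints the pointer name
#   host NAME            forward lookup, should print the same address
&lt;/pre&gt;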

&lt;p&gt;I decided to take this a step further and demonstrate how you can automate this procedure using a scripting language. For these examples I chose &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/PHP&quot;&gt;PHP&lt;/a&gt;
and &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl&quot;&gt;Perl&lt;/a&gt;, although you could certainly use &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Python&quot;&gt;Python&lt;/a&gt; or Ruby or whatever your preferred language is, as long as it has an interface to the &lt;a href=&quot;http://www.freebsd.org/cgi/man.cgi?query=gethostbyaddr&amp;amp;sektion=3&quot;&gt;gethostbyname and gethostbyaddr&lt;/a&gt; library calls.&lt;/p&gt;

&lt;p&gt;Using these calls under PHP is the simpler of the two approaches, as the interface to these routines is written at a more abstract level than that of the Perl &lt;a href=&quot;http://search.cpan.org/~nwclark/perl-5.8.8/ext/Socket/Socket.pm&quot;&gt;Socket module&lt;/a&gt;. Below is an example googlebot() function in PHP that returns true if the IP address parameter authenticates, although there is no 100% guarantee of preventing a spoof getting through (but it will catch the vast majority of them). A bit of test code is included.&lt;/p&gt;

&lt;pre&gt;
&amp;lt;?php

function googlebot($ip)  {

    // check to see if this IP really is a Googlebot

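    $bot = '.googlebot.com';
    $name = gethostbyaddr($ip);
    if ($name === false or $name == $ip) return false;

    // the host name must end with the bot domain (a name that merely
    // contains it could be spoofed), and the forward lookup must lead
    // back to the original address
    return (substr($name, -strlen($bot)) === $bot and gethostbyname($name) == $ip) ? true : false;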
}

// test it

$ip = '66.249.66.1';

echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . &quot;\n&quot;;
?&amp;gt;
&lt;/pre&gt;

&lt;p&gt;The Perl version is at a much lower level, very similar to the corresponding &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/C&quot;&gt;C&lt;/a&gt; system calls. In fact, the module is derived directly from the sys/socket.h header file and the functions are just wrappers around these Standard C library calls. See &lt;a href=&quot;http://en.wikipedia.org/wiki/Berkeley_sockets&quot;&gt;Berkeley Sockets&lt;/a&gt; for more information. If you have a copy of &lt;a href=&quot;http://www.amazon.com/o/tg/detail/-/0596000278/loadaverageze-20&quot;&gt;Programming Perl&lt;/a&gt;, the chapter 16 &lt;em&gt;Interprocess Communications&lt;/em&gt; section on socket programming will help, and if you are lucky enough to have a copy of the &lt;a href=&quot;http://www.amazon.com/o/tg/detail/-/0596003137/loadaverageze-20&quot;&gt;Perl Cookbook&lt;/a&gt;, chapter 18 &lt;em&gt;Internet Services&lt;/em&gt; has some great recipes for &lt;acronym title=&quot; Domain Name System &quot;&gt;DNS&lt;/acronym&gt; lookups. For &lt;em&gt;really&lt;/em&gt; gory details, refer to chapter 14 &lt;em&gt;DNS: The Domain Name System&lt;/em&gt; of &lt;a href=&quot;http://www.amazon.com/o/tg/detail/-/0201633469/loadaverageze-20&quot;&gt;TCP/IP Illustrated, Volume 1: The Protocols&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;
#!/usr/bin/perl

use Socket;

sub googlebot($)  {

    # check to see if this IP really is a Googlebot

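    my $ip = shift;
    my $bot = 'googlebot\.com';
    my $name = gethostbyaddr(inet_aton($ip), AF_INET) or return 0;
    my @addr = gethostbyname($name) or return 0;  # forward lookup can fail too
    my $addr = inet_ntoa($addr[4]);

    # the name must end in .googlebot.com and resolve back to the same IP
    return ($name =~ m/\.$bot$/ and $ip eq $addr) ? 1 : 0;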
}

# test it

$ip = '66.249.66.1';

print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . &quot;\n&quot;;
&lt;/pre&gt;

&lt;p&gt;Finally, in case anyone is interested why it's been so long since I posted anything, much of the summer I was sick as a dog and since recovering, busy as a bee. It's nice to be feeling better and back to work!&lt;/p&gt;
    </content:encoded>
    <pubDate>Sun, 24 Sep 2006 18:08:26 -0400</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/135-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>api</category>
<category>cpan</category>
<category>dns</category>
<category>freebsd</category>
<category>google</category>
<category>howto</category>
<category>internet</category>
<category>languages</category>
<category>module</category>
<category>perl</category>
<category>php</category>
<category>python</category>
<category>ruby</category>
<category>tutorial</category>
<category>unix</category>
<category>web server</category>
</item>
<item>
    <title>Sendmail in Perl</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/133-Sendmail-in-Perl.html</link>
<category>Perl</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/133-Sendmail-in-Perl.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=133</wfw:comment>
    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=133</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;&lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/CPAN.gif&quot; class=&quot;icon&quot; alt=&quot;cpan&quot; title=&quot; The Comprehensive Perl Archive Network &quot; /&gt; People love to argue. A recent thread on the &lt;a href=&quot;http://dc.pm.org/&quot;&gt;DC&lt;/a&gt; &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl&quot;&gt;Perl&lt;/a&gt; &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#camel:mongers&quot;&gt;Mongers&lt;/a&gt; &lt;a href=&quot;http://dc.pm.org/#mailing&quot;&gt;mailing list&lt;/a&gt; opened yet another discussion of which &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#CPAN:org&quot; title=&quot; The Comprehensive Perl Archive Network &quot;&gt;CPAN&lt;/a&gt; module is the best for sending email.&lt;/p&gt;

&lt;p&gt;Most &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Operating_Systems/Unix&quot;&gt;Unix&lt;/a&gt; systems, at least the &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Open_Source&quot;&gt;open-source&lt;/a&gt; flavors I use most frequently (&lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Operating_Systems/Unix/FreeBSD&quot;&gt;FreeBSD&lt;/a&gt; and &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Operating_Systems/Unix/Linux&quot;&gt;Linux&lt;/a&gt;), have &lt;a href=&quot;http://www.sendmail.org/&quot;&gt;sendmail&lt;/a&gt; installed. So when I need to send an email from a Perl script I just roll my own code to do it. You could argue that it's too simplistic because it's procedural or it isn't portable, but it works, it's fast and I have control over the code.&lt;/p&gt;

&lt;p&gt;The problem with using someone else's module is you're tied into the way they think, and it can be difficult or time-consuming to modify the code to add a feature or fix a bug.&lt;/p&gt;

&lt;p&gt;Who made this rule that you have to use a module for everything or that object-oriented methods are somehow intrinsically superior to procedural programming? Don't get the wrong idea, I have nothing against modules or OOP. It would have taken me a hell of a lot longer to write many of the Perl scripts I've written without wonderful tools like &lt;a href=&quot;http://search.cpan.org/dist/CGI.pm/&quot;&gt;CGI.pm&lt;/a&gt; and &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Databases#DBI:perl&quot;&gt;DBI&lt;/a&gt;. Not to mention about 100 others.&lt;/p&gt;

&lt;pre&gt;
#!/usr/bin/perl

$to = 'you@somehost.com';
$from = 'me@somehost.com';
$subject = 'test sendmail';
$body = &amp;lt;&amp;lt;EOB;
Hello from Perl/sendmail
EOB

if ($error = sendmail($to, $from, $subject, $body))  {
    die &quot;Can't sendmail: $error\n&quot;;
}
print 'mail sent';

sub sendmail($$$$)  {

   my @args = (
       'to',
       'from',
       'subject',
       'body'
   );

   my $arg;

   foreach $arg (@args)  {
       unless ($$arg = shift)  {
           return (caller(0))[3] . ': missing $' . $arg . ' parameter';
       }
   }

   my $sendmail = '/usr/sbin/sendmail';
   my $switch = '-t';

   open MAIL, &quot;|$sendmail $switch&quot; or return $!;

   print MAIL &amp;lt;&amp;lt;EOM;
To: $to
From: $from
Subject: $subject

$body
EOM
   close MAIL;
   return;

} # sendmail()
&lt;/pre&gt;
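
&lt;p&gt;The message piped to sendmail is nothing more than a few header lines, a blank line, and the body, and the -t switch tells sendmail to take the recipients from those headers. As a rough shell sketch of the same idea (the build_message name is made up for illustration, and the sendmail path is assumed to match the Perl version):&lt;/p&gt;

&lt;pre&gt;
#!/bin/sh

# build_message -- hypothetical helper: print the headers, a blank
# line, and then the body, in the form sendmail -t expects

build_message() {
    printf 'To: %s\nFrom: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3" "$4"
}

# usage:
# build_message you@somehost.com me@somehost.com 'test sendmail' 'Hello' | /usr/sbin/sendmail -t
&lt;/pre&gt;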
    </content:encoded>
    <pubDate>Wed, 03 May 2006 14:44:21 -0400</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/133-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>cpan</category>
<category>email</category>
<category>freebsd</category>
<category>linux</category>
<category>module</category>
<category>oop</category>
<category>open-source</category>
<category>perl</category>
<category>programming</category>
<category>sendmail</category>
<category>unix</category>
</item>
<item>
    <title>Love Perl</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/85-Love-Perl.html</link>
<category>Perl</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/85-Love-Perl.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=85</wfw:comment>
    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=85</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;&lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/heart.gif&quot; class=&quot;icon&quot; alt=&quot;heart&quot; title=&quot; Valentine's Day  &quot; /&gt; My first true love as a language, and although I don't use it much for Web stuff, I do still use it quite a bit. In fact I have a whole bin/cron directory full of Perl scripts. I love Perl!&lt;/p&gt;

&lt;p&gt;From the &lt;a href=&quot;http://loadaveragezero.com/drx/author/M#a689&quot; title=&quot; Mark Fowler &quot;&gt;creator&lt;/a&gt; of the &lt;a href=&quot;http://www.perladvent.org/&quot;&gt;Perl Advent Calendar&lt;/a&gt;, launching Valentine's Day, comes &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#love:perl&quot;&gt;LovePerl&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Via &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Open_Source#radar:oreilly&quot;&gt;O'Reilly Radar&lt;/a&gt;.&lt;/p&gt;
    </content:encoded>
    <pubDate>Thu, 09 Feb 2006 20:46:42 -0500</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/85-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>cpan</category>
<category>module</category>
<category>perl</category>
<category>search</category>
</item>
<item>
    <title>Bricolage Adds PHP Support</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/28-Bricolage-Adds-PHP-Support.html</link>
<category>PHP</category><category>Perl</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/28-Bricolage-Adds-PHP-Support.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=28</wfw:comment>
    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=28</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;&lt;img src=&quot;http://loadaveragezero.com/img/fav/drx/bricolage.gif&quot; class=&quot;icon&quot; alt=&quot;bricolage&quot; title=&quot; Bricolage CMS &quot; /&gt; Version 1.9.0 of &lt;a href=&quot;http://loadaveragezero.com/drx/author/D#a588&quot; title=&quot; Just a Theory &quot;&gt;David Wheeler's&lt;/a&gt; excellent &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#bricolage:cms&quot; title=&quot; Bricolage CMS &quot;&gt;Bricolage&lt;/a&gt; Perl &lt;acronym title=&quot; Content Management System &quot;&gt;CMS&lt;/acronym&gt; adds the ability to call &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/PHP&quot; title=&quot; PHP: Hypertext Preprocessor &quot;&gt;PHP&lt;/a&gt; from inside the system. It also sports a new &lt;acronym title=&quot; User Interface &quot;&gt;UI&lt;/acronym&gt;, built with standards-compliant &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Markup_Languages/XHTML&quot; title=&quot; Extensible Hypertext Markup Language&quot;&gt;XHTML&lt;/a&gt; 1.0 Strict and &lt;a href=&quot;http://loadaveragezero.com/app/drx/Data_Formats/Style_Sheets/CSS&quot; title=&quot; Cascading Style Sheets &quot;&gt;CSS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But the really exciting news is the new &lt;a href=&quot;http://www.justatheory.com/bricolage/1.9.0.html&quot; title=&quot; Bricolage Now has PHP 5 Templating &quot;&gt;PHP 5 Templating Architecture&lt;/a&gt;, on top of the existing Mason, Template Toolkit, and HTML::Template systems. To pull it off, the Bricolage team brought in George Schlossnagle of &lt;a href=&quot;http://www.omniti.com/&quot; title=&quot; Omni TI Consulting &quot;&gt;Omni TI&lt;/a&gt; to write a full-blown &lt;a href=&quot;http://search.cpan.org/dist/PHP-Interpreter/&quot; title=&quot; PHP::Interpreter on CPAN &quot;&gt;PHP::Interpreter&lt;/a&gt;. The new Perl module is not restricted to deployment inside of Bricolage; you can install and use it from any Perl application. Not only does it allow you to call PHP from Perl, with it you can also reach back and access Perl objects and methods from inside PHP.&lt;/p&gt;

&lt;p&gt;You heard me, you can access &lt;em&gt;&lt;strong&gt;any&lt;/strong&gt;&lt;/em&gt; &lt;a href=&quot;http://loadaveragezero.com/app/drx/Programming/Languages/Perl#CPAN:org&quot; title=&quot; The Comprehensive Perl Archive Network &quot;&gt;CPAN&lt;/a&gt; module from inside PHP. Wow. I have goosebumps on my arms.&lt;/p&gt;

&lt;p&gt;With the addition of the leaner UI and new support for &lt;a href=&quot;http://loadaveragezero.com/app/drx/Internet/WWW/Servers/Apache#sourceforge:mod_gzip&quot; title=&quot; Apache Compression Module &quot;&gt;mod_gzip&lt;/a&gt;, the Bricolage system gets a nice performance boost as well.&lt;/p&gt;

&lt;p&gt;Via &lt;a href=&quot;http://loadaveragezero.com/app/drx/Software/Open_Source#radar:oreilly&quot; title=&quot; O'Reilly Radar &quot;&gt;O'Reilly Radar&lt;/a&gt;.&lt;/p&gt;
    </content:encoded>
    <pubDate>Thu, 25 Aug 2005 10:24:00 -0400</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/28-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>apache</category>
<category>cms</category>
<category>cpan</category>
<category>css</category>
<category>developer</category>
<category>open-source</category>
<category>oreilly</category>
<category>perl</category>
<category>php</category>
<category>programming</category>
<category>standards</category>
<category>xhtml</category>
</item>
<item>
    <title>Blue Moon</title>
    <link>http://loadaveragezero.com/app/s9y/index.php?/archives/24-Blue-Moon.html</link>
<category>Perl</category><category>MySQL</category>    <comments>http://loadaveragezero.com/app/s9y/index.php?/archives/24-Blue-Moon.html#comments</comments>
    <wfw:comment>http://loadaveragezero.com/app/s9y/wfwcomment.php?cid=24</wfw:comment>
    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://loadaveragezero.com/app/s9y/rss.php?version=2.0&amp;type=comments&amp;cid=24</wfw:commentRss>
    <author>dwclifton@gmail.com (Douglas Clifton)</author>
    <content:encoded>
&lt;p&gt;&lt;img src=&quot;http://loadaveragezero.com/app/dcal/lun/blue.png&quot; class=&quot;icon&quot; alt=&quot;bluemoon&quot; /&gt; Dammit! I completely forgot tonight is the Blue Moon.&lt;/p&gt;

&lt;p&gt;Check out: &lt;a href=&quot;http://loadaveragezero.com/app/dcal/?y=2005&amp;amp;m=8&quot; title=&quot; dcal: August, 2005 &quot;&gt;dcal: Calendar for August, 2005&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And some details on how it is implemented: &lt;a href=&quot;http://loadaveragezero.com/app/dbrowse/dcal/tables/moon_phases/data&quot; title=&quot; Moon Phase data &quot;&gt;dbrowse: Moon Phase Data and the Blue Moon&lt;/a&gt;.&lt;/p&gt;
    </content:encoded>
    <pubDate>Sat, 20 Aug 2005 02:08:26 -0400</pubDate>
    <guid isPermaLink="false">http://loadaveragezero.com/app/s9y/index.php?/archives/24-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><category>database</category>
<category>moon</category>
<category>mysql</category>
<category>perl</category>
<category>phase</category>
</item>
</channel>
</rss>
