Sunday, September 24, 2006
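Following a tip from Russ, I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled How to verify Googlebot. In a nutshell, it explains how to use the Unix shell program host to confirm that an IP address copied from your Web server log file really belongs to Googlebot and not to some email harvester (or whatever). The check works in two steps: a reverse DNS lookup on the IP address should return a host name in the googlebot.com domain, and a forward lookup on that host name should return the original IP address.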
I decided to take this a step further and show how you can automate this procedure with a scripting language. For these examples I chose PHP and Perl, although you could just as well use Python, Ruby, or whatever your preferred language is, as long as it has an interface to the gethostbyname and gethostbyaddr resolver library calls.
Using these calls from PHP is the simpler of the two approaches, as PHP's interface to these routines is more abstract than the Perl Socket module. Below is an example googlebot() function in PHP that returns true if the IP address passed to it checks out. It is not a 100% guarantee that no spoof will get through, but it will catch the vast majority of them. A bit of test code is included.
<?php
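function googlebot($ip) {
    // check to see if this IP really is a Googlebot
    $bot = '.googlebot.com';
    $name = gethostbyaddr($ip);
    // gethostbyaddr() returns false on malformed input and the IP itself
    // when there is no reverse (PTR) record
    if ($name === false || $name == $ip) return false;
    // the host name must END in .googlebot.com; a plain substring test
    // would also accept a name like googlebot.com.example.net
    if (substr(strtolower($name), -strlen($bot)) !== $bot) return false;
    // the forward lookup must lead back to the same IP (check every A record)
    $addrs = gethostbynamel($name);
    return is_array($addrs) && in_array($ip, $addrs, true);
}
// test it
$ip = '66.249.66.1';
echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . "\n";
?>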
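Because googlebot() talks to live DNS, its logic is awkward to test. The sketch below is one possible restructuring under assumptions of my own: the names verify_crawler() and host_in_domain(), the $domains list, and passing the two lookups in as callables (defaulting to PHP's gethostbyaddr() and gethostbynamel()) are all hypothetical, not part of Google's instructions. It keeps the same two-step check: the host name must end in an allowed domain rather than merely contain it, and the forward lookup must lead back to the original IP.

```php
<?php
// A testable sketch of the Googlebot check. Hypothetical names (not from the
// original post): verify_crawler(), host_in_domain(), and the $domains,
// $reverse and $forward parameters.

// True if $host is $domain itself or a name underneath it (ends in ".$domain").
function host_in_domain(string $host, string $domain): bool
{
    $host = strtolower(rtrim($host, '.'));
    $domain = strtolower($domain);
    if ($host === $domain) {
        return true;
    }
    $suffix = '.' . $domain;
    return strlen($host) > strlen($suffix)
        && substr($host, -strlen($suffix)) === $suffix;
}

// $reverse maps an IP to a host name (default: gethostbyaddr) and $forward maps
// a host name to an array of IPv4 addresses (default: gethostbynamel). Because
// gethostbynamel() is IPv4-only, IPv6 clients will not verify with the defaults.
function verify_crawler(
    string $ip,
    array $domains = ['googlebot.com'],
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Reject malformed input up front instead of letting the lookup warn.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // there is no PTR record, and false on failure.
    $name = $reverse($ip);
    if (!is_string($name) || $name === $ip) {
        return false;
    }

    // Step 2: the name must END in an allowed domain, not merely contain it.
    $allowed = false;
    foreach ($domains as $domain) {
        if (host_in_domain($name, $domain)) {
            $allowed = true;
            break;
        }
    }
    if (!$allowed) {
        return false;
    }

    // Step 3: forward lookup; the original IP must be among the answers.
    $addrs = $forward($name);
    return is_array($addrs) && in_array($ip, $addrs, true);
}
```

Called as verify_crawler($ip) it behaves like a live check; passing fake lookup functions instead lets you exercise cases such as a reverse lookup that returns googlebot.com.evil.example without touching the network.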
The Perl version works at a much lower level, very similar to the corresponding C calls. In fact, the Socket module is derived directly from the sys/socket.h header file, and its functions are essentially wrappers around the underlying C library calls. See Berkeley Sockets for more information. If you have a copy of Programming Perl, the section on socket programming in chapter 16, Interprocess Communications, will help, and if you are lucky enough to have a copy of the Perl Cookbook, chapter 18, Internet Services, has some great recipes for DNS lookups. For really gory details, refer to chapter 14, DNS: The Domain Name System, of TCP/IP Illustrated, Volume 1: The Protocols.
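#!/usr/bin/perl
use strict;
use warnings;
use Socket;

sub googlebot($) {
    # check to see if this IP really is a Googlebot
    my $ip = shift;
    my $bot = qr/\.googlebot\.com$/i;   # anchored: must END in .googlebot.com
    my $packed = inet_aton($ip) or return 0;
    my $name = gethostbyaddr($packed, AF_INET) or return 0;
    return 0 unless $name =~ $bot;
    # forward lookup returns ($name, $aliases, $addrtype, $length, @addrs)
    my @addr = gethostbyname($name) or return 0;
    # accept if ANY returned address matches the original IP
    return (grep { inet_ntoa($_) eq $ip } @addr[4 .. $#addr]) ? 1 : 0;
}

# test it
my $ip = '66.249.66.1';
print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . "\n";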
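To automate one step further, a script can pull the candidate IPs straight out of the log instead of copying them by hand. This PHP sketch assumes Apache's Combined Log Format (client address in the first field, User-Agent in the last quoted field); claimed_googlebot_ips() and check_claims() are hypothetical helper names, and the verifier is passed in as a callable so any DNS-based check, such as the googlebot() function, can be plugged in.

```php
<?php
// Sketch: collect the client IPs of requests whose User-Agent *claims* to be
// Googlebot, then check each one with a DNS-based test. Assumes Apache's
// Combined Log Format (client address first, User-Agent in the last quoted
// field). claimed_googlebot_ips() and check_claims() are hypothetical names.

function claimed_googlebot_ips(iterable $lines): array
{
    $ips = [];
    foreach ($lines as $line) {
        // $m[1] = first field (client address), $m[2] = last quoted field.
        if (!preg_match('/^(\S+) .*"([^"]*)"\s*$/', $line, $m)) {
            continue;
        }
        if (stripos($m[2], 'googlebot') === false) {
            continue;
        }
        $ips[$m[1]] = true; // array keys de-duplicate repeat visits
    }
    return array_keys($ips);
}

// Run each distinct claimed IP through $verify (any callable that takes an IP
// and returns true or false), so each address is looked up only once.
function check_claims(iterable $lines, callable $verify): array
{
    $results = [];
    foreach (claimed_googlebot_ips($lines) as $ip) {
        $results[$ip] = (bool) $verify($ip);
    }
    return $results;
}

// Example (needs network access and the googlebot() function shown earlier;
// the log path is an assumption, adjust it for your server):
// $log = new SplFileObject('/var/log/apache2/access.log');
// foreach (check_claims($log, 'googlebot') as $ip => $ok) {
//     echo $ip, $ok ? ' verified' : ' NOT verified', "\n";
// }
```

De-duplicating before verifying matters because DNS lookups are slow, and a crawler that visits often shows up many times in a day's log.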
Finally, in case anyone is wondering why it's been so long since I posted anything: I was sick as a dog for much of the summer and, since recovering, busy as a bee. It's nice to be feeling better and back to work!
Sunday, February 26, 2006
I am a huge hockey fan and I can't wait to watch the finals later today, with Sweden set to battle Finland for the Olympic gold. So why am I blogging about it here? Well, partially because it's Sunday and I like to post something a bit more light-hearted on weekends, but mostly because there is a strong connection between what interests me and these two Nordic countries. I've also heard both boast a large and enthusiastic community of ice hockey fans.

Sweden — Web Development
- 456 Berea Street. A blog by Roger Johansson, with a well-deserved reputation for upholding best practices and sharing techniques. A favorite of Web design and development blogrolls the world over.
- Robert's talk. Robert Nyman is another rising Web development blogger. I visit his site often, and you should too. I'm also looking forward to meeting him at SXSW next month.
- Autistic Cuckoo. Sadly, Tommy Olsson is no longer contributing to his blog. But he hasn't taken it down and there are plenty of excellent articles still there to study and bookmark.

Finland — Internet Technology
- Linux. Created by Linus Torvalds while he was a student at the University of Helsinki, Finland, Linux has had such an impact on computing, especially as a Web server platform, that it's hard to say what the landscape of the Web would look like without it. I work on dozens of Linux hosts, along with the equally popular (at least in terms of Web hosting) FreeBSD open-source operating system.
- IRC. A multi-user instant messaging/chat system created by Jarkko Oikarinen, IRC is used by open-source teams the world over. Some of the better-known channels are hosted on freenode, including those where developers chat about Apache, many Linux distributions, MySQL, the W3C, Wikipedia, and countless other projects. Probably the best-known open-source IRC channels are those on the Mozilla IRC Network, and many users rely on the ChatZilla Firefox extension to connect.
- SSH. The Secure Shell protocol and family of client programs was first developed by Tatu Ylönen at the Helsinki University of Technology. The OpenBSD project followed up a few years later with a free implementation, OpenSSH, which you will find pre-installed on most of the operating systems mentioned above (and many others). Anyone still using telnet to connect to remote servers is living in the dark ages.
So, in a way, I guess it's kind of like the right brain against the left. Update: I've just heard that Sweden won, which is good, since that's who I was rooting for. Not for the reasons you're probably thinking, but because I'm a huge Red Wings fan and about half the Swedish team is made up of some of my favorite players. But I don't want to hear anything more: because of the time difference here in the US, the game hasn't been televised yet!
Tuesday, February 7, 2006
From David Sifry and Technorati, some exciting stats (complete with graphs) on the exponential growth of blogging over the past several years: State of the Blogosphere.
A few highlights from the summary:
- The blogosphere is doubling in size every 5 and a half months
- It is now over 60 times bigger than it was 3 years ago
- On average, a new weblog is created every second of every day
- 13.7 million bloggers are still posting 3 months after their blogs are created
There are a lot more details, many of them specific to how much data Technorati handles. I used to complain about their performance problems (which have been much better of late), but wow, they have their hands full and there's no end in sight.
Keep on blogging on!
Thursday, November 24, 2005
O'Reilly Radar has a fascinating series of articles that explore the personal history behind some of the so-called "alpha-geeks." Find out what motivated these achievers, and their sometimes surprising road into the world of computers, software and writing.
In the aptly named Burn In 0: How I Got Into Computers, Nathan Torkington (coauthor of the Perl Cookbook) sets the stage for the series, and tells his own story.
Perhaps the flipside of all this would be "burn out" -- or the stories behind those who turned their backs on high-tech careers. Any organic farmers out there?
Sunday, August 28, 2005
This little ditty has been around the block and still has no place to go. Except home. I wrote Agent vs. Agent on a whim; it's a commentary piece and never pretended to be anything else. I cranked the thing out one morning, one of those days after you get a crazy idea in your head just before falling asleep.
So after a little spit-and-polish, I submitted it to Evolt for consideration. They did in fact review and approve the article, and it went into the publishing queue. A few days later, I received a notification that all articles were being placed on hold temporarily. At about the same time, a post titled On the Move appeared on the Evolt site.
In the next few days, the evolt.org site is changing - we're moving hosts, changing CMS and redesigning. In the meantime, we're closing the site to new content while we migrate the data.
That, dear readers, was over six months ago. Now, I'm not complaining. Like many community sites (and I happen to think Evolt is, or was, an outstanding community), Evolt is maintained by volunteers who have careers, and lives, and so on. Eventually, I gave up and withdrew the piece. Poor little AvsA had no home.
In the meantime, I'd written and published a piece titled Generating Dynamic CSS with PHP for Digital Web Magazine, which turned out to be a fairly successful bit of work. The last time I checked, searching Google for the keyphrase "dynamic css php" returned a link to the article on the first page. I still have people ask me about it, and it pops up on del.icio.us as a new bookmark every week.
So, I decided to pitch the article to DWM. I cleaned it up a little more and showed it to Krista, the Editor in Chief of the magazine. After a few weeks I received an email from her.

The piece is more appropriate to your blog, where you are free to offer your opinion as you see fit. As it is, it is not right for the magazine.
I might have been upset if I didn't think it was so funny. To hell with it, I thought, I'll publish the thing myself. Then I realized my Rant page hadn't been updated in ages, and since according to Krista all I was doing was ranting anyway, what better place, no? Even better, since I'm not under any restrictions based on the thing being published elsewhere, I could add back a few things that had been edited out. So, for a third (or fourth, or fifth...) time, I modified the piece to get it ready for my own site.
But hold your horses! The story isn't quite over yet. Lo and behold, in the process of writing this blog post (which is at least as long as the damn article), I discovered, to my astonishment, that Evolt 3.0 is finally coming out. For about a nanosecond I thought about bringing AvsA back to Evolt, but it is already part of my Rant page and I'm not about to do anything more with it.
So, for you spy genre and John Le Carré fans out there, my Spy has Come in from the Cold.
Friday, August 19, 2005
The Netcraft Web Server Survey turns 10 years old this month, and marks another milestone, finding over 70 million Web sites on the Internet. Phew! That's a lot of Web sites.
The first Netcraft survey in August 1995 found 18,957 hosts, with the NCSA web server dominating with 57 percent market share, leading CERN (19%) and a newcomer named Apache (3.5%). Microsoft's Internet Information Server launched in February 1996, and by the survey's fifth birthday the server market was largely divided up between Apache (62%) and IIS (19%).
Apache continues its dominance, weighing in at just under 70% of all hostnames.