Sunday, September 24, 2006
Following a tip from Russ I was pleased to find an interesting post on the Official Google Webmaster Central Blog titled How to verify Googlebot. In a nutshell, it explains how to use the Unix shell program host to authenticate that an IP address copied from your Web server log file really is a Googlebot and not some email harvester (or whatever).
I decided to take this a step further and demonstrate how you can automate this procedure using a scripting language. For these examples I chose PHP and Perl, although you could certainly use Python or Ruby or whatever your preferred language is, as long as it has an interface to the gethostbyname and gethostbyaddr library calls.
Using these calls in PHP is the simpler of the two approaches, as the interface to these routines is written at a more abstract level than Perl's Socket module. Below is an example googlebot() function in PHP that returns true if the IP address parameter authenticates. There is no 100% guarantee against a spoof getting through, but it will catch the vast majority of them. A bit of test code is included.
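<?php
function googlebot($ip) {
    // check to see if this IP really is a Googlebot
    $bot = '.googlebot.com';
    // reverse lookup: returns the IP unchanged on failure, false on malformed input
    $name = gethostbyaddr($ip);
    if ($name === false || $name === $ip) return false;
    // the host name must end with .googlebot.com, not merely contain it,
    // otherwise a name like googlebot.com.example.net would slip through
    if (substr(strtolower($name), -strlen($bot)) !== $bot) return false;
    // forward lookup: the name must map back to the original IP
    return gethostbyname($name) === $ip;
}

// test it
$ip = '66.249.66.1';
echo $ip . ' is ';
if (!googlebot($ip)) echo 'not ';
echo 'a Google bot' . "\n";
?>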
The Perl version works at a much lower level, very close to the corresponding C library calls. In fact, the Socket module is derived directly from the sys/socket.h header file and its functions are just wrappers around these standard C library calls. See Berkeley Sockets for more information. If you have a copy of Programming Perl, the socket programming section of chapter 16, Interprocess Communications, will help, and if you are lucky enough to have a copy of the Perl Cookbook, chapter 18, Internet Services, has some great recipes for DNS lookups. For really gory details, refer to chapter 14, DNS: The Domain Name System, of TCP/IP Illustrated, Volume 1: The Protocols.
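#!/usr/bin/perl
use strict;
use warnings;
use Socket;

sub googlebot {
    # check to see if this IP really is a Googlebot
    my $ip = shift;
    my $packed = inet_aton($ip) or return 0;    # address could not be packed
    # reverse lookup: IP address to host name
    my $name = gethostbyaddr($packed, AF_INET) or return 0;
    # the name must end with .googlebot.com, not merely contain it
    return 0 unless $name =~ m/\.googlebot\.com$/i;
    # forward lookup: element 4 onward holds the packed addresses
    my @addr = gethostbyname($name);
    return 0 unless @addr > 4;                  # forward lookup failed
    # accept if any returned address matches the original IP
    return (grep { inet_ntoa($_) eq $ip } @addr[4 .. $#addr]) ? 1 : 0;
}

# test it
my $ip = '66.249.66.1';
print $ip . ' is ';
unless (googlebot($ip)) { print 'not '; }
print 'a Google bot' . "\n";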
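Since Python came up earlier as an alternative, here is a minimal sketch of the same reverse-then-forward lookup using Python's standard socket module. Treat it as an illustration rather than a drop-in tool: the function name is_googlebot and the reverse and forward parameters are my own additions (they let you swap in fake resolvers for offline testing), it assumes IPv4, and it requires the host name to end in .googlebot.com rather than merely contain it.

```python
import socket


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a googlebot.com host
    whose forward lookup includes the same ip.

    reverse and forward are hypothetical hooks added for this sketch;
    they default to the real resolver calls in the socket module.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        name = reverse(ip)[0]
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Require the name to END with .googlebot.com (ignoring a trailing dot)
    if not name.lower().rstrip(".").endswith(".googlebot.com"):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward(name)[2]
    except OSError:
        return False  # forward lookup failed

    # The forward lookup must lead back to the address we started with
    return ip in addresses
```

Calling is_googlebot('66.249.66.1') with the default arguments performs live DNS lookups, so the answer depends on your resolver and on Google's current DNS records.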
Finally, in case anyone is wondering why it's been so long since I posted anything, I was sick as a dog for much of the summer and, since recovering, busy as a bee. It's nice to be feeling better and back to work!
Sunday, April 30, 2006
If you are a PHP developer in the New York metropolitan area or are planning a trip there in mid-June, then consider attending the NYPHP Conference and Expo (NYPHPCon). Running from June 14-16 at the New Yorker Hotel (map) in midtown Manhattan, the conference features three days of sessions, tutorials, exhibits, and networking events. Organized by the NYPHP Community and sponsored by many leading companies, the conference focuses on technical solutions and business strategy. You will have the opportunity to meet reps from Oracle, IBM and other top companies, network with fellow developers and business leaders, and have a look around Manhattan while you're at it.

During the two days prior to the conference you also have the opportunity to receive in-depth training at the Programmer and Designer Track Workshops. Sorry, early bird specials are closed, but you can still register online at a substantial savings over doing so onsite.
Saturday, March 18, 2006
Everyone has their favorite browser, and each of them has specific strengths. Opera is famous for its performance and outstanding CSS support. Camino users appreciate the Gecko rendering engine, but prefer the native Mac OS X interface.
And of course there are legions of Firefox fans. For me, what really sets this browser apart from all the others is its extensibility, and in particular the many extensions for Web developers.
Before heading to SXSW I ran across one called InfoLister, which lists your installed themes, plugins and extensions. Okay, nothing terribly exciting there (or anything you can't already determine using other Firefox tools). What really intrigued me about InfoLister is a feature that allows you to upload this data as an XML file to your Web server.
Many people have published lists of their favorite extensions for Web developers, but I am constantly testing new ones and removing others I no longer find useful. What I set out to do was create such a list, but rather than building it by hand I would exploit this upload feature by parsing the XML data file and then generate the list automatically. That way, if I add or remove an extension, all I have to do is hit the InfoLister upload button and my list reflects the change.
If you are a regular visitor to my site or this blog, then it should come as no surprise that I elected to use PHP for the job. There are any number of other approaches you could take to accomplish the same thing and I will make some other suggestions. But this post is not the tutorial, nor does it display the results; for that, visit the Installed Extensions page. You will even find some sample code there if you're interested in creating a similar page.
So, why not share your own Firefox Extension List?
Tuesday, February 28, 2006
In his famous pragmatic style, Rasmus Lerdorf has shared an article on his Toys blog that will walk you through what he calls a "no-framework PHP MVC framework." The task is a simple rich Web application, with SQLite and PDO for data abstraction, Ajax and JSON for data validation (input sanitization is achieved through the PECL filter package), and several components from the Yahoo! User Interface Library. All tied together with PHP, JavaScript and HTML, just like you'd expect.
Along the way he also shares plenty of performance tips to help keep your applications responsive. Given that Rasmus is the Infrastructure Architect at Yahoo! (not to mention the creator of PHP), he might know a thing or two about dealing with large numbers of requests. So, if the thought of refactoring a large, complex framework doesn't suit your project, consider rolling your own simplified MVC stack. This article is a great place to gather some ideas.
Thursday, February 23, 2006
The Yahoo! Developer Network has just opened its doors on yet another excellent resource, this time for PHP developers. The focus thus far is on providing the documentation needed to make REST Web Service requests, parse the results (be they in standard XML, JSON, or serialized PHP formats), and get the most out of the results through caching for performance and reliability.
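To make the caching point concrete, here is a rough sketch of the general idea, written in Python rather than PHP and not taken from the Yahoo! documentation. The class name CachedFetcher, its parameters, and the fall-back-to-stale behavior are all my own assumptions: it keeps parsed JSON results for a fixed time (performance) and serves the last good copy if a refresh fails (reliability).

```python
import json
import time


class CachedFetcher:
    """Cache parsed JSON responses for ttl seconds and serve the
    stale copy if a refresh fails. A hypothetical helper for illustration."""

    def __init__(self, fetch, ttl=300, clock=time.time):
        self.fetch = fetch    # callable: url -> response body as a string
        self.ttl = ttl        # seconds a cached result counts as fresh
        self.clock = clock    # injectable clock, handy for testing
        self.cache = {}       # url -> (timestamp, parsed result)

    def get(self, url):
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]   # fresh hit: no request is made
        try:
            result = json.loads(self.fetch(url))
        except (OSError, ValueError):
            if entry is not None:
                return entry[1]   # refresh failed: fall back to the stale copy
            raise                 # nothing cached, so surface the error
        self.cache[url] = (now, result)
        return result
```

In real use the fetch callable might wrap urllib.request.urlopen; passing it in keeps the cache logic independent of any particular service or URL.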
There are plenty of code samples, including the aptly named GeoCool!, which uses Yahoo!'s Geocoder API (from none other than Rasmus Lerdorf himself). I'm beat, so go check out the rest, including various hacks, howtos, maps, mashups, plugins and other assorted whatnots.
Via JZ.
Sunday, August 21, 2005
Obviously, authoring RSS and content syndication tutorials is nothing new. So I was a bit surprised, after recently publishing my own introduction, when the response was so positive and so many people began linking to it. This got me thinking, and I decided to start collecting a list of other really good tutorials.
To kick this off, I found a blog post titled A Little Information About Feeds at Six Apart, and immediately added the referenced tutorial About Feeds to drx. Hence the title of this post: A Little More Information About Feeds.
From time to time, I will return to the permalink for this post and add more resources in the form of comments. You are welcome to submit your own suggestions. I hope to create a comprehensive list that we can offer to anyone who wants a thorough understanding of content syndication, RSS and Atom feeds (or any other formats that come down the pike), and the applications and Web services people can use to get the most out of the technology.
Tuesday, August 16, 2005
It's too bad that Tim Bray has commenting disabled on his blog. This is an issue that really needs discussion.
I've been saying the same thing since I first started to encounter RSS feed links. Your reward for clicking on one of them is getting a bunch of XML source code in your face. Not good! I can only imagine what some users think when this happens, perhaps something like "what did I just do, I broke my Internet connection!"
Here's the truth: an orange "XML" sticker that produces gibberish when you click on it does not win friends and influence people. The notion that the general public is going to grok that you copy the URI and paste it into your feed-reader is just ridiculous.
The biggest problem is what to do with visitors who don't have a feed reader installed. Browsers are ubiquitous, and email programs are likewise. Most people have multimedia applications, and so forth. But this is not the case for feed readers (yet).
My solution so far is three-fold. First, every Web site owner should have a tutorial that explains the basics of feed subscription and readers (or link to such a tutorial). Second, provide details on how to subscribe to the feeds on your site. And third, offer links to both the raw XML feed for advanced users and a service like Feedburner, which provides a preview of the feed that works in the browser and offers additional ways of subscribing. I also provide links to popular aggregators, and links to Yahoo! and MSN for doing the same.
To read the entire post, visit Tim's The Real Problem.
Saturday, August 13, 2005
The loadaverageZero Apache log files are full of instances of people searching for regular expression tutorials. Probably because I list several good ones, and the search engines oblige me by indexing them.
Somehow, I missed Andrei Zmievski's PHP Regular Expression Clinic. And it's easily the best of the lot. If you're not familiar with Andrei's work, he's been the principal PHP developer since 1999, works as a member of the Core Software Infrastructure team at Yahoo!, is the author of the PHP Developer's Cookbook, the PHP-GTK GUI Toolkit extension, and is co-author of the Smarty templating engine.
He also happens to be the guy who wrote the PCRE PHP extension. That's quite a resume. Now don't assume he wrote the PCRE library itself; that distinction belongs to Philip Hazel. And PHP is by no means the only software that uses the PCRE library. You can find it as part of Apache, Python, and many other interesting open-source projects. And we can't forget about Perl itself, although it obviously doesn't need a compatibility library.
Some advice for Web developers: regular expressions are powerful, but they come at a price. I have seen countless examples of code that could easily have been implemented using standard string search and comparison functions. Use regular expressions only when you need to, and when you do, use the preg family of functions because they are faster and more powerful. In my opinion, the extended regex (ereg) functions should be removed from the PHP core. If people really want to use them, they can compile them in or access them as a runtime extension.
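As a small illustration of that first piece of advice, here is a sketch in Python (the same idea applies to PHP's string functions). The function names are made up for this example. The first two functions give the same answer for a fixed suffix, so the plain string method is all you need; the third is a case where a pattern really earns its keep.

```python
import re


def is_php_file_regex(name):
    # Regular expression: works, but is more machinery than a fixed suffix needs
    return re.search(r"\.php\Z", name) is not None


def is_php_file_plain(name):
    # Plain string method: same result for this check, and easier to read
    return name.endswith(".php")


def find_dates(text):
    # A variable pattern (YYYY-MM-DD) is where a regular expression is justified
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
```

The suffix test has no variable part, so there is nothing for a regular expression to add; the date search does, because the digits change from match to match.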
The Clinic is available in the following formats: