
PHP Labs


Building Modular XHTML Web Pages with PHP

Synopsis

A series of labs describing the process of building modular, structured and valid XHTML Web pages using the PHP server-side scripting language.

Contents

  1. Abstract
  2. Getting Started Resources
  3. Introduction to XHTML
  4. Review
  5. Switching to XHTML
  6. Summary
  7. PHP Labs Navigation

Abstract

This lab series is not a tutorial on writing PHP programs or scripting dynamic Web pages. Rather, its intent is to demonstrate how you can use PHP to separate the different elements that make up well-designed and valid Web pages into their component parts, and have these parts adapt in certain powerful ways. These components correlate almost exactly with the modular design of XHTML itself. Any number of other open-source languages, such as Perl or Python, could be used to achieve the same goal. Commercial ones would work too, certainly, but let’s not go there.

Getting Started Resources

For a review of HTML, the global structure of an HTML document is a good place to start. The W3C is an incredible resource, and any serious Web programmer/developer should spend many hours researching its documents. If you’re prepared to get right into XHTML, I recommend you first visit this brief overview, then a description of the XHTML structure module, and later the complete modular design of the language. Also from the W3C is the complete XHTML 1.0 specification.

The PHP Web site has excellent online documentation including this introduction and tutorial. There are plenty of good books on these topics as well.

And finally, for an outstanding history lesson on computing in general, markup, CERN, the Internet, hypertext, the WWW and the W3C visit or bookmark History of the Web, from Oxford Brookes University.

An Introduction to XHTML

The official definition of XHTML from the W3C is a reformulation of HTML as a modular XML application. What does this mean? Simply put, XHTML is HTML 4.01 rewritten in XML. [source]

This is my not-so-official version of the events that led to this:

First you had SGML. It was huge, it was complex, and there was no way you could use it across a data network. Then came HTML, which was built from SGML but simplified, maybe too simple. It worked pretty well, but people wanted more out of Web pages than technical documents. They wanted pictures, they wanted sound, they wanted movies (they wanted porn).

Very quickly the Web exploded. We had browser wars, millions earned and millions lost, and afterwards we were left with an HTML that had become a mess, twisted and warped.

The W3C stepped in. They wanted to fix the problem, to move on to more sophisticated and smarter ways of doing all of these things. So they designed XML, which is like SGML only smaller, faster, modular, extensible, and can target all sorts of devices.

Assuming we have as many as, say, ten billion Web pages out there (who knows?), there is no way you can force or expect the people who created these pages to sit down and rewrite all their stuff in a new language, not to mention the COST to do this. And there is the problem of all the existing browsers used to view these documents, which aren’t going to magically disappear or suddenly understand XML.

So the W3C came up with an idea called XHTML: a language that is compatible with the older browsers and the older code, but moves us into a future where we don’t have all the problems we now have with HTML.

Makes sense to me. Granted, they also did a pretty good job of cleaning up and standardizing HTML, bringing us up to version 4.01 (in three flavors!). But that’s the end of the line, folks.

In January of 2000 the W3C issued this press release announcing XHTML 1.0 as a Recommendation. That may seem rather dated to you, but new standards don’t happen overnight, and software such as browsers and other UAs spends a lot of time in development.

Does this mean you must switch to XHTML? No. Does this mean you should switch to XHTML? Probably. Is it hard to adopt XHTML? Yes and no, but in my opinion the benefits of doing so trump the work involved. To get you started with your existing pages, the W3C has a utility called HTML Tidy, which is now maintained as a project on SourceForge. For further discussion on this topic, WaSP has an article titled: HTML vs. XHTML.

Review

  1. XML and SGML are both metalanguages, which only means you build other markup languages from them.
  2. XML is a subset, or a simplification, of SGML.
  3. HTML and XHTML are both application languages, derived from SGML and XML respectively.

All of them employ some form of a DOCTYPE or DTD, which predates even SGML. A DTD is nothing more than a set of rules that defines the application language. With XML, the (meta) language is modular, meaning you import only the parts you need. If the user is blind, what’s the point of thousands of lines of code meant for a computer screen? XML can also target mobile phones, hand-helds, braille readers, and so on.

Switching to XHTML

In order to comply with the XHTML specification your new documents must:

  1. send the correct Content-type header in the HTTP response.
  2. declare the document as XML.
  3. send the correct DTD based on the syntax of the document.
  4. open the document with the correct root element.
  5. follow the rules of the structure of the document.
  6. follow the markup rules within the document.
  7. close all element tags.
  8. validate.

Some of these are trivial, but they can be complicated by the issue of which user agent is requesting the document. Normally a UA is just a browser, but it can be a validator, a search engine spider/crawler, or a toaster. Just kidding. We’re only going to worry about the first two. To remain backward compatible with HTML you’ll need to follow some simple guidelines.

The Content-type header is not a part of your document; it precedes it in the HTTP response. I will discuss HTTP and headers in the first article.
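
To make the Content-type step a little more concrete, here is a minimal sketch of how a script might pick the header based on what the UA says it accepts. The function name choose_content_type() is hypothetical, and the policy it applies (application/xhtml+xml only for UAs that list it in their Accept header, text/html for everyone else) is an assumption about one reasonable approach, not necessarily the one used in the labs.

```php
<?php
// Hypothetical helper: choose a media type from the Accept request header.
// Assumption: a UA that lists application/xhtml+xml can handle real XHTML;
// everything else gets text/html for backward compatibility.
function choose_content_type(string $accept): string
{
    foreach (explode(',', $accept) as $range) {
        // Split "type/subtype;q=0.9" into the media type and its parameters.
        $parts = array_map('trim', explode(';', $range));
        if (strcasecmp($parts[0], 'application/xhtml+xml') !== 0) {
            continue;
        }
        // An explicit q=0 means "not acceptable".
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m) && (float) $m[1] == 0.0) {
                return 'text/html';
            }
        }
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

$type = choose_content_type($_SERVER['HTTP_ACCEPT'] ?? '');

// header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: ' . $type . '; charset=UTF-8');
    header('Vary: Accept'); // tell caches the response depends on Accept
}
```

The sketch ignores relative q-values and only treats q=0 as a refusal, which keeps it short; a stricter version would compare the q-values of the two types.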

Since XHTML should follow the strict rules of XML we begin with its declaration:

<?xml version="1.0" encoding="UTF-8"?>

In this example the character encoding, which covers multiple languages and special characters, is UTF-8, the 8-bit Unicode Transformation Format. I’ll get back to that.
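
One practical wrinkle when a PHP script produces this line: if the short_open_tag setting is enabled, PHP treats a literal <? in template text as the start of PHP code, so it is safer to print the declaration from a string. The helper below is a small sketch of that idea; the name xml_declaration() and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: build the XML declaration as a string so that PHP
// never sees a literal open tag in template text.
function xml_declaration(string $encoding = 'UTF-8', string $version = '1.0'): string
{
    // The pieces are split so the source itself contains no raw open/close tag.
    return '<' . '?xml version="' . $version . '" encoding="' . $encoding . '"?' . '>' . "\n";
}

// Must be the very first output: nothing, not even a blank line, may precede it.
echo xml_declaration();
```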

Next comes the DTD, which I briefly described above. Nothing that follows would make sense without it, so it is now mandatory. A DTD is a formal grammar: a specification of what constitutes a valid document. In other words, which tags are legal, their attributes and values, and in what order and combination. In reality, for performance and other reasons, most browsers that understand XHTML do not actually fetch and read the DTD. But for advanced situations and applications this is certainly possible.
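
To see a DTD acting as a grammar, here is a small sketch using PHP's DOM extension (assumed to be enabled). Instead of the real XHTML DTDs it uses a tiny made-up DTD written directly into the document, so nothing has to be fetched from the network. The helper name is_valid_against_dtd() and the note/to/body vocabulary are invented for the example.

```php
<?php
// A tiny, made-up DTD in the internal subset: a <note> must contain
// exactly one <to> followed by exactly one <body>.
$dtd = <<<'DTD'
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
DTD;

// Hypothetical helper: true if the document parses and obeys its own DTD.
function is_valid_against_dtd(string $xml): bool
{
    // Collect parser errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Same elements, different order: only the first matches the grammar.
$valid   = $dtd . "\n<note><to>reader</to><body>hello</body></note>";
$invalid = $dtd . "\n<note><body>hello</body><to>reader</to></note>";

var_dump(is_valid_against_dtd($valid));
var_dump(is_valid_against_dtd($invalid));
```

Both documents are well-formed XML, but only the first is valid. That difference between well-formed and valid is exactly what the DTD adds.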

Collectively the XML and DOCTYPE declarations are known as the XML prolog. Since the <html> (root or top level) element in the document that follows also relates to these declarations, my PHP doctype() function takes care of opening this tag at the same time. The closing tag is the last thing you’ll see, but there is a lot of work to do first.
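
The real doctype() function is not shown on this page. As a rough sketch of the idea only, a function along these lines could emit the prolog and open the root element in one go. The name xhtml_prolog(), its parameters, and the lookup table keys are hypothetical; the public and system identifiers are the standard ones published by the W3C.

```php
<?php
// Hypothetical sketch, not the doctype() function used on this site.
function xhtml_prolog(string $flavor = '1.1', string $lang = 'en'): string
{
    // Public and system identifiers as published by the W3C.
    $dtds = [
        '1.0-strict' => [
            '-//W3C//DTD XHTML 1.0 Strict//EN',
            'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
        ],
        '1.0-transitional' => [
            '-//W3C//DTD XHTML 1.0 Transitional//EN',
            'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd',
        ],
        '1.1' => [
            '-//W3C//DTD XHTML 1.1//EN',
            'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd',
        ],
    ];
    if (!isset($dtds[$flavor])) {
        throw new InvalidArgumentException("Unknown XHTML flavor: $flavor");
    }
    list($public, $system) = $dtds[$flavor];
    $lang = htmlspecialchars($lang, ENT_QUOTES, 'UTF-8');

    // XML declaration, split so PHP never sees a literal open or close tag.
    $out  = '<' . '?xml version="1.0" encoding="UTF-8"?' . '>' . "\n";
    $out .= '<!DOCTYPE html PUBLIC "' . $public . '" "' . $system . '">' . "\n";

    // XHTML 1.1 (as first published) dropped lang in favor of xml:lang alone.
    $attrs = ' xmlns="http://www.w3.org/1999/xhtml" xml:lang="' . $lang . '"';
    if ($flavor !== '1.1') {
        $attrs .= ' lang="' . $lang . '"';
    }
    return $out . '<html' . $attrs . '>' . "\n";
}

echo xhtml_prolog('1.0-strict');
```

Whatever opens the root element this way also has to make sure the matching closing tag is written at the very end of the page.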

Designers and other authors of CSS stylesheets should also take note that under XHTML the <html> element, rather than the <body> element, begins presentation of the document. This is easily fixed; have a quick look at the comments at the top of my root.css stylesheet, and then a little further down for an example.

Summary

This Web site uses XHTML 1.1 and, when necessary, 1.0 Strict (in very rare cases Transitional). At any time you may use the W3C XHTML 1.1 button in the left sidebar to validate this, or any other document on this site. You will also find instructions there on how to validate your own documents.

There are a number of other helpful resources located on the sidebar, including a CSS validation tool. Also informative are the View XML and View PHP buttons below the validators.

Please take a moment to review the seven most common DOCTYPEs. For this series we’re interested in HTML 4.01 and XHTML 1.0 Strict, Transitional and Frameset, and XHTML 1.1. You’ll also find there a small XHTML template that helps visualize a complete and valid document.

My PHP Labs are designed to follow the formal structure of an XHTML document in a top-down manner, in much the same way as you would read the source code to this one.

  1. HTTP Protocol
  2. DTD Declaration
  3. Dynamic Metadata
  4. The <body> Element
  5. Markup Toolkit
  6. Law of Closure
  7. Eureka!

Last updated: Tuesday, May 29th, 2007 @ 9:41 PM EST [2007-05-30T02:41:04Z]

(c) 2006-2008, Douglas W. Clifton, loadaveragezero.com, all rights reserved.