When it comes to Web page markup, Google has the mother of all datasets. Recently, the Google Code project released a set of stats on authoring techniques, sampled from over a billion documents. To view the results you will need a browser that supports both CSS and SVG; Firefox 1.5 is a good choice.
The overview cites both John Allsopp's study of semantics and François Briatte's design survey. Interestingly, it also mentions microformats.org and even a study that Mozilla carried out using its JavaScript Web Spider. Also noteworthy: Google admits they are “not leading the way in terms of validation.”
There is a lot of material to go over here, and among the most interesting bits are the frequency (popularity) charts that describe the distribution of class names used by developers. It turns out that the most popular class names map consistently to the elements being proposed by the WHAT Working Group's so-called HTML5:
This list happens to be in structural order; as it turns out, footer is the most popular class name. I use all of them, as do many people (my ego informs me that perhaps I contributed to these stats). I can't help thinking of this as a sort of social tagging system, in which popular class names define semantics and drive the design of markup languages by consensus.
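A tally like this is easy to try on a handful of your own pages. What follows is only a minimal sketch in Python, not Google's method, which the overview does not spell out here. The names ClassNameCounter and count_class_names are made up for illustration, and the sample markup is invented rather than taken from the Google data.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassNameCounter(HTMLParser):
    """Hypothetical helper: tallies every name found in class attributes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute may hold several space-separated names.
                self.counts.update(value.split())


def count_class_names(documents):
    """Return a Counter of class names across an iterable of HTML strings."""
    totals = Counter()
    for markup in documents:
        parser = ClassNameCounter()
        parser.feed(markup)
        parser.close()
        totals.update(parser.counts)
    return totals


if __name__ == "__main__":
    # Invented sample pages, purely for illustration.
    sample_pages = [
        '<div class="header"></div><div class="content main"></div>'
        '<div class="footer"></div>',
        '<div class="content"></div><p class="footer small"></p>',
    ]
    for name, count in count_class_names(sample_pages).most_common():
        print(name, count)
```

Counting each occurrence, as this sketch does, is only one way to measure popularity. Counting the number of pages that use a name at least once would be another, and the two can rank names differently.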
We are the Web—good stuff.