Digg makes official its adoption of a 'semantic Web' standard

By Scott M. Fulton, III | Published May 2, 2008, 12:14 PM

It could be the very thing the Web has lacked all these years, even with its wealth of intermingled hyperlinks: a markup language for conclusively identifying context. Now, Digg is making the bold attempt to be its biggest "beta tester."

One of the principal deficiencies of HTML and XHTML as markup languages all these years has been the absence of any genuine, built-in feature for explaining to indexing services, or even to browsers with intelligent features, exactly what a page contains at a granular level. Page-level metadata could conceivably help categorize content, assuming everything on a page fell into the same category; but with so many Web pages these days consisting of whole blogs, whole-page metadata is rapidly becoming useless.

The W3C, the standards body in charge of maintaining and developing the language of the Web, has actually been addressing this problem for several years; but while it has been busy building "standards," by the objective definition, relatively few sites have actually tried implementing them. Last month, one major exception emerged: Digg, the social news aggregator, quietly began trials of a W3C standard for labeling contextual data at a very low level -- meaning, right next to the data itself.

The concept is called RDFa (short for "RDF in attributes"), and essentially it's a way to put W3C's existing Resource Description Framework (RDF) to real-world use by expressing its statements as attributes inside ordinary XHTML markup. It borrows RDF's rather inspired way of explaining what an element means, or what the space being reserved for an element (say, from a database) should mean.

It calls for a bit of a stretch of the imagination, because it uses a loose metaphor from the realm of common grammar: All context -- everything that can symbolize relative relevance in text -- can be represented in terms of some thing "x" that does a certain something "y" to some thing "z." In this case, "x" is the subject of the relationship and "z" is the object, in the grammatical (not the programming) sense. The action of that relationship is the "y," which in RDF is called the predicate (note, not "verb").
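To make the grammar metaphor concrete, here is a minimal Python sketch of a triple as a plain data structure. It assumes nothing about any particular RDF library; the article URI is a hypothetical placeholder, while the predicate is Dublin Core's title term.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject "x" --predicate "y"--> object "z"."""
    subject: str    # the thing being described (usually a URI)
    predicate: str  # the relationship, itself named by a URI
    object: str     # the value: another URI or a literal string


# Hypothetical example: this article (x) has the title (y) "..." (z).
# The subject URI is made up for illustration; the predicate is Dublin Core's title.
stmt = Triple(
    subject="http://example.org/articles/digg-rdfa",
    predicate="http://purl.org/dc/elements/1.1/title",
    object="Digg makes official its adoption of a 'semantic Web' standard",
)

print(f"{stmt.subject} --{stmt.predicate}--> {stmt.object}")
```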

These three items together form what RDF calls a triple; and in RDFa notation, a triple can be embedded into an HTML element -- such as <P>, <H2>, <IMG>, or <SPAN> -- as a set of attributes that effectively describe the context of the element's contents. For those attributes to be meaningful, the page first declares the namespaces (the vocabularies, each abbreviated by a short prefix) that they refer to.
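As a rough illustration of embedding a triple in markup, the sketch below turns a subject, a prefixed property, and a literal value into an RDFa-style <SPAN>, wrapped in a <DIV> that declares the dc prefix the way RDFa 1.0 pages in XHTML did (an xmlns:dc attribute). The helper names rdfa_span and with_prefix and the example.org address are hypothetical, and the output is a fragment, not a complete page.

```python
from html import escape


def rdfa_span(subject: str, prop: str, value: str) -> str:
    """Hypothetical helper: wrap a literal in a <span> with RDFa about/property attributes."""
    return (f'<span about="{escape(subject)}" property="{escape(prop)}">'
            f'{escape(value, quote=False)}</span>')


def with_prefix(prefix: str, ns_uri: str, inner: str) -> str:
    """Hypothetical helper: declare a vocabulary prefix (RDFa 1.0 xmlns style) on a <div>."""
    return f'<div xmlns:{prefix}="{escape(ns_uri)}">{inner}</div>'


markup = with_prefix(
    "dc", "http://purl.org/dc/elements/1.1/",
    rdfa_span("http://example.org/articles/digg-rdfa", "dc:title", "RDFa & Digg"),
)
print(markup)
```

Escaping the attribute values and the text keeps a stray "&" or quote in the data from breaking the surrounding markup.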

For example, a paragraph about a person who is described by an online resource can include, in its opening <P> tag, an about attribute that points to the HTTP address of that resource, optionally alongside a typeof attribute declaring the subject a member of a defined class. Then whenever the person's name appears, it can be placed in a <SPAN> element with an attribute such as property="contact:name", where contact is the prefix for a vocabulary and name is a property that vocabulary defines. This way, an index or smart browser can detect when and where a paragraph is about a person whose name is indexed and catalogued by an outside resource.

The "triple" in this case is easy: The subject is the resource the paragraph is about (the address in the about attribute), the predicate is the act of naming (contact:name), and the object is the person's name itself. Imagine if you ran a certain online resource -- say, a wiki/encyclopedia thingie of some sort -- and what an advantageous position you might be in.
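Going the other direction, a toy extractor can recover that triple from markup. The sketch below uses Python's standard html.parser and handles only two RDFa attributes: about, which sets the subject for an element and its descendants, and property, whose element text becomes the literal object. Everything else RDFa defines is deliberately ignored. The person's address is a hypothetical example.org URI, and contact:name is left as an unexpanded prefixed name.

```python
from html.parser import HTMLParser


class MiniRDFaParser(HTMLParser):
    """Toy extractor: only 'about' (sets subject) and 'property' (literal from text).

    Ignores rel/rev, typeof, content, datatype, prefix expansion, and void
    elements (such as <img> without a closing slash) that never get an end tag.
    """

    def __init__(self):
        super().__init__()
        self.subjects = [None]  # subject in scope for each open element
        self.pending = []       # [property, text parts] or None for each open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An element's own 'about' wins; otherwise it inherits its parent's subject.
        self.subjects.append(a.get("about", self.subjects[-1]))
        prop = a.get("property")
        self.pending.append([prop, []] if prop else None)

    def handle_data(self, data):
        # Text counts toward every open element that carries a 'property'.
        for item in self.pending:
            if item is not None:
                item[1].append(data)

    def handle_endtag(self, tag):
        if not self.pending:
            return  # stray end tag with no matching start tag
        subject = self.subjects.pop()
        item = self.pending.pop()
        if item is not None:
            self.triples.append((subject, item[0], "".join(item[1]).strip()))


page = """
<p about="http://example.org/people/jdoe">
  An interview with <span property="contact:name">Jane Doe</span>, who runs a wiki.
</p>
"""
p = MiniRDFaParser()
p.feed(page)
print(p.triples)  # [('http://example.org/people/jdoe', 'contact:name', 'Jane Doe')]
```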

Digg's involvement in all of this came by way of a very brief announcement on its company blog yesterday, where principal member Steve Williams wrote, "We've added RDFa, making Digg part of the 'semantic web' where Web pages become more sophisticated, beyond simply words and pictures."

But Williams is actually an active proponent of Digg's involvement in new and emerging standards, as demonstrated by his announcement last January of its entry into the DataPortability project, the gathering place for standards efforts in the field of data exchange, of which RSS and RDF are two prominent members.

Other brief mentions on Digg's blogs over the past month have been the only indications the company has given the world of its direct -- and perhaps even principal -- involvement in RDF and RDFa, besides a simple check of the site's own source code, where attributes such as rel="dc:source" and property="dc:title" within <DIV> elements are now common. A few weeks ago, developer Bob DuCharme discovered these little attributes and began playing with them to gauge their viability.
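Prefixed names such as dc:title are shorthand (RDFa calls them CURIEs) for full URIs. A minimal expansion sketch follows, assuming Digg's pages bind dc to the usual Dublin Core elements namespace, http://purl.org/dc/elements/1.1/. The source does not say which namespace Digg actually declares, and real RDFa processors follow more detailed rules than this hypothetical expand_curie helper.

```python
def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Hypothetical helper: expand 'prefix:local' using a prefix -> namespace map.

    Returns the input unchanged when it has no known prefix (a simplification).
    """
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie


# Assumption: dc is bound to the Dublin Core elements 1.1 namespace.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}
for attr in ("dc:source", "dc:title"):
    print(attr, "->", expand_curie(attr, DC))
```

Leaving unknown prefixes untouched means a full address such as http://example.org/x passes through as-is, since "http" is not in the map.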

On his personal blog, DuCharme wrote, "The first few times I tried the RDFa Highlight bookmarklet, which puts red rectangles around all the parts of a Web page that have RDFa metadata assigned, I didn't think it was very useful; I thought, OK, red rectangles, what can I do with them? My experience with Digg changed my mind. A single button click gives a very quick and intuitive display of how much RDFa a page offers to work with."
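What the bookmarklet visualizes can be roughly approximated by counting elements that carry RDFa attributes. The sketch below is not the RDFa Highlight bookmarklet's code, just a heuristic built on Python's html.parser; it will, for instance, also count ordinary elements whose rel or content attribute has nothing to do with RDFa metadata. The RDFaCounter class name is hypothetical.

```python
from html.parser import HTMLParser

# RDFa 1.0 attributes that can carry metadata on XHTML elements. href and src
# are left out because they appear on nearly every link and image.
RDFA_ATTRS = {"about", "property", "rel", "rev", "typeof", "resource", "content", "datatype"}


class RDFaCounter(HTMLParser):
    """Hypothetical helper: counts start tags carrying at least one RDFa attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if any(name in RDFA_ATTRS for name, _ in attrs):
            self.count += 1
            self.tags.append(tag)


c = RDFaCounter()
c.feed('<div rel="dc:source" property="dc:title">A story</div>'
       '<div>plain</div><a href="/x">link</a>')
print(c.count, c.tags)  # 1 ['div']
```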

The possibility exists for a kind of mega-meta-source to emerge from Digg, where interesting news topics are associated with catalogued resources. But for that to actually work, someone has to manage those resources -- and that effort will take humanpower, plus resources of another kind (the kind symbolized with "$"), that RDF alone won't provide, even for the most ambitious sites.

Comments

This is a significant milestone in the Semantic Web movement. I see RDFa and Microformats are important technologies that will define the success of the Semantic Web. http://tinyurl.com/4fcuor

How can you have an article about rdf and no mention of Apple?

If this works out it should prove to be rather interesting. I wonder if they have given any thought to RIAs (Rich Internet Applications) such as Flex or Flash, especially since they are becoming more and more common...

All hail! Good job! Welcome! Kudos!
