Google releases its data encoding format to compete with XML

By Scott M. Fulton, III
Published 17 years ago

To reduce the size and processing time involved in encoding large databases, Google developed its own alternative to XML. Yesterday, the company began encouraging others to adopt it in place of the industry standard.

There's an argument that open standards are only truly useful when one standard applies to any given category of service -- an argument that was raised in the matter of application formats. Now the broader category of data encoding -- handled nowadays by XML -- is about to receive a big challenge, ironically from the group perceived as the champion of open standards in Internet communication: Google.

Yesterday afternoon, Google publicly released documentation for a system it has been using internally, called Protocol Buffers, inviting others to use it as well. And in a surprising blog post, one of its own software engineers argued that its system was preferable to XML because it's less expensive to deploy, and can more easily scale up to very large databases.

"As nice as XML is, it isn't going to be efficient enough for this scale. When all of your machines and network links are running at capacity, XML is an extremely expensive proposition," wrote Google software engineer Kenton Varda. "Not to mention, writing code to work with the DOM tree can sometimes become unwieldy."

Google's public documentation shows that Protocol Buffers (which has yet to be formally abbreviated) is indeed conceptually different from XML, in that it relies more on programming-language constructs than on declarative markup. In XML, a schema defines the structure of tables and record sets, separately from the documents that hold the actual contents of those records.

In Protocol Buffers, by contrast, the structure is declared in a file whose syntax looks much more like C++. These .proto files contain class-like "message" declarations, written in a notation many programmers already find familiar, that define the structure of each record; a compiler then turns them into classes for C++, Java, or Python. Each member of a message, analogous to a field in a database record, has a declared type, just like a variable.
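To make the idea concrete, here is a minimal Python sketch, assuming a Person message like the one used in Google's own documentation (a required string name = 1, a required int32 id = 2, and an optional string email = 3). It is not generated code: the Person class, its metadata layout, and the field_numbers helper are hypothetical stand-ins that show members carrying both a type and a number, with values assigned in code.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

# Hypothetical Python stand-in for this .proto declaration (proto2 syntax):
#   message Person {
#     required string name = 1;
#     required int32 id = 2;
#     optional string email = 3;
#   }
@dataclass
class Person:
    # Each member has a type, like a variable, plus a unique field number.
    name: str = field(metadata={"number": 1})
    id: int = field(metadata={"number": 2})
    email: Optional[str] = field(default=None, metadata={"number": 3})

    def __post_init__(self) -> None:
        # Simple checks against the declared types (int32 range for id).
        if not isinstance(self.name, str):
            raise TypeError("name must be a string")
        if not isinstance(self.id, int):
            raise TypeError("id must be an integer")
        if not -2**31 <= self.id < 2**31:
            raise ValueError("id must fit in a signed 32-bit integer")
        if self.email is not None and not isinstance(self.email, str):
            raise TypeError("email must be a string")

def field_numbers(cls) -> dict:
    """Map each declared member name to its field number."""
    return {f.name: f.metadata["number"] for f in fields(cls)}

# Data is set in code, not written into a marked-up file.
person = Person(name="John Doe", id=1234, email="jdoe@example.com")
print(field_numbers(Person))  # {'name': 1, 'id': 2, 'email': 3}
```

Real generated classes do far more (serialization, parsing, validation of required fields), but the shape is the same: a typed, numbered declaration that code fills in.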

But here, in an unusual departure from the norm, each member is followed by a number (= 1, = 2, and so on) that looks like a default value but is actually a unique field number identifying that member within an encoded record. Imagine if data were streamed onto recording tape, the way it used to be in the late 1960s and '70s. It's that streaming of the data sequence, without all the fenceposts, that differentiates Protocol Buffers from XML: rather than markup saying where each entry or record starts and stops, each value in the encoded stream is preceded only by a compact numeric key.
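The stream is not entirely free of markers, but they are tiny. According to the encoding rules in the Protocol Buffers documentation, each value is preceded by a key holding (field_number << 3) | wire_type, written as a variable-length "varint", and strings also carry a length prefix. The sketch below implements just two wire types (varint and length-delimited) and handles only non-negative integers; the function names are illustrative, not the library's API.

```python
# Wire types from the protobuf encoding rules (only two are sketched here).
VARINT = 0
LENGTH_DELIMITED = 2

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, lowest group first, high bit = more follows."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    """The key that replaces XML's start tag: field number and wire type in one varint."""
    return encode_varint((field_number << 3) | wire_type)

def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, VARINT) + encode_varint(value)

def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    # A length prefix stands in for XML's closing tag.
    return encode_key(field_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data

# Field 1 set to 150, then field 2 set to "testing".
message = encode_int_field(1, 150) + encode_string_field(2, "testing")
print(message.hex(" "))  # 08 96 01 12 07 74 65 73 74 69 6e 67
```

The sample values mirror examples from the encoding documentation. Note that the bytes alone carry only field numbers and wire types; a reader needs the .proto declaration to know that field 2 is, say, a string rather than raw bytes.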

Setting the data contents then takes place programmatically, through the generated classes, rather than in a marked-up data file.

Under the heading, "Why not just use XML?" an overview page in the Protocol Buffers documentation reads, "Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers: are simpler, are 3 to 10 times smaller, are 20 to 100 times faster, are less ambiguous, [and] generate data access classes that are easier to use programmatically."
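The size difference can be seen even on a single small record. This sketch encodes the same hypothetical Person record (name = 1, id = 2, email = 3) both as XML, using Python's standard xml.etree.ElementTree, and in a hand-rolled approximation of the protobuf wire format. It is one toy record, not a reproduction of Google's benchmark figures.

```python
import xml.etree.ElementTree as ET

def varint(n: int) -> bytes:
    """Base-128 varint for non-negative integers; high bit set on all but the last byte."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def string_field(number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # Key (wire type 2 = length-delimited), then length, then the raw bytes.
    return varint((number << 3) | 2) + varint(len(data)) + data

def person_binary(name: str, person_id: int, email: str) -> bytes:
    # Hypothetical schema: name = 1, id = 2 (varint, wire type 0), email = 3.
    return (string_field(1, name)
            + varint((2 << 3) | 0) + varint(person_id)
            + string_field(3, email))

def person_xml(name: str, person_id: int, email: str) -> bytes:
    root = ET.Element("person")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "id").text = str(person_id)
    ET.SubElement(root, "email").text = email
    return ET.tostring(root)  # compact: no indentation, no XML declaration

xml_bytes = person_xml("John Doe", 1234, "jdoe@example.com")
pb_bytes = person_binary("John Doe", 1234, "jdoe@example.com")
print(f"XML: {len(xml_bytes)} bytes, binary: {len(pb_bytes)} bytes")
# XML: 82 bytes, binary: 31 bytes
```

Here most of the XML overhead is the repeated element names; the binary form replaces each pair of start and end tags with a one-byte key plus, for strings, a one-byte length.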

Some might argue that, in the effort to solve the bulk problem, Google didn't really invent anything new at all -- it simply reverted to the older concept of the interface definition language (IDL), a defining feature of the era of COM and CORBA. Google anticipated that argument, and yesterday Varda offered a pre-emptive counter-argument to the question, "Isn't it just another IDL?"

"Yes, you could call it that. But, IDLs in general have earned a reputation for being hopelessly complicated," Varda wrote. "On the other hand, one of Protocol Buffers' major design goals is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated. And, yes, it is very fast -- at least an order of magnitude faster than XML."
