Protocol Buffers: Reinventing the past
By Andrew Binstock
August 15, 2008
For decades prior to the widespread adoption of XML, the computer industry sought an easy way to transfer data across systems. Many, many options came into vogue, then disappeared or eventually found acceptance in small utilitarian niches.
The CORBA CDR format, Sun’s XDR (eXternal Data Representation) and the various EDI (Electronic Data Interchange) protocols are examples of the latter category. A few other standards, such as ASN.1 (Abstract Syntax Notation One), live on out of sight even though they are widely used; because they are unseen, they don’t come to mind when an internal or external data exchange protocol is needed.
Even after the wide embrace given to XML, standards bodies such as the IETF have continued formulating new standards. One example is SDXF (Structured Data eXchange Format), a non-text-based format that has much the same structure as XML, although the data types are self-describing.
So many standards have been formulated in large part because of the balance that must be struck among three contending forces in the design of a data-interchange protocol. 1) Speed: How fast can the data be marshalled and then unmarshalled? 2) Ease of use: How simple is it to describe the data in the appropriate format from a variety of programming languages? 3) Expressivity and other factors: How well does the format express the data needs of a particular domain? There are also pragmatic concerns, such as security (can the data be encrypted when sent over the wire?) and compressibility.
Curiously, protocols that fail at one or more of these aspects can still become popular. XML is one example. XML is more of a data structure representation than an interchange format, as it requires definitions that both ends agree on (generally specified in the form of a schema or DTD) so that fields can be validated and the type of each data item identified.
Beyond the need for supplemental support, XML has some fairly horrid characteristics, the worst of which is its lack of speed. XML is an extremely wordy format that is very slow to parse. Sites that run enterprise Java and profile their applications frequently find that the single biggest time sink is the encoding and decoding of XML documents. In nearly all cases, it is by a wide margin the most time-consuming activity.
XML’s ease of use is not great, although it’s better than it once was because many good parsers are now available. Even so, the requirement to escape certain characters, and the consequent need for CDATA sections, mean that encoding can at times be complex.
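As a rough illustration of the escaping issue, here is a minimal Python sketch that uses only the standard library. The sample string and the `snippet` element name are invented for this example. It shows the two usual ways of carrying reserved characters in XML: escaping them one by one, or wrapping the text in a CDATA section.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical payload containing characters that XML reserves.
raw = 'if (a < b && name == "x") { run(); }'

# Option 1: escape the reserved characters (&, <, >) individually.
escaped = escape(raw)
doc_escaped = f"<snippet>{escaped}</snippet>"

# Option 2: wrap the text in a CDATA section. This only works if the
# text does not itself contain the terminator "]]>".
doc_cdata = f"<snippet><![CDATA[{raw}]]></snippet>"

# Both documents parse back to the original string.
text_from_escaped = ET.fromstring(doc_escaped).text
text_from_cdata = ET.fromstring(doc_cdata).text

print(escaped)
print(text_from_escaped == raw, text_from_cdata == raw)
```

Either approach works, but the sender has to pick one and apply it consistently, which is the kind of extra step a binary format avoids.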
Finally, of course, XML’s expressivity is abysmal. Data must be structured, items can at times be categorized in only one specific hierarchy, and each hierarchy level must encompass all lower levels. In addition, repeating patterns of fields, such as a list of employee records, must repeat all the field names and the hierarchy for every instance.
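The cost of repeating field names can be seen in a small Python sketch. The employee records, the element names and the fixed binary layout below are all assumptions made up for this comparison; the binary layout is not the Protocol Buffers format, just a simple stand-in for any encoding that leaves field names out of the data.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical records: (id, age, name).
employees = [(1001, 34, "Ada"), (1002, 41, "Grace"), (1003, 29, "Linus")]

def to_xml(records):
    """Serialize records as XML; every record repeats every tag name."""
    root = ET.Element("employees")
    for emp_id, age, name in records:
        emp = ET.SubElement(root, "employee")
        ET.SubElement(emp, "id").text = str(emp_id)
        ET.SubElement(emp, "age").text = str(age)
        ET.SubElement(emp, "name").text = name
    return ET.tostring(root)

def to_binary(records):
    """Serialize records with an assumed fixed layout and no field names:
    uint32 id, uint8 age, uint8 name length, then the UTF-8 name bytes."""
    out = bytearray()
    for emp_id, age, name in records:
        encoded_name = name.encode("utf-8")
        out += struct.pack("<IBB", emp_id, age, len(encoded_name))
        out += encoded_name
    return bytes(out)

xml_bytes = to_xml(employees)
binary_bytes = to_binary(employees)

print(len(xml_bytes), "bytes as XML")
print(len(binary_bytes), "bytes in the assumed binary layout")
print(xml_bytes.count(b"<name>"), "copies of the <name> tag")
```

The gap widens with every additional record, because the XML version pays for the tag names again each time.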
I could go on about XML’s limitations, but the key objection is surely performance. Google, a company intensely focused on performance, examined XML and decided that it was far too slow for its high-speed requirements. In true Google tradition, rather than use an existing alternative protocol, it came up with its own: Protocol Buffers, a new format for serializing data that the company open-sourced in July.
The structure of Protocol Buffers data items is described using a simple data description language, reminiscent of CORBA’s Interface Definition Language (IDL). This description is saved in a .proto file, which is then compiled into C++, Java or Python code.
The Protocol Buffers meta-language is surprisingly extensive. Both sequential and hierarchical data structures can be defined. Data items, covering all the standard string and integral types, can be marked required or optional. Fields can also be enums, a common mechanism for tightly associating a specific value with a name.
Protocol Buffers is a binary format. This is crucial to its performance, as it means that long text strings and field names do not have to be parsed for each item. In addition, the marshalling and unmarshalling functions are built directly into the code by the .proto compiler, so they are tailored to the specific message and well placed to be optimized for speed.
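To make the point about field names concrete, here is a minimal hand-written Python sketch of how the Protocol Buffers wire format encodes a single integer field. It is not the code the .proto compiler generates, and the function names are invented for this illustration. Each field is identified by a small number rather than a name: the key is the field number shifted left three bits, combined with a wire type (0 for variable-length integers), and both key and value are written as base-128 varints.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint:
    7 bits per byte, low-order group first, high bit set on every
    byte except the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field as key + value, both varints."""
    wire_type_varint = 0
    key = (field_number << 3) | wire_type_varint
    return encode_varint(key) + encode_varint(value)

# Field number 1 holding the value 150 takes three bytes on the wire.
encoded = encode_int_field(1, 150)
print(encoded.hex())  # 089601
```

The receiver never sees a field name; it reads the number 1 from the key and relies on the compiled .proto definition to know which field that is.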
Unlike earlier technologies released by Google, which still had an experimental feel to them, Protocol Buffers has been used inside Google for years and in almost all aspects of the services the company provides. So, this is a technology that has been extensively tested… and optimized. If you need to transfer data in C++, Java or Python and want a fast, simple solution that avoids the penalties imposed by XML, try Protocol Buffers. I think you’ll like what you find.
Andrew Binstock is the principal analyst at Pacific Data Works. Read his blog at binstock.blogspot.com.