CTP for SQL Server 2008 now available

By Scott M. Fulton, III | Published February 20, 2008, 3:41 PM

Its final release has been pushed back to as much as six months after Microsoft's big launch party for it, which is still slated for next week. In the meantime, potential customers are being given a taste of some new and game-changing functionality.

Yesterday afternoon, Microsoft posted its Community Technology Preview for SQL Server 2008, which will be general businesses' first, best look at the next edition of what analysts perceive to be Microsoft's fastest growing product line -- faster than even Windows itself.

But one of the key benefits of the new RDBMS' architecture may be something businesses will want to try in a safe environment such as a virtual server, unless they're absolutely certain they're ready to divorce themselves from SQL Server 2005. It's Microsoft's technique for data compression, and although it was introduced in SS 2005 SP2, with SS 2008 it has extended its reach to literally all data types within a table.

Typically, when you think of "data compression," your mind conjures ZIP files or Lempel-Ziv algorithms. That's not what this is about: With an active database, which is almost as fluid a substance as can be stored in memory or on disk, compression is a very tricky thing. On a static basis, compression has been tried at the page level (with groups of records in a table), but the excess overhead for the compression dictionaries and the time spent in maintaining those -- and repairing them -- has often proven not to be worth the bother.

It was IBM that pressed the issue on data compression, having tinkered with a new scheme on DB2 since about 1998 before finally, formally rolling it out in 2002 for DB2 Version 8. It's a row compression concept that doesn't require much excess coding: Essentially, when you create a new SQL table or set up a procedure for altering an existing one, you declare the new table scheme to be compressed using the declaration COMPRESS YES.

What IBM's scheme then does is take certain columns of information within a table -- columns whose data tends to be variable in length anyway -- and replaces it with links to a hidden index table called a compression dictionary. That table is then treated differently, using data compression techniques that would otherwise wreak havoc if applied to the entire data table.

Prior editions of DB2 tried similar approaches to data compression, with some admins reporting the results were quite the opposite. Data dictionary size grew out of proportion, to the extent that it might have been better if "compression" had never been applied.

The popularity of the existence of data compression, in any form, in DB2 got Microsoft to thinking about how to go about the task for itself, in a way that was compatible with its own, very different schematic. Granted, SS 2005 already offered file compression (note the distinction), though it only worked properly for read-only databases. But for SQL Server 2005 SP2, Microsoft began rolling out an alternate take on the concept, in the form of a new variable type for data in records, called vardecimal.

"This storage format can be enabled at a table-level granularity," wrote Microsoft engineers Sunil Agarwal and Hermann Daeubler in 2007. "When enabled, SQL Server stores decimal and numeric data in the variable portion of the row instead [of] the fixed portion. You can use vardecimal storage format to reduce the size of your database if you have tables with decimal and numeric data types. How much space you save depends on the number of decimal or numeric columns, the data distributions, and the size of the table(s)."

Imagine in your mind a column full of people's addresses. In the existing varchar format, the spaces that would fill the remainder of each entry following the end of an address, would be truncated instead of stored as zeroes, which would consume excess space. With vardecimal, the concept is applied to value columns. Although four bytes may be required to represent the highest values of any entry in a column, it doesn't necessarily have to consume four bytes for every entry, especially for some whose contents are short integers like 0 or 15.

While vardecimal was introduced as a "try-it-you'll-like-it" concept in SS 2005 SP2, for SS 2008, it becomes the standard storage format for all values. That fundamentally changes the constitution of the database.

But it does not change the semantics of the database program, which will still look and behave as though fixed storage is being applied. As Agarwal wrote last November, "One important point to notes is that even though the SQL Server stores these data types in variable length format, the semantics of the data type remains unchanged (i.e., it is still the fixed length data type from the perspective of the application). This means that you can avail the benefits for data compression without requiring any changes in your application(s)."

Some other additions testers can expect to find in SS 2008 include integrated full-text search, which utilize high-speed text-searching algorithms (the kind you'd find in Internet search engines) in SQL Server for the first time; and policy-based management, which extends to SS the principle of administration by rule that may become the hallmark of Windows Server 2008.

Comments

View comments by with a score of at least

This sounds kinda scary. This has got to make it harder to piece stuff together after disk corruption.

Score: 0

|

"Typically, when you think of "data compression," your mind conjures ZIP files or Lempel-Ziv algorithms."

Christ. Only if you're that oldskool. I know it's used in Adobe Acrobat and GIFs, but it doesn't exactly roll off the tounge.

Score: 0

|

And don't forget RLL - that was my personal favorite. :)

Score: 0

|

Comcast deal for NBC Universal is about content, not broadband

Although Comcast is certainly America's largest broadband provider, at least for PCs, in most regards, today's deal with GE may not impact the Internet at all.

Mark Russinovich on MinWin, the new core of Windows

The next version of Windows three years hence will likely build onto a significant architectural change implemented in Windows 7 and Server 2008 R2.

Fee or free? Murdoch, Huffington square off over the cost of Internet news

Participants in an FTC workshop yesterday witnessed the two extremes of the Web news publishing debate, still centered on the issue of long-term profitability.

Security firm: Windows patches not responsible for 'Black Screen of Death'

On second thought, maybe that access control list thingie with the lockdown something-or-rather didn't trigger an alleged, perhaps non-existent, pandemic.

Online advertising evolves away from display, toward interactive software

Marketing departments and agencies are increasingly establishing positions for "creative technologists" who can steer designers and developers toward platforms that enable direct connections with consumers.

Google begrudgingly adjusts news crawling for paid publishers

If publishers want to make readers pay for news content, and thereby drive down its popularity and Google ranking, the company says, they can just go right on ahead.

Apple settles with Psystar except for 'circumvention devices'

The fracas with the Florida clone computer maker might have ended today had Apple not have muddled the issue over a cheap piece of Psystar software.

Microsoft denies latest 'Black Screen of Death' claims

After an anti-malware producer announced a fix to what it says is a swarm of recent KSoD problems, evidence of the swarm itself has yet to turn up.

Latest Firefox 3.6 beta fixes 133 bugs, promises faster page load times

A once-sluggish beta testing process has kicked into overdrive, with astonishing success at finding serious bugs. Will Mozilla be able to fix all the others in time?

Confirmed: Office 2010 to ship in June

Two weeks after Microsoft had been expected to draw a clearer roadmap for its principal applications suite, it's finally ready to commit to the end of H1.

New EU antitrust commissioner will oversee Microsoft, Oracle+Sun, Intel issues

As one of Europe's most prominent politicians shifts positions in January, her replacement remains a question mark over technology's biggest issues.