CTP for SQL Server 2008 now available

By Scott M. Fulton, III | Published February 20, 2008, 3:41 PM

Its final release has been pushed back to as much as six months after Microsoft's big launch party for it, which is still slated for next week. In the meantime, potential customers are being given a taste of some new and game-changing functionality.

Yesterday afternoon, Microsoft posted its Community Technology Preview for SQL Server 2008, which will be general businesses' first, best look at the next edition of what analysts perceive to be Microsoft's fastest growing product line -- faster than even Windows itself.

But one of the key benefits of the new RDBMS' architecture may be something businesses will want to try in a safe environment such as a virtual server, unless they're absolutely certain they're ready to divorce themselves from SQL Server 2005. It's Microsoft's technique for data compression, and although it was introduced in SS 2005 SP2, with SS 2008 it has extended its reach to literally all data types within a table.

Typically, when you think of "data compression," your mind conjures ZIP files or Lempel-Ziv algorithms. That's not what this is about: With an active database, which is almost as fluid a substance as can be stored in memory or on disk, compression is a very tricky thing. On a static basis, compression has been tried at the page level (with groups of records in a table), but the excess overhead for the compression dictionaries and the time spent in maintaining those -- and repairing them -- has often proven not to be worth the bother.

It was IBM that pressed the issue on data compression, having tinkered with a new scheme on DB2 since about 1998 before finally, formally rolling it out in 2002 for DB2 Version 8. It's a row compression concept that doesn't require much excess coding: Essentially, when you create a new SQL table or set up a procedure for altering an existing one, you declare the new table schema to be compressed using the declaration COMPRESS YES.

What IBM's scheme then does is take certain columns of information within a table -- columns whose data tends to be variable in length anyway -- and replace their contents with links to a hidden index table called a compression dictionary. That table is then treated differently, using data compression techniques that would otherwise wreak havoc if applied to the entire data table.
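The dictionary idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not IBM's actual algorithm or on-disk layout; the function names `compress_column` and `decompress_column` and the sample city values are made up for the example.

```python
from typing import Dict, List, Tuple


def compress_column(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Replace each column value with a small integer code.

    Hypothetical sketch: the returned dictionary plays the role of the
    hidden "compression dictionary", and the code list is what the
    table would store in place of the original values.
    """
    dictionary: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        # The first time a value appears, give it the next free code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def decompress_column(dictionary: Dict[str, int], codes: List[int]) -> List[str]:
    """Rebuild the original column by looking each code up in the dictionary."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


if __name__ == "__main__":
    cities = ["Indianapolis", "Chicago", "Indianapolis", "Indianapolis", "Chicago"]
    dictionary, codes = compress_column(cities)
    print(dictionary)
    print(codes)
    print(decompress_column(dictionary, codes) == cities)
```

The sketch also shows why the approach can backfire: if nearly every value in the column is unique, the dictionary ends up holding almost as many entries as the column has rows, and nothing is saved.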

Prior editions of DB2 tried similar approaches to data compression, with some admins reporting that the results were the opposite of what was intended. Data dictionary size grew out of proportion, to the extent that it might have been better if "compression" had never been applied.

The popularity of data compression in DB2, in any form, got Microsoft thinking about how to go about the task for itself, in a way that was compatible with its own, very different design. Granted, SS 2005 already offered file compression (note the distinction), though it only worked properly for read-only databases. But for SQL Server 2005 SP2, Microsoft began rolling out an alternate take on the concept, in the form of a new storage format for data in records, called vardecimal.

"This storage format can be enabled at a table-level granularity," wrote Microsoft engineers Sunil Agarwal and Hermann Daeubler in 2007. "When enabled, SQL Server stores decimal and numeric data in the variable portion of the row instead [of] the fixed portion. You can use vardecimal storage format to reduce the size of your database if you have tables with decimal and numeric data types. How much space you save depends on the number of decimal or numeric columns, the data distributions, and the size of the table(s)."

Imagine a column full of people's addresses. In the existing varchar format, the spaces that would otherwise fill the remainder of each entry after the end of an address are truncated rather than stored as padding, which would consume excess space. With vardecimal, the same concept is applied to numeric columns. Although four bytes may be required to represent the highest values in a column, not every entry has to consume four bytes, especially those whose contents are small integers like 0 or 15.
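The space argument can be made concrete with a short Python sketch. It assumes a hypothetical layout in which each non-negative integer is stored as a one-byte length followed by only as many bytes as the value needs; this is not SQL Server's actual vardecimal format, and the names `encode_variable` and `decode_variable` are invented for the illustration.

```python
from typing import Tuple


def encode_variable(value: int) -> bytes:
    """Store a non-negative integer as a 1-byte length plus minimal payload.

    Hypothetical layout for illustration only, not the real vardecimal format.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Zero still needs one payload byte; larger values need more.
    size = max(1, (value.bit_length() + 7) // 8)
    return bytes([size]) + value.to_bytes(size, "big")


def decode_variable(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one value starting at offset; return (value, next_offset)."""
    size = buf[offset]
    start = offset + 1
    value = int.from_bytes(buf[start:start + size], "big")
    return value, start + size


if __name__ == "__main__":
    column = [0, 15, 7, 4000000000, 3, 0, 12, 1]
    fixed_bytes = len(column) * 4  # every entry padded to four bytes
    variable_bytes = sum(len(encode_variable(v)) for v in column)
    print(fixed_bytes, variable_bytes)
```

In this layout the length byte is pure overhead, so a column whose entries all need the full four bytes would grow rather than shrink. That is consistent with the engineers' remark that the savings depend on the data distribution.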

While vardecimal was introduced as a "try-it-you'll-like-it" concept in SS 2005 SP2, for SS 2008, it becomes the standard storage format for all values. That fundamentally changes the constitution of the database.

But it does not change the semantics of the database program, which will still look and behave as though fixed storage is being applied. As Agarwal wrote last November, "One important point to notes is that even though the SQL Server stores these data types in variable length format, the semantics of the data type remains unchanged (i.e., it is still the fixed length data type from the perspective of the application). This means that you can avail the benefits for data compression without requiring any changes in your application(s)."

Some other additions testers can expect to find in SS 2008 include integrated full-text search, which utilizes high-speed text-searching algorithms (the kind you'd find in Internet search engines) in SQL Server for the first time; and policy-based management, which extends to SS the principle of administration by rule that may become the hallmark of Windows Server 2008.

Comments

This sounds kinda scary. This has got to make it harder to piece stuff together after disk corruption.

"Typically, when you think of "data compression," your mind conjures ZIP files or Lempel-Ziv algorithms."

Christ. Only if you're that oldskool. I know it's used in Adobe Acrobat and GIFs, but it doesn't exactly roll off the tongue.

And don't forget RLL - that was my personal favorite. :)
