Zoetrope promises a view of the Web over time

By Angela Gunn | Published December 9, 2008, 5:59 AM

While it's just as easy to locate a 15-year-old page in HTML 1.0 as a blog post published this afternoon, the front page of last Friday's BetaNews is as gone as 49-cent gas. Enter Zoetrope, aiming to track Web information over time.

Time keeps on slippin' online -- but wouldn't it be cool to actually watch it happen in something faster than real-time?

For now, Zoetrope is mainly a research project -- a joint effort from researchers at the University of Washington and Adobe Systems. The paper "Zoetrope: Interacting with the Ephemeral Web" (PDF available here) describes a fairly complex system of tools working in concert:

  • a visual programming toolkit and set of interactions for building queries
  • a new semantics and set of operators for manipulating streams of data gathered over time
  • indexing structures for handling the dataset
  • new ways of re-rendering pages in the dataset, and
  • the dataset itself, collected as often as needed (hourly? daily? by the minute, for certain purposes?) by a newly designed Web crawler.

Hasn't archive.org already been invented, you ask? Yes, and if you're wondering how a certain site has looked over time, it's a marvelous library...as long as by "over time" you generally mean "every few months." Zoetrope, on the other hand, envisions you tracking changes by the day or even by the hour, and linking them to changing data on other sites.

Eytan Adar, a researcher at the University of Washington, has published a number of papers on "temporal informatics" -- how the Web changes over time, and how we interact with that changing data. Zoetrope is part of his dissertation, currently underway, and he is the project lead.

Watching the successive changes to, say, BetaNews could be amusing, but the juice of the Zoetrope is in its lenses -- bits of code that excerpt part of a page and allow you to follow strictly that bit over time.

The flashiest version of the tech is the visual lens, which would let the user simply drag the mouse to tell the browser what to keep an eye on. But there are structural lenses and textual lenses, both of which compensate to some degree for the annoying tendency of site designers to scramble things around because they can.

Lenses, in other words, focus on one chunk of information on one page over time. That could be a lot of data, so the plan provides for filters on lenses -- "Watch the top-headline spot and grab anything that mentions the iPhone," for instance, or, "If nothing's changed on this page, don't retain a copy of the current peek."
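For the code-minded, the lens-plus-filter idea can be roughed out in a few lines of Python. This is only a sketch of the concept as described here, not Zoetrope's actual code: the `Snapshot` type, the `top_headline` region name, the function names, and the sample headlines are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, str]  # (capture time, text seen through the lens)


@dataclass
class Snapshot:
    """One crawl of a page (hypothetical structure, not Zoetrope's)."""
    timestamp: str            # ISO-8601 capture time
    regions: dict[str, str]   # region name -> text found there


def lens(snapshots: Iterable[Snapshot], region: str) -> Iterator[Entry]:
    """Follow a single region of the page across every capture."""
    for snap in snapshots:
        if region in snap.regions:
            yield snap.timestamp, snap.regions[region]


def changed_only(stream: Iterable[Entry]) -> Iterator[Entry]:
    """Filter: drop captures where the region is identical to the previous one."""
    previous = None
    for ts, text in stream:
        if text != previous:
            yield ts, text
        previous = text


def mentions(keyword: str) -> Callable[[Iterable[Entry]], Iterator[Entry]]:
    """Filter factory: keep only captures whose text mentions the keyword."""
    def keep(stream: Iterable[Entry]) -> Iterator[Entry]:
        for ts, text in stream:
            if keyword.lower() in text.lower():
                yield ts, text
    return keep


# Made-up sample data: four hourly captures of one page.
captures = [
    Snapshot("2008-12-08T09:00", {"top_headline": "Apple cuts iPhone price"}),
    Snapshot("2008-12-08T10:00", {"top_headline": "Apple cuts iPhone price"}),
    Snapshot("2008-12-08T11:00", {"top_headline": "Markets slide again"}),
    Snapshot("2008-12-08T12:00", {"top_headline": "iPhone sales beat forecasts"}),
]

# Watch the top-headline spot, skip unchanged captures, keep iPhone mentions.
result = list(mentions("iPhone")(changed_only(lens(captures, "top_headline"))))
for ts, text in result:
    print(ts, text)
```

Each filter takes a stream and hands back a smaller one, so they can be chained in any order without the lens needing to know about them.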

Lenses and filters are "bound" or "stacked" to form queries -- say, "tell me how the stock market reacts to news of variations in component supplies from various manufacturers, how long it takes for the market to show those effects, and which news sources have the biggest impact on prices." Results could be shown in various formats (timelines, cluster visualizations, movies), and the team has created a tool that ships data out to a Google Spreadsheet for even more fun.
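One plausible reading of "binding" is lining up two lens streams by time, so that a headline from one site can be compared with a number from another. The Python sketch below does that under stated assumptions: the `bind` function, the timestamps, the headlines, and the prices are all hypothetical, and the researchers' own operators may behave differently.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List, Tuple


def bind(
    left: Iterable[Tuple[str, str]],
    right: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, str, float]]:
    """Pair each left entry with the latest right value captured at or before it.

    Timestamps are ISO-8601 strings, which sort correctly as plain text.
    Left entries with no earlier right capture are skipped.
    """
    ordered: List[Tuple[str, float]] = sorted(right)
    times = [ts for ts, _ in ordered]
    for ts, value in left:
        i = bisect_right(times, ts)  # count of right captures at or before ts
        if i:
            yield ts, value, ordered[i - 1][1]


# Made-up output of two lenses: a news headline spot and a stock quote.
headlines = [
    ("2008-12-08T09:30", "Chip supplier warns of shortage"),
    ("2008-12-08T11:15", "Supplier restores output"),
]
prices = [
    ("2008-12-08T09:00", 94.0),
    ("2008-12-08T10:00", 91.5),
    ("2008-12-08T11:00", 92.0),
    ("2008-12-08T12:00", 93.2),
]

pairs = list(bind(headlines, prices))
for ts, headline, price in pairs:
    print(ts, headline, price)
```

A real query about market reaction would also need the prices after each headline, not just the one before it; the same lookup run with a time offset would cover that.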

There are already sites that let you track data over time, of course. Zoetrope's strength, as Adar explains, comes when people ask new kinds of questions that require synthesis of data from multiple sources. That may sound like deep water for people who don't spend their days correlating data, but don't count out the civilians. Adar says that though some advanced users would be comfortable designing their own queries, "What we've been thinking about recently is how to go beyond this core set of users to a broader audience. An interesting direction is to let people export the visualizations they create and allow those to be embedded in Web pages or e-mailed around."

It all sounds splendid, especially if you've ever searched cached Google pages or archive.org for something you know you saw once upon a time. As the project moves from research-project status to reality, though, it's going to take resources beyond the university. Adar expressed an enthusiastic wish to find some way to work with the archive.org crew, and foresees several possibilities for managing the tech resources necessary to make Zoetrope spin.

As Zoetrope's currently conceived, it doesn't require any action from the owners of searched sites -- only from the would-be Zoetrope user. Not that a little help wouldn't be welcomed, of course.

"It would be wonderful if site providers found enough value in Zoetrope to be willing to consider modifying their sites," says Adar. "I actually think that very small markup changes to the sites would help Zoetrope a great deal. For example, if the sites added 'hints' to the HTML that better marked what parts of the document were going to be consistent that would really help the way Zoetrope worked. But we're pretty pragmatic about it...it would be great to have, but we have to try to build solutions for Web sites that don't provide this information."

Even you, honored BetaNews reader, could be part of the solution. "We also think there is an opportunity to have Zoetrope work in a distributed fashion (P2P), since it would be hard to have one central service collecting every version of every page," says Adar. "Being able to have multiple 'observers' working together would be great, and having some standard to share this data would make a lot of things easier. An API for letting people build new applications using the Zoetrope data might also be something for us to look at in the future."
