Finding the Hidden Meanings in Google Code Search

By Scott M. Fulton, III | Published October 5, 2006, 4:36 PM

Generally, with a search engine, you get the best results when your query contains just enough related terms or criteria for the engine to determine a common context. Each new term strengthens the context of the information you're trying to locate - for example, the Google query strawberry fields -Beatles location, where the minus sign excludes pages that mention the Beatles.

Historically, finding a good example of source code on the Web has meant crafting a query that hits on the common language text adjacent to the code you're trying to find. Google's new Code Search facility explores a different premise: can the same kind of query tool used for common language searches locate a passage of source code within Google's vast library of open-source pages?

The jury may still be out on this one. Much of what one says in a common language can be deciphered by a search engine like Google without a sophisticated language interpreter; usually, the mere fact that certain terms appear close to one another is enough to establish a common context for them.
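To make the proximity idea concrete, here is a minimal Python sketch of one simple signal a search engine could use: the length of the shortest run of words in a document that contains every query term. This is an illustrative assumption, not a description of how Google actually ranks pages, and the function name `min_window_span` is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def min_window_span(tokens: list[str], terms: set[str]) -> int | None:
    """Length of the shortest run of tokens containing every term, or None if a term is absent.

    Hypothetical proximity signal: a smaller span suggests the terms share a context.
    """
    if not terms:
        return 0
    counts: Counter[str] = Counter()
    covered = 0  # how many distinct terms the current window contains
    best: int | None = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == len(terms):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = tokens[left]
            if dropped in terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


near = "strawberry fields forever".split()
far = "fields of gold and a strawberry patch".split()
print(min_window_span(near, {"strawberry", "fields"}))  # 2
print(min_window_span(far, {"strawberry", "fields"}))   # 6
```

Under this assumed scoring, the first document would be judged a closer contextual match for the query than the second.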

But source code isn't lexical by design; it's algebraic. Its reliance on purely variable symbolism shows in the fact that programmers still use throwaway symbols like foo and bar as utility variables, usually in the narrowest scopes.
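One way to see this is to strip the names out. The Python sketch below (the helper names `_Canonicalize` and `structural_fingerprint` are hypothetical) renames every function name, parameter, and variable to a positional placeholder and compares the resulting syntax trees. Two functions that differ only in their choice of symbols produce the same fingerprint, which is why a search keyed on names can miss them. It is deliberately simplistic: it also renames built-ins such as `len`, and it leaves attribute names alone.

```python
import ast


class _Canonicalize(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self) -> None:
        self.names = {}  # original name -> placeholder

    def _canon(self, name: str) -> str:
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then parameters, body, decorators
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._canon(node.id)
        return node


def structural_fingerprint(source: str) -> str:
    """Dump the syntax tree with all identifiers replaced by placeholders."""
    tree = _Canonicalize().visit(ast.parse(source))
    return ast.dump(tree, annotate_fields=False)


a = "def foo(bar):\n    return bar * bar\n"
b = "def square(x):\n    return x * x\n"
c = "def foo(bar):\n    return bar + bar\n"
print(structural_fingerprint(a) == structural_fingerprint(b))  # True
print(structural_fingerprint(a) == structural_fingerprint(c))  # False
```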

Suppose, for instance, I were developing a route finder application for a mapping service such as Google Maps. One algorithmic problem I'd be interested in seeing examples of is the so-called "Hamiltonian cycle" or "Hamiltonian circuit": a round-trip path through a given set of vertices, or map points, that passes through each point exactly once. Finding the shortest such round trip is the classic traveling salesman problem.
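For readers who want to see the problem itself, here is a brute-force sketch in Python. It checks every ordering of the points that starts and ends at point 0 and keeps the shortest round trip. A missing link between two points is represented here, by assumption, as `math.inf`, so any ordering that uses it is ruled out and a `None` result means no Hamiltonian cycle exists. The function name is hypothetical, and because it tries (n-1)! orderings it is only practical for a handful of points; real route finders rely on dynamic programming or heuristics instead.

```python
from __future__ import annotations

import itertools
import math


def shortest_hamiltonian_cycle(dist: list[list[float]]) -> tuple[float, list[int]] | None:
    """Brute-force shortest cycle visiting every vertex exactly once.

    dist[i][j] is the cost of travelling from i to j; math.inf marks a missing link.
    Returns (total length, tour) or None if no Hamiltonian cycle exists.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("a cycle needs at least 3 vertices")
    best_len = math.inf
    best_tour = None
    # Fix vertex 0 as the start so rotations of the same cycle aren't re-checked.
    for perm in itertools.permutations(range(1, n)):
        tour = (0, *perm, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, list(tour)
    if best_tour is None:
        return None
    return best_len, best_tour


# Four map points at the corners of a unit square.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
print(shortest_hamiltonian_cycle(dist))  # (4.0, [0, 1, 2, 3, 0])
```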

I can actually find such an algorithm quite quickly using the ordinary Google query line. But what if I want to see a plethora of examples? Nothing about an algorithm dictates the names of the symbols it must use (in most code, symbols need only be consistent within their own modules), so I have to hope that the programmers whose works are archived here were generous enough to supply comment lines that use the term Hamiltonian, or thought to use the term as, or within, the name of a variable.
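That dependence on naming can be illustrated with a small Python sketch that looks for a term in just the two places mentioned above: comments and identifiers. It uses the standard `tokenize` module; the function name `find_term` and the sample snippet are hypothetical. Note that the sample's variable `ham_cycle_found` is missed because its author abbreviated the term, while the comment line is found only because someone chose to write it.

```python
import io
import tokenize


def find_term(source: str, term: str) -> list:
    """Return (line, kind, text) for each comment or identifier containing term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and needle in tok.string.lower():
            hits.append((tok.start[0], "comment", tok.string))
        elif tok.type == tokenize.NAME and needle in tok.string.lower():
            hits.append((tok.start[0], "identifier", tok.string))
    return hits


sample = '''\
# Search for a Hamiltonian circuit by backtracking
def extend(path, graph):
    ham_cycle_found = False
    return ham_cycle_found
'''
print(find_term(sample, "hamiltonian"))
# [(1, 'comment', '# Search for a Hamiltonian circuit by backtracking')]
```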

The first seemingly meaningful snippet I turned up using Google Code Search was written in Python, from a library that defines a class named Hamiltonian. At first, I thought I was in luck. But judging from the name of the program to which the library belonged, "Lanthanide," I realized this code actually pertained to measuring the two-photon absorption rate of lanthanide compounds, which are used in the doping of semiconductors. An interesting concept, but clearly the wrong context.

Further down the list of results, I found a snippet of C++ code that defines a class called AbsHamiltonian. Since nothing in it seemed to pertain to mapping, I tracked down its terminology once again. This time, I discovered the snippet pertained to the simulation of a Hamiltonian matrix, an array of values used in multi-body mechanics and molecular dynamics, including quantum dynamics. Another very interesting field, but again, a digression.

The third time proved the charm: after detours through two other fields of endeavor that were likewise inspired by the work of Sir William Rowan Hamilton, I finally located a snippet that pertained to the branch of science I was interested in.

In this particular instance, I was interested in code whose purpose I could describe in common language. But what if I were interested in code not for its context, but for its construction? Would I be able to recall the specific way a certain class was instantiated, or just how many lines were inside the if block I was looking for?
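Searching by construction means querying the shape of the code rather than its words. As a rough, hypothetical illustration (not a description of Code Search's own query syntax), this Python sketch uses the standard `ast` module to answer the two questions above for a snippet: where a given class is instantiated, and how many lines each `if` block spans. The helper names `if_blocks` and `instantiations`, and the `Route` class in the sample, are made up for the example.

```python
import ast


def if_blocks(source: str) -> list:
    """(line of each if statement, number of source lines in its body), sorted by line."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            first = node.body[0].lineno
            last = node.body[-1].end_lineno
            results.append((node.lineno, last - first + 1))
    return sorted(results)


def instantiations(source: str, class_name: str) -> list:
    """Line numbers of calls written as class_name(...); dotted calls are not matched."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == class_name
    )


snippet = """\
route = Route(start, end)
if route.length > limit:
    route = Route(end, start)
    log(route)
"""
print(if_blocks(snippet))                # [(2, 2)]
print(instantiations(snippet, "Route"))  # [1, 3]
```

The snippet is only parsed, never executed, so the undefined names in it are harmless.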

Although Google Code Search is one way to draw traffic to the wide, wonderful sea of public source code, it may also be a lazy way for Google to provide access to that code without having to catalog and categorize it all, in the hope that its otherwise powerful query tool will compensate for the lack of "handles" that would make code snippets truly useful.

Of course, more mischievous minds than mine were the first to try out this new feature and announce their findings: someone has already discovered that software license key generators also fall under the category of "public source code," and located techniques for generating access keys for commercial software.

Google's is not the first search engine for source code. Koders.com has offered a similar search line for quite some time, and it also offers a tie-in with Microsoft Visual Studio, so programmers can find, download, and link their code to existing implementations of problems they may already be trying to solve, such as finding the simplest route that links all given points on a map.

Google may need to adopt similar tools if it wants to be competitive in this field; otherwise, this first rendition of Code Search feels like going deep-sea fishing with a crossbow. It's a nice place, and it's a nice tool, but they don't mix.

Comments

This may be repetitious; however, all systems, as well as bloggers, have flaws! "Google is a major contributor to Internet innovations. My dilemma is: where are the disciples? What percentage of surfers have the knowledge and ability to absorb the snippets, source code, etc.? Is catering to the elite, or to the chosen few, a wise direction for Google to embark on? When I view blogs, for instance, I see waste; so many of them are garbage with nothing there, simply because the user could not navigate the site and deal with codes."

Besides Koders.com, there is also Krugle.com.

Searches on Google Code Search do reveal that it indexes the contents of .zip files: many Lua scripts I am familiar with are distributed as zip archives, and those Lua files show up in Code Search, as seen in this result: http://www.google.com/co...yHH7vcQ-U1g:sBapxR-FFkg
