WIRED: Google faces a quandary in its first tentative forays into the world of video and audio search, writes Danny O'Brien
THERE'S A part of the net that keeps the engineers at Google, Microsoft Live Search and Yahoo! up at night. It's the internet's equivalent of dark matter: all that data contributed by millions of sites that is ready and available for the public to visit, but is as yet unvisited and unindexed by search engines.
For search companies, that represents more than just an untapped resource - it's their blind spot, and perhaps the location of the one piece of information that might draw their users away to an upstart competitor.
Most scary of all for them - and perhaps anyone or any society which depends on the internet for its knowledge - is that the sector of "dark knowledge" is growing. And the largest part of it is the fastest-growing part of the web: video.
Google, the world's largest search engine company, is also the owner of one of the world's largest repositories of video, YouTube. But YouTube's search is terrible, because it can only find items that have been explicitly labelled by their creators.
Look for "Nasa moonshots" and you'll get a few of the uploaded clips of Nasa's moon landings. If you're looking for somebody talking about Nasa in an hour-long documentary, it won't appear in the results.
The words spoken in a YouTube video, or in almost any video available online, are not available for net users to search. That's partly because to do so would require the spoken words to be transcribed into text, and speech-to-text recognition is still imperfect. That said, imperfect search results would be better than nothing, if the major search engines ever offered them.
But the real problem is that even our current humble levels of speech recognition demand large amounts of computer time.
Downloading the text on a webpage, indexing it and making it available for search is a relatively simple act for a computer. Even so, companies such as Google still have to devote many computers to repeatedly scouring the textual web, simply because it is such a large mine to plunder.
There was a time when video would have represented a much smaller seam within this wealth of text. But times are changing. Internet users are watching and uploading far more video than ever before.
The reason for that is a combination of maturing technologies: bandwidth is fast enough to watch video in real time, without waiting for it to download first.
It's bearable to upload video too, and to process it on the average desktop computer. Digital video recorders can be purchased for less than €50, and decent video footage can be created on a modern mobile phone.
A year ago, YouTube was hosting more than 84 million individual videos, and it has undoubtedly grown far larger since then; the site serves an estimated four billion video views a month to US users alone.
However, YouTube represents only a slice of all online video. One in eight posts to the blogging website Wordpress contains a link to an online video; sites outside the US compete with YouTube to host local video content.
Of course, you might argue that an index of the kind of videos that YouTube is famous for hosting would be a waste of anyone's time to search.
Most last less than three minutes and are more likely to show a dancing cat than a precious piece of vital spoken information. But the same was said of much of the textual web, and Google still managed to generate a multibillion-dollar business from what was missed by its predecessors.
You can see the quandary that Google faces by considering its first tentative forays into the world of video and audio search.
Earlier this year, Google started indexing political videos uploaded to YouTube that were connected with the US presidential election. This is undoubtedly useful: searching for "space exploration" gives me an Obama video that shows him going into more detail on his commitment to Nasa than his policy documents indicate (and a set of investment promises to Nasa's Florida voters that he might later regret).
But what a tiny sliver of the overall video on the internet! And Google's indexing of these videos is really just an extension of yet another challenge for search engines: indexing and understanding audio sources. There's no attempt to recognise and index faces, or objects, or backgrounds in these sources.
It's hardly surprising, given the processor demands of indexing just this small sample, that Google has been selective about what it examines.
But Google did not get where it is today by being picky about what it indexes: it won the web by being voracious and omnivorous in its appetite for new data to provide.
With YouTube's videos sitting on its own servers, it certainly has a head start in exploring video. The fact that it has managed so little so far shows what a great challenge - or opportunity - the future of video search represents. Google, Yahoo! and Microsoft may represent the cutting edge of search, but there's plenty more gold out in those internet hills.