DeWitt Clinton On The Birth of OpenSearch

OpenSearch is a common way of querying a search engine for content and returning the results. The idea is that it brings sanity to the proliferation of search APIs, but a realistic view would have to admit that we’ve been trying to do that since before the development of Z39.50 in libraries decades ago, and the hundreds of APIs that have followed have all been well intentioned and purposeful.

So what makes OpenSearch something more than an also-ran in a crowded herd? Part of the answer lies in what it doesn’t do: “Rather than reinventing the wheel, it uses the simple and very popular syndication formats RSS and Atom, along with a document describing the search engine.”

DeWitt Clinton helped create the OpenSearch protocol while working at Amazon’s A9.com. He is now at Google, but he continues to work on OpenSearch as an open, Creative Commons-licensed specification, and I caught up with him there to talk about what it takes to develop an open format.

My first questions were about where OpenSearch came from.

DeWitt Clinton: Amazon launched a wholly owned subsidiary called A9 in late 2003, and it revealed its first beta site in early 2004. A9’s mission was to explore search and to see where search could be done better.

One of the first things that we launched was the A9 front-end search interface, which included search results from Google and a handful of other partners. We integrated the different search results and displayed them to users, which was, I think, relatively novel for the time. It was a multiple-column display where you could run one search query and see results from several sources. They weren’t necessarily interleaved, but they were aggregated on screen.

We worked with Google’s search API, Answers.com’s search API, and a few other search APIs, and we started talking to additional partners about getting their searches into A9. A number of companies had search engines, but far more often than not, they also had proprietary search APIs.

Basically, if you were a search company, say Answers.com or something like that, you would say, “OK, I can accept search requests and I’m going to give you search results back. Maybe I’ll use this XML format, maybe it’s going to be SOAP, maybe it’s going to be something else.”

So we worked with a couple more of these proprietary APIs and said, “You know, this is getting silly. We’re doing all this work on our end to integrate search results; maybe there is an easier way.” We looked around to see if there was a standard for search, and didn’t really turn up anything specifically for web search. There were formats for more structured search, but web search is at best very loosely structured.

So we started to pick the problem apart, looking to propose a search format that our partners could use. But what would go into a search format? What are the common traits of search? What parameters do all web search engines accept on the request, and what kinds of things do they send back?

We started looking at the existing protocols (those that Yahoo!, Google, and even the smaller, more niche search engines had exposed) and asking ourselves what they were doing. We distilled the common elements from those formats until we found a subset that, empirically, was going to cover at least the 80% case of what people were already doing.

Then there was this moment when we realized we were inventing yet another proprietary format. You know, essentially a closed format. Fortunately, having done a lot of work with RSS in the past, we realized, “You know, search results are just a list. And the whole world is using RSS as a way of syndicating lists. So what if we — instead of trying to invent something completely new — what if we leveraged an existing protocol?”
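To make the “search results are just a list” idea concrete, here is a minimal Python sketch that reads a search response delivered as an RSS 2.0 feed. Each `<item>` is one result, and the `opensearch:` elements (`totalResults`, `startIndex`, `itemsPerPage`) are the kind of extra paging metadata OpenSearch layers on top of plain RSS. The namespace URI assumes the OpenSearch 1.1 version of the specification; the sample feed, the example.com URLs and the `parse_results` helper are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for OpenSearch 1.1 response elements.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical RSS 2.0 response to a query for "open formats".
SAMPLE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example search: open formats</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>First result</title><link>https://example.com/1</link></item>
    <item><title>Second result</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_results(rss_xml: str) -> dict:
    """Return the OpenSearch paging metadata and the list of result items."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")

    def meta(tag: str):
        # Namespaced elements are looked up in Clark notation: {uri}local-name
        el = channel.find(f"{{{OS_NS}}}{tag}")
        return int(el.text) if el is not None and el.text else None

    # Each <item> is one search result; plain RSS already models the list.
    items = [
        {"title": item.findtext("title", default=""),
         "link": item.findtext("link", default="")}
        for item in channel.findall("item")
    ]
    return {
        "totalResults": meta("totalResults"),
        "startIndex": meta("startIndex"),
        "itemsPerPage": meta("itemsPerPage"),
        "items": items,
    }


if __name__ == "__main__":
    results = parse_results(SAMPLE)
    print(results["totalResults"], [r["title"] for r in results["items"]])
```

Because the results are ordinary RSS, any existing feed reader or RSS library can consume them; a client only needs to know the extra namespaced elements if it wants to page through results.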

RSS was already out there, already open, already extremely well-adopted, and had tons of client and server libraries available. Combining RSS-based responses, the extra search result metadata, and our new format for describing search interfaces gave us the common subset, the 80% case we needed for syndicated search. And that became OpenSearch 1.0.
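The description document is what tells a client how to build a query. In the OpenSearch 1.1 draft this takes the form of a URL template with placeholders such as `{searchTerms}` and `{startPage?}`, where a trailing `?` marks a parameter as optional. The following is a minimal Python sketch, assuming that template syntax, of how a client might fill one in; the `expand_template` helper and the example.com template are hypothetical, and the sketch ignores finer points of the specification such as special handling of namespace-prefixed parameters.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}. Namespace-prefixed names such as {geo:box?} are
# accepted but treated like plain names in this simplified sketch.
_PARAM = re.compile(r"\{([A-Za-z0-9_:]+)(\?)?\}")


def expand_template(template: str, params: dict) -> str:
    """Fill an OpenSearch-style URL template with URL-encoded values."""

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            # Unsupplied optional parameters become empty strings.
            return ""
        raise KeyError(f"missing required template parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical template, as it might appear in a description document's
# <Url template="..."> attribute.
TEMPLATE = "https://search.example.com/?q={searchTerms}&page={startPage?}&format=rss"

if __name__ == "__main__":
    print(expand_template(TEMPLATE, {"searchTerms": "open formats", "startPage": 2}))
```

Marking `startPage` optional lets a client that never pages through results simply leave it out, while omitting the required `searchTerms` is treated as an error.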

There were three “lightbulb moments” in designing OpenSearch. The first was extracting the common features of web search. The second was leveraging existing formats, such as RSS. The third “lightbulb” was in asking the question: “Who benefits if this is a proprietary A9/Amazon solution? Is the world a better place, is even our business better off, if this is closed and proprietary?” And the answer, very clearly, was “no.” With that, the decision was clear: “You know what, let’s open this protocol. Let’s use Creative Commons as a way of opening the text of the protocol’s specification.”
