RubberStamped.org

December 05, 2007

Google Custom Search - Business Edition Review


Emmanuel Evita from Google invited me to review the Google Custom Search Engine, Business Edition, and kindly provided me with access to the service for a case study.

I will use this software to enhance the usability and utility of my web directory, Rubberstamped.org. Hopefully this case study will prove useful to others who are considering implementing Google's hosted search service.

So What Is It?

The Google Custom Search Engine allows you to create your own search engine. Sign up for the hosted service, select a few options, and you're done. The Business Edition has additional features, including customization of search results available through an XML API, reporting features that give insight into visitor behaviors, and the option to turn off ads and Google branding.

What Problem Does The Business Edition Of Google Custom Search Engine Solve?

In-site search can be frustrating, both for the user and for the webmaster.

If the search result relevancy isn't high, the user will go elsewhere. Search scripts and databases can be cumbersome and create administrative overhead.

The Google software provides you with site search that works. You get to use Google's leading-edge search technology, so the result sets are, as you'd expect, impressive. The service is hosted, so there is no administration required once it is set up.

You can also incorporate the result sets of other sites into the search engine, which is a feature I intend to use heavily. The service is very easy to implement, and the prices are competitive.

The Project & Goal




I run the directory Rubberstamped.org. We accept listing requests from webmasters and, following hand-review, we place sites in a category. Our aim is to create a quality, spam-free directory in which all results have been hand reviewed and categorized by editors.

The downside of this model is the lack of relevancy. Directory indexes can be shallow, which means the result sets can be weak when compared to a crawler-based engine.

To solve this problem, I intend to replace the existing directory search function with Google Business Search and incorporate result sets from other hand-picked human edited directories and verticals to use as backfill.

This approach will serve two purposes.

Firstly, we still get to keep a degree of control on spam, because we hand-select sources which operate a similar editorial policy to Rubberstamped.org.

Secondly, we increase our utility to the end user. If the utility to the end user is increased, we stand to gain more users, which should, in turn, convince people to list with us.

Google have placed a high degree of emphasis on human-reviewed content. We will run a search service that includes hand-reviewed resources to provide a meta-directory search service for those users who prefer directories. Hopefully this study demonstrates how webmasters can complement Google search by creating second-tier hybrid search services.

A description of the Rubberstamped meta directory search service, powered by Google Business Search, can be found here.

How To Set-Up Google Custom Business Search

The setup procedure is very easy.

For those of you running a search engine on one site, your list will obviously consist of just one site. You can add to the list at any time.

And that's it. No, really! Compare this procedure with implementing a search engine software script.

You can then sign up for different account levels, depending on your support and functionality requirements. Custom Search Business Edition starts at $100 a year for searching up to 5,000 pages, and extends to $500/year for up to 50,000 pages. Larger volumes of pages are supported through Google's enterprise sales group.
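As a rough illustration of the pricing above, here is a minimal Python sketch that looks up the annual price for a given page count and works out the cost per page at each quoted tier. It uses only the two price points quoted in this post; the function and variable names are made up for the example, and any intermediate tiers Google may offer are not modelled.

```python
# Only the two price points quoted in the post. Intermediate tiers may
# exist; they are not modelled here (assumption).
QUOTED_TIERS = [
    (5_000, 100),    # up to 5,000 pages: $100 a year
    (50_000, 500),   # up to 50,000 pages: $500 a year
]


def quoted_annual_price(pages):
    """Return the annual price in USD of the smallest quoted tier that
    covers `pages`, or None if the site is larger than the biggest
    quoted tier (the post says those go through enterprise sales)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    for max_pages, price in QUOTED_TIERS:
        if pages <= max_pages:
            return price
    return None


def cost_per_page(max_pages, price):
    """Annual cost per page when a tier is used to its full page limit."""
    return price / max_pages


if __name__ == "__main__":
    for max_pages, price in QUOTED_TIERS:
        per_page = cost_per_page(max_pages, price)
        print(f"up to {max_pages} pages: ${price}/year, ${per_page:.2f} per page")
```

On these figures the larger tier costs half as much per page as the smaller one, provided you actually have the pages to fill it.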

Core Sites

I've chosen these sites for inclusion based on their depth, freshness, and adherence to the hand-reviewed philosophy. We will be adding quality sites on a regular basis. Note: none of these sites are affiliated with Rubberstamped.org. Their result sets are incorporated via the Google search index.

The core sites, besides Rubberstamped.org, include:

Mahalo.com is a search service beta launched in May 2007 by Jason Calacanis. The aim of Mahalo is to track and build hand-crafted result sets for many popular search terms. Mahalo's directory employs human editors to review websites and write search engine results pages that include text listings, as well as other media, such as photos and video. Their emphasis on hand-review is similar to ours, so Mahalo is a good fit for inclusion.

BestofTheWeb.org is a directory established in 1994. They have around 172,000 pages listed in Google, and these consist of hand-reviewed results. The BOTW model is very similar to our own.

Yahoo Directory requires little introduction. It is a huge directory based around hand-reviewed listings, which should help to provide needed depth.

Business.com is a well-established business search engine and directory.


We have used a filter option so users can choose to search just Rubberstamped, or include backfill from other sites. Give it a go. The setup process took a few minutes, then a few hours spent reading the FAQs to learn the finer points. We'll be looking to delve into the API in order to further hone our result sets, and will make this a topic of another article.
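To illustrate the idea behind the filter, here is a minimal Python sketch that builds a query restricted either to Rubberstamped alone or to Rubberstamped plus the backfill sites, using Google's public site: search operator. This is not how the Custom Search Engine itself is configured (the hosted service handles site lists through its own settings); the function name is hypothetical, and the domains are simply the ones spelled out in this post. Yahoo Directory is left out because the post gives no domain for it.

```python
# Illustrative only: domains are taken from the names used in this post.
OWN_SITE = "rubberstamped.org"
BACKFILL_SITES = ["mahalo.com", "bestoftheweb.org", "business.com"]


def build_query(terms, include_backfill=False):
    """Return a search query limited to the directory itself, or to the
    directory plus the hand-picked backfill sites."""
    terms = terms.strip()
    if not terms:
        raise ValueError("search terms must not be empty")
    sites = [OWN_SITE]
    if include_backfill:
        sites.extend(BACKFILL_SITES)
    if len(sites) == 1:
        return f"{terms} site:{sites[0]}"
    # Several sites: OR the site: restrictions together in one group.
    restriction = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({restriction})"


if __name__ == "__main__":
    print(build_query("rubber stamps"))
    print(build_query("rubber stamps", include_backfill=True))
```

Keeping the source list in one place mirrors the editorial point made above: spam control comes from choosing which sites are allowed in, not from filtering results afterwards.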


December 20, 2006

Google & The Semantic Web

Good post on Google's initiatives in the semantic web space.

Google already uses largely automated techniques to identify and deal with Web spam, email spam in gmail, click fraud, etc. We won't begin by using completely automated techniques to process and make decisions based on data found on the Semantic Web and will be able to develop partly automated systems to decide what data can and should be trusted and by how much.


The notion of trusted data is important, and Google already has mechanisms in place for it, such as PageRank. Part of the picture is human categorisation.

November 13, 2006

What Is The Semantic Web?

People have been using this phrase for years, although it hasn't always been clear what is meant by "semantic web".

In What Is The Semantic Web, Nova Spivack does a great job of explaining the theory.

Summary:

The Semantic Web is not separate from the existing Web.....It simply adds new metadata to the existing Web. It merges right into the existing HTML Web just like XML does, except this new metadata is in RDF....

March 21, 2006

Text Links

Excellent blog on linking strategies from SEW Conference speaker Debra Mastaler.

March 09, 2006

Excellent Directory List

The Strongest List provides an excellent, regularly updated list of directories.

March 13, 2005

Free Directories

I've often been asked why we don't provide a free submission service.

As much as we'd like to, the model won't scale. The modest review fee we do charge keeps out the chancers, and ensures serious webmasters are provided with fast response times. We'll treat you well, and our users will benefit from fresh listings featuring the latest content.

For those people looking for free directory listings, try Dmoz, Zeal, and Wow. Or search Google.

January 31, 2005

Moving Beyond Outmoded Directory Models

We're turning round listing requests in far less than 48 hours. Usually within the hour, so we're exceeding the goals we've set for ourselves. Thanks for all the kind emails about our response time!

People often ask how we can hope to be comprehensive if we don't accept free listings. The thing is, we do list sites for free. Works like this - for every paid listing request we accept, we seek out and list a non-commercial listing in one of the less commercial categories. The end result should be a good balance of commercial and non-commercial interests, thus mimicking a real world environment.

January 22, 2005

The politics of search

As pretty much expected when politics come into play, DMOZ pulled the listing of RubberStamped. Apparently, not enough listings, although when we asked how many listings were required, we didn't get a reply. Que Sera Sera.

When I posted the "No Sleep 'Til Dmoz" thread, the objective wasn't really to get listed. It was to illustrate the slow turn-around time. Suffice to say, if Rich hadn't stepped in, my submission would have rotted along with the hundreds of others that no doubt sit in the queue. DMOZ should be turning around all submissions in far less than eight days. If the submissions are too numerous to deal with, then surely policy and procedures need to change to address that problem? If policies cannot be applied consistently, then what use are they?

The judgement of "quality" or "completedness" of this directory is a red herring. The real issue is that there are many directories on the web that do qualify to be listed but aren't because the queue has been neglected. And because the category isn't fresh, it isn't particularly useful. Will we see the category carefully maintained and kept up to date from now on? I doubt it. It will likely be left dormant like so many others. DMOZ has long since lost sight of its original vision.

I don't begrudge DMOZ as a whole. They're overwhelmed with submissions and subject to the political problems that come with "free" and volunteerism on that scale. What I don't see is any will or effective strategy to solve those problems.

Those are the real issues that need addressing :)

January 21, 2005

DMOZ come through. Sort of.

Here's how not to do marketing.

There are two main features to the Rubberstamped directory: turn-around time and fresh content. To illustrate this, we decided to show the problem with DMOZ, i.e. slow turn-around time and lack of fresh content. We submitted to DMOZ, then planned to count the days to acceptance. Of course, this count would go into the hundreds before acceptance. It was highly likely we would never be accepted.

We were wrong.

What happened was Mr DMOZ founder himself, Rich Skrenta, got in touch, personally, and mentioned that we had been added to DMOZ. Days to DMOZ: Erm...eight. And a personal note from the founder.

Heh.

How embarrassing. The least we could do was approve Mr Skrenta's listing request in RubberStamped :) Rich has also been so good as to agree to an interview, and talk about what he's up to these days. We'll be publishing this on searchengineblog.com as soon as we're done chatting. That may take some time as Rich is a very interesting chap.

So, DMOZ:1 RubberStamped: 0

Own goal.

January 18, 2005

Mining the deep web

It's interesting trying to find that data beyond the reach of the search engines, or the great sites that, for one reason or another, have been buried in the search results.

Here are a few tools I've been using to look way beneath the surface:

Specialized vertical search engines
Deep Query Manager
Gary Price's compilation of links to deep search interfaces

More to come. Started a deep search category.