July 13, 2007

Stopping Google from Indexing Your Site

All right, this may not be precisely your main goal online. Nonetheless, there are many web sites out there which give the distinct impression that this was their specific purpose in creation.

Yet there are, for what it’s worth, very valid reasons to block pages sometimes. The trick is to make sure you’re only blocking the right documents.

One of the interesting upcoming features for stopping Google is the unavailable_after meta tag, announced by Dan Crow, Google’s Director of Crawl Systems, at a Search Marketing New England event this week. It is potentially one of the most useful document meta options, although the value may not be immediately apparent.

The point of the unavailable_after meta element is to inform Google that a page should not be indexed after a certain date. This could be used in situations such as:

  • Job Postings with expirations
  • Sale announcements
  • Special offer deals
  • Expired auction listings

Basically, this would be great for any document which expires. From a user perspective, it’s incredibly dissatisfying to arrive at an expired sales page as the result of a search. From a business perspective, at best you’re providing no value; at worst you’re angering the customer. If you remove the page altogether, it may take months before the search engine catches up with you, and in the meantime searchers who click through get nothing but 404 responses. If you could inform the search engine right from the start that your page would cease to be valuable as of a specific date, you could avoid this whole problem.

For when and how the tag will be implemented, of course, we’ll just have to wait and see.
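In the meantime, here is a rough sketch of how a site might generate such an element for an expiring page. Since the syntax hasn’t been published, everything about the output is an assumption made for illustration: the googlebot meta name, the "unavailable_after:" prefix inside the content attribute, and the ISO 8601 UTC date format. The function name unavailable_after_meta is hypothetical too.

```python
from datetime import datetime, timezone
from html import escape


def unavailable_after_meta(expires: datetime, crawler: str = "googlebot") -> str:
    """Build a meta element announcing when a page stops being useful.

    ASSUMPTION: the element name, the "unavailable_after:" prefix and the
    date format are guesses for illustration, not a published syntax.
    """
    if expires.tzinfo is None:
        # Treat naive datetimes as UTC so the output is unambiguous.
        expires = expires.replace(tzinfo=timezone.utc)
    stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return '<meta name="{}" content="unavailable_after: {}">'.format(
        escape(crawler, quote=True), stamp
    )


if __name__ == "__main__":
    # A job posting that closes on 25 August 2007 at 15:00 UTC.
    closes = datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc)
    print(unavailable_after_meta(closes))
```

The useful part is that the expiry date usually already lives in your database (the auction end, the sale end, the job closing date), so the template can emit the element when the page is first published.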

Filed under: Google, Search (General)

February 17, 2007

NoFollow in the News

The “nofollow” microformat, which tells search engines that you don’t want a link to be followed, has been the subject of quite a few interesting posts recently. On the one hand, there’s Loren Baker’s 13 Reasons why NoFollow Sucks:

The NoFollow link attribute (rel=”nofollow”) was originally created to block search engines from following links in blog comments, due to the amount of blog comment spamming.

The theory is that if spammers are spamming in blog comments to get better SEO and anchored links for their sites, NoFollow would render such spam useless. Problem is, spammers still spam.

And on the other hand, Ahmed Bilal responds with Defending NoFollow Against Angry SEOs:

Google has taken a lot of flak on a lot of issues in the past few years - it’s a price an industry leader invariably has to pay.

Apart from Blogger spam (and their plans to control all of the world’s information and then sell it to the highest bidder :) ), NoFollow is possibly an issue that gets Google the worst possible press.

But is NoFollow really that bad a move, or is it something that’s being used to beat Google over the head by people who have grudges against Google?

Now, in general, my feeling is that nofollow has proved to be entirely useless as a method to prevent spam. It’s vaguely possible that spam would be 10 times worse today than it is had nofollow not been employed on many blogs by default…but I doubt it. Nofollow, however, does have perfectly valid and understandable uses. Ahmed exposes the most interesting value of the nofollow microformat by pointing out the actual purpose it serves:

Anti-spam plugins prevent spammers from posting spam on our blogs. NoFollow prevents spammy comments from polluting the search engines. There’s an important distinction - Google’s responsibility is to guarantee the best possible results. When did fighting the world’s spam fall under their responsibilities?

NoFollow was never expected to stem the tide of spam: it was, however, hoped to reduce the amount of spam in search indexes, allowing searchers to more easily retrieve valuable information.

Whether that has been a success is, certainly, a very different question. But it is the question we need to analyze in order to determine whether NoFollow has really been of any use: not whether more or less spam has been unleashed on the world, but whether we can find that spam in Google’s search index.

Now, this is a difficult question to test. NoFollow is far from the only means that Google uses to stop spam - the fact that you can’t find the spam sites which are being linked to in your spam comments using Google doesn’t necessarily mean that NoFollow had anything to do with it. If you’re anything like me, no spam comment has ever been on your blog long enough to be indexed. So, in order to identify spam which has been blocked by NoFollow, it seems you’d need to confirm the following points:

  1. This spam site has been successfully linked to using the NoFollow microformat.
  2. This spam site has only been linked to using the NoFollow microformat.
  3. This spam site has not been removed from the Google index using some other means.

And I’m not sure whether we can do that. Google may be able to; but I can’t.
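For anyone who hasn’t looked at what a blog actually does when it applies NoFollow to comments, here is a minimal sketch. The function name add_nofollow is hypothetical, and using a regular expression rather than a real HTML parser is a shortcut for illustration; actual blog software may well do this differently.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches a quoted rel attribute inside that attribute text.
REL_ATTR = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" on every link.

    A sketch only: it does not handle unquoted rel values, self-closing
    tags, or other HTML edge cases.
    """
    def fix(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet, so append one.
            return '<a{} rel="nofollow">'.format(attrs)
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            # Already marked; leave the tag untouched.
            return match.group(0)
        values.append("nofollow")
        new_rel = 'rel="{}"'.format(" ".join(values))
        return "<a{}>".format(attrs[:rel.start()] + new_rel + attrs[rel.end():])

    return A_TAG.sub(fix, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">cheap pills</a>'))
```

Note that nothing here stops the comment from being posted. All it does is change what a search engine is asked to do with the link, which is exactly the distinction Ahmed draws.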

Filed under: Google, Spam

January 24, 2007

Investigations into Google Privacy

Google Privacy is not a fancy new Google service. (Nice idea - personalized privacy management of your Google Account - I like it.) Nonetheless, it doesn’t exist. The privacy of your information at Google is, however, the subject of an investigation by the Norwegian Data Inspectorate, which is also looking into privacy concerns at several Norwegian search engines. This organization is attempting to answer a few specific questions, according to Pandia.com.

Well…maybe specific isn’t quite the right word, actually. The quote from the Pandia article, by Senior Engineer Atle Arnes of the Inspectorate, asks:

“Why do the search engine store the IP addresses [of searchers] for so long and [what] are they using them for?”

This is actually a pretty wide-reaching query - I’d certainly be very curious to see the answer to the second part of the question. Somehow, however, I suspect that Google’s answers won’t be leaked very far out into the public unless they obfuscate any interesting part of the answer. Of course, it’s entirely possible that what Google does with the information they collect is nothing, but I think few people would believe that.

Privacy is a chronic concern in the Internet age. There’s no question that the information available to search engines can easily identify a person associated with their queries, even without any IP address or other uniquely identifying information. What the holders of this information will do with it is a curious question.

In theory, Google employees could know my calendar, my bank information, have access to my email, my search history, have indexed my hard drive, and know what websites I’m affiliated with and have webmaster privileges for. That’s a LOT of information.

What does Google know about you?

Filed under: Google, Privacy
