November 16, 2006

Arguing about PageRank

It’s inevitable when dealing with search marketing clients that the question of PageRank will be raised. PageRank is one of the best-known and most widely recognized site status metrics, and it is easily accessible to the lay site owner. It’s not, however, a particularly useful metric, and can, in fact, be highly misleading. How do you inform your clients of the truth about PageRank?

You can’t take the easy way out. Just telling your client, authoritatively, that “PageRank is not a usable metric” will do nothing for you: they won’t be convinced. You have to find a way to show them that this abstract number attached to their website is not relevant to their search marketing strategy.

So what are the relevant points?

True PageRank is not Available

Matt Cutts has stated on his blog that:

I believe that I’ve said before that PageRank is computed continuously; there are machines that take inputs to the PageRank algorithm at Google and compute the resulting PageRanks. So at any given time, a url in Google’s system has up-to-date PageRank as a result of running the computation with the inputs to the algorithm. From time-to-time, that internal PageRank value is exported so that it’s visible to Google Toolbar users

Matt Cutts, More Info on PageRank

Further, in the same article, he’s stated that:

It’s more accurate to think of it as a floating-point number. Certainly our internal PageRank computations have many more degrees of resolution than the 0-10 values shown in the toolbar.

Although many would argue that anything Matt says must be taken with a grain of salt, this particular point has been reiterated often enough that I’m convinced. PageRank is only made available to the public “from time-to-time.” When it is made available, it is presented as a 0-10 integer derived from a far more precise floating-point number. If you track these updates, for example at SEOCompany.ca, you’ll observe that they occur roughly every three months. They aren’t regular, and they aren’t frequent. So: Available PageRank numbers are historical and approximate.
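To make the “approximate” part concrete, here is a rough sketch of what collapsing a floating-point score into a 0-10 integer can look like. Google has not published how the toolbar value is derived, so the function name `toolbar_bucket`, the logarithmic scale, and the base of 8 are all assumptions for illustration only.

```python
def toolbar_bucket(internal_score: float, base: float = 8.0, max_bucket: int = 10) -> int:
    """Collapse a positive floating-point score into an integer from 0 to max_bucket.

    ASSUMPTION: the logarithmic scale and the base are hypothetical.
    The real export formula is not public.
    """
    bucket = 0
    remaining = internal_score
    # Each step up the scale requires `base` times more score than the last.
    while remaining >= base and bucket < max_bucket:
        remaining /= base
        bucket += 1
    return bucket


if __name__ == "__main__":
    # Made-up internal scores: the first two differ by almost 7x,
    # yet land in the same bucket under this sketch.
    for score in (600.0, 4000.0, 5000.0):
        print(score, "->", toolbar_bucket(score))
```

Whatever the real mapping is, the effect is the same: very different internal values can share one toolbar number, and a small change near a boundary can flip it.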

PageRank ranks all web pages in Google’s Index

Keep in mind that you’re competing against your competitors: sites in your own field, selling the same products, offering the same services, etc. But PageRank ranks all of the pages of all the websites which Google has indexed. A PageRank of 3 in one industry is not comparable to a PageRank of 3 in another. If you’re going to compare PageRank at all, keep firmly in mind that your PageRank does not necessarily need to be a high number. Make comparisons exclusively within your industry if you want to get any meaning at all.

PageRank is not related to traffic

Currently, this site’s index page has a PageRank of 4. The blog main page had a PageRank of 5 before I switched to WordPress and changed all the URLs, and is now unranked. The site is 8 months old. The site receives approximately 100 unique visitors a day. This is essentially unchanged from before the WordPress switch.

Another site of mine, Joe Dolson Accessible Web Design, currently has a home page PageRank of 3 - with its associated blog bearing a PR of 4. That site is 2 1/2 years old, and receives approximately 400 unique visitors per day.

You can draw your own conclusions.

Is PageRank irrelevant?

No, not entirely. PageRank conveys some very basic information about your site: has Google gotten around to indexing your page, have they found backlinks to it, etc. But it isn’t, and shouldn’t be interpreted as, any kind of goal-oriented metric. It’s better to pursue valuable content, links, and traffic than to attempt to reverse-engineer your PageRank.

Filed under: Google, Search (General)

Big news for Google Sitemaps

‘Cuz it’s not just Google anymore. Via Andy Beal: Google, Yahoo, and MSN are now supporting a common XML sitemap protocol - with a slight version increment to Sitemaps 0.9 and a brand-new .org: http://www.sitemaps.org.
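For anyone who hasn’t built one, here is a minimal sketch of generating a Sitemaps 0.9 file with Python’s standard library. The `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the protocol. The `build_sitemap` helper and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    `entries` is an iterable of (url, lastmod) pairs, where lastmod is a
    date string such as "2006-11-16", or None to omit it.
    This helper is a hypothetical illustration, not an official tool.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required by the protocol
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # optional
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs; substitute your own pages.
    print(build_sitemap([
        ("http://www.example.com/", "2006-11-16"),
        ("http://www.example.com/blog/", None),
    ]))
```

A file saved to disk would normally also start with an XML declaration. The appeal of the shared protocol is that one file like this should work for every engine that supports it.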

Sitemaps.org, although certainly having little-to-no visual connection with Google, seems to still be hosted by Google, per the 404 error (hey, I went looking for http://www.sitemaps.org/sitemap.xml - it wasn’t there.) This is pretty fair, given that the joint use of the Sitemaps standard shouldn’t also force Yahoo and MSN to use Google branding!

Nonetheless, the Sitemaps protocol is pretty deeply associated with Google, so this is somewhat of a branding coup for them.

This unification between the “big three” in search on an agreed standard is a great development: I’m looking hopefully forward to a day when I can get similar information on the behaviors of each major search engine on a site through a “Webmaster Console”. Or perhaps an API release so a 3rd party can create a unified sitemaps monitor for all the search engines using the protocol!

The announcement comes from Google, Yahoo and MSN, but that isn’t to say that others can’t become involved: the protocol is intended to be an open standard, and any who choose can make use of it. Danny Sullivan comments at Search Engine Watch on his own hopes:

How about unification around other search standards, such as improving the robots.txt system of blocking pages. Again, this is something the search engines (specifically Google and Yahoo when I spoke to them), say they’re interested in. So fingers crossed, we’ll see more of this down the line.

Overall, I’m thrilled. It took nearly a decade for the search engines to go from unifying around standards for blocking spidering and making page description to agreeing on the nofollow attribute for links in January 2005. A wait of nearly two years for the next unified move is a long time, but far less than 10 and progress that’s very welcomed. I applaud the three search engines for all coming together and look forward to more to come.

No question, it’s an exciting development.

Filed under: Google, Site Development

September 25, 2006

Intellectual Property, Search Engines, and the Law

It seems that copyright infringement is a chronic complaint against search engines - and, to be honest, particularly against Google. At least from a news perspective, Google is the only search engine ever suspected of infringing copyright, despite the minor detail that their engine behaves more or less the same way as any other.

Regardless, the recent successful Belgian lawsuit has brought a number of interesting issues to mind.

Taking steps to prevent infringement is in no way a requirement for maintaining copyright. No content producer is required to make use of tools to erect barriers against copyright infringement. In this case, where the lawsuit is based primarily on the presence of cached pages in Google’s index, the question which pops to my mind is:

What role should preventative measures play in copyright law?

The DMCA (which, granted, is in no way relevant to the Belgian courts) specifies that it is a contravention of copyright law to circumvent measures taken to protect copyright. This could mean that Google would be in violation of the DMCA if websites were to make use of features such as robots.txt, noarchive, or nocache and Google failed to acknowledge and respect those rules.
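To show how little effort the robots.txt route takes, here is a small sketch using Python’s standard `urllib.robotparser` to check whether a crawler is allowed to fetch a page. The robots.txt rules, the example.com URLs, and the `may_crawl` helper are invented for illustration. They are not taken from any of the newspapers involved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /archive/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.example.com/archive/story.html"),
        ("Googlebot", "http://www.example.com/news/today.html"),
        ("OtherBot", "http://www.example.com/archive/story.html"),
    ]:
        print(agent, url, may_crawl(ROBOTS_TXT, agent, url))
```

Two lines in a text file are enough to keep a well-behaved crawler out of a section of a site. A robots meta tag with `noarchive` addresses caching on a page-by-page basis.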

However, the law says nothing which provides any protection for an automated service which has made provision for publishers to protect their materials when those publishers do not make use of the protections on offer.

It is trivial for content publishers to prevent Google from misusing their material. However, according to law, the onus lies fully on Google to avoid copyright infringement.

Bill Slawski, who very helpfully transcribed portions of the judgement, comments on the Court’s handling of these issues:

Regardless of how the Court may have felt about those options, I think that they should have been addressed in some manner. The failure to do so makes it appear that they either weren’t provided information about those by their expert, or didn’t understand them, or may not have addressed those issues on purpose.

It would have been very much appreciated had the court ruling actually made any mention of the methods available to the newspapers to prevent this issue.

Google has also made their own public statement concerning the case which specifically mentions the ability publishers have to prevent the indexing of their content. One suspects that they would really like publishers to know that there are other means of accomplishing their goals than a lawsuit.

I wonder how much of the problem has to do with business model and communication. The goals of a business are very diverse, and its various business units may not always be working towards the same ideals. These Belgian newspaper sites may be a good example.

I can easily imagine that the web development team was gung ho about making certain their websites were well-indexed and represented in Google.be. The copyright protection team, on the other hand, noticed one day that their content was showing up on somebody else’s website! Not being technologically savvy, the legal team talked with upper management and a group of papers pressed a lawsuit. The teams, each protecting and supporting their company in their own ways, may have acted literally in opposition.

This is an entirely hypothetical scenario, of course - I don’t have any kind of inside track to know what actually went on in the development of this lawsuit. Nonetheless, I can’t help but be curious just how these publishing companies perceive the value of their website presences.
