April 14, 2006
A few weeks ago, I reviewed an early beta search engine called Mojeek. Well, Mojeek hasn’t gotten a huge amount of press, so that article pops up pretty easily in a search - and Marc, the creator, ran across that article and dropped me a line. In the conversation which ensued, he gave me access to a test account for his personal search - which I am now going to delve into in some depth.
Personal search accounts are currently only available in Beta testing, and must be requested personally - visit Mojeek Personal Search (Beta) for more information!
Mojeek Personal Search: Functionality
Mojeek’s personal search gives you the ability to create a set of preferred sites to search. As far as I can tell, you can list as many as you wish, provided that they have already been indexed by the Mojeek crawler. This is similar to the functionality provided by Lushe.net. But where Lushe.net manages a database of your preferred sites and relies on Google’s search engine, Mojeek handles both functions on a single site.
One additional feature offered by Mojeek is the ability to exclude sites from all Mojeek web searches. If you know that you NEVER want to read documents from, say, Fox News, you can exclude that site from your search results. While you are logged in, every search result includes options to add that site to your personal search or to exclude it from results.
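To make the include/exclude behaviour concrete, here is a minimal sketch of the simplest way such lists could be applied to a ranked result list. It is only a guess, not Mojeek’s implementation. It assumes results are plain URLs and sites are matched by exact host name. `filter_results`, `host_of` and `normalize_host` are hypothetical names, not part of any Mojeek interface.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Lower-case a host name and drop a leading 'www.' so variants compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def host_of(url: str) -> str:
    """Extract the normalized host name from a result URL (hypothetical helper)."""
    return normalize_host(urlparse(url).hostname or "")


def filter_results(results, listed_sites=None, excluded_sites=()):
    """Filter a ranked list of result URLs, preserving their order.

    Assumption: this mimics, but is not, Mojeek's personal search.
    listed_sites:   if given, keep only results from these hosts (personal search).
    excluded_sites: drop results from these hosts (applied to every search).
    Hosts are matched exactly after normalization, so 'blog.example.com'
    does not match a listing for 'example.com'.
    """
    listed = None if listed_sites is None else {normalize_host(s) for s in listed_sites}
    excluded = {normalize_host(s) for s in excluded_sites}
    kept = []
    for url in results:
        host = host_of(url)
        if host in excluded:
            continue  # excluded sites never appear
        if listed is not None and host not in listed:
            continue  # personal search: only listed sites appear
        kept.append(url)
    return kept


results = [
    "http://www.foxnews.com/story",
    "http://example.org/page",
    "http://blog.example.net/post",
]
print(filter_results(results, excluded_sites=["foxnews.com"]))
# ['http://example.org/page', 'http://blog.example.net/post']
print(filter_results(results, listed_sites=["www.example.org"]))
# ['http://example.org/page']
```

Listing a single site in `listed_sites` turns this into a site search. Exact host matching keeps the sketch simple, but it means each subdomain would need its own entry. A real engine might match on domain suffixes instead.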
The personal search option is welcome and still relatively uncommon, but there are a number of ways in which it could be improved.
Combined with the flexibility of Mojeek’s alternative algorithm search options (still rudimentary), this personal search could have great potential. However, it would be very valuable if Mojeek added a Lushe-like bookmarklet for adding sites to your site search. I’m also concerned that you can’t add unindexed sites to your list. As I commented in my previous article, the Mojeek index was far from the most current available, and one site I tried to add (this one) was not yet indexed. Granted, the site is only a month and a half old, but I could hope! A highly useful service would be the ability to request a crawl of sites that are not yet in the index.
Finally, it would be very useful to be able to set up more than one personalized search. At the moment, it appears that each user is associated with a single personalized search, but it seems quite reasonable to want more than one personalized group of sites.
Mojeek Personal Search: Interface
In general, the interface for Mojeek Personal Search is simple and straightforward, just like their main search interface. The first view after logging in shows your basic personal information. From there, there are only five other places to go: your personal search home page; three editing pages, namely personal information (optional, can be public or private), edit listed sites (sites to be searched by your personal search), and edit excluded sites (sites not to be searched by web search); and help. The phrase "Edit Listed Sites" isn’t crystal clear to me, as it doesn’t specify the purpose of the listing. Perhaps "Edit Personal Search" would be clearer. However, it’s far from being a major point of concern.
The one thing I would want to change about the interface is that you cannot access your personal search from the main Mojeek home page. When logged in, you have the option to switch between searching the web and searching your own selected sites on all other pages - but on the home page you have no such choice. A minor thing, but something which would make the site that much more user-friendly.
Other Comments
It’s also worth noting that one advantage of Mojeek personal search is that it can be used as a site search tool. If you select only your own site, your personal search effectively acts as a search of your site. However, this tool will only become truly useful if the indexing rate speeds up enough to keep the index current.
The search engine continues to have great potential. Between the personal search, site search, and alternate algorithm selection, there are some useful tools available. The suggestions above, I believe, could allow it to build even further and possibly gain some footing in the search world. User-controlled search factors are likely to be one of the most valuable developments in the next generation of search technology.
For more information on user-controlled search factors, read Adding more factors to Microsoft’s sliders, by Bill Slawski. Microsoft is taking a different approach to user-modified searching than Mojeek, which may also have interesting repercussions for search.
April 10, 2006
With family and friends visiting for over two weeks now, and a weekend trip to boot, it’s been hard to keep up with industry news and interesting articles. Nevertheless, I’ve had plenty to read and to pique my interest over the last five days. In fact, almost too much! There’s a good reason for writing frequently in a blog: you’re not left facing quite as vast an array of interesting subjects that you want to write about. In the last few hours I’ve read or become aware of over a dozen topics or articles that fascinate me. But I still have guests, and not really the time to write about them all!
So I’m picking a topic which, although perhaps not the most original, is certainly very close to home: MSN’s search engine ranking technology.
If you want to learn how Google works, at least in generalities, there are dozens of places to look. Just do a search on Google, and you’ll be served up a pretty good variety of information. (Although it’s worth noting that the first result for "How does google work" brings up their April Fool’s Day prank for 2002.) The basics of Google’s search technology are fairly well-known. But MSN is not as open about their methods.
Recently, there’ve been a couple of good articles published about MSN’s RankNet system - a new algorithm incorporated into MSN’s search technology.
Search Engines and Algorithms: Optimizing for MSN’s RankNet Technology, by Jennifer Sullivan Cassidy, discusses the technology and how to optimize for it. Beyond PageRank: Machine Learning for Static Ranking (PDF), a publication from MSN’s research department, addresses the subject in predictably greater depth.
Thankfully, Bill Slawski has summarized these documents for all of us. However, he focuses primarily on the principles and subjects of the paper rather than on their practical application to search engine optimization. This leaves some ground still open for discussion!
What is RankNet?
RankNet is, in essence, a "learning machine" that takes the patterns of human searches into account, and learns from them, in order to provide more relevant results the next time around. They start from a baseline of predictions made that are input into its neural net…[snip]…They make their predictions with supervised learning…
Jennifer Sullivan Cassidy
So it’s a little complicated. A few potentially applicable definitions of "neural network" include:
- Neural Network
- A type of statistical computer program which classifies large and complex data sets by grouping cases together in a way similar to the human brain. Used in data mining. (Audience Dialogue)
- A computational method for optimizing for a desired property based on previous learning cycles (training). (GenProMag)
- A member of a class of software that is “trained” by presenting it examples of input and the corresponding desired output. For example, the input might be a magnetic anomaly and the required output the depth to the source of that anomaly. Training might be conducted using synthetic data, iterating on the examples until satisfactory depth estimates are obtained. (Geop.itu.edu.tr)
For machine learning:
- Machine Learning
- The ability of a machine to improve its performance based on previous results. (University of Illinois at Urbana-Champaign)
- Subspecialty of artificial intelligence concerned with developing methods for software to learn from experience or extract knowledge from examples in a database. (Ahima.org)
- The ability of a program to learn from experience — that is, to modify its execution on the basis of newly acquired information. In bioinformatics, neural networks and Monte Carlo Markov Chains are well-known examples. (Nature.com)
The implication is that Microsoft is incorporating, at some level, a type of artificial intelligence into its search algorithms. This is actually a very reasonable idea. Where Google bases its results largely on links (that is, on permanent votes for a website’s relevance and validity), MSN is working to apply on-the-fly learning to its ranking. A site may become more important because it is the link a searcher chose to visit.
MSN is incorporating several hundred criteria (569, according to Cassidy) into its ordering system. These criteria are not themselves what is weighed when determining a document’s relevancy. Rather, they are taken into consideration when choosing which properties of a document should be given greater weight in ranking.
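RankNet, as published by Microsoft Research, learns from pairs of documents where one should rank above the other. Its training cost is based on the difference between the two documents’ scores. Below is a minimal sketch of that pairwise idea. It assumes a linear scorer in place of RankNet’s neural network and uses three made-up features. The function names, features and training pairs are all hypothetical and have nothing to do with MSN’s actual criteria.

```python
import math


def score(weights, features):
    """Linear scoring function; a one-layer stand-in for RankNet's neural net."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, learning_rate=0.1, epochs=200):
    """Learn scoring weights from (preferred, other) pairs of feature vectors.

    For each pair, the modelled probability that the preferred document should
    rank higher is P = sigmoid(s_preferred - s_other), and the loss is -log(P),
    the pairwise cross-entropy used by RankNet. Weights are updated by plain
    stochastic gradient descent on that loss.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(weights, better) - score(weights, worse)
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of -log(P) w.r.t. the weights is -(1 - p) * (better - worse),
            # so stepping against it moves the weights toward the preferred document.
            for k in range(n_features):
                weights[k] += learning_rate * (1.0 - p) * (better[k] - worse[k])
    return weights


# Hypothetical features: [anchor-text match, keyword density, is static page]
pairs = [
    ([1.0, 0.2, 1.0], [0.0, 0.8, 1.0]),  # hypothetical: first document was preferred
    ([1.0, 0.5, 0.0], [0.0, 0.5, 0.0]),
    ([0.9, 0.1, 1.0], [0.1, 0.9, 1.0]),
]
weights = train_pairwise(pairs, n_features=3)
print([round(w, 2) for w in weights])
```

In this toy data the preferred documents always have the stronger first feature, so the learned weight for that feature ends up positive. A feature that never differs within a pair (the third one here) is never adjusted. That is the sense in which the learner decides which properties deserve more weight.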
RankNet is not a simple concept, but it has fairly wide-reaching potential. It can be applied to filters, search engines, or any number of database interaction tools. But what does it mean for search engine optimization? Very little, in my opinion.
The development of more sophisticated search engine algorithms closely parallels the reality of human social interaction. These methods are simply intended to make a website that is important to many people easier to find for other people who may also find it valuable. As such, a more sophisticated algorithm will respond best to a useful, usable, worthwhile web resource. Search engine optimization is not about optimizing for an algorithm (although the algorithm should be considered); it is about optimizing for a human user. If your human social network is successful, your chances of building an algorithmic social network are much higher.
Although RankNet itself may not contribute directly to search engine optimization techniques, it is important to be aware of what factors MSN takes into consideration.
According to Cassidy, MSN looks first at:
- Anchor text in links
- Content
- Keyword Density
- URL keywords
- Header tags, title tags, alt tags, and title attributes
- Strong or bold text
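Keyword density is the one item on this list that is a simple number. How MSN measures it isn’t stated here. A commonly used SEO definition is the share of a page’s words taken up by the keyword. The minimal sketch below assumes that definition, and MSN’s own measure may well differ. The function name `keyword_density` and the tokenization rule are illustrative choices of my own.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Percentage of the words in `text` covered by whole-word matches of `keyword`.

    Assumed definition (not MSN's documented formula): matching is
    case-insensitive, multi-word phrases are supported, and each match
    contributes as many words as the phrase contains.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Slide a window of n words across the text and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits * n / len(words)


page = "Red apples and green apples"
print(keyword_density(page, "apples"))        # 40.0 (2 of 5 words)
print(keyword_density(page, "green apples"))  # 40.0 (one match covering 2 of 5 words)
```

Some tools instead count phrase occurrences divided by total words, which would give 20.0 for the second call. The convention matters when comparing figures from different tools.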
MSN also handles 302 redirects very effectively, places more importance on static pages than on dynamic pages, and has very effective filtering of duplicate content. In fact, MSN apparently has the best handling of duplicate content - and Google the worst.
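MSN’s duplicate filtering isn’t described in detail here. One well-known general technique for spotting near-duplicate pages is shingling (Broder’s resemblance measure): break each page into overlapping word sequences and compare the resulting sets. The sketch below shows that general technique, not MSN’s method. The shingle size of 3 and the 0.9 threshold are arbitrary assumptions.

```python
import re


def shingles(text: str, w: int = 3) -> set:
    """Set of w-word shingles (overlapping word sequences) in `text`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of two documents' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9, w: int = 3) -> bool:
    """Flag b as a near-duplicate of a when resemblance reaches the (assumed) threshold."""
    return resemblance(a, b, w) >= threshold


original = "The quick brown fox jumps over the lazy dog"
copy = "the quick brown fox jumps over the lazy dog!"
variant = "The quick brown fox jumps over the lazy cat"
print(resemblance(original, copy))     # 1.0
print(resemblance(original, variant))  # 0.75 (6 shared shingles out of 8)
```

Because case and punctuation are ignored, a trivially reformatted copy scores 1.0. A one-word change lowers the score only for the shingles that contain that word.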
Taking all of these factors into consideration may give you a well-optimized page. Abusing them may give you a few days of top rankings. But, then again, it might not. If you don’t build your business to attract real customers and real references, it won’t matter how optimized your pages are, just as with any other search engine.
April 5, 2006
At the end of February, I posted about a new search engine called Dumbfind. The engine is designed around tagging technology and was offering a trial of a contextual advertising scheme based on tags. Today, I’m combining two topics: a follow-up on my own Dumbfind Adsonomy trial and a look at their user interface, my theme for the week.
My trial was very unsuccessful. I ran ads for two of my own websites, and received (according to my statistics) no referrals at all from Dumbfind.com. Zero. Zip. Nada. You get the point. I feel there are two major contributing factors to this.
The first has to do with Dumbfind’s traffic. An Alexa rank of 44,315 (as of this writing) is respectable for a small business, but I also have to note that this reflects a recent jump of 95,000 positions. If this continues, maybe I’ll receive a few visits in the last two weeks of my trial. But respectable positioning for a search engine is not the same as for a small service site. Dumbfind may be building their traffic, but it’s just not there yet.
The second flaw lies in the design of their Adsonomy system. First, each ad can have only 10 tags attached. Second, these tags must be selected from their tag database, which eliminated many potentially useful tags. When the only tags available are as general as "search engine marketing", the chance that they will bring up my ad is greatly reduced. Yet Adsonomy didn’t permit terms that were regionally specific and associated with my keywords.
It is unclear how Adsonomy associates your selected tags with searches. One serious gap in the Adsonomy interface is the lack of any explanation of how it works! I was unable to construct a search that caused my own ad to show up. Not a definitive test, by any means, but it does leave me wondering.
The Dumbfind search results are very awkwardly displayed. In my tests, I found it difficult to visually distinguish sponsored listings from actual search results. I also find it very difficult to understand the listings. An example listing:

The top line of each result gives the URL and title of the website. The font is small and not very prominent. For a while, I thought these were contextual ads, partly because the company’s name doesn’t appear prominently in any of the following links. This is partly the fault of the site itself: they have optimized their titles for search terms but have not included their site name. However, the display of this information is entirely Dumbfind’s responsibility. I’m left confused by three issues:
- The site address, which is the first element and the only title-level indicator in each result, is smaller than most of the remaining text.
- The largest element is exclusively drawn from title tags. If this includes the site name, great - otherwise, it is confusing.
- Every result is accompanied by supplementary pages.
The fundamental problem, to me, is that the search results are too complex - I’m barraged with information about this site in a manner that overwhelms and confuses me. I have no option to remove supplemental results and simplify my view, and the beginning of each result is unclear due to the scale of the other elements.
I like the way Dumbfind’s main page looks. I like their idea of tagged searching. However, I find their search results confusing and cluttered with advertising. It’s unlikely that they would win me over against any other service, unless they can provide more customization tools or simplify their default results.