Google might like to be thought of as a 'black hole of Internet search engines,' consuming all the information
that falls within its gravitational reach. The difference being, the information does escape, and the web is not
really ripped apart at the seams. Oh well, so much for analogy.
But there really are holes in Google, Yahoo! and all other search engines that have nothing to do with the forces
of nature. These holes have serious implications for the quality of search engine results, and therefore require
the attention of your optimization efforts.
Let's begin the analysis with Google, the current technology leader in the search engine field. When users visit
the Google search engine and run a search, they often enter complete phrases. This tendency is likely to become
more common as speech to text becomes a reality. How Google treats these phrases demonstrates a fault within its
algorithms and a hole in the accuracy of its search results. When you include a common word in a phrase within
the Google search box, it gives you the following message above the search results:
"'for' is a very common word and was not included in your search." [details] If you click for details, you
get Google's explanation of how and why common words are excluded.
But here's where Google falls down. Visit Google right now. Open four windows and, in each window's search box,
type one of the following queries:
Hotels New York
Hotels in New York
Hotels for New York
Hotels about New York
The words 'in', 'for' and 'about' all get the standard "This is a very common word and was not included in your
search" message. Yet all four searches display entirely different results.
What is Google doing? I considered the possibility that I was pulling results from different data centers, so
I ensured this was not the case. I then tried a variation on this search query, using the term "search engine
optimization X hotels", with the 'X' representing either a blank space or one of the words 'in', 'for' or 'about'.
In this test, only where the 'X' represented a blank space did I get varying results. Still, by rights they ought
to have all been identical.
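
The query variants used in these tests can be generated mechanically. The following is a minimal sketch in Python, not a description of any tool used for the original tests: the helper name build_variants and the word list are assumptions made for illustration, and the code only builds the query strings. It does not contact any search engine.

```python
# Hypothetical helper: builds the query variants described in the text.
# It only produces strings; submitting them to a search engine is left out.
COMMON_WORDS = ["", "in", "for", "about"]  # "" stands for the blank space


def build_variants(template, fillers=COMMON_WORDS, slot="X"):
    """Replace the slot token in the template with each filler word."""
    variants = []
    for filler in fillers:
        words = [filler if w == slot else w for w in template.split()]
        # Drop the empty filler so the blank case has no double space.
        variants.append(" ".join(w for w in words if w))
    return variants


queries = build_variants("search engine optimization X hotels")
for q in queries:
    print(q)
```

Calling build_variants("Hotels X New York") would produce the four hotel queries listed earlier, so the same helper covers both tests.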
It occurred to me that perhaps Google was using different algorithms when it identified a place name in the search
query by trying to understand the context of the query. That would be a logical move. I'm very familiar with software
that comprehends the context of textual content. Could it be that Google is trying to apply some contextual filtering
to its results? I then proceeded to try a garbage search: a phrase made up of common words which have no direct
relevance to one another, and which would therefore never logically appear together:
"room hotel tapestry highway lagoon"
Interestingly, Google had 1,720 entries which matched this query, and the results varied depending on which of the
X terms I inserted between any two of the words. Search results also varied if I moved the placement of the ignored
word within the query. But is this context? A further test would be required. I put together three queries using the
same terms, but with a common or ignored word inserted as follows:
Filing tax return(s)
Filing a tax return(s)
Filing of tax return(s)
In this case, I tried singular and pluralized searches, to ensure that poor grammar was not affecting the results.
Results varied for each search. That's not to say they were all entirely different, just that they varied. I tried
a few other searches and received similar results. Most importantly, the results I received were all equally
contextually correct, which was a relief.
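
"Varied, but not entirely different" can be put into numbers. The sketch below is one possible way to do that, assuming each search has been reduced to a list of result URLs. The overlap function and the sample lists are hypothetical placeholders, not data from the tests described here.

```python
def overlap(results_a, results_b):
    """Jaccard overlap of two result lists: 1.0 = same URLs, 0.0 = none shared."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Placeholder result lists, not real search data.
filing = ["a.example", "b.example", "c.example", "d.example"]
filing_a = ["a.example", "c.example", "e.example", "f.example"]

print(overlap(filing, filing_a))
```

A score near 1.0 would suggest the inserted word had little effect, while a score near 0.0 would suggest the engine treated the two queries as different searches.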
Some people have written to newsgroups and discussion boards that when Google comes across an 'ignore' word, it
substitutes a wild card. However, if that were true, the various ignore words would all return the same results,
and this is not the case. Therefore, it can be surmised that Google does not in fact ignore these words at all. It
is more likely that Google is using some form of context algorithm. This is logical. The technology exists, and
Google is known to have bought a UK firm last year which was developing such a technology. Our own firm's software
uses contextual analysis in its algorithms.
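
The reasoning against the wild card theory can be made concrete. The sketch below models the theory only; it is not Google's actual behavior, and the IGNORED word list and the wildcard_form function are assumptions made for illustration.

```python
IGNORED = {"a", "of", "in", "for", "about"}  # assumed ignore list, for illustration


def wildcard_form(query):
    """Model the wild card theory: every ignored word becomes '*'."""
    return " ".join(
        "*" if w.lower() in IGNORED else w.lower() for w in query.split()
    )


queries = ["Filing a tax return", "Filing of tax return"]
forms = {wildcard_form(q) for q in queries}
# Under the wild card theory both queries collapse to a single form, so the
# engine would have to return the same results for both of them.
print(forms)
```

Because both queries reduce to the same form under this model, any difference in their results is evidence that the engine is doing something other than a plain wild card substitution.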
Taking the analysis a step further, which other engines seem to have a grasp on context? Obviously, the places to
look first were Google's competitors: Yahoo!, Microsoft, and AskJeeves.
AskJeeves
AskJeeves sprang immediately to mind, as it had originated the concept of "phrase a question" type searching,
so it should logically have some context filtering in place. In fact, when I ran the 'tax return' query through
the engine, I did receive varying results. Very different results from Google's, I might add. When multiple 'ignore'
words were added to a query, results did not vary, which may indicate very limited filtering.
I then tried an alternate pair of queries: "diapers for (a) baby" and "diapers on (a) baby". These should logically
return different results: one recommending diapers, and one about how to put them on, keep them on, or how they
should look. Surprisingly, I received identical results for both queries. Context was not being properly filtered by
the very search engine which first introduced the concept! I tried the same search on Google. While results were
jumbled a bit, the top web sites were the same for both queries, just in varying order. With over 550,000 results
to choose from, this would indicate that Google, too, has a long way to go toward fulfilling the promise of
contextually correct responses.
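
The two outcomes seen in this test, identical results on one engine and the same sites in a different order on the other, can be told apart with a simple check. This is a sketch under stated assumptions: compare_top is a hypothetical helper, and the sample lists are placeholders rather than results from either engine.

```python
def compare_top(results_a, results_b, n=10):
    """Classify two result lists by comparing their top-n entries."""
    top_a, top_b = results_a[:n], results_b[:n]
    if top_a == top_b:
        return "identical"   # same sites in the same order
    if set(top_a) == set(top_b):
        return "reordered"   # same sites, different order
    return "different"       # at least one site differs


# Placeholder result lists, not real search data.
diapers_for = ["a.example", "b.example", "c.example"]
diapers_on = ["b.example", "a.example", "c.example"]

print(compare_top(diapers_for, diapers_for))
print(compare_top(diapers_for, diapers_on))
```

Only a "different" outcome would suggest that the engine had picked up on the change in meaning between the two queries.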
Yahoo!
Next, I turned my attention to Yahoo! I was somewhat surprised to discover that Yahoo! does not seem to have
-any- filtering in place. Results did not vary at all for the test searches run when the "ignore" words were
inserted or removed. Yahoo! also did not identify these terms as being ignore terms in their results, but the fact
that results were unchanged when the terms were added or deleted would indicate that they were omitted and Yahoo!
does not have the necessary algorithms to allow it to comprehend the context of a search query.
Is context an area where Yahoo! seriously lags behind Google and others? If true, this points to a widening gap
between the search engines in the future. Google is already positioning for speech to text devices; can intonation
be far behind? Yahoo! has not demonstrated any evidence of making strides in either of these areas.
Microsoft
Lastly, I looked at the new Microsoft engine. I found no contextual filtering in place. Since this search engine is
still in beta, I cannot in all fairness comment on its being behind in a race where we have not yet seen the final
product. Still, it's something to keep in mind for the future.
Implications for SEO
The implication of contextual search on how your web site performs in the search engines is immense. It means that
the nuances of how people search have to be better taken into account by all SEO firms.
In our firm we recognized that as the world moved to speech to text and as the web grew in size, context would be
the next big differentiator in search results. This means that context is already recognized and taken into account
both by our technicians and our technology when analyzing a web site, and optimizing it for search engines.
Working to improve your web site's performance in the search engines now requires a comprehension of how people
are actually phrasing search queries and using that knowledge to properly position the content on your site, to
account for the idioms used by your target audience.
Ensure that you are using phrases in the way you hear people asking questions. Ensure you cover all the bases and
get all possible variations. Get outside help if you need it, but don't miss out on your opportunity to take
advantage of the Black Holes out there.
About The Author
Richard Zwicky is a founder and the CEO of Metamend Software, a Victoria, B.C. based firm whose cutting edge Search
Engine Optimization software has been recognized around the world as a leader
in its field. Employing a staff of 10, the firm's business comes from around the world, with clients from every
continent. Most recently the company was recognized for their geo-locational, or LBS technology, which correlates
online businesses with their physical locations.