Content spamming, in its simplest form, is taking content from other sites that rank well on the search engines
and then either using it as is or running it through a utility such as Articlebot to scramble it to the point that it
can't be detected with plagiarism software. In either case, your good, search-engine-friendly content is stolen and used,
often as part of a doorway page, to draw the attention of the search engines away from you.
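To see why scrambling defeats plagiarism checkers, here is a minimal sketch of one common comparison technique, w-shingling with Jaccard similarity, written in Python. It is an illustration under simple assumptions (lowercased words, three-word shingles), not the method used by Articlebot, any particular plagiarism product, or Google; the function names are hypothetical.

```python
import re


def shingles(text: str, w: int = 3) -> set:
    """Return the set of overlapping w-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def similarity(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 1.0 identical, 0.0 disjoint."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not (sa | sb):
        return 0.0  # neither text is long enough to compare
    return len(sa & sb) / len(sa | sb)


original = "Keep your content fresh and write for readers, not for spiders."
copied = "Keep your content fresh and write for readers, not for spiders!"
scrambled = "Spiders not for readers, write fresh content and keep yours."

print(round(similarity(original, copied), 2))     # 1.0: verbatim copy
print(round(similarity(original, scrambled), 2))  # 0.0: same words, new order
```

The verbatim copy scores 1.0, while the reordered version shares no three-word sequence with the original and scores 0.0, even though it uses nearly the same words. That gap is exactly what a scrambler exploits.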
Everyone has seen examples of this: the page that looks promising but contains lists of terms (like "term term paper term
papers term limits") that link to other similar lists, each carrying Google advertising. Or the site that contains nothing
but content licensed from Wikipedia. Or the site that plays well in a search but contains nothing more than SEO gibberish,
often ripped off from the site of an expert and minced into word slaw.
These sites are created en masse to provide fertile ground for drawing eyeballs. It seems a waste of time when you receive a
penny a view for even the best-paying ads, but when you put up five hundred sites at a time and have figured out how to
get all of them onto the first page or two of a lucrative Google search term, it can be surprisingly profitable.
The losers are the people who click on these pages, thinking the sites hold content of worth, and you, whose
places in the top ten are stolen by these spammers. Google is working hard to lock them out, but there is more you
can do to help Google.
Using The Antispam Tag
But there is another loser. One of the strengths of the Internet is that it allows two-way public communication on a
scale never seen before. You post a blog entry or set up a wiki; your audience comments on your blog, or adds to and changes your
wiki.
The problem? Normally you have complete control over a website and its contents, but sites that allow
user communication take some of that control away from you and give it to your readers. The only way to keep readers
of an open blog from posting unwanted links is to remove them manually. Even then, a link can hide behind a single comma or
period, making it nearly impossible to catch everything.
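One way to catch links hidden behind a comma or period is to look at each link's visible anchor text. The Python sketch below, using only the standard library's html.parser, flags any link whose text is empty or contains no letters or digits. The class and function names are hypothetical, and it is meant as a rough review aid, not a complete spam filter (an image-only link, for example, will also be flagged).

```python
from html.parser import HTMLParser
from typing import List, Optional


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs whose visible anchor text has no letters or digits."""

    def __init__(self) -> None:
        super().__init__()
        self.suspicious: List[str] = []
        self._href: Optional[str] = None  # href of the <a> we are inside, if any
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            visible = "".join(self._text).strip()
            # Empty text, or nothing but punctuation such as "." or ",".
            if not any(ch.isalnum() for ch in visible):
                self.suspicious.append(self._href)
            self._href = None


def find_hidden_links(comment_html: str) -> List[str]:
    finder = HiddenLinkFinder()
    finder.feed(comment_html)
    finder.close()
    return finder.suspicious


comment = ('Great post<a href="http://spam.example/">.</a> Thanks! '
           '<a href="http://friend.example/">my blog</a>')
print(find_hidden_links(comment))  # ['http://spam.example/']
```

The link hiding behind the period is reported, while the link with ordinary text is left alone.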
This leaves you open to accusations of link spam for links you never put there to begin with. And while you may
police the comments on your most recent posts, no one polices the posts from several years ago. Yet Google still looks
at them and indexes them. By 2002, bloggers everywhere were begging Google for an ignore tag of some sort to prevent its
spiders from indexing comment areas.
Not only bloggers would be grateful, they said; everyone running uncontrolled two-way communication (wikis, forums, guest
books) needed this service from Google. Each of these types of sites had been inundated with spam at some point, forcing
some to shut down completely. And Google itself needed it to help curb the rampant spam in the industry.
In 2005, Google finally responded to these concerns. Its solution is not everything the online community wanted
(for instance, potentially good links get ignored along with the spam), but it does at least let you set apart
the public parts of your blog. It is the "nofollow" attribute.
"Nofollow" lets you flag links on your web page, whether they sit in a blog's comment section or in paid
advertising, as links that Google should not count toward ranking. The great thing about it is that not only does it keep your
rankings from suffering because of spam, it also discourages spammers from cluttering your comments section with their
junk text, since their links no longer earn them any ranking credit.
At its most basic, you add the attribute to an individual hyperlink as rel="nofollow". This lets you manually flag links, such
as those in paid advertising, as links Google should ignore. But what if the content is user-generated? Manual flagging
breaks down there, because you certainly don't have time to go through and mark up every link.
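In the page markup, the flag is simply a rel attribute on the anchor tag. A minimal Python sketch for generating a flagged link for a paid placement might look like this; `sponsored_link` is a hypothetical helper name and the URL is a placeholder.

```python
from html import escape


def sponsored_link(url: str, text: str) -> str:
    """Build an anchor tag for a paid placement, flagged with rel="nofollow"."""
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


print(sponsored_link("https://advertiser.example/?a=1&b=2", "Our sponsor"))
# <a href="https://advertiser.example/?a=1&amp;b=2" rel="nofollow">Our sponsor</a>
```

Escaping the URL and text keeps an ampersand or quote in the advertiser's data from breaking the tag.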
Fortunately, blogging systems have responded to this development. Whether you use WordPress or another blogging system,
most have either added "nofollow" to the links in their comment sections automatically or offered plugins you can install yourself
to prevent this sort of spamming.
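For comment sections, the platform does the flagging for you. Purely as an illustration of the idea, and not WordPress's actual code, here is a Python sketch that re-emits comment HTML with rel="nofollow" added to every link while keeping any rel values already present. The names are hypothetical; a real system would also sanitize the HTML against scripts and other abuse.

```python
from html import escape
from html.parser import HTMLParser
from typing import List


class NofollowRewriter(HTMLParser):
    """Re-emit comment HTML with rel="nofollow" added to every <a> start tag.

    HTML comments and doctype declarations in the input are dropped, since no
    handler below re-emits them.
    """

    def __init__(self) -> None:
        # Leave entities such as &amp; unconverted so they pass through as-is.
        super().__init__(convert_charrefs=False)
        self.out: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())  # untouched original text
            return
        # Merge any existing rel tokens (e.g. "external") with "nofollow".
        tokens = " ".join(v for k, v in attrs if k == "rel" and v).split()
        if "nofollow" not in (t.lower() for t in tokens):
            tokens.append("nofollow")
        kept = [(k, v) for k, v in attrs if k != "rel"] + [("rel", " ".join(tokens))]
        parts = [k if v is None else f'{k}="{escape(v, quote=True)}"' for k, v in kept]
        self.out.append("<a " + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/> passes through

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html: str) -> str:
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.out)


print(add_nofollow('Nice! <a href="http://spam.example/">cheap pills</a><br/>'))
# Nice! <a href="http://spam.example/" rel="nofollow">cheap pills</a><br/>
```

Rewriting at display or save time means old comments get the flag too, without anyone having to mark up each link by hand.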
This does not solve every problem, but it's a great start. Make sure you know how your user-generated content system handles
this. In most cases, a software update will make the change for you.
Is This Spamming And Will Google Block Me?
There's another problem with the spamming crowd. When you fight search engine spam, you start to see the different forms
it can take and, disturbingly, to realize that some of the techniques on your legitimate site look similar. You have to
wonder: Will Google block me for my search engine optimization techniques?
This happened recently to BMW's corporate site. Its webmaster, dissatisfied with the site's position when web users
searched for several terms (such as "new car"), created and posted a gateway page: a page optimized with text that then
redirects searchers to an often graphics-heavy page.
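A gateway page can send visitors on with a meta refresh tag. If you want to audit your own pages for this, a minimal Python sketch along these lines could flag them. It assumes the redirect uses `<meta http-equiv="refresh">` and will not catch redirects done in JavaScript or on the server; the function and class names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class RefreshFinder(HTMLParser):
    """Record the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        a = {k: (v or "") for k, v in attrs}
        if a.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=https://example.com/".
        _, _, rest = a.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def find_refresh_redirect(page_html: str) -> Optional[str]:
    """Return the URL a meta refresh sends visitors to, or None if there is none."""
    finder = RefreshFinder()
    finder.feed(page_html)
    finder.close()
    return finder.target


gateway = ('<html><head><meta http-equiv="Refresh" '
           'content="0; URL=http://www.example.com/showroom.html"></head>'
           '<body><h1>New car</h1></body></html>')
print(find_refresh_redirect(gateway))  # http://www.example.com/showroom.html
```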
Google found it and, rightly or wrongly, promptly dropped the site's PageRank to zero by hand. For weeks, searches for the
site turned up plenty of spam and dozens of news stories, but to find the actual site it was necessary to drop to the
bottom of the results, which is not easy to do in Googleworld.
This is why you really need to understand what Google counts as search engine spam, and follow its rules even
if everyone else doesn't. Never create a gateway page, particularly one stuffed with spammy text. Instead, use legitimate techniques
like image alternate text and real text on your page. Look for ways to get other pages to point to your site: article
submission, for instance, or directory submission. And keep your content fresh, always.
While duplicated text is often a sign of serious spamming, Google's engineers realize two things: first, the original text
is probably still out there somewhere, and it's unfair to drop that person's rankings along with those who stole it from them;
and second, certain types of duplicated text, like articles or blog entries, are to be expected.
Their answer to the first issue is to credit the site where a particular text was first catalogued as its creator, and to drop
sites that obviously copied from that one down a rank. The second issue is addressed by looking at the other data around the
questionable text; if the entire site appears to be copied, it, too, is dropped. Provided you are not duplicating text on
many websites to fraudulently increase your ranking, you're safe. Ask yourself: are you using the same content on several
sites registered to you in order to maximize your chances of being read? If the answer is yes, this is a bad idea and will
be classified as spamdexing. If your content would not be useful to the average Internet surfer, it is also likely to be
classed as spamdexing.
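As a toy illustration of the "credit the first-catalogued copy" rule, the Python sketch below groups pages by a fingerprint of their normalized text and treats the earliest-seen page in each group as the original. Google's real method is not public; the `CrawledPage` record, its `first_seen` timestamp, and the exact-match fingerprint are all assumptions made for this example, and a scrambled copy would slip past it.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CrawledPage:
    url: str
    first_seen: int  # hypothetical crawl time, e.g. Unix seconds
    text: str


def fingerprint(text: str) -> str:
    """Hash the text after normalizing away case, punctuation and spacing."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def split_originals_and_copies(pages: List[CrawledPage]) -> Tuple[List[str], List[str]]:
    """Credit the earliest-seen page for each text; later duplicates are copies."""
    earliest: Dict[str, CrawledPage] = {}
    for page in pages:
        key = fingerprint(page.text)
        if key not in earliest or page.first_seen < earliest[key].first_seen:
            earliest[key] = page
    original_urls = {p.url for p in earliest.values()}
    copies = sorted(p.url for p in pages if p.url not in original_urls)
    return sorted(original_urls), copies


pages = [
    CrawledPage("http://scraper.example/p1", 200, "Keep your content FRESH, always!"),
    CrawledPage("http://author.example/seo", 100, "Keep your content fresh, always."),
    CrawledPage("http://other.example/x", 150, "Something else entirely."),
]
print(split_originals_and_copies(pages))
```

The scraper's page is caught as a copy even though it was crawled in a different order and changed the capitalization and punctuation, because both the normalization and the "earliest wins" comparison ignore those differences.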
There is a very thin line between search engine optimization and spamdexing, and you should learn exactly where it lies. Start
by understanding hidden/invisible text, keyword stuffing, metatag stuffing, gateway pages, and scraper sites.
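Keyword stuffing, at least, is easy to measure on your own pages. This Python sketch reports what share of a page's words the most frequent terms take up. It is a rough self-check under simple assumptions (no stop-word filtering), the function name is hypothetical, and there is no published Google density threshold to compare the numbers against.

```python
import re
from collections import Counter
from typing import List, Tuple


def keyword_density(text: str, top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    return [(word, count / len(words)) for word, count in counts.most_common(top)]


for word, share in keyword_density("term term paper term papers term limits"):
    print(f"{word}: {share:.0%}")
```

For the stuffed list "term term paper term papers term limits", "term" accounts for four of seven words, about 57 percent, a share no page written for human readers is likely to reach.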
About The Author
Article by Danny Wirken
http://www.chauy.com/2006/07/googles-tag-to-remove-content-spamming/