Spamming immunity for online journals -- why? - Arvind Narayanan's journal

Spamming immunity for online journals -- why? [Jan. 8th, 2007|12:05 pm]
Arvind Narayanan

From Google's spam report page:
Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."
Cloaking is a bannable offense, and indeed if you do it you're certain to get banned sooner or later.

With a couple of exceptions. "Registration required" websites like NYT, and online journals (and conference proceedings) that require payment, like SpringerLink, IEEE Xplore, the ACM portal etc., seem to be able to cloak with impunity. When they see Googlebot, they open up their content to it, and Google happily indexes it. So you search for "how to save the world," and Google gives you a result from one of these sites with the "ten easy steps to save the world." You're delighted. You click on the link, and are greeted with "please pay $$$ to see this article."
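
A minimal sketch of the kind of check described above, assuming the site decides what to serve purely from the User-Agent header. The function names and the article and paywall strings are hypothetical, made up for illustration; this is not any publisher's actual code, and real sites may rely on other signals such as crawler IP ranges.

```python
# Hypothetical sketch of user-agent-based cloaking; not any publisher's real code.
ARTICLE = "Ten easy steps to save the world: ..."
PAYWALL = "Please pay $$$ to see this article."


def is_crawler(user_agent: str) -> bool:
    """Naive check: treat any User-Agent that mentions Googlebot as the crawler."""
    return "googlebot" in user_agent.lower()


def serve(user_agent: str, is_subscriber: bool = False) -> str:
    """Return the full article to crawlers and subscribers, the paywall to everyone else."""
    if is_crawler(user_agent) or is_subscriber:
        return ARTICLE
    return PAYWALL


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"
    print(serve(googlebot))  # the crawler is shown the article
    print(serve(browser))    # an ordinary visitor is shown the paywall
```

The crawler and the ordinary visitor request the same URL and get different responses, which is exactly the behavior the quoted guideline calls cloaking.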

"But wait a second," I hear you say. That isn't really the same thing as spam because the content is the same, it's just restricted access. No difference, I contend. From the point of view of 99.99% of the users who aren't going to pay, it's an equal amount of annoyance. (Normally I'd be able to access Springer and ACM content at least from school with the IP-address-detection thingy, but now that's stopped working too, and I'm too lazy to figure out why. But that's beside the point.)

It'd be in Google's self-interest to stop pissing off its users and remove paid content from its database. In any case, I think cloaking in any form is disgusting and unethical, and I find it surprising that supposedly reputable publishers do it without drawing any comment. I guess the only good thing is that, at least in CS, the whole print medium is going the way of the dodo, and soon all publications will be available on citeseer/arXiv/author homepages/etc.


From: hukuma
2007-01-08 07:41 pm (UTC)
Since our "IP-address-detection thingy" is still working, I find it convenient that Google indexes the ACM & Springer sites. But even for people in your position, I would say that a link to a registration-required article has value, since it tells you that there is an article to be found, and even if you don't want to pay, you can look for other means of getting it (author's web page, library, etc.). The only improvement I can think of is getting rid of the potential deception and adding a note "Subscription required" to the Google link, the same way they do in Google News.
From: arvindn
2007-01-08 07:46 pm (UTC)
The only improvement I can think of is getting rid of the potential deception and adding a note "Subscription required" to the Google link, the same way they do in Google News.

Yeah, I guess I'd be happy with that.
From: skthewimp
2007-01-09 03:25 am (UTC)
or maybe there could be a "subscription/payment required" link on the search page, where google directs you to the springer etc. pages!

when you're doing some kind of "research" and your univ/company is paying for it, you would be much better off having these paid links. and putting it on a separate page won't piss off the rest of the world also!
From: iliada
2007-01-09 01:24 am (UTC)
My guess is that it's all been figured out between Google and the publishers. They make their sites searchable, while in exchange Google does not make their content available as cached pages.
From: normalcyispasse
2007-01-09 01:53 am (UTC)
Hm. Interesting point.

Content-for-pay when otherwise advertised in search results is certainly a duplicitous practice. I wonder if Google can get around this by using cached versions of the articles.
From: arvindn
2007-01-09 02:07 am (UTC)
These pages typically also have the "noarchive" robots meta tag set. Funny thing is, I can't think how Google can build their index without caching the content, so they do cache it but just won't show it to you (i.e., no "cached" link in search results).
From: (Anonymous)
2007-02-01 04:56 am (UTC)

Google spamming by journals, etc.

You're absolutely right. These journal publishers are guilty of black-hat redirects and cloaking, and I report them as spam to Google every opportunity I get. I am a nuclear engineering grad student, so I do a lot of research using the journals that are thus marketed. But I sure as hell don't want to see them tease me for cash when I do a general Google search. Some of these guys are actually putting out legit pages that have a full abstract, which Googlebot sees and returns if it contains your search string. Springer is an example you mention that is OK in my experience. On the other hand, IEEE Xplore is probably the most prolific black-hat redirect offender in existence: they let Google see their full-text papers and then shut everyone else out. Heck, you can't even see a citation or abstract from these clowns. The other "bigtime" thug in this regard is Metapress. I wrote about my experiences with IEEE Xplore here, on an amateur physics message board:


I appreciate the fact that others like yourself are willing to call BS when they see it, and I hope Google puts the kibosh on these cheaters.
