Netflix

Netflix [Oct. 18th, 2006|01:16 pm]
Arvind Narayanan

There's something magical about spying via Netflix. Unlike the fantasy worlds of MySpace and the blogs, it's less a social platform than a practical tool, so the data is exceptionally pure. Netflix allows our tastes to flourish in their full, omnivorous, complex human glory, free of shameful image-management and the high/low divide. Ernest Goes to Camp consorts freely with Citizen Kane. It's like a self-portrait in movie titles: Nowhere else is cultural desire so nakedly on display.
-- Sam Anderson, Slate.

And now you can spy via Netflix too: How To Break Anonymity of the Netflix Prize Dataset.
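For readers wondering how a handful of known ratings could single someone out, here is a rough, simplified illustration. It is not the paper's algorithm, just a minimal sketch under stated assumptions: each record maps a movie to a (rating, day number) pair, every record is scored against the few ratings you already know, rare movies count more (here weighted by 1/log of how many records rated the movie), and a match is declared only when the top score stands well clear of the runner-up. The names `best_match`, `similar`, and `support_sizes`, the tolerances, and the 1.5 threshold are all hypothetical.

```python
import math
import statistics
from typing import Dict, Optional, Tuple

# Hypothetical layout: a record maps movie_id -> (rating 1-5, day number).
Record = Dict[str, Tuple[int, int]]


def support_sizes(records: Dict[str, Record]) -> Dict[str, int]:
    """Count how many records rated each movie."""
    counts: Dict[str, int] = {}
    for rec in records.values():
        for movie in rec:
            counts[movie] = counts.get(movie, 0) + 1
    return counts


def similar(aux: Tuple[int, int], cand: Tuple[int, int],
            rating_tol: int = 1, day_tol: int = 14) -> bool:
    """True if rating and date are both within the (assumed) tolerances."""
    return abs(aux[0] - cand[0]) <= rating_tol and abs(aux[1] - cand[1]) <= day_tol


def best_match(aux: Record, records: Dict[str, Record],
               threshold: float = 1.5) -> Optional[str]:
    """Return the record ID that clearly stands out as the best match, or None."""
    if len(records) < 2:
        return None
    supp = support_sizes(records)
    scores: Dict[str, float] = {}
    for user, rec in records.items():
        score = 0.0
        for movie, obs in aux.items():
            if movie in rec and similar(obs, rec[movie]):
                # Rarer movies get more weight; clamp support to >= 2 so log() > 0.
                score += 1.0 / math.log(max(supp[movie], 2))
        scores[user] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    spread = statistics.pstdev(list(scores.values()))
    if spread <= 1e-12:
        return None  # every record scored the same: nobody stands out
    gap = scores[ranked[0]] - scores[ranked[1]]
    # Only accept the top candidate if it beats the runner-up by a wide margin.
    return ranked[0] if gap / spread >= threshold else None
```

The gap-over-spread test is the important design choice: a high score alone is not enough, because in a large dataset many records will partially match popular movies, so the sketch only reports a match when one record is an outlier.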

Things are quite murky right now. They're going to get murkier before they can get clearer though. Watch this space for updates.

From: floopilot
2006-10-18 08:25 pm (UTC)
Very interesting.
And I must say, very speedy research - you churned this out from the time the data was released to now??
From: arvindn
2006-10-18 09:23 pm (UTC)
It's worse -- I started thinking about it only when I read the Slashdot story about someone improving Cinematch, which was last Monday. I started looking at the data the day after that, and we uploaded the paper yesterday. So exactly one week!

It's been hell. We constantly felt we were going to get beaten to it, so I worked essentially round the clock, keeping myself awake with coffee. Yesterday at one point I was so stressed out I started randomly yelling at someone. Today I woke up at 1pm :) The feeling of relief is great.

Vitaly was awesome. He has about ten threads of research going on at any one point, but he managed to shove everything around in order to work on this.

From: ephermata
2006-10-18 08:43 pm (UTC)
Great stuff. Thanks for posting it.
From: (Anonymous)
2006-10-18 09:11 pm (UTC)
Thanks. It is a very preliminary draft, so if you have any comments I'd love to hear them.
From: arvindn
2006-10-18 09:16 pm (UTC)
Sorry, that was me above.
From: (Anonymous)
2006-10-19 04:03 am (UTC)
You say that a user could easily be identified if you somehow had access to a few (6, I think) of their ratings for movies outside the top 100 (or 500) rated movies.
What fraction of the users even have something like 50 movies rated from this set?
From: arvindn
2006-10-19 05:19 am (UTC)
Off the top of my head, a random movie rated by a random customer has a probability of well over 50% of being outside the top 500.

Thanks for pointing that out, we'll put the stats on this into the paper.
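The figure above is an off-the-cuff estimate. Both numbers the commenter asks about could be computed from the raw data along these lines. This is a minimal sketch, assuming the ratings are available as (user_id, movie_id) pairs and that "top N" means the N movies with the most ratings; the function name `tail_stats` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def tail_stats(ratings: List[Tuple[str, str]], top_n: int, k: int) -> Tuple[float, float]:
    """Return (share of ratings outside the top_n most-rated movies,
    share of users with at least k such ratings).

    `ratings` holds one (user_id, movie_id) pair per rating.
    """
    if not ratings:
        raise ValueError("no ratings")
    per_movie = Counter(movie for _, movie in ratings)
    # Ties at the top_n cutoff are broken by first appearance in `ratings`.
    top = {movie for movie, _ in per_movie.most_common(top_n)}
    # Count, per user, how many of their ratings fall outside the top_n set.
    tail_per_user = Counter(user for user, movie in ratings if movie not in top)
    users = {user for user, _ in ratings}
    share_ratings = sum(tail_per_user.values()) / len(ratings)
    share_users = sum(1 for u in users if tail_per_user[u] >= k) / len(users)
    return share_ratings, share_users
```

Calling `tail_stats(ratings, top_n=500, k=6)` would give the share of ratings outside the top 500 and the share of users with at least six such ratings.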
