Comments: 
Can we sample log N uniform bits in constant time? Or just one bit?
oh, i was actually assuming that you can generate a uniform sample from the interval [0, 1] in constant time. i guess this is equivalent to log N uniform bits.
i'm sure it's provably impossible with 1 bit: even if all the weights are equal, you still need to choose from among N alternatives, which takes log N bits of randomness. thanks for bringing that up. i will edit the post.
the spirit of the puzzle is that computation time is the scarce resource, rather than random bits.
Edited at 2009-07-24 06:14 pm (UTC)
From: (Anonymous) 2009-07-25 07:42 am (UTC)
Clarification

Could you specify what you mean by a sample? A sample of size N with unique objects, where order matters, or a sample of size N with duplicates allowed and the number of duplicates proportional to the prob. distribution x_n?
From: arvindn 2009-07-25 07:51 am (UTC)
Re: Clarification

firstly the sample is not of size N. rather, N is the number of objects. the number of samples is unspecified; but each time the algorithm is called it must return a new, independent sample in expected constant time.
if that seems unclear, another way to phrase the problem is as follows:
write an algorithm to draw M independent samples from the prob. distribution [x_1, x_2 ... x_N] in time O(M + N).
secondly, as implied by the term independent samples, each sample is to be chosen independently of the others, which means there will (probably) be duplicates. i.e., it is "sampling with replacement" rather than "sampling without replacement."
hope that helps.
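For concreteness, the required interface can be sketched with a naive baseline (a Python sketch with hypothetical names; this linear-scan version takes O(M * N) per batch, which is exactly what the puzzle disallows):

```python
import random

def draw_samples(x, m):
    """Naive baseline: draw m independent samples (with replacement)
    from the weights x = [x_1, ..., x_N]. Each draw does a linear
    scan, so the total cost is O(m * N) -- the puzzle asks for
    O(m + N) instead."""
    total = sum(x)  # allow unnormalized weights
    out = []
    for _ in range(m):
        u = random.random() * total
        acc = 0.0
        for i, w in enumerate(x):
            acc += w
            if u < acc:
                out.append(i)
                break
        else:
            out.append(len(x) - 1)  # guard against float round-off
    return out
```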
Edited at 2009-07-25 08:08 am (UTC)
From: eightbit 2009-07-26 03:34 pm (UTC)
Re: Clarification

Still looking for a solution? When looking for sublinear-time algorithms for problems, it seems like the model of computation can easily cloud the thought process that the puzzle is meant to invoke. Requirements involving expected-constant time have a similar effect.
I expect the natural first thought invoked by the problem to be the following. We partition [0,1] into N subintervals, each with length proportional to x_i. Then, given a random sample from [0,1], we only need to look up which subinterval it falls into, and then output the index of that subinterval.
Given that we are ignoring model issues, and allowed to sample reals, I'll assume that we are only meant to do limited things with the reals, maybe only comparison. A naive approach then tries to look up the interval using comparison-based search on a binary tree. This takes log N time, however.
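The comparison-based lookup described here can be sketched with the standard library's bisect (O(N) preprocessing, O(log N) per sample; illustrative only, not the constant-time solution the puzzle asks for):

```python
import bisect
import itertools
import random

def make_sampler(x):
    """Preprocess in O(N): build the cumulative sums c_1..c_N.
    Each call to sample() then costs O(log N) via binary search."""
    cum = list(itertools.accumulate(x))
    total = cum[-1]
    def sample():
        u = random.random() * total
        # index of the first cumulative sum strictly greater than u
        return bisect.bisect_right(cum, u)
    return sample
```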
If we stick to the comparison idea, though, the next step should be to exploit the fact that we only need expected constant time. This suggests something like Huffman coding. But if we consider the uniform distribution on N elements, it is clear that this won't help.
So is the trick meant to extend from here? Or have I assumed something incorrect? I have seen several puzzles in the past that get to this point and then introduce an ideal hash function with O(1) lookup. I'm hoping there's more of a trick than that!
From: arvindn 2009-07-26 08:33 pm (UTC)
Re: Clarification

"When looking for sublinear-time algorithms for problems, it seems like the model of computation can easily cloud the thought process that the puzzle is meant to invoke." yes, this is absolutely correct. there are hidden log factors everywhere. i'm assuming that you can also do arithmetic operations on the reals at unit cost. i think this is the only sensible model: with comparisons alone, you cannot even "evaluate" a real in constant time, so the preprocessing cannot run in time O(N).
i feel confident that there is no way to "exploit" the arbitrary precision of the reals to do more than a constant number of logical operations at once, because you need more than constant time to "set up" the reals to operate on. in any case, my solution does not "cheat" in this manner (or exploit the computation model in any way).
finally, while writing up my solution i realized i was very naive in thinking it might be easy. took more than 2 hours and i'm still not done. so i hope that it will be informative. btw here is a copy of my current draft. i need to elaborate a few things before i can present it to a wider audience, but i hope things are clear to you. if you could quickly see if you can spot any holes, that would be much appreciated! https://docs.google.com/Doc?id=dgwzqjjp_278g93mhvgj
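For reference, the textbook technique that achieves O(N) preprocessing and O(1) worst-case time per sample under this unit-cost real model is Walker's alias method; a minimal sketch follows (this is not necessarily the approach taken in the linked draft):

```python
import random

def build_alias(x):
    """Walker's alias method: O(N) preprocessing. Each cell i holds a
    probability prob[i] of returning i itself, else its alias[i]."""
    n = len(x)
    total = sum(x)
    scaled = [w * n / total for w in x]  # mean is exactly 1
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob = [0.0] * n
    alias = [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # the large cell donates mass to fill the small cell
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers have mass ~1 up to round-off
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """O(1) per draw: pick a cell uniformly, then flip a biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```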
From: ext_200976 2009-07-25 09:46 pm (UTC)
O(N)+O(M * N/2) for me

I don't think I know how to do this. The best I can do: rand_element_generator(x_n):
* create a second array [c_1, c_2, ... c_N] with the cumulative probability distribution: c_1 = x_1, c_i = c_(i-1) + x_i.
* if necessary, normalize the cumulative array by dividing each element by c_N.
* return a generator that on each call chooses a random number u on 0.0 .. 1.0, and returns n such that
c_(n1) <= u < c_n
(where c_0 = 0).
For small N, the final step is best done as a linear search, O(N)+O(M * N/2). If N is significant you could do a binary (log N) search, making it O(N) + O(M * log N). In the linear search case you can presort x_n from largest to smallest and relabel or map the output.
I guess I could make a lookup table: make an array where each element A_i, i in 0..999, is the value that rand_element_generator would return if (i/1000) were chosen, i.e. the smallest n such that (i / 1000) < c_n. Now I can:
* generate a random number u on 0.0 .. 1.0
* set i = int( 1000 * u )
* if A_i == A_(i-1) or A_i == A_(i-1)+1, then return A_i. Otherwise linear search between A_(i-1) and A_i.
This last seems fiddly and inelegant. If I understand right, this is the underlying problem in the FASTA challenge on the 'programming language shootout': http://shootout.alioth.debian.org/u32q/benchmark.php?test=fasta&lang=gcc&id=4 They seem to generally use linear search, perhaps because N is small.
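The bucket-table idea above can be sketched roughly as follows (a Python sketch with K = 1000 buckets; note that the fallback search inside a bucket is not O(1) in general, e.g. when many tiny weights crowd into one bucket):

```python
import bisect
import itertools
import random

K = 1000  # number of buckets, as in the comment above

def build_buckets(x):
    """O(N + K) preprocessing: cum[] is the normalized cumulative
    distribution; A[i] is the index the sampler returns for u = i/K."""
    cum = list(itertools.accumulate(x))
    total = cum[-1]
    cum = [c / total for c in cum]
    A = [bisect.bisect_right(cum, i / K) for i in range(K)]
    return cum, A

def bucket_sample(cum, A):
    u = random.random()
    i = int(K * u)
    # start from the bucket's precomputed index, then scan forward;
    # usually only a step or two when no bucket holds many weights
    n = A[i]
    while cum[n] <= u:
        n += 1
    return n
```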
From: ext_200978 2009-07-25 10:32 pm (UTC)
one possible solution

Maintain a family of lg N hash functions, e.g. [h_0, h_1, ..., h_{lg N - 1}], such that h_i(obj) -> [0,1]. Assume that we can generate such a hash map using some property of the given set of N objects.
Now generate all subsets of these lg N hash functions (which would be N of them) and assign them to each of the cells of our original probability distribution array, say prob_distrib = [x_1, x_2, ... x_N].
Now take each of the N objects one by one, apply the corresponding set of hash functions to it, and maintain a reverse map from hash functions to these objects in a different hash table, such that an object is assigned to all those hash-function keys for which h_i(obj) >= prob_distrib[x_i]. So basically we are mapping these N objects to a set of lg N hash functions. This step runs in O(N) time with a large hidden constant, under the assumption that not all the lg N bits will be set for each cell of the array prob_distrib.
As we can sample lg N bits in O(1) time, select the hash functions with the corresponding bits set, and then return the corresponding objects mapped to these hash functions; this would be our independent sample upon each call to generate these lg N bits.
My two cents,
From: (Anonymous) 2009-07-26 07:55 pm (UTC)
don't know CS, but from a probabilist's point of view

So, how about the inverse CDF method. It is known that if X is a random variable, then F(X) is uniformly distributed (where F is the cumulative distribution function of X). Thus sampling from the distribution of X is equivalent to computing F^{-1}(U), where U is a uniform [0,1] random variable. Now, here is where I don't know if my solution answers your question. Since X is discrete, F^{-1} is easily computable. I don't know if this fits within the O(N) preprocessing allowed or whether we need more than O(N) steps... Anyways, look forward to your answer...