First of all I would like to say that I am not a student trying to get his homework solved. Rather I am a software developer trying to implement a game that has a probabilistic side... In case it makes any difference .

The abstraction of the problem is pretty simple and hope someone more seasoned in statistics can point me in the right direction:

**Imagine an urn with N balls, of which B balls are white and N-B are black. I draw M balls out of the urn with replacement, however, every time I get a white ball i paint it black (or replace the white ball with a black one). I want to express the probability that I have exactly X white balls remaining in the urn (obviously 0 <= X <= B) after the M trials.**

If it makes any difference B is small (<100), N is very large (N>10^5) and M/N is in (0, 1). So limits could be applied if needed..

After a bit of wiki reading the problem seems very close to a Hypergeometric distribution but in my case the probabilities of drawing a white/black balls does not change in the same way.

In my case the probability for drawing white ball goes: B/N -> (B-1)/N -> ... -> 1/N -> 0 when drawing a white ball, and remains constant when drawing a black one. In the example of the hypergeometric distribution B/N -> (B-1)/(N-1) -> ... -> 0/(N-B) if I draw white and B/N -> B/(N-1) -> ... -> B/(N-M) if i draw black. But experimentally I see the probability distribution I'm looking for has a shape similar to a hypergeometric probability distribution..

Remembering my high school days I could calculate the average number of balls as a function of M/N and verified this by simulation. Also through simulation I plotted the probability distribution, and created an empirical model that i use now. But it could be way cooler to have some sort of a algebraic formula.

I also tried expressing the problem as a bayesian network where i can compute probabilities faster than by simulating. It works quite well but does not give me an algebraic function...

Another approach I think is feasible is a ``brute" counting all possible draw sequences. Each ball can be consider a digit in base N and the sequence of M draws results in a number in base N with M digits => I have ~N^M possible draw sequences. Imagine also the digits given to the white balls are numbered 1 to B. The problem is that I can not count all the numbers between [1, N^M) that have any X digits in the set of the first B digits, but not any of the rest of the (B-X) digits. Do you think this is a better approach?

Needles to say, any advice is greatly appreciated.

Thank you,

Radu