Np_random_rand

I bumped into a weird behaviour of the numpy.random.rand function: the resulting matrix is random, but it does not seem to be uniform, as the documentation says it should be.

I ran a test and it failed unexpectedly:

def testInRangeAroundDefault(self):
    randMat = np.asarray(np.random.rand(20,30,3)*255, dtype=np.uint8)
    colMin = np.asarray([0, 0, 0], dtype=np.uint8)
    colMax = np.asarray([128, 128, 128], dtype=np.uint8)

    inranges = cv.inRange(randMat, colMin, colMax)
    nuOfInRange = np.count_nonzero(inranges)
    totCells = randMat.shape[0]*randMat.shape[1]

    cv.imwrite(DEBUG_PATH+'/test_rand.bmp', randMat)
    cv.imwrite(DEBUG_PATH+'/test_randFiltered.bmp', inranges)
    self.assertAlmostEqual(nuOfInRange, totCells/2, delta=100, msg=f'Bad inRangeAround or bad luck( {totCells/2} vs {nuOfInRange}')

The idea behind the code is to check if the pixel value is greater or smaller than the “middle pixel”.

And it consistently resulted in an error:

self.assertAlmostEqual(nuOfInRange, totCells/2, delta=100, msg=f'Bad inRangeAround or bad luck( {totCells/2} vs {nuOfInRange}')
AssertionError: 83 != 300.0 within 100 delta (217.0 difference) : Bad inRangeAround or bad luck( 300.0 vs 83

I thought something was wrong with my code, so I opened GIMP to check the images:

  • Random:

random colorful noisy image

  • Filtered with inRange:

little number of white pixels

  • And thresholded by half (taken from another round, but distribution is the same):

same picture as inrange

Bonus points to whoever knows what the key to the solution is.

Solution

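The key is independence. Every value is drawn independently, and inRange marks a pixel only when all three of its channels fall inside the range. So the test condition is not “how many items in the array are below half max” but rather “how many pixels have their Red AND Green AND Blue channels below half max simultaneously”, which has a probability of 0.5*0.5*0.5 = 0.125. For the 20x30 image that is about 600/8 = 75 pixels rather than 300, which agrees with the 83 reported by the failed assertion.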
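The effect can be checked without OpenCV. The snippet below is a minimal sketch that assumes the per-pixel rule of inRange can be imitated in plain NumPy by requiring every channel to lie between the bounds (inclusive); the seed, the number of images and the variable names are my own choices, not part of the original test.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

# Same construction as in the test, repeated over many images to reduce noise.
n_images = 2000  # assumption: any reasonably large number works
rand_mats = (rng.random((n_images, 20, 30, 3)) * 255).astype(np.uint8)

col_min = np.array([0, 0, 0], dtype=np.uint8)
col_max = np.array([128, 128, 128], dtype=np.uint8)

# Per-channel check: is each individual value inside [col_min, col_max]?
channel_ok = (rand_mats >= col_min) & (rand_mats <= col_max)

# Per-pixel check: all three channels must pass at the same time.
pixel_ok = channel_ok.all(axis=-1)

channel_fraction = channel_ok.mean()
pixel_fraction = pixel_ok.mean()
mean_hits_per_image = pixel_ok.sum(axis=(1, 2)).mean()

print(f"per-channel fraction:      {channel_fraction:.3f}")
print(f"per-pixel fraction:        {pixel_fraction:.3f}")
print(f"mean hits per 20x30 image: {mean_hits_per_image:.1f}")
```

The per-channel fraction should come out close to one half and the per-pixel fraction close to one eighth, so a test that expects totCells/2 hits is comparing against the wrong quantity; totCells/8 would be the matching expectation.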

Thanks to ChatGPT for helping me with it, although the answer it gave claims that the expected share of filtered pixels is 50%.

Q: How much of the array will be greater than [0.5,0.5,0.5] if it is generated with np.random.rand(100,100,3)?

A: If we generate a 100x100x3 array using np.random.rand(100,100,3), each element of the array will be a random number between 0 and 1. The probability of a random number being greater than 0.5 is 0.5, since the numbers are uniformly distributed between 0 and 1.

To find the proportion of the array that will have values greater than [0.5,0.5,0.5], we can compare each element of the array to [0.5,0.5,0.5] and count the number of elements that are greater. Since each element is independent and identically distributed, we can use the binomial distribution to model the number of elements that are greater than [0.5,0.5,0.5]. The probability of success (i.e., an element being greater than [0.5,0.5,0.5]) is 0.5, and the number of trials is 100x100x3=30,000.

The expected number of elements that are greater than [0.5,0.5,0.5] is therefore:


> E(X) = np = 15000

where n=30000 and p=0.5.

The standard deviation of X is:


> SD(X) = sqrt(np(1-p)) = sqrt(7500) ≈ 86.6

Using Chebyshev's inequality, we can bound the proportion of elements that are greater than [0.5,0.5,0.5] with:


> P(|X-E(X)| >= kSD(X)) <= 1/k^2

where k is some positive number. For example, if we choose k=3, we get:


> P(|X-E(X)| >= 3SD(X)) <= 1/9

So the proportion of elements that are greater than [0.5,0.5,0.5] is:


> P(X > [0.5,0.5,0.5]) = X/n ≈ 0.5

Therefore, we expect approximately 50% of the elements in the array to be greater than [0.5,0.5,0.5].
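The answer above counts single array elements, while the question (and the inRange test) is about whole pixels. As a rough sketch of the difference, the snippet below computes both fractions for a 100x100x3 array and the binomial mean and standard deviation for the per-pixel count, assuming n = 10000 pixels and p = 0.5^3. It uses Generator.random instead of np.random.rand, and the seed and names are my own.

```python
import math

import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for repeatability
arr = rng.random((100, 100, 3))      # uniform floats in [0, 1)

# Element-wise comparison; [0.5, 0.5, 0.5] is broadcast over the last axis.
above = arr > np.array([0.5, 0.5, 0.5])

element_fraction = above.mean()             # what the quoted answer estimates
pixel_fraction = above.all(axis=-1).mean()  # all three channels above 0.5

# Binomial model for the number of pixels with all channels above 0.5.
n_pixels = 100 * 100
p = 0.5 ** 3
expected = n_pixels * p
sd = math.sqrt(n_pixels * p * (1 - p))

print(f"element fraction: {element_fraction:.3f}")
print(f"pixel fraction:   {pixel_fraction:.3f}")
print(f"expected pixels:  {expected:.0f} (sd {sd:.1f})")
```

So the 50% figure holds for individual elements, but the share of pixels whose three channels all exceed 0.5 is about 12.5%.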