maths/gaussian.py is wrong #2212

stkain · 2020-07-18T01:48:16Z

Hi,

the formula is wrong.

return 1 / sqrt(2 * pi * sigma ** 2) * exp(-((x - mu) ** 2) / 2 * sigma ** 2)
                                                              ^^^^^^^^^^^^^^^
                                                              this actually multiplies with sigma**2 instead of dividing by sigma**2

Instead, for example:

1 / (sqrt(2 * pi) * sigma) * exp(- 0.5 * ( (x - mu) / sigma     )**2 )

avoids squaring sigma and taking the sqrt again in the factor before the exponential

and first calculating (x-mu) / sigma and then squaring saves you squaring numerator and denominator separately
in the exponential.

Bye,
Stefan

The text was updated successfully, but these errors were encountered:

cclauss · 2020-07-18T05:33:51Z

@QuantumNovice Could we create a doctest with one of these to prove this claim:
https://docs.scipy.org/doc/scipy/reference/search.html?q=Gaussian

spamegg1 · 2020-07-25T14:40:37Z

Defining it as the OP did, gives (in Python 3.8.2 shell) exactly the same results as in the docstring of maths/gaussian.py:

>>> from numpy import exp, pi, sqrt
>>> def gaussian(x, mu: float = 0.0, sigma: float = 1.0) -> int:
	return 1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)**2)

>>> gaussian(1)
0.24197072451914337
>>> gaussian(24)
3.342714441794458e-126
>>> import numpy as np
>>> x = np.arange(15)
>>> gaussian(x)
array([3.98942280e-01, 2.41970725e-01, 5.39909665e-02, 4.43184841e-03,
       1.33830226e-04, 1.48671951e-06, 6.07588285e-09, 9.13472041e-12,
       5.05227108e-15, 1.02797736e-18, 7.69459863e-23, 2.11881925e-27,
       2.14638374e-32, 7.99882776e-38, 1.09660656e-43])
>>> gaussian(15)
5.530709549844416e-50
>>> gaussian([1,2, 'string'])
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    gaussian([1,2, 'string'])
  File "<pyshell#7>", line 2, in gaussian
    return 1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)**2)
TypeError: unsupported operand type(s) for -: 'list' and 'float'
>>> gaussian('hello world')
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    gaussian('hello world')
  File "<pyshell#7>", line 2, in gaussian
    return 1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)**2)
TypeError: unsupported operand type(s) for -: 'str' and 'float'
>>> gaussian(10**234)
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    gaussian(10**234)
  File "<pyshell#7>", line 2, in gaussian
    return 1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)**2)
OverflowError: (34, 'Numerical result out of range')
>>> gaussian(10**-326)
0.3989422804014327
>>> gaussian(2523, mu=234234, sigma=3425)
0.0

None of the search results linked by @cclauss calculates the same function.

I was able to find an implementation from the authors of the famous Algorithms textbooks (Sedgewick et al):
https://introcs.cs.princeton.edu/python/22module/gaussian.py.html

This also gives exactly the same results as the docstring of maths/gaussian.py (with the exception of supporting NumPy arrays of course, and the Overflow error):

>>> import math
>>> def pdf(x, mu=0.0, sigma=1.0):
    x = float(x - mu) / sigma
    return math.exp(-x*x/2.0) / math.sqrt(2.0*math.pi) / sigma

>>> pdf(1)
0.24197072451914337
>>> pdf(24)
3.342714441794458e-126
>>> pdf(15)
5.530709549844416e-50
>>> pdf([1,2, 'string'])
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    pdf([1,2, 'string'])
  File "<pyshell#28>", line 2, in pdf
    x = float(x - mu) / sigma
TypeError: unsupported operand type(s) for -: 'list' and 'float'
>>> pdf('hello world')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    pdf('hello world')
  File "<pyshell#28>", line 2, in pdf
    x = float(x - mu) / sigma
TypeError: unsupported operand type(s) for -: 'str' and 'float'
>>> pdf(10**234)
0.0
>>> pdf(10**-326)
0.3989422804014327
>>> pdf(2523, mu=234234, sigma=3425)
0.0

So... maths/gaussian.py seems correct, but the OP might be right about the performance (unnecessary squaring and square-rooting).

cclauss · 2020-07-25T20:05:38Z

Thanks for the explanation. Should we close? @stkain, your sense?

stkain · 2020-07-26T00:56:33Z

Hello folks,

@spamegg1 's calls gave the same results because sigma was never specified (defaulting to 1.0)

Try something like:

import numpy as np 
import matplotlib.pyplot as plt 

def wrong_gaussian(x, mu, sigma): # wrong gaussian from math/gaussian.py
    return 1 / sqrt(2 * pi * sigma ** 2) * exp(-((x - mu) ** 2) / 2 * sigma ** 2) 

def gaussian(x, mu, sigma): # propper gaussian
    return 1 / (sqrt(2 * pi) * sigma) * exp(- 0.5 * ( (x - mu) / sigma     )**2 ) 

x = np.linspace(-5, 5, 1024) 
plt.figure(num=1, figsize=(10, 5)) 
plt.clf() 
plt.plot(x, gaussian(x, 0, 1), 'k', label="normal distribution with sigma=1.0") 
plt.plot(x, gaussian(x, 0, 2), 'b', label="proper gaussian with sigma=2.0") 
plt.plot(x, wrong_gaussian(x, 0, 2), 'r', label="wrong gaussian with sigma=2.0") 
plt.legend() 
plt.xlabel('x') 
plt.ylabel('pdf') 
plt.grid()

in ipython or jupyter
and you will notice the difference.
Larger sigma should make the bell curve flatter and broader, not narrower.
That is why sigma must be in the denominator, not the numerator.

Bye,
Stefan

stkain · 2020-07-26T01:04:09Z

Correction.
The snippets above only work with ipython --pylab (my default :-) )

The prefixes for numpy and pyplot have to be explicitly specified.
So instead of the above, try this:

import numpy as np 
import matplotlib.pyplot as plt 

def wrong_gaussian(x, mu, sigma): # wrong gaussian from math/gaussian.py
    return 1 / np.sqrt(2 * np.pi * sigma ** 2) * np.exp(-((x - mu) ** 2) / 2 * sigma ** 2) 

def gaussian(x, mu, sigma): # propper gaussian
    return 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(- 0.5 * ( (x - mu) / sigma     )**2 ) 

x = np.linspace(-5, 5, 1024) 
plt.figure(num=1, figsize=(10, 5)) 
plt.clf() 
plt.plot(x, gaussian(x, 0, 1), 'k', label="normal distribution with sigma=1.0") 
plt.plot(x, gaussian(x, 0, 2), 'b', label="proper gaussian with sigma=2.0") 
plt.plot(x, wrong_gaussian(x, 0, 2), 'r', label="wrong gaussian with sigma=2.0") 
plt.legend() 
plt.xlabel('x') 
plt.ylabel('pdf') 
plt.grid()

stkain · 2020-07-26T01:54:16Z

The correct formula for the gaussian:

The formula, as implemented in math/gaussian.py:

Please notice the difference in the exponential:
the sigma squared in the denominator of the proper definition versus the sigma squared as a multipying factor in the wrong definition.

spamegg1 · 2020-07-26T05:51:53Z

Hmm OK. In that case the docstring provides insufficient testing. It should specify some different sigma values. The only case with a specified sigma value returns 0.

Let me compare maths/gaussian.py against Sedgewick's implementation again:

>>> import math
>>> def pdf(x, mu=0.0, sigma=1.0):
    x = float(x - mu) / sigma
    return math.exp(-x*x/2.0) / math.sqrt(2.0*math.pi) / sigma

>>> from numpy import exp, pi, sqrt
>>> def gaussian(x, mu: float = 0.0, sigma: float = 1.0) -> int:
	return 1 / sqrt(2 * pi * sigma ** 2) * exp(-((x - mu) ** 2) / 2 * sigma ** 2)

>>> pdf(1, 2, 3)
0.12579440923099774
>>> gaussian(1, 2, 3)
0.001477282803979336

They are indeed different!

Adding parentheses around 2 * sigma ** 2 fixes it:

>>> def gaussian(x, mu: float = 0.0, sigma: float = 1.0) -> int:
	return 1 / sqrt(2 * pi * sigma ** 2) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

>>> gaussian(1, 2, 3)
0.12579440923099774

Strangely, for larger values they are the same (I guess the difference is further down the decimals):

>>> pdf(10, mu=10, sigma=10)
0.03989422804014327
>>> gaussian(10, mu=10, sigma=10)
0.03989422804014327
>>> pdf(10, mu=10, sigma=100)
0.003989422804014327
>>> gaussian(10, mu=10, sigma=100)
0.003989422804014327

OK, so I propose that the implementation in maths/gaussian.py should just add parentheses around the 2 * sigma ** 2 (then it gives the correct answer), and add some docstring cases with small sigma values that are different than 1.

If the OP would like to make a PR that is fine. @cclauss You can close it once someone makes a PR.

spamegg1 · 2020-08-01T14:53:34Z

#2249 was merged in, should this be closed now?

stkain · 2020-08-06T16:52:21Z

Hi,

thx. Yes, the message can be closed.

Bye,
Stefan

cclauss added the help wanted label Jul 18, 2020

spamegg1 mentioned this issue Jul 29, 2020

Correcting the Gaussian Formula #2249

Merged

14 tasks

stkain closed this as completed Aug 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

maths/gaussian.py is wrong #2212

maths/gaussian.py is wrong #2212

stkain commented Jul 18, 2020

cclauss commented Jul 18, 2020 •

edited

Loading

Uh oh!

spamegg1 commented Jul 25, 2020 •

edited

Loading

Uh oh!

cclauss commented Jul 25, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

spamegg1 commented Jul 26, 2020 •

edited

Loading

Uh oh!

spamegg1 commented Aug 1, 2020

Uh oh!

stkain commented Aug 6, 2020

Uh oh!

Uh oh!

maths/gaussian.py is wrong #2212

maths/gaussian.py is wrong #2212

Comments

stkain commented Jul 18, 2020

cclauss commented Jul 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

spamegg1 commented Jul 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cclauss commented Jul 25, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

stkain commented Jul 26, 2020

Uh oh!

spamegg1 commented Jul 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

spamegg1 commented Aug 1, 2020

Uh oh!

stkain commented Aug 6, 2020

Uh oh!

cclauss commented Jul 18, 2020 •

edited

Loading

spamegg1 commented Jul 25, 2020 •

edited

Loading

spamegg1 commented Jul 26, 2020 •

edited

Loading