statistics - Calculate how a value differs from the average of values using the Gaussian Kernel Density (Python)
I use this code to generate the values on which I calculate the Gaussian kernel density:
    from random import randint

    x_grid = []
    for i in range(1000):
        x_grid.append(randint(0, 4))
    print(x_grid)
This code calculates the Gaussian kernel density:
    from random import randint

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats.distributions import norm
    from statsmodels.nonparametric.kde import KDEUnivariate

    def kde_statsmodels_u(x, x_grid, bandwidth=0.2, **kwargs):
        """Univariate kernel density estimation with statsmodels"""
        kde = KDEUnivariate(x)
        kde.fit(bw=bandwidth, **kwargs)
        return kde.evaluate(x_grid)

    # grid we'll use for plotting
    x_grid = []
    for i in range(1000):
        x_grid.append(randint(0, 4))
    print(x_grid)

    # draw points from a bimodal distribution in 1D
    np.random.seed(0)
    x = np.concatenate([norm(-1, 1.).rvs(400),
                        norm(1, 0.3).rvs(100)])
    pdf_true = (0.8 * norm(-1, 1).pdf(x_grid)
                + 0.2 * norm(1, 0.3).pdf(x_grid))

    # plot the kernel density estimate
    fig, ax = plt.subplots(1, 2, sharey=True, figsize=(13, 8))
    fig.subplots_adjust(wspace=0)
    pdf = kde_statsmodels_u(x, x_grid, bandwidth=0.2)
    ax[0].plot(x_grid, pdf, color='blue', alpha=0.5, lw=3)
    ax[0].fill(x_grid, pdf_true, ec='gray', fc='gray', alpha=0.4)
    ax[0].set_title("kde_statsmodels_u")
    ax[0].set_xlim(-4.5, 3.5)
    plt.show()
All the values in the grid are between 0 and 4. If I receive a new value of 5, I want to calculate how that value differs from the average of the values and assign to it a score between 0 and 1 (to set a threshold on).
So if I receive a new value of 5 its score should be close to 0.90, while if I receive a new value of 500 its score should be close to 0.0.
How can I do that? Is my function to calculate the Gaussian kernel density correct, or is there a better way/library to do it?
*UPDATE*

I read this example in a paper. The weight of a washing machine is typically around 100 kg. Vendors often use the kg unit to refer to the capacity instead (for example, 9 kg). For a human it is easy to understand that 9 kg is the capacity and not the total weight of the washing machine. We can "fake" this form of intelligence without deep language understanding, by instead modeling the distribution of values over the training data for each attribute.
For a given attribute a (the weight of a washing machine, for example), let V_a = {v_a1, v_a2, ..., v_an} (|V_a| = n) be the set of values of attribute a for the corresponding products in the training data. If I find a new value v that is intuitively "close" to (the distribution estimated from) V_a, then I should feel more confident assigning that value to the attribute (for example, as the weight of a washing machine).
One idea is to measure by how many standard deviations the new value v differs from the average of the values in V_a, but a better one is to model a (Gaussian) kernel density on V_a and express the support at the new value v as the density at that point:

    S(c.s.v, V_a) = (1/Z) * Σ_k exp(−(v − v_ak)² / (2σ²_ak))
where σ²_ak is the variance of the kth Gaussian and Z is a constant that makes sure S(c.s.v, V_a) ∈ [0, 1]. How can I obtain this in Python using the statsmodels library?
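To make the formula concrete, here is a minimal sketch of evaluating that kernel sum directly with NumPy rather than through statsmodels; the function name support_score, the single shared bandwidth sigma, and the choice Z = n are my assumptions for illustration only:

    import numpy as np

    def support_score(v, values, sigma=0.2):
        """Evaluate S(v, V_a) = (1/Z) * sum_k exp(-(v - v_ak)^2 / (2*sigma^2)).

        One bandwidth sigma is shared by all kernels, and Z = n, which keeps
        the score in [0, 1] since each kernel term is at most 1.
        """
        values = np.asarray(values, dtype=float)
        kernels = np.exp(-(v - values) ** 2 / (2.0 * sigma ** 2))
        return kernels.sum() / len(values)

    values = [1, 2, 3] * 100
    print(support_score(2.0, values))    # ~0.33: only the matching third of the data contributes
    print(support_score(500.0, values))  # essentially 0: far outside the data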
*UPDATE 2*

An example of the data... but I think it is not very important... it was generated with this code:
    from random import randint

    x_grid = []
    for i in range(1000):
        x_grid.append(randint(1, 3))
    print(x_grid)
[2, 2, 1, 2, 2, 3, 1, 1, 1, 2, 2, 2, 1, 1, 3, 3, 1, 2, 1, 3, 2, 3, 3, 1, 2, 3, 1, 1, 3, 2, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1, 2, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 2, 1, 2, 3, 3, 3, 3, 2, 1, 3, 2, 2, 1, 3, 3, 1, 2, 1, 3, 2, 3, 3, 1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 1, 2, 1, 1, 2, 3, 2, 1, 2, 2, 2, 3, 2, 3, 3, 1, 1, 3, 2, 1, 1, 3, 3, 3, 2, 1, 2, 2, 1, 3, 2, 3, 1, 3, 1, 2, 3, 1, 3, 2, 2, 1, 1, 2, 2, 3, 1, 1, 3, 2, 2, 1, 2, 1, 2, 3, 1, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 3, 2, 1, 1, 3, 2, 2, 2, 3, 2, 1, 3, 2, 1, 1, 3, 3, 3, 2, 1, 1, 3, 2, 1, 2, 2, 2, 1, 3, 1, 3, 2, 3, 1, 2, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2, 2, 3, 1, 1, 2, 2, 1, 1, 1, 3, 3, 3, 3, 1, 3, 1, 3, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 3, 1, 2, 3, 1, 3, 2, 2, 2, 2, 2, 1, 1, 2, 3, 1, 1, 1, 3, 1, 3, 2, 2, 3, 1, 3, 3, 2, 2, 3, 2, 1, 2, 1, 1, 1, 2, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 3, 3, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 3, 1, 3, 2, 2, 2, 2, 2, 2, 1, 3, 1, 3, 3, 2, 3, 2, 1, 3, 3, 3, 3, 3, 1, 2, 2, 2, 1, 1, 3, 2, 3, 1, 2, 3, 2, 3, 2, 1, 1, 3, 3, 1, 1, 2, 3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 2, 3, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 3, 1, 1, 2, 3, 1, 1, 2, 3, 1, 2, 3, 1, 2, 1, 3, 3, 2, 2, 3, 3, 3, 2, 1, 1, 2, 2, 3, 2, 3, 2, 1, 1, 1, 1, 2, 3, 1, 3, 3, 3, 2, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 1, 3, 2, 1, 3, 3, 2, 1, 1, 3, 1, 3, 1, 2, 2, 1, 3, 3, 2, 3, 1, 1, 3, 1, 2, 2, 1, 3, 2, 3, 1, 1, 3, 1, 3, 1, 2, 1, 3, 2, 2, 2, 2, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 3, 1, 2, 1, 2, 3, 2, 3, 2, 3, 3, 2, 3, 3, 1, 1, 3, 2, 3, 2, 2, 2, 3, 1, 3, 2, 2, 3, 3, 2, 3, 2, 2, 2, 3, 3, 1, 3, 2, 3, 1, 1, 2, 1, 3, 1, 2, 2, 3, 3, 1, 3, 1, 1, 2, 2, 1, 3, 3, 3, 1, 2, 2, 2, 1, 3, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 3, 3, 1, 1, 2, 3, 2, 3, 3, 2, 2, 1, 3, 3, 3, 3, 2, 3, 1, 3, 3, 2, 1, 3, 2, 1, 1, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 2, 3, 3, 3, 2, 1, 3, 1, 1, 1, 1, 3, 1, 2, 3, 3, 3, 2, 3, 1, 2, 2, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 1, 2, 3, 3, 3, 3, 2, 3, 3, 2, 1, 1, 1, 2, 3, 1, 3, 3, 2, 1, 3, 3, 3, 2, 2, 1, 2, 3, 2, 3, 3, 3, 3, 2, 3, 2, 1, 2, 1, 1, 3, 3, 3, 2, 2, 3, 1, 3, 2, 1, 3, 1, 1, 3, 3, 1, 2, 2, 2, 3, 3, 1, 2, 1, 2, 1, 3, 2, 3, 3, 3, 3, 3, 3, 3, 1, 2, 3, 1, 3, 3, 2, 2, 1, 3, 1, 1, 3, 2, 1, 2, 3, 2, 1, 3, 3, 3, 2, 3, 1, 2, 3, 3, 1, 2, 2, 2, 3, 1, 2, 1, 1, 1, 3, 1, 3, 1, 3, 3, 2, 3, 1, 3, 2, 3, 3, 1, 2, 1, 3, 2, 2, 2, 2, 2, 2, 1, 2, 2, 3, 2, 2, 3, 2, 2, 2, 3, 1, 1, 3, 3, 1, 3, 1, 2, 1, 2, 1, 3, 2, 2, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 2, 1, 2, 3, 1, 1, 3, 1, 1, 3, 1, 3, 3, 3, 1, 1, 3, 1, 3, 2, 2, 2, 1, 1, 2, 3, 3, 2, 3, 3, 1, 2, 3, 2, 2, 3, 1, 2, 2, 2, 1, 1, 3, 1, 2, 2, 2, 1, 1, 2, 3, 1, 3, 1, 1, 3, 2, 2, 3, 2, 2, 3, 3, 1, 1, 2, 2, 3, 1, 1, 2, 3, 2, 2, 3, 1, 2, 2, 1, 1, 3, 2, 3, 1, 1, 3, 1, 3, 2, 3, 3, 3, 3, 3, 2, 2, 3, 2, 1, 1, 1, 3, 3, 1, 2, 1, 3, 2, 3, 2, 2, 1, 2, 3, 3, 1, 1, 1, 1, 3, 3, 1, 3, 3, 1, 1, 3, 1, 3, 1, 3, 2, 3, 1, 3, 3, 3, 1, 1, 2, 2, 3, 2, 3, 2, 2, 1, 2, 1, 2, 1, 2, 2, 3, 1, 1, 3, 2, 2, 3, 2, 3, 3, 2, 2, 2, 2, 2, 2, 3, 2, 3, 1, 2, 2, 1, 1, 2, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 2, 2, 2, 1, 1, 2, 1, 3, 1, 1, 1, 2, 3, 3, 2, 3, 1, 3]
This array represents the RAM of the new smartphones on the market... They have 1, 2, or 3 GB of RAM.
This is the kernel density of the data.
*UPDATE*
I tried the code with these values:
[1024, 1, 1024, 1000, 1024, 128, 1536, 16, 192, 2048, 2000, 2048, 24, 250, 256, 278, 288, 290, 3072, 3, 3000, 3072, 32, 384, 4096, 4, 4096, 448, 45, 512, 576, 64, 768, 8, 96]
The values are in MB... Do you think it is working well? I think I must set a threshold.
          100%      cdfv      kdev
    1       42  0.210097  0.499734
    1024    96  0.479597  0.499983
    5000     0  0.000359  0.498885
    2048    36  0.181609  0.499700
    3048     8  0.040299  0.499424
*UPDATE 3*
[256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 512, 512, 512, 256, 256, 256, 512, 512, 512, 128, 128, 128, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 128, 128, 128, 512, 512, 512, 256, 256, 256, 256, 256, 256, 1024, 1024, 1024, 512, 512, 512, 128, 128, 128, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 4, 4, 4, 3, 3, 3, 24, 24, 24, 8, 8, 8, 16, 16, 16, 16, 16, 16, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 512, 512, 512, 256, 256, 256, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 2048, 2048, 2048, 2048, 2048, 2048, 4096, 4096, 4096, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 768, 768, 768, 768, 768, 768, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 512, 512, 512, 256, 256, 256, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 3072, 3072, 3072, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 256, 256, 256, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 512, 512, 512, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 64, 64, 64, 1024, 1024, 1024, 1024, 1024, 1024, 256, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 
1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 64, 64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 128, 128, 128, 576, 576, 576, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 576, 576, 576, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 2048, 2048, 2048, 768, 768, 768, 768, 768, 768, 768, 768, 768, 512, 512, 512, 192, 192, 192, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 384, 384, 384, 448, 448, 448, 576, 576, 576, 384, 384, 384, 288, 288, 288, 768, 768, 768, 384, 384, 384, 288, 288, 288, 64, 64, 64, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 64, 64, 64, 128, 128, 128, 128, 128, 128, 128, 128, 128, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 256, 256, 256, 768, 768, 768, 768, 768, 768, 768, 768, 768, 256, 256, 256, 192, 192, 192, 256, 256, 256, 64, 64, 64, 256, 256, 256, 192, 192, 192, 128, 128, 128, 256, 256, 256, 192, 192, 192, 288, 288, 288, 288, 288, 288, 288, 288, 288, 288, 288, 288, 128, 128, 128, 128, 128, 128, 384, 384, 384, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 32, 32, 32, 768, 768, 768, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 3072, 3072, 3072, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 128, 1024, 1024, 1024, 1024, 1024, 1024, 128, 128, 128, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 3072, 3072, 3072, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 2048, 2048, 2048, 384, 384, 384, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 
1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 128, 128, 128, 256, 256, 256, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 768, 768, 768, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 128, 128, 128, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 64, 64, 64, 64, 64, 64, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 16, 16, 16, 3072, 3072, 3072, 3072, 3072, 3072, 256, 256, 256, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 32, 32, 32, 1024, 1024, 1024, 1024, 1024, 1024, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 32, 32, 32, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 512, 512, 512, 1, 1, 1, 1024, 1024, 1024, 32, 32, 32, 32, 32, 32, 45, 45, 45, 8, 8, 8, 512, 512, 512, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 16, 16, 16, 4, 4, 4, 4, 4, 4, 4, 4, 4, 16, 16, 16, 16, 16, 16, 16, 16, 16, 64, 64, 64, 8, 8, 8, 8, 8, 8, 8, 8, 8, 64, 64, 64, 64, 64, 64, 256, 256, 256, 64, 64, 64, 64, 64, 64, 512, 512, 512, 512, 512, 512, 512, 512, 512, 32, 32, 32, 32, 32, 32, 32, 32, 32, 128, 128, 128, 128, 128, 128, 128, 128, 128, 32, 32, 32, 128, 128, 128, 64, 64, 64, 64, 64, 64, 16, 16, 16, 256, 256, 256, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 256, 256, 256, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 256, 256, 256, 256, 256, 256, 1024, 1024, 1024, 1024, 1024, 1024, 256, 256, 256, 3072, 3072, 3072, 3072, 3072, 3072, 128, 128, 128, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 128, 128, 128, 128, 128, 128, 64, 64, 
64, 256, 256, 256, 256, 256, 256, 512, 512, 512, 768, 768, 768, 768, 768, 768, 16, 16, 16, 32, 32, 32, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 512, 512, 512, 2048, 2048, 2048, 1024, 1024, 1024, 3072, 3072, 3072, 3072, 3072, 3072, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 3072, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 3072, 3072, 3072, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 64, 64, 64, 96, 96, 96, 512, 512, 512, 64, 64, 64, 64, 64, 64, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 3072, 3072, 3072, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 64, 64, 64, 64, 64, 64, 256, 256, 256, 1024, 1024, 1024, 512, 512, 512, 256, 256, 256, 512, 512, 512, 1024, 1024, 1024, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 3072, 3072, 3072, 3072, 3072, 3072, 2048, 2048, 2048, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 1024, 1024, 1024, 2048, 2048, 2048, 3072, 3072, 3072, 2048, 2048, 2048]
With this data, if I try these new values:
    # new values
    x = np.asarray([128, 512, 1024, 2048, 3072, 2800])
something goes wrong with 3072 (all the values are in MB).
This is the result:
          100%      cdfv      kdev
    128     26  0.129688  0.499376
    512     55  0.275874  0.499671
    1024    91  0.454159  0.499936
    2048    12  0.062298  0.499150
    3072     0  0.001556  0.498364
    2800     1  0.004954  0.498573
I can't understand why this happens... The 3072 value appears a lot of times in the data. Here is the histogram of the data... It is strange, because there are values at 3072 and 4096.
A few general comments without going into the statsmodels details.
statsmodels also has CDF kernels, but I don't remember how they work, and I don't think it has automatic bandwidth selection for them.
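As a rough sketch of that direction (my assumptions: KDEUnivariate exposes the estimated CDF on its support grid after fit(), and the bandwidth 0.2 here is an arbitrary manual choice, not an automatic selection):

    import numpy as np
    from statsmodels.nonparametric.kde import KDEUnivariate

    x = np.random.randn(500)
    kde = KDEUnivariate(x)
    kde.fit(bw=0.2)  # manual bandwidth; no automatic selection here

    # kde.support is the evaluation grid and kde.cdf the estimated CDF on it;
    # interpolate to read the CDF estimate off at a new point.
    print(np.interp(1.5, kde.support, kde.cdf))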
Related to the answer of glen_b that ali_m linked to in a comment:
The CDF estimate converges much faster to the true distribution than the density estimate does as the sample grows. To balance the bias-variance tradeoff we should use a smaller bandwidth for CDF kernels, that is, undersmooth relative to density estimation. The estimates should then be considerably more accurate than the corresponding density estimates.
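A small illustration of that point, under my own assumptions (standard normal data and the plain kernel CDF estimator mean_i Φ((x − x_i)/h), nothing statsmodels-specific): shrinking h moves the estimate toward the empirical CDF, which is unbiased, so the error at a point typically drops.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.standard_normal(200)
    point = 1.0
    true_cdf = stats.norm.cdf(point)

    # Kernel CDF estimate at `point`: average of Gaussian CDFs centered at
    # the observations with bandwidth h. Errors typically shrink toward the
    # empirical-CDF error as h decreases.
    for h in (0.5, 0.1, 0.02):
        est = stats.norm.cdf((point - sample) / h).mean()
        print(h, abs(est - true_cdf))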
On the number of tail observations:
If the largest observation in your sample is 4 and you want to know the CDF at 5, then your data has no information about it. In the tails, where you have only very few observations, the variance of a nonparametric estimator such as a kernel distribution estimator will be large in relative terms (is it 1e-5 or 1e-20?).
As an alternative to kernel density or kernel distribution estimation, we can estimate a Pareto distribution for the tail parts. For example, take the largest 10 or 20 percent of the observations, fit a Pareto distribution to them, and use it to extrapolate the tail density. There are several Python packages for powerlaw estimation that might be used for this; a sketch of the idea with scipy alone follows.
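A minimal sketch of that idea using only scipy (the lognormal stand-in data, the 80th-percentile threshold, and fixing loc/scale in the fit are all my assumptions, not a recipe from any particular powerlaw package):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.lognormal(mean=6.0, sigma=1.0, size=1000)  # stand-in attribute values

    # Keep the largest 20 percent of observations as the tail sample.
    u = np.percentile(data, 80)
    tail = data[data >= u]

    # Fit a Pareto distribution to the tail, fixing loc=0 and scale=u so the
    # fitted support starts at the threshold; only the shape b is estimated.
    b, loc, scale = stats.pareto.fit(tail, floc=0, fscale=u)

    # Extrapolated probability of exceeding v: tail mass (0.2) times the
    # fitted Pareto survival function.
    v = 5000.0
    print(0.2 * stats.pareto.sf(v, b, loc=loc, scale=scale))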
*UPDATE*
The following shows how to calculate the "outlyingness" using a parametric normal distribution assumption and a Gaussian kernel density estimate with fixed bandwidth.
This is only really correct if the sample comes from a continuous distribution, or can be approximated by a continuous distribution. Here we pretend that a sample with only 3 distinct values comes from a normal distribution. Essentially, the calculated CDF value is like a distance measure, not a probability for a discrete random variable.
This uses the KDE from scipy.stats with a fixed bandwidth instead of the statsmodels version.
I'm not sure how the bandwidth is set in scipy's gaussian_kde, so my fixed bandwidth choice equal to scale is likely wrong. I don't know how I would choose a bandwidth when there are only 3 distinct values; there is just not enough information in the data. The default bandwidth is intended for distributions that are approximately normal, or at least single peaked.
    import numpy as np
    from scipy import stats

    # data
    ram = np.array([2, <truncated data in description>, 3])
    loc = ram.mean()
    scale = ram.std()

    # new values
    x = np.asarray([-1, 0, 2, 3, 4, 5, 100])

    # assume a normal distribution
    cdf_val = stats.norm.cdf(x, loc=loc, scale=scale)
    cdfv = np.minimum(cdf_val, 1 - cdf_val)

    # use a Gaussian KDE and fix the bandwidth
    kde = stats.gaussian_kde(ram, bw_method=scale)
    kde_val = np.asarray([kde.integrate_box_1d(-np.inf, xx) for xx in x])
    kdev = np.minimum(kde_val, 1 - kde_val)

    #print(np.column_stack((x, cdfv, kdev)))
    # use pandas for a prettier table
    import pandas as pd
    print(pd.DataFrame({'cdfv': cdfv, 'kdev': kdev}, index=x))

    '''
              cdfv      kdev
    -1    0.000096  0.000417
     0    0.006171  0.021262
     2    0.479955  0.482227
     3    0.119854  0.199565
     5    0.000143  0.000472
     100  0.000000  0.000000
    '''
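One detail that may explain the nearly flat kdev column in the question, offered as my reading of scipy rather than anything claimed above: a scalar bw_method in gaussian_kde is used directly as the multiplicative factor on the data standard deviation, so bw_method=scale makes the kernels enormously wide and pushes every CDF value toward 0.5. A quick sketch with hypothetical stand-in data:

    import numpy as np
    from scipy import stats

    ram = np.array([256, 512, 1024, 2048] * 250, dtype=float)  # stand-in MB values

    # Scalar bw_method acts as a factor: bandwidth = factor * data std.
    kde_wide = stats.gaussian_kde(ram, bw_method=ram.std())  # factor ~7e2: huge bandwidth
    kde_narrow = stats.gaussian_kde(ram, bw_method=0.1)      # factor 0.1: bandwidth ~70 MB

    # min(cdf, 1 - cdf) as the two-sided tail measure used above: the wide
    # KDE stays near 0.5 everywhere, the narrow one separates 5000 clearly.
    for v in (2048.0, 3072.0, 5000.0):
        cdf_w = kde_wide.integrate_box_1d(-np.inf, v)
        cdf_n = kde_narrow.integrate_box_1d(-np.inf, v)
        print(v, min(cdf_w, 1 - cdf_w), min(cdf_n, 1 - cdf_n))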