
Source Code for Module Biskit.Statistics.stats

   1  # Copyright (c) 1999-2000 Gary Strangman; All Rights Reserved. 
   2  # 
   3  # This software is distributable under the terms of the GNU 
   4  # General Public License (GPL) v2, the text of which can be found at 
   5  # http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 
   6  # using this module constitutes acceptance of the terms of this License. 
   7  # 
   8  # Disclaimer 
   9  #  
  10  # This software is provided "as-is".  There are no expressed or implied 
  11  # warranties of any kind, including, but not limited to, the warranties 
   12  # of merchantability and fitness for a given application.  In no event 
  13  # shall Gary Strangman be liable for any direct, indirect, incidental, 
  14  # special, exemplary or consequential damages (including, but not limited 
  15  # to, loss of use, data or profits, or business interruption) however 
  16  # caused and on any theory of liability, whether in contract, strict 
  17  # liability or tort (including negligence or otherwise) arising in any way 
  18  # out of the use of this software, even if advised of the possibility of 
  19  # such damage. 
  20  # 
  21  # Comments and/or additions are welcome (send e-mail to: 
  22  # strang@nmr.mgh.harvard.edu). 
  23  #  
  24  """ 
  25  stats.py module 
  26   
  27  (Requires pstat.py module.) 
  28   
  29  :: 
  30    ################################################# 
  31    #######  Written by:  Gary Strangman  ########### 
  32    #######  Last modified:  Dec 28, 2000 ########### 
  33    ################################################# 
  34   
  35  A collection of basic statistical functions for python.  The function 
  36  names appear below. 
  37   
  38  IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l' 
  39  prefix, which can be used with list or tuple arguments.  The second set has 
  40  an 'a' prefix, which can accept NumPy array arguments.  These latter 
  41  functions are defined only when NumPy is available on the system.  The third 
  42  type has NO prefix (i.e., has the name that appears below).  Functions of 
  43  this set are members of a "Dispatch" class, c/o David Ascher.  This class 
  44  allows different functions to be called depending on the type of the passed 
  45  arguments.  Thus, stats.mean is a member of the Dispatch class and 
  46  stats.mean(range(20)) will call stats.lmean(range(20)) while 
  47  stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)). 
  48  This is a handy way to keep consistent function names when different 
  49  argument types require different functions to be called.  Having 
   50  implemented the Dispatch class, however, means that to get info on 
  51  a given function, you must use the REAL function name ... that is 
  52  "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine, 
  53  while "print stats.mean.__doc__" will print the doc for the Dispatch 
  54  class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options 
  55  but should otherwise be consistent with the corresponding list functions. 
  56   
  57  Disclaimers:  The function list is obviously incomplete and, worse, the 
  58  functions are not optimized.  All functions have been tested (some more 
  59  so than others), but they are far from bulletproof.  Thus, as with any 
  60  free software, no warranty or guarantee is expressed or implied. :-)  A 
  61  few extra functions that don't appear in the list below can be found by 
  62  interested treasure-hunters.  These functions don't necessarily have 
  63  both list and array versions but were deemed useful:: 
  64   
  65      CENTRAL TENDENCY:  geometricmean 
  66                         harmonicmean 
  67                         mean 
  68                         median 
  69                         medianscore 
  70                         mode 
  71   
  72      MOMENTS:  moment 
  73                variation 
  74                skew 
  75                kurtosis 
  76                skewtest   (for Numpy arrays only) 
  77                kurtosistest (for Numpy arrays only) 
  78                normaltest (for Numpy arrays only) 
  79   
  80      ALTERED VERSIONS:  tmean  (for Numpy arrays only) 
  81                         tvar   (for Numpy arrays only) 
  82                         tmin   (for Numpy arrays only) 
  83                         tmax   (for Numpy arrays only) 
  84                         tstdev (for Numpy arrays only) 
  85                         tsem   (for Numpy arrays only) 
  86                         describe 
  87   
  88      FREQUENCY STATS:  itemfreq 
  89                        scoreatpercentile 
  90                        percentileofscore 
  91                        histogram 
  92                        cumfreq 
  93                        relfreq 
  94   
  95      VARIABILITY:  obrientransform 
  96                    samplevar 
  97                    samplestdev 
  98                    signaltonoise (for Numpy arrays only) 
  99                    var 
 100                    stdev 
 101                    sterr 
 102                    sem 
 103                    z 
 104                    zs 
 105                    zmap (for Numpy arrays only) 
 106   
 107      TRIMMING FCNS:  threshold (for Numpy arrays only) 
 108                      trimboth 
 109                      trim1 
 110                      round (round all vals to 'n' decimals; Numpy only) 
 111   
 112      CORRELATION FCNS:  covariance  (for Numpy arrays only) 
 113                         correlation (for Numpy arrays only) 
 114                         paired 
 115                         pearsonr 
 116                         spearmanr 
 117                         pointbiserialr 
 118                         kendalltau 
 119                         linregress 
 120   
 121      INFERENTIAL STATS:  ttest_1samp 
 122                          ttest_ind 
 123                          ttest_rel 
 124                          chisquare 
 125                          ks_2samp 
 126                          mannwhitneyu 
 127                          ranksums 
 128                          wilcoxont 
 129                          kruskalwallish 
 130                          friedmanchisquare 
 131   
 132      PROBABILITY CALCS:  chisqprob 
 133                          erfcc 
 134                          zprob 
 135                          ksprob 
 136                          fprob 
 137                          betacf 
 138                          gammln  
 139                          betai 
 140   
 141      ANOVA FUNCTIONS:  F_oneway 
 142                        F_value 
 143   
 144      SUPPORT FUNCTIONS:  writecc 
 145                          incr 
 146                          sign  (for Numpy arrays only) 
 147                          sum 
 148                          cumsum 
 149                          ss 
 150                          summult 
 151                          sumdiffsquared 
 152                          square_of_sums 
 153                          shellsort 
 154                          rankdata 
 155                          outputpairedstats 
 156                      findwithin 
 157  """ 
 158  ## CHANGE LOG: 
 159  ## =========== 
 160  ## 00-12-28 ... removed aanova() to separate module, fixed licensing to 
 161  ##                   match Python License, fixed doc string & imports 
 162  ## 00-04-13 ... pulled all "global" statements, except from aanova() 
 163  ##              added/fixed lots of documentation, removed io.py dependency 
 164  ##              changed to version 0.5 
 165  ## 99-11-13 ... added asign() function 
 166  ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now 
 167  ## 99-10-25 ... added acovariance and acorrelation functions 
 168  ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors 
 169  ##              added aglm function (crude, but will be improved) 
 170  ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to 
 171  ##                   all handle lists of 'dimension's and keepdims 
 172  ##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around 
 173  ##              reinserted fixes for abetai to avoid math overflows 
 174  ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to 
 175  ##                   handle multi-dimensional arrays (whew!) 
 176  ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990) 
 177  ##              added anormaltest per same reference 
 178  ##              re-wrote azprob to calc arrays of probs all at once 
 179  ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded 
 180  ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on 
 181  ##                   short/byte arrays (mean of #s btw 100-300 = -150??) 
 182  ## 99-08-09 ... fixed asum so that the None case works for Byte arrays 
 183  ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays 
 184  ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap) 
 185  ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0]) 
 186  ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all 
 187  ##                   max/min in array section to N.maximum/N.minimum, 
 188  ##                   fixed square_of_sums to prevent integer overflow 
 189  ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums 
 190  ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions 
 191  ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list 
 192  ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!) 
 193  ## 01/13/99 ... CHANGED TO VERSION 0.3 
 194  ##              fixed bug in a/lmannwhitneyu p-value calculation 
 195  ## 12/31/98 ... fixed variable-name bug in ldescribe 
 196  ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix) 
 197  ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score 
 198  ## 12/14/98 ... added atmin and atmax functions 
 199  ##              removed umath from import line (not needed) 
 200  ##              l/ageometricmean modified to reduce chance of overflows (take 
 201  ##                   nth root first, then multiply) 
 202  ## 12/07/98 ... added __version__variable (now 0.2) 
 203  ##              removed all 'stats.' from anova() fcn 
 204  ## 12/06/98 ... changed those functions (except shellsort) that altered 
 205  ##                   arguments in-place ... cumsum, ranksort, ... 
 206  ##              updated (and fixed some) doc-strings 
 207  ## 12/01/98 ... added anova() function (requires NumPy) 
 208  ##              incorporated Dispatch class 
 209  ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean 
 210  ##              added 'asum' function (added functionality to N.add.reduce) 
 211  ##              fixed both moment and amoment (two errors) 
 212  ##              changed name of skewness and askewness to skew and askew 
 213  ##              fixed (a)histogram (which sometimes counted points <lowerlimit) 
 214   
 215  import pstat               # required 3rd party module 
 216  import math, string, copy  # required python modules 
 217  from types import * 
 218   
 219  __version__ = 0.5 
 220   
 221  ############# DISPATCH CODE ############## 
 222   
 223   
  224  class Dispatch:
  225      """
  226      The Dispatch class, care of David Ascher, allows different functions to
  227      be called depending on the argument types.  This way, there can be one
  228      function name regardless of the argument type.  To access function doc
  229      in stats.py module, prefix the function with an 'l' or 'a' for list or
  230      array arguments, respectively.  That is, print stats.lmean.__doc__ or
  231      print stats.amean.__doc__ or whatever.
  232      """
  233  
  234      def __init__(self, *tuples):
  235          self._dispatch = {}
  236          for func, types in tuples:
  237              for t in types:
  238                  if t in self._dispatch.keys():
  239                      raise ValueError, "can't have two dispatches on "+str(t)
  240                  self._dispatch[t] = func
  241          self._types = self._dispatch.keys()
  242  
  243      def __call__(self, arg1, *args, **kw):
  244          if type(arg1) not in self._types:
  245              raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
  246          return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
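## Example (added for illustration; not part of the original source): this
## is how the dispatchers defined near the end of this file are wired up,
## and what happens for an argument type that was never registered:
##
##     >>> mean = Dispatch( (lmean, (ListType, TupleType)), )
##     >>> mean([1, 2, 3])
##     2.0
##     >>> mean('abc')
##     Traceback (most recent call last):
##         ...
##     TypeError: don't know how to dispatch <type 'str'> arguments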
  247  
  248  
  249  ##########################################################################
  250  ########################  LIST-BASED FUNCTIONS  ########################
  251  ##########################################################################
  252  
  253  ### Define these regardless
  254  
  255  ####################################
  256  #######  CENTRAL TENDENCY  #########
  257  ####################################
  258  
259 -def lgeometricmean (inlist):
260 """ 261 Calculates the geometric mean of the values in the passed list. 262 That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list. 263 264 Usage: lgeometricmean(inlist) 265 """ 266 mult = 1.0 267 one_over_n = 1.0/len(inlist) 268 for item in inlist: 269 mult = mult * pow(item,one_over_n) 270 return mult
271 272
273 -def lharmonicmean (inlist):
274 """ 275 Calculates the harmonic mean of the values in the passed list. 276 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list. 277 278 Usage: lharmonicmean(inlist) 279 """ 280 sum = 0 281 for item in inlist: 282 sum = sum + 1.0/item 283 return len(inlist) / sum
284 285
  286  def lmean (inlist):
  287      """
  288      Returns the arithmetic mean of the values in the passed list.
  289      Assumes a '1D' list, but will function on the 1st dim of an array(!).
  290  
  291      Usage: lmean(inlist)
  292      """
  293      sum = 0
  294      for item in inlist:
  295          sum = sum + item
  296      return sum/float(len(inlist))
297 298
299 -def lmedian (inlist,numbins=1000):
300 """ 301 Returns the computed median value of a list of numbers, given the 302 number of bins to use for the histogram (more bins brings the computed value 303 closer to the median score, default number of bins = 1000). See G.W. 304 Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics. 305 306 Usage: lmedian (inlist, numbins=1000) 307 """ 308 (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog 309 cumhist = cumsum(hist) # make cumulative histogram 310 for i in range(len(cumhist)): # get 1st(!) index holding 50%ile score 311 if cumhist[i]>=len(inlist)/2.0: 312 cfbin = i 313 break 314 LRL = smallest + binsize*cfbin # get lower read limit of that bin 315 cfbelow = cumhist[cfbin-1] 316 freq = float(hist[cfbin]) # frequency IN the 50%ile bin 317 median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize # median formula 318 return median
319 320
  321  def lmedianscore (inlist):
  322      """
  323      Returns the 'middle' score of the passed list.  If there is an even
  324      number of scores, the mean of the 2 middle scores is returned.
  325  
  326      Usage: lmedianscore(inlist)
  327      """
  328  
  329      newlist = copy.deepcopy(inlist)
  330      newlist.sort()
  331      if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
  332          index = len(newlist)/2  # integer division correct
  333          median = float(newlist[index] + newlist[index-1]) /2
  334      else:
  335          index = len(newlist)/2  # int division gives mid value when count from 0
  336          median = newlist[index]
  337      return median
338 339
340 -def lmode(inlist):
341 """ 342 Returns a list of the modal (most common) score(s) in the passed 343 list. If there is more than one such score, all are returned. The 344 bin-count for the mode(s) is also returned. 345 346 Usage: lmode(inlist) 347 Returns: bin-count for mode(s), a list of modal value(s) 348 """ 349 350 scores = pstat.unique(inlist) 351 scores.sort() 352 freq = [] 353 for item in scores: 354 freq.append(inlist.count(item)) 355 maxfreq = max(freq) 356 mode = [] 357 stillmore = 1 358 while stillmore: 359 try: 360 indx = freq.index(maxfreq) 361 mode.append(scores[indx]) 362 del freq[indx] 363 del scores[indx] 364 except ValueError: 365 stillmore=0 366 return maxfreq, mode
367 368 369 #################################### 370 ############ MOMENTS ############# 371 #################################### 372
373 -def lmoment(inlist,moment=1):
374 """ 375 Calculates the nth moment about the mean for a sample (defaults to 376 the 1st moment). Used to calculate coefficients of skewness and kurtosis. 377 378 Usage: lmoment(inlist,moment=1) 379 Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r) 380 """ 381 if moment == 1: 382 return 0.0 383 else: 384 mn = mean(inlist) 385 n = len(inlist) 386 s = 0 387 for x in inlist: 388 s = s + (x-mn)**moment 389 return s/float(n)
390 391
392 -def lvariation(inlist):
393 """ 394 Returns the coefficient of variation, as defined in CRC Standard 395 Probability and Statistics, p.6. 396 397 Usage: lvariation(inlist) 398 """ 399 return 100.0*samplestdev(inlist)/float(mean(inlist))
400 401
  402  def lskew(inlist):
  403      """
  404      Returns the skewness of a distribution, as defined in Numerical
  405      Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
  406  
  407      Usage: lskew(inlist)
  408      """
  409      return moment(inlist,3)/pow(moment(inlist,2),1.5)
  410  
  411  
  412  def lkurtosis(inlist):
  413      """
  414      Returns the kurtosis of a distribution, as defined in Numerical
  415      Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
  416  
  417      Usage: lkurtosis(inlist)
  418      """
  419      return moment(inlist,4)/pow(moment(inlist,2),2.0)
420 421
  422  def ldescribe(inlist):
  423      """
  424      Returns some descriptive statistics of the passed list (assumed to be 1D).
  425  
  426      Usage: ldescribe(inlist)
  427      Returns: n, (min,max), mean, standard deviation, skew, kurtosis
  428      """
  429      n = len(inlist)
  430      mm = (min(inlist),max(inlist))
  431      m = mean(inlist)
  432      sd = stdev(inlist)
  433      sk = skew(inlist)
  434      kurt = kurtosis(inlist)
  435      return n, mm, m, sd, sk, kurt
436 437 438 #################################### 439 ####### FREQUENCY STATS ########## 440 #################################### 441
  442  def litemfreq(inlist):
  443      """
  444      Returns a list of pairs.  Each pair consists of one of the scores in inlist
  445      and its frequency count.  Assumes a 1D list is passed.
  446  
  447      Usage: litemfreq(inlist)
  448      Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
  449      """
  450      scores = pstat.unique(inlist)
  451      scores.sort()
  452      freq = []
  453      for item in scores:
  454          freq.append(inlist.count(item))
  455      return pstat.abut(scores, freq)
456 457
458 -def lscoreatpercentile (inlist, percent):
459 """ 460 Returns the score at a given percentile relative to the distribution 461 given by inlist. 462 463 Usage: lscoreatpercentile(inlist,percent) 464 """ 465 if percent > 1: 466 print "\nDividing percent>1 by 100 in lscoreatpercentile().\n" 467 percent = percent / 100.0 468 targetcf = percent*len(inlist) 469 h, lrl, binsize, extras = histogram(inlist) 470 cumhist = cumsum(copy.deepcopy(h)) 471 for i in range(len(cumhist)): 472 if cumhist[i] >= targetcf: 473 break 474 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 475 return score
476 477
478 -def lpercentileofscore (inlist, score,histbins=10,defaultlimits=None):
479 """ 480 Returns the percentile value of a score relative to the distribution 481 given by inlist. Formula depends on the values used to histogram the data(!). 482 483 Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None) 484 """ 485 486 h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits) 487 cumhist = cumsum(copy.deepcopy(h)) 488 i = int((score - lrl)/float(binsize)) 489 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100 490 return pct
491 492
493 -def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
494 """ 495 Returns (i) a list of histogram bin counts, (ii) the smallest value 496 of the histogram binning, and (iii) the bin width (the last 2 are not 497 necessarily integers). Default number of bins is 10. If no sequence object 498 is given for defaultreallimits, the routine picks (usually non-pretty) bins 499 spanning all the numbers in the inlist. 500 501 Usage: lhistogram (inlist, numbins=10, defaultreallimits=None,suppressoutput=0) 502 Returns: list of bin values, lowerreallimit, binsize, extrapoints 503 """ 504 if (defaultreallimits <> None): 505 if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd 506 lowerreallimit = defaultreallimits 507 upperreallimit = 1.0001 * max(inlist) 508 else: # assume both limits given 509 lowerreallimit = defaultreallimits[0] 510 upperreallimit = defaultreallimits[1] 511 binsize = (upperreallimit-lowerreallimit)/float(numbins) 512 else: # no limits given for histogram, both must be calc'd 513 estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all 514 binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins) 515 lowerreallimit = min(inlist) - binsize/2 #lower real limit,1st bin 516 bins = [0]*(numbins) 517 extrapoints = 0 518 for num in inlist: 519 try: 520 if (num-lowerreallimit) < 0: 521 extrapoints = extrapoints + 1 522 else: 523 bintoincrement = int((num-lowerreallimit)/float(binsize)) 524 bins[bintoincrement] = bins[bintoincrement] + 1 525 except: 526 extrapoints = extrapoints + 1 527 if (extrapoints > 0 and printextras == 1): 528 print '\nPoints outside given histogram range =',extrapoints 529 return (bins, lowerreallimit, binsize, extrapoints)
530 531
532 -def lcumfreq(inlist,numbins=10,defaultreallimits=None):
533 """ 534 Returns a cumulative frequency histogram, using the histogram function. 535 536 Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None) 537 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 538 """ 539 h,l,b,e = histogram(inlist,numbins,defaultreallimits) 540 cumhist = cumsum(copy.deepcopy(h)) 541 return cumhist,l,b,e
542 543
544 -def lrelfreq(inlist,numbins=10,defaultreallimits=None):
545 """ 546 Returns a relative frequency histogram, using the histogram function. 547 548 Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None) 549 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 550 """ 551 h,l,b,e = histogram(inlist,numbins,defaultreallimits) 552 for i in range(len(h)): 553 h[i] = h[i]/float(len(inlist)) 554 return h,l,b,e
555 556 557 #################################### 558 ##### VARIABILITY FUNCTIONS ###### 559 #################################### 560
561 -def lobrientransform(*args):
562 """ 563 Computes a transform on input data (any number of columns). Used to 564 test for homogeneity of variance prior to running one-way stats. From 565 Maxwell and Delaney, p.112. 566 567 Usage: lobrientransform(*args) 568 Returns: transformed data for use in an ANOVA 569 """ 570 TINY = 1e-10 571 k = len(args) 572 n = [0.0]*k 573 v = [0.0]*k 574 m = [0.0]*k 575 nargs = [] 576 for i in range(k): 577 nargs.append(copy.deepcopy(args[i])) 578 n[i] = float(len(nargs[i])) 579 v[i] = var(nargs[i]) 580 m[i] = mean(nargs[i]) 581 for j in range(k): 582 for i in range(n[j]): 583 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 584 t2 = 0.5*v[j]*(n[j]-1.0) 585 t3 = (n[j]-1.0)*(n[j]-2.0) 586 nargs[j][i] = (t1-t2) / float(t3) 587 check = 1 588 for j in range(k): 589 if v[j] - mean(nargs[j]) > TINY: 590 check = 0 591 if check <> 1: 592 raise ValueError, 'Problem in obrientransform.' 593 else: 594 return nargs
595 596
597 -def lsamplevar (inlist):
598 """ 599 Returns the variance of the values in the passed list using 600 N for the denominator (i.e., DESCRIBES the sample variance only). 601 602 Usage: lsamplevar(inlist) 603 """ 604 n = len(inlist) 605 mn = mean(inlist) 606 deviations = [] 607 for item in inlist: 608 deviations.append(item-mn) 609 return ss(deviations)/float(n)
610 611
612 -def lsamplestdev (inlist):
613 """ 614 Returns the standard deviation of the values in the passed list using 615 N for the denominator (i.e., DESCRIBES the sample stdev only). 616 617 Usage: lsamplestdev(inlist) 618 """ 619 return math.sqrt(samplevar(inlist))
620 621
622 -def lvar (inlist):
623 """ 624 Returns the variance of the values in the passed list using N-1 625 for the denominator (i.e., for estimating population variance). 626 627 Usage: lvar(inlist) 628 """ 629 n = len(inlist) 630 mn = mean(inlist) 631 deviations = [0]*len(inlist) 632 for i in range(len(inlist)): 633 deviations[i] = inlist[i] - mn 634 return ss(deviations)/float(n-1)
635 636
637 -def lstdev (inlist):
638 """ 639 Returns the standard deviation of the values in the passed list 640 using N-1 in the denominator (i.e., to estimate population stdev). 641 642 Usage: lstdev(inlist) 643 """ 644 return math.sqrt(var(inlist))
645 646
647 -def lsterr(inlist):
648 """ 649 Returns the standard error of the values in the passed list using N-1 650 in the denominator (i.e., to estimate population standard error). 651 652 Usage: lsterr(inlist) 653 """ 654 return stdev(inlist) / float(math.sqrt(len(inlist)))
655 656
657 -def lsem (inlist):
658 """ 659 Returns the estimated standard error of the mean (sx-bar) of the 660 values in the passed list. sem = stdev / sqrt(n) 661 662 Usage: lsem(inlist) 663 """ 664 sd = stdev(inlist) 665 n = len(inlist) 666 return sd/math.sqrt(n)
667 668
669 -def lz (inlist, score):
670 """ 671 Returns the z-score for a given input score, given that score and the 672 list from which that score came. Not appropriate for population calculations. 673 674 Usage: lz(inlist, score) 675 """ 676 z = (score-mean(inlist))/samplestdev(inlist) 677 return z
678 679
680 -def lzs (inlist):
681 """ 682 Returns a list of z-scores, one for each score in the passed list. 683 684 Usage: lzs(inlist) 685 """ 686 zscores = [] 687 for item in inlist: 688 zscores.append(z(inlist,item)) 689 return zscores
690 691 692 #################################### 693 ####### TRIMMING FUNCTIONS ####### 694 #################################### 695
696 -def ltrimboth (l,proportiontocut):
697 """ 698 Slices off the passed proportion of items from BOTH ends of the passed 699 list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost' 700 10% of scores. Assumes list is sorted by magnitude. Slices off LESS if 701 proportion results in a non-integer slice index (i.e., conservatively 702 slices off proportiontocut). 703 704 Usage: ltrimboth (l,proportiontocut) 705 Returns: trimmed version of list l 706 """ 707 lowercut = int(proportiontocut*len(l)) 708 uppercut = len(l) - lowercut 709 return l[lowercut:uppercut]
710 711
712 -def ltrim1 (l,proportiontocut,tail='right'):
713 """ 714 Slices off the passed proportion of items from ONE end of the passed 715 list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 716 10% of scores). Slices off LESS if proportion results in a non-integer 717 slice index (i.e., conservatively slices off proportiontocut). 718 719 Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left' 720 Returns: trimmed version of list l 721 """ 722 if tail == 'right': 723 lowercut = 0 724 uppercut = len(l) - int(proportiontocut*len(l)) 725 elif tail == 'left': 726 lowercut = int(proportiontocut*len(l)) 727 uppercut = len(l) 728 return l[lowercut:uppercut]
729 730 731 #################################### 732 ##### CORRELATION FUNCTIONS ###### 733 #################################### 734
735 -def lpaired(x,y):
736 """ 737 Interactively determines the type of data and then runs the 738 appropriated statistic for paired group data. 739 740 Usage: lpaired(x,y) 741 Returns: appropriate statistic name, value, and probability 742 """ 743 samples = '' 744 while samples not in ['i','r','I','R','c','C']: 745 print '\nIndependent or related samples, or correlation (i,r,c): ', 746 samples = raw_input() 747 748 if samples in ['i','I','r','R']: 749 print '\nComparing variances ...', 750 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 751 r = obrientransform(x,y) 752 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 753 if p<0.05: 754 vartype='unequal, p='+str(round(p,4)) 755 else: 756 vartype='equal' 757 print vartype 758 if samples in ['i','I']: 759 if vartype[0]=='e': 760 t,p = ttest_ind(x,y,0) 761 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 762 else: 763 if len(x)>20 or len(y)>20: 764 z,p = ranksums(x,y) 765 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 766 else: 767 u,p = mannwhitneyu(x,y) 768 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 769 770 else: # RELATED SAMPLES 771 if vartype[0]=='e': 772 t,p = ttest_rel(x,y,0) 773 print '\nRelated samples t-test: ', round(t,4),round(p,4) 774 else: 775 t,p = ranksums(x,y) 776 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 777 else: # CORRELATION ANALYSIS 778 corrtype = '' 779 while corrtype not in ['c','C','r','R','d','D']: 780 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 781 corrtype = raw_input() 782 if corrtype in ['c','C']: 783 m,b,r,p,see = linregress(x,y) 784 print '\nLinear regression for continuous variables ...' 785 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 786 pstat.printcc(lol) 787 elif corrtype in ['r','R']: 788 r,p = spearmanr(x,y) 789 print '\nCorrelation for ranked variables ...' 790 print "Spearman's r: ",round(r,4),round(p,4) 791 else: # DICHOTOMOUS 792 r,p = pointbiserialr(x,y) 793 print '\nAssuming x contains a dichotomous variable ...' 794 print 'Point Biserial r: ',round(r,4),round(p,4) 795 print '\n\n' 796 return None
797 798
799 -def lpearsonr(x,y):
800 """ 801 Calculates a Pearson correlation coefficient and the associated 802 probability value. Taken from Heiman's Basic Statistics for the Behav. 803 Sci (2nd), p.195. 804 805 Usage: lpearsonr(x,y) where x and y are equal-length lists 806 Returns: Pearson's r value, two-tailed p-value 807 """ 808 TINY = 1.0e-30 809 if len(x) <> len(y): 810 raise ValueError, 'Input values not paired in pearsonr. Aborting.' 811 n = len(x) 812 x = map(float,x) 813 y = map(float,y) 814 xmean = mean(x) 815 ymean = mean(y) 816 r_num = n*(summult(x,y)) - sum(x)*sum(y) 817 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 818 r = (r_num / r_den) # denominator already a float 819 df = n-2 820 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 821 prob = betai(0.5*df,0.5,df/float(df+t*t)) 822 return r, prob
823 824
825 -def lspearmanr(x,y):
826 """ 827 Calculates a Spearman rank-order correlation coefficient. Taken 828 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 829 830 Usage: lspearmanr(x,y) where x and y are equal-length lists 831 Returns: Spearman's r, two-tailed p-value 832 """ 833 TINY = 1e-30 834 if len(x) <> len(y): 835 raise ValueError, 'Input values not paired in spearmanr. Aborting.' 836 n = len(x) 837 rankx = rankdata(x) 838 ranky = rankdata(y) 839 dsq = sumdiffsquared(rankx,ranky) 840 rs = 1 - 6*dsq / float(n*(n**2-1)) 841 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 842 df = n-2 843 probrs = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 844 # probability values for rs are from part 2 of the spearman function in 845 # Numerical Recipies, p.510. They are close to tables, but not exact. (?) 846 return rs, probrs
847 848
849 -def lpointbiserialr(x,y):
850 """ 851 Calculates a point-biserial correlation coefficient and the associated 852 probability value. Taken from Heiman's Basic Statistics for the Behav. 853 Sci (1st), p.194. 854 855 Usage: lpointbiserialr(x,y) where x,y are equal-length lists 856 Returns: Point-biserial r, two-tailed p-value 857 """ 858 TINY = 1e-30 859 if len(x) <> len(y): 860 raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.' 861 data = pstat.abut(x,y) 862 categories = pstat.unique(x) 863 if len(categories) <> 2: 864 raise ValueError, "Exactly 2 categories required for pointbiserialr()." 865 else: # there are 2 categories, continue 866 codemap = pstat.abut(categories,range(2)) 867 recoded = pstat.recode(data,codemap,0) 868 x = pstat.linexand(data,0,categories[0]) 869 y = pstat.linexand(data,0,categories[1]) 870 xmean = mean(pstat.colex(x,1)) 871 ymean = mean(pstat.colex(y,1)) 872 n = len(data) 873 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 874 rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust 875 df = n-2 876 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 877 prob = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 878 return rpb, prob
879 880
881 -def lkendalltau(x,y):
882 """ 883 Calculates Kendall's tau ... correlation of ordinal data. Adapted 884 from function kendl1 in Numerical Recipies. Needs good test-routine.@@@ 885 886 Usage: lkendalltau(x,y) 887 Returns: Kendall's tau, two-tailed p-value 888 """ 889 n1 = 0 890 n2 = 0 891 iss = 0 892 for j in range(len(x)-1): 893 for k in range(j,len(y)): 894 a1 = x[j] - x[k] 895 a2 = y[j] - y[k] 896 aa = a1 * a2 897 if (aa): # neither list has a tie 898 n1 = n1 + 1 899 n2 = n2 + 1 900 if aa > 0: 901 iss = iss + 1 902 else: 903 iss = iss -1 904 else: 905 if (a1): 906 n1 = n1 + 1 907 else: 908 n2 = n2 + 1 909 tau = iss / math.sqrt(n1*n2) 910 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 911 z = tau / math.sqrt(svar) 912 prob = erfcc(abs(z)/1.4142136) 913 return tau, prob
914 915
916 -def llinregress(x,y):
917 """ 918 Calculates a regression line on x,y pairs. 919 920 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates 921 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate 922 """ 923 TINY = 1.0e-20 924 if len(x) <> len(y): 925 raise ValueError, 'Input values not paired in linregress. Aborting.' 926 n = len(x) 927 x = map(float,x) 928 y = map(float,y) 929 xmean = mean(x) 930 ymean = mean(y) 931 r_num = float(n*(summult(x,y)) - sum(x)*sum(y)) 932 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 933 r = r_num / r_den 934 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 935 df = n-2 936 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 937 prob = betai(0.5*df,0.5,df/(df+t*t)) 938 slope = r_num / float(n*ss(x) - square_of_sums(x)) 939 intercept = ymean - slope*xmean 940 sterrest = math.sqrt(1-r*r)*samplestdev(y) 941 return slope, intercept, r, prob, sterrest
942 943 944 #################################### 945 ##### INFERENTIAL STATISTICS ##### 946 #################################### 947
948 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
949 """ 950 Calculates the t-obtained for the independent samples T-test on ONE group 951 of scores a, given a population mean. If printit=1, results are printed 952 to the screen. If printit='filename', the results are output to 'filename' 953 using the given writemode (default=append). Returns t-value, and prob. 954 955 Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 956 Returns: t-value, two-tailed prob 957 """ 958 x = mean(a) 959 v = var(a) 960 n = len(a) 961 df = n-1 962 svar = ((n-1)*v)/float(df) 963 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 964 prob = betai(0.5*df,0.5,float(df)/(df+t*t)) 965 966 if printit <> 0: 967 statname = 'Single-sample T-test.' 968 outputpairedstats(printit,writemode, 969 'Population','--',popmean,0,0,0, 970 name,n,x,v,min(a),max(a), 971 statname,t,prob) 972 return t,prob
973 974
975 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
976 """ 977 Calculates the t-obtained T-test on TWO INDEPENDENT samples of 978 scores a, and b. From Numerical Recipies, p.483. If printit=1, results 979 are printed to the screen. If printit='filename', the results are output 980 to 'filename' using the given writemode (default=append). Returns t-value, 981 and prob. 982 983 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a') 984 Returns: t-value, two-tailed prob 985 """ 986 x1 = mean(a) 987 x2 = mean(b) 988 v1 = stdev(a)**2 989 v2 = stdev(b)**2 990 n1 = len(a) 991 n2 = len(b) 992 df = n1+n2-2 993 svar = ((n1-1)*v1+(n2-1)*v2)/float(df) 994 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2)) 995 prob = betai(0.5*df,0.5,df/(df+t*t)) 996 997 if printit <> 0: 998 statname = 'Independent samples T-test.' 999 outputpairedstats(printit,writemode, 1000 name1,n1,x1,v1,min(a),max(a), 1001 name2,n2,x2,v2,min(b),max(b), 1002 statname,t,prob) 1003 return t,prob
1004 1005
1006 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1007 """ 1008 Calculates the t-obtained T-test on TWO RELATED samples of scores, 1009 a and b. From Numerical Recipies, p.483. If printit=1, results are 1010 printed to the screen. If printit='filename', the results are output to 1011 'filename' using the given writemode (default=append). Returns t-value, 1012 and prob. 1013 1014 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a') 1015 Returns: t-value, two-tailed prob 1016 """ 1017 if len(a)<>len(b): 1018 raise ValueError, 'Unequal length lists in ttest_rel.' 1019 x1 = mean(a) 1020 x2 = mean(b) 1021 v1 = var(a) 1022 v2 = var(b) 1023 n = len(a) 1024 cov = 0 1025 for i in range(len(a)): 1026 cov = cov + (a[i]-x1) * (b[i]-x2) 1027 df = n-1 1028 cov = cov / float(df) 1029 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n)) 1030 t = (x1-x2)/sd 1031 prob = betai(0.5*df,0.5,df/(df+t*t)) 1032 1033 if printit <> 0: 1034 statname = 'Related samples T-test.' 1035 outputpairedstats(printit,writemode, 1036 name1,n,x1,v1,min(a),max(a), 1037 name2,n,x2,v2,min(b),max(b), 1038 statname,t,prob) 1039 return t, prob
1040 1041
1042 -def lchisquare(f_obs,f_exp=None):
1043 """ 1044 Calculates a one-way chi square for list of observed frequencies and returns 1045 the result. If no expected frequencies are given, the total N is assumed to 1046 be equally distributed across all groups. 1047 1048 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq. 1049 Returns: chisquare-statistic, associated p-value 1050 """ 1051 k = len(f_obs) # number of groups 1052 if f_exp == None: 1053 f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq. 1054 chisq = 0 1055 for i in range(len(f_obs)): 1056 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i]) 1057 return chisq, chisqprob(chisq, k-1)
1058 1059
1060 -def lks_2samp (data1,data2):
1061 """ 1062 Computes the Kolmogorov-Smirnof statistic on 2 samples. From 1063 Numerical Recipies in C, page 493. 1064 1065 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions 1066 Returns: KS D-value, associated p-value 1067 """ 1068 j1 = 0 1069 j2 = 0 1070 fn1 = 0.0 1071 fn2 = 0.0 1072 n1 = len(data1) 1073 n2 = len(data2) 1074 en1 = n1 1075 en2 = n2 1076 d = 0.0 1077 data1.sort() 1078 data2.sort() 1079 while j1 < n1 and j2 < n2: 1080 d1=data1[j1] 1081 d2=data2[j2] 1082 if d1 <= d2: 1083 fn1 = (j1)/float(en1) 1084 j1 = j1 + 1 1085 if d2 <= d1: 1086 fn2 = (j2)/float(en2) 1087 j2 = j2 + 1 1088 dt = (fn2-fn1) 1089 if math.fabs(dt) > math.fabs(d): 1090 d = dt 1091 try: 1092 en = math.sqrt(en1*en2/float(en1+en2)) 1093 prob = ksprob((en+0.12+0.11/en)*abs(d)) 1094 except: 1095 prob = 1.0 1096 return d, prob
1097 1098
1099 -def lmannwhitneyu(x,y):
1100 """ 1101 Calculates a Mann-Whitney U statistic on the provided scores and 1102 returns the result. Use only when the n in each condition is < 20 and 1103 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is 1104 significant if the u-obtained is LESS THAN or equal to the critical 1105 value of U found in the tables. Equivalent to Kruskal-Wallis H with 1106 just 2 groups. 1107 1108 Usage: lmannwhitneyu(data) 1109 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 1110 """ 1111 n1 = len(x) 1112 n2 = len(y) 1113 ranked = rankdata(x+y) 1114 rankx = ranked[0:n1] # get the x-ranks 1115 ranky = ranked[n1:] # the rest are y-ranks 1116 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 1117 u2 = n1*n2 - u1 # remainder is U for y 1118 bigu = max(u1,u2) 1119 smallu = min(u1,u2) 1120 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 1121 if T == 0: 1122 raise ValueError, 'All numbers are identical in lmannwhitneyu' 1123 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 1124 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 1125 return smallu, 1.0 - zprob(z)
1126 1127
1128 -def ltiecorrect(rankvals):
1129 """ 1130 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See 1131 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. 1132 New York: McGraw-Hill. Code adapted from |Stat rankind.c code. 1133 1134 Usage: ltiecorrect(rankvals) 1135 Returns: T correction factor for U or H 1136 """ 1137 sorted,posn = shellsort(rankvals) 1138 n = len(sorted) 1139 T = 0.0 1140 i = 0 1141 while (i<n-1): 1142 if sorted[i] == sorted[i+1]: 1143 nties = 1 1144 while (i<n-1) and (sorted[i] == sorted[i+1]): 1145 nties = nties +1 1146 i = i +1 1147 T = T + nties**3 - nties 1148 i = i+1 1149 T = T / float(n**3-n) 1150 return 1.0 - T
1151 1152
1153 -def lranksums(x,y):
1154 """ 1155 Calculates the rank sums statistic on the provided scores and 1156 returns the result. Use only when the n in each condition is > 20 and you 1157 have 2 independent samples of ranks. 1158 1159 Usage: lranksums(x,y) 1160 Returns: a z-statistic, two-tailed p-value 1161 """ 1162 n1 = len(x) 1163 n2 = len(y) 1164 alldata = x+y 1165 ranked = rankdata(alldata) 1166 x = ranked[:n1] 1167 y = ranked[n1:] 1168 s = sum(x) 1169 expected = n1*(n1+n2+1) / 2.0 1170 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 1171 prob = 2*(1.0 -zprob(abs(z))) 1172 return z, prob
1173 1174
1175 -def lwilcoxont(x,y):
1176 """ 1177 Calculates the Wilcoxon T-test for related samples and returns the 1178 result. A non-parametric T-test. 1179 1180 Usage: lwilcoxont(x,y) 1181 Returns: a t-statistic, two-tail probability estimate 1182 """ 1183 if len(x) <> len(y): 1184 raise ValueError, 'Unequal N in wilcoxont. Aborting.' 1185 d=[] 1186 for i in range(len(x)): 1187 diff = x[i] - y[i] 1188 if diff <> 0: 1189 d.append(diff) 1190 count = len(d) 1191 absd = map(abs,d) 1192 absranked = rankdata(absd) 1193 r_plus = 0.0 1194 r_minus = 0.0 1195 for i in range(len(absd)): 1196 if d[i] < 0: 1197 r_minus = r_minus + absranked[i] 1198 else: 1199 r_plus = r_plus + absranked[i] 1200 wt = min(r_plus, r_minus) 1201 mn = count * (count+1) * 0.25 1202 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 1203 z = math.fabs(wt-mn) / se 1204 prob = 2*(1.0 -zprob(abs(z))) 1205 return wt, prob
1206 1207
1208 -def lkruskalwallish(*args):
1209 """ 1210 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 1211 groups, requiring at least 5 subjects in each group. This function 1212 calculates the Kruskal-Wallis H-test for 3 or more independent samples 1213 and returns the result. 1214 1215 Usage: lkruskalwallish(*args) 1216 Returns: H-statistic (corrected for ties), associated p-value 1217 """ 1218 args = list(args) 1219 n = [0]*len(args) 1220 all = [] 1221 n = map(len,args) 1222 for i in range(len(args)): 1223 all = all + args[i] 1224 ranked = rankdata(all) 1225 T = tiecorrect(ranked) 1226 for i in range(len(args)): 1227 args[i] = ranked[0:n[i]] 1228 del ranked[0:n[i]] 1229 rsums = [] 1230 for i in range(len(args)): 1231 rsums.append(sum(args[i])**2) 1232 rsums[i] = rsums[i] / float(n[i]) 1233 ssbn = sum(rsums) 1234 totaln = sum(n) 1235 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 1236 df = len(args) - 1 1237 if T == 0: 1238 raise ValueError, 'All numbers are identical in lkruskalwallish' 1239 h = h / float(T) 1240 return h, chisqprob(h,df)
1241 1242
1243 -def lfriedmanchisquare(*args):
1244 """ 1245 Friedman Chi-Square is a non-parametric, one-way within-subjects 1246 ANOVA. This function calculates the Friedman Chi-square test for repeated 1247 measures and returns the result, along with the associated probability 1248 value. It assumes 3 or more repeated measures. Only 3 levels requires a 1249 minimum of 10 subjects in the study. Four levels requires 5 subjects per 1250 level(??). 1251 1252 Usage: lfriedmanchisquare(*args) 1253 Returns: chi-square statistic, associated p-value 1254 """ 1255 k = len(args) 1256 if k < 3: 1257 raise ValueError, 'Less than 3 levels. Friedman test not appropriate.' 1258 n = len(args[0]) 1259 data = apply(pstat.abut,tuple(args)) 1260 for i in range(len(data)): 1261 data[i] = rankdata(data[i]) 1262 ssbn = 0 1263 for i in range(k): 1264 ssbn = ssbn + sum(args[i])**2 1265 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 1266 return chisq, chisqprob(chisq,k-1)
1267 1268 1269 #################################### 1270 #### PROBABILITY CALCULATIONS #### 1271 #################################### 1272
1273 -def lchisqprob(chisq,df):
1274 """ 1275 Returns the (1-tailed) probability value associated with the provided 1276 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. 1277 1278 Usage: lchisqprob(chisq,df) 1279 """ 1280 BIG = 20.0 1281 def ex(x): 1282 BIG = 20.0 1283 if x < -BIG: 1284 return 0.0 1285 else: 1286 return math.exp(x)
1287 1288 if chisq <=0 or df < 1: 1289 return 1.0 1290 a = 0.5 * chisq 1291 if df%2 == 0: 1292 even = 1 1293 else: 1294 even = 0 1295 if df > 1: 1296 y = ex(-a) 1297 if even: 1298 s = y 1299 else: 1300 s = 2.0 * zprob(-math.sqrt(chisq)) 1301 if (df > 2): 1302 chisq = 0.5 * (df - 1.0) 1303 if even: 1304 z = 1.0 1305 else: 1306 z = 0.5 1307 if a > BIG: 1308 if even: 1309 e = 0.0 1310 else: 1311 e = math.log(math.sqrt(math.pi)) 1312 c = math.log(a) 1313 while (z <= chisq): 1314 e = math.log(z) + e 1315 s = s + ex(c*z-a-e) 1316 z = z + 1.0 1317 return s 1318 else: 1319 if even: 1320 e = 1.0 1321 else: 1322 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) 1323 c = 0.0 1324 while (z <= chisq): 1325 e = e * (a/float(z)) 1326 c = c + e 1327 z = z + 1.0 1328 return (c*y+s) 1329 else: 1330 return s 1331 1332
 1333  def lerfcc(x):
 1334      """
 1335      Returns the complementary error function erfc(x) with fractional
 1336      error everywhere less than 1.2e-7.  Adapted from Numerical Recipes.
 1337  
 1338      Usage: lerfcc(x)
 1339      """
 1340      z = abs(x)
 1341      t = 1.0 / (1.0+0.5*z)
 1342      ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
 1343      if x >= 0:
 1344          return ans
 1345      else:
 1346          return 2.0 - ans
1347 1348
1349 -def lzprob(z):
1350 """ 1351 Returns the area under the normal curve 'to the left of' the given z value. 1352 Thus, :: 1353 for z<0, zprob(z) = 1-tail probability 1354 for z>0, 1.0-zprob(z) = 1-tail probability 1355 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 1356 Adapted from z.c in Gary Perlman's |Stat. 1357 1358 Usage: lzprob(z) 1359 """ 1360 Z_MAX = 6.0 # maximum meaningful z-value 1361 if z == 0.0: 1362 x = 0.0 1363 else: 1364 y = 0.5 * math.fabs(z) 1365 if y >= (Z_MAX*0.5): 1366 x = 1.0 1367 elif (y < 1.0): 1368 w = y*y 1369 x = ((((((((0.000124818987 * w 1370 -0.001075204047) * w +0.005198775019) * w 1371 -0.019198292004) * w +0.059054035642) * w 1372 -0.151968751364) * w +0.319152932694) * w 1373 -0.531923007300) * w +0.797884560593) * y * 2.0 1374 else: 1375 y = y - 2.0 1376 x = (((((((((((((-0.000045255659 * y 1377 +0.000152529290) * y -0.000019538132) * y 1378 -0.000676904986) * y +0.001390604284) * y 1379 -0.000794620820) * y -0.002034254874) * y 1380 +0.006549791214) * y -0.010557625006) * y 1381 +0.011630447319) * y -0.009279453341) * y 1382 +0.005353579108) * y -0.002141268741) * y 1383 +0.000535310849) * y +0.999936657524 1384 if z > 0.0: 1385 prob = ((x+1.0)*0.5) 1386 else: 1387 prob = ((1.0-x)*0.5) 1388 return prob
1389 1390
 1391  def lksprob(alam):
 1392      """
 1393      Computes a Kolmogorov-Smirnov t-test significance level.  Adapted from
 1394      Numerical Recipes.
 1395  
 1396      Usage: lksprob(alam)
 1397      """
 1398      fac = 2.0
 1399      sum = 0.0
 1400      termbf = 0.0
 1401      a2 = -2.0*alam*alam
 1402      for j in range(1,201):
 1403          term = fac*math.exp(a2*j*j)
 1404          sum = sum + term
 1405          if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
 1406              return sum
 1407          fac = -fac
 1408          termbf = math.fabs(term)
 1409      return 1.0             # Get here only if fails to converge; was 0.0!!
1410 1411
1412 -def lfprob (dfnum, dfden, F):
1413 """ 1414 Returns the (1-tailed) significance level (p-value) of an F 1415 statistic given the degrees of freedom for the numerator (dfR-dfF) and 1416 the degrees of freedom for the denominator (dfF). 1417 1418 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 1419 """ 1420 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 1421 return p
1422 1423
1424 -def lbetacf(a,b,x):
1425 """ 1426 This function evaluates the continued fraction form of the incomplete 1427 Beta function, betai. (Adapted from: Numerical Recipies in C.) 1428 1429 Usage: lbetacf(a,b,x) 1430 """ 1431 ITMAX = 200 1432 EPS = 3.0e-7 1433 1434 bm = az = am = 1.0 1435 qab = a+b 1436 qap = a+1.0 1437 qam = a-1.0 1438 bz = 1.0-qab*x/qap 1439 for i in range(ITMAX+1): 1440 em = float(i+1) 1441 tem = em + em 1442 d = em*(b-em)*x/((qam+tem)*(a+tem)) 1443 ap = az + d*am 1444 bp = bz+d*bm 1445 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 1446 app = ap+d*az 1447 bpp = bp+d*bz 1448 aold = az 1449 am = ap/bpp 1450 bm = bp/bpp 1451 az = app/bpp 1452 bz = 1.0 1453 if (abs(az-aold)<(EPS*abs(az))): 1454 return az 1455 print 'a or b too big, or ITMAX too small in Betacf.'
1456 1457
 1458  def lgammln(xx):
 1459      """
 1460      Returns the natural log of the gamma function of xx::
 1461          Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
 1462      (Adapted from: Numerical Recipes in C.)
 1463  
 1464      Usage: lgammln(xx)
 1465      """
 1466  
 1467      coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
 1468               0.120858003e-2, -0.536382e-5]
 1469      x = xx - 1.0
 1470      tmp = x + 5.5
 1471      tmp = tmp - (x+0.5)*math.log(tmp)
 1472      ser = 1.0
 1473      for j in range(len(coeff)):
 1474          x = x + 1
 1475          ser = ser + coeff[j]/x
 1476      return -tmp + math.log(2.50662827465*ser)
1477 1478
1479 -def lbetai(a,b,x):
1480 """ 1481 Returns the incomplete beta function:: 1482 1483 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 1484 1485 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 1486 function of a. The continued fraction formulation is implemented here, 1487 using the betacf function. (Adapted from: Numerical Recipies in C.) 1488 1489 Usage: lbetai(a,b,x) 1490 """ 1491 if (x<0.0 or x>1.0): 1492 raise ValueError, 'Bad x in lbetai' 1493 if (x==0.0 or x==1.0): 1494 bt = 0.0 1495 else: 1496 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* 1497 math.log(1.0-x)) 1498 if (x<(a+1.0)/(a+b+2.0)): 1499 return bt*betacf(a,b,x)/float(a) 1500 else: 1501 return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1502 1503 1504 #################################### 1505 ####### ANOVA CALCULATIONS ####### 1506 #################################### 1507
1508 -def lF_oneway(*lists):
1509 """ 1510 Performs a 1-way ANOVA, returning an F-value and probability given 1511 any number of groups. From Heiman, pp.394-7. 1512 1513 Usage: F_oneway(*lists) where *lists is any number of lists, one per treatment group 1514 Returns: F value, one-tailed p-value 1515 """ 1516 a = len(lists) # ANOVA on 'a' groups, each in it's own list 1517 means = [0]*a 1518 vars = [0]*a 1519 ns = [0]*a 1520 alldata = [] 1521 tmp = map(N.array,lists) 1522 means = map(amean,tmp) 1523 vars = map(avar,tmp) 1524 ns = map(len,lists) 1525 for i in range(len(lists)): 1526 alldata = alldata + lists[i] 1527 alldata = N.array(alldata) 1528 bign = len(alldata) 1529 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 1530 ssbn = 0 1531 for list in lists: 1532 ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list)) 1533 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 1534 sswn = sstot-ssbn 1535 dfbn = a-1 1536 dfwn = bign - a 1537 msb = ssbn/float(dfbn) 1538 msw = sswn/float(dfwn) 1539 f = msb/msw 1540 prob = fprob(dfbn,dfwn,f) 1541 return f, prob
1542 1543
1544 -def lF_value (ER,EF,dfnum,dfden):
1545 """ 1546 Returns an F-statistic given the following:: 1547 ER = error associated with the null hypothesis (the Restricted model) 1548 EF = error associated with the alternate hypothesis (the Full model) 1549 dfR-dfF = degrees of freedom of the numerator 1550 dfF = degrees of freedom associated with the denominator/Full model 1551 1552 Usage: lF_value(ER,EF,dfnum,dfden) 1553 """ 1554 return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
1555 1556 1557 #################################### 1558 ######## SUPPORT FUNCTIONS ####### 1559 #################################### 1560
1561 -def writecc (listoflists,file,writetype='w',extra=2):
1562 """ 1563 Writes a list of lists to a file in columns, customized by the max 1564 size of items within the columns (max size of items in col, +2 characters) 1565 to specified file. File-overwrite is the default. 1566 1567 Usage: writecc (listoflists,file,writetype='w',extra=2) 1568 Returns: None 1569 """ 1570 if type(listoflists[0]) not in [ListType,TupleType]: 1571 listoflists = [listoflists] 1572 outfile = open(file,writetype) 1573 rowstokill = [] 1574 list2print = copy.deepcopy(listoflists) 1575 for i in range(len(listoflists)): 1576 if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': 1577 rowstokill = rowstokill + [i] 1578 rowstokill.reverse() 1579 for row in rowstokill: 1580 del list2print[row] 1581 maxsize = [0]*len(list2print[0]) 1582 for col in range(len(list2print[0])): 1583 items = pstat.colex(list2print,col) 1584 items = map(pstat.makestr,items) 1585 maxsize[col] = max(map(len,items)) + extra 1586 for row in listoflists: 1587 if row == ['\n'] or row == '\n': 1588 outfile.write('\n') 1589 elif row == ['dashes'] or row == 'dashes': 1590 dashes = [0]*len(maxsize) 1591 for j in range(len(maxsize)): 1592 dashes[j] = '-'*(maxsize[j]-2) 1593 outfile.write(pstat.lineincustcols(dashes,maxsize)) 1594 else: 1595 outfile.write(pstat.lineincustcols(row,maxsize)) 1596 outfile.write('\n') 1597 outfile.close() 1598 return None
1599 1600
1601 -def lincr(l,cap): # to increment a list up to a max-list of 'cap'
1602 """ 1603 Simulate a counting system from an n-dimensional list. 1604 1605 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n 1606 Returns: next set of values for list l, OR -1 (if overflow) 1607 """ 1608 l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) 1609 for i in range(len(l)): 1610 if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done 1611 l[i] = 0 1612 l[i+1] = l[i+1] + 1 1613 elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished 1614 l = -1 1615 return l 1616 1617
1618 -def lsum (inlist):
1619 """ 1620 Returns the sum of the items in the passed list. 1621 1622 Usage: lsum(inlist) 1623 """ 1624 s = 0 1625 for item in inlist: 1626 s = s + item 1627 return s
1628 1629
1630 -def lcumsum (inlist):
1631 """ 1632 Returns a list consisting of the cumulative sum of the items in the 1633 passed list. 1634 1635 Usage: lcumsum(inlist) 1636 """ 1637 newlist = copy.deepcopy(inlist) 1638 for i in range(1,len(newlist)): 1639 newlist[i] = newlist[i] + newlist[i-1] 1640 return newlist
1641 1642
1643 -def lss(inlist):
1644 """ 1645 Squares each value in the passed list, adds up these squares and 1646 returns the result. 1647 1648 Usage: lss(inlist) 1649 """ 1650 ss = 0 1651 for item in inlist: 1652 ss = ss + item*item 1653 return ss
1654 1655
1656 -def lsummult (list1,list2):
1657 """ 1658 Multiplies elements in list1 and list2, element by element, and 1659 returns the sum of all resulting multiplications. Must provide equal 1660 length lists. 1661 1662 Usage: lsummult(list1,list2) 1663 """ 1664 if len(list1) <> len(list2): 1665 raise ValueError, "Lists not equal length in summult." 1666 s = 0 1667 for item1,item2 in pstat.abut(list1,list2): 1668 s = s + item1*item2 1669 return s
1670 1671
1672 -def lsumdiffsquared(x,y):
1673 """ 1674 Takes pairwise differences of the values in lists x and y, squares 1675 these differences, and returns the sum of these squares. 1676 1677 Usage: lsumdiffsquared(x,y) 1678 Returns: sum[(x[i]-y[i])**2] 1679 """ 1680 sds = 0 1681 for i in range(len(x)): 1682 sds = sds + (x[i]-y[i])**2 1683 return sds
1684 1685
1686 -def lsquare_of_sums(inlist):
1687 """ 1688 Adds the values in the passed list, squares the sum, and returns 1689 the result. 1690 1691 Usage: lsquare_of_sums(inlist) 1692 Returns: sum(inlist[i])**2 1693 """ 1694 s = sum(inlist) 1695 return float(s)*s
1696 1697
1698 -def lshellsort(inlist):
1699 """ 1700 Shellsort algorithm. Sorts a 1D-list. 1701 1702 Usage: lshellsort(inlist) 1703 Returns: sorted-inlist, sorting-index-vector (for original list) 1704 """ 1705 n = len(inlist) 1706 svec = copy.deepcopy(inlist) 1707 ivec = range(n) 1708 gap = n/2 # integer division needed 1709 while gap >0: 1710 for i in range(gap,n): 1711 for j in range(i-gap,-1,-gap): 1712 while j>=0 and svec[j]>svec[j+gap]: 1713 temp = svec[j] 1714 svec[j] = svec[j+gap] 1715 svec[j+gap] = temp 1716 itemp = ivec[j] 1717 ivec[j] = ivec[j+gap] 1718 ivec[j+gap] = itemp 1719 gap = gap / 2 # integer division needed 1720 # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] 1721 return svec, ivec
1722 1723
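# A usage sketch for lshellsort(), assuming this module is imported.  The
# second return value is the permutation linking the sorted list back to the
# original, i.e. svec[i] == inlist[ivec[i]]:
#
#   >>> svec, ivec = lshellsort([3, 1, 2])
#   >>> svec
#   [1, 2, 3]
#   >>> ivec
#   [1, 2, 0]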
1724 -def lrankdata(inlist):
1725 """ 1726 Ranks the data in inlist, dealing with ties appropriately. Assumes 1727 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. 1728 1729 Usage: lrankdata(inlist) 1730 Returns: a list of length equal to inlist, containing rank scores 1731 """ 1732 n = len(inlist) 1733 svec, ivec = shellsort(inlist) 1734 sumranks = 0 1735 dupcount = 0 1736 newlist = [0]*n 1737 for i in range(n): 1738 sumranks = sumranks + i 1739 dupcount = dupcount + 1 1740 if i==n-1 or svec[i] <> svec[i+1]: 1741 averank = sumranks / float(dupcount) + 1 1742 for j in range(i-dupcount+1,i+1): 1743 newlist[ivec[j]] = averank 1744 sumranks = 0 1745 dupcount = 0 1746 return newlist
1747 1748
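# A usage sketch for lrankdata(), assuming this module is imported.  Tied
# scores receive the mean of the ranks they would otherwise occupy:
#
#   >>> lrankdata([10, 30, 20, 20])
#   [1.0, 4.0, 2.5, 2.5]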
1749 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1750 """ 1751 Prints or write to a file stats for two groups, using the name, n, 1752 mean, sterr, min and max for each group, as well as the statistic name, 1753 its value, and the associated p-value. 1754 1755 Usage:: 1756 outputpairedstats(fname,writemode, 1757 name1,n1,mean1,stderr1,min1,max1, 1758 name2,n2,mean2,stderr2,min2,max2, 1759 statname,stat,prob) 1760 Returns: None 1761 """ 1762 suffix = '' # for *s after the p-value 1763 try: 1764 x = prob.shape 1765 prob = prob[0] 1766 except: 1767 pass 1768 if prob < 0.001: suffix = ' ***' 1769 elif prob < 0.01: suffix = ' **' 1770 elif prob < 0.05: suffix = ' *' 1771 title = [['Name','N','Mean','SD','Min','Max']] 1772 lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], 1773 [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] 1774 if type(fname)<>StringType or len(fname)==0: 1775 print 1776 print statname 1777 print 1778 pstat.printcc(lofl) 1779 print 1780 try: 1781 if stat.shape == (): 1782 stat = stat[0] 1783 if prob.shape == (): 1784 prob = prob[0] 1785 except: 1786 pass 1787 print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix 1788 print 1789 else: 1790 file = open(fname,writemode) 1791 file.write('\n'+statname+'\n\n') 1792 file.close() 1793 writecc(lofl,fname,'a') 1794 file = open(fname,'a') 1795 try: 1796 if stat.shape == (): 1797 stat = stat[0] 1798 if prob.shape == (): 1799 prob = prob[0] 1800 except: 1801 pass 1802 file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n'])) 1803 file.close() 1804 return None
1805 1806
1807 -def lfindwithin (data):
1808 """ 1809 Returns an integer representing a binary vector, where 1=within- 1810 subject factor, 0=between. Input equals the entire data 2D list (i.e., 1811 column 0=random factor, column -1=measured values (those two are skipped). 1812 Note: input data is in |Stat format ... a list of lists ("2D list") with 1813 one row per measured value, first column=subject identifier, last column= 1814 score, one in-between column per factor (these columns contain level 1815 designations on each factor). See also stats.anova.__doc__. 1816 1817 Usage: lfindwithin(data) data in |Stat format 1818 """ 1819 1820 numfact = len(data[0])-1 1821 withinvec = 0 1822 for col in range(1,numfact): 1823 examplelevel = pstat.unique(pstat.colex(data,col))[0] 1824 rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor 1825 factsubjs = pstat.unique(pstat.colex(rows,0)) 1826 allsubjs = pstat.unique(pstat.colex(data,0)) 1827 if len(factsubjs) == len(allsubjs): # fewer Ss than scores on this factor? 1828 withinvec = withinvec + (1 << col) 1829 return withinvec
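# A decoding sketch for the integer returned by lfindwithin(), assuming
# 'data' is a hypothetical list of lists in the |Stat format described above;
# bit 'col' is set when the factor in column 'col' is within-subject:
#
#   >>> withinvec = lfindwithin(data)
#   >>> [(withinvec >> col) & 1 for col in range(1, len(data[0]) - 1)]
#   # -> one 0/1 flag per factor column (1 = within-subject, 0 = between)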
1830 1831 1832 ######################################################### 1833 ######################################################### 1834 ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### 1835 ######################################################### 1836 ######################################################### 1837 1838 ## CENTRAL TENDENCY: 1839 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) 1840 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) 1841 mean = Dispatch ( (lmean, (ListType, TupleType)), ) 1842 median = Dispatch ( (lmedian, (ListType, TupleType)), ) 1843 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) 1844 mode = Dispatch ( (lmode, (ListType, TupleType)), ) 1845 1846 ## MOMENTS: 1847 moment = Dispatch ( (lmoment, (ListType, TupleType)), ) 1848 variation = Dispatch ( (lvariation, (ListType, TupleType)), ) 1849 skew = Dispatch ( (lskew, (ListType, TupleType)), ) 1850 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) 1851 describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) 1852 1853 ## FREQUENCY STATISTICS: 1854 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) 1855 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) 1856 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) 1857 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) 1858 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) 1859 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) 1860 1861 ## VARIABILITY: 1862 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) 1863 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) 1864 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) 1865 var = Dispatch ( (lvar, (ListType, TupleType)), ) 1866 stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) 1867 sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) 1868 sem = Dispatch ( (lsem, (ListType, TupleType)), ) 1869 z = Dispatch ( (lz, (ListType, TupleType)), ) 1870 zs = Dispatch ( (lzs, (ListType, TupleType)), ) 1871 1872 ## TRIMMING FCNS: 1873 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) 1874 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) 1875 1876 ## CORRELATION FCNS: 1877 paired = Dispatch ( (lpaired, (ListType, TupleType)), ) 1878 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) 1879 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) 1880 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) 1881 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) 1882 linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) 1883 1884 ## INFERENTIAL STATS: 1885 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) 1886 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) 1887 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) 1888 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) 1889 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) 1890 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) 1891 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) 1892 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) 1893 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) 1894 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) 1895 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) 1896 1897 ## PROBABILITY CALCS: 1898 
chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) 1899 zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) 1900 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) 1901 fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) 1902 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) 1903 betai = Dispatch ( (lbetai, (IntType, FloatType)), ) 1904 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) 1905 gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) 1906 1907 ## ANOVA FUNCTIONS: 1908 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) 1909 F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) 1910 1911 ## SUPPORT FUNCTIONS: 1912 incr = Dispatch ( (lincr, (ListType, TupleType)), ) 1913 sum = Dispatch ( (lsum, (ListType, TupleType)), ) 1914 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) 1915 ss = Dispatch ( (lss, (ListType, TupleType)), ) 1916 summult = Dispatch ( (lsummult, (ListType, TupleType)), ) 1917 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) 1918 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) 1919 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) 1920 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) 1921 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) 1922 1923 1924 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1925 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1926 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1927 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1928 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1929 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1930 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1931 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1932 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1933 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1934 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1935 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1936 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1937 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1938 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1939 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1940 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1941 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1942 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1943 1944 try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 1945 import Numeric 1946 N = Numeric 1947 import LinearAlgebra 1948 LA = LinearAlgebra 1949 1950 1951 ##################################### 1952 ######## ACENTRAL TENDENCY ######## 1953 ##################################### 1954
1955 - def ageometricmean (inarray,dimension=None,keepdims=0):
1956 """ 1957 Calculates the geometric mean of the values in the passed array. 1958 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in 1959 the passed array. Use dimension=None to flatten array first. REMEMBER: if 1960 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 1961 if dimension is a sequence, it collapses over all specified dimensions. If 1962 keepdims is set to 1, the resulting array will have as many dimensions as 1963 inarray, with only 1 'level' per dim that was collapsed over. 1964 1965 Usage: ageometricmean(inarray,dimension=None,keepdims=0) 1966 Returns: geometric mean computed over dim(s) listed in dimension 1967 """ 1968 inarray = N.array(inarray,N.Float) 1969 if dimension == None: 1970 inarray = N.ravel(inarray) 1971 size = len(inarray) 1972 mult = N.power(inarray,1.0/size) 1973 mult = N.multiply.reduce(mult) 1974 elif type(dimension) in [IntType,FloatType]: 1975 size = inarray.shape[dimension] 1976 mult = N.power(inarray,1.0/size) 1977 mult = N.multiply.reduce(mult,dimension) 1978 if keepdims == 1: 1979 shp = list(inarray.shape) 1980 shp[dimension] = 1 1981 mult = N.reshape(mult,shp) 1982 else: # must be a SEQUENCE of dims to average over 1983 dims = list(dimension) 1984 dims.sort() 1985 dims.reverse() 1986 size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 1987 mult = N.power(inarray,1.0/size) 1988 for dim in dims: 1989 mult = N.multiply.reduce(mult,dim) 1990 if keepdims == 1: 1991 shp = list(inarray.shape) 1992 for dim in dims: 1993 shp[dim] = 1 1994 mult = N.reshape(mult,shp) 1995 return mult
1996 1997
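# A usage sketch for ageometricmean(), assuming Numeric is available and
# bound to N as in the import above:
#
#   >>> ageometricmean(N.array([1.0, 4.0, 16.0]))
#   # -> 4.0 (the cube root of 1*4*16, up to floating-point rounding)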
1998 - def aharmonicmean (inarray,dimension=None,keepdims=0):
1999 """ 2000 Calculates the harmonic mean of the values in the passed array. 2001 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in 2002 the passed array. Use dimension=None to flatten array first. REMEMBER: if 2003 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2004 if dimension is a sequence, it collapses over all specified dimensions. If 2005 keepdims is set to 1, the resulting array will have as many dimensions as 2006 inarray, with only 1 'level' per dim that was collapsed over. 2007 2008 Usage: aharmonicmean(inarray,dimension=None,keepdims=0) 2009 Returns: harmonic mean computed over dim(s) in dimension 2010 """ 2011 inarray = inarray.astype(N.Float) 2012 if dimension == None: 2013 inarray = N.ravel(inarray) 2014 size = len(inarray) 2015 s = N.add.reduce(1.0 / inarray) 2016 elif type(dimension) in [IntType,FloatType]: 2017 size = float(inarray.shape[dimension]) 2018 s = N.add.reduce(1.0/inarray, dimension) 2019 if keepdims == 1: 2020 shp = list(inarray.shape) 2021 shp[dimension] = 1 2022 s = N.reshape(s,shp) 2023 else: # must be a SEQUENCE of dims to average over 2024 dims = list(dimension) 2025 dims.sort() 2026 nondims = [] 2027 for i in range(len(inarray.shape)): 2028 if i not in dims: 2029 nondims.append(i) 2030 tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first 2031 idx = [0] *len(nondims) 2032 if idx == []: 2033 size = len(N.ravel(inarray)) 2034 s = asum(1.0 / inarray) 2035 if keepdims == 1: 2036 s = N.reshape([s],N.ones(len(inarray.shape))) 2037 else: 2038 idx[0] = -1 2039 loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 2040 s = N.zeros(loopcap+1,N.Float) 2041 while incr(idx,loopcap) <> -1: 2042 s[idx] = asum(1.0/tinarray[idx]) 2043 size = N.multiply.reduce(N.take(inarray.shape,dims)) 2044 if keepdims == 1: 2045 shp = list(inarray.shape) 2046 for dim in dims: 2047 shp[dim] = 1 2048 s = N.reshape(s,shp) 2049 return size / s
2050 2051
2052 - def amean (inarray,dimension=None,keepdims=0):
2053 """ 2054 Calculates the arithmetic mean of the values in the passed array. 2055 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the 2056 passed array. Use dimension=None to flatten array first. REMEMBER: if 2057 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2058 if dimension is a sequence, it collapses over all specified dimensions. If 2059 keepdims is set to 1, the resulting array will have as many dimensions as 2060 inarray, with only 1 'level' per dim that was collapsed over. 2061 2062 Usage: amean(inarray,dimension=None,keepdims=0) 2063 Returns: arithmetic mean calculated over dim(s) in dimension 2064 """ 2065 if inarray.typecode() in ['l','s','b']: 2066 inarray = inarray.astype(N.Float) 2067 if dimension == None: 2068 inarray = N.ravel(inarray) 2069 sum = N.add.reduce(inarray) 2070 denom = float(len(inarray)) 2071 elif type(dimension) in [IntType,FloatType]: 2072 sum = asum(inarray,dimension) 2073 denom = float(inarray.shape[dimension]) 2074 if keepdims == 1: 2075 shp = list(inarray.shape) 2076 shp[dimension] = 1 2077 sum = N.reshape(sum,shp) 2078 else: # must be a TUPLE of dims to average over 2079 dims = list(dimension) 2080 dims.sort() 2081 dims.reverse() 2082 sum = inarray *1.0 2083 for dim in dims: 2084 sum = N.add.reduce(sum,dim) 2085 denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 2086 if keepdims == 1: 2087 shp = list(inarray.shape) 2088 for dim in dims: 2089 shp[dim] = 1 2090 sum = N.reshape(sum,shp) 2091 return sum/denom
2092 2093
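# A usage sketch for amean(), assuming Numeric is bound to N as above:
#
#   >>> x = N.array([[1., 2.], [3., 4.]])
#   >>> amean(x)                            # all values
#   2.5
#   >>> amean(x, dimension=0)               # -> column means [2., 3.]
#   >>> amean(x, dimension=0, keepdims=1)   # -> same values, shape (1,2)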
2094 - def amedian (inarray,numbins=1000):
2095 """ 2096 Calculates the COMPUTED median value of an array of numbers, given the 2097 number of bins to use for the histogram (more bins approaches finding the 2098 precise median value of the array; default number of bins = 1000). From 2099 G.W. Heiman's Basic Stats, or CRC Probability & Statistics. 2100 NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). 2101 2102 Usage: amedian(inarray,numbins=1000) 2103 Returns: median calculated over ALL values in inarray 2104 """ 2105 inarray = N.ravel(inarray) 2106 (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) 2107 cumhist = N.cumsum(hist) # make cumulative histogram 2108 otherbins = N.greater_equal(cumhist,len(inarray)/2.0) 2109 otherbins = list(otherbins) # list of 0/1s, 1s start at median bin 2110 cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score 2111 LRL = smallest + binsize*cfbin # get lower read limit of that bin 2112 cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin 2113 freq = hist[cfbin] # frequency IN the 50%ile bin 2114 median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN 2115 return median
2116 2117
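# A usage sketch for amedian(), assuming Numeric is bound to N as above.
# Because the value is read off a histogram, it is a binned estimate of the
# median rather than the exact middle score:
#
#   >>> amedian(N.array([1., 2., 3., 4., 5.]), numbins=1000)
#   # -> approximately 3.0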
2118 - def amedianscore (inarray,dimension=None):
2119 """ 2120 Returns the 'middle' score of the passed array. If there is an even 2121 number of scores, the mean of the 2 middle scores is returned. Can function 2122 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can 2123 be None, to pre-flatten the array, or else dimension must equal 0). 2124 2125 Usage: amedianscore(inarray,dimension=None) 2126 Returns: 'middle' score of the array, or the mean of the 2 middle scores 2127 """ 2128 if dimension == None: 2129 inarray = N.ravel(inarray) 2130 dimension = 0 2131 inarray = N.sort(inarray,dimension) 2132 if inarray.shape[dimension] % 2 == 0: # if even number of elements 2133 indx = inarray.shape[dimension]/2 # integer division correct 2134 median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 2135 else: 2136 indx = inarray.shape[dimension] / 2 # integer division correct 2137 median = N.take(inarray,[indx],dimension) 2138 if median.shape == (1,): 2139 median = median[0] 2140 return median
2141 2142
2143 - def amode(a, dimension=None):
2144 """ 2145 Returns an array of the modal (most common) score in the passed array. 2146 If there is more than one such score, ONLY THE FIRST is returned. 2147 The bin-count for the modal values is also returned. Operates on whole 2148 array (dimension=None), or on a given dimension. 2149 2150 Usage: amode(a, dimension=None) 2151 Returns: array of bin-counts for mode(s), array of corresponding modal values 2152 """ 2153 2154 if dimension == None: 2155 a = N.ravel(a) 2156 dimension = 0 2157 scores = pstat.aunique(N.ravel(a)) # get ALL unique values 2158 testshape = list(a.shape) 2159 testshape[dimension] = 1 2160 oldmostfreq = N.zeros(testshape) 2161 oldcounts = N.zeros(testshape) 2162 for score in scores: 2163 template = N.equal(a,score) 2164 counts = asum(template,dimension,1) 2165 mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) 2166 oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) 2167 oldmostfreq = mostfrequent 2168 return oldcounts, mostfrequent
2169 2170
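# A usage sketch for amode(), assuming Numeric is bound to N as above.  Note
# the return order: bin counts first, then the modal values:
#
#   >>> counts, modes = amode(N.array([1, 1, 2, 3]))
#   # counts -> array([2]), modes -> array([1])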
2171 - def atmean(a,limits=None,inclusive=(1,1)):
2172 """ 2173 Returns the arithmetic mean of all values in an array, ignoring values 2174 strictly outside the sequence passed to 'limits'. Note: either limit 2175 in the sequence, or the value of limits itself, can be set to None. The 2176 inclusive list/tuple determines whether the lower and upper limiting bounds 2177 (respectively) are open/exclusive (0) or closed/inclusive (1). 2178 2179 Usage: atmean(a,limits=None,inclusive=(1,1)) 2180 """ 2181 if a.typecode() in ['l','s','b']: 2182 a = a.astype(N.Float) 2183 if limits == None: 2184 return mean(a) 2185 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" 2186 if inclusive[0]: lowerfcn = N.greater_equal 2187 else: lowerfcn = N.greater 2188 if inclusive[1]: upperfcn = N.less_equal 2189 else: upperfcn = N.less 2190 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2191 raise ValueError, "No array values within given limits (atmean)." 2192 elif limits[0]==None and limits[1]<>None: 2193 mask = upperfcn(a,limits[1]) 2194 elif limits[0]<>None and limits[1]==None: 2195 mask = lowerfcn(a,limits[0]) 2196 elif limits[0]<>None and limits[1]<>None: 2197 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2198 s = float(N.add.reduce(N.ravel(a*mask))) 2199 n = float(N.add.reduce(N.ravel(mask))) 2200 return s/n
2201 2202
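# A usage sketch for atmean(), assuming Numeric is bound to N as above.
# Values outside 'limits' are ignored (here, everything above 10):
#
#   >>> atmean(N.array([1., 2., 3., 4., 100.]), limits=(None, 10))
#   2.5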
2203 - def atvar(a,limits=None,inclusive=(1,1)):
2204 """ 2205 Returns the sample variance of values in an array, (i.e., using N-1), 2206 ignoring values strictly outside the sequence passed to 'limits'. 2207 Note: either limit in the sequence, or the value of limits itself, 2208 can be set to None. The inclusive list/tuple determines whether the lower 2209 and upper limiting bounds (respectively) are open/exclusive (0) or 2210 closed/inclusive (1). 2211 2212 Usage: atvar(a,limits=None,inclusive=(1,1)) 2213 """ 2214 a = a.astype(N.Float) 2215 if limits == None or limits == [None,None]: 2216 term1 = N.add.reduce(N.ravel(a*a)) 2217 n = float(len(N.ravel(a))) - 1 2218 term2 = N.add.reduce(N.ravel(a))**2 / n 2219 return (term1 - term2) / n 2220 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" 2221 if inclusive[0]: lowerfcn = N.greater_equal 2222 else: lowerfcn = N.greater 2223 if inclusive[1]: upperfcn = N.less_equal 2224 else: upperfcn = N.less 2225 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2226 raise ValueError, "No array values within given limits (atvar)." 2227 elif limits[0]==None and limits[1]<>None: 2228 mask = upperfcn(a,limits[1]) 2229 elif limits[0]<>None and limits[1]==None: 2230 mask = lowerfcn(a,limits[0]) 2231 elif limits[0]<>None and limits[1]<>None: 2232 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2233 term1 = N.add.reduce(N.ravel(a*a*mask)) 2234 n = float(N.add.reduce(N.ravel(mask))) - 1 2235 term2 = N.add.reduce(N.ravel(a*mask))**2 / n 2236 return (term1 - term2) / n
2237 2238
2239 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2240 """ 2241 Returns the minimum value of a, along dimension, including only values greater 2242 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, 2243 all values in the array are used. 2244 2245 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) 2246 """ 2247 if inclusive: lowerfcn = N.greater_equal 2248 else: lowerfcn = N.greater 2249 if dimension == None: 2250 a = N.ravel(a) 2251 dimension = 0 2252 if lowerlimit == None: 2253 lowerlimit = N.minimum.reduce(N.ravel(a))-11 2254 biggest = N.maximum.reduce(N.ravel(a)) 2255 ta = N.where(lowerfcn(a,lowerlimit),a,biggest) 2256 return N.minimum.reduce(ta,dimension)
2257 2258
2259 - def atmax(a,upperlimit,dimension=None,inclusive=1):
2260 """ 2261 Returns the maximum value of a, along dimension, including only values less 2262 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, 2263 a limit larger than the max value in the array is used. 2264 2265 Usage: atmax(a,upperlimit,dimension=None,inclusive=1) 2266 """ 2267 if inclusive: upperfcn = N.less_equal 2268 else: upperfcn = N.less 2269 if dimension == None: 2270 a = N.ravel(a) 2271 dimension = 0 2272 if upperlimit == None: 2273 upperlimit = N.maximum.reduce(N.ravel(a))+1 2274 smallest = N.minimum.reduce(N.ravel(a)) 2275 ta = N.where(upperfcn(a,upperlimit),a,smallest) 2276 return N.maximum.reduce(ta,dimension)
2277 2278
2279 - def atstdev(a,limits=None,inclusive=(1,1)):
2280 """ 2281 Returns the standard deviation of all values in an array, ignoring values 2282 strictly outside the sequence passed to 'limits'. Note: either limit 2283 in the sequence, or the value of limits itself, can be set to None. The 2284 inclusive list/tuple determines whether the lower and upper limiting bounds 2285 (respectively) are open/exclusive (0) or closed/inclusive (1). 2286 2287 Usage: atstdev(a,limits=None,inclusive=(1,1)) 2288 """ 2289 return N.sqrt(tvar(a,limits,inclusive))
2290 2291
2292 - def atsem(a,limits=None,inclusive=(1,1)):
2293 """ 2294 Returns the standard error of the mean for the values in an array 2295 (i.e., using N for the denominator), ignoring values strictly outside 2296 the sequence passed to 'limits'. Note: either limit in the sequence, 2297 or the value of limits itself, can be set to None. The inclusive list/tuple 2298 determines whether the lower and upper limiting bounds (respectively) are 2299 open/exclusive (0) or closed/inclusive (1). 2300 2301 Usage: atsem(a,limits=None,inclusive=(1,1)) 2302 """ 2303 sd = tstdev(a,limits,inclusive) 2304 if limits == None or limits == [None,None]: 2305 n = float(len(N.ravel(a))); return sd/math.sqrt(n) 2306 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" 2307 if inclusive[0]: lowerfcn = N.greater_equal 2308 else: lowerfcn = N.greater 2309 if inclusive[1]: upperfcn = N.less_equal 2310 else: upperfcn = N.less 2311 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2312 raise ValueError, "No array values within given limits (atsem)." 2313 elif limits[0]==None and limits[1]<>None: 2314 mask = upperfcn(a,limits[1]) 2315 elif limits[0]<>None and limits[1]==None: 2316 mask = lowerfcn(a,limits[0]) 2317 elif limits[0]<>None and limits[1]<>None: 2318 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2319 term1 = N.add.reduce(N.ravel(a*a*mask)) 2320 n = float(N.add.reduce(N.ravel(mask))) 2321 return sd/math.sqrt(n)
2322 2323 2324 ##################################### 2325 ############ AMOMENTS ############# 2326 ##################################### 2327
2328 - def amoment(a,moment=1,dimension=None):
2329 """ 2330 Calculates the nth moment about the mean for a sample (defaults to the 2331 1st moment). Generally used to calculate coefficients of skewness and 2332 kurtosis. Dimension can equal None (ravel array first), an integer 2333 (the dimension over which to operate), or a sequence (operate over 2334 multiple dimensions). 2335 2336 Usage: amoment(a,moment=1,dimension=None) 2337 Returns: appropriate moment along given dimension 2338 """ 2339 if dimension == None: 2340 a = N.ravel(a) 2341 dimension = 0 2342 if moment == 1: 2343 return 0.0 2344 else: 2345 mn = amean(a,dimension,1) # 1=keepdims 2346 s = N.power((a-mn),moment) 2347 return amean(s,dimension)
2348 2349
2350 - def avariation(a,dimension=None):
2351 """ 2352 Returns the coefficient of variation, as defined in CRC Standard 2353 Probability and Statistics, p.6. Dimension can equal None (ravel array 2354 first), an integer (the dimension over which to operate), or a 2355 sequence (operate over multiple dimensions). 2356 2357 Usage: avariation(a,dimension=None) 2358 """ 2359 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2360 2361
2362 - def askew(a,dimension=None):
2363 """ 2364 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra 2365 weight in the right tail). Use askewtest() to see if it's close enough. 2366 Dimension can equal None (ravel array first), an integer (the 2367 dimension over which to operate), or a sequence (operate over multiple 2368 dimensions). 2369 2370 Usage: askew(a, dimension=None) 2371 Returns: skew of vals in a along dimension, returning ZERO where all vals equal 2372 """ 2373 denom = N.power(amoment(a,2,dimension),1.5) 2374 zero = N.equal(denom,0) 2375 if type(denom) == N.ArrayType and asum(zero) <> 0: 2376 print "Number of zeros in askew: ",asum(zero) 2377 denom = denom + zero # prevent divide-by-zero 2378 return N.where(zero, 0, amoment(a,3,dimension)/denom) 2379 2380
2381 - def akurtosis(a,dimension=None):
2382 """ 2383 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means 2384 heavier in the tails, and usually more peaked). Use akurtosistest() 2385 to see if it's close enough. Dimension can equal None (ravel array 2386 first), an integer (the dimension over which to operate), or a 2387 sequence (operate over multiple dimensions). 2388 2389 Usage: akurtosis(a,dimension=None) 2390 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal 2391 """ 2392 denom = N.power(amoment(a,2,dimension),2) 2393 zero = N.equal(denom,0) 2394 if type(denom) == N.ArrayType and asum(zero) <> 0: 2395 print "Number of zeros in akurtosis: ",asum(zero) 2396 denom = denom + zero # prevent divide-by-zero 2397 return N.where(zero,0,amoment(a,4,dimension)/denom)
2398 2399
2400 - def adescribe(inarray,dimension=None):
2401 """ 2402 Returns several descriptive statistics of the passed array. Dimension 2403 can equal None (ravel array first), an integer (the dimension over 2404 which to operate), or a sequence (operate over multiple dimensions). 2405 2406 Usage: adescribe(inarray,dimension=None) 2407 Returns: n, (min,max), mean, standard deviation, skew, kurtosis 2408 """ 2409 if dimension == None: 2410 inarray = N.ravel(inarray) 2411 dimension = 0 2412 n = inarray.shape[dimension] 2413 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) 2414 m = amean(inarray,dimension) 2415 sd = astdev(inarray,dimension) 2416 skew = askew(inarray,dimension) 2417 kurt = akurtosis(inarray,dimension) 2418 return n, mm, m, sd, skew, kurt
2419 2420 2421 ##################################### 2422 ######## NORMALITY TESTS ########## 2423 ##################################### 2424
2425 - def askewtest(a,dimension=None):
2426 """ 2427 Tests whether the skew is significantly different from a normal 2428 distribution. Dimension can equal None (ravel array first), an 2429 integer (the dimension over which to operate), or a sequence (operate 2430 over multiple dimensions). 2431 2432 Usage: askewtest(a,dimension=None) 2433 Returns: z-score and 2-tail z-probability 2434 """ 2435 if dimension == None: 2436 a = N.ravel(a) 2437 dimension = 0 2438 b2 = askew(a,dimension) 2439 n = float(a.shape[dimension]) 2440 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) 2441 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) 2442 W2 = -1 + N.sqrt(2*(beta2-1)) 2443 delta = 1/N.sqrt(N.log(N.sqrt(W2))) 2444 alpha = N.sqrt(2/(W2-1)) 2445 y = N.where(N.equal(y,0),1,y) 2446 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) 2447 return Z, (1.0-zprob(Z))*2
2448 2449
2450 - def akurtosistest(a,dimension=None):
2451 """ 2452 Tests whether a dataset has normal kurtosis (i.e., 2453 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None 2454 (ravel array first), an integer (the dimension over which to operate), 2455 or a sequence (operate over multiple dimensions). 2456 2457 Usage: akurtosistest(a,dimension=None) 2458 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels 2459 """ 2460 if dimension == None: 2461 a = N.ravel(a) 2462 dimension = 0 2463 n = float(a.shape[dimension]) 2464 if n<20: 2465 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n 2466 b2 = akurtosis(a,dimension) 2467 E = 3.0*(n-1) /(n+1) 2468 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) 2469 x = (b2-E)/N.sqrt(varb2) 2470 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ 2471 (n*(n-2)*(n-3))) 2472 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) 2473 term1 = 1 -2/(9.0*A) 2474 denom = 1 +x*N.sqrt(2/(A-4.0)) 2475 denom = N.where(N.less(denom,0), 99, denom) 2476 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) 2477 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) 2478 Z = N.where(N.equal(denom,99), 0, Z) 2479 return Z, (1.0-zprob(Z))*2
2480 2481
2482 - def anormaltest(a,dimension=None):
2483 """ 2484 Tests whether skew and/OR kurtosis of dataset differs from normal 2485 curve. Can operate over multiple dimensions. Dimension can equal 2486 None (ravel array first), an integer (the dimension over which to 2487 operate), or a sequence (operate over multiple dimensions). 2488 2489 Usage: anormaltest(a,dimension=None) 2490 Returns: z-score and 2-tail probability 2491 """ 2492 if dimension == None: 2493 a = N.ravel(a) 2494 dimension = 0 2495 s,p = askewtest(a,dimension) 2496 k,p = akurtosistest(a,dimension) 2497 k2 = N.power(s,2) + N.power(k,2) 2498 return k2, achisqprob(k2,2)
2499 2500 2501 ##################################### 2502 ###### AFREQUENCY FUNCTIONS ####### 2503 ##################################### 2504
2505 - def aitemfreq(a):
2506 """ 2507 Returns a 2D array of item frequencies. Column 1 contains item values, 2508 column 2 contains their respective counts. Assumes a 1D array is passed. 2509 2510 Usage: aitemfreq(a) 2511 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 2512 """ 2513 scores = pstat.aunique(a) 2514 scores = N.sort(scores) 2515 freq = N.zeros(len(scores)) 2516 for i in range(len(scores)): 2517 freq[i] = N.add.reduce(N.equal(a,scores[i])) 2518 return N.array(pstat.aabut(scores, freq))
2519 2520
2521 - def ascoreatpercentile (inarray, percent):
2522 """ 2523 Usage: ascoreatpercentile(inarray,percent) 0<percent<100 2524 Returns: score at given percentile, relative to inarray distribution 2525 """ 2526 percent = percent / 100.0 2527 targetcf = percent*len(inarray) 2528 h, lrl, binsize, extras = histogram(inarray) 2529 cumhist = cumsum(h*1) 2530 for i in range(len(cumhist)): 2531 if cumhist[i] >= targetcf: 2532 break 2533 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 2534 return score
2535 2536
2537 - def apercentileofscore (inarray,score,histbins=10,defaultlimits=None):
2538 """ 2539 Note: result of this function depends on the values used to histogram 2540 the data(!). 2541 2542 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) 2543 Returns: percentile-position of score (0-100) relative to inarray 2544 """ 2545 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) 2546 cumhist = cumsum(h*1) 2547 i = int((score - lrl)/float(binsize)) 2548 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 2549 return pct
2550 2551
2552 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2553 """ 2554 Returns (i) an array of histogram bin counts, (ii) the smallest value 2555 of the histogram binning, and (iii) the bin width (the last 2 are not 2556 necessarily integers). Default number of bins is 10. Defaultlimits 2557 can be None (the routine picks bins spanning all the numbers in the 2558 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the 2559 following: array of bin values, lowerreallimit, binsize, extrapoints. 2560 2561 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) 2562 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range) 2563 """ 2564 inarray = N.ravel(inarray) # flatten any >1D arrays 2565 if (defaultlimits <> None): 2566 lowerreallimit = defaultlimits[0] 2567 upperreallimit = defaultlimits[1] 2568 binsize = (upperreallimit-lowerreallimit) / float(numbins) 2569 else: 2570 Min = N.minimum.reduce(inarray) 2571 Max = N.maximum.reduce(inarray) 2572 estbinwidth = float(Max - Min)/float(numbins) + 1 2573 binsize = (Max-Min+estbinwidth)/float(numbins) 2574 lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin 2575 bins = N.zeros(numbins) 2576 extrapoints = 0 2577 for num in inarray: 2578 try: 2579 if (num-lowerreallimit) < 0: 2580 extrapoints = extrapoints + 1 2581 else: 2582 bintoincrement = int((num-lowerreallimit) / float(binsize)) 2583 bins[bintoincrement] = bins[bintoincrement] + 1 2584 except: # point outside lower/upper limits 2585 extrapoints = extrapoints + 1 2586 if (extrapoints > 0 and printextras == 1): 2587 print '\nPoints outside given histogram range =',extrapoints 2588 return (bins, lowerreallimit, binsize, extrapoints)
2589 2590
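# A usage sketch for ahistogram(), assuming Numeric is bound to N as above.
# The return tuple is (bin counts, lower real limit, bin width, number of
# points falling outside the binned range):
#
#   >>> bins, lowerlimit, binsize, extras = ahistogram(N.array([1., 2., 2., 3.]), numbins=3)
#   # bins -> array([1, 2, 1]); extras -> 0; lowerlimit and binsize reflect
#   # the padding the routine adds around the data range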
2591 - def acumfreq(a,numbins=10,defaultreallimits=None):
2592 """ 2593 Returns a cumulative frequency histogram, using the histogram function. 2594 Defaultreallimits can be None (use all data), or a 2-sequence containing 2595 lower and upper limits on values to include. 2596 2597 Usage: acumfreq(a,numbins=10,defaultreallimits=None) 2598 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2599 """ 2600 h,l,b,e = histogram(a,numbins,defaultreallimits) 2601 cumhist = cumsum(h*1) 2602 return cumhist,l,b,e
2603 2604
2605 - def arelfreq(a,numbins=10,defaultreallimits=None):
2606 """ 2607 Returns a relative frequency histogram, using the histogram function. 2608 Defaultreallimits can be None (use all data), or a 2-sequence containing 2609 lower and upper limits on values to include. 2610 2611 Usage: arelfreq(a,numbins=10,defaultreallimits=None) 2612 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2613 """ 2614 h,l,b,e = histogram(a,numbins,defaultreallimits) 2615 h = N.array(h/float(a.shape[0])) 2616 return h,l,b,e
2617 2618 2619 ##################################### 2620 ###### AVARIABILITY FUNCTIONS ##### 2621 ##################################### 2622
2623 - def aobrientransform(*args):
2624 """ 2625 Computes a transform on input data (any number of columns). Used to 2626 test for homogeneity of variance prior to running one-way stats. Each 2627 array in *args is one level of a factor. If an F_oneway() run on the 2628 transformed data and found significant, variances are unequal. From 2629 Maxwell and Delaney, p.112. 2630 2631 Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor 2632 Returns: transformed data for use in an ANOVA 2633 """ 2634 TINY = 1e-10 2635 k = len(args) 2636 n = N.zeros(k,N.Float) 2637 v = N.zeros(k,N.Float) 2638 m = N.zeros(k,N.Float) 2639 nargs = [] 2640 for i in range(k): 2641 nargs.append(args[i].astype(N.Float)) 2642 n[i] = float(len(nargs[i])) 2643 v[i] = var(nargs[i]) 2644 m[i] = mean(nargs[i]) 2645 for j in range(k): 2646 for i in range(n[j]): 2647 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 2648 t2 = 0.5*v[j]*(n[j]-1.0) 2649 t3 = (n[j]-1.0)*(n[j]-2.0) 2650 nargs[j][i] = (t1-t2) / float(t3) 2651 check = 1 2652 for j in range(k): 2653 if v[j] - mean(nargs[j]) > TINY: 2654 check = 0 2655 if check <> 1: 2656 raise ValueError, 'Lack of convergence in obrientransform.' 2657 else: 2658 return N.array(nargs)
2659 2660
2661 - def asamplevar (inarray,dimension=None,keepdims=0):
2662 """ 2663 Returns the sample variance of the values in the passed 2664 array (i.e., using N). Dimension can equal None (ravel array first), 2665 an integer (the dimension over which to operate), or a sequence 2666 (operate over multiple dimensions). Set keepdims=1 to return an array 2667 with the same number of dimensions as inarray. 2668 2669 Usage: asamplevar(inarray,dimension=None,keepdims=0) 2670 """ 2671 if dimension == None: 2672 inarray = N.ravel(inarray) 2673 dimension = 0 2674 if dimension == 1: 2675 mn = amean(inarray,dimension)[:,N.NewAxis] 2676 else: 2677 mn = amean(inarray,dimension,keepdims=1) 2678 deviations = inarray - mn 2679 if type(dimension) == ListType: 2680 n = 1 2681 for d in dimension: 2682 n = n*inarray.shape[d] 2683 else: 2684 n = inarray.shape[dimension] 2685 svar = ass(deviations,dimension,keepdims) / float(n) 2686 return svar
2687 2688
2689 - def asamplestdev (inarray, dimension=None, keepdims=0):
2690 """ 2691 Returns the sample standard deviation of the values in the passed 2692 array (i.e., using N). Dimension can equal None (ravel array first), 2693 an integer (the dimension over which to operate), or a sequence 2694 (operate over multiple dimensions). Set keepdims=1 to return an array 2695 with the same number of dimensions as inarray. 2696 2697 Usage: asamplestdev(inarray,dimension=None,keepdims=0) 2698 """ 2699 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2700 2701
2702 - def asignaltonoise(instack,dimension=0):
2703 """ 2704 Calculates signal-to-noise. Dimension can equal None (ravel array 2705 first), an integer (the dimension over which to operate), or a 2706 sequence (operate over multiple dimensions). 2707 2708 Usage: asignaltonoise(instack,dimension=0): 2709 Returns: array containing the value of (mean/stdev) along dimension, or 0 when stdev=0 2710 """ 2711 m = mean(instack,dimension) 2712 sd = stdev(instack,dimension) 2713 return N.where(N.equal(sd,0),0,m/sd)
2714 2715
2716 - def avar (inarray, dimension=None,keepdims=0):
2717 """ 2718 Returns the estimated population variance of the values in the passed 2719 array (i.e., N-1). Dimension can equal None (ravel array first), an 2720 integer (the dimension over which to operate), or a sequence (operate 2721 over multiple dimensions). Set keepdims=1 to return an array with the 2722 same number of dimensions as inarray. 2723 2724 Usage: avar(inarray,dimension=None,keepdims=0) 2725 """ 2726 if dimension == None: 2727 inarray = N.ravel(inarray) 2728 dimension = 0 2729 mn = amean(inarray,dimension,1) 2730 deviations = inarray - mn 2731 if type(dimension) == ListType: 2732 n = 1 2733 for d in dimension: 2734 n = n*inarray.shape[d] 2735 else: 2736 n = inarray.shape[dimension] 2737 var = ass(deviations,dimension,keepdims)/float(n-1) 2738 return var
2739 2740
2741 - def astdev (inarray, dimension=None, keepdims=0):
2742 """ 2743 Returns the estimated population standard deviation of the values in 2744 the passed array (i.e., N-1). Dimension can equal None (ravel array 2745 first), an integer (the dimension over which to operate), or a 2746 sequence (operate over multiple dimensions). Set keepdims=1 to return 2747 an array with the same number of dimensions as inarray. 2748 2749 Usage: astdev(inarray,dimension=None,keepdims=0) 2750 """ 2751 return N.sqrt(avar(inarray,dimension,keepdims))
2752 2753
2754 - def asterr (inarray, dimension=None, keepdims=0):
2755 """ 2756 Returns the estimated population standard error of the values in the 2757 passed array (i.e., N-1). Dimension can equal None (ravel array 2758 first), an integer (the dimension over which to operate), or a 2759 sequence (operate over multiple dimensions). Set keepdims=1 to return 2760 an array with the same number of dimensions as inarray. 2761 2762 Usage: asterr(inarray,dimension=None,keepdims=0) 2763 """ 2764 if dimension == None: 2765 inarray = N.ravel(inarray) 2766 dimension = 0 2767 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2768 2769
2770 - def asem (inarray, dimension=None, keepdims=0):
2771 """ 2772 Returns the standard error of the mean (i.e., using N) of the values 2773 in the passed array. Dimension can equal None (ravel array first), an 2774 integer (the dimension over which to operate), or a sequence (operate 2775 over multiple dimensions). Set keepdims=1 to return an array with the 2776 same number of dimensions as inarray. 2777 2778 Usage: asem(inarray,dimension=None, keepdims=0) 2779 """ 2780 if dimension == None: 2781 inarray = N.ravel(inarray) 2782 dimension = 0 2783 if type(dimension) == ListType: 2784 n = 1 2785 for d in dimension: 2786 n = n*inarray.shape[d] 2787 else: 2788 n = inarray.shape[dimension] 2789 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) 2790 return s
2791 2792
2793 - def az (a, score):
2794 """ 2795 Returns the z-score of a given input score, given the array from which 2796 that score came. Not appropriate for population calculations, nor for 2797 arrays > 1D. 2798 2799 Usage: az(a, score) 2800 """ 2801 z = (score-amean(a)) / asamplestdev(a) 2802 return z
2803 2804
2805 - def azs (a):
2806 """ 2807 Returns a 1D array of z-scores, one for each score in the passed array, 2808 computed relative to the passed array. 2809 2810 Usage: azs(a) 2811 """ 2812 zscores = [] 2813 for item in a: 2814 zscores.append(z(a,item)) 2815 return N.array(zscores)
2816 2817
2818 - def azmap (scores, compare, dimension=0):
2819 """ 2820 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to 2821 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 2822 of the compare array. 2823 2824 Usage: azmap(scores, compare, dimension=0) 2825 """ 2826 mns = amean(compare,dimension) 2827 sstd = asamplestdev(compare,0) 2828 return (scores - mns) / sstd
2829 2830 2831 ##################################### 2832 ####### ATRIMMING FUNCTIONS ####### 2833 ##################################### 2834
2835 - def around(a,digits=1):
2836 """ 2837 Rounds all values in array a to 'digits' decimal places. 2838 2839 Usage: around(a,digits) 2840 Returns: a, where each value is rounded to 'digits' decimals 2841 """ 2842 def ar(x,d=digits): 2843 return round(x,d)
2844 2845 if type(a) <> N.ArrayType: 2846 try: 2847 a = N.array(a) 2848 except: 2849 a = N.array(a,'O') 2850 shp = a.shape 2851 if a.typecode() in ['f','F','d','D']: 2852 b = N.ravel(a) 2853 b = N.array(map(ar,b)) 2854 b.shape = shp 2855 elif a.typecode() in ['o','O']: 2856 b = N.ravel(a)*1 2857 for i in range(len(b)): 2858 if type(b[i]) == FloatType: 2859 b[i] = round(b[i],digits) 2860 b.shape = shp 2861 else: # not a float, double or Object array 2862 b = a*1 2863 return b 2864 2865
2866 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2867 """ 2868 Like Numeric.clip() except that values <threshmin or >threshmax are replaced 2869 by newval instead of by threshmin/threshmax (respectively). 2870 2871 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) 2872 Returns: a, with values <threshmin or >threshmax replaced with newval 2873 """ 2874 mask = N.zeros(a.shape) 2875 if threshmin <> None: 2876 mask = mask + N.where(N.less(a,threshmin),1,0) 2877 if threshmax <> None: 2878 mask = mask + N.where(N.greater(a,threshmax),1,0) 2879 mask = N.clip(mask,0,1) 2880 return N.where(mask,newval,a)
2881 2882
2883 - def atrimboth (a,proportiontocut):
2884 """ 2885 Slices off the passed proportion of items from BOTH ends of the passed 2886 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 2887 'rightmost' 10% of scores. You must pre-sort the array if you want 2888 "proper" trimming. Slices off LESS if proportion results in a 2889 non-integer slice index (i.e., conservatively slices off 2890 proportiontocut). 2891 2892 Usage: atrimboth (a,proportiontocut) 2893 Returns: trimmed version of array a 2894 """ 2895 lowercut = int(proportiontocut*len(a)) 2896 uppercut = len(a) - lowercut 2897 return a[lowercut:uppercut]
2898 2899
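# A usage sketch for atrimboth(), assuming a pre-sorted Numeric array of 10
# scores; cutting 10% from each end leaves the middle 8 values:
#
#   >>> a = N.sort(N.array([9., 1., 5., 3., 7., 2., 8., 4., 6., 0.]))
#   >>> atrimboth(a, 0.1)
#   # -> the 8 central values 1. through 8.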
2900 - def atrim1 (a,proportiontocut,tail='right'):
2901 """ 2902 Slices off the passed proportion of items from ONE end of the passed 2903 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 2904 10% of scores). Slices off LESS if proportion results in a non-integer 2905 slice index (i.e., conservatively slices off proportiontocut). 2906 2907 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' 2908 Returns: trimmed version of array a 2909 """ 2910 if string.lower(tail) == 'right': 2911 lowercut = 0 2912 uppercut = len(a) - int(proportiontocut*len(a)) 2913 elif string.lower(tail) == 'left': 2914 lowercut = int(proportiontocut*len(a)) 2915 uppercut = len(a) 2916 return a[lowercut:uppercut]
2917 2918 2919 ##################################### 2920 ##### ACORRELATION FUNCTIONS ###### 2921 ##################################### 2922
2923 - def acovariance(X):
2924 """ 2925 Computes the covariance matrix of a matrix X. Requires a 2D matrix input. 2926 2927 Usage: acovariance(X) 2928 Returns: covariance matrix of X 2929 """ 2930 if len(X.shape) <> 2: 2931 raise TypeError, "acovariance requires 2D matrices" 2932 n = X.shape[0] 2933 mX = amean(X,0) 2934 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2935 2936
2937 - def acorrelation(X):
2938 """ 2939 Computes the correlation matrix of a matrix X. Requires a 2D matrix input. 2940 2941 Usage: acorrelation(X) 2942 Returns: correlation matrix of X 2943 """ 2944 C = acovariance(X) 2945 V = N.diagonal(C) 2946 return C / N.sqrt(N.multiply.outer(V,V))
2947 2948
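# A usage sketch for acovariance() and acorrelation(), assuming Numeric is
# bound to N as above.  The covariances are population covariances (divide
# by n), and exactly collinear columns give correlation 1:
#
#   >>> X = N.array([[1., 2.], [3., 4.], [5., 6.]])
#   >>> acovariance(X)
#   # -> 2x2 matrix with every entry 8/3 (each column has variance 8/3 and
#   #    the two columns covary perfectly)
#   >>> acorrelation(X)
#   # -> 2x2 matrix of ones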
2949 - def apaired(x,y):
2950 """ 2951 Interactively determines the type of data in x and y, and then runs the 2952 appropriated statistic for paired group data. 2953 2954 Usage: apaired(x,y) x,y = the two arrays of values to be compared 2955 Returns: appropriate statistic name, value, and probability 2956 """ 2957 samples = '' 2958 while samples not in ['i','r','I','R','c','C']: 2959 print '\nIndependent or related samples, or correlation (i,r,c): ', 2960 samples = raw_input() 2961 2962 if samples in ['i','I','r','R']: 2963 print '\nComparing variances ...', 2964 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 2965 r = obrientransform(x,y) 2966 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 2967 if p<0.05: 2968 vartype='unequal, p='+str(round(p,4)) 2969 else: 2970 vartype='equal' 2971 print vartype 2972 if samples in ['i','I']: 2973 if vartype[0]=='e': 2974 t,p = ttest_ind(x,y,None,0) 2975 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 2976 else: 2977 if len(x)>20 or len(y)>20: 2978 z,p = ranksums(x,y) 2979 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 2980 else: 2981 u,p = mannwhitneyu(x,y) 2982 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 2983 2984 else: # RELATED SAMPLES 2985 if vartype[0]=='e': 2986 t,p = ttest_rel(x,y,0) 2987 print '\nRelated samples t-test: ', round(t,4),round(p,4) 2988 else: 2989 t,p = ranksums(x,y) 2990 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 2991 else: # CORRELATION ANALYSIS 2992 corrtype = '' 2993 while corrtype not in ['c','C','r','R','d','D']: 2994 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 2995 corrtype = raw_input() 2996 if corrtype in ['c','C']: 2997 m,b,r,p,see = linregress(x,y) 2998 print '\nLinear regression for continuous variables ...' 2999 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 3000 pstat.printcc(lol) 3001 elif corrtype in ['r','R']: 3002 r,p = spearmanr(x,y) 3003 print '\nCorrelation for ranked variables ...' 3004 print "Spearman's r: ",round(r,4),round(p,4) 3005 else: # DICHOTOMOUS 3006 r,p = pointbiserialr(x,y) 3007 print '\nAssuming x contains a dichotomous variable ...' 3008 print 'Point Biserial r: ',round(r,4),round(p,4) 3009 print '\n\n' 3010 return None
3011 3012
3013 - def apearsonr(x,y,verbose=1):
3014 """ 3015 Calculates a Pearson correlation coefficient and returns p. Taken 3016 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. 3017 3018 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays 3019 Returns: Pearson's r, two-tailed p-value 3020 """ 3021 TINY = 1.0e-20 3022 n = len(x) 3023 xmean = amean(x) 3024 ymean = amean(y) 3025 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3026 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3027 r = (r_num / r_den) 3028 df = n-2 3029 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3030 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) 3031 return r,prob
3032 3033
3034 - def aspearmanr(x,y):
3035 """ 3036 Calculates a Spearman rank-order correlation coefficient. Taken 3037 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 3038 3039 Usage: aspearmanr(x,y) where x,y are equal-length arrays 3040 Returns: Spearman's r, two-tailed p-value 3041 """ 3042 TINY = 1e-30 3043 n = len(x) 3044 rankx = rankdata(x) 3045 ranky = rankdata(y) 3046 dsq = N.add.reduce((rankx-ranky)**2) 3047 rs = 1 - 6*dsq / float(n*(n**2-1)) 3048 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 3049 df = n-2 3050 probrs = abetai(0.5*df,0.5,df/(df+t*t)) 3051 # probability values for rs are from part 2 of the spearman function in 3052 # Numerical Recipies, p.510. They close to tables, but not exact.(?) 3053 return rs, probrs
3054 3055
3056 - def apointbiserialr(x,y):
3057 """ 3058 Calculates a point-biserial correlation coefficient and the associated 3059 probability value. Taken from Heiman's Basic Statistics for the Behav. 3060 Sci (1st), p.194. 3061 3062 Usage: apointbiserialr(x,y) where x,y are equal length arrays 3063 Returns: Point-biserial r, two-tailed p-value 3064 """ 3065 TINY = 1e-30 3066 categories = pstat.aunique(x) 3067 data = pstat.aabut(x,y) 3068 if len(categories) <> 2: 3069 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." 3070 else: # there are 2 categories, continue 3071 codemap = pstat.aabut(categories,N.arange(2)) 3072 recoded = pstat.arecode(data,codemap,0) 3073 x = pstat.alinexand(data,0,categories[0]) 3074 y = pstat.alinexand(data,0,categories[1]) 3075 xmean = amean(pstat.acolex(x,1)) 3076 ymean = amean(pstat.acolex(y,1)) 3077 n = len(data) 3078 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 3079 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust 3080 df = n-2 3081 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 3082 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3083 return rpb, prob
3084 3085
3086 - def akendalltau(x,y):
3087 """ 3088 Calculates Kendall's tau ... correlation of ordinal data. Adapted 3089 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@ 3090 3091 Usage: akendalltau(x,y) 3092 Returns: Kendall's tau, two-tailed p-value 3093 """ 3094 n1 = 0 3095 n2 = 0 3096 iss = 0 3097 for j in range(len(x)-1): 3098 for k in range(j,len(y)): 3099 a1 = x[j] - x[k] 3100 a2 = y[j] - y[k] 3101 aa = a1 * a2 3102 if (aa): # neither array has a tie 3103 n1 = n1 + 1 3104 n2 = n2 + 1 3105 if aa > 0: 3106 iss = iss + 1 3107 else: 3108 iss = iss -1 3109 else: 3110 if (a1): 3111 n1 = n1 + 1 3112 else: 3113 n2 = n2 + 1 3114 tau = iss / math.sqrt(n1*n2) 3115 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 3116 z = tau / math.sqrt(svar) 3117 prob = erfcc(abs(z)/1.4142136) 3118 return tau, prob
3119 3120
3121 - def alinregress(*args):
3122 """ 3123 Calculates a regression line on two arrays, x and y, corresponding to x,y 3124 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels 3125 and splits data into x,y pairs along that dim. 3126 3127 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array 3128 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate 3129 """ 3130 TINY = 1.0e-20 3131 if len(args) == 1: # more than 1D array? 3132 args = args[0] 3133 if len(args) == 2: 3134 x = args[0] 3135 y = args[1] 3136 else: 3137 x = args[:,0] 3138 y = args[:,1] 3139 else: 3140 x = args[0] 3141 y = args[1] 3142 n = len(x) 3143 xmean = amean(x) 3144 ymean = amean(y) 3145 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3146 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3147 r = r_num / r_den 3148 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 3149 df = n-2 3150 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3151 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3152 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) 3153 intercept = ymean - slope*xmean 3154 sterrest = math.sqrt(1-r*r)*asamplestdev(y) 3155 return slope, intercept, r, prob, sterrest
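# A usage sketch for alinregress(), assuming Numeric is bound to N as above.
# With exactly collinear data the fit is recovered exactly:
#
#   >>> x = N.array([1., 2., 3., 4.])
#   >>> y = 2.0 * x
#   >>> slope, intercept, r, prob, sterrest = alinregress(x, y)
#   # slope -> 2.0, intercept -> 0.0, r -> 1.0, prob -> ~0.0, sterrest -> 0.0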
3156 3157 3158 ##################################### 3159 ##### AINFERENTIAL STATISTICS ##### 3160 ##################################### 3161
3162 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3163 """ 3164 Calculates the t-obtained for the independent samples T-test on ONE group 3165 of scores a, given a population mean. If printit=1, results are printed 3166 to the screen. If printit='filename', the results are output to 'filename' 3167 using the given writemode (default=append). Returns t-value, and prob. 3168 3169 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 3170 Returns: t-value, two-tailed prob 3171 """ 3172 if type(a) != N.ArrayType: 3173 a = N.array(a) 3174 x = amean(a) 3175 v = avar(a) 3176 n = len(a) 3177 df = n-1 3178 svar = ((n-1)*v) / float(df) 3179 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 3180 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3181 3182 if printit <> 0: 3183 statname = 'Single-sample T-test.' 3184 outputpairedstats(printit,writemode, 3185 'Population','--',popmean,0,0,0, 3186 name,n,x,v,N.minimum.reduce(N.ravel(a)), 3187 N.maximum.reduce(N.ravel(a)), 3188 statname,t,prob) 3189 return t,prob
3190 3191
 def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
    """
    Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
    a, and b.  From Numerical Recipes, p.483.  If printit=1, results are
    printed to the screen.  If printit='filename', the results are output
    to 'filename' using the given writemode (default=append).  Dimension
    can equal None (ravel array first), or an integer (the dimension over
    which to operate on a and b).

    Usage::
        attest_ind (a,b,dimension=None,printit=0,
                    name1='Samp1',name2='Samp2',writemode='a')
    Returns: t-value, two-tailed p-value
    """
    if dimension == None:
        a = N.ravel(a)
        b = N.ravel(b)
        dimension = 0
    x1 = amean(a,dimension)
    x2 = amean(b,dimension)
    v1 = avar(a,dimension)
    v2 = avar(b,dimension)
    n1 = a.shape[dimension]
    n2 = b.shape[dimension]
    df = n1+n2-2
    svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
    zerodivproblem = N.equal(svar,0)
    t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))  # N-D COMPUTATION HERE!!!!!!
    t = N.where(zerodivproblem,1.0,t)           # replace NaN t-values with 1.0
    probs = abetai(0.5*df,0.5,float(df)/(df+t*t))

    if type(t) == N.ArrayType:
        probs = N.reshape(probs,t.shape)
    if len(probs) == 1:
        probs = probs[0]

    if printit <> 0:
        if type(t) == N.ArrayType:
            t = t[0]
        if type(probs) == N.ArrayType:
            probs = probs[0]
        statname = 'Independent samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
                          N.maximum.reduce(N.ravel(a)),
                          name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
                          N.maximum.reduce(N.ravel(b)),
                          statname,t,probs)
        return
    return t, probs

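 # Illustrative usage sketch (not part of the original module); the helper
 # name and the data are invented.  Shows the `dimension` argument from the
 # docstring above: two groups measured on the same three variables
 # (columns), tested column by column.
 def _example_attest_ind():
     a = N.array([[4., 3., 7.], [5., 4., 6.], [6., 5., 8.], [7., 4., 9.]])
     b = N.array([[2., 3., 5.], [3., 2., 4.], [1., 4., 6.], [2., 3., 5.]])
     return attest_ind(a, b, dimension=0)   # one t and one p per column
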
 def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
    """
    Calculates the t-obtained T-test on TWO RELATED samples of scores, a
    and b.  From Numerical Recipes, p.483.  If printit=1, results are
    printed to the screen.  If printit='filename', the results are output
    to 'filename' using the given writemode (default=append).  Dimension
    can equal None (ravel array first), or an integer (the dimension over
    which to operate on a and b).

    Usage::
        attest_rel(a,b,dimension=None,printit=0,
                   name1='Samp1',name2='Samp2',writemode='a')
    Returns: t-value, two-tailed p-value
    """
    if dimension == None:
        a = N.ravel(a)
        b = N.ravel(b)
        dimension = 0
    if len(a)<>len(b):
        raise ValueError, 'Unequal length arrays.'
    x1 = amean(a,dimension)
    x2 = amean(b,dimension)
    v1 = avar(a,dimension)
    v2 = avar(b,dimension)
    n = a.shape[dimension]
    df = float(n-1)
    d = (a-b).astype('d')

    denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) / df)
    zerodivproblem = N.equal(denom,0)
    t = N.add.reduce(d,dimension) / denom   # N-D COMPUTATION HERE!!!!!!
    t = N.where(zerodivproblem,1.0,t)       # replace NaN t-values with 1.0
    probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
    if type(t) == N.ArrayType:
        probs = N.reshape(probs,t.shape)
    if len(probs) == 1:
        probs = probs[0]

    if printit <> 0:
        statname = 'Related samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
                          N.maximum.reduce(N.ravel(a)),
                          name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
                          N.maximum.reduce(N.ravel(b)),
                          statname,t,probs)
        return
    return t, probs

 def achisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for array of observed frequencies and returns
    the result.  If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage:   achisquare(f_obs, f_exp=None)   f_obs = array of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """

    k = len(f_obs)
    if f_exp == None:
        f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float)
    f_exp = f_exp.astype(N.Float)
    chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
    return chisq, chisqprob(chisq, k-1)

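 # Illustrative usage sketch (not part of the original module); the helper
 # name and the observed counts are invented.  With f_exp omitted, equal
 # expected frequencies are assumed, as described in the docstring above.
 def _example_achisquare():
     observed = N.array([18., 22., 28., 32.])
     return achisquare(observed)   # -> (chi-square statistic, p-value)
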
 def aks_2samp (data1,data2):
    """
    Computes the Kolmogorov-Smirnov statistic on 2 samples.  Modified from
    Numerical Recipes in C, page 493.  Returns KS D-value, prob.  Not ufunc-
    like.

    Usage:   aks_2samp(data1,data2)  where data1 and data2 are 1D arrays
    Returns: KS D-value, p-value
    """
    j1 = 0     # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
    j2 = 0     # N.zeros(data2.shape[1:])
    fn1 = 0.0  # N.zeros(data1.shape[1:],N.Float)
    fn2 = 0.0  # N.zeros(data2.shape[1:],N.Float)
    n1 = data1.shape[0]
    n2 = data2.shape[0]
    en1 = n1*1
    en2 = n2*1
    d = N.zeros(data1.shape[1:],N.Float)
    data1 = N.sort(data1,0)
    data2 = N.sort(data2,0)
    while j1 < n1 and j2 < n2:
        d1=data1[j1]
        d2=data2[j2]
        if d1 <= d2:
            fn1 = (j1)/float(en1)
            j1 = j1 + 1
        if d2 <= d1:
            fn2 = (j2)/float(en2)
            j2 = j2 + 1
        dt = (fn2-fn1)
        if abs(dt) > abs(d):
            d = dt
    try:
        en = math.sqrt(en1*en2/float(en1+en2))
        prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
    except:
        prob = 1.0
    return d, prob

 def amannwhitneyu(x,y):
    """
    Calculates a Mann-Whitney U statistic on the provided scores and
    returns the result.  Use only when the n in each condition is < 20 and
    you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is
    significant if the u-obtained is LESS THAN or equal to the critical
    value of U.

    Usage:   amannwhitneyu(x,y)  where x,y are arrays of values for 2 conditions
    Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
    """
    n1 = len(x)
    n2 = len(y)
    ranked = rankdata(N.concatenate((x,y)))
    rankx = ranked[0:n1]       # get the x-ranks
    ranky = ranked[n1:]        # the rest are y-ranks
    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
    u2 = n1*n2 - u1                            # remainder is U for y
    bigu = max(u1,u2)
    smallu = min(u1,u2)
    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
    if T == 0:
        raise ValueError, 'All numbers are identical in amannwhitneyu'
    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)     # normal approximation for prob calc
    return smallu, 1.0 - zprob(z)

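 # Illustrative usage sketch (not part of the original module); the helper
 # name and the two small rank samples are invented.
 def _example_amannwhitneyu():
     x = N.array([3., 4., 2., 6., 2., 5.])
     y = N.array([9., 7., 5., 10., 6., 8.])
     return amannwhitneyu(x, y)   # -> (smaller U, one-tailed p-value)
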
 def atiecorrect(rankvals):
    """
    Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
    See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
    Sciences.  New York: McGraw-Hill.  Code adapted from |Stat rankind.c
    code.

    Usage:   atiecorrect(rankvals)
    Returns: T correction factor for U or H
    """
    sorted,posn = ashellsort(N.array(rankvals))
    n = len(sorted)
    T = 0.0
    i = 0
    while (i<n-1):
        if sorted[i] == sorted[i+1]:
            nties = 1
            while (i<n-1) and (sorted[i] == sorted[i+1]):
                nties = nties +1
                i = i +1
            T = T + nties**3 - nties
        i = i+1
    T = T / float(n**3-n)
    return 1.0 - T

 def aranksums(x,y):
    """
    Calculates the rank sums statistic on the provided scores and returns
    the result.

    Usage:   aranksums(x,y)  where x,y are arrays of values for 2 conditions
    Returns: z-statistic, two-tailed p-value
    """
    n1 = len(x)
    n2 = len(y)
    alldata = N.concatenate((x,y))
    ranked = arankdata(alldata)
    x = ranked[:n1]
    y = ranked[n1:]
    s = sum(x)
    expected = n1*(n1+n2+1) / 2.0
    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
    prob = 2*(1.0 - zprob(abs(z)))
    return z, prob

 def awilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result.  A non-parametric T-test.

    Usage:   awilcoxont(x,y)  where x,y are equal-length arrays for 2 conditions
    Returns: t-statistic, two-tailed p-value
    """
    if len(x) <> len(y):
        raise ValueError, 'Unequal N in awilcoxont.  Aborting.'
    d = x-y
    d = N.compress(N.not_equal(d,0),d)  # Keep all non-zero differences
    count = len(d)
    absd = abs(d)
    absranked = arankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 - zprob(abs(z)))
    return wt, prob

 def akruskalwallish(*args):
    """
    The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
    groups, requiring at least 5 subjects in each group.  This function
    calculates the Kruskal-Wallis H and associated p-value for 3 or more
    independent samples.

    Usage:   akruskalwallish(*args)  args are separate arrays for 3+ conditions
    Returns: H-statistic (corrected for ties), associated p-value
    """
    assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
    args = list(args)
    n = [0]*len(args)
    n = map(len,args)
    all = []
    for i in range(len(args)):
        all = all + args[i].tolist()
    ranked = rankdata(all)
    T = tiecorrect(ranked)
    for i in range(len(args)):
        args[i] = ranked[0:n[i]]
        del ranked[0:n[i]]
    rsums = []
    for i in range(len(args)):
        rsums.append(sum(args[i])**2)
        rsums[i] = rsums[i] / float(n[i])
    ssbn = sum(rsums)
    totaln = sum(n)
    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
    df = len(args) - 1
    if T == 0:
        raise ValueError, 'All numbers are identical in akruskalwallish'
    h = h / float(T)
    return h, chisqprob(h,df)

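 # Illustrative usage sketch (not part of the original module); the helper
 # name and the three made-up groups are invented.
 def _example_akruskalwallish():
     g1 = N.array([68., 72., 77., 79., 81.])
     g2 = N.array([60., 65., 71., 73., 75.])
     g3 = N.array([54., 58., 61., 63., 66.])
     return akruskalwallish(g1, g2, g3)   # -> (H corrected for ties, p-value)
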
 def afriedmanchisquare(*args):
    """
    Friedman Chi-Square is a non-parametric, one-way within-subjects
    ANOVA.  This function calculates the Friedman Chi-square test for
    repeated measures and returns the result, along with the associated
    probability value.  It assumes 3 or more repeated measures.  With only
    3 levels, a minimum of 10 subjects is required; four levels require
    5 subjects per level(??).

    Usage:   afriedmanchisquare(*args)   args are separate arrays for 3+ conditions
    Returns: chi-square statistic, associated p-value
    """
    k = len(args)
    if k < 3:
        raise ValueError, '\nLess than 3 levels.  Friedman test not appropriate.\n'
    n = len(args[0])
    data = apply(pstat.aabut,args)
    data = data.astype(N.Float)
    for i in range(len(data)):
        data[i] = arankdata(data[i])
    ssbn = asum(asum(data,0)**2)   # sum of squared per-condition rank sums
    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
    return chisq, chisqprob(chisq,k-1)


 #####################################
 ####  APROBABILITY CALCULATIONS  ####
 #####################################

 def achisqprob(chisq,df):
    """
    Returns the (1-tail) probability value associated with the provided chi-square
    value and df.  Heavily modified from chisq.c in Gary Perlman's |Stat.  Can
    handle multiple dimensions.

    Usage:   achisqprob(chisq,df)    chisq=chisquare stat., df=degrees of freedom
    """
    BIG = 200.0
    def ex(x):
        BIG = 200.0
        exponents = N.where(N.less(x,-BIG),-BIG,x)
        return N.exp(exponents)

    if type(chisq) == N.ArrayType:
        arrayflag = 1
    else:
        arrayflag = 0
        chisq = N.array([chisq])
    if df < 1:
        return N.ones(chisq.shape,N.Float)
    probs = N.zeros(chisq.shape,N.Float)
    probs = N.where(N.less_equal(chisq,0),1.0,probs)  # set prob=1 for chisq<0
    a = 0.5 * chisq
    if df > 1:
        y = ex(-a)
    if df%2 == 0:
        even = 1
        s = y*1
        s2 = s*1
    else:
        even = 0
        s = 2.0 * azprob(-N.sqrt(chisq))
        s2 = s*1
    if (df > 2):
        chisq = 0.5 * (df - 1.0)
        if even:
            z = N.ones(probs.shape,N.Float)
        else:
            z = 0.5 *N.ones(probs.shape,N.Float)
        if even:
            e = N.zeros(probs.shape,N.Float)
        else:
            e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float)
        c = N.log(a)
        mask = N.zeros(probs.shape)
        a_big = N.greater(a,BIG)
        a_big_frozen = -1 *N.ones(probs.shape,N.Float)
        totalelements = N.multiply.reduce(N.array(probs.shape))
        while asum(mask)<>totalelements:
            e = N.log(z) + e
            s = s + ex(c*z-a-e)
            z = z + 1.0
            # print z, e, s                       # debug output (disabled)
            newmask = N.greater(z,chisq)
            a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
            mask = N.clip(newmask+mask,0,1)
        if even:
            z = N.ones(probs.shape,N.Float)
            e = N.ones(probs.shape,N.Float)
        else:
            z = 0.5 *N.ones(probs.shape,N.Float)
            e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float)
        c = 0.0
        mask = N.zeros(probs.shape)
        a_notbig_frozen = -1 *N.ones(probs.shape,N.Float)
        while asum(mask)<>totalelements:
            e = e * (a/z.astype(N.Float))
            c = c + e
            z = z + 1.0
            # print '#2', z, e, c, s, c*y+s2      # debug output (disabled)
            newmask = N.greater(z,chisq)
            a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
                                      c*y+s2, a_notbig_frozen)
            mask = N.clip(newmask+mask,0,1)
        probs = N.where(N.equal(probs,1),1,
                        N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
        return probs
    else:
        return s

 def aerfcc(x):
    """
    Returns the complementary error function erfc(x) with fractional error
    everywhere less than 1.2e-7.  Adapted from Numerical Recipes.  Can
    handle multiple dimensions.

    Usage:   aerfcc(x)
    """
    z = abs(x)
    t = 1.0 / (1.0+0.5*z)
    ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
    return N.where(N.greater_equal(x,0), ans, 2.0-ans)

 def azprob(z):
    """
    Returns the area under the normal curve 'to the left of' the given z value.
    Thus, ::

        for z<0, zprob(z) = 1-tail probability
        for z>0, 1.0-zprob(z) = 1-tail probability
        for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability

    Adapted from z.c in Gary Perlman's |Stat.  Can handle multiple dimensions.

    Usage:   azprob(z)    where z is a z-value
    """
    def yfunc(y):
        x = (((((((((((((-0.000045255659 * y
                         +0.000152529290) * y -0.000019538132) * y
                       -0.000676904986) * y +0.001390604284) * y
                     -0.000794620820) * y -0.002034254874) * y
                   +0.006549791214) * y -0.010557625006) * y
                 +0.011630447319) * y -0.009279453341) * y
               +0.005353579108) * y -0.002141268741) * y
             +0.000535310849) * y +0.999936657524
        return x

    def wfunc(w):
        x = ((((((((0.000124818987 * w
                    -0.001075204047) * w +0.005198775019) * w
                  -0.019198292004) * w +0.059054035642) * w
                -0.151968751364) * w +0.319152932694) * w
              -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
        return x

    Z_MAX = 6.0                        # maximum meaningful z-value
    x = N.zeros(z.shape,N.Float)       # initialize
    y = 0.5 * N.fabs(z)
    x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0))  # get x's
    x = N.where(N.greater(y,Z_MAX*0.5),1.0,x)           # kill those with big Z
    prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
    return prob

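 # Illustrative usage sketch (not part of the original module); the helper
 # name and z-values are invented.  It evaluates azprob on a small Numeric
 # array and applies the two-tailed recipe quoted in the docstring above.
 def _example_azprob():
     z = N.array([-1.96, 0.0, 1.96])
     left_tail_area = azprob(z)                       # P(Z <= z), elementwise
     two_tailed = 2.0*(1.0 - azprob(N.fabs(z)))       # as in the docstring
     return left_tail_area, two_tailed
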
 def aksprob(alam):
    """
    Returns the probability value for a K-S statistic computed via ks_2samp.
    Adapted from Numerical Recipes.  Can handle multiple dimensions.

    Usage:   aksprob(alam)
    """
    if type(alam) == N.ArrayType:
        frozen = -1 *N.ones(alam.shape,N.Float64)
        alam = alam.astype(N.Float64)
        arrayflag = 1
    else:
        frozen = N.array(-1.)
        alam = N.array(alam,N.Float64)
        arrayflag = 0
    mask = N.zeros(alam.shape)
    fac = 2.0 *N.ones(alam.shape,N.Float)
    sum = N.zeros(alam.shape,N.Float)
    termbf = N.zeros(alam.shape,N.Float)
    a2 = N.array(-2.0*alam*alam,N.Float64)
    totalelements = N.multiply.reduce(N.array(mask.shape))
    for j in range(1,201):
        if asum(mask) == totalelements:
            break
        exponents = (a2*j*j)
        overflowmask = N.less(exponents,-746)
        frozen = N.where(overflowmask,0,frozen)
        mask = mask+overflowmask
        term = fac*N.exp(exponents)
        sum = sum + term
        newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
                          N.less(abs(term),1.0e-8*sum), 1, 0)
        frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
        mask = N.clip(mask+newmask,0,1)
        fac = -fac
        termbf = abs(term)
    if arrayflag:
        return N.where(N.equal(frozen,-1), 1.0, frozen)     # 1.0 if doesn't converge
    else:
        return N.where(N.equal(frozen,-1), 1.0, frozen)[0]  # 1.0 if doesn't converge

 def afprob (dfnum, dfden, F):
    """
    Returns the 1-tailed significance level (p-value) of an F statistic
    given the degrees of freedom for the numerator (dfR-dfF) and the degrees
    of freedom for the denominator (dfF).  Can handle multiple dims for F.

    Usage:   afprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
    """
    if type(F) == N.ArrayType:
        return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
    else:
        return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))

 def abetacf(a,b,x,verbose=1):
    """
    Evaluates the continued fraction form of the incomplete Beta function,
    betai.  (Adapted from: Numerical Recipes in C.)  Can handle multiple
    dimensions for x.

    Usage:   abetacf(a,b,x,verbose=1)
    """
    ITMAX = 200
    EPS = 3.0e-7

    arrayflag = 1
    if type(x) == N.ArrayType:
        frozen = N.ones(x.shape,N.Float) *-1  # start out w/ -1s, should replace all
    else:
        arrayflag = 0
        frozen = N.array([-1])
        x = N.array([x])
    mask = N.zeros(x.shape)
    bm = az = am = 1.0
    qab = a+b
    qap = a+1.0
    qam = a-1.0
    bz = 1.0-qab*x/qap
    for i in range(ITMAX+1):
        if N.sum(N.ravel(N.equal(frozen,-1)))==0:
            break
        em = float(i+1)
        tem = em + em
        d = em*(b-em)*x/((qam+tem)*(a+tem))
        ap = az + d*am
        bp = bz+d*bm
        d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
        app = ap+d*az
        bpp = bp+d*bz
        aold = az*1
        am = ap/bpp
        bm = bp/bpp
        az = app/bpp
        bz = 1.0
        newmask = N.less(abs(az-aold),EPS*abs(az))
        frozen = N.where(newmask*N.equal(mask,0), az, frozen)
        mask = N.clip(mask+newmask,0,1)
    noconverge = asum(N.equal(frozen,-1))
    if noconverge <> 0 and verbose:
        print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
    if arrayflag:
        return frozen
    else:
        return frozen[0]

 def agammln(xx):
    """
    Returns the natural log of the gamma function of xx, where::

        Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.

    Adapted from: Numerical Recipes in C.  Can handle multiple dims ... but
    probably doesn't normally have to.

    Usage:   agammln(xx)
    """
    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x+0.5)*N.log(tmp)
    ser = 1.0
    for j in range(len(coeff)):
        x = x + 1
        ser = ser + coeff[j]/x
    return -tmp + N.log(2.50662827465*ser)

 def abetai(a,b,x,verbose=1):
    """
    Returns the incomplete beta function::

        I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)

    where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
    function of a.  The continued fraction formulation is implemented
    here, using the betacf function.  (Adapted from: Numerical Recipes in
    C.)  Can handle multiple dimensions.

    Usage:   abetai(a,b,x,verbose=1)
    """
    TINY = 1e-15
    if type(a) == N.ArrayType:
        if asum(N.less(x,0)+N.greater(x,1)) <> 0:
            raise ValueError, 'Bad x in abetai'
    x = N.where(N.equal(x,0),TINY,x)
    x = N.where(N.equal(x,1.0),1-TINY,x)

    bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
    exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
                  N.log(1.0-x) )
    # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW
    exponents = N.where(N.less(exponents,-740),-740,exponents)
    bt = N.exp(exponents)
    if type(x) == N.ArrayType:
        ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
                      bt*abetacf(a,b,x,verbose)/float(a),
                      1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
    else:
        if x<(a+1)/(a+b+2.0):
            ans = bt*abetacf(a,b,x,verbose)/float(a)
        else:
            ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
    return ans


 #####################################
 #######  AANOVA CALCULATIONS  #######
 #####################################

 import LinearAlgebra, operator
 LA = LinearAlgebra

 def aglm(data,para):
    """
    Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc.  Taken
    from::

        Peterson et al. Statistical limitations in functional neuroimaging
        I. Non-inferential methods and statistical models.  Phil Trans Royal Soc
        Lond B 354: 1239-1260.

    Usage:   aglm(data,para)
    Returns: statistic, p-value ???
    """
    if len(para) <> len(data):
        print "data and para must be same length in aglm"
        return
    n = len(para)
    p = pstat.aunique(para)
    x = N.zeros((n,len(p)))  # design matrix
    for l in range(len(p)):
        x[:,l] = N.equal(para,p[l])
    b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)),  # i.e., b=inv(X'X)X'Y
                    N.transpose(x)),
              data)
    diffs = (data - N.dot(x,b))
    s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)

    if len(p) == 2:  # ttest_ind
        c = N.array([1,-1])
        df = n-2
        fact = asum(1.0/asum(x,0))  # i.e., 1/n1 + 1/n2 + 1/n3 ...
        t = N.dot(c,b) / N.sqrt(s_sq*fact)
        probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
        return t, probs

 def aF_oneway(*args):
    """
    Performs a 1-way ANOVA, returning an F-value and probability given
    any number of groups.  From Heiman, pp.394-7.

    Usage:   aF_oneway(*args)    where *args is 2 or more arrays, one per treatment group
    Returns: f-value, probability
    """
    na = len(args)           # ANOVA on 'na' groups, each in its own array
    means = [0]*na
    vars = [0]*na
    ns = [0]*na
    alldata = []
    tmp = map(N.array,args)
    means = map(amean,tmp)
    vars = map(avar,tmp)
    ns = map(len,args)
    alldata = N.concatenate(args)
    bign = len(alldata)
    sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
    ssbn = 0
    for a in args:
        ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
    ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
    sswn = sstot-ssbn
    dfbn = na-1
    dfwn = bign - na
    msb = ssbn/float(dfbn)
    msw = sswn/float(dfwn)
    f = msb/msw
    prob = fprob(dfbn,dfwn,f)
    return f, prob

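 # Illustrative usage sketch (not part of the original module); the helper
 # name and the three treatment groups are invented.
 def _example_aF_oneway():
     g1 = N.array([6.2, 5.9, 6.1, 6.4])
     g2 = N.array([5.1, 5.3, 5.0, 5.2])
     g3 = N.array([4.4, 4.7, 4.5, 4.6])
     return aF_oneway(g1, g2, g3)   # -> (F-value, probability)
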
 def aF_value (ER,EF,dfR,dfF):
    """
    Returns an F-statistic given the following::

        ER  = error associated with the null hypothesis (the Restricted model)
        EF  = error associated with the alternate hypothesis (the Full model)
        dfR = degrees of freedom of the Restricted model
        dfF = degrees of freedom associated with the Full model
    """
    return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))

 def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
    Enum = round(Enum,3)
    Eden = round(Eden,3)
    dfnum = round(dfnum,3)
    dfden = round(dfden,3)
    f = round(f,3)
    prob = round(prob,3)
    suffix = ''                  # for *s after the p-value
    if prob < 0.001:  suffix = ' ***'
    elif prob < 0.01: suffix = ' **'
    elif prob < 0.05: suffix = ' *'
    title = [['EF/ER','DF','Mean Square','F-value','prob','']]
    lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
                  [Eden, dfden, round(Eden/float(dfden),3),'','','']]
    pstat.printcc(lofl)
    return

 def F_value_multivariate(ER, EF, dfnum, dfden):
    """
    Returns an F-statistic given the following::

        ER    = error associated with the null hypothesis (the Restricted model)
        EF    = error associated with the alternate hypothesis (the Full model)
        dfnum = degrees of freedom gained by the Full model (dfR - dfF)
        dfden = degrees of freedom associated with the Full model (dfF)

    where ER and EF are matrices from a multivariate F calculation.
    """
    if type(ER) in [IntType, FloatType]:
        ER = N.array([[ER]])
    if type(EF) in [IntType, FloatType]:
        EF = N.array([[EF]])
    n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
    d_en = LA.determinant(EF) / float(dfden)
    return n_um / d_en


 #####################################
 #######  ASUPPORT FUNCTIONS  ########
 #####################################

 def asign(a):
    """
    Usage:   asign(a)
    Returns: array shape of a, with -1 where a<0 and +1 where a>=0
    """
    a = N.asarray(a)
    if ((type(a) == type(1.4)) or (type(a) == type(1))):
        return a-a-N.less(a,0)+N.greater(a,0)
    else:
        return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)

 def asum (a, dimension=None, keepdims=0):
    """
    An alternative to the Numeric.add.reduce function, which allows one to
    (1) collapse over multiple dimensions at once, and/or (2) to retain
    all dimensions in the original array (squashing each collapsed dimension
    down to size 1).  Dimension can equal None (ravel array first), an
    integer (the dimension over which to operate), or a sequence (operate
    over multiple dimensions).  If keepdims=1, the resulting array will have
    as many dimensions as the input array.

    Usage:   asum(a, dimension=None, keepdims=0)
    Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
    """
    if type(a) == N.ArrayType and a.typecode() in ['l','s','b']:
        a = a.astype(N.Float)
    if dimension == None:
        s = N.sum(N.ravel(a))
    elif type(dimension) in [IntType,FloatType]:
        s = N.add.reduce(a, dimension)
        if keepdims == 1:
            shp = list(a.shape)
            shp[dimension] = 1
            s = N.reshape(s,shp)
    else:  # must be a SEQUENCE of dims to sum over
        dims = list(dimension)
        dims.sort()
        dims.reverse()
        s = a *1.0
        for dim in dims:
            s = N.add.reduce(s,dim)
        if keepdims == 1:
            shp = list(a.shape)
            for dim in dims:
                shp[dim] = 1
            s = N.reshape(s,shp)
    return s

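 # Illustrative usage sketch (not part of the original module); the helper
 # name and the 2x3 array are invented.  Shows the dimension and keepdims
 # options described in the docstring above.
 def _example_asum():
     a = N.array([[1, 2, 3],
                  [4, 5, 6]])
     total  = asum(a)                   # all elements -> 21.0
     by_row = asum(a, 1)                # row sums -> [ 6. 15.]
     kept   = asum(a, 1, keepdims=1)    # same sums, but shape (2,1)
     return total, by_row, kept
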
 def acumsum (a,dimension=None):
    """
    Returns an array consisting of the cumulative sum of the items in the
    passed array.  Dimension can equal None (ravel array first), an
    integer (the dimension over which to operate), or a sequence (operate
    over multiple dimensions, but this last one just barely makes sense).

    Usage:   acumsum(a,dimension=None)
    """
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    if type(dimension) in [ListType, TupleType, N.ArrayType]:
        dimension = list(dimension)
        dimension.sort()
        dimension.reverse()
        for d in dimension:
            a = N.add.accumulate(a,d)
        return a
    else:
        return N.add.accumulate(a,dimension)

 def ass(inarray, dimension=None, keepdims=0):
    """
    Squares each value in the passed array, adds these squares & returns
    the result.  Unfortunate function name. :-)  Defaults to ALL values in
    the array.  Dimension can equal None (ravel array first), an integer
    (the dimension over which to operate), or a sequence (operate over
    multiple dimensions).  Set keepdims=1 to maintain the original number
    of dimensions.

    Usage:   ass(inarray, dimension=None, keepdims=0)
    Returns: sum-along-'dimension' for (inarray*inarray)
    """
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    return asum(inarray*inarray,dimension,keepdims)

 def asummult (array1,array2,dimension=None,keepdims=0):
    """
    Multiplies elements in array1 and array2, element by element, and
    returns the sum (along 'dimension') of all resulting multiplications.
    Dimension can equal None (ravel array first), an integer (the
    dimension over which to operate), or a sequence (operate over multiple
    dimensions).  A trivial function, but included for completeness.

    Usage:   asummult(array1,array2,dimension=None,keepdims=0)
    """
    if dimension == None:
        array1 = N.ravel(array1)
        array2 = N.ravel(array2)
        dimension = 0
    return asum(array1*array2,dimension,keepdims)

 def asquare_of_sums(inarray, dimension=None, keepdims=0):
    """
    Adds the values in the passed array, squares that sum, and returns the
    result.  Dimension can equal None (ravel array first), an integer (the
    dimension over which to operate), or a sequence (operate over multiple
    dimensions).  If keepdims=1, the returned array will have the same
    NUMBER of dimensions as the original.

    Usage:   asquare_of_sums(inarray, dimension=None, keepdims=0)
    Returns: the square of the sum over dim(s) in dimension
    """
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    s = asum(inarray,dimension,keepdims)
    if type(s) == N.ArrayType:
        return s.astype(N.Float)*s
    else:
        return float(s)*s

 def asumdiffsquared(a,b, dimension=None, keepdims=0):
    """
    Takes pairwise differences of the values in arrays a and b, squares
    these differences, and returns the sum of these squares.  Dimension
    can equal None (ravel array first), an integer (the dimension over
    which to operate), or a sequence (operate over multiple dimensions).
    keepdims=1 means the return shape = len(a.shape) = len(b.shape)

    Usage:   asumdiffsquared(a,b)
    Returns: sum[ravel(a-b)**2]
    """
    if dimension == None:
        a = N.ravel(a)
        b = N.ravel(b)
        dimension = 0
    return asum((a-b)**2,dimension,keepdims)

 def ashellsort(inarray):
    """
    Shellsort algorithm.  Sorts a 1D-array.

    Usage:   ashellsort(inarray)
    Returns: sorted-inarray, sorting-index-vector (for original array)
    """
    n = len(inarray)
    svec = inarray *1.0
    ivec = range(n)
    gap = n/2   # integer division needed
    while gap > 0:
        for i in range(gap,n):
            for j in range(i-gap,-1,-gap):
                while j>=0 and svec[j]>svec[j+gap]:
                    temp        = svec[j]
                    svec[j]     = svec[j+gap]
                    svec[j+gap] = temp
                    itemp       = ivec[j]
                    ivec[j]     = ivec[j+gap]
                    ivec[j+gap] = itemp
        gap = gap / 2   # integer division needed
    # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
    return svec, ivec

 def arankdata(inarray):
    """
    Ranks the data in inarray, dealing with ties appropriately.  Assumes
    a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.

    Usage:   arankdata(inarray)
    Returns: array of length equal to inarray, containing rank scores
    """
    n = len(inarray)
    svec, ivec = ashellsort(inarray)
    sumranks = 0
    dupcount = 0
    newarray = N.zeros(n,N.Float)
    for i in range(n):
        sumranks = sumranks + i
        dupcount = dupcount + 1
        if i==n-1 or svec[i] <> svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1,i+1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray

 def afindwithin(data):
    """
    Returns a binary vector, 1=within-subject factor, 0=between.  Input
    equals the entire data array (i.e., column 1=random factor, last
    column = measured values).

    Usage:   afindwithin(data)     data in |Stat format
    """
    numfact = len(data[0])-2
    withinvec = [0]*numfact
    for col in range(1,numfact+1):
        rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])  # get 1 level of this factor
        if len(pstat.unique(pstat.colex(rows,0))) < len(rows):  # if fewer subjects than scores on this factor
            withinvec[col-1] = 1
    return withinvec


 #########################################################
 #########################################################
 ######  RE-DEFINE DISPATCHES TO INCLUDE ARRAYS  #########
 #########################################################
 #########################################################

 ## CENTRAL TENDENCY:
 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
                            (ageometricmean, (N.ArrayType,)) )
 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
                           (aharmonicmean, (N.ArrayType,)) )
 mean = Dispatch ( (lmean, (ListType, TupleType)),
                   (amean, (N.ArrayType,)) )
 median = Dispatch ( (lmedian, (ListType, TupleType)),
                     (amedian, (N.ArrayType,)) )
 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
                          (amedianscore, (N.ArrayType,)) )
 mode = Dispatch ( (lmode, (ListType, TupleType)),
                   (amode, (N.ArrayType,)) )
 tmean = Dispatch ( (atmean, (N.ArrayType,)) )
 tvar = Dispatch ( (atvar, (N.ArrayType,)) )
 tstdev = Dispatch ( (atstdev, (N.ArrayType,)) )
 tsem = Dispatch ( (atsem, (N.ArrayType,)) )

 ## VARIATION:
 moment = Dispatch ( (lmoment, (ListType, TupleType)),
                     (amoment, (N.ArrayType,)) )
 variation = Dispatch ( (lvariation, (ListType, TupleType)),
                        (avariation, (N.ArrayType,)) )
 skew = Dispatch ( (lskew, (ListType, TupleType)),
                   (askew, (N.ArrayType,)) )
 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
                       (akurtosis, (N.ArrayType,)) )
 describe = Dispatch ( (ldescribe, (ListType, TupleType)),
                       (adescribe, (N.ArrayType,)) )

 ## DISTRIBUTION TESTS

 skewtest = Dispatch ( (askewtest, (ListType, TupleType)),
                       (askewtest, (N.ArrayType,)) )
 kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)),
                           (akurtosistest, (N.ArrayType,)) )
 normaltest = Dispatch ( (anormaltest, (ListType, TupleType)),
                         (anormaltest, (N.ArrayType,)) )

 ## FREQUENCY STATS:
 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)),
                       (aitemfreq, (N.ArrayType,)) )
 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)),
                                (ascoreatpercentile, (N.ArrayType,)) )
 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)),
                                (apercentileofscore, (N.ArrayType,)) )
 histogram = Dispatch ( (lhistogram, (ListType, TupleType)),
                        (ahistogram, (N.ArrayType,)) )
 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)),
                      (acumfreq, (N.ArrayType,)) )
 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)),
                      (arelfreq, (N.ArrayType,)) )

 ## VARIABILITY:
 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)),
                              (aobrientransform, (N.ArrayType,)) )
 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)),
                        (asamplevar, (N.ArrayType,)) )
 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)),
                          (asamplestdev, (N.ArrayType,)) )
 signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),)
 var = Dispatch ( (lvar, (ListType, TupleType)),
                  (avar, (N.ArrayType,)) )
 stdev = Dispatch ( (lstdev, (ListType, TupleType)),
                    (astdev, (N.ArrayType,)) )
 sterr = Dispatch ( (lsterr, (ListType, TupleType)),
                    (asterr, (N.ArrayType,)) )
 sem = Dispatch ( (lsem, (ListType, TupleType)),
                  (asem, (N.ArrayType,)) )
 z = Dispatch ( (lz, (ListType, TupleType)),
                (az, (N.ArrayType,)) )
 zs = Dispatch ( (lzs, (ListType, TupleType)),
                 (azs, (N.ArrayType,)) )

 ## TRIMMING FCNS:
 threshold = Dispatch( (athreshold, (N.ArrayType,)),)
 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)),
                       (atrimboth, (N.ArrayType,)) )
 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)),
                    (atrim1, (N.ArrayType,)) )

 ## CORRELATION FCNS:
 paired = Dispatch ( (lpaired, (ListType, TupleType)),
                     (apaired, (N.ArrayType,)) )
 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)),
                       (apearsonr, (N.ArrayType,)) )
 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)),
                        (aspearmanr, (N.ArrayType,)) )
 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)),
                             (apointbiserialr, (N.ArrayType,)) )
 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)),
                         (akendalltau, (N.ArrayType,)) )
 linregress = Dispatch ( (llinregress, (ListType, TupleType)),
                         (alinregress, (N.ArrayType,)) )

 ## INFERENTIAL STATS:
 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)),
                          (attest_1samp, (N.ArrayType,)) )
 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)),
                        (attest_ind, (N.ArrayType,)) )
 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)),
                        (attest_rel, (N.ArrayType,)) )
 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)),
                        (achisquare, (N.ArrayType,)) )
 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)),
                       (aks_2samp, (N.ArrayType,)) )
 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)),
                           (amannwhitneyu, (N.ArrayType,)) )
 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)),
                         (atiecorrect, (N.ArrayType,)) )
 ranksums = Dispatch ( (lranksums, (ListType, TupleType)),
                       (aranksums, (N.ArrayType,)) )
 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)),
                        (awilcoxont, (N.ArrayType,)) )
 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)),
                             (akruskalwallish, (N.ArrayType,)) )
 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)),
                                (afriedmanchisquare, (N.ArrayType,)) )

 ## PROBABILITY CALCS:
 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)),
                        (achisqprob, (N.ArrayType,)) )
 zprob = Dispatch ( (lzprob, (IntType, FloatType)),
                    (azprob, (N.ArrayType,)) )
 ksprob = Dispatch ( (lksprob, (IntType, FloatType)),
                     (aksprob, (N.ArrayType,)) )
 fprob = Dispatch ( (lfprob, (IntType, FloatType)),
                    (afprob, (N.ArrayType,)) )
 betacf = Dispatch ( (lbetacf, (IntType, FloatType)),
                     (abetacf, (N.ArrayType,)) )
 betai = Dispatch ( (lbetai, (IntType, FloatType)),
                    (abetai, (N.ArrayType,)) )
 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)),
                    (aerfcc, (N.ArrayType,)) )
 gammln = Dispatch ( (lgammln, (IntType, FloatType)),
                     (agammln, (N.ArrayType,)) )

 ## ANOVA FUNCTIONS:
 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)),
                       (aF_oneway, (N.ArrayType,)) )
 F_value = Dispatch ( (lF_value, (ListType, TupleType)),
                      (aF_value, (N.ArrayType,)) )

 ## SUPPORT FUNCTIONS:
 incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), )
 sum = Dispatch ( (lsum, (ListType, TupleType)),
                  (asum, (N.ArrayType,)) )
 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)),
                     (acumsum, (N.ArrayType,)) )
 ss = Dispatch ( (lss, (ListType, TupleType)),
                 (ass, (N.ArrayType,)) )
 summult = Dispatch ( (lsummult, (ListType, TupleType)),
                      (asummult, (N.ArrayType,)) )
 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)),
                             (asquare_of_sums, (N.ArrayType,)) )
 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)),
                             (asumdiffsquared, (N.ArrayType,)) )
 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)),
                        (ashellsort, (N.ArrayType,)) )
 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)),
                       (arankdata, (N.ArrayType,)) )
 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)),
                         (afindwithin, (N.ArrayType,)) )

######################  END OF NUMERIC FUNCTION BLOCK  #####################

######################  END OF STATISTICAL FUNCTIONS  ######################

# End of the module-level try block that guards the Numeric ('a'-prefixed)
# definitions above; if the Numeric import fails, the whole block is skipped.
except ImportError:
    pass
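
# Illustrative sketch, added to this listing for documentation and not part of
# Strangman's original module.  The Dispatch instances assigned above select
# the list- or array-based implementation from the argument type; the helper
# name and the sample numbers are invented, and the array branch requires
# Numeric to have imported successfully.
def _example_dispatch():
    from_list  = var([1, 2, 3, 4])                  # routed to lvar
    from_array = var(N.array([1., 2., 3., 4.]))     # routed to avar
    return from_list, from_array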