Package Biskit :: Package Statistics :: Module stats
Source Code for Module Biskit.Statistics.stats

   1  # Copyright (c) 1999-2000 Gary Strangman; All Rights Reserved. 
   2  # 
   3  # This software is distributable under the terms of the GNU 
   4  # General Public License (GPL) v2, the text of which can be found at 
   5  # http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 
   6  # using this module constitutes acceptance of the terms of this License. 
   7  # 
   8  # Disclaimer 
   9  #  
  10  # This software is provided "as-is".  There are no expressed or implied 
  11  # warranties of any kind, including, but not limited to, the warranties 
  12  # of merchantability and fittness for a given application.  In no event 
  13  # shall Gary Strangman be liable for any direct, indirect, incidental, 
  14  # special, exemplary or consequential damages (including, but not limited 
  15  # to, loss of use, data or profits, or business interruption) however 
  16  # caused and on any theory of liability, whether in contract, strict 
  17  # liability or tort (including negligence or otherwise) arising in any way 
  18  # out of the use of this software, even if advised of the possibility of 
  19  # such damage. 
  20  # 
  21  # Comments and/or additions are welcome (send e-mail to: 
  22  # strang@nmr.mgh.harvard.edu). 
  23  #  
  24  """ 
  25  stats.py module 
  26   
  27  (Requires pstat.py module.) 
  28   
  29  :: 
  30    ################################################# 
  31    #######  Written by:  Gary Strangman  ########### 
  32    #######  Last modified:  Dec 28, 2000 ########### 
  33    ################################################# 
  34   
  35  A collection of basic statistical functions for python.  The function 
  36  names appear below. 
  37   
  38  IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l' 
  39  prefix, which can be used with list or tuple arguments.  The second set has 
  40  an 'a' prefix, which can accept NumPy array arguments.  These latter 
  41  functions are defined only when NumPy is available on the system.  The third 
  42  type has NO prefix (i.e., has the name that appears below).  Functions of 
  43  this set are members of a "Dispatch" class, c/o David Ascher.  This class 
  44  allows different functions to be called depending on the type of the passed 
  45  arguments.  Thus, stats.mean is a member of the Dispatch class and 
  46  stats.mean(range(20)) will call stats.lmean(range(20)) while 
  47  stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)). 
  48  This is a handy way to keep consistent function names when different 
  49  argument types require different functions to be called.  Having 
  50  implementated the Dispatch class, however, means that to get info on 
  51  a given function, you must use the REAL function name ... that is 
  52  "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine, 
  53  while "print stats.mean.__doc__" will print the doc for the Dispatch 
  54  class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options 
  55  but should otherwise be consistent with the corresponding list functions. 
  56   
  57  Disclaimers:  The function list is obviously incomplete and, worse, the 
  58  functions are not optimized.  All functions have been tested (some more 
  59  so than others), but they are far from bulletproof.  Thus, as with any 
  60  free software, no warranty or guarantee is expressed or implied. :-)  A 
  61  few extra functions that don't appear in the list below can be found by 
  62  interested treasure-hunters.  These functions don't necessarily have 
  63  both list and array versions but were deemed useful:: 
  64   
  65      CENTRAL TENDENCY:  geometricmean 
  66                         harmonicmean 
  67                         mean 
  68                         median 
  69                         medianscore 
  70                         mode 
  71   
  72      MOMENTS:  moment 
  73                variation 
  74                skew 
  75                kurtosis 
  76                skewtest   (for Numpy arrays only) 
  77                kurtosistest (for Numpy arrays only) 
  78                normaltest (for Numpy arrays only) 
  79   
  80      ALTERED VERSIONS:  tmean  (for Numpy arrays only) 
  81                         tvar   (for Numpy arrays only) 
  82                         tmin   (for Numpy arrays only) 
  83                         tmax   (for Numpy arrays only) 
  84                         tstdev (for Numpy arrays only) 
  85                         tsem   (for Numpy arrays only) 
  86                         describe 
  87   
  88      FREQUENCY STATS:  itemfreq 
  89                        scoreatpercentile 
  90                        percentileofscore 
  91                        histogram 
  92                        cumfreq 
  93                        relfreq 
  94   
  95      VARIABILITY:  obrientransform 
  96                    samplevar 
  97                    samplestdev 
  98                    signaltonoise (for Numpy arrays only) 
  99                    var 
 100                    stdev 
 101                    sterr 
 102                    sem 
 103                    z 
 104                    zs 
 105                    zmap (for Numpy arrays only) 
 106   
 107      TRIMMING FCNS:  threshold (for Numpy arrays only) 
 108                      trimboth 
 109                      trim1 
 110                      round (round all vals to 'n' decimals; Numpy only) 
 111   
 112      CORRELATION FCNS:  covariance  (for Numpy arrays only) 
 113                         correlation (for Numpy arrays only) 
 114                         paired 
 115                         pearsonr 
 116                         spearmanr 
 117                         pointbiserialr 
 118                         kendalltau 
 119                         linregress 
 120   
 121      INFERENTIAL STATS:  ttest_1samp 
 122                          ttest_ind 
 123                          ttest_rel 
 124                          chisquare 
 125                          ks_2samp 
 126                          mannwhitneyu 
 127                          ranksums 
 128                          wilcoxont 
 129                          kruskalwallish 
 130                          friedmanchisquare 
 131   
 132      PROBABILITY CALCS:  chisqprob 
 133                          erfcc 
 134                          zprob 
 135                          ksprob 
 136                          fprob 
 137                          betacf 
 138                          gammln  
 139                          betai 
 140   
 141      ANOVA FUNCTIONS:  F_oneway 
 142                        F_value 
 143   
 144      SUPPORT FUNCTIONS:  writecc 
 145                          incr 
 146                          sign  (for Numpy arrays only) 
 147                          sum 
 148                          cumsum 
 149                          ss 
 150                          summult 
 151                          sumdiffsquared 
 152                          square_of_sums 
 153                          shellsort 
 154                          rankdata 
 155                          outputpairedstats 
 156                      findwithin 
 157  """ 
 158  ## CHANGE LOG: 
 159  ## =========== 
 160  ## 00-12-28 ... removed aanova() to separate module, fixed licensing to 
 161  ##                   match Python License, fixed doc string & imports 
 162  ## 00-04-13 ... pulled all "global" statements, except from aanova() 
 163  ##              added/fixed lots of documentation, removed io.py dependency 
 164  ##              changed to version 0.5 
 165  ## 99-11-13 ... added asign() function 
 166  ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now 
 167  ## 99-10-25 ... added acovariance and acorrelation functions 
 168  ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors 
 169  ##              added aglm function (crude, but will be improved) 
 170  ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to 
 171  ##                   all handle lists of 'dimension's and keepdims 
 172  ##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around 
 173  ##              reinserted fixes for abetai to avoid math overflows 
 174  ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to 
 175  ##                   handle multi-dimensional arrays (whew!) 
 176  ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990) 
 177  ##              added anormaltest per same reference 
 178  ##              re-wrote azprob to calc arrays of probs all at once 
 179  ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded 
 180  ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on 
 181  ##                   short/byte arrays (mean of #s btw 100-300 = -150??) 
 182  ## 99-08-09 ... fixed asum so that the None case works for Byte arrays 
 183  ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays 
 184  ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap) 
 185  ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0]) 
 186  ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all 
 187  ##                   max/min in array section to N.maximum/N.minimum, 
 188  ##                   fixed square_of_sums to prevent integer overflow 
 189  ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums 
 190  ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions 
 191  ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list 
 192  ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!) 
 193  ## 01/13/99 ... CHANGED TO VERSION 0.3 
 194  ##              fixed bug in a/lmannwhitneyu p-value calculation 
 195  ## 12/31/98 ... fixed variable-name bug in ldescribe 
 196  ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix) 
 197  ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score 
 198  ## 12/14/98 ... added atmin and atmax functions 
 199  ##              removed umath from import line (not needed) 
 200  ##              l/ageometricmean modified to reduce chance of overflows (take 
 201  ##                   nth root first, then multiply) 
 202  ## 12/07/98 ... added __version__variable (now 0.2) 
 203  ##              removed all 'stats.' from anova() fcn 
 204  ## 12/06/98 ... changed those functions (except shellsort) that altered 
 205  ##                   arguments in-place ... cumsum, ranksort, ... 
 206  ##              updated (and fixed some) doc-strings 
 207  ## 12/01/98 ... added anova() function (requires NumPy) 
 208  ##              incorporated Dispatch class 
 209  ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean 
 210  ##              added 'asum' function (added functionality to N.add.reduce) 
 211  ##              fixed both moment and amoment (two errors) 
 212  ##              changed name of skewness and askewness to skew and askew 
 213  ##              fixed (a)histogram (which sometimes counted points <lowerlimit) 
 214   
 215  import pstat               # required 3rd party module 
 216  import math, string, copy  # required python modules 
 217  from types import * 
 218   
 219  __version__ = 0.5 
 220   
 221  ############# DISPATCH CODE ############## 
 222   
 223   
 224 -class Dispatch: 
 225      """ 
 226  The Dispatch class, care of David Ascher, allows different functions to 
 227  be called depending on the argument types.  This way, there can be one 
 228  function name regardless of the argument type.  To access function doc 
 229  in stats.py module, prefix the function with an 'l' or 'a' for list or 
 230  array arguments, respectively.  That is, print stats.lmean.__doc__ or 
 231  print stats.amean.__doc__ or whatever. 
 232  """ 
 233   
 234 -    def __init__(self, *tuples): 
 235          self._dispatch = {} 
 236          for func, types in tuples: 
 237              for t in types: 
 238                  if t in self._dispatch.keys(): 
 239                      raise ValueError, "can't have two dispatches on "+str(t) 
 240                  self._dispatch[t] = func 
 241          self._types = self._dispatch.keys() 
 242   
 243 -    def __call__(self, arg1, *args, **kw): 
 244          if type(arg1) not in self._types: 
 245              raise TypeError, "don't know how to dispatch %s arguments" %  type(arg1) 
 246          return apply(self._dispatch[type(arg1)], (arg1,) + args, kw) 
 247   
 248   
 249  ########################################################################## 
 250  ########################   LIST-BASED FUNCTIONS   ######################## 
 251  ########################################################################## 
 252   
 253  ### Define these regardless 
 254   
 255  #################################### 
 256  #######  CENTRAL TENDENCY  ######### 
 257  #################################### 
 258   
 259 -def lgeometricmean (inlist): 
 260      """ 
 261  Calculates the geometric mean of the values in the passed list. 
 262  That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list. 
 263   
 264  Usage:   lgeometricmean(inlist) 
 265  """ 
 266      mult = 1.0 
 267      one_over_n = 1.0/len(inlist) 
 268      for item in inlist: 
 269          mult = mult * pow(item,one_over_n) 
 270      return mult 
 271   
 272   
 273 -def lharmonicmean (inlist): 
 274      """ 
 275  Calculates the harmonic mean of the values in the passed list. 
 276  That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list. 
 277   
 278  Usage:   lharmonicmean(inlist) 
 279  """ 
 280      sum = 0 
 281      for item in inlist: 
 282          sum = sum + 1.0/item 
 283      return len(inlist) / sum 
 284   
 285   
 286 -def lmean (inlist): 
 287      """ 
 288  Returns the arithematic mean of the values in the passed list. 
 289  Assumes a '1D' list, but will function on the 1st dim of an array(!). 
 290   
 291  Usage:   lmean(inlist) 
 292  """ 
 293      sum = 0 
 294      for item in inlist: 
 295          sum = sum + item 
 296      return sum/float(len(inlist)) 
 297   
 298   
 299 -def lmedian (inlist,numbins=1000): 
 300      """ 
 301  Returns the computed median value of a list of numbers, given the 
 302  number of bins to use for the histogram (more bins brings the computed value 
 303  closer to the median score, default number of bins = 1000).  See G.W. 
 304  Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics. 
 305   
 306  Usage:   lmedian (inlist, numbins=1000) 
 307  """ 
 308      (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog 
 309      cumhist = cumsum(hist)              # make cumulative histogram 
 310      for i in range(len(cumhist)):        # get 1st(!) index holding 50%ile score 
 311          if cumhist[i]>=len(inlist)/2.0: 
 312              cfbin = i 
 313              break 
 314      LRL = smallest + binsize*cfbin        # get lower read limit of that bin 
 315      cfbelow = cumhist[cfbin-1] 
 316      freq = float(hist[cfbin])                # frequency IN the 50%ile bin 
 317      median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize  # median formula 
 318      return median 
 319   
 320   
 321 -def lmedianscore (inlist): 
 322      """ 
 323  Returns the 'middle' score of the passed list.  If there is an even 
 324  number of scores, the mean of the 2 middle scores is returned. 
 325   
 326  Usage:   lmedianscore(inlist) 
 327  """ 
 328   
 329      newlist = copy.deepcopy(inlist) 
 330      newlist.sort() 
 331      if len(newlist) % 2 == 0:   # if even number of scores, average middle 2 
 332          index = len(newlist)/2  # integer division correct 
 333          median = float(newlist[index] + newlist[index-1]) /2 
 334      else: 
 335          index = len(newlist)/2  # int divsion gives mid value when count from 0 
 336          median = newlist[index] 
 337      return median 
 338   
 339   
 340 -def lmode(inlist): 
 341      """ 
 342  Returns a list of the modal (most common) score(s) in the passed 
 343  list.  If there is more than one such score, all are returned.  The 
 344  bin-count for the mode(s) is also returned. 
 345   
 346  Usage:   lmode(inlist) 
 347  Returns: bin-count for mode(s), a list of modal value(s) 
 348  """ 
 349   
 350      scores = pstat.unique(inlist) 
 351      scores.sort() 
 352      freq = [] 
 353      for item in scores: 
 354          freq.append(inlist.count(item)) 
 355      maxfreq = max(freq) 
 356      mode = [] 
 357      stillmore = 1 
 358      while stillmore: 
 359          try: 
 360              indx = freq.index(maxfreq) 
 361              mode.append(scores[indx]) 
 362              del freq[indx] 
 363              del scores[indx] 
 364          except ValueError: 
 365              stillmore=0 
 366      return maxfreq, mode 
 367   
 368   
 369  #################################### 
 370  ############  MOMENTS  ############# 
 371  #################################### 
 372   
 373 -def lmoment(inlist,moment=1): 
 374      """ 
 375  Calculates the nth moment about the mean for a sample (defaults to 
 376  the 1st moment).  Used to calculate coefficients of skewness and kurtosis. 
 377   
 378  Usage:   lmoment(inlist,moment=1) 
 379  Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r) 
 380  """ 
 381      if moment == 1: 
 382          return 0.0 
 383      else: 
 384          mn = mean(inlist) 
 385          n = len(inlist) 
 386          s = 0 
 387          for x in inlist: 
 388              s = s + (x-mn)**moment 
 389          return s/float(n) 
 390   
 391   
 392 -def lvariation(inlist): 
 393      """ 
 394  Returns the coefficient of variation, as defined in CRC Standard 
 395  Probability and Statistics, p.6. 
 396   
 397  Usage:   lvariation(inlist) 
 398  """ 
 399      return 100.0*samplestdev(inlist)/float(mean(inlist)) 
 400   
 401   
 402 -def lskew(inlist): 
 403      """ 
 404  Returns the skewness of a distribution, as defined in Numerical 
 405  Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.) 
 406   
 407  Usage:   lskew(inlist) 
 408  """ 
 409      return moment(inlist,3)/pow(moment(inlist,2),1.5) 
 410   
 411   
 412 -def lkurtosis(inlist): 
 413      """ 
 414  Returns the kurtosis of a distribution, as defined in Numerical 
 415  Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.) 
 416   
 417  Usage:   lkurtosis(inlist) 
 418  """ 
 419      return moment(inlist,4)/pow(moment(inlist,2),2.0) 
 420   
 421   
 422 -def ldescribe(inlist): 
 423      """ 
 424  Returns some descriptive statistics of the passed list (assumed to be 1D). 
 425   
 426  Usage:   ldescribe(inlist) 
 427  Returns: n, mean, standard deviation, skew, kurtosis 
 428  """ 
 429      n = len(inlist) 
 430      mm = (min(inlist),max(inlist)) 
 431      m = mean(inlist) 
 432      sd = stdev(inlist) 
 433      sk = skew(inlist) 
 434      kurt = kurtosis(inlist) 
 435      return n, mm, m, sd, sk, kurt 
 436   
 437   
 438  #################################### 
 439  #######  FREQUENCY STATS  ########## 
 440  #################################### 
 441   
 442 -def litemfreq(inlist): 
 443      """ 
 444  Returns a list of pairs.  Each pair consists of one of the scores in inlist 
 445  and it's frequency count.  Assumes a 1D list is passed. 
 446   
 447  Usage:   litemfreq(inlist) 
 448  Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 
 449  """ 
 450      scores = pstat.unique(inlist) 
 451      scores.sort() 
 452      freq = [] 
 453      for item in scores: 
 454          freq.append(inlist.count(item)) 
 455      return pstat.abut(scores, freq) 
 456   
 457   
 458 -def lscoreatpercentile (inlist, percent): 
 459      """ 
 460  Returns the score at a given percentile relative to the distribution 
 461  given by inlist. 
 462   
 463  Usage:   lscoreatpercentile(inlist,percent) 
 464  """ 
 465      if percent > 1: 
 466          print "\nDividing percent>1 by 100 in lscoreatpercentile().\n" 
 467          percent = percent / 100.0 
 468      targetcf = percent*len(inlist) 
 469      h, lrl, binsize, extras = histogram(inlist) 
 470      cumhist = cumsum(copy.deepcopy(h)) 
 471      for i in range(len(cumhist)): 
 472          if cumhist[i] >= targetcf: 
 473              break 
 474      score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 
 475      return score 
 476   
 477   
 478 -def lpercentileofscore (inlist, score,histbins=10,defaultlimits=None): 
 479      """ 
 480  Returns the percentile value of a score relative to the distribution 
 481  given by inlist.  Formula depends on the values used to histogram the data(!). 
 482   
 483  Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None) 
 484  """ 
 485   
 486      h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits) 
 487      cumhist = cumsum(copy.deepcopy(h)) 
 488      i = int((score - lrl)/float(binsize)) 
 489      pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100 
 490      return pct 
 491   
 492   
 493 -def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0): 
 494      """ 
 495  Returns (i) a list of histogram bin counts, (ii) the smallest value 
 496  of the histogram binning, and (iii) the bin width (the last 2 are not 
 497  necessarily integers).  Default number of bins is 10.  If no sequence object 
 498  is given for defaultreallimits, the routine picks (usually non-pretty) bins 
 499  spanning all the numbers in the inlist. 
 500   
 501  Usage:   lhistogram (inlist, numbins=10, defaultreallimits=None,suppressoutput=0) 
 502  Returns: list of bin values, lowerreallimit, binsize, extrapoints 
 503  """ 
 504      if (defaultreallimits <> None): 
 505          if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd 
 506              lowerreallimit = defaultreallimits 
 507              upperreallimit = 1.0001 * max(inlist) 
 508          else: # assume both limits given 
 509              lowerreallimit = defaultreallimits[0] 
 510              upperreallimit = defaultreallimits[1] 
 511          binsize = (upperreallimit-lowerreallimit)/float(numbins) 
 512      else:     # no limits given for histogram, both must be calc'd 
 513          estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all 
 514          binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins) 
 515          lowerreallimit = min(inlist) - binsize/2 #lower real limit,1st bin 
 516      bins = [0]*(numbins) 
 517      extrapoints = 0 
 518      for num in inlist: 
 519          try: 
 520              if (num-lowerreallimit) < 0: 
 521                  extrapoints = extrapoints + 1 
 522              else: 
 523                  bintoincrement = int((num-lowerreallimit)/float(binsize)) 
 524                  bins[bintoincrement] = bins[bintoincrement] + 1 
 525          except: 
 526              extrapoints = extrapoints + 1 
 527      if (extrapoints > 0 and printextras == 1): 
 528          print '\nPoints outside given histogram range =',extrapoints 
 529      return (bins, lowerreallimit, binsize, extrapoints) 
 530   
 531   
 532 -def lcumfreq(inlist,numbins=10,defaultreallimits=None): 
 533      """ 
 534  Returns a cumulative frequency histogram, using the histogram function. 
 535   
 536  Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None) 
 537  Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 
 538  """ 
 539      h,l,b,e = histogram(inlist,numbins,defaultreallimits) 
 540      cumhist = cumsum(copy.deepcopy(h)) 
 541      return cumhist,l,b,e 
 542   
 543   
 544 -def lrelfreq(inlist,numbins=10,defaultreallimits=None): 
 545      """ 
 546  Returns a relative frequency histogram, using the histogram function. 
 547   
 548  Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None) 
 549  Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 
 550  """ 
 551      h,l,b,e = histogram(inlist,numbins,defaultreallimits) 
 552      for i in range(len(h)): 
 553          h[i] = h[i]/float(len(inlist)) 
 554      return h,l,b,e 
 555   
 556   
 557  #################################### 
 558  #####  VARIABILITY FUNCTIONS  ###### 
 559  #################################### 
 560   
 561 -def lobrientransform(*args): 
 562      """ 
 563  Computes a transform on input data (any number of columns).  Used to 
 564  test for homogeneity of variance prior to running one-way stats.  From 
 565  Maxwell and Delaney, p.112. 
 566   
 567  Usage:   lobrientransform(*args) 
 568  Returns: transformed data for use in an ANOVA 
 569  """ 
 570      TINY = 1e-10 
 571      k = len(args) 
 572      n = [0.0]*k 
 573      v = [0.0]*k 
 574      m = [0.0]*k 
 575      nargs = [] 
 576      for i in range(k): 
 577          nargs.append(copy.deepcopy(args[i])) 
 578          n[i] = float(len(nargs[i])) 
 579          v[i] = var(nargs[i]) 
 580          m[i] = mean(nargs[i]) 
 581      for j in range(k): 
 582          for i in range(n[j]): 
 583              t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 
 584              t2 = 0.5*v[j]*(n[j]-1.0) 
 585              t3 = (n[j]-1.0)*(n[j]-2.0) 
 586              nargs[j][i] = (t1-t2) / float(t3) 
 587      check = 1 
 588      for j in range(k): 
 589          if v[j] - mean(nargs[j]) > TINY: 
 590              check = 0 
 591      if check <> 1: 
 592          raise ValueError, 'Problem in obrientransform.' 
 593      else: 
 594          return nargs 
 595   
 596   
 597 -def lsamplevar (inlist): 
 598      """ 
 599  Returns the variance of the values in the passed list using 
 600  N for the denominator (i.e., DESCRIBES the sample variance only). 
 601   
 602  Usage:   lsamplevar(inlist) 
 603  """ 
 604      n = len(inlist) 
 605      mn = mean(inlist) 
 606      deviations = [] 
 607      for item in inlist: 
 608          deviations.append(item-mn) 
 609      return ss(deviations)/float(n) 
 610   
 611   
 612 -def lsamplestdev (inlist): 
 613      """ 
 614  Returns the standard deviation of the values in the passed list using 
 615  N for the denominator (i.e., DESCRIBES the sample stdev only). 
 616   
 617  Usage:   lsamplestdev(inlist) 
 618  """ 
 619      return math.sqrt(samplevar(inlist)) 
 620   
 621   
 622 -def lvar (inlist): 
 623      """ 
 624  Returns the variance of the values in the passed list using N-1 
 625  for the denominator (i.e., for estimating population variance). 
 626   
 627  Usage:   lvar(inlist) 
 628  """ 
 629      n = len(inlist) 
 630      mn = mean(inlist) 
 631      deviations = [0]*len(inlist) 
 632      for i in range(len(inlist)): 
 633          deviations[i] = inlist[i] - mn 
 634      return ss(deviations)/float(n-1) 
 635   
 636   
 637 -def lstdev (inlist): 
 638      """ 
 639  Returns the standard deviation of the values in the passed list 
 640  using N-1 in the denominator (i.e., to estimate population stdev). 
 641   
 642  Usage:   lstdev(inlist) 
 643  """ 
 644      return math.sqrt(var(inlist)) 
 645   
 646   
 647 -def lsterr(inlist): 
 648      """ 
 649  Returns the standard error of the values in the passed list using N-1 
 650  in the denominator (i.e., to estimate population standard error). 
 651   
 652  Usage:   lsterr(inlist) 
 653  """ 
 654      return stdev(inlist) / float(math.sqrt(len(inlist))) 
 655   
 656   
 657 -def lsem (inlist): 
 658      """ 
 659  Returns the estimated standard error of the mean (sx-bar) of the 
 660  values in the passed list.  sem = stdev / sqrt(n) 
 661   
 662  Usage:   lsem(inlist) 
 663  """ 
 664      sd = stdev(inlist) 
 665      n = len(inlist) 
 666      return sd/math.sqrt(n) 
 667   
 668   
 669 -def lz (inlist, score): 
 670      """ 
 671  Returns the z-score for a given input score, given that score and the 
 672  list from which that score came.  Not appropriate for population calculations. 
 673   
 674  Usage:   lz(inlist, score) 
 675  """ 
 676      z = (score-mean(inlist))/samplestdev(inlist) 
 677      return z 
 678   
 679   
 680 -def lzs (inlist): 
 681      """ 
 682  Returns a list of z-scores, one for each score in the passed list. 
 683   
 684  Usage:   lzs(inlist) 
 685  """ 
 686      zscores = [] 
 687      for item in inlist: 
 688          zscores.append(z(inlist,item)) 
 689      return zscores 
 690   
 691   
 692  #################################### 
 693  #######  TRIMMING FUNCTIONS  ####### 
 694  #################################### 
 695   
 696 -def ltrimboth (l,proportiontocut): 
 697      """ 
 698  Slices off the passed proportion of items from BOTH ends of the passed 
 699  list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost' 
 700  10% of scores.  Assumes list is sorted by magnitude.  Slices off LESS if 
 701  proportion results in a non-integer slice index (i.e., conservatively 
 702  slices off proportiontocut). 
 703   
 704  Usage:   ltrimboth (l,proportiontocut) 
 705  Returns: trimmed version of list l 
 706  """ 
 707      lowercut = int(proportiontocut*len(l)) 
 708      uppercut = len(l) - lowercut 
 709      return l[lowercut:uppercut] 
 710   
 711   
 712 -def ltrim1 (l,proportiontocut,tail='right'): 
 713      """ 
 714  Slices off the passed proportion of items from ONE end of the passed 
 715  list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 
 716  10% of scores).  Slices off LESS if proportion results in a non-integer 
 717  slice index (i.e., conservatively slices off proportiontocut). 
 718   
 719  Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left' 
 720  Returns: trimmed version of list l 
 721  """ 
 722      if tail == 'right': 
 723          lowercut = 0 
 724          uppercut = len(l) - int(proportiontocut*len(l)) 
 725      elif tail == 'left': 
 726          lowercut = int(proportiontocut*len(l)) 
 727          uppercut = len(l) 
 728      return l[lowercut:uppercut] 
 729   
 730   
 731  #################################### 
 732  #####  CORRELATION FUNCTIONS  ###### 
 733  #################################### 
 734   
 735 -def lpaired(x,y): 
 736      """ 
 737  Interactively determines the type of data and then runs the 
 738  appropriated statistic for paired group data. 
 739   
 740  Usage:   lpaired(x,y) 
 741  Returns: appropriate statistic name, value, and probability 
 742  """ 
 743      samples = '' 
 744      while samples not in ['i','r','I','R','c','C']: 
 745          print '\nIndependent or related samples, or correlation (i,r,c): ', 
 746          samples = raw_input() 
 747   
 748      if samples in ['i','I','r','R']: 
 749          print '\nComparing variances ...', 
 750  # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 
 751          r = obrientransform(x,y) 
 752          f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 
 753          if p<0.05: 
 754              vartype='unequal, p='+str(round(p,4)) 
 755          else: 
 756              vartype='equal' 
 757          print vartype 
 758          if samples in ['i','I']: 
 759              if vartype[0]=='e': 
 760                  t,p = ttest_ind(x,y,0) 
 761                  print '\nIndependent samples t-test:  ', round(t,4),round(p,4) 
 762              else: 
 763                  if len(x)>20 or len(y)>20: 
 764                      z,p = ranksums(x,y) 
 765                      print '\nRank Sums test (NONparametric, n>20):  ', round(z,4),round(p,4) 
 766                  else: 
 767                      u,p = mannwhitneyu(x,y) 
 768                      print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(u,4),round(p,4) 
 769   
 770          else:  # RELATED SAMPLES 
 771              if vartype[0]=='e': 
 772                  t,p = ttest_rel(x,y,0) 
 773                  print '\nRelated samples t-test:  ', round(t,4),round(p,4) 
 774              else: 
 775                  t,p = ranksums(x,y) 
 776                  print '\nWilcoxon T-test (NONparametric):  ', round(t,4),round(p,4) 
 777      else:  # CORRELATION ANALYSIS 
 778          corrtype = '' 
 779          while corrtype not in ['c','C','r','R','d','D']: 
 780              print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 
 781              corrtype = raw_input() 
 782          if corrtype in ['c','C']: 
 783              m,b,r,p,see = linregress(x,y) 
 784              print '\nLinear regression for continuous variables ...' 
 785              lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 
 786              pstat.printcc(lol) 
 787          elif corrtype in ['r','R']: 
 788              r,p = spearmanr(x,y) 
 789              print '\nCorrelation for ranked variables ...' 
 790              print "Spearman's r: ",round(r,4),round(p,4) 
 791          else: # DICHOTOMOUS 
 792              r,p = pointbiserialr(x,y) 
 793              print '\nAssuming x contains a dichotomous variable ...' 
 794              print 'Point Biserial r: ',round(r,4),round(p,4) 
 795      print '\n\n' 
 796      return None 
 797   
 798   
 799 -def lpearsonr(x,y): 
 800      """ 
 801  Calculates a Pearson correlation coefficient and the associated 
 802  probability value.  Taken from Heiman's Basic Statistics for the Behav. 
 803  Sci (2nd), p.195. 
 804   
 805  Usage:   lpearsonr(x,y)      where x and y are equal-length lists 
 806  Returns: Pearson's r value, two-tailed p-value 
 807  """ 
 808      TINY = 1.0e-30 
 809      if len(x) <> len(y): 
 810          raise ValueError, 'Input values not paired in pearsonr.  Aborting.' 
 811      n = len(x) 
 812      x = map(float,x) 
 813      y = map(float,y) 
 814      xmean = mean(x) 
 815      ymean = mean(y) 
 816      r_num = n*(summult(x,y)) - sum(x)*sum(y) 
 817      r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 
 818      r = (r_num / r_den)  # denominator already a float 
 819      df = n-2 
 820      t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 
 821      prob = betai(0.5*df,0.5,df/float(df+t*t)) 
 822      return r, prob 
 823   
 824   
 825 -def lspearmanr(x,y): 
 826      """ 
 827  Calculates a Spearman rank-order correlation coefficient.  Taken 
 828  from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 
 829   
 830  Usage:   lspearmanr(x,y)      where x and y are equal-length lists 
 831  Returns: Spearman's r, two-tailed p-value 
 832  """ 
 833      TINY = 1e-30 
 834      if len(x) <> len(y): 
 835          raise ValueError, 'Input values not paired in spearmanr.  Aborting.' 
 836      n = len(x) 
 837      rankx = rankdata(x) 
 838      ranky = rankdata(y) 
 839      dsq = sumdiffsquared(rankx,ranky) 
 840      rs = 1 - 6*dsq / float(n*(n**2-1)) 
 841      t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 
 842      df = n-2 
 843      probrs = betai(0.5*df,0.5,df/(df+t*t))  # t already a float 
 844  # probability values for rs are from part 2 of the spearman function in 
 845  # Numerical Recipies, p.510.  They are close to tables, but not exact. (?) 
 846      return rs, probrs 
 847   
 848   
 849 -def lpointbiserialr(x,y): 
 850      """ 
 851  Calculates a point-biserial correlation coefficient and the associated 
 852  probability value.  Taken from Heiman's Basic Statistics for the Behav. 
 853  Sci (1st), p.194. 
 854   
 855  Usage:   lpointbiserialr(x,y)      where x,y are equal-length lists 
 856  Returns: Point-biserial r, two-tailed p-value 
 857  """ 
 858      TINY = 1e-30 
 859      if len(x) <> len(y): 
 860          raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.' 
 861      data = pstat.abut(x,y) 
 862      categories = pstat.unique(x) 
 863      if len(categories) <> 2: 
 864          raise ValueError, "Exactly 2 categories required for pointbiserialr()." 
 865      else:   # there are 2 categories, continue 
 866          codemap = pstat.abut(categories,range(2)) 
 867          recoded = pstat.recode(data,codemap,0) 
 868          x = pstat.linexand(data,0,categories[0]) 
 869          y = pstat.linexand(data,0,categories[1]) 
 870          xmean = mean(pstat.colex(x,1)) 
 871          ymean = mean(pstat.colex(y,1)) 
 872          n = len(data) 
 873          adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 
 874          rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust 
 875          df = n-2 
 876          t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 
 877          prob = betai(0.5*df,0.5,df/(df+t*t))  # t already a float 
 878          return rpb, prob 
 879   
 880   
 881 -def lkendalltau(x,y): 
 882      """ 
 883  Calculates Kendall's tau ... correlation of ordinal data.  Adapted 
 884  from function kendl1 in Numerical Recipies.  Needs good test-routine.@@@ 
 885   
 886  Usage:   lkendalltau(x,y) 
 887  Returns: Kendall's tau, two-tailed p-value 
 888  """ 
 889      n1 = 0 
 890      n2 = 0 
 891      iss = 0 
 892      for j in range(len(x)-1): 
 893          for k in range(j,len(y)): 
 894              a1 = x[j] - x[k] 
 895              a2 = y[j] - y[k] 
 896              aa = a1 * a2 
 897              if (aa):             # neither list has a tie 
 898                  n1 = n1 + 1 
 899                  n2 = n2 + 1 
 900                  if aa > 0: 
 901                      iss = iss + 1 
 902                  else: 
 903                      iss = iss -1 
 904              else: 
 905                  if (a1): 
 906                      n1 = n1 + 1 
 907                  else: 
 908                      n2 = n2 + 1 
 909      tau = iss / math.sqrt(n1*n2) 
 910      svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 
 911      z = tau / math.sqrt(svar) 
 912      prob = erfcc(abs(z)/1.4142136) 
 913      return tau, prob 
 914   
 915   
 916 -def llinregress(x,y): 
 917      """ 
 918  Calculates a regression line on x,y pairs.   
 919   
 920  Usage:   llinregress(x,y)      x,y are equal-length lists of x-y coordinates 
 921  Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate 
 922  """ 
 923      TINY = 1.0e-20 
 924      if len(x) <> len(y): 
 925          raise ValueError, 'Input values not paired in linregress.  Aborting.' 
 926      n = len(x) 
 927      x = map(float,x) 
 928      y = map(float,y) 
 929      xmean = mean(x) 
 930      ymean = mean(y) 
 931      r_num = float(n*(summult(x,y)) - sum(x)*sum(y)) 
 932      r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 
 933      r = r_num / r_den 
 934      z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 
 935      df = n-2 
 936      t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 
 937      prob = betai(0.5*df,0.5,df/(df+t*t)) 
 938      slope = r_num / float(n*ss(x) - square_of_sums(x)) 
 939      intercept = ymean - slope*xmean 
 940      sterrest = math.sqrt(1-r*r)*samplestdev(y) 
 941      return slope, intercept, r, prob, sterrest 
 942   
 943   
 944  #################################### 
 945  #####  INFERENTIAL STATISTICS  ##### 
 946  #################################### 
 947   
 948 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'): 
 949      """ 
 950  Calculates the t-obtained for the independent samples T-test on ONE group 
 951  of scores a, given a population mean.  If printit=1, results are printed 
 952  to the screen.  If printit='filename', the results are output to 'filename' 
 953  using the given writemode (default=append).  Returns t-value, and prob. 
 954   
 955  Usage:   lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 
 956  Returns: t-value, two-tailed prob 
 957  """ 
 958      x = mean(a) 
 959      v = var(a) 
 960      n = len(a) 
 961      df = n-1 
 962      svar = ((n-1)*v)/float(df) 
 963      t = (x-popmean)/math.sqrt(svar*(1.0/n)) 
 964      prob = betai(0.5*df,0.5,float(df)/(df+t*t)) 
 965   
 966      if printit <> 0: 
 967          statname = 'Single-sample T-test.' 
 968          outputpairedstats(printit,writemode, 
 969                            'Population','--',popmean,0,0,0, 
 970                            name,n,x,v,min(a),max(a), 
 971                            statname,t,prob) 
 972      return t,prob 
 973   
 974   
 975 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'): 
 976      """ 
 977  Calculates the t-obtained T-test on TWO INDEPENDENT samples of 
 978  scores a, and b.  From Numerical Recipies, p.483.  If printit=1, results 
 979  are printed to the screen.  If printit='filename', the results are output 
 980  to 'filename' using the given writemode (default=append).  Returns t-value, 
 981  and prob. 
 982   
 983  Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a') 
 984  Returns: t-value, two-tailed prob 
 985  """ 
 986      x1 = mean(a) 
 987      x2 = mean(b) 
 988      v1 = stdev(a)**2 
 989      v2 = stdev(b)**2 
 990      n1 = len(a) 
 991      n2 = len(b) 
 992      df = n1+n2-2 
 993      svar = ((n1-1)*v1+(n2-1)*v2)/float(df) 
 994      t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2)) 
 995      prob = betai(0.5*df,0.5,df/(df+t*t)) 
 996   
 997      if printit <> 0: 
 998          statname = 'Independent samples T-test.' 
 999          outputpairedstats(printit,writemode, 
1000                            name1,n1,x1,v1,min(a),max(a), 
1001                            name2,n2,x2,v2,min(b),max(b), 
1002                            statname,t,prob) 
1003      return t,prob 
1004   
1005   
1006 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'): 
1007      """ 
1008  Calculates the t-obtained T-test on TWO RELATED samples of scores, 
1009  a and b.  From Numerical Recipies, p.483.  If printit=1, results are 
1010  printed to the screen.  If printit='filename', the results are output to 
1011  'filename' using the given writemode (default=append).  Returns t-value, 
1012  and prob. 
1013   
1014  Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a') 
1015  Returns: t-value, two-tailed prob 
1016  """ 
1017      if len(a)<>len(b): 
1018          raise ValueError, 'Unequal length lists in ttest_rel.' 
1019      x1 = mean(a) 
1020      x2 = mean(b) 
1021      v1 = var(a) 
1022      v2 = var(b) 
1023      n = len(a) 
1024      cov = 0 
1025      for i in range(len(a)): 
1026          cov = cov + (a[i]-x1) * (b[i]-x2) 
1027      df = n-1 
1028      cov = cov / float(df) 
1029      sd = math.sqrt((v1+v2 - 2.0*cov)/float(n)) 
1030      t = (x1-x2)/sd 
1031      prob = betai(0.5*df,0.5,df/(df+t*t)) 
1032   
1033      if printit <> 0: 
1034          statname = 'Related samples T-test.' 
1035          outputpairedstats(printit,writemode, 
1036                            name1,n,x1,v1,min(a),max(a), 
1037                            name2,n,x2,v2,min(b),max(b), 
1038                            statname,t,prob) 
1039      return t, prob 
1040   
1041   
1042 -def lchisquare(f_obs,f_exp=None): 
1043      """ 
1044  Calculates a one-way chi square for list of observed frequencies and returns 
1045  the result.  If no expected frequencies are given, the total N is assumed to 
1046  be equally distributed across all groups. 
1047   
1048  Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq. 
1049  Returns: chisquare-statistic, associated p-value 
1050  """ 
1051      k = len(f_obs)                 # number of groups 
1052      if f_exp == None: 
1053          f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq. 
1054      chisq = 0 
1055      for i in range(len(f_obs)): 
1056          chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i]) 
1057      return chisq, chisqprob(chisq, k-1) 
1058   
1059   
1060 -def lks_2samp (data1,data2): 
1061      """ 
1062  Computes the Kolmogorov-Smirnof statistic on 2 samples.  From 
1063  Numerical Recipies in C, page 493. 
1064   
1065  Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions 
1066  Returns: KS D-value, associated p-value 
1067  """ 
1068      j1 = 0 
1069      j2 = 0 
1070      fn1 = 0.0 
1071      fn2 = 0.0 
1072      n1 = len(data1) 
1073      n2 = len(data2) 
1074      en1 = n1 
1075      en2 = n2 
1076      d = 0.0 
1077      data1.sort() 
1078      data2.sort() 
1079      while j1 < n1 and j2 < n2: 
1080          d1=data1[j1] 
1081          d2=data2[j2] 
1082          if d1 <= d2: 
1083              fn1 = (j1)/float(en1) 
1084              j1 = j1 + 1 
1085          if d2 <= d1: 
1086              fn2 = (j2)/float(en2) 
1087              j2 = j2 + 1 
1088          dt = (fn2-fn1) 
1089          if math.fabs(dt) > math.fabs(d): 
1090              d = dt 
1091      try: 
1092          en = math.sqrt(en1*en2/float(en1+en2)) 
1093          prob = ksprob((en+0.12+0.11/en)*abs(d)) 
1094      except: 
1095          prob = 1.0 
1096      return d, prob 
1097   
1098   
1099 -def lmannwhitneyu(x,y): 
1100      """ 
1101  Calculates a Mann-Whitney U statistic on the provided scores and 
1102  returns the result.  Use only when the n in each condition is < 20 and 
1103  you have 2 independent samples of ranks.  NOTE: Mann-Whitney U is 
1104  significant if the u-obtained is LESS THAN or equal to the critical 
1105  value of U found in the tables.  Equivalent to Kruskal-Wallis H with 
1106  just 2 groups. 
1107   
1108  Usage:   lmannwhitneyu(data) 
1109  Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 
1110  """ 
1111      n1 = len(x) 
1112      n2 = len(y) 
1113      ranked = rankdata(x+y) 
1114      rankx = ranked[0:n1]       # get the x-ranks 
1115      ranky = ranked[n1:]        # the rest are y-ranks 
1116      u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x 
1117      u2 = n1*n2 - u1                            # remainder is U for y 
1118      bigu = max(u1,u2) 
1119      smallu = min(u1,u2) 
1120      T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores 
1121      if T == 0: 
1122          raise ValueError, 'All numbers are identical in lmannwhitneyu' 
1123      sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 
1124      z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc 
1125      return smallu, 1.0 - zprob(z) 
1126   
1127   
1128 -def ltiecorrect(rankvals): 
1129      """ 
1130  Corrects for ties in Mann Whitney U and Kruskal Wallis H tests.  See 
1131  Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. 
1132  New York: McGraw-Hill.  Code adapted from |Stat rankind.c code. 
1133   
1134  Usage:   ltiecorrect(rankvals) 
1135  Returns: T correction factor for U or H 
1136  """ 
1137      sorted,posn = shellsort(rankvals) 
1138      n = len(sorted) 
1139      T = 0.0 
1140      i = 0 
1141      while (i<n-1): 
1142          if sorted[i] == sorted[i+1]: 
1143              nties = 1 
1144              while (i<n-1) and (sorted[i] == sorted[i+1]): 
1145                  nties = nties +1 
1146                  i = i +1 
1147              T = T + nties**3 - nties 
1148          i = i+1 
1149      T = T / float(n**3-n) 
1150      return 1.0 - T 
1151   
1152   
1153 -def lranksums(x,y): 
1154      """ 
1155  Calculates the rank sums statistic on the provided scores and 
1156  returns the result.  Use only when the n in each condition is > 20 and you 
1157  have 2 independent samples of ranks. 
1158   
1159  Usage:   lranksums(x,y) 
1160  Returns: a z-statistic, two-tailed p-value 
1161  """ 
1162      n1 = len(x) 
1163      n2 = len(y) 
1164      alldata = x+y 
1165      ranked = rankdata(alldata) 
1166      x = ranked[:n1] 
1167      y = ranked[n1:] 
1168      s = sum(x) 
1169      expected = n1*(n1+n2+1) / 2.0 
1170      z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 
1171      prob = 2*(1.0 -zprob(abs(z))) 
1172      return z, prob 
1173   
1174   
1175 -def lwilcoxont(x,y): 
1176      """ 
1177  Calculates the Wilcoxon T-test for related samples and returns the 
1178  result.  A non-parametric T-test. 
1179   
1180  Usage:   lwilcoxont(x,y) 
1181  Returns: a t-statistic, two-tail probability estimate 
1182  """ 
1183      if len(x) <> len(y): 
1184          raise ValueError, 'Unequal N in wilcoxont.  Aborting.' 
1185      d=[] 
1186      for i in range(len(x)): 
1187          diff = x[i] - y[i] 
1188          if diff <> 0: 
1189              d.append(diff) 
1190      count = len(d) 
1191      absd = map(abs,d) 
1192      absranked = rankdata(absd) 
1193      r_plus = 0.0 
1194      r_minus = 0.0 
1195      for i in range(len(absd)): 
1196          if d[i] < 0: 
1197              r_minus = r_minus + absranked[i] 
1198          else: 
1199              r_plus = r_plus + absranked[i] 
1200      wt = min(r_plus, r_minus) 
1201      mn = count * (count+1) * 0.25 
1202      se =  math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 
1203      z = math.fabs(wt-mn) / se 
1204      prob = 2*(1.0 -zprob(abs(z))) 
1205      return wt, prob 
1206   
1207   
1208 -def lkruskalwallish(*args): 
1209      """ 
1210  The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 
1211  groups, requiring at least 5 subjects in each group.  This function 
1212  calculates the Kruskal-Wallis H-test for 3 or more independent samples 
1213  and returns the result.   
1214   
1215  Usage:   lkruskalwallish(*args) 
1216  Returns: H-statistic (corrected for ties), associated p-value 
1217  """ 
1218      args = list(args) 
1219      n = [0]*len(args) 
1220      all = [] 
1221      n = map(len,args) 
1222      for i in range(len(args)): 
1223          all = all + args[i] 
1224      ranked = rankdata(all) 
1225      T = tiecorrect(ranked) 
1226      for i in range(len(args)): 
1227          args[i] = ranked[0:n[i]] 
1228          del ranked[0:n[i]] 
1229      rsums = [] 
1230      for i in range(len(args)): 
1231          rsums.append(sum(args[i])**2) 
1232          rsums[i] = rsums[i] / float(n[i]) 
1233      ssbn = sum(rsums) 
1234      totaln = sum(n) 
1235      h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 
1236      df = len(args) - 1 
1237      if T == 0: 
1238          raise ValueError, 'All numbers are identical in lkruskalwallish' 
1239      h = h / float(T) 
1240      return h, chisqprob(h,df) 
1241   
1242   
1243 -def lfriedmanchisquare(*args): 
1244      """ 
1245  Friedman Chi-Square is a non-parametric, one-way within-subjects 
1246  ANOVA.  This function calculates the Friedman Chi-square test for repeated 
1247  measures and returns the result, along with the associated probability 
1248  value.  It assumes 3 or more repeated measures.  Only 3 levels requires a 
1249  minimum of 10 subjects in the study.  Four levels requires 5 subjects per 
1250  level(??). 
1251   
1252  Usage:   lfriedmanchisquare(*args) 
1253  Returns: chi-square statistic, associated p-value 
1254  """ 
1255      k = len(args) 
1256      if k < 3: 
1257          raise ValueError, 'Less than 3 levels.  Friedman test not appropriate.' 
1258      n = len(args[0]) 
1259      data = apply(pstat.abut,tuple(args)) 
1260      for i in range(len(data)): 
1261          data[i] = rankdata(data[i]) 
1262      ssbn = 0 
1263      for i in range(k): 
1264          ssbn = ssbn + sum(args[i])**2 
1265      chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 
1266      return chisq, chisqprob(chisq,k-1) 
1267   
1268   
1269  #################################### 
1270  ####  PROBABILITY CALCULATIONS  #### 
1271  #################################### 
1272   
1273 -def lchisqprob(chisq,df): 
1274      """ 
1275  Returns the (1-tailed) probability value associated with the provided 
1276  chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat. 
1277   
1278  Usage:   lchisqprob(chisq,df) 
1279  """ 
1280      BIG = 20.0 
1281      def ex(x): 
1282          BIG = 20.0 
1283          if x < -BIG: 
1284              return 0.0 
1285          else: 
1286              return math.exp(x) 
1287   
1288      if chisq <=0 or df < 1: 
1289          return 1.0 
1290      a = 0.5 * chisq 
1291      if df%2 == 0: 
1292          even = 1 
1293      else: 
1294          even = 0 
1295      if df > 1: 
1296          y = ex(-a) 
1297      if even: 
1298          s = y 
1299      else: 
1300          s = 2.0 * zprob(-math.sqrt(chisq)) 
1301      if (df > 2): 
1302          chisq = 0.5 * (df - 1.0) 
1303          if even: 
1304              z = 1.0 
1305          else: 
1306              z = 0.5 
1307          if a > BIG: 
1308              if even: 
1309                  e = 0.0 
1310              else: 
1311                  e = math.log(math.sqrt(math.pi)) 
1312              c = math.log(a) 
1313              while (z <= chisq): 
1314                  e = math.log(z) + e 
1315                  s = s + ex(c*z-a-e) 
1316                  z = z + 1.0 
1317              return s 
1318          else: 
1319              if even: 
1320                  e = 1.0 
1321              else: 
1322                  e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) 
1323                  c = 0.0 
1324                  while (z <= chisq): 
1325                      e = e * (a/float(z)) 
1326                      c = c + e 
1327                      z = z + 1.0 
1328                  return (c*y+s) 
1329      else: 
1330          return s 
1331   
1332   
1333 -def lerfcc(x): 
1334      """ 
1335  Returns the complementary error function erfc(x) with fractional 
1336  error everywhere less than 1.2e-7.  Adapted from Numerical Recipies. 
1337   
1338  Usage:   lerfcc(x) 
1339  """ 
1340      z = abs(x) 
1341      t = 1.0 / (1.0+0.5*z) 
1342      ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 
1343      if x >= 0: 
1344          return ans 
1345      else: 
1346          return 2.0 - ans 
1347   
1348   
1349 -def lzprob(z): 
1350      """ 
1351  Returns the area under the normal curve 'to the left of' the given z value. 
1352  Thus, :: 
1353      for z<0, zprob(z) = 1-tail probability 
1354      for z>0, 1.0-zprob(z) = 1-tail probability 
1355      for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 
1356  Adapted from z.c in Gary Perlman's |Stat. 
1357   
1358  Usage:   lzprob(z) 
1359  """ 
1360      Z_MAX = 6.0    # maximum meaningful z-value 
1361      if z == 0.0: 
1362          x = 0.0 
1363      else: 
1364          y = 0.5 * math.fabs(z) 
1365          if y >= (Z_MAX*0.5): 
1366              x = 1.0 
1367          elif (y < 1.0): 
1368              w = y*y 
1369              x = ((((((((0.000124818987 * w 
1370                          -0.001075204047) * w +0.005198775019) * w 
1371                        -0.019198292004) * w +0.059054035642) * w 
1372                      -0.151968751364) * w +0.319152932694) * w 
1373                    -0.531923007300) * w +0.797884560593) * y * 2.0 
1374          else: 
1375              y = y - 2.0 
1376              x = (((((((((((((-0.000045255659 * y 
1377                               +0.000152529290) * y -0.000019538132) * y 
1378                             -0.000676904986) * y +0.001390604284) * y 
1379                           -0.000794620820) * y -0.002034254874) * y 
1380                         +0.006549791214) * y -0.010557625006) * y 
1381                       +0.011630447319) * y -0.009279453341) * y 
1382                     +0.005353579108) * y -0.002141268741) * y 
1383                   +0.000535310849) * y +0.999936657524 
1384      if z > 0.0: 
1385          prob = ((x+1.0)*0.5) 
1386      else: 
1387          prob = ((1.0-x)*0.5) 
1388      return prob 
1389   
1390   
1391 -def lksprob(alam): 
1392      """ 
1393  Computes a Kolmolgorov-Smirnov t-test significance level.  Adapted from 
1394  Numerical Recipies. 
1395   
1396  Usage:   lksprob(alam) 
1397  """ 
1398      fac = 2.0 
1399      sum = 0.0 
1400      termbf = 0.0 
1401      a2 = -2.0*alam*alam 
1402      for j in range(1,201): 
1403          term = fac*math.exp(a2*j*j) 
1404          sum = sum + term 
1405          if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): 
1406              return sum 
1407          fac = -fac 
1408          termbf = math.fabs(term) 
1409      return 1.0             # Get here only if fails to converge; was 0.0!! 
1410   
1411   
1412 -def lfprob (dfnum, dfden, F): 
1413      """ 
1414  Returns the (1-tailed) significance level (p-value) of an F 
1415  statistic given the degrees of freedom for the numerator (dfR-dfF) and 
1416  the degrees of freedom for the denominator (dfF). 
1417   
1418  Usage:   lfprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn 
1419  """ 
1420      p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 
1421      return p 
1422   
1423   
1424 -def lbetacf(a,b,x): 
1425      """ 
1426  This function evaluates the continued fraction form of the incomplete 
1427  Beta function, betai.  (Adapted from: Numerical Recipies in C.) 
1428   
1429  Usage:   lbetacf(a,b,x) 
1430  """ 
1431      ITMAX = 200 
1432      EPS = 3.0e-7 
1433   
1434      bm = az = am = 1.0 
1435      qab = a+b 
1436      qap = a+1.0 
1437      qam = a-1.0 
1438      bz = 1.0-qab*x/qap 
1439      for i in range(ITMAX+1): 
1440          em = float(i+1) 
1441          tem = em + em 
1442          d = em*(b-em)*x/((qam+tem)*(a+tem)) 
1443          ap = az + d*am 
1444          bp = bz+d*bm 
1445          d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 
1446          app = ap+d*az 
1447          bpp = bp+d*bz 
1448          aold = az 
1449          am = ap/bpp 
1450          bm = bp/bpp 
1451          az = app/bpp 
1452          bz = 1.0 
1453          if (abs(az-aold)<(EPS*abs(az))): 
1454              return az 
1455      print 'a or b too big, or ITMAX too small in Betacf.' 
1456   
1457   
1458 -def lgammln(xx): 
1459      """ 
1460  Returns the gamma function of xx.:: 
1461      Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 
1462  (Adapted from: Numerical Recipies in C.) 
1463   
1464  Usage:   lgammln(xx) 
1465  """ 
1466   
1467      coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 
1468               0.120858003e-2, -0.536382e-5] 
1469      x = xx - 1.0 
1470      tmp = x + 5.5 
1471      tmp = tmp - (x+0.5)*math.log(tmp) 
1472      ser = 1.0 
1473      for j in range(len(coeff)): 
1474          x = x + 1 
1475          ser = ser + coeff[j]/x 
1476      return -tmp + math.log(2.50662827465*ser) 
1477   
1478   
1479 -def lbetai(a,b,x): 
1480      """ 
1481  Returns the incomplete beta function:: 
1482   
1483      I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 
1484   
1485  where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 
1486  function of a.  The continued fraction formulation is implemented here, 
1487  using the betacf function.  (Adapted from: Numerical Recipies in C.) 
1488   
1489  Usage:   lbetai(a,b,x) 
1490  """ 
1491      if (x<0.0 or x>1.0): 
1492          raise ValueError, 'Bad x in lbetai' 
1493      if (x==0.0 or x==1.0): 
1494          bt = 0.0 
1495      else: 
1496          bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* 
1497                        math.log(1.0-x)) 
1498      if (x<(a+1.0)/(a+b+2.0)): 
1499          return bt*betacf(a,b,x)/float(a) 
1500      else: 
1501          return 1.0-bt*betacf(b,a,1.0-x)/float(b) 
1502   
1503   
1504  #################################### 
1505  #######  ANOVA CALCULATIONS  ####### 
1506  #################################### 
1507   
1508 -def lF_oneway(*lists): 
1509      """ 
1510  Performs a 1-way ANOVA, returning an F-value and probability given 
1511  any number of groups.  From Heiman, pp.394-7. 
1512   
1513  Usage:   F_oneway(*lists)    where *lists is any number of lists, one per treatment group 
1514  Returns: F value, one-tailed p-value 
1515  """ 
1516      a = len(lists)           # ANOVA on 'a' groups, each in it's own list 
1517      means = [0]*a 
1518      vars = [0]*a 
1519      ns = [0]*a 
1520      alldata = [] 
1521      tmp = map(N.array,lists) 
1522      means = map(amean,tmp) 
1523      vars = map(avar,tmp) 
1524      ns = map(len,lists) 
1525      for i in range(len(lists)): 
1526          alldata = alldata + lists[i] 
1527      alldata = N.array(alldata) 
1528      bign = len(alldata) 
1529      sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 
1530      ssbn = 0 
1531      for list in lists: 
1532          ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list)) 
1533      ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 
1534      sswn = sstot-ssbn 
1535      dfbn = a-1 
1536      dfwn = bign - a 
1537      msb = ssbn/float(dfbn) 
1538      msw = sswn/float(dfwn) 
1539      f = msb/msw 
1540      prob = fprob(dfbn,dfwn,f) 
1541      return f, prob 
1542   
1543   
1544 -def lF_value (ER,EF,dfnum,dfden): 
1545      """ 
1546  Returns an F-statistic given the following:: 
1547          ER  = error associated with the null hypothesis (the Restricted model) 
1548          EF  = error associated with the alternate hypothesis (the Full model) 
1549          dfR-dfF = degrees of freedom of the numerator 
1550          dfF = degrees of freedom associated with the denominator/Full model 
1551   
1552  Usage:   lF_value(ER,EF,dfnum,dfden) 
1553  """ 
1554      return ((ER-EF)/float(dfnum) / (EF/float(dfden))) 
1555   
1556   
1557  #################################### 
1558  ########  SUPPORT FUNCTIONS  ####### 
1559  #################################### 
1560   
1561 -def writecc (listoflists,file,writetype='w',extra=2): 
1562      """ 
1563  Writes a list of lists to a file in columns, customized by the max 
1564  size of items within the columns (max size of items in col, +2 characters) 
1565  to specified file.  File-overwrite is the default. 
1566   
1567  Usage:   writecc (listoflists,file,writetype='w',extra=2) 
1568  Returns: None 
1569  """ 
1570      if type(listoflists[0]) not in [ListType,TupleType]: 
1571          listoflists = [listoflists] 
1572      outfile = open(file,writetype) 
1573      rowstokill = [] 
1574      list2print = copy.deepcopy(listoflists) 
1575      for i in range(len(listoflists)): 
1576          if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': 
1577              rowstokill = rowstokill + [i] 
1578      rowstokill.reverse() 
1579      for row in rowstokill: 
1580          del list2print[row] 
1581      maxsize = [0]*len(list2print[0]) 
1582      for col in range(len(list2print[0])): 
1583          items = pstat.colex(list2print,col) 
1584          items = map(pstat.makestr,items) 
1585          maxsize[col] = max(map(len,items)) + extra 
1586      for row in listoflists: 
1587          if row == ['\n'] or row == '\n': 
1588              outfile.write('\n') 
1589          elif row == ['dashes'] or row == 'dashes': 
1590              dashes = [0]*len(maxsize) 
1591              for j in range(len(maxsize)): 
1592                  dashes[j] = '-'*(maxsize[j]-2) 
1593              outfile.write(pstat.lineincustcols(dashes,maxsize)) 
1594          else: 
1595              outfile.write(pstat.lineincustcols(row,maxsize)) 
1596          outfile.write('\n') 
1597      outfile.close() 
1598      return None 
1599   
1600   
1601 -def lincr(l,cap):        # to increment a list up to a max-list of 'cap' 
1602      """ 
1603  Simulate a counting system from an n-dimensional list. 
1604   
1605  Usage:   lincr(l,cap)   l=list to increment, cap=max values for each list pos'n 
1606  Returns: next set of values for list l, OR -1 (if overflow) 
1607  """ 
1608      l[0] = l[0] + 1     # e.g., [0,0,0] --> [2,4,3] (=cap) 
1609      for i in range(len(l)): 
1610          if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done 
1611              l[i] = 0 
1612              l[i+1] = l[i+1] + 1 
1613          elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished 
1614              l = -1 
1615      return l 
1616   
1617   
1618 -def lsum (inlist): 
1619      """ 
1620  Returns the sum of the items in the passed list. 
1621   
1622  Usage:   lsum(inlist) 
1623  """ 
1624      s = 0 
1625      for item in inlist: 
1626          s = s + item 
1627      return s 
1628   
1629   
1630 -def lcumsum (inlist): 
1631      """ 
1632  Returns a list consisting of the cumulative sum of the items in the 
1633  passed list. 
1634   
1635  Usage:   lcumsum(inlist) 
1636  """ 
1637      newlist = copy.deepcopy(inlist) 
1638      for i in range(1,len(newlist)): 
1639          newlist[i] = newlist[i] + newlist[i-1] 
1640      return newlist 
1641   
1642   
1643 -def lss(inlist): 
1644      """ 
1645  Squares each value in the passed list, adds up these squares and 
1646  returns the result. 
1647   
1648  Usage:   lss(inlist) 
1649  """ 
1650      ss = 0 
1651      for item in inlist: 
1652          ss = ss + item*item 
1653      return ss 
1654   
1655   
1656 -def lsummult (list1,list2): 
1657      """ 
1658  Multiplies elements in list1 and list2, element by element, and 
1659  returns the sum of all resulting multiplications.  Must provide equal 
1660  length lists. 
1661   
1662  Usage:   lsummult(list1,list2) 
1663  """ 
1664      if len(list1) <> len(list2): 
1665          raise ValueError, "Lists not equal length in summult." 
1666      s = 0 
1667      for item1,item2 in pstat.abut(list1,list2): 
1668          s = s + item1*item2 
1669      return s 
1670   
1671   
1672 -def lsumdiffsquared(x,y): 
1673      """ 
1674  Takes pairwise differences of the values in lists x and y, squares 
1675  these differences, and returns the sum of these squares. 
1676   
1677  Usage:   lsumdiffsquared(x,y) 
1678  Returns: sum[(x[i]-y[i])**2] 
1679  """ 
1680      sds = 0 
1681      for i in range(len(x)): 
1682          sds = sds + (x[i]-y[i])**2 
1683      return sds 
1684   
1685   
1686 -def lsquare_of_sums(inlist): 
1687      """ 
1688  Adds the values in the passed list, squares the sum, and returns 
1689  the result. 
1690   
1691  Usage:   lsquare_of_sums(inlist) 
1692  Returns: sum(inlist[i])**2 
1693  """ 
1694      s = sum(inlist) 
1695      return float(s)*s 
1696   
1697   
1698 -def lshellsort(inlist): 
1699      """ 
1700  Shellsort algorithm.  Sorts a 1D-list. 
1701   
1702  Usage:   lshellsort(inlist) 
1703  Returns: sorted-inlist, sorting-index-vector (for original list) 
1704  """ 
1705      n = len(inlist) 
1706      svec = copy.deepcopy(inlist) 
1707      ivec = range(n) 
1708      gap = n/2   # integer division needed 
1709      while gap >0: 
1710          for i in range(gap,n): 
1711              for j in range(i-gap,-1,-gap): 
1712                  while j>=0 and svec[j]>svec[j+gap]: 
1713                      temp        = svec[j] 
1714                      svec[j]     = svec[j+gap] 
1715                      svec[j+gap] = temp 
1716                      itemp       = ivec[j] 
1717                      ivec[j]     = ivec[j+gap] 
1718                      ivec[j+gap] = itemp 
1719          gap = gap / 2  # integer division needed 
1720  # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] 
1721      return svec, ivec 
1722   
1723   
1724 -def lrankdata(inlist): 
1725      """ 
1726  Ranks the data in inlist, dealing with ties appropritely.  Assumes 
1727  a 1D inlist.  Adapted from Gary Perlman's |Stat ranksort. 
1728   
1729  Usage:   lrankdata(inlist) 
1730  Returns: a list of length equal to inlist, containing rank scores 
1731  """ 
1732      n = len(inlist) 
1733      svec, ivec = shellsort(inlist) 
1734      sumranks = 0 
1735      dupcount = 0 
1736      newlist = [0]*n 
1737      for i in range(n): 
1738          sumranks = sumranks + i 
1739          dupcount = dupcount + 1 
1740          if i==n-1 or svec[i] <> svec[i+1]: 
1741              averank = sumranks / float(dupcount) + 1 
1742              for j in range(i-dupcount+1,i+1): 
1743                  newlist[ivec[j]] = averank 
1744              sumranks = 0 
1745              dupcount = 0 
1746      return newlist 
1747   
1748   
1749 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob): 
1750      """ 
1751  Prints or write to a file stats for two groups, using the name, n, 
1752  mean, sterr, min and max for each group, as well as the statistic name, 
1753  its value, and the associated p-value. 
1754   
1755  Usage:: 
1756           outputpairedstats(fname,writemode, 
1757                             name1,n1,mean1,stderr1,min1,max1, 
1758                             name2,n2,mean2,stderr2,min2,max2, 
1759                             statname,stat,prob) 
1760  Returns: None 
1761  """ 
1762      suffix = ''                       # for *s after the p-value 
1763      try: 
1764          x = prob.shape 
1765          prob = prob[0] 
1766      except: 
1767          pass 
1768      if  prob < 0.001:  suffix = '  ***' 
1769      elif prob < 0.01:  suffix = '  **' 
1770      elif prob < 0.05:  suffix = '  *' 
1771      title = [['Name','N','Mean','SD','Min','Max']] 
1772      lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], 
1773                    [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] 
1774      if type(fname)<>StringType or len(fname)==0: 
1775          print 
1776          print statname 
1777          print 
1778          pstat.printcc(lofl) 
1779          print 
1780          try: 
1781              if stat.shape == (): 
1782                  stat = stat[0] 
1783              if prob.shape == (): 
1784                  prob = prob[0] 
1785          except: 
1786              pass 
1787          print 'Test statistic = ',round(stat,3),'   p = ',round(prob,3),suffix 
1788          print 
1789      else: 
1790          file = open(fname,writemode) 
1791          file.write('\n'+statname+'\n\n') 
1792          file.close() 
1793          writecc(lofl,fname,'a') 
1794          file = open(fname,'a') 
1795          try: 
1796              if stat.shape == (): 
1797                  stat = stat[0] 
1798              if prob.shape == (): 
1799                  prob = prob[0] 
1800          except: 
1801              pass 
1802          file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),'   p = ',round(prob,4),suffix,'\n\n'])) 
1803          file.close() 
1804      return None 
1805   
1806   
1807 -def lfindwithin (data): 
1808      """ 
1809  Returns an integer representing a binary vector, where 1=within- 
1810  subject factor, 0=between.  Input equals the entire data 2D list (i.e., 
1811  column 0=random factor, column -1=measured values (those two are skipped). 
1812  Note: input data is in |Stat format ... a list of lists ("2D list") with  
1813  one row per measured value, first column=subject identifier, last column= 
1814  score, one in-between column per factor (these columns contain level 
1815  designations on each factor).  See also stats.anova.__doc__. 
1816   
1817  Usage:   lfindwithin(data)     data in |Stat format 
1818  """ 
1819   
1820      numfact = len(data[0])-1 
1821      withinvec = 0 
1822      for col in range(1,numfact): 
1823          examplelevel = pstat.unique(pstat.colex(data,col))[0] 
1824          rows = pstat.linexand(data,col,examplelevel)  # get 1 level of this factor 
1825          factsubjs = pstat.unique(pstat.colex(rows,0)) 
1826          allsubjs = pstat.unique(pstat.colex(data,0)) 
1827          if len(factsubjs) == len(allsubjs):  # fewer Ss than scores on this factor? 
1828              withinvec = withinvec + (1 << col) 
1829      return withinvec 
1830   
1831   
1832  ######################################################### 
1833  ######################################################### 
1834  ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### 
1835  ######################################################### 
1836  ######################################################### 
1837   
1838  ## CENTRAL TENDENCY: 
1839  geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) 
1840  harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) 
1841  mean = Dispatch ( (lmean, (ListType, TupleType)), ) 
1842  median = Dispatch ( (lmedian, (ListType, TupleType)), ) 
1843  medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) 
1844  mode = Dispatch ( (lmode, (ListType, TupleType)), ) 
1845   
1846  ## MOMENTS: 
1847  moment = Dispatch ( (lmoment, (ListType, TupleType)), ) 
1848  variation = Dispatch ( (lvariation, (ListType, TupleType)), ) 
1849  skew = Dispatch ( (lskew, (ListType, TupleType)), ) 
1850  kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) 
1851  describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) 
1852   
1853  ## FREQUENCY STATISTICS: 
1854  itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) 
1855  scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) 
1856  percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) 
1857  histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) 
1858  cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) 
1859  relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) 
1860   
1861  ## VARIABILITY: 
1862  obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) 
1863  samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) 
1864  samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) 
1865  var = Dispatch ( (lvar, (ListType, TupleType)), ) 
1866  stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) 
1867  sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) 
1868  sem = Dispatch ( (lsem, (ListType, TupleType)), ) 
1869  z = Dispatch ( (lz, (ListType, TupleType)), ) 
1870  zs = Dispatch ( (lzs, (ListType, TupleType)), ) 
1871   
1872  ## TRIMMING FCNS: 
1873  trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) 
1874  trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) 
1875   
1876  ## CORRELATION FCNS: 
1877  paired = Dispatch ( (lpaired, (ListType, TupleType)), ) 
1878  pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) 
1879  spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) 
1880  pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) 
1881  kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) 
1882  linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) 
1883   
1884  ## INFERENTIAL STATS: 
1885  ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) 
1886  ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) 
1887  ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) 
1888  chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) 
1889  ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) 
1890  mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) 
1891  ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) 
1892  tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) 
1893  wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) 
1894  kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) 
1895  friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) 
1896   
1897  ## PROBABILITY CALCS: 
1898  chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) 
1899  zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) 
1900  ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) 
1901  fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) 
1902  betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) 
1903  betai = Dispatch ( (lbetai, (IntType, FloatType)), ) 
1904  erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) 
1905  gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) 
1906   
1907  ## ANOVA FUNCTIONS: 
1908  F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) 
1909  F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) 
1910   
1911  ## SUPPORT FUNCTIONS: 
1912  incr = Dispatch ( (lincr, (ListType, TupleType)), ) 
1913  sum = Dispatch ( (lsum, (ListType, TupleType)), ) 
1914  cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) 
1915  ss = Dispatch ( (lss, (ListType, TupleType)), ) 
1916  summult = Dispatch ( (lsummult, (ListType, TupleType)), ) 
1917  square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) 
1918  sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) 
1919  shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) 
1920  rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) 
1921  findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) 
1922   
1923   
1924  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1925  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1926  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1927  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1928  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1929  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1930  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1931  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1932  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1933  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1934  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1935  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1936  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1937  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1938  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1939  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1940  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1941  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1942  #=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  =============== 
1943   
1944  try:                         # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 
1945   import Numeric 
1946   N = Numeric 
1947   import LinearAlgebra 
1948   LA = LinearAlgebra 
1949   
1950   
1951  ##################################### 
1952  ########  ACENTRAL TENDENCY  ######## 
1953  ##################################### 
1954   
1955 - def ageometricmean (inarray,dimension=None,keepdims=0): 
1956      """ 
1957  Calculates the geometric mean of the values in the passed array. 
1958  That is:  n-th root of (x1 * x2 * ... * xn).  Defaults to ALL values in 
1959  the passed array.  Use dimension=None to flatten array first.  REMEMBER: if 
1960  dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 
1961  if dimension is a sequence, it collapses over all specified dimensions.  If 
1962  keepdims is set to 1, the resulting array will have as many dimensions as 
1963  inarray, with only 1 'level' per dim that was collapsed over. 
1964   
1965  Usage:   ageometricmean(inarray,dimension=None,keepdims=0) 
1966  Returns: geometric mean computed over dim(s) listed in dimension 
1967  """ 
1968      inarray = N.array(inarray,N.Float) 
1969      if dimension == None: 
1970          inarray = N.ravel(inarray) 
1971          size = len(inarray) 
1972          mult = N.power(inarray,1.0/size) 
1973          mult = N.multiply.reduce(mult) 
1974      elif type(dimension) in [IntType,FloatType]: 
1975          size = inarray.shape[dimension] 
1976          mult = N.power(inarray,1.0/size) 
1977          mult = N.multiply.reduce(mult,dimension) 
1978          if keepdims == 1: 
1979              shp = list(inarray.shape) 
1980              shp[dimension] = 1 
1981              sum = N.reshape(sum,shp) 
1982      else: # must be a SEQUENCE of dims to average over 
1983          dims = list(dimension) 
1984          dims.sort() 
1985          dims.reverse() 
1986          size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 
1987          mult = N.power(inarray,1.0/size) 
1988          for dim in dims: 
1989              mult = N.multiply.reduce(mult,dim) 
1990          if keepdims == 1: 
1991              shp = list(inarray.shape) 
1992              for dim in dims: 
1993                  shp[dim] = 1 
1994              mult = N.reshape(mult,shp) 
1995      return mult 
1996   
1997   
1998 - def aharmonicmean (inarray,dimension=None,keepdims=0): 
1999      """ 
2000  Calculates the harmonic mean of the values in the passed array. 
2001  That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Defaults to ALL values in 
2002  the passed array.  Use dimension=None to flatten array first.  REMEMBER: if 
2003  dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 
2004  if dimension is a sequence, it collapses over all specified dimensions.  If 
2005  keepdims is set to 1, the resulting array will have as many dimensions as 
2006  inarray, with only 1 'level' per dim that was collapsed over. 
2007   
2008  Usage:   aharmonicmean(inarray,dimension=None,keepdims=0) 
2009  Returns: harmonic mean computed over dim(s) in dimension 
2010  """ 
2011      inarray = inarray.astype(N.Float) 
2012      if dimension == None: 
2013          inarray = N.ravel(inarray) 
2014          size = len(inarray) 
2015          s = N.add.reduce(1.0 / inarray) 
2016      elif type(dimension) in [IntType,FloatType]: 
2017          size = float(inarray.shape[dimension]) 
2018          s = N.add.reduce(1.0/inarray, dimension) 
2019          if keepdims == 1: 
2020              shp = list(inarray.shape) 
2021              shp[dimension] = 1 
2022              s = N.reshape(s,shp) 
2023      else: # must be a SEQUENCE of dims to average over 
2024          dims = list(dimension) 
2025          dims.sort() 
2026          nondims = [] 
2027          for i in range(len(inarray.shape)): 
2028              if i not in dims: 
2029                  nondims.append(i) 
2030          tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first 
2031          idx = [0] *len(nondims) 
2032          if idx == []: 
2033              size = len(N.ravel(inarray)) 
2034              s = asum(1.0 / inarray) 
2035              if keepdims == 1: 
2036                  s = N.reshape([s],N.ones(len(inarray.shape))) 
2037          else: 
2038              idx[0] = -1 
2039              loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 
2040              s = N.zeros(loopcap+1,N.Float) 
2041              while incr(idx,loopcap) <> -1: 
2042                  s[idx] = asum(1.0/tinarray[idx]) 
2043              size = N.multiply.reduce(N.take(inarray.shape,dims)) 
2044              if keepdims == 1: 
2045                  shp = list(inarray.shape) 
2046                  for dim in dims: 
2047                      shp[dim] = 1 
2048                  s = N.reshape(s,shp) 
2049      return size / s 
2050   
2051   
2052 - def amean (inarray,dimension=None,keepdims=0): 
2053      """ 
2054  Calculates the arithmatic mean of the values in the passed array. 
2055  That is:  1/n * (x1 + x2 + ... + xn).  Defaults to ALL values in the 
2056  passed array.  Use dimension=None to flatten array first.  REMEMBER: if 
2057  dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 
2058  if dimension is a sequence, it collapses over all specified dimensions.  If 
2059  keepdims is set to 1, the resulting array will have as many dimensions as 
2060  inarray, with only 1 'level' per dim that was collapsed over. 
2061   
2062  Usage:   amean(inarray,dimension=None,keepdims=0) 
2063  Returns: arithematic mean calculated over dim(s) in dimension 
2064  """ 
2065      if inarray.typecode() in ['l','s','b']: 
2066          inarray = inarray.astype(N.Float) 
2067      if dimension == None: 
2068          inarray = N.ravel(inarray) 
2069          sum = N.add.reduce(inarray) 
2070          denom = float(len(inarray)) 
2071      elif type(dimension) in [IntType,FloatType]: 
2072          sum = asum(inarray,dimension) 
2073          denom = float(inarray.shape[dimension]) 
2074          if keepdims == 1: 
2075              shp = list(inarray.shape) 
2076              shp[dimension] = 1 
2077              sum = N.reshape(sum,shp) 
2078      else: # must be a TUPLE of dims to average over 
2079          dims = list(dimension) 
2080          dims.sort() 
2081          dims.reverse() 
2082          sum = inarray *1.0 
2083          for dim in dims: 
2084              sum = N.add.reduce(sum,dim) 
2085          denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 
2086          if keepdims == 1: 
2087              shp = list(inarray.shape) 
2088              for dim in dims: 
2089                  shp[dim] = 1 
2090              sum = N.reshape(sum,shp) 
2091      return sum/denom 
2092   
2093   
2094 - def amedian (inarray,numbins=1000): 
2095      """ 
2096  Calculates the COMPUTED median value of an array of numbers, given the 
2097  number of bins to use for the histogram (more bins approaches finding the 
2098  precise median value of the array; default number of bins = 1000).  From 
2099  G.W. Heiman's Basic Stats, or CRC Probability & Statistics. 
2100  NOTE:  THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). 
2101   
2102  Usage:   amedian(inarray,numbins=1000) 
2103  Returns: median calculated over ALL values in inarray 
2104  """ 
2105      inarray = N.ravel(inarray) 
2106      (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) 
2107      cumhist = N.cumsum(hist)            # make cumulative histogram 
2108      otherbins = N.greater_equal(cumhist,len(inarray)/2.0) 
2109      otherbins = list(otherbins)         # list of 0/1s, 1s start at median bin 
2110      cfbin = otherbins.index(1)                # get 1st(!) index holding 50%ile score 
2111      LRL = smallest + binsize*cfbin        # get lower read limit of that bin 
2112      cfbelow = N.add.reduce(hist[0:cfbin])        # cum. freq. below bin 
2113      freq = hist[cfbin]                        # frequency IN the 50%ile bin 
2114      median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN 
2115      return median 
2116   
2117   
2118 - def amedianscore (inarray,dimension=None): 
2119      """ 
2120  Returns the 'middle' score of the passed array.  If there is an even 
2121  number of scores, the mean of the 2 middle scores is returned.  Can function 
2122  with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can 
2123  be None, to pre-flatten the array, or else dimension must equal 0). 
2124   
2125  Usage:   amedianscore(inarray,dimension=None) 
2126  Returns: 'middle' score of the array, or the mean of the 2 middle scores 
2127  """ 
2128      if dimension == None: 
2129          inarray = N.ravel(inarray) 
2130          dimension = 0 
2131      inarray = N.sort(inarray,dimension) 
2132      if inarray.shape[dimension] % 2 == 0:   # if even number of elements 
2133          indx = inarray.shape[dimension]/2   # integer division correct 
2134          median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 
2135      else: 
2136          indx = inarray.shape[dimension] / 2 # integer division correct 
2137          median = N.take(inarray,[indx],dimension) 
2138          if median.shape == (1,): 
2139              median = median[0] 
2140      return median 
2141   
2142   
2143 - def amode(a, dimension=None): 
2144      """ 
2145  Returns an array of the modal (most common) score in the passed array. 
2146  If there is more than one such score, ONLY THE FIRST is returned. 
2147  The bin-count for the modal values is also returned.  Operates on whole 
2148  array (dimension=None), or on a given dimension. 
2149   
2150  Usage:   amode(a, dimension=None) 
2151  Returns: array of bin-counts for mode(s), array of corresponding modal values 
2152  """ 
2153   
2154      if dimension == None: 
2155          a = N.ravel(a) 
2156          dimension = 0 
2157      scores = pstat.aunique(N.ravel(a))       # get ALL unique values 
2158      testshape = list(a.shape) 
2159      testshape[dimension] = 1 
2160      oldmostfreq = N.zeros(testshape) 
2161      oldcounts = N.zeros(testshape) 
2162      for score in scores: 
2163          template = N.equal(a,score) 
2164          counts = asum(template,dimension,1) 
2165          mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) 
2166          oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) 
2167          oldmostfreq = mostfrequent 
2168      return oldcounts, mostfrequent 
2169   
2170   
2171 - def atmean(a,limits=None,inclusive=(1,1)): 
2172       """ 
2173  Returns the arithmetic mean of all values in an array, ignoring values 
2174  strictly outside the sequence passed to 'limits'.   Note: either limit 
2175  in the sequence, or the value of limits itself, can be set to None.  The 
2176  inclusive list/tuple determines whether the lower and upper limiting bounds 
2177  (respectively) are open/exclusive (0) or closed/inclusive (1). 
2178   
2179  Usage:   atmean(a,limits=None,inclusive=(1,1)) 
2180  """ 
2181       if a.typecode() in ['l','s','b']: 
2182           a = a.astype(N.Float) 
2183       if limits == None: 
2184           return mean(a) 
2185       assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" 
2186       if inclusive[0]:         lowerfcn = N.greater_equal 
2187       else:               lowerfcn = N.greater 
2188       if inclusive[1]:         upperfcn = N.less_equal 
2189       else:               upperfcn = N.less 
2190       if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 
2191           raise ValueError, "No array values within given limits (atmean)." 
2192       elif limits[0]==None and limits[1]<>None: 
2193           mask = upperfcn(a,limits[1]) 
2194       elif limits[0]<>None and limits[1]==None: 
2195           mask = lowerfcn(a,limits[0]) 
2196       elif limits[0]<>None and limits[1]<>None: 
2197           mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 
2198       s = float(N.add.reduce(N.ravel(a*mask))) 
2199       n = float(N.add.reduce(N.ravel(mask))) 
2200       return s/n 
2201   
2202   
2203 - def atvar(a,limits=None,inclusive=(1,1)): 
2204       """ 
2205  Returns the sample variance of values in an array, (i.e., using N-1), 
2206  ignoring values strictly outside the sequence passed to 'limits'.   
2207  Note: either limit in the sequence, or the value of limits itself, 
2208  can be set to None.  The inclusive list/tuple determines whether the lower 
2209  and upper limiting bounds (respectively) are open/exclusive (0) or 
2210  closed/inclusive (1). 
2211   
2212  Usage:   atvar(a,limits=None,inclusive=(1,1)) 
2213  """ 
2214       a = a.astype(N.Float) 
2215       if limits == None or limits == [None,None]: 
2216           term1 = N.add.reduce(N.ravel(a*a)) 
2217           n = float(len(N.ravel(a))) - 1 
2218           term2 = N.add.reduce(N.ravel(a))**2 / n 
2219           return (term1 - term2) / n 
2220       assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" 
2221       if inclusive[0]:         lowerfcn = N.greater_equal 
2222       else:               lowerfcn = N.greater 
2223       if inclusive[1]:         upperfcn = N.less_equal 
2224       else:               upperfcn = N.less 
2225       if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 
2226           raise ValueError, "No array values within given limits (atvar)." 
2227       elif limits[0]==None and limits[1]<>None: 
2228           mask = upperfcn(a,limits[1]) 
2229       elif limits[0]<>None and limits[1]==None: 
2230           mask = lowerfcn(a,limits[0]) 
2231       elif limits[0]<>None and limits[1]<>None: 
2232           mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 
2233       term1 = N.add.reduce(N.ravel(a*a*mask)) 
2234       n = float(N.add.reduce(N.ravel(mask))) - 1 
2235       term2 = N.add.reduce(N.ravel(a*mask))**2 / n 
2236       return (term1 - term2) / n 
2237   
2238   
2239 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1): 
2240       """ 
2241  Returns the minimum value of a, along dimension, including only values less 
2242  than (or equal to, if inclusive=1) lowerlimit.  If the limit is set to None, 
2243  all values in the array are used. 
2244   
2245  Usage:   atmin(a,lowerlimit=None,dimension=None,inclusive=1) 
2246  """ 
2247       if inclusive:         lowerfcn = N.greater 
2248       else:               lowerfcn = N.greater_equal 
2249       if dimension == None: 
2250           a = N.ravel(a) 
2251           dimension = 0 
2252       if lowerlimit == None: 
2253           lowerlimit = N.minimum.reduce(N.ravel(a))-11 
2254       biggest = N.maximum.reduce(N.ravel(a)) 
2255       ta = N.where(lowerfcn(a,lowerlimit),a,biggest) 
2256       return N.minimum.reduce(ta,dimension) 
2257   
2258   
2259 - def atmax(a,upperlimit,dimension=None,inclusive=1): 
2260       """ 
2261  Returns the maximum value of a, along dimension, including only values greater 
2262  than (or equal to, if inclusive=1) upperlimit.  If the limit is set to None, 
2263  a limit larger than the max value in the array is used. 
2264   
2265  Usage:   atmax(a,upperlimit,dimension=None,inclusive=1) 
2266  """ 
2267       if inclusive:         upperfcn = N.less 
2268       else:               upperfcn = N.less_equal 
2269       if dimension == None: 
2270           a = N.ravel(a) 
2271           dimension = 0 
2272       if upperlimit == None: 
2273           upperlimit = N.maximum.reduce(N.ravel(a))+1 
2274       smallest = N.minimum.reduce(N.ravel(a)) 
2275       ta = N.where(upperfcn(a,upperlimit),a,smallest) 
2276       return N.maximum.reduce(ta,dimension) 
2277   
2278   
2279 - def atstdev(a,limits=None,inclusive=(1,1)): 
2280       """ 
2281  Returns the standard deviation of all values in an array, ignoring values 
2282  strictly outside the sequence passed to 'limits'.   Note: either limit 
2283  in the sequence, or the value of limits itself, can be set to None.  The 
2284  inclusive list/tuple determines whether the lower and upper limiting bounds 
2285  (respectively) are open/exclusive (0) or closed/inclusive (1). 
2286   
2287  Usage:   atstdev(a,limits=None,inclusive=(1,1)) 
2288  """ 
2289       return N.sqrt(tvar(a,limits,inclusive)) 
2290   
2291   
2292 - def atsem(a,limits=None,inclusive=(1,1)): 
2293       """ 
2294  Returns the standard error of the mean for the values in an array, 
2295  (i.e., using N for the denominator), ignoring values strictly outside 
2296  the sequence passed to 'limits'.   Note: either limit in the sequence, 
2297  or the value of limits itself, can be set to None.  The inclusive list/tuple 
2298  determines whether the lower and upper limiting bounds (respectively) are 
2299  open/exclusive (0) or closed/inclusive (1). 
2300   
2301  Usage:   atsem(a,limits=None,inclusive=(1,1)) 
2302  """ 
2303       sd = tstdev(a,limits,inclusive) 
2304       if limits == None or limits == [None,None]: 
2305           n = float(len(N.ravel(a))) 
2306       assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" 
2307       if inclusive[0]:         lowerfcn = N.greater_equal 
2308       else:               lowerfcn = N.greater 
2309       if inclusive[1]:         upperfcn = N.less_equal 
2310       else:               upperfcn = N.less 
2311       if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 
2312           raise ValueError, "No array values within given limits (atsem)." 
2313       elif limits[0]==None and limits[1]<>None: 
2314           mask = upperfcn(a,limits[1]) 
2315       elif limits[0]<>None and limits[1]==None: 
2316           mask = lowerfcn(a,limits[0]) 
2317       elif limits[0]<>None and limits[1]<>None: 
2318           mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 
2319       term1 = N.add.reduce(N.ravel(a*a*mask)) 
2320       n = float(N.add.reduce(N.ravel(mask))) 
2321       return sd/math.sqrt(n) 
2322   
2323   
2324  ##################################### 
2325  ############  AMOMENTS  ############# 
2326  ##################################### 
2327   
2328 - def amoment(a,moment=1,dimension=None): 
2329      """ 
2330  Calculates the nth moment about the mean for a sample (defaults to the 
2331  1st moment).  Generally used to calculate coefficients of skewness and 
2332  kurtosis.  Dimension can equal None (ravel array first), an integer 
2333  (the dimension over which to operate), or a sequence (operate over 
2334  multiple dimensions). 
2335   
2336  Usage:   amoment(a,moment=1,dimension=None) 
2337  Returns: appropriate moment along given dimension 
2338  """ 
2339      if dimension == None: 
2340          a = N.ravel(a) 
2341          dimension = 0 
2342      if moment == 1: 
2343          return 0.0 
2344      else: 
2345          mn = amean(a,dimension,1)  # 1=keepdims 
2346          s = N.power((a-mn),moment) 
2347          return amean(s,dimension) 
2348   
2349   
2350 - def avariation(a,dimension=None): 
2351      """ 
2352  Returns the coefficient of variation, as defined in CRC Standard 
2353  Probability and Statistics, p.6. Dimension can equal None (ravel array 
2354  first), an integer (the dimension over which to operate), or a 
2355  sequence (operate over multiple dimensions). 
2356   
2357  Usage:   avariation(a,dimension=None) 
2358  """ 
2359      return 100.0*asamplestdev(a,dimension)/amean(a,dimension) 
2360   
2361   
2362 - def askew(a,dimension=None):  
2363      """  
2364  Returns the skewness of a distribution (normal ==> 0.0; >0 means extra 
2365  weight in left tail).  Use askewtest() to see if it's close enough. 
2366  Dimension can equal None (ravel array first), an integer (the 
2367  dimension over which to operate), or a sequence (operate over multiple 
2368  dimensions). 
2369   
2370  Usage:   askew(a, dimension=None) 
2371  Returns: skew of vals in a along dimension, returning ZERO where all vals equal 
2372  """ 
2373      denom = N.power(amoment(a,2,dimension),1.5) 
2374      zero = N.equal(denom,0) 
2375      if type(denom) == N.ArrayType and asum(zero) <> 0: 
2376          print "Number of zeros in askew: ",asum(zero) 
2377      denom = denom + zero  # prevent divide-by-zero 
2378      return N.where(zero, 0, amoment(a,3,dimension)/denom) 
2379   
2380   
2381 - def akurtosis(a,dimension=None): 
2382      """ 
2383  Returns the kurtosis of a distribution (normal ==> 3.0; >3 means 
2384  heavier in the tails, and usually more peaked).  Use akurtosistest() 
2385  to see if it's close enough.  Dimension can equal None (ravel array 
2386  first), an integer (the dimension over which to operate), or a 
2387  sequence (operate over multiple dimensions). 
2388   
2389  Usage:   akurtosis(a,dimension=None) 
2390  Returns: kurtosis of values in a along dimension, and ZERO where all vals equal 
2391  """ 
2392      denom = N.power(amoment(a,2,dimension),2) 
2393      zero = N.equal(denom,0) 
2394      if type(denom) == N.ArrayType and asum(zero) <> 0: 
2395          print "Number of zeros in akurtosis: ",asum(zero) 
2396      denom = denom + zero  # prevent divide-by-zero 
2397      return N.where(zero,0,amoment(a,4,dimension)/denom) 
2398   
2399   
2400 - def adescribe(inarray,dimension=None): 
2401       """ 
2402  Returns several descriptive statistics of the passed array.  Dimension 
2403  can equal None (ravel array first), an integer (the dimension over 
2404  which to operate), or a sequence (operate over multiple dimensions). 
2405   
2406  Usage:   adescribe(inarray,dimension=None) 
2407  Returns: n, (min,max), mean, standard deviation, skew, kurtosis 
2408  """ 
2409       if dimension == None: 
2410           inarray = N.ravel(inarray) 
2411           dimension = 0 
2412       n = inarray.shape[dimension] 
2413       mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) 
2414       m = amean(inarray,dimension) 
2415       sd = astdev(inarray,dimension) 
2416       skew = askew(inarray,dimension) 
2417       kurt = akurtosis(inarray,dimension) 
2418       return n, mm, m, sd, skew, kurt 
2419   
2420   
2421  ##################################### 
2422  ########  NORMALITY TESTS  ########## 
2423  ##################################### 
2424   
2425 - def askewtest(a,dimension=None): 
2426      """ 
2427  Tests whether the skew is significantly different from a normal 
2428  distribution.  Dimension can equal None (ravel array first), an 
2429  integer (the dimension over which to operate), or a sequence (operate 
2430  over multiple dimensions). 
2431   
2432  Usage:   askewtest(a,dimension=None) 
2433  Returns: z-score and 2-tail z-probability 
2434  """ 
2435      if dimension == None: 
2436          a = N.ravel(a) 
2437          dimension = 0 
2438      b2 = askew(a,dimension) 
2439      n = float(a.shape[dimension]) 
2440      y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) 
2441      beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) 
2442      W2 = -1 + N.sqrt(2*(beta2-1)) 
2443      delta = 1/N.sqrt(N.log(N.sqrt(W2))) 
2444      alpha = N.sqrt(2/(W2-1)) 
2445      y = N.where(N.equal(y,0),1,y) 
2446      Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) 
2447      return Z, (1.0-zprob(Z))*2 
2448   
2449   
2450 - def akurtosistest(a,dimension=None): 
2451      """ 
2452  Tests whether a dataset has normal kurtosis (i.e., 
2453  kurtosis=3(n-1)/(n+1)) Valid only for n>20.  Dimension can equal None 
2454  (ravel array first), an integer (the dimension over which to operate), 
2455  or a sequence (operate over multiple dimensions). 
2456   
2457  Usage:   akurtosistest(a,dimension=None) 
2458  Returns: z-score and 2-tail z-probability, returns 0 for bad pixels 
2459  """ 
2460      if dimension == None: 
2461          a = N.ravel(a) 
2462          dimension = 0 
2463      n = float(a.shape[dimension]) 
2464      if n<20: 
2465          print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n 
2466      b2 = akurtosis(a,dimension) 
2467      E = 3.0*(n-1) /(n+1) 
2468      varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) 
2469      x = (b2-E)/N.sqrt(varb2) 
2470      sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ 
2471                                                         (n*(n-2)*(n-3))) 
2472      A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) 
2473      term1 = 1 -2/(9.0*A) 
2474      denom = 1 +x*N.sqrt(2/(A-4.0)) 
2475      denom = N.where(N.less(denom,0), 99, denom) 
2476      term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) 
2477      Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) 
2478      Z = N.where(N.equal(denom,99), 0, Z) 
2479      return Z, (1.0-zprob(Z))*2 
2480   
2481   
2482 - def anormaltest(a,dimension=None): 
2483      """ 
2484  Tests whether skew and/OR kurtosis of dataset differs from normal 
2485  curve.  Can operate over multiple dimensions.  Dimension can equal 
2486  None (ravel array first), an integer (the dimension over which to 
2487  operate), or a sequence (operate over multiple dimensions). 
2488   
2489  Usage:   anormaltest(a,dimension=None) 
2490  Returns: z-score and 2-tail probability 
2491  """ 
2492      if dimension == None: 
2493          a = N.ravel(a) 
2494          dimension = 0 
2495      s,p = askewtest(a,dimension) 
2496      k,p = akurtosistest(a,dimension) 
2497      k2 = N.power(s,2) + N.power(k,2) 
2498      return k2, achisqprob(k2,2) 
2499   
2500   
2501  ##################################### 
2502  ######  AFREQUENCY FUNCTIONS  ####### 
2503  ##################################### 
2504   
2505 - def aitemfreq(a): 
2506      """ 
2507  Returns a 2D array of item frequencies.  Column 1 contains item values, 
2508  column 2 contains their respective counts.  Assumes a 1D array is passed. 
2509   
2510  Usage:   aitemfreq(a) 
2511  Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 
2512  """ 
2513      scores = pstat.aunique(a) 
2514      scores = N.sort(scores) 
2515      freq = N.zeros(len(scores)) 
2516      for i in range(len(scores)): 
2517          freq[i] = N.add.reduce(N.equal(a,scores[i])) 
2518      return N.array(pstat.aabut(scores, freq)) 
2519   
2520   
2521 - def ascoreatpercentile (inarray, percent): 
2522      """ 
2523  Usage:   ascoreatpercentile(inarray,percent)   0<percent<100 
2524  Returns: score at given percentile, relative to inarray distribution 
2525  """ 
2526      percent = percent / 100.0 
2527      targetcf = percent*len(inarray) 
2528      h, lrl, binsize, extras = histogram(inarray) 
2529      cumhist = cumsum(h*1) 
2530      for i in range(len(cumhist)): 
2531          if cumhist[i] >= targetcf: 
2532              break 
2533      score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 
2534      return score 
2535   
2536   
2537 - def apercentileofscore (inarray,score,histbins=10,defaultlimits=None): 
2538      """ 
2539  Note: result of this function depends on the values used to histogram 
2540  the data(!). 
2541   
2542  Usage:   apercentileofscore(inarray,score,histbins=10,defaultlimits=None) 
2543  Returns: percentile-position of score (0-100) relative to inarray 
2544  """ 
2545      h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) 
2546      cumhist = cumsum(h*1) 
2547      i = int((score - lrl)/float(binsize)) 
2548      pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 
2549      return pct 
2550   
2551   
2552 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1): 
2553      """ 
2554  Returns (i) an array of histogram bin counts, (ii) the smallest value 
2555  of the histogram binning, and (iii) the bin width (the last 2 are not 
2556  necessarily integers).  Default number of bins is 10.  Defaultlimits 
2557  can be None (the routine picks bins spanning all the numbers in the 
2558  inarray) or a 2-sequence (lowerlimit, upperlimit).  Returns all of the 
2559  following: array of bin values, lowerreallimit, binsize, extrapoints. 
2560   
2561  Usage:   ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) 
2562  Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range) 
2563  """ 
2564      inarray = N.ravel(inarray)               # flatten any >1D arrays 
2565      if (defaultlimits <> None): 
2566          lowerreallimit = defaultlimits[0] 
2567          upperreallimit = defaultlimits[1] 
2568          binsize = (upperreallimit-lowerreallimit) / float(numbins) 
2569      else: 
2570          Min = N.minimum.reduce(inarray) 
2571          Max = N.maximum.reduce(inarray) 
2572          estbinwidth = float(Max - Min)/float(numbins) + 1 
2573          binsize = (Max-Min+estbinwidth)/float(numbins) 
2574          lowerreallimit = Min - binsize/2.0  #lower real limit,1st bin 
2575      bins = N.zeros(numbins) 
2576      extrapoints = 0 
2577      for num in inarray: 
2578          try: 
2579              if (num-lowerreallimit) < 0: 
2580                  extrapoints = extrapoints + 1 
2581              else: 
2582                  bintoincrement = int((num-lowerreallimit) / float(binsize)) 
2583                  bins[bintoincrement] = bins[bintoincrement] + 1 
2584          except:                           # point outside lower/upper limits 
2585              extrapoints = extrapoints + 1 
2586      if (extrapoints > 0 and printextras == 1): 
2587          print '\nPoints outside given histogram range =',extrapoints 
2588      return (bins, lowerreallimit, binsize, extrapoints) 
2589   
2590   
2591 - def acumfreq(a,numbins=10,defaultreallimits=None): 
2592      """ 
2593  Returns a cumulative frequency histogram, using the histogram function. 
2594  Defaultreallimits can be None (use all data), or a 2-sequence containing 
2595  lower and upper limits on values to include. 
2596   
2597  Usage:   acumfreq(a,numbins=10,defaultreallimits=None) 
2598  Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 
2599  """ 
2600      h,l,b,e = histogram(a,numbins,defaultreallimits) 
2601      cumhist = cumsum(h*1) 
2602      return cumhist,l,b,e 
2603   
2604   
2605 - def arelfreq(a,numbins=10,defaultreallimits=None): 
2606      """ 
2607  Returns a relative frequency histogram, using the histogram function. 
2608  Defaultreallimits can be None (use all data), or a 2-sequence containing 
2609  lower and upper limits on values to include. 
2610   
2611  Usage:   arelfreq(a,numbins=10,defaultreallimits=None) 
2612  Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 
2613  """ 
2614      h,l,b,e = histogram(a,numbins,defaultreallimits) 
2615      h = N.array(h/float(a.shape[0])) 
2616      return h,l,b,e 
2617   
2618   
2619  ##################################### 
2620  ######  AVARIABILITY FUNCTIONS  ##### 
2621  ##################################### 
2622   
2623 - def aobrientransform(*args): 
2624      """ 
2625  Computes a transform on input data (any number of columns).  Used to 
2626  test for homogeneity of variance prior to running one-way stats.  Each 
2627  array in *args is one level of a factor.  If an F_oneway() run on the 
2628  transformed data and found significant, variances are unequal.   From 
2629  Maxwell and Delaney, p.112. 
2630   
2631  Usage:   aobrientransform(*args)    *args = 1D arrays, one per level of factor 
2632  Returns: transformed data for use in an ANOVA 
2633  """ 
2634      TINY = 1e-10 
2635      k = len(args) 
2636      n = N.zeros(k,N.Float) 
2637      v = N.zeros(k,N.Float) 
2638      m = N.zeros(k,N.Float) 
2639      nargs = [] 
2640      for i in range(k): 
2641          nargs.append(args[i].astype(N.Float)) 
2642          n[i] = float(len(nargs[i])) 
2643          v[i] = var(nargs[i]) 
2644          m[i] = mean(nargs[i]) 
2645      for j in range(k): 
2646          for i in range(n[j]): 
2647              t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 
2648              t2 = 0.5*v[j]*(n[j]-1.0) 
2649              t3 = (n[j]-1.0)*(n[j]-2.0) 
2650              nargs[j][i] = (t1-t2) / float(t3) 
2651      check = 1 
2652      for j in range(k): 
2653          if v[j] - mean(nargs[j]) > TINY: 
2654              check = 0 
2655      if check <> 1: 
2656          raise ValueError, 'Lack of convergence in obrientransform.' 
2657      else: 
2658          return N.array(nargs) 
2659   
2660   
2661 - def asamplevar (inarray,dimension=None,keepdims=0): 
2662      """ 
2663  Returns the sample standard deviation of the values in the passed 
2664  array (i.e., using N).  Dimension can equal None (ravel array first), 
2665  an integer (the dimension over which to operate), or a sequence 
2666  (operate over multiple dimensions).  Set keepdims=1 to return an array 
2667  with the same number of dimensions as inarray. 
2668   
2669  Usage:   asamplevar(inarray,dimension=None,keepdims=0) 
2670  """ 
2671      if dimension == None: 
2672          inarray = N.ravel(inarray) 
2673          dimension = 0 
2674      if dimension == 1: 
2675          mn = amean(inarray,dimension)[:,N.NewAxis] 
2676      else: 
2677          mn = amean(inarray,dimension,keepdims=1) 
2678      deviations = inarray - mn  
2679      if type(dimension) == ListType: 
2680          n = 1 
2681          for d in dimension: 
2682              n = n*inarray.shape[d] 
2683      else: 
2684          n = inarray.shape[dimension] 
2685      svar = ass(deviations,dimension,keepdims) / float(n) 
2686      return svar 
2687   
2688   
2689 - def asamplestdev (inarray, dimension=None, keepdims=0): 
2690      """ 
2691  Returns the sample standard deviation of the values in the passed 
2692  array (i.e., using N).  Dimension can equal None (ravel array first), 
2693  an integer (the dimension over which to operate), or a sequence 
2694  (operate over multiple dimensions).  Set keepdims=1 to return an array 
2695  with the same number of dimensions as inarray. 
2696   
2697  Usage:   asamplestdev(inarray,dimension=None,keepdims=0) 
2698  """ 
2699      return N.sqrt(asamplevar(inarray,dimension,keepdims)) 
2700   
2701   
2702 - def asignaltonoise(instack,dimension=0): 
2703      """ 
2704  Calculates signal-to-noise.  Dimension can equal None (ravel array 
2705  first), an integer (the dimension over which to operate), or a 
2706  sequence (operate over multiple dimensions). 
2707   
2708  Usage:   asignaltonoise(instack,dimension=0): 
2709  Returns: array containing the value of (mean/stdev) along dimension, or 0 when stdev=0 
2710  """ 
2711      m = mean(instack,dimension) 
2712      sd = stdev(instack,dimension) 
2713      return N.where(N.equal(sd,0),0,m/sd) 
2714   
2715   
2716 - def avar (inarray, dimension=None,keepdims=0): 
2717      """ 
2718  Returns the estimated population variance of the values in the passed 
2719  array (i.e., N-1).  Dimension can equal None (ravel array first), an 
2720  integer (the dimension over which to operate), or a sequence (operate 
2721  over multiple dimensions).  Set keepdims=1 to return an array with the 
2722  same number of dimensions as inarray. 
2723   
2724  Usage:   avar(inarray,dimension=None,keepdims=0) 
2725  """ 
2726      if dimension == None: 
2727          inarray = N.ravel(inarray) 
2728          dimension = 0 
2729      mn = amean(inarray,dimension,1) 
2730      deviations = inarray - mn 
2731      if type(dimension) == ListType: 
2732          n = 1 
2733          for d in dimension: 
2734              n = n*inarray.shape[d] 
2735      else: 
2736          n = inarray.shape[dimension] 
2737      var = ass(deviations,dimension,keepdims)/float(n-1) 
2738      return var 
2739   
2740   
2741 - def astdev (inarray, dimension=None, keepdims=0): 
2742      """ 
2743  Returns the estimated population standard deviation of the values in 
2744  the passed array (i.e., N-1).  Dimension can equal None (ravel array 
2745  first), an integer (the dimension over which to operate), or a 
2746  sequence (operate over multiple dimensions).  Set keepdims=1 to return 
2747  an array with the same number of dimensions as inarray. 
2748   
2749  Usage:   astdev(inarray,dimension=None,keepdims=0) 
2750  """ 
2751      return N.sqrt(avar(inarray,dimension,keepdims)) 
2752   
2753   
2754 - def asterr (inarray, dimension=None, keepdims=0): 
2755      """ 
2756  Returns the estimated population standard error of the values in the 
2757  passed array (i.e., N-1).  Dimension can equal None (ravel array 
2758  first), an integer (the dimension over which to operate), or a 
2759  sequence (operate over multiple dimensions).  Set keepdims=1 to return 
2760  an array with the same number of dimensions as inarray. 
2761   
2762  Usage:   asterr(inarray,dimension=None,keepdims=0) 
2763  """ 
2764      if dimension == None: 
2765          inarray = N.ravel(inarray) 
2766          dimension = 0 
2767      return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension])) 
2768   
2769   
2770 - def asem (inarray, dimension=None, keepdims=0): 
2771      """ 
2772  Returns the standard error of the mean (i.e., using N) of the values 
2773  in the passed array.  Dimension can equal None (ravel array first), an 
2774  integer (the dimension over which to operate), or a sequence (operate 
2775  over multiple dimensions).  Set keepdims=1 to return an array with the 
2776  same number of dimensions as inarray. 
2777   
2778  Usage:   asem(inarray,dimension=None, keepdims=0) 
2779  """ 
2780      if dimension == None: 
2781          inarray = N.ravel(inarray) 
2782          dimension = 0 
2783      if type(dimension) == ListType: 
2784          n = 1 
2785          for d in dimension: 
2786              n = n*inarray.shape[d] 
2787      else: 
2788          n = inarray.shape[dimension] 
2789      s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) 
2790      return s 
2791   
2792   
2793 - def az (a, score): 
2794      """ 
2795  Returns the z-score of a given input score, given thearray from which 
2796  that score came.  Not appropriate for population calculations, nor for 
2797  arrays > 1D. 
2798   
2799  Usage:   az(a, score) 
2800  """ 
2801      z = (score-amean(a)) / asamplestdev(a) 
2802      return z 
2803   
2804   
2805 - def azs (a): 
2806      """ 
2807  Returns a 1D array of z-scores, one for each score in the passed array, 
2808  computed relative to the passed array. 
2809   
2810  Usage:   azs(a) 
2811  """ 
2812      zscores = [] 
2813      for item in a: 
2814          zscores.append(z(a,item)) 
2815      return N.array(zscores) 
2816   
2817   
2818 - def azmap (scores, compare, dimension=0): 
2819      """ 
2820  Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to 
2821  array passed to compare (e.g., [time,x,y]).  Assumes collapsing over dim 0 
2822  of the compare array. 
2823   
2824  Usage:   azs(scores, compare, dimension=0) 
2825  """ 
2826      mns = amean(compare,dimension) 
2827      sstd = asamplestdev(compare,0) 
2828      return (scores - mns) / sstd 
2829   
2830   
2831  ##################################### 
2832  #######  ATRIMMING FUNCTIONS  ####### 
2833  ##################################### 
2834   
2835 - def around(a,digits=1): 
2836       """ 
2837  Rounds all values in array a to 'digits' decimal places. 
2838   
2839  Usage:   around(a,digits) 
2840  Returns: a, where each value is rounded to 'digits' decimals 
2841  """ 
2842       def ar(x,d=digits): 
2843           return round(x,d) 
2844   
2845       if type(a) <> N.ArrayType: 
2846           try: 
2847               a = N.array(a) 
2848           except: 
2849               a = N.array(a,'O') 
2850       shp = a.shape 
2851       if a.typecode() in ['f','F','d','D']: 
2852           b = N.ravel(a) 
2853           b = N.array(map(ar,b)) 
2854           b.shape = shp 
2855       elif a.typecode() in ['o','O']: 
2856           b = N.ravel(a)*1 
2857           for i in range(len(b)): 
2858               if type(b[i]) == FloatType: 
2859                   b[i] = round(b[i],digits) 
2860           b.shape = shp 
2861       else:  # not a float, double or Object array 
2862           b = a*1 
2863       return b 
2864   
2865   
2866 - def athreshold(a,threshmin=None,threshmax=None,newval=0): 
2867      """ 
2868  Like Numeric.clip() except that values <threshmid or >threshmax are replaced 
2869  by newval instead of by threshmin/threshmax (respectively). 
2870   
2871  Usage:   athreshold(a,threshmin=None,threshmax=None,newval=0) 
2872  Returns: a, with values <threshmin or >threshmax replaced with newval 
2873  """ 
2874      mask = N.zeros(a.shape) 
2875      if threshmin <> None: 
2876          mask = mask + N.where(N.less(a,threshmin),1,0) 
2877      if threshmax <> None: 
2878          mask = mask + N.where(N.greater(a,threshmax),1,0) 
2879      mask = N.clip(mask,0,1) 
2880      return N.where(mask,newval,a) 
2881   
2882   
2883 - def atrimboth (a,proportiontocut): 
2884      """ 
2885  Slices off the passed proportion of items from BOTH ends of the passed 
2886  array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 
2887  'rightmost' 10% of scores.  You must pre-sort the array if you want 
2888  "proper" trimming.  Slices off LESS if proportion results in a 
2889  non-integer slice index (i.e., conservatively slices off 
2890  proportiontocut). 
2891   
2892  Usage:   atrimboth (a,proportiontocut) 
2893  Returns: trimmed version of array a 
2894  """ 
2895      lowercut = int(proportiontocut*len(a)) 
2896      uppercut = len(a) - lowercut 
2897      return a[lowercut:uppercut] 
2898   
2899   
2900 - def atrim1 (a,proportiontocut,tail='right'): 
2901      """ 
2902  Slices off the passed proportion of items from ONE end of the passed 
2903  array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 
2904  10% of scores).  Slices off LESS if proportion results in a non-integer 
2905  slice index (i.e., conservatively slices off proportiontocut). 
2906   
2907  Usage:   atrim1(a,proportiontocut,tail='right')  or set tail='left' 
2908  Returns: trimmed version of array a 
2909  """ 
2910      if string.lower(tail) == 'right': 
2911          lowercut = 0 
2912          uppercut = len(a) - int(proportiontocut*len(a)) 
2913      elif string.lower(tail) == 'left': 
2914          lowercut = int(proportiontocut*len(a)) 
2915          uppercut = len(a) 
2916      return a[lowercut:uppercut] 
2917   
2918   
2919  ##################################### 
2920  #####  ACORRELATION FUNCTIONS  ###### 
2921  ##################################### 
2922   
2923 - def acovariance(X): 
2924      """ 
2925  Computes the covariance matrix of a matrix X.  Requires a 2D matrix input. 
2926   
2927  Usage:   acovariance(X) 
2928  Returns: covariance matrix of X 
2929  """ 
2930      if len(X.shape) <> 2: 
2931          raise TypeError, "acovariance requires 2D matrices" 
2932      n = X.shape[0] 
2933      mX = amean(X,0) 
2934      return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX) 
2935   
2936   
2937 - def acorrelation(X): 
2938      """ 
2939  Computes the correlation matrix of a matrix X.  Requires a 2D matrix input. 
2940   
2941  Usage:   acorrelation(X) 
2942  Returns: correlation matrix of X 
2943  """ 
2944      C = acovariance(X) 
2945      V = N.diagonal(C) 
2946      return C / N.sqrt(N.multiply.outer(V,V)) 
2947   
2948   
2949 - def apaired(x,y): 
2950      """ 
2951  Interactively determines the type of data in x and y, and then runs the 
2952  appropriated statistic for paired group data. 
2953   
2954  Usage:   apaired(x,y)     x,y = the two arrays of values to be compared 
2955  Returns: appropriate statistic name, value, and probability 
2956  """ 
2957      samples = '' 
2958      while samples not in ['i','r','I','R','c','C']: 
2959          print '\nIndependent or related samples, or correlation (i,r,c): ', 
2960          samples = raw_input() 
2961   
2962      if samples in ['i','I','r','R']: 
2963          print '\nComparing variances ...', 
2964  # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 
2965          r = obrientransform(x,y) 
2966          f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 
2967          if p<0.05: 
2968              vartype='unequal, p='+str(round(p,4)) 
2969          else: 
2970              vartype='equal' 
2971          print vartype 
2972          if samples in ['i','I']: 
2973              if vartype[0]=='e': 
2974                  t,p = ttest_ind(x,y,None,0) 
2975                  print '\nIndependent samples t-test:  ', round(t,4),round(p,4) 
2976              else: 
2977                  if len(x)>20 or len(y)>20: 
2978                      z,p = ranksums(x,y) 
2979                      print '\nRank Sums test (NONparametric, n>20):  ', round(z,4),round(p,4) 
2980                  else: 
2981                      u,p = mannwhitneyu(x,y) 
2982                      print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(u,4),round(p,4) 
2983   
2984          else:  # RELATED SAMPLES 
2985              if vartype[0]=='e': 
2986                  t,p = ttest_rel(x,y,0) 
2987                  print '\nRelated samples t-test:  ', round(t,4),round(p,4) 
2988              else: 
2989                  t,p = ranksums(x,y) 
2990                  print '\nWilcoxon T-test (NONparametric):  ', round(t,4),round(p,4) 
2991      else:  # CORRELATION ANALYSIS 
2992          corrtype = '' 
2993          while corrtype not in ['c','C','r','R','d','D']: 
2994              print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 
2995              corrtype = raw_input() 
2996          if corrtype in ['c','C']: 
2997              m,b,r,p,see = linregress(x,y) 
2998              print '\nLinear regression for continuous variables ...' 
2999              lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 
3000              pstat.printcc(lol) 
3001          elif corrtype in ['r','R']: 
3002              r,p = spearmanr(x,y) 
3003              print '\nCorrelation for ranked variables ...' 
3004              print "Spearman's r: ",round(r,4),round(p,4) 
3005          else: # DICHOTOMOUS 
3006              r,p = pointbiserialr(x,y) 
3007              print '\nAssuming x contains a dichotomous variable ...' 
3008              print 'Point Biserial r: ',round(r,4),round(p,4) 
3009      print '\n\n' 
3010      return None 
3011   
3012   
3013 - def apearsonr(x,y,verbose=1): 
3014      """ 
3015  Calculates a Pearson correlation coefficient and returns p.  Taken 
3016  from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. 
3017   
3018  Usage:   apearsonr(x,y,verbose=1)      where x,y are equal length arrays 
3019  Returns: Pearson's r, two-tailed p-value 
3020  """ 
3021      TINY = 1.0e-20 
3022      n = len(x) 
3023      xmean = amean(x) 
3024      ymean = amean(y) 
3025      r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 
3026      r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 
3027      r = (r_num / r_den) 
3028      df = n-2 
3029      t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 
3030      prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) 
3031      return r,prob 
3032   
3033   
3034 - def aspearmanr(x,y): 
3035      """ 
3036  Calculates a Spearman rank-order correlation coefficient.  Taken 
3037  from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 
3038   
3039  Usage:   aspearmanr(x,y)      where x,y are equal-length arrays 
3040  Returns: Spearman's r, two-tailed p-value 
3041  """ 
3042      TINY = 1e-30 
3043      n = len(x) 
3044      rankx = rankdata(x) 
3045      ranky = rankdata(y) 
3046      dsq = N.add.reduce((rankx-ranky)**2) 
3047      rs = 1 - 6*dsq / float(n*(n**2-1)) 
3048      t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 
3049      df = n-2 
3050      probrs = abetai(0.5*df,0.5,df/(df+t*t)) 
3051  # probability values for rs are from part 2 of the spearman function in 
3052  # Numerical Recipies, p.510.  They close to tables, but not exact.(?) 
3053      return rs, probrs 
3054   
3055   
3056 - def apointbiserialr(x,y): 
3057      """ 
3058  Calculates a point-biserial correlation coefficient and the associated 
3059  probability value.  Taken from Heiman's Basic Statistics for the Behav. 
3060  Sci (1st), p.194. 
3061   
3062  Usage:   apointbiserialr(x,y)      where x,y are equal length arrays 
3063  Returns: Point-biserial r, two-tailed p-value 
3064  """ 
3065      TINY = 1e-30 
3066      categories = pstat.aunique(x) 
3067      data = pstat.aabut(x,y) 
3068      if len(categories) <> 2: 
3069          raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." 
3070      else:   # there are 2 categories, continue 
3071          codemap = pstat.aabut(categories,N.arange(2)) 
3072          recoded = pstat.arecode(data,codemap,0) 
3073          x = pstat.alinexand(data,0,categories[0]) 
3074          y = pstat.alinexand(data,0,categories[1]) 
3075          xmean = amean(pstat.acolex(x,1)) 
3076          ymean = amean(pstat.acolex(y,1)) 
3077          n = len(data) 
3078          adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 
3079          rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust 
3080          df = n-2 
3081          t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 
3082          prob = abetai(0.5*df,0.5,df/(df+t*t)) 
3083          return rpb, prob 
3084   
3085   
3086 - def akendalltau(x,y): 
3087      """ 
3088  Calculates Kendall's tau ... correlation of ordinal data.  Adapted 
3089  from function kendl1 in Numerical Recipies.  Needs good test-cases.@@@ 
3090   
3091  Usage:   akendalltau(x,y) 
3092  Returns: Kendall's tau, two-tailed p-value 
3093  """ 
3094      n1 = 0 
3095      n2 = 0 
3096      iss = 0 
3097      for j in range(len(x)-1): 
3098          for k in range(j,len(y)): 
3099              a1 = x[j] - x[k] 
3100              a2 = y[j] - y[k] 
3101              aa = a1 * a2 
3102              if (aa):             # neither array has a tie 
3103                  n1 = n1 + 1 
3104                  n2 = n2 + 1 
3105                  if aa > 0: 
3106                      iss = iss + 1 
3107                  else: 
3108                      iss = iss -1 
3109              else: 
3110                  if (a1): 
3111                      n1 = n1 + 1 
3112                  else: 
3113                      n2 = n2 + 1 
3114      tau = iss / math.sqrt(n1*n2) 
3115      svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 
3116      z = tau / math.sqrt(svar) 
3117      prob = erfcc(abs(z)/1.4142136) 
3118      return tau, prob 
3119   
3120   
3121 - def alinregress(*args): 
3122      """ 
3123  Calculates a regression line on two arrays, x and y, corresponding to x,y 
3124  pairs.  If a single 2D array is passed, alinregress finds dim with 2 levels 
3125  and splits data into x,y pairs along that dim. 
3126   
3127  Usage:   alinregress(*args)    args=2 equal-length arrays, or one 2D array 
3128  Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate 
3129  """ 
3130      TINY = 1.0e-20 
3131      if len(args) == 1:  # more than 1D array? 
3132          args = args[0] 
3133          if len(args) == 2: 
3134              x = args[0] 
3135              y = args[1] 
3136          else: 
3137              x = args[:,0] 
3138              y = args[:,1] 
3139      else: 
3140          x = args[0] 
3141          y = args[1] 
3142      n = len(x) 
3143      xmean = amean(x) 
3144      ymean = amean(y) 
3145      r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 
3146      r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 
3147      r = r_num / r_den 
3148      z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 
3149      df = n-2 
3150      t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 
3151      prob = abetai(0.5*df,0.5,df/(df+t*t)) 
3152      slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) 
3153      intercept = ymean - slope*xmean 
3154      sterrest = math.sqrt(1-r*r)*asamplestdev(y) 
3155      return slope, intercept, r, prob, sterrest 
3156   
3157   
3158  ##################################### 
3159  #####  AINFERENTIAL STATISTICS  ##### 
3160  ##################################### 
3161   
3162 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'): 
3163      """ 
3164  Calculates the t-obtained for the independent samples T-test on ONE group 
3165  of scores a, given a population mean.  If printit=1, results are printed 
3166  to the screen.  If printit='filename', the results are output to 'filename' 
3167  using the given writemode (default=append).  Returns t-value, and prob. 
3168   
3169  Usage:   attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 
3170  Returns: t-value, two-tailed prob 
3171  """ 
3172      if type(a) != N.ArrayType: 
3173          a = N.array(a) 
3174      x = amean(a) 
3175      v = avar(a) 
3176      n = len(a) 
3177      df = n-1 
3178      svar = ((n-1)*v) / float(df) 
3179      t = (x-popmean)/math.sqrt(svar*(1.0/n)) 
3180      prob = abetai(0.5*df,0.5,df/(df+t*t)) 
3181   
3182      if printit <> 0: 
3183          statname = 'Single-sample T-test.' 
3184          outputpairedstats(printit,writemode, 
3185                            'Population','--',popmean,0,0,0, 
3186                            name,n,x,v,N.minimum.reduce(N.ravel(a)), 
3187                            N.maximum.reduce(N.ravel(a)), 
3188                            statname,t,prob) 
3189      return t,prob 
3190   
3191   
3192 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'): 
3193      """ 
3194  Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores 
3195  a, and b.  From Numerical Recipies, p.483.  If printit=1, results are 
3196  printed to the screen.  If printit='filename', the results are output 
3197  to 'filename' using the given writemode (default=append).  Dimension 
3198  can equal None (ravel array first), or an integer (the dimension over 
3199  which to operate on a and b). 
3200   
3201  Usage:: 
3202           attest_ind (a,b,dimension=None,printit=0, 
3203                       Name1='Samp1',Name2='Samp2',writemode='a') 
3204  Returns: t-value, two-tailed p-value 
3205  """ 
3206      if dimension == None: 
3207          a = N.ravel(a) 
3208          b = N.ravel(b) 
3209          dimension = 0 
3210      x1 = amean(a,dimension) 
3211      x2 = amean(b,dimension) 
3212      v1 = avar(a,dimension) 
3213      v2 = avar(b,dimension) 
3214      n1 = a.shape[dimension] 
3215      n2 = b.shape[dimension] 
3216      df = n1+n2-2 
3217      svar = ((n1-1)*v1+(n2-1)*v2) / float(df) 
3218      zerodivproblem = N.equal(svar,0) 
3219      t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))  # N-D COMPUTATION HERE!!!!!! 
3220      t = N.where(zerodivproblem,1.0,t)           # replace NaN t-values with 1.0 
3221      probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 
3222   
3223      if type(t) == N.ArrayType: 
3224          probs = N.reshape(probs,t.shape) 
3225      if len(probs) == 1: 
3226          probs = probs[0] 
3227           
3228      if printit <> 0: 
3229          if type(t) == N.ArrayType: 
3230              t = t[0] 
3231          if type(probs) == N.ArrayType: 
3232              probs = probs[0] 
3233          statname = 'Independent samples T-test.' 
3234          outputpairedstats(printit,writemode, 
3235                            name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)), 
3236                            N.maximum.reduce(N.ravel(a)), 
3237                            name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)), 
3238                            N.maximum.reduce(N.ravel(b)), 
3239                            statname,t,probs) 
3240          return 
3241      return t, probs 
3242   
3243   
3244 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'): 
3245      """ 
3246  Calculates the t-obtained T-test on TWO RELATED samples of scores, a 
3247  and b.  From Numerical Recipies, p.483.  If printit=1, results are 
3248  printed to the screen.  If printit='filename', the results are output 
3249  to 'filename' using the given writemode (default=append).  Dimension 
3250  can equal None (ravel array first), or an integer (the dimension over 
3251  which to operate on a and b). 
3252   
3253  Usage:: 
3254           attest_rel(a,b,dimension=None,printit=0, 
3255                      name1='Samp1',name2='Samp2',writemode='a') 
3256  Returns: t-value, two-tailed p-value 
3257  """ 
3258      if dimension == None: 
3259          a = N.ravel(a) 
3260          b = N.ravel(b) 
3261          dimension = 0 
3262      if len(a)<>len(b): 
3263          raise ValueError, 'Unequal length arrays.' 
3264      x1 = amean(a,dimension) 
3265      x2 = amean(b,dimension) 
3266      v1 = avar(a,dimension) 
3267      v2 = avar(b,dimension) 
3268      n = a.shape[dimension] 
3269      df = float(n-1) 
3270      d = (a-b).astype('d') 
3271   
3272      denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df) 
3273      zerodivproblem = N.equal(denom,0) 
3274      t = N.add.reduce(d,dimension) / denom      # N-D COMPUTATION HERE!!!!!! 
3275      t = N.where(zerodivproblem,1.0,t)          # replace NaN t-values with 1.0 
3276      t = N.where(zerodivproblem,1.0,t)           # replace NaN t-values with 1.0 
3277      probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 
3278      if type(t) == N.ArrayType: 
3279          probs = N.reshape(probs,t.shape) 
3280      if len(probs) == 1: 
3281          probs = probs[0] 
3282   
3283      if printit <> 0: 
3284          statname = 'Related samples T-test.' 
3285          outputpairedstats(printit,writemode, 
3286                            name1,n,x1,v1,N.minimum.reduce(N.ravel(a)), 
3287                            N.maximum.reduce(N.ravel(a)), 
3288                            name2,n,x2,v2,N.minimum.reduce(N.ravel(b)), 
3289                            N.maximum.reduce(N.ravel(b)), 
3290                            statname,t,probs) 
3291          return 
3292      return t, probs 
3293   
3294   
3295 - def achisquare(f_obs,f_exp=None): 
3296      """ 
3297  Calculates a one-way chi square for array of observed frequencies and returns 
3298  the result.  If no expected frequencies are given, the total N is assumed to 
3299  be equally distributed across all groups. 
3300   
3301  Usage:   achisquare(f_obs, f_exp=None)   f_obs = array of observed cell freq. 
3302  Returns: chisquare-statistic, associated p-value 
3303  """ 
3304   
3305      k = len(f_obs) 
3306      if f_exp == None: 
3307          f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float) 
3308      f_exp = f_exp.astype(N.Float) 
3309      chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp) 
3310      return chisq, chisqprob(chisq, k-1) 
3311   
3312   
3313 - def aks_2samp (data1,data2): 
3314      """ 
3315  Computes the Kolmogorov-Smirnof statistic on 2 samples.  Modified from 
3316  Numerical Recipies in C, page 493.  Returns KS D-value, prob.  Not ufunc- 
3317  like. 
3318   
3319  Usage:   aks_2samp(data1,data2)  where data1 and data2 are 1D arrays 
3320  Returns: KS D-value, p-value 
3321  """ 
3322      j1 = 0    # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE 
3323      j2 = 0    # N.zeros(data2.shape[1:]) 
3324      fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) 
3325      fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) 
3326      n1 = data1.shape[0] 
3327      n2 = data2.shape[0] 
3328      en1 = n1*1 
3329      en2 = n2*1 
3330      d = N.zeros(data1.shape[1:],N.Float) 
3331      data1 = N.sort(data1,0) 
3332      data2 = N.sort(data2,0) 
3333      while j1 < n1 and j2 < n2: 
3334          d1=data1[j1] 
3335          d2=data2[j2] 
3336          if d1 <= d2: 
3337              fn1 = (j1)/float(en1) 
3338              j1 = j1 + 1 
3339          if d2 <= d1: 
3340              fn2 = (j2)/float(en2) 
3341              j2 = j2 + 1 
3342          dt = (fn2-fn1) 
3343          if abs(dt) > abs(d): 
3344              d = dt 
3345      try: 
3346          en = math.sqrt(en1*en2/float(en1+en2)) 
3347          prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) 
3348      except: 
3349          prob = 1.0 
3350      return d, prob 
3351   
3352   
3353 - def amannwhitneyu(x,y): 
3354      """ 
3355  Calculates a Mann-Whitney U statistic on the provided scores and 
3356  returns the result.  Use only when the n in each condition is < 20 and 
3357  you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is 
3358  significant if the u-obtained is LESS THAN or equal to the critical 
3359  value of U. 
3360   
3361  Usage:   amannwhitneyu(x,y)     where x,y are arrays of values for 2 conditions 
3362  Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 
3363  """ 
3364      n1 = len(x) 
3365      n2 = len(y) 
3366      ranked = rankdata(N.concatenate((x,y))) 
3367      rankx = ranked[0:n1]       # get the x-ranks 
3368      ranky = ranked[n1:]        # the rest are y-ranks 
3369      u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x 
3370      u2 = n1*n2 - u1                            # remainder is U for y 
3371      bigu = max(u1,u2) 
3372      smallu = min(u1,u2) 
3373      T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores 
3374      if T == 0: 
3375          raise ValueError, 'All numbers are identical in amannwhitneyu' 
3376      sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 
3377      z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc 
3378      return smallu, 1.0 - zprob(z) 
3379   
3380   
3381 - def atiecorrect(rankvals): 
3382      """ 
3383  Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. 
3384  See Siegel, S. (1956) Nonparametric Statistics for the Behavioral 
3385  Sciences.  New York: McGraw-Hill.  Code adapted from |Stat rankind.c 
3386  code. 
3387   
3388  Usage:   atiecorrect(rankvals) 
3389  Returns: T correction factor for U or H 
3390  """ 
3391      sorted,posn = ashellsort(N.array(rankvals)) 
3392      n = len(sorted) 
3393      T = 0.0 
3394      i = 0 
3395      while (i<n-1): 
3396          if sorted[i] == sorted[i+1]: 
3397              nties = 1 
3398              while (i<n-1) and (sorted[i] == sorted[i+1]): 
3399                  nties = nties +1 
3400                  i = i +1 
3401              T = T + nties**3 - nties 
3402          i = i+1 
3403      T = T / float(n**3-n) 
3404      return 1.0 - T 
3405   
3406   
3407 - def aranksums(x,y): 
3408      """ 
3409  Calculates the rank sums statistic on the provided scores and returns 
3410  the result. 
3411   
3412  Usage:   aranksums(x,y)     where x,y are arrays of values for 2 conditions 
3413  Returns: z-statistic, two-tailed p-value 
3414  """ 
3415      n1 = len(x) 
3416      n2 = len(y) 
3417      alldata = N.concatenate((x,y)) 
3418      ranked = arankdata(alldata) 
3419      x = ranked[:n1] 
3420      y = ranked[n1:] 
3421      s = sum(x) 
3422      expected = n1*(n1+n2+1) / 2.0 
3423      z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 
3424      prob = 2*(1.0 -zprob(abs(z))) 
3425      return z, prob 
3426   
3427   
3428 - def awilcoxont(x,y): 
3429      """ 
3430  Calculates the Wilcoxon T-test for related samples and returns the 
3431  result.  A non-parametric T-test. 
3432   
3433  Usage:   awilcoxont(x,y)     where x,y are equal-length arrays for 2 conditions 
3434  Returns: t-statistic, two-tailed p-value 
3435  """ 
3436      if len(x) <> len(y): 
3437          raise ValueError, 'Unequal N in awilcoxont.  Aborting.' 
3438      d = x-y 
3439      d = N.compress(N.not_equal(d,0),d) # Keep all non-zero differences 
3440      count = len(d) 
3441      absd = abs(d) 
3442      absranked = arankdata(absd) 
3443      r_plus = 0.0 
3444      r_minus = 0.0 
3445      for i in range(len(absd)): 
3446          if d[i] < 0: 
3447              r_minus = r_minus + absranked[i] 
3448          else: 
3449              r_plus = r_plus + absranked[i] 
3450      wt = min(r_plus, r_minus) 
3451      mn = count * (count+1) * 0.25 
3452      se =  math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 
3453      z = math.fabs(wt-mn) / se 
3454      z = math.fabs(wt-mn) / se 
3455      prob = 2*(1.0 -zprob(abs(z))) 
3456      return wt, prob 
3457   
3458   
3459 - def akruskalwallish(*args): 
3460      """ 
3461  The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 
3462  groups, requiring at least 5 subjects in each group.  This function 
3463  calculates the Kruskal-Wallis H and associated p-value for 3 or more 
3464  independent samples. 
3465   
3466  Usage:   akruskalwallish(*args)     args are separate arrays for 3+ conditions 
3467  Returns: H-statistic (corrected for ties), associated p-value 
3468  """ 
3469      assert len(args) == 3, "Need at least 3 groups in stats.akruskalwallish()" 
3470      args = list(args) 
3471      n = [0]*len(args) 
3472      n = map(len,args) 
3473      all = [] 
3474      for i in range(len(args)): 
3475          all = all + args[i].tolist() 
3476      ranked = rankdata(all) 
3477      T = tiecorrect(ranked) 
3478      for i in range(len(args)): 
3479          args[i] = ranked[0:n[i]] 
3480          del ranked[0:n[i]] 
3481      rsums = [] 
3482      for i in range(len(args)): 
3483          rsums.append(sum(args[i])**2) 
3484          rsums[i] = rsums[i] / float(n[i]) 
3485      ssbn = sum(rsums) 
3486      totaln = sum(n) 
3487      h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 
3488      df = len(args) - 1 
3489      if T == 0: 
3490          raise ValueError, 'All numbers are identical in akruskalwallish' 
3491      h = h / float(T) 
3492      return h, chisqprob(h,df) 
3493   
3494   
3495 - def afriedmanchisquare(*args): 
3496      """ 
3497  Friedman Chi-Square is a non-parametric, one-way within-subjects 
3498  ANOVA.  This function calculates the Friedman Chi-square test for 
3499  repeated measures and returns the result, along with the associated 
3500  probability value.  It assumes 3 or more repeated measures.  Only 3 
3501  levels requires a minimum of 10 subjects in the study.  Four levels 
3502  requires 5 subjects per level(??). 
3503   
3504  Usage:   afriedmanchisquare(*args)   args are separate arrays for 2+ conditions 
3505  Returns: chi-square statistic, associated p-value 
3506  """ 
3507      k = len(args) 
3508      if k < 3: 
3509          raise ValueError, '\nLess than 3 levels.  Friedman test not appropriate.\n' 
3510      n = len(args[0]) 
3511      data = apply(pstat.aabut,args) 
3512      data = data.astype(N.Float) 
3513      for i in range(len(data)): 
3514          data[i] = arankdata(data[i]) 
3515      ssbn = asum(asum(args,1)**2) 
3516      chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 
3517      return chisq, chisqprob(chisq,k-1) 
3518   
3519   
3520  ##################################### 
3521  ####  APROBABILITY CALCULATIONS  #### 
3522  ##################################### 
3523   
3524 - def achisqprob(chisq,df): 
3525      """ 
3526  Returns the (1-tail) probability value associated with the provided chi-square 
3527  value and df.  Heavily modified from chisq.c in Gary Perlman's |Stat.  Can 
3528  handle multiple dimensions. 
3529   
3530  Usage:   achisqprob(chisq,df)    chisq=chisquare stat., df=degrees of freedom 
3531  """ 
3532      BIG = 200.0 
3533      def ex(x): 
3534          BIG = 200.0 
3535          exponents = N.where(N.less(x,-BIG),-BIG,x) 
3536          return N.exp(exponents) 
3537   
3538      if type(chisq) == N.ArrayType: 
3539          arrayflag = 1 
3540      else: 
3541          arrayflag = 0 
3542          chisq = N.array([chisq]) 
3543      if df < 1: 
3544          return N.ones(chisq.shape,N.float) 
3545      probs = N.zeros(chisq.shape,N.Float) 
3546      probs = N.where(N.less_equal(chisq,0),1.0,probs)  # set prob=1 for chisq<0 
3547      a = 0.5 * chisq 
3548      if df > 1: 
3549          y = ex(-a) 
3550      if df%2 == 0: 
3551          even = 1 
3552          s = y*1 
3553          s2 = s*1 
3554      else: 
3555          even = 0 
3556          s = 2.0 * azprob(-N.sqrt(chisq)) 
3557          s2 = s*1 
3558      if (df > 2): 
3559          chisq = 0.5 * (df - 1.0) 
3560          if even: 
3561              z = N.ones(probs.shape,N.Float) 
3562          else: 
3563              z = 0.5 *N.ones(probs.shape,N.Float) 
3564          if even: 
3565              e = N.zeros(probs.shape,N.Float) 
3566          else: 
3567              e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float) 
3568          c = N.log(a) 
3569          mask = N.zeros(probs.shape) 
3570          a_big = N.greater(a,BIG) 
3571          a_big_frozen = -1 *N.ones(probs.shape,N.Float) 
3572          totalelements = N.multiply.reduce(N.array(probs.shape)) 
3573          while asum(mask)<>totalelements: 
3574              e = N.log(z) + e 
3575              s = s + ex(c*z-a-e) 
3576              z = z + 1.0 
3577              print z, e, s 
3578              newmask = N.greater(z,chisq) 
3579              a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen) 
3580              mask = N.clip(newmask+mask,0,1) 
3581          if even: 
3582              z = N.ones(probs.shape,N.Float) 
3583              e = N.ones(probs.shape,N.Float) 
3584          else: 
3585              z = 0.5 *N.ones(probs.shape,N.Float) 
3586              e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float) 
3587          c = 0.0 
3588          mask = N.zeros(probs.shape) 
3589          a_notbig_frozen = -1 *N.ones(probs.shape,N.Float) 
3590          while asum(mask)<>totalelements: 
3591              e = e * (a/z.astype(N.Float)) 
3592              c = c + e 
3593              z = z + 1.0 
3594              print '#2', z, e, c, s, c*y+s2 
3595              newmask = N.greater(z,chisq) 
3596              a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big), 
3597                                        c*y+s2, a_notbig_frozen) 
3598              mask = N.clip(newmask+mask,0,1) 
3599          probs = N.where(N.equal(probs,1),1, 
3600                          N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen)) 
3601          return probs 
3602      else: 
3603          return s 
3604   
3605   
3606 - def aerfcc(x): 
3607      """ 
3608  Returns the complementary error function erfc(x) with fractional error 
3609  everywhere less than 1.2e-7.  Adapted from Numerical Recipies.  Can 
3610  handle multiple dimensions. 
3611   
3612  Usage:   aerfcc(x) 
3613  """ 
3614      z = abs(x) 
3615      t = 1.0 / (1.0+0.5*z) 
3616      ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 
3617      return N.where(N.greater_equal(x,0), ans, 2.0-ans) 
3618   
3619   
3620 - def azprob(z): 
3621      """ 
3622  Returns the area under the normal curve 'to the left of' the given z value. 
3623  Thus, :: 
3624      for z<0, zprob(z) = 1-tail probability 
3625      for z>0, 1.0-zprob(z) = 1-tail probability 
3626      for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 
3627  Adapted from z.c in Gary Perlman's |Stat.  Can handle multiple dimensions. 
3628   
3629  Usage:   azprob(z)    where z is a z-value 
3630  """ 
3631      def yfunc(y): 
3632          x = (((((((((((((-0.000045255659 * y 
3633                           +0.000152529290) * y -0.000019538132) * y 
3634                         -0.000676904986) * y +0.001390604284) * y 
3635                       -0.000794620820) * y -0.002034254874) * y 
3636                     +0.006549791214) * y -0.010557625006) * y 
3637                   +0.011630447319) * y -0.009279453341) * y 
3638                 +0.005353579108) * y -0.002141268741) * y 
3639               +0.000535310849) * y +0.999936657524 
3640          return x 
3641   
3642      def wfunc(w): 
3643          x = ((((((((0.000124818987 * w 
3644                      -0.001075204047) * w +0.005198775019) * w 
3645                    -0.019198292004) * w +0.059054035642) * w 
3646                  -0.151968751364) * w +0.319152932694) * w 
3647                -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0 
3648          return x 
3649   
3650      Z_MAX = 6.0    # maximum meaningful z-value 
3651      x = N.zeros(z.shape,N.Float) # initialize 
3652      y = 0.5 * N.fabs(z) 
3653      x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's 
3654      x = N.where(N.greater(y,Z_MAX*0.5),1.0,x)          # kill those with big Z 
3655      prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5) 
3656      return prob 
3657   
3658   
3659 - def aksprob(alam): 
3660       """ 
3661  Returns the probability value for a K-S statistic computed via ks_2samp. 
3662  Adapted from Numerical Recipies.  Can handle multiple dimensions. 
3663   
3664  Usage:   aksprob(alam) 
3665  """ 
3666       if type(alam) == N.ArrayType: 
3667           frozen = -1 *N.ones(alam.shape,N.Float64) 
3668           alam = alam.astype(N.Float64) 
3669           arrayflag = 1 
3670       else: 
3671           frozen = N.array(-1.) 
3672           alam = N.array(alam,N.Float64) 
3673       mask = N.zeros(alam.shape) 
3674       fac = 2.0 *N.ones(alam.shape,N.Float) 
3675       sum = N.zeros(alam.shape,N.Float) 
3676       termbf = N.zeros(alam.shape,N.Float) 
3677       a2 = N.array(-2.0*alam*alam,N.Float64) 
3678       totalelements = N.multiply.reduce(N.array(mask.shape)) 
3679       for j in range(1,201): 
3680           if asum(mask) == totalelements: 
3681               break 
3682           exponents = (a2*j*j) 
3683           overflowmask = N.less(exponents,-746) 
3684           frozen = N.where(overflowmask,0,frozen) 
3685           mask = mask+overflowmask 
3686           term = fac*N.exp(exponents) 
3687           sum = sum + term 
3688           newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) + 
3689                             N.less(abs(term),1.0e-8*sum), 1, 0) 
3690           frozen = N.where(newmask*N.equal(mask,0), sum, frozen) 
3691           mask = N.clip(mask+newmask,0,1) 
3692           fac = -fac 
3693           termbf = abs(term) 
3694       if arrayflag: 
3695           return N.where(N.equal(frozen,-1), 1.0, frozen)  # 1.0 if doesn't converge 
3696       else: 
3697           return N.where(N.equal(frozen,-1), 1.0, frozen)[0]  # 1.0 if doesn't converge 
3698   
3699   
3700 - def afprob (dfnum, dfden, F): 
3701      """ 
3702  Returns the 1-tailed significance level (p-value) of an F statistic 
3703  given the degrees of freedom for the numerator (dfR-dfF) and the degrees 
3704  of freedom for the denominator (dfF).  Can handle multiple dims for F. 
3705   
3706  Usage:   afprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn 
3707  """ 
3708      if type(F) == N.ArrayType: 
3709          return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) 
3710      else: 
3711          return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 
3712   
3713   
3714 - def abetacf(a,b,x,verbose=1): 
3715      """ 
3716  Evaluates the continued fraction form of the incomplete Beta function, 
3717  betai.  (Adapted from: Numerical Recipies in C.)  Can handle multiple 
3718  dimensions for x. 
3719   
3720  Usage:   abetacf(a,b,x,verbose=1) 
3721  """ 
3722      ITMAX = 200 
3723      EPS = 3.0e-7 
3724   
3725      arrayflag = 1 
3726      if type(x) == N.ArrayType: 
3727          frozen = N.ones(x.shape,N.Float) *-1  #start out w/ -1s, should replace all 
3728      else: 
3729          arrayflag = 0 
3730          frozen = N.array([-1]) 
3731          x = N.array([x]) 
3732      mask = N.zeros(x.shape) 
3733      bm = az = am = 1.0 
3734      qab = a+b 
3735      qap = a+1.0 
3736      qam = a-1.0 
3737      bz = 1.0-qab*x/qap 
3738      for i in range(ITMAX+1): 
3739          if N.sum(N.ravel(N.equal(frozen,-1)))==0: 
3740              break 
3741          em = float(i+1) 
3742          tem = em + em 
3743          d = em*(b-em)*x/((qam+tem)*(a+tem)) 
3744          ap = az + d*am 
3745          bp = bz+d*bm 
3746          d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 
3747          app = ap+d*az 
3748          bpp = bp+d*bz 
3749          aold = az*1 
3750          am = ap/bpp 
3751          bm = bp/bpp 
3752          az = app/bpp 
3753          bz = 1.0 
3754          newmask = N.less(abs(az-aold),EPS*abs(az)) 
3755          frozen = N.where(newmask*N.equal(mask,0), az, frozen) 
3756          mask = N.clip(mask+newmask,0,1) 
3757      noconverge = asum(N.equal(frozen,-1)) 
3758      if noconverge <> 0 and verbose: 
3759          print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements' 
3760      if arrayflag: 
3761          return frozen 
3762      else: 
3763          return frozen[0] 
3764   
3765   
3766 - def agammln(xx): 
3767      """ 
3768  Returns the gamma function of xx.:: 
3769      Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 
3770  Adapted from: Numerical Recipies in C.  Can handle multiple dims ... but 
3771  probably doesn't normally have to. 
3772   
3773  Usage:   agammln(xx) 
3774  """ 
3775      coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 
3776               0.120858003e-2, -0.536382e-5] 
3777      x = xx - 1.0 
3778      tmp = x + 5.5 
3779      tmp = tmp - (x+0.5)*N.log(tmp) 
3780      ser = 1.0 
3781      for j in range(len(coeff)): 
3782          x = x + 1 
3783          ser = ser + coeff[j]/x 
3784      return -tmp + N.log(2.50662827465*ser) 
3785   
3786   
3787 - def abetai(a,b,x,verbose=1): 
3788      """ 
3789  Returns the incomplete beta function:: 
3790   
3791      I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 
3792   
3793  where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 
3794  function of a.  The continued fraction formulation is implemented 
3795  here, using the betacf function.  (Adapted from: Numerical Recipies in 
3796  C.)  Can handle multiple dimensions. 
3797   
3798  Usage:   abetai(a,b,x,verbose=1) 
3799  """ 
3800      TINY = 1e-15 
3801      if type(a) == N.ArrayType: 
3802          if asum(N.less(x,0)+N.greater(x,1)) <> 0: 
3803              raise ValueError, 'Bad x in abetai' 
3804      x = N.where(N.equal(x,0),TINY,x) 
3805      x = N.where(N.equal(x,1.0),1-TINY,x) 
3806   
3807      bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1) 
3808      exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b* 
3809                    N.log(1.0-x) ) 
3810      # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW 
3811      exponents = N.where(N.less(exponents,-740),-740,exponents) 
3812      bt = N.exp(exponents) 
3813      if type(x) == N.ArrayType: 
3814          ans = N.where(N.less(x,(a+1)/(a+b+2.0)), 
3815                        bt*abetacf(a,b,x,verbose)/float(a), 
3816                        1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)) 
3817      else: 
3818          if x<(a+1)/(a+b+2.0): 
3819              ans = bt*abetacf(a,b,x,verbose)/float(a) 
3820          else: 
3821              ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b) 
3822      return ans 
3823   
3824   
3825  ##################################### 
3826  #######  AANOVA CALCULATIONS  ####### 
3827  ##################################### 
3828   
3829   import LinearAlgebra, operator 
3830   LA = LinearAlgebra 
3831   
3832 - def aglm(data,para): 
3833      """ 
3834  Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken 
3835  from:: 
3836      Peterson et al. Statistical limitations in functional neuroimaging 
3837      I. Non-inferential methods and statistical models.  Phil Trans Royal Soc 
3838      Lond B 354: 1239-1260. 
3839   
3840  Usage:   aglm(data,para) 
3841  Returns: statistic, p-value ??? 
3842  """ 
3843      if len(para) <> len(data): 
3844          print "data and para must be same length in aglm" 
3845          return 
3846      n = len(para) 
3847      p = pstat.aunique(para) 
3848      x = N.zeros((n,len(p)))  # design matrix 
3849      for l in range(len(p)): 
3850          x[:,l] = N.equal(para,p[l]) 
3851      b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)),  # i.e., b=inv(X'X)X'Y 
3852                      N.transpose(x)), 
3853                data) 
3854      diffs = (data - N.dot(x,b)) 
3855      s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) 
3856   
3857      if len(p) == 2:  # ttest_ind 
3858          c = N.array([1,-1]) 
3859          df = n-2 
3860          fact = asum(1.0/asum(x,0))  # i.e., 1/n1 + 1/n2 + 1/n3 ... 
3861          t = N.dot(c,b) / N.sqrt(s_sq*fact) 
3862          probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 
3863          return t, probs 
3864   
3865   
3866 - def aF_oneway(*args): 
3867      """ 
3868  Performs a 1-way ANOVA, returning an F-value and probability given 
3869  any number of groups.  From Heiman, pp.394-7. 
3870   
3871  Usage:   aF_oneway (*args)    where *args is 2 or more arrays, one per treatment group 
3872  Returns: f-value, probability 
3873  """ 
3874      na = len(args)            # ANOVA on 'na' groups, each in it's own array 
3875      means = [0]*na 
3876      vars = [0]*na 
3877      ns = [0]*na 
3878      alldata = [] 
3879      tmp = map(N.array,args) 
3880      means = map(amean,tmp) 
3881      vars = map(avar,tmp) 
3882      ns = map(len,args) 
3883      alldata = N.concatenate(args) 
3884      bign = len(alldata) 
3885      sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 
3886      ssbn = 0 
3887      for a in args: 
3888          ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) 
3889      ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 
3890      sswn = sstot-ssbn 
3891      dfbn = na-1 
3892      dfwn = bign - na 
3893      msb = ssbn/float(dfbn) 
3894      msw = sswn/float(dfwn) 
3895      f = msb/msw 
3896      prob = fprob(dfbn,dfwn,f) 
3897      return f, prob 
3898   
3899   
3900 - def aF_value (ER,EF,dfR,dfF): 
3901      """ 
3902  Returns an F-statistic given the following:: 
3903          ER  = error associated with the null hypothesis (the Restricted model) 
3904          EF  = error associated with the alternate hypothesis (the Full model) 
3905          dfR = degrees of freedom the Restricted model 
3906          dfF = degrees of freedom associated with the Restricted model 
3907  """ 
3908      return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF))) 
3909   
3910   
3911 - def outputfstats(Enum, Eden, dfnum, dfden, f, prob): 
3912       Enum = round(Enum,3) 
3913       Eden = round(Eden,3) 
3914       dfnum = round(Enum,3) 
3915       dfden = round(dfden,3) 
3916       f = round(f,3) 
3917       prob = round(prob,3) 
3918       suffix = ''                       # for *s after the p-value 
3919       if  prob < 0.001:  suffix = '  ***' 
3920       elif prob < 0.01:  suffix = '  **' 
3921       elif prob < 0.05:  suffix = '  *' 
3922       title = [['EF/ER','DF','Mean Square','F-value','prob','']] 
3923       lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix], 
3924                     [Eden, dfden, round(Eden/float(dfden),3),'','','']] 
3925       pstat.printcc(lofl) 
3926       return 
3927   
3928   
3929 - def F_value_multivariate(ER, EF, dfnum, dfden): 
3930       """ 
3931  Returns an F-statistic given the following:: 
3932          ER  = error associated with the null hypothesis (the Restricted model) 
3933          EF  = error associated with the alternate hypothesis (the Full model) 
3934          dfR = degrees of freedom the Restricted model 
3935          dfF = degrees of freedom associated with the Restricted model 
3936  where ER and EF are matrices from a multivariate F calculation. 
3937  """ 
3938       if type(ER) in [IntType, FloatType]: 
3939           ER = N.array([[ER]]) 
3940       if type(EF) in [IntType, FloatType]: 
3941           EF = N.array([[EF]]) 
3942       n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum) 
3943       d_en = LA.determinant(EF) / float(dfden) 
3944       return n_um / d_en 
3945   
3946   
3947  ##################################### 
3948  #######  ASUPPORT FUNCTIONS  ######## 
3949  ##################################### 
3950   
3951 - def asign(a): 
3952      """ 
3953  Usage:   asign(a) 
3954  Returns: array shape of a, with -1 where a<0 and +1 where a>=0 
3955  """ 
3956      a = N.asarray(a) 
3957      if ((type(a) == type(1.4)) or (type(a) == type(1))): 
3958          return a-a-N.less(a,0)+N.greater(a,0) 
3959      else: 
3960          return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0) 
3961   
3962   
3963 - def asum (a, dimension=None,keepdims=0): 
3964       """ 
3965  An alternative to the Numeric.add.reduce function, which allows one to 
3966  (1) collapse over multiple dimensions at once, and/or (2) to retain 
3967  all dimensions in the original array (squashing one down to size. 
3968  Dimension can equal None (ravel array first), an integer (the 
3969  dimension over which to operate), or a sequence (operate over multiple 
3970  dimensions).  If keepdims=1, the resulting array will have as many 
3971  dimensions as the input array. 
3972   
3973  Usage:   asum(a, dimension=None, keepdims=0) 
3974  Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 
3975  """ 
3976       if type(a) == N.ArrayType and a.typecode() in ['l','s','b']: 
3977           a = a.astype(N.Float) 
3978       if dimension == None: 
3979           s = N.sum(N.ravel(a)) 
3980       elif type(dimension) in [IntType,FloatType]: 
3981           s = N.add.reduce(a, dimension) 
3982           if keepdims == 1: 
3983               shp = list(a.shape) 
3984               shp[dimension] = 1 
3985               s = N.reshape(s,shp) 
3986       else: # must be a SEQUENCE of dims to sum over 
3987          dims = list(dimension) 
3988          dims.sort() 
3989          dims.reverse() 
3990          s = a *1.0 
3991          for dim in dims: 
3992              s = N.add.reduce(s,dim) 
3993          if keepdims == 1: 
3994              shp = list(a.shape) 
3995              for dim in dims: 
3996                  shp[dim] = 1 
3997              s = N.reshape(s,shp) 
3998       return s 
3999   
4000   
4001 - def acumsum (a,dimension=None): 
4002      """ 
4003  Returns an array consisting of the cumulative sum of the items in the 
4004  passed array.  Dimension can equal None (ravel array first), an 
4005  integer (the dimension over which to operate), or a sequence (operate 
4006  over multiple dimensions, but this last one just barely makes sense). 
4007   
4008  Usage:   acumsum(a,dimension=None) 
4009  """ 
4010      if dimension == None: 
4011          a = N.ravel(a) 
4012          dimension = 0 
4013      if type(dimension) in [ListType, TupleType, N.ArrayType]: 
4014          dimension = list(dimension) 
4015          dimension.sort() 
4016          dimension.reverse() 
4017          for d in dimension: 
4018              a = N.add.accumulate(a,d) 
4019          return a 
4020      else: 
4021          return N.add.accumulate(a,dimension) 
4022   
4023   
4024 - def ass(inarray, dimension=None, keepdims=0): 
4025      """ 
4026  Squares each value in the passed array, adds these squares & returns 
4027  the result.  Unfortunate function name. :-) Defaults to ALL values in 
4028  the array.  Dimension can equal None (ravel array first), an integer 
4029  (the dimension over which to operate), or a sequence (operate over 
4030  multiple dimensions).  Set keepdims=1 to maintain the original number 
4031  of dimensions. 
4032   
4033  Usage:   ass(inarray, dimension=None, keepdims=0) 
4034  Returns: sum-along-'dimension' for (inarray*inarray) 
4035  """ 
4036      if dimension == None: 
4037          inarray = N.ravel(inarray) 
4038          dimension = 0 
4039      return asum(inarray*inarray,dimension,keepdims) 
4040   
4041   
4042 - def asummult (array1,array2,dimension=None,keepdims=0): 
4043      """ 
4044  Multiplies elements in array1 and array2, element by element, and 
4045  returns the sum (along 'dimension') of all resulting multiplications. 
4046  Dimension can equal None (ravel array first), an integer (the 
4047  dimension over which to operate), or a sequence (operate over multiple 
4048  dimensions).  A trivial function, but included for completeness. 
4049   
4050  Usage:   asummult(array1,array2,dimension=None,keepdims=0) 
4051  """ 
4052      if dimension == None: 
4053          array1 = N.ravel(array1) 
4054          array2 = N.ravel(array2) 
4055          dimension = 0 
4056      return asum(array1*array2,dimension,keepdims) 
4057   
4058   
4059 - def asquare_of_sums(inarray, dimension=None, keepdims=0): 
4060      """ 
4061  Adds the values in the passed array, squares that sum, and returns the 
4062  result.  Dimension can equal None (ravel array first), an integer (the 
4063  dimension over which to operate), or a sequence (operate over multiple 
4064  dimensions).  If keepdims=1, the returned array will have the same 
4065  NUMBER of dimensions as the original. 
4066   
4067  Usage:   asquare_of_sums(inarray, dimension=None, keepdims=0) 
4068  Returns: the square of the sum over dim(s) in dimension 
4069  """ 
4070      if dimension == None: 
4071          inarray = N.ravel(inarray) 
4072          dimension = 0 
4073      s = asum(inarray,dimension,keepdims) 
4074      if type(s) == N.ArrayType: 
4075          return s.astype(N.Float)*s 
4076      else: 
4077          return float(s)*s 
4078   
4079   
4080 - def asumdiffsquared(a,b, dimension=None, keepdims=0): 
4081      """ 
4082  Takes pairwise differences of the values in arrays a and b, squares 
4083  these differences, and returns the sum of these squares.  Dimension 
4084  can equal None (ravel array first), an integer (the dimension over 
4085  which to operate), or a sequence (operate over multiple dimensions). 
4086  keepdims=1 means the return shape = len(a.shape) = len(b.shape) 
4087   
4088  Usage:   asumdiffsquared(a,b) 
4089  Returns: sum[ravel(a-b)**2] 
4090  """ 
4091      if dimension == None: 
4092          inarray = N.ravel(a) 
4093          dimension = 0 
4094      return asum((a-b)**2,dimension,keepdims) 
4095   
4096   
4097 - def ashellsort(inarray): 
4098      """ 
4099  Shellsort algorithm.  Sorts a 1D-array. 
4100   
4101  Usage:   ashellsort(inarray) 
4102  Returns: sorted-inarray, sorting-index-vector (for original array) 
4103  """ 
4104      n = len(inarray) 
4105      svec = inarray *1.0 
4106      ivec = range(n) 
4107      gap = n/2   # integer division needed 
4108      while gap >0: 
4109          for i in range(gap,n): 
4110              for j in range(i-gap,-1,-gap): 
4111                  while j>=0 and svec[j]>svec[j+gap]: 
4112                      temp        = svec[j] 
4113                      svec[j]     = svec[j+gap] 
4114                      svec[j+gap] = temp 
4115                      itemp       = ivec[j] 
4116                      ivec[j]     = ivec[j+gap] 
4117                      ivec[j+gap] = itemp 
4118          gap = gap / 2  # integer division needed 
4119  #    svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] 
4120      return svec, ivec 
4121   
4122   
4123 - def arankdata(inarray): 
4124      """ 
4125  Ranks the data in inarray, dealing with ties appropritely.  Assumes 
4126  a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort. 
4127   
4128  Usage:   arankdata(inarray) 
4129  Returns: array of length equal to inarray, containing rank scores 
4130  """ 
4131      n = len(inarray) 
4132      svec, ivec = ashellsort(inarray) 
4133      sumranks = 0 
4134      dupcount = 0 
4135      newarray = N.zeros(n,N.Float) 
4136      for i in range(n): 
4137          sumranks = sumranks + i 
4138          dupcount = dupcount + 1 
4139          if i==n-1 or svec[i] <> svec[i+1]: 
4140              averank = sumranks / float(dupcount) + 1 
4141              for j in range(i-dupcount+1,i+1): 
4142                  newarray[ivec[j]] = averank 
4143              sumranks = 0 
4144              dupcount = 0 
4145      return newarray 
4146   
4147   
4148 - def afindwithin(data): 
4149      """ 
4150  Returns a binary vector, 1=within-subject factor, 0=between.  Input 
4151  equals the entire data array (i.e., column 1=random factor, last 
4152  column = measured values. 
4153   
4154  Usage:   afindwithin(data)     data in |Stat format 
4155  """ 
4156      numfact = len(data[0])-2 
4157      withinvec = [0]*numfact 
4158      for col in range(1,numfact+1): 
4159          rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])  # get 1 level of this factor 
4160          if len(pstat.unique(pstat.colex(rows,0))) < len(rows):   # if fewer subjects than scores on this factor 
4161              withinvec[col-1] = 1 
4162      return withinvec 
4163   
4164   
4165   ######################################################### 
4166   ######################################################### 
4167   ######  RE-DEFINE DISPATCHES TO INCLUDE ARRAYS  ######### 
4168   ######################################################### 
4169   ######################################################### 
4170   
4171  ## CENTRAL TENDENCY: 
4172   geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), 
4173                              (ageometricmean, (N.ArrayType,)) ) 
4174   harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), 
4175                             (aharmonicmean, (N.ArrayType,)) ) 
4176   mean = Dispatch ( (lmean, (ListType, TupleType)), 
4177                     (amean, (N.ArrayType,)) ) 
4178   median = Dispatch ( (lmedian, (ListType, TupleType)), 
4179                       (amedian, (N.ArrayType,)) ) 
4180   medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), 
4181                            (amedianscore, (N.ArrayType,)) ) 
4182   mode = Dispatch ( (lmode, (ListType, TupleType)), 
4183                     (amode, (N.ArrayType,)) ) 
4184   tmean = Dispatch ( (atmean, (N.ArrayType,)) ) 
4185   tvar = Dispatch ( (atvar, (N.ArrayType,)) ) 
4186   tstdev = Dispatch ( (atstdev, (N.ArrayType,)) ) 
4187   tsem = Dispatch ( (atsem, (N.ArrayType,)) ) 
4188   
4189  ## VARIATION: 
4190   moment = Dispatch ( (lmoment, (ListType, TupleType)), 
4191                       (amoment, (N.ArrayType,)) ) 
4192   variation = Dispatch ( (lvariation, (ListType, TupleType)), 
4193                          (avariation, (N.ArrayType,)) ) 
4194   skew = Dispatch ( (lskew, (ListType, TupleType)), 
4195                     (askew, (N.ArrayType,)) ) 
4196   kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), 
4197                         (akurtosis, (N.ArrayType,)) ) 
4198   describe = Dispatch ( (ldescribe, (ListType, TupleType)), 
4199                         (adescribe, (N.ArrayType,)) ) 
4200   
4201  ## DISTRIBUTION TESTS 
4202   
4203   skewtest = Dispatch ( (askewtest, (ListType, TupleType)), 
4204                         (askewtest, (N.ArrayType,)) ) 
4205   kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)), 
4206                             (akurtosistest, (N.ArrayType,)) ) 
4207   normaltest = Dispatch ( (anormaltest, (ListType, TupleType)), 
4208                           (anormaltest, (N.ArrayType,)) ) 
4209   
4210  ## FREQUENCY STATS: 
4211   itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), 
4212                         (aitemfreq, (N.ArrayType,)) ) 
4213   scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), 
4214                                  (ascoreatpercentile, (N.ArrayType,)) ) 
4215   percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), 
4216                                   (apercentileofscore, (N.ArrayType,)) ) 
4217   histogram = Dispatch ( (lhistogram, (ListType, TupleType)), 
4218                          (ahistogram, (N.ArrayType,)) ) 
4219   cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), 
4220                        (acumfreq, (N.ArrayType,)) ) 
4221   relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), 
4222                        (arelfreq, (N.ArrayType,)) ) 
4223    
4224  ## VARIABILITY: 
4225   obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), 
4226                                (aobrientransform, (N.ArrayType,)) ) 
4227   samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), 
4228                          (asamplevar, (N.ArrayType,)) ) 
4229   samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), 
4230                            (asamplestdev, (N.ArrayType,)) ) 
4231   signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),) 
4232   var = Dispatch ( (lvar, (ListType, TupleType)), 
4233                    (avar, (N.ArrayType,)) ) 
4234   stdev = Dispatch ( (lstdev, (ListType, TupleType)), 
4235                      (astdev, (N.ArrayType,)) ) 
4236   sterr = Dispatch ( (lsterr, (ListType, TupleType)), 
4237                      (asterr, (N.ArrayType,)) ) 
4238   sem = Dispatch ( (lsem, (ListType, TupleType)), 
4239                    (asem, (N.ArrayType,)) ) 
4240   z = Dispatch ( (lz, (ListType, TupleType)), 
4241                  (az, (N.ArrayType,)) ) 
4242   zs = Dispatch ( (lzs, (ListType, TupleType)), 
4243                   (azs, (N.ArrayType,)) ) 
4244    
4245  ## TRIMMING FCNS: 
4246   threshold = Dispatch( (athreshold, (N.ArrayType,)),) 
4247   trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), 
4248                         (atrimboth, (N.ArrayType,)) ) 
4249   trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), 
4250                      (atrim1, (N.ArrayType,)) ) 
4251    
4252  ## CORRELATION FCNS: 
4253   paired = Dispatch ( (lpaired, (ListType, TupleType)), 
4254                       (apaired, (N.ArrayType,)) ) 
4255   pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), 
4256                         (apearsonr, (N.ArrayType,)) ) 
4257   spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), 
4258                          (aspearmanr, (N.ArrayType,)) ) 
4259   pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), 
4260                               (apointbiserialr, (N.ArrayType,)) ) 
4261   kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), 
4262                           (akendalltau, (N.ArrayType,)) ) 
4263   linregress = Dispatch ( (llinregress, (ListType, TupleType)), 
4264                           (alinregress, (N.ArrayType,)) ) 
4265    
4266  ## INFERENTIAL STATS: 
4267   ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), 
4268                            (attest_1samp, (N.ArrayType,)) ) 
4269   ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), 
4270                          (attest_ind, (N.ArrayType,)) ) 
4271   ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), 
4272                          (attest_rel, (N.ArrayType,)) ) 
4273   chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), 
4274                          (achisquare, (N.ArrayType,)) ) 
4275   ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), 
4276                         (aks_2samp, (N.ArrayType,)) ) 
4277   mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), 
4278                             (amannwhitneyu, (N.ArrayType,)) ) 
4279   tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), 
4280                           (atiecorrect, (N.ArrayType,)) ) 
4281   ranksums = Dispatch ( (lranksums, (ListType, TupleType)), 
4282                         (aranksums, (N.ArrayType,)) ) 
4283   wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), 
4284                          (awilcoxont, (N.ArrayType,)) ) 
4285   kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), 
4286                               (akruskalwallish, (N.ArrayType,)) ) 
4287   friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), 
4288                                  (afriedmanchisquare, (N.ArrayType,)) ) 
4289    
4290  ## PROBABILITY CALCS: 
4291   chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), 
4292                          (achisqprob, (N.ArrayType,)) ) 
4293   zprob = Dispatch ( (lzprob, (IntType, FloatType)), 
4294                      (azprob, (N.ArrayType,)) ) 
4295   ksprob = Dispatch ( (lksprob, (IntType, FloatType)), 
4296                       (aksprob, (N.ArrayType,)) ) 
4297   fprob = Dispatch ( (lfprob, (IntType, FloatType)), 
4298                      (afprob, (N.ArrayType,)) ) 
4299   betacf = Dispatch ( (lbetacf, (IntType, FloatType)), 
4300                       (abetacf, (N.ArrayType,)) ) 
4301   betai = Dispatch ( (lbetai, (IntType, FloatType)), 
4302                      (abetai, (N.ArrayType,)) ) 
4303   erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), 
4304                      (aerfcc, (N.ArrayType,)) ) 
4305   gammln = Dispatch ( (lgammln, (IntType, FloatType)), 
4306                       (agammln, (N.ArrayType,)) ) 
4307    
4308  ## ANOVA FUNCTIONS: 
4309   F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), 
4310                         (aF_oneway, (N.ArrayType,)) ) 
4311   F_value = Dispatch ( (lF_value, (ListType, TupleType)), 
4312                        (aF_value, (N.ArrayType,)) ) 
4313   
4314  ## SUPPORT FUNCTIONS: 
4315   incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), ) 
4316   sum = Dispatch ( (lsum, (ListType, TupleType)), 
4317                    (asum, (N.ArrayType,)) ) 
4318   cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), 
4319                       (acumsum, (N.ArrayType,)) ) 
4320   ss = Dispatch ( (lss, (ListType, TupleType)), 
4321                   (ass, (N.ArrayType,)) ) 
4322   summult = Dispatch ( (lsummult, (ListType, TupleType)), 
4323                        (asummult, (N.ArrayType,)) ) 
4324   square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), 
4325                               (asquare_of_sums, (N.ArrayType,)) ) 
4326   sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), 
4327                               (asumdiffsquared, (N.ArrayType,)) ) 
4328   shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), 
4329                          (ashellsort, (N.ArrayType,)) ) 
4330   rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), 
4331                         (arankdata, (N.ArrayType,)) ) 
4332   findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), 
4333                           (afindwithin, (N.ArrayType,)) ) 
4334   
4335  ######################  END OF NUMERIC FUNCTION BLOCK  ##################### 
4336   
4337  ######################  END OF STATISTICAL FUNCTIONS  ###################### 
4338   
4339  except ImportError: 
4340   pass 
4341