1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 """
25 stats.py module
26
27 (Requires pstat.py module.)
28
29 ::
30 #################################################
31 ####### Written by: Gary Strangman ###########
32 ####### Last modified: Dec 28, 2000 ###########
33 #################################################
34
35 A collection of basic statistical functions for python. The function
36 names appear below.
37
38 IMPORTANT: There are really *3* sets of functions. The first set has an 'l'
39 prefix, which can be used with list or tuple arguments. The second set has
40 an 'a' prefix, which can accept NumPy array arguments. These latter
41 functions are defined only when NumPy is available on the system. The third
42 type has NO prefix (i.e., has the name that appears below). Functions of
43 this set are members of a "Dispatch" class, c/o David Ascher. This class
44 allows different functions to be called depending on the type of the passed
45 arguments. Thus, stats.mean is a member of the Dispatch class and
46 stats.mean(range(20)) will call stats.lmean(range(20)) while
47 stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
48 This is a handy way to keep consistent function names when different
49 argument types require different functions to be called. Having
50 implementated the Dispatch class, however, means that to get info on
51 a given function, you must use the REAL function name ... that is
52 "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
53 while "print stats.mean.__doc__" will print the doc for the Dispatch
54 class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options
55 but should otherwise be consistent with the corresponding list functions.
56
57 Disclaimers: The function list is obviously incomplete and, worse, the
58 functions are not optimized. All functions have been tested (some more
59 so than others), but they are far from bulletproof. Thus, as with any
60 free software, no warranty or guarantee is expressed or implied. :-) A
61 few extra functions that don't appear in the list below can be found by
62 interested treasure-hunters. These functions don't necessarily have
63 both list and array versions but were deemed useful::
64
65 CENTRAL TENDENCY: geometricmean
66 harmonicmean
67 mean
68 median
69 medianscore
70 mode
71
72 MOMENTS: moment
73 variation
74 skew
75 kurtosis
76 skewtest (for Numpy arrays only)
77 kurtosistest (for Numpy arrays only)
78 normaltest (for Numpy arrays only)
79
80 ALTERED VERSIONS: tmean (for Numpy arrays only)
81 tvar (for Numpy arrays only)
82 tmin (for Numpy arrays only)
83 tmax (for Numpy arrays only)
84 tstdev (for Numpy arrays only)
85 tsem (for Numpy arrays only)
86 describe
87
88 FREQUENCY STATS: itemfreq
89 scoreatpercentile
90 percentileofscore
91 histogram
92 cumfreq
93 relfreq
94
95 VARIABILITY: obrientransform
96 samplevar
97 samplestdev
98 signaltonoise (for Numpy arrays only)
99 var
100 stdev
101 sterr
102 sem
103 z
104 zs
105 zmap (for Numpy arrays only)
106
107 TRIMMING FCNS: threshold (for Numpy arrays only)
108 trimboth
109 trim1
110 round (round all vals to 'n' decimals; Numpy only)
111
112 CORRELATION FCNS: covariance (for Numpy arrays only)
113 correlation (for Numpy arrays only)
114 paired
115 pearsonr
116 spearmanr
117 pointbiserialr
118 kendalltau
119 linregress
120
121 INFERENTIAL STATS: ttest_1samp
122 ttest_ind
123 ttest_rel
124 chisquare
125 ks_2samp
126 mannwhitneyu
127 ranksums
128 wilcoxont
129 kruskalwallish
130 friedmanchisquare
131
132 PROBABILITY CALCS: chisqprob
133 erfcc
134 zprob
135 ksprob
136 fprob
137 betacf
138 gammln
139 betai
140
141 ANOVA FUNCTIONS: F_oneway
142 F_value
143
144 SUPPORT FUNCTIONS: writecc
145 incr
146 sign (for Numpy arrays only)
147 sum
148 cumsum
149 ss
150 summult
151 sumdiffsquared
152 square_of_sums
153 shellsort
154 rankdata
155 outputpairedstats
156 findwithin
157 """
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215 import pstat
216 import math, string, copy
217 from types import *
218
219 __version__ = 0.5
220
221
222
223
225 """
226 The Dispatch class, care of David Ascher, allows different functions to
227 be called depending on the argument types. This way, there can be one
228 function name regardless of the argument type. To access function doc
229 in stats.py module, prefix the function with an 'l' or 'a' for list or
230 array arguments, respectively. That is, print stats.lmean.__doc__ or
231 print stats.amean.__doc__ or whatever.
232 """
233
235 self._dispatch = {}
236 for func, types in tuples:
237 for t in types:
238 if t in self._dispatch.keys():
239 raise ValueError, "can't have two dispatches on "+str(t)
240 self._dispatch[t] = func
241 self._types = self._dispatch.keys()
242
244 if type(arg1) not in self._types:
245 raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
246 return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
247
248
249
250
251
252
253
254
255
256
257
258
260 """
261 Calculates the geometric mean of the values in the passed list.
262 That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list.
263
264 Usage: lgeometricmean(inlist)
265 """
266 mult = 1.0
267 one_over_n = 1.0/len(inlist)
268 for item in inlist:
269 mult = mult * pow(item,one_over_n)
270 return mult
271
272
274 """
275 Calculates the harmonic mean of the values in the passed list.
276 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list.
277
278 Usage: lharmonicmean(inlist)
279 """
280 sum = 0
281 for item in inlist:
282 sum = sum + 1.0/item
283 return len(inlist) / sum
284
285
287 """
288 Returns the arithematic mean of the values in the passed list.
289 Assumes a '1D' list, but will function on the 1st dim of an array(!).
290
291 Usage: lmean(inlist)
292 """
293 sum = 0
294 for item in inlist:
295 sum = sum + item
296 return sum/float(len(inlist))
297
298
319
320
338
339
341 """
342 Returns a list of the modal (most common) score(s) in the passed
343 list. If there is more than one such score, all are returned. The
344 bin-count for the mode(s) is also returned.
345
346 Usage: lmode(inlist)
347 Returns: bin-count for mode(s), a list of modal value(s)
348 """
349
350 scores = pstat.unique(inlist)
351 scores.sort()
352 freq = []
353 for item in scores:
354 freq.append(inlist.count(item))
355 maxfreq = max(freq)
356 mode = []
357 stillmore = 1
358 while stillmore:
359 try:
360 indx = freq.index(maxfreq)
361 mode.append(scores[indx])
362 del freq[indx]
363 del scores[indx]
364 except ValueError:
365 stillmore=0
366 return maxfreq, mode
367
368
369
370
371
372
374 """
375 Calculates the nth moment about the mean for a sample (defaults to
376 the 1st moment). Used to calculate coefficients of skewness and kurtosis.
377
378 Usage: lmoment(inlist,moment=1)
379 Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
380 """
381 if moment == 1:
382 return 0.0
383 else:
384 mn = mean(inlist)
385 n = len(inlist)
386 s = 0
387 for x in inlist:
388 s = s + (x-mn)**moment
389 return s/float(n)
390
391
393 """
394 Returns the coefficient of variation, as defined in CRC Standard
395 Probability and Statistics, p.6.
396
397 Usage: lvariation(inlist)
398 """
399 return 100.0*samplestdev(inlist)/float(mean(inlist))
400
401
403 """
404 Returns the skewness of a distribution, as defined in Numerical
405 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
406
407 Usage: lskew(inlist)
408 """
409 return moment(inlist,3)/pow(moment(inlist,2),1.5)
410
411
413 """
414 Returns the kurtosis of a distribution, as defined in Numerical
415 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
416
417 Usage: lkurtosis(inlist)
418 """
419 return moment(inlist,4)/pow(moment(inlist,2),2.0)
420
421
423 """
424 Returns some descriptive statistics of the passed list (assumed to be 1D).
425
426 Usage: ldescribe(inlist)
427 Returns: n, mean, standard deviation, skew, kurtosis
428 """
429 n = len(inlist)
430 mm = (min(inlist),max(inlist))
431 m = mean(inlist)
432 sd = stdev(inlist)
433 sk = skew(inlist)
434 kurt = kurtosis(inlist)
435 return n, mm, m, sd, sk, kurt
436
437
438
439
440
441
443 """
444 Returns a list of pairs. Each pair consists of one of the scores in inlist
445 and it's frequency count. Assumes a 1D list is passed.
446
447 Usage: litemfreq(inlist)
448 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
449 """
450 scores = pstat.unique(inlist)
451 scores.sort()
452 freq = []
453 for item in scores:
454 freq.append(inlist.count(item))
455 return pstat.abut(scores, freq)
456
457
459 """
460 Returns the score at a given percentile relative to the distribution
461 given by inlist.
462
463 Usage: lscoreatpercentile(inlist,percent)
464 """
465 if percent > 1:
466 print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
467 percent = percent / 100.0
468 targetcf = percent*len(inlist)
469 h, lrl, binsize, extras = histogram(inlist)
470 cumhist = cumsum(copy.deepcopy(h))
471 for i in range(len(cumhist)):
472 if cumhist[i] >= targetcf:
473 break
474 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
475 return score
476
477
479 """
480 Returns the percentile value of a score relative to the distribution
481 given by inlist. Formula depends on the values used to histogram the data(!).
482
483 Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
484 """
485
486 h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
487 cumhist = cumsum(copy.deepcopy(h))
488 i = int((score - lrl)/float(binsize))
489 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
490 return pct
491
492
493 -def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
494 """
495 Returns (i) a list of histogram bin counts, (ii) the smallest value
496 of the histogram binning, and (iii) the bin width (the last 2 are not
497 necessarily integers). Default number of bins is 10. If no sequence object
498 is given for defaultreallimits, the routine picks (usually non-pretty) bins
499 spanning all the numbers in the inlist.
500
501 Usage: lhistogram (inlist, numbins=10, defaultreallimits=None,suppressoutput=0)
502 Returns: list of bin values, lowerreallimit, binsize, extrapoints
503 """
504 if (defaultreallimits <> None):
505 if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1:
506 lowerreallimit = defaultreallimits
507 upperreallimit = 1.0001 * max(inlist)
508 else:
509 lowerreallimit = defaultreallimits[0]
510 upperreallimit = defaultreallimits[1]
511 binsize = (upperreallimit-lowerreallimit)/float(numbins)
512 else:
513 estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1
514 binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
515 lowerreallimit = min(inlist) - binsize/2
516 bins = [0]*(numbins)
517 extrapoints = 0
518 for num in inlist:
519 try:
520 if (num-lowerreallimit) < 0:
521 extrapoints = extrapoints + 1
522 else:
523 bintoincrement = int((num-lowerreallimit)/float(binsize))
524 bins[bintoincrement] = bins[bintoincrement] + 1
525 except:
526 extrapoints = extrapoints + 1
527 if (extrapoints > 0 and printextras == 1):
528 print '\nPoints outside given histogram range =',extrapoints
529 return (bins, lowerreallimit, binsize, extrapoints)
530
531
532 -def lcumfreq(inlist,numbins=10,defaultreallimits=None):
533 """
534 Returns a cumulative frequency histogram, using the histogram function.
535
536 Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None)
537 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
538 """
539 h,l,b,e = histogram(inlist,numbins,defaultreallimits)
540 cumhist = cumsum(copy.deepcopy(h))
541 return cumhist,l,b,e
542
543
544 -def lrelfreq(inlist,numbins=10,defaultreallimits=None):
545 """
546 Returns a relative frequency histogram, using the histogram function.
547
548 Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None)
549 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
550 """
551 h,l,b,e = histogram(inlist,numbins,defaultreallimits)
552 for i in range(len(h)):
553 h[i] = h[i]/float(len(inlist))
554 return h,l,b,e
555
556
557
558
559
560
595
596
598 """
599 Returns the variance of the values in the passed list using
600 N for the denominator (i.e., DESCRIBES the sample variance only).
601
602 Usage: lsamplevar(inlist)
603 """
604 n = len(inlist)
605 mn = mean(inlist)
606 deviations = []
607 for item in inlist:
608 deviations.append(item-mn)
609 return ss(deviations)/float(n)
610
611
613 """
614 Returns the standard deviation of the values in the passed list using
615 N for the denominator (i.e., DESCRIBES the sample stdev only).
616
617 Usage: lsamplestdev(inlist)
618 """
619 return math.sqrt(samplevar(inlist))
620
621
623 """
624 Returns the variance of the values in the passed list using N-1
625 for the denominator (i.e., for estimating population variance).
626
627 Usage: lvar(inlist)
628 """
629 n = len(inlist)
630 mn = mean(inlist)
631 deviations = [0]*len(inlist)
632 for i in range(len(inlist)):
633 deviations[i] = inlist[i] - mn
634 return ss(deviations)/float(n-1)
635
636
638 """
639 Returns the standard deviation of the values in the passed list
640 using N-1 in the denominator (i.e., to estimate population stdev).
641
642 Usage: lstdev(inlist)
643 """
644 return math.sqrt(var(inlist))
645
646
648 """
649 Returns the standard error of the values in the passed list using N-1
650 in the denominator (i.e., to estimate population standard error).
651
652 Usage: lsterr(inlist)
653 """
654 return stdev(inlist) / float(math.sqrt(len(inlist)))
655
656
658 """
659 Returns the estimated standard error of the mean (sx-bar) of the
660 values in the passed list. sem = stdev / sqrt(n)
661
662 Usage: lsem(inlist)
663 """
664 sd = stdev(inlist)
665 n = len(inlist)
666 return sd/math.sqrt(n)
667
668
669 -def lz (inlist, score):
670 """
671 Returns the z-score for a given input score, given that score and the
672 list from which that score came. Not appropriate for population calculations.
673
674 Usage: lz(inlist, score)
675 """
676 z = (score-mean(inlist))/samplestdev(inlist)
677 return z
678
679
681 """
682 Returns a list of z-scores, one for each score in the passed list.
683
684 Usage: lzs(inlist)
685 """
686 zscores = []
687 for item in inlist:
688 zscores.append(z(inlist,item))
689 return zscores
690
691
692
693
694
695
697 """
698 Slices off the passed proportion of items from BOTH ends of the passed
699 list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
700 10% of scores. Assumes list is sorted by magnitude. Slices off LESS if
701 proportion results in a non-integer slice index (i.e., conservatively
702 slices off proportiontocut).
703
704 Usage: ltrimboth (l,proportiontocut)
705 Returns: trimmed version of list l
706 """
707 lowercut = int(proportiontocut*len(l))
708 uppercut = len(l) - lowercut
709 return l[lowercut:uppercut]
710
711
712 -def ltrim1 (l,proportiontocut,tail='right'):
713 """
714 Slices off the passed proportion of items from ONE end of the passed
715 list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
716 10% of scores). Slices off LESS if proportion results in a non-integer
717 slice index (i.e., conservatively slices off proportiontocut).
718
719 Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left'
720 Returns: trimmed version of list l
721 """
722 if tail == 'right':
723 lowercut = 0
724 uppercut = len(l) - int(proportiontocut*len(l))
725 elif tail == 'left':
726 lowercut = int(proportiontocut*len(l))
727 uppercut = len(l)
728 return l[lowercut:uppercut]
729
730
731
732
733
734
736 """
737 Interactively determines the type of data and then runs the
738 appropriated statistic for paired group data.
739
740 Usage: lpaired(x,y)
741 Returns: appropriate statistic name, value, and probability
742 """
743 samples = ''
744 while samples not in ['i','r','I','R','c','C']:
745 print '\nIndependent or related samples, or correlation (i,r,c): ',
746 samples = raw_input()
747
748 if samples in ['i','I','r','R']:
749 print '\nComparing variances ...',
750
751 r = obrientransform(x,y)
752 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
753 if p<0.05:
754 vartype='unequal, p='+str(round(p,4))
755 else:
756 vartype='equal'
757 print vartype
758 if samples in ['i','I']:
759 if vartype[0]=='e':
760 t,p = ttest_ind(x,y,0)
761 print '\nIndependent samples t-test: ', round(t,4),round(p,4)
762 else:
763 if len(x)>20 or len(y)>20:
764 z,p = ranksums(x,y)
765 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
766 else:
767 u,p = mannwhitneyu(x,y)
768 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
769
770 else:
771 if vartype[0]=='e':
772 t,p = ttest_rel(x,y,0)
773 print '\nRelated samples t-test: ', round(t,4),round(p,4)
774 else:
775 t,p = ranksums(x,y)
776 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
777 else:
778 corrtype = ''
779 while corrtype not in ['c','C','r','R','d','D']:
780 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
781 corrtype = raw_input()
782 if corrtype in ['c','C']:
783 m,b,r,p,see = linregress(x,y)
784 print '\nLinear regression for continuous variables ...'
785 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
786 pstat.printcc(lol)
787 elif corrtype in ['r','R']:
788 r,p = spearmanr(x,y)
789 print '\nCorrelation for ranked variables ...'
790 print "Spearman's r: ",round(r,4),round(p,4)
791 else:
792 r,p = pointbiserialr(x,y)
793 print '\nAssuming x contains a dichotomous variable ...'
794 print 'Point Biserial r: ',round(r,4),round(p,4)
795 print '\n\n'
796 return None
797
798
800 """
801 Calculates a Pearson correlation coefficient and the associated
802 probability value. Taken from Heiman's Basic Statistics for the Behav.
803 Sci (2nd), p.195.
804
805 Usage: lpearsonr(x,y) where x and y are equal-length lists
806 Returns: Pearson's r value, two-tailed p-value
807 """
808 TINY = 1.0e-30
809 if len(x) <> len(y):
810 raise ValueError, 'Input values not paired in pearsonr. Aborting.'
811 n = len(x)
812 x = map(float,x)
813 y = map(float,y)
814 xmean = mean(x)
815 ymean = mean(y)
816 r_num = n*(summult(x,y)) - sum(x)*sum(y)
817 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
818 r = (r_num / r_den)
819 df = n-2
820 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
821 prob = betai(0.5*df,0.5,df/float(df+t*t))
822 return r, prob
823
824
826 """
827 Calculates a Spearman rank-order correlation coefficient. Taken
828 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
829
830 Usage: lspearmanr(x,y) where x and y are equal-length lists
831 Returns: Spearman's r, two-tailed p-value
832 """
833 TINY = 1e-30
834 if len(x) <> len(y):
835 raise ValueError, 'Input values not paired in spearmanr. Aborting.'
836 n = len(x)
837 rankx = rankdata(x)
838 ranky = rankdata(y)
839 dsq = sumdiffsquared(rankx,ranky)
840 rs = 1 - 6*dsq / float(n*(n**2-1))
841 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
842 df = n-2
843 probrs = betai(0.5*df,0.5,df/(df+t*t))
844
845
846 return rs, probrs
847
848
850 """
851 Calculates a point-biserial correlation coefficient and the associated
852 probability value. Taken from Heiman's Basic Statistics for the Behav.
853 Sci (1st), p.194.
854
855 Usage: lpointbiserialr(x,y) where x,y are equal-length lists
856 Returns: Point-biserial r, two-tailed p-value
857 """
858 TINY = 1e-30
859 if len(x) <> len(y):
860 raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.'
861 data = pstat.abut(x,y)
862 categories = pstat.unique(x)
863 if len(categories) <> 2:
864 raise ValueError, "Exactly 2 categories required for pointbiserialr()."
865 else:
866 codemap = pstat.abut(categories,range(2))
867 recoded = pstat.recode(data,codemap,0)
868 x = pstat.linexand(data,0,categories[0])
869 y = pstat.linexand(data,0,categories[1])
870 xmean = mean(pstat.colex(x,1))
871 ymean = mean(pstat.colex(y,1))
872 n = len(data)
873 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
874 rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
875 df = n-2
876 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
877 prob = betai(0.5*df,0.5,df/(df+t*t))
878 return rpb, prob
879
880
882 """
883 Calculates Kendall's tau ... correlation of ordinal data. Adapted
884 from function kendl1 in Numerical Recipies. Needs good test-routine.@@@
885
886 Usage: lkendalltau(x,y)
887 Returns: Kendall's tau, two-tailed p-value
888 """
889 n1 = 0
890 n2 = 0
891 iss = 0
892 for j in range(len(x)-1):
893 for k in range(j,len(y)):
894 a1 = x[j] - x[k]
895 a2 = y[j] - y[k]
896 aa = a1 * a2
897 if (aa):
898 n1 = n1 + 1
899 n2 = n2 + 1
900 if aa > 0:
901 iss = iss + 1
902 else:
903 iss = iss -1
904 else:
905 if (a1):
906 n1 = n1 + 1
907 else:
908 n2 = n2 + 1
909 tau = iss / math.sqrt(n1*n2)
910 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
911 z = tau / math.sqrt(svar)
912 prob = erfcc(abs(z)/1.4142136)
913 return tau, prob
914
915
917 """
918 Calculates a regression line on x,y pairs.
919
920 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates
921 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
922 """
923 TINY = 1.0e-20
924 if len(x) <> len(y):
925 raise ValueError, 'Input values not paired in linregress. Aborting.'
926 n = len(x)
927 x = map(float,x)
928 y = map(float,y)
929 xmean = mean(x)
930 ymean = mean(y)
931 r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
932 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
933 r = r_num / r_den
934 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
935 df = n-2
936 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
937 prob = betai(0.5*df,0.5,df/(df+t*t))
938 slope = r_num / float(n*ss(x) - square_of_sums(x))
939 intercept = ymean - slope*xmean
940 sterrest = math.sqrt(1-r*r)*samplestdev(y)
941 return slope, intercept, r, prob, sterrest
942
943
944
945
946
947
948 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
949 """
950 Calculates the t-obtained for the independent samples T-test on ONE group
951 of scores a, given a population mean. If printit=1, results are printed
952 to the screen. If printit='filename', the results are output to 'filename'
953 using the given writemode (default=append). Returns t-value, and prob.
954
955 Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
956 Returns: t-value, two-tailed prob
957 """
958 x = mean(a)
959 v = var(a)
960 n = len(a)
961 df = n-1
962 svar = ((n-1)*v)/float(df)
963 t = (x-popmean)/math.sqrt(svar*(1.0/n))
964 prob = betai(0.5*df,0.5,float(df)/(df+t*t))
965
966 if printit <> 0:
967 statname = 'Single-sample T-test.'
968 outputpairedstats(printit,writemode,
969 'Population','--',popmean,0,0,0,
970 name,n,x,v,min(a),max(a),
971 statname,t,prob)
972 return t,prob
973
974
975 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
976 """
977 Calculates the t-obtained T-test on TWO INDEPENDENT samples of
978 scores a, and b. From Numerical Recipies, p.483. If printit=1, results
979 are printed to the screen. If printit='filename', the results are output
980 to 'filename' using the given writemode (default=append). Returns t-value,
981 and prob.
982
983 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
984 Returns: t-value, two-tailed prob
985 """
986 x1 = mean(a)
987 x2 = mean(b)
988 v1 = stdev(a)**2
989 v2 = stdev(b)**2
990 n1 = len(a)
991 n2 = len(b)
992 df = n1+n2-2
993 svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
994 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
995 prob = betai(0.5*df,0.5,df/(df+t*t))
996
997 if printit <> 0:
998 statname = 'Independent samples T-test.'
999 outputpairedstats(printit,writemode,
1000 name1,n1,x1,v1,min(a),max(a),
1001 name2,n2,x2,v2,min(b),max(b),
1002 statname,t,prob)
1003 return t,prob
1004
1005
1006 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1007 """
1008 Calculates the t-obtained T-test on TWO RELATED samples of scores,
1009 a and b. From Numerical Recipies, p.483. If printit=1, results are
1010 printed to the screen. If printit='filename', the results are output to
1011 'filename' using the given writemode (default=append). Returns t-value,
1012 and prob.
1013
1014 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
1015 Returns: t-value, two-tailed prob
1016 """
1017 if len(a)<>len(b):
1018 raise ValueError, 'Unequal length lists in ttest_rel.'
1019 x1 = mean(a)
1020 x2 = mean(b)
1021 v1 = var(a)
1022 v2 = var(b)
1023 n = len(a)
1024 cov = 0
1025 for i in range(len(a)):
1026 cov = cov + (a[i]-x1) * (b[i]-x2)
1027 df = n-1
1028 cov = cov / float(df)
1029 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
1030 t = (x1-x2)/sd
1031 prob = betai(0.5*df,0.5,df/(df+t*t))
1032
1033 if printit <> 0:
1034 statname = 'Related samples T-test.'
1035 outputpairedstats(printit,writemode,
1036 name1,n,x1,v1,min(a),max(a),
1037 name2,n,x2,v2,min(b),max(b),
1038 statname,t,prob)
1039 return t, prob
1040
1041
1043 """
1044 Calculates a one-way chi square for list of observed frequencies and returns
1045 the result. If no expected frequencies are given, the total N is assumed to
1046 be equally distributed across all groups.
1047
1048 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq.
1049 Returns: chisquare-statistic, associated p-value
1050 """
1051 k = len(f_obs)
1052 if f_exp == None:
1053 f_exp = [sum(f_obs)/float(k)] * len(f_obs)
1054 chisq = 0
1055 for i in range(len(f_obs)):
1056 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
1057 return chisq, chisqprob(chisq, k-1)
1058
1059
1061 """
1062 Computes the Kolmogorov-Smirnof statistic on 2 samples. From
1063 Numerical Recipies in C, page 493.
1064
1065 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions
1066 Returns: KS D-value, associated p-value
1067 """
1068 j1 = 0
1069 j2 = 0
1070 fn1 = 0.0
1071 fn2 = 0.0
1072 n1 = len(data1)
1073 n2 = len(data2)
1074 en1 = n1
1075 en2 = n2
1076 d = 0.0
1077 data1.sort()
1078 data2.sort()
1079 while j1 < n1 and j2 < n2:
1080 d1=data1[j1]
1081 d2=data2[j2]
1082 if d1 <= d2:
1083 fn1 = (j1)/float(en1)
1084 j1 = j1 + 1
1085 if d2 <= d1:
1086 fn2 = (j2)/float(en2)
1087 j2 = j2 + 1
1088 dt = (fn2-fn1)
1089 if math.fabs(dt) > math.fabs(d):
1090 d = dt
1091 try:
1092 en = math.sqrt(en1*en2/float(en1+en2))
1093 prob = ksprob((en+0.12+0.11/en)*abs(d))
1094 except:
1095 prob = 1.0
1096 return d, prob
1097
1098
1100 """
1101 Calculates a Mann-Whitney U statistic on the provided scores and
1102 returns the result. Use only when the n in each condition is < 20 and
1103 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
1104 significant if the u-obtained is LESS THAN or equal to the critical
1105 value of U found in the tables. Equivalent to Kruskal-Wallis H with
1106 just 2 groups.
1107
1108 Usage: lmannwhitneyu(data)
1109 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
1110 """
1111 n1 = len(x)
1112 n2 = len(y)
1113 ranked = rankdata(x+y)
1114 rankx = ranked[0:n1]
1115 ranky = ranked[n1:]
1116 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
1117 u2 = n1*n2 - u1
1118 bigu = max(u1,u2)
1119 smallu = min(u1,u2)
1120 T = math.sqrt(tiecorrect(ranked))
1121 if T == 0:
1122 raise ValueError, 'All numbers are identical in lmannwhitneyu'
1123 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
1124 z = abs((bigu-n1*n2/2.0) / sd)
1125 return smallu, 1.0 - zprob(z)
1126
1127
1129 """
1130 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
1131 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
1132 New York: McGraw-Hill. Code adapted from |Stat rankind.c code.
1133
1134 Usage: ltiecorrect(rankvals)
1135 Returns: T correction factor for U or H
1136 """
1137 sorted,posn = shellsort(rankvals)
1138 n = len(sorted)
1139 T = 0.0
1140 i = 0
1141 while (i<n-1):
1142 if sorted[i] == sorted[i+1]:
1143 nties = 1
1144 while (i<n-1) and (sorted[i] == sorted[i+1]):
1145 nties = nties +1
1146 i = i +1
1147 T = T + nties**3 - nties
1148 i = i+1
1149 T = T / float(n**3-n)
1150 return 1.0 - T
1151
1152
1154 """
1155 Calculates the rank sums statistic on the provided scores and
1156 returns the result. Use only when the n in each condition is > 20 and you
1157 have 2 independent samples of ranks.
1158
1159 Usage: lranksums(x,y)
1160 Returns: a z-statistic, two-tailed p-value
1161 """
1162 n1 = len(x)
1163 n2 = len(y)
1164 alldata = x+y
1165 ranked = rankdata(alldata)
1166 x = ranked[:n1]
1167 y = ranked[n1:]
1168 s = sum(x)
1169 expected = n1*(n1+n2+1) / 2.0
1170 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
1171 prob = 2*(1.0 -zprob(abs(z)))
1172 return z, prob
1173
1174
1176 """
1177 Calculates the Wilcoxon T-test for related samples and returns the
1178 result. A non-parametric T-test.
1179
1180 Usage: lwilcoxont(x,y)
1181 Returns: a t-statistic, two-tail probability estimate
1182 """
1183 if len(x) <> len(y):
1184 raise ValueError, 'Unequal N in wilcoxont. Aborting.'
1185 d=[]
1186 for i in range(len(x)):
1187 diff = x[i] - y[i]
1188 if diff <> 0:
1189 d.append(diff)
1190 count = len(d)
1191 absd = map(abs,d)
1192 absranked = rankdata(absd)
1193 r_plus = 0.0
1194 r_minus = 0.0
1195 for i in range(len(absd)):
1196 if d[i] < 0:
1197 r_minus = r_minus + absranked[i]
1198 else:
1199 r_plus = r_plus + absranked[i]
1200 wt = min(r_plus, r_minus)
1201 mn = count * (count+1) * 0.25
1202 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
1203 z = math.fabs(wt-mn) / se
1204 prob = 2*(1.0 -zprob(abs(z)))
1205 return wt, prob
1206
1207
1209 """
1210 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
1211 groups, requiring at least 5 subjects in each group. This function
1212 calculates the Kruskal-Wallis H-test for 3 or more independent samples
1213 and returns the result.
1214
1215 Usage: lkruskalwallish(*args)
1216 Returns: H-statistic (corrected for ties), associated p-value
1217 """
1218 args = list(args)
1219 n = [0]*len(args)
1220 all = []
1221 n = map(len,args)
1222 for i in range(len(args)):
1223 all = all + args[i]
1224 ranked = rankdata(all)
1225 T = tiecorrect(ranked)
1226 for i in range(len(args)):
1227 args[i] = ranked[0:n[i]]
1228 del ranked[0:n[i]]
1229 rsums = []
1230 for i in range(len(args)):
1231 rsums.append(sum(args[i])**2)
1232 rsums[i] = rsums[i] / float(n[i])
1233 ssbn = sum(rsums)
1234 totaln = sum(n)
1235 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
1236 df = len(args) - 1
1237 if T == 0:
1238 raise ValueError, 'All numbers are identical in lkruskalwallish'
1239 h = h / float(T)
1240 return h, chisqprob(h,df)
1241
1242
1244 """
1245 Friedman Chi-Square is a non-parametric, one-way within-subjects
1246 ANOVA. This function calculates the Friedman Chi-square test for repeated
1247 measures and returns the result, along with the associated probability
1248 value. It assumes 3 or more repeated measures. Only 3 levels requires a
1249 minimum of 10 subjects in the study. Four levels requires 5 subjects per
1250 level(??).
1251
1252 Usage: lfriedmanchisquare(*args)
1253 Returns: chi-square statistic, associated p-value
1254 """
1255 k = len(args)
1256 if k < 3:
1257 raise ValueError, 'Less than 3 levels. Friedman test not appropriate.'
1258 n = len(args[0])
1259 data = apply(pstat.abut,tuple(args))
1260 for i in range(len(data)):
1261 data[i] = rankdata(data[i])
1262 ssbn = 0
1263 for i in range(k):
1264 ssbn = ssbn + sum(args[i])**2
1265 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
1266 return chisq, chisqprob(chisq,k-1)
1267
1268
1269
1270
1271
1272
1274 """
1275 Returns the (1-tailed) probability value associated with the provided
1276 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.
1277
1278 Usage: lchisqprob(chisq,df)
1279 """
1280 BIG = 20.0
1281 def ex(x):
1282 BIG = 20.0
1283 if x < -BIG:
1284 return 0.0
1285 else:
1286 return math.exp(x)
1287
1288 if chisq <=0 or df < 1:
1289 return 1.0
1290 a = 0.5 * chisq
1291 if df%2 == 0:
1292 even = 1
1293 else:
1294 even = 0
1295 if df > 1:
1296 y = ex(-a)
1297 if even:
1298 s = y
1299 else:
1300 s = 2.0 * zprob(-math.sqrt(chisq))
1301 if (df > 2):
1302 chisq = 0.5 * (df - 1.0)
1303 if even:
1304 z = 1.0
1305 else:
1306 z = 0.5
1307 if a > BIG:
1308 if even:
1309 e = 0.0
1310 else:
1311 e = math.log(math.sqrt(math.pi))
1312 c = math.log(a)
1313 while (z <= chisq):
1314 e = math.log(z) + e
1315 s = s + ex(c*z-a-e)
1316 z = z + 1.0
1317 return s
1318 else:
1319 if even:
1320 e = 1.0
1321 else:
1322 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
1323 c = 0.0
1324 while (z <= chisq):
1325 e = e * (a/float(z))
1326 c = c + e
1327 z = z + 1.0
1328 return (c*y+s)
1329 else:
1330 return s
1331
1332
1334 """
1335 Returns the complementary error function erfc(x) with fractional
1336 error everywhere less than 1.2e-7. Adapted from Numerical Recipies.
1337
1338 Usage: lerfcc(x)
1339 """
1340 z = abs(x)
1341 t = 1.0 / (1.0+0.5*z)
1342 ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
1343 if x >= 0:
1344 return ans
1345 else:
1346 return 2.0 - ans
1347
1348
1350 """
1351 Returns the area under the normal curve 'to the left of' the given z value.
1352 Thus, ::
1353 for z<0, zprob(z) = 1-tail probability
1354 for z>0, 1.0-zprob(z) = 1-tail probability
1355 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
1356 Adapted from z.c in Gary Perlman's |Stat.
1357
1358 Usage: lzprob(z)
1359 """
1360 Z_MAX = 6.0
1361 if z == 0.0:
1362 x = 0.0
1363 else:
1364 y = 0.5 * math.fabs(z)
1365 if y >= (Z_MAX*0.5):
1366 x = 1.0
1367 elif (y < 1.0):
1368 w = y*y
1369 x = ((((((((0.000124818987 * w
1370 -0.001075204047) * w +0.005198775019) * w
1371 -0.019198292004) * w +0.059054035642) * w
1372 -0.151968751364) * w +0.319152932694) * w
1373 -0.531923007300) * w +0.797884560593) * y * 2.0
1374 else:
1375 y = y - 2.0
1376 x = (((((((((((((-0.000045255659 * y
1377 +0.000152529290) * y -0.000019538132) * y
1378 -0.000676904986) * y +0.001390604284) * y
1379 -0.000794620820) * y -0.002034254874) * y
1380 +0.006549791214) * y -0.010557625006) * y
1381 +0.011630447319) * y -0.009279453341) * y
1382 +0.005353579108) * y -0.002141268741) * y
1383 +0.000535310849) * y +0.999936657524
1384 if z > 0.0:
1385 prob = ((x+1.0)*0.5)
1386 else:
1387 prob = ((1.0-x)*0.5)
1388 return prob
1389
1390
1392 """
1393 Computes a Kolmolgorov-Smirnov t-test significance level. Adapted from
1394 Numerical Recipies.
1395
1396 Usage: lksprob(alam)
1397 """
1398 fac = 2.0
1399 sum = 0.0
1400 termbf = 0.0
1401 a2 = -2.0*alam*alam
1402 for j in range(1,201):
1403 term = fac*math.exp(a2*j*j)
1404 sum = sum + term
1405 if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
1406 return sum
1407 fac = -fac
1408 termbf = math.fabs(term)
1409 return 1.0
1410
1411
1412 -def lfprob (dfnum, dfden, F):
1413 """
1414 Returns the (1-tailed) significance level (p-value) of an F
1415 statistic given the degrees of freedom for the numerator (dfR-dfF) and
1416 the degrees of freedom for the denominator (dfF).
1417
1418 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
1419 """
1420 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
1421 return p
1422
1423
1425 """
1426 This function evaluates the continued fraction form of the incomplete
1427 Beta function, betai. (Adapted from: Numerical Recipies in C.)
1428
1429 Usage: lbetacf(a,b,x)
1430 """
1431 ITMAX = 200
1432 EPS = 3.0e-7
1433
1434 bm = az = am = 1.0
1435 qab = a+b
1436 qap = a+1.0
1437 qam = a-1.0
1438 bz = 1.0-qab*x/qap
1439 for i in range(ITMAX+1):
1440 em = float(i+1)
1441 tem = em + em
1442 d = em*(b-em)*x/((qam+tem)*(a+tem))
1443 ap = az + d*am
1444 bp = bz+d*bm
1445 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
1446 app = ap+d*az
1447 bpp = bp+d*bz
1448 aold = az
1449 am = ap/bpp
1450 bm = bp/bpp
1451 az = app/bpp
1452 bz = 1.0
1453 if (abs(az-aold)<(EPS*abs(az))):
1454 return az
1455 print 'a or b too big, or ITMAX too small in Betacf.'
1456
1457
1459 """
1460 Returns the gamma function of xx.::
1461 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
1462 (Adapted from: Numerical Recipies in C.)
1463
1464 Usage: lgammln(xx)
1465 """
1466
1467 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
1468 0.120858003e-2, -0.536382e-5]
1469 x = xx - 1.0
1470 tmp = x + 5.5
1471 tmp = tmp - (x+0.5)*math.log(tmp)
1472 ser = 1.0
1473 for j in range(len(coeff)):
1474 x = x + 1
1475 ser = ser + coeff[j]/x
1476 return -tmp + math.log(2.50662827465*ser)
1477
1478
1480 """
1481 Returns the incomplete beta function::
1482
1483 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
1484
1485 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
1486 function of a. The continued fraction formulation is implemented here,
1487 using the betacf function. (Adapted from: Numerical Recipies in C.)
1488
1489 Usage: lbetai(a,b,x)
1490 """
1491 if (x<0.0 or x>1.0):
1492 raise ValueError, 'Bad x in lbetai'
1493 if (x==0.0 or x==1.0):
1494 bt = 0.0
1495 else:
1496 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b*
1497 math.log(1.0-x))
1498 if (x<(a+1.0)/(a+b+2.0)):
1499 return bt*betacf(a,b,x)/float(a)
1500 else:
1501 return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1502
1503
1504
1505
1506
1507
1509 """
1510 Performs a 1-way ANOVA, returning an F-value and probability given
1511 any number of groups. From Heiman, pp.394-7.
1512
1513 Usage: F_oneway(*lists) where *lists is any number of lists, one per treatment group
1514 Returns: F value, one-tailed p-value
1515 """
1516 a = len(lists)
1517 means = [0]*a
1518 vars = [0]*a
1519 ns = [0]*a
1520 alldata = []
1521 tmp = map(N.array,lists)
1522 means = map(amean,tmp)
1523 vars = map(avar,tmp)
1524 ns = map(len,lists)
1525 for i in range(len(lists)):
1526 alldata = alldata + lists[i]
1527 alldata = N.array(alldata)
1528 bign = len(alldata)
1529 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
1530 ssbn = 0
1531 for list in lists:
1532 ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list))
1533 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
1534 sswn = sstot-ssbn
1535 dfbn = a-1
1536 dfwn = bign - a
1537 msb = ssbn/float(dfbn)
1538 msw = sswn/float(dfwn)
1539 f = msb/msw
1540 prob = fprob(dfbn,dfwn,f)
1541 return f, prob
1542
1543
1545 """
1546 Returns an F-statistic given the following::
1547 ER = error associated with the null hypothesis (the Restricted model)
1548 EF = error associated with the alternate hypothesis (the Full model)
1549 dfR-dfF = degrees of freedom of the numerator
1550 dfF = degrees of freedom associated with the denominator/Full model
1551
1552 Usage: lF_value(ER,EF,dfnum,dfden)
1553 """
1554 return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
1555
1556
1557
1558
1559
1560
1561 -def writecc (listoflists,file,writetype='w',extra=2):
1562 """
1563 Writes a list of lists to a file in columns, customized by the max
1564 size of items within the columns (max size of items in col, +2 characters)
1565 to specified file. File-overwrite is the default.
1566
1567 Usage: writecc (listoflists,file,writetype='w',extra=2)
1568 Returns: None
1569 """
1570 if type(listoflists[0]) not in [ListType,TupleType]:
1571 listoflists = [listoflists]
1572 outfile = open(file,writetype)
1573 rowstokill = []
1574 list2print = copy.deepcopy(listoflists)
1575 for i in range(len(listoflists)):
1576 if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes':
1577 rowstokill = rowstokill + [i]
1578 rowstokill.reverse()
1579 for row in rowstokill:
1580 del list2print[row]
1581 maxsize = [0]*len(list2print[0])
1582 for col in range(len(list2print[0])):
1583 items = pstat.colex(list2print,col)
1584 items = map(pstat.makestr,items)
1585 maxsize[col] = max(map(len,items)) + extra
1586 for row in listoflists:
1587 if row == ['\n'] or row == '\n':
1588 outfile.write('\n')
1589 elif row == ['dashes'] or row == 'dashes':
1590 dashes = [0]*len(maxsize)
1591 for j in range(len(maxsize)):
1592 dashes[j] = '-'*(maxsize[j]-2)
1593 outfile.write(pstat.lineincustcols(dashes,maxsize))
1594 else:
1595 outfile.write(pstat.lineincustcols(row,maxsize))
1596 outfile.write('\n')
1597 outfile.close()
1598 return None
1599
1600
1602 """
1603 Simulate a counting system from an n-dimensional list.
1604
1605 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n
1606 Returns: next set of values for list l, OR -1 (if overflow)
1607 """
1608 l[0] = l[0] + 1
1609 for i in range(len(l)):
1610 if l[i] > cap[i] and i < len(l)-1:
1611 l[i] = 0
1612 l[i+1] = l[i+1] + 1
1613 elif l[i] > cap[i] and i == len(l)-1:
1614 l = -1
1615 return l
1616
1617
1619 """
1620 Returns the sum of the items in the passed list.
1621
1622 Usage: lsum(inlist)
1623 """
1624 s = 0
1625 for item in inlist:
1626 s = s + item
1627 return s
1628
1629
1631 """
1632 Returns a list consisting of the cumulative sum of the items in the
1633 passed list.
1634
1635 Usage: lcumsum(inlist)
1636 """
1637 newlist = copy.deepcopy(inlist)
1638 for i in range(1,len(newlist)):
1639 newlist[i] = newlist[i] + newlist[i-1]
1640 return newlist
1641
1642
1644 """
1645 Squares each value in the passed list, adds up these squares and
1646 returns the result.
1647
1648 Usage: lss(inlist)
1649 """
1650 ss = 0
1651 for item in inlist:
1652 ss = ss + item*item
1653 return ss
1654
1655
1657 """
1658 Multiplies elements in list1 and list2, element by element, and
1659 returns the sum of all resulting multiplications. Must provide equal
1660 length lists.
1661
1662 Usage: lsummult(list1,list2)
1663 """
1664 if len(list1) <> len(list2):
1665 raise ValueError, "Lists not equal length in summult."
1666 s = 0
1667 for item1,item2 in pstat.abut(list1,list2):
1668 s = s + item1*item2
1669 return s
1670
1671
1673 """
1674 Takes pairwise differences of the values in lists x and y, squares
1675 these differences, and returns the sum of these squares.
1676
1677 Usage: lsumdiffsquared(x,y)
1678 Returns: sum[(x[i]-y[i])**2]
1679 """
1680 sds = 0
1681 for i in range(len(x)):
1682 sds = sds + (x[i]-y[i])**2
1683 return sds
1684
1685
1687 """
1688 Adds the values in the passed list, squares the sum, and returns
1689 the result.
1690
1691 Usage: lsquare_of_sums(inlist)
1692 Returns: sum(inlist[i])**2
1693 """
1694 s = sum(inlist)
1695 return float(s)*s
1696
1697
1699 """
1700 Shellsort algorithm. Sorts a 1D-list.
1701
1702 Usage: lshellsort(inlist)
1703 Returns: sorted-inlist, sorting-index-vector (for original list)
1704 """
1705 n = len(inlist)
1706 svec = copy.deepcopy(inlist)
1707 ivec = range(n)
1708 gap = n/2
1709 while gap >0:
1710 for i in range(gap,n):
1711 for j in range(i-gap,-1,-gap):
1712 while j>=0 and svec[j]>svec[j+gap]:
1713 temp = svec[j]
1714 svec[j] = svec[j+gap]
1715 svec[j+gap] = temp
1716 itemp = ivec[j]
1717 ivec[j] = ivec[j+gap]
1718 ivec[j+gap] = itemp
1719 gap = gap / 2
1720
1721 return svec, ivec
1722
1723
1725 """
1726 Ranks the data in inlist, dealing with ties appropritely. Assumes
1727 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort.
1728
1729 Usage: lrankdata(inlist)
1730 Returns: a list of length equal to inlist, containing rank scores
1731 """
1732 n = len(inlist)
1733 svec, ivec = shellsort(inlist)
1734 sumranks = 0
1735 dupcount = 0
1736 newlist = [0]*n
1737 for i in range(n):
1738 sumranks = sumranks + i
1739 dupcount = dupcount + 1
1740 if i==n-1 or svec[i] <> svec[i+1]:
1741 averank = sumranks / float(dupcount) + 1
1742 for j in range(i-dupcount+1,i+1):
1743 newlist[ivec[j]] = averank
1744 sumranks = 0
1745 dupcount = 0
1746 return newlist
1747
1748
1749 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1750 """
1751 Prints or write to a file stats for two groups, using the name, n,
1752 mean, sterr, min and max for each group, as well as the statistic name,
1753 its value, and the associated p-value.
1754
1755 Usage::
1756 outputpairedstats(fname,writemode,
1757 name1,n1,mean1,stderr1,min1,max1,
1758 name2,n2,mean2,stderr2,min2,max2,
1759 statname,stat,prob)
1760 Returns: None
1761 """
1762 suffix = ''
1763 try:
1764 x = prob.shape
1765 prob = prob[0]
1766 except:
1767 pass
1768 if prob < 0.001: suffix = ' ***'
1769 elif prob < 0.01: suffix = ' **'
1770 elif prob < 0.05: suffix = ' *'
1771 title = [['Name','N','Mean','SD','Min','Max']]
1772 lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1],
1773 [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]]
1774 if type(fname)<>StringType or len(fname)==0:
1775 print
1776 print statname
1777 print
1778 pstat.printcc(lofl)
1779 print
1780 try:
1781 if stat.shape == ():
1782 stat = stat[0]
1783 if prob.shape == ():
1784 prob = prob[0]
1785 except:
1786 pass
1787 print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix
1788 print
1789 else:
1790 file = open(fname,writemode)
1791 file.write('\n'+statname+'\n\n')
1792 file.close()
1793 writecc(lofl,fname,'a')
1794 file = open(fname,'a')
1795 try:
1796 if stat.shape == ():
1797 stat = stat[0]
1798 if prob.shape == ():
1799 prob = prob[0]
1800 except:
1801 pass
1802 file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n']))
1803 file.close()
1804 return None
1805
1806
1808 """
1809 Returns an integer representing a binary vector, where 1=within-
1810 subject factor, 0=between. Input equals the entire data 2D list (i.e.,
1811 column 0=random factor, column -1=measured values (those two are skipped).
1812 Note: input data is in |Stat format ... a list of lists ("2D list") with
1813 one row per measured value, first column=subject identifier, last column=
1814 score, one in-between column per factor (these columns contain level
1815 designations on each factor). See also stats.anova.__doc__.
1816
1817 Usage: lfindwithin(data) data in |Stat format
1818 """
1819
1820 numfact = len(data[0])-1
1821 withinvec = 0
1822 for col in range(1,numfact):
1823 examplelevel = pstat.unique(pstat.colex(data,col))[0]
1824 rows = pstat.linexand(data,col,examplelevel)
1825 factsubjs = pstat.unique(pstat.colex(rows,0))
1826 allsubjs = pstat.unique(pstat.colex(data,0))
1827 if len(factsubjs) == len(allsubjs):
1828 withinvec = withinvec + (1 << col)
1829 return withinvec
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), )
1840 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), )
1841 mean = Dispatch ( (lmean, (ListType, TupleType)), )
1842 median = Dispatch ( (lmedian, (ListType, TupleType)), )
1843 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), )
1844 mode = Dispatch ( (lmode, (ListType, TupleType)), )
1845
1846
1847 moment = Dispatch ( (lmoment, (ListType, TupleType)), )
1848 variation = Dispatch ( (lvariation, (ListType, TupleType)), )
1849 skew = Dispatch ( (lskew, (ListType, TupleType)), )
1850 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), )
1851 describe = Dispatch ( (ldescribe, (ListType, TupleType)), )
1852
1853
1854 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), )
1855 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), )
1856 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), )
1857 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), )
1858 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), )
1859 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), )
1860
1861
1862 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), )
1863 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), )
1864 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), )
1865 var = Dispatch ( (lvar, (ListType, TupleType)), )
1866 stdev = Dispatch ( (lstdev, (ListType, TupleType)), )
1867 sterr = Dispatch ( (lsterr, (ListType, TupleType)), )
1868 sem = Dispatch ( (lsem, (ListType, TupleType)), )
1869 z = Dispatch ( (lz, (ListType, TupleType)), )
1870 zs = Dispatch ( (lzs, (ListType, TupleType)), )
1871
1872
1873 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), )
1874 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), )
1875
1876
1877 paired = Dispatch ( (lpaired, (ListType, TupleType)), )
1878 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), )
1879 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), )
1880 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), )
1881 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), )
1882 linregress = Dispatch ( (llinregress, (ListType, TupleType)), )
1883
1884
1885 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), )
1886 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), )
1887 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), )
1888 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), )
1889 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), )
1890 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), )
1891 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), )
1892 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), )
1893 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), )
1894 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), )
1895 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), )
1896
1897
1898 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), )
1899 zprob = Dispatch ( (lzprob, (IntType, FloatType)), )
1900 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), )
1901 fprob = Dispatch ( (lfprob, (IntType, FloatType)), )
1902 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), )
1903 betai = Dispatch ( (lbetai, (IntType, FloatType)), )
1904 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), )
1905 gammln = Dispatch ( (lgammln, (IntType, FloatType)), )
1906
1907
1908 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), )
1909 F_value = Dispatch ( (lF_value, (ListType, TupleType)), )
1910
1911
1912 incr = Dispatch ( (lincr, (ListType, TupleType)), )
1913 sum = Dispatch ( (lsum, (ListType, TupleType)), )
1914 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), )
1915 ss = Dispatch ( (lss, (ListType, TupleType)), )
1916 summult = Dispatch ( (lsummult, (ListType, TupleType)), )
1917 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), )
1918 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), )
1919 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), )
1920 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), )
1921 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), )
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944 try:
1945 import Numeric
1946 N = Numeric
1947 import LinearAlgebra
1948 LA = LinearAlgebra
1949
1950
1951
1952
1953
1954
1956 """
1957 Calculates the geometric mean of the values in the passed array.
1958 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in
1959 the passed array. Use dimension=None to flatten array first. REMEMBER: if
1960 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
1961 if dimension is a sequence, it collapses over all specified dimensions. If
1962 keepdims is set to 1, the resulting array will have as many dimensions as
1963 inarray, with only 1 'level' per dim that was collapsed over.
1964
1965 Usage: ageometricmean(inarray,dimension=None,keepdims=0)
1966 Returns: geometric mean computed over dim(s) listed in dimension
1967 """
1968 inarray = N.array(inarray,N.Float)
1969 if dimension == None:
1970 inarray = N.ravel(inarray)
1971 size = len(inarray)
1972 mult = N.power(inarray,1.0/size)
1973 mult = N.multiply.reduce(mult)
1974 elif type(dimension) in [IntType,FloatType]:
1975 size = inarray.shape[dimension]
1976 mult = N.power(inarray,1.0/size)
1977 mult = N.multiply.reduce(mult,dimension)
1978 if keepdims == 1:
1979 shp = list(inarray.shape)
1980 shp[dimension] = 1
1981 sum = N.reshape(sum,shp)
1982 else:
1983 dims = list(dimension)
1984 dims.sort()
1985 dims.reverse()
1986 size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float)
1987 mult = N.power(inarray,1.0/size)
1988 for dim in dims:
1989 mult = N.multiply.reduce(mult,dim)
1990 if keepdims == 1:
1991 shp = list(inarray.shape)
1992 for dim in dims:
1993 shp[dim] = 1
1994 mult = N.reshape(mult,shp)
1995 return mult
1996
1997
1999 """
2000 Calculates the harmonic mean of the values in the passed array.
2001 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in
2002 the passed array. Use dimension=None to flatten array first. REMEMBER: if
2003 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2004 if dimension is a sequence, it collapses over all specified dimensions. If
2005 keepdims is set to 1, the resulting array will have as many dimensions as
2006 inarray, with only 1 'level' per dim that was collapsed over.
2007
2008 Usage: aharmonicmean(inarray,dimension=None,keepdims=0)
2009 Returns: harmonic mean computed over dim(s) in dimension
2010 """
2011 inarray = inarray.astype(N.Float)
2012 if dimension == None:
2013 inarray = N.ravel(inarray)
2014 size = len(inarray)
2015 s = N.add.reduce(1.0 / inarray)
2016 elif type(dimension) in [IntType,FloatType]:
2017 size = float(inarray.shape[dimension])
2018 s = N.add.reduce(1.0/inarray, dimension)
2019 if keepdims == 1:
2020 shp = list(inarray.shape)
2021 shp[dimension] = 1
2022 s = N.reshape(s,shp)
2023 else:
2024 dims = list(dimension)
2025 dims.sort()
2026 nondims = []
2027 for i in range(len(inarray.shape)):
2028 if i not in dims:
2029 nondims.append(i)
2030 tinarray = N.transpose(inarray,nondims+dims)
2031 idx = [0] *len(nondims)
2032 if idx == []:
2033 size = len(N.ravel(inarray))
2034 s = asum(1.0 / inarray)
2035 if keepdims == 1:
2036 s = N.reshape([s],N.ones(len(inarray.shape)))
2037 else:
2038 idx[0] = -1
2039 loopcap = N.array(tinarray.shape[0:len(nondims)]) -1
2040 s = N.zeros(loopcap+1,N.Float)
2041 while incr(idx,loopcap) <> -1:
2042 s[idx] = asum(1.0/tinarray[idx])
2043 size = N.multiply.reduce(N.take(inarray.shape,dims))
2044 if keepdims == 1:
2045 shp = list(inarray.shape)
2046 for dim in dims:
2047 shp[dim] = 1
2048 s = N.reshape(s,shp)
2049 return size / s
2050
2051
2052 - def amean (inarray,dimension=None,keepdims=0):
2053 """
2054 Calculates the arithmatic mean of the values in the passed array.
2055 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the
2056 passed array. Use dimension=None to flatten array first. REMEMBER: if
2057 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2058 if dimension is a sequence, it collapses over all specified dimensions. If
2059 keepdims is set to 1, the resulting array will have as many dimensions as
2060 inarray, with only 1 'level' per dim that was collapsed over.
2061
2062 Usage: amean(inarray,dimension=None,keepdims=0)
2063 Returns: arithematic mean calculated over dim(s) in dimension
2064 """
2065 if inarray.typecode() in ['l','s','b']:
2066 inarray = inarray.astype(N.Float)
2067 if dimension == None:
2068 inarray = N.ravel(inarray)
2069 sum = N.add.reduce(inarray)
2070 denom = float(len(inarray))
2071 elif type(dimension) in [IntType,FloatType]:
2072 sum = asum(inarray,dimension)
2073 denom = float(inarray.shape[dimension])
2074 if keepdims == 1:
2075 shp = list(inarray.shape)
2076 shp[dimension] = 1
2077 sum = N.reshape(sum,shp)
2078 else:
2079 dims = list(dimension)
2080 dims.sort()
2081 dims.reverse()
2082 sum = inarray *1.0
2083 for dim in dims:
2084 sum = N.add.reduce(sum,dim)
2085 denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float)
2086 if keepdims == 1:
2087 shp = list(inarray.shape)
2088 for dim in dims:
2089 shp[dim] = 1
2090 sum = N.reshape(sum,shp)
2091 return sum/denom
2092
2093
2116
2117
2141
2142
2143 - def amode(a, dimension=None):
2144 """
2145 Returns an array of the modal (most common) score in the passed array.
2146 If there is more than one such score, ONLY THE FIRST is returned.
2147 The bin-count for the modal values is also returned. Operates on whole
2148 array (dimension=None), or on a given dimension.
2149
2150 Usage: amode(a, dimension=None)
2151 Returns: array of bin-counts for mode(s), array of corresponding modal values
2152 """
2153
2154 if dimension == None:
2155 a = N.ravel(a)
2156 dimension = 0
2157 scores = pstat.aunique(N.ravel(a))
2158 testshape = list(a.shape)
2159 testshape[dimension] = 1
2160 oldmostfreq = N.zeros(testshape)
2161 oldcounts = N.zeros(testshape)
2162 for score in scores:
2163 template = N.equal(a,score)
2164 counts = asum(template,dimension,1)
2165 mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq)
2166 oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts)
2167 oldmostfreq = mostfrequent
2168 return oldcounts, mostfrequent
2169
2170
2171 - def atmean(a,limits=None,inclusive=(1,1)):
2172 """
2173 Returns the arithmetic mean of all values in an array, ignoring values
2174 strictly outside the sequence passed to 'limits'. Note: either limit
2175 in the sequence, or the value of limits itself, can be set to None. The
2176 inclusive list/tuple determines whether the lower and upper limiting bounds
2177 (respectively) are open/exclusive (0) or closed/inclusive (1).
2178
2179 Usage: atmean(a,limits=None,inclusive=(1,1))
2180 """
2181 if a.typecode() in ['l','s','b']:
2182 a = a.astype(N.Float)
2183 if limits == None:
2184 return mean(a)
2185 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean"
2186 if inclusive[0]: lowerfcn = N.greater_equal
2187 else: lowerfcn = N.greater
2188 if inclusive[1]: upperfcn = N.less_equal
2189 else: upperfcn = N.less
2190 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2191 raise ValueError, "No array values within given limits (atmean)."
2192 elif limits[0]==None and limits[1]<>None:
2193 mask = upperfcn(a,limits[1])
2194 elif limits[0]<>None and limits[1]==None:
2195 mask = lowerfcn(a,limits[0])
2196 elif limits[0]<>None and limits[1]<>None:
2197 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2198 s = float(N.add.reduce(N.ravel(a*mask)))
2199 n = float(N.add.reduce(N.ravel(mask)))
2200 return s/n
2201
2202
2203 - def atvar(a,limits=None,inclusive=(1,1)):
2204 """
2205 Returns the sample variance of values in an array, (i.e., using N-1),
2206 ignoring values strictly outside the sequence passed to 'limits'.
2207 Note: either limit in the sequence, or the value of limits itself,
2208 can be set to None. The inclusive list/tuple determines whether the lower
2209 and upper limiting bounds (respectively) are open/exclusive (0) or
2210 closed/inclusive (1).
2211
2212 Usage: atvar(a,limits=None,inclusive=(1,1))
2213 """
2214 a = a.astype(N.Float)
2215 if limits == None or limits == [None,None]:
2216 term1 = N.add.reduce(N.ravel(a*a))
2217 n = float(len(N.ravel(a))) - 1
2218 term2 = N.add.reduce(N.ravel(a))**2 / n
2219 return (term1 - term2) / n
2220 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar"
2221 if inclusive[0]: lowerfcn = N.greater_equal
2222 else: lowerfcn = N.greater
2223 if inclusive[1]: upperfcn = N.less_equal
2224 else: upperfcn = N.less
2225 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2226 raise ValueError, "No array values within given limits (atvar)."
2227 elif limits[0]==None and limits[1]<>None:
2228 mask = upperfcn(a,limits[1])
2229 elif limits[0]<>None and limits[1]==None:
2230 mask = lowerfcn(a,limits[0])
2231 elif limits[0]<>None and limits[1]<>None:
2232 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2233 term1 = N.add.reduce(N.ravel(a*a*mask))
2234 n = float(N.add.reduce(N.ravel(mask))) - 1
2235 term2 = N.add.reduce(N.ravel(a*mask))**2 / n
2236 return (term1 - term2) / n
2237
2238
2239 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2240 """
2241 Returns the minimum value of a, along dimension, including only values less
2242 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None,
2243 all values in the array are used.
2244
2245 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1)
2246 """
2247 if inclusive: lowerfcn = N.greater
2248 else: lowerfcn = N.greater_equal
2249 if dimension == None:
2250 a = N.ravel(a)
2251 dimension = 0
2252 if lowerlimit == None:
2253 lowerlimit = N.minimum.reduce(N.ravel(a))-11
2254 biggest = N.maximum.reduce(N.ravel(a))
2255 ta = N.where(lowerfcn(a,lowerlimit),a,biggest)
2256 return N.minimum.reduce(ta,dimension)
2257
2258
2259 - def atmax(a,upperlimit,dimension=None,inclusive=1):
2260 """
2261 Returns the maximum value of a, along dimension, including only values greater
2262 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None,
2263 a limit larger than the max value in the array is used.
2264
2265 Usage: atmax(a,upperlimit,dimension=None,inclusive=1)
2266 """
2267 if inclusive: upperfcn = N.less
2268 else: upperfcn = N.less_equal
2269 if dimension == None:
2270 a = N.ravel(a)
2271 dimension = 0
2272 if upperlimit == None:
2273 upperlimit = N.maximum.reduce(N.ravel(a))+1
2274 smallest = N.minimum.reduce(N.ravel(a))
2275 ta = N.where(upperfcn(a,upperlimit),a,smallest)
2276 return N.maximum.reduce(ta,dimension)
2277
2278
2279 - def atstdev(a,limits=None,inclusive=(1,1)):
2280 """
2281 Returns the standard deviation of all values in an array, ignoring values
2282 strictly outside the sequence passed to 'limits'. Note: either limit
2283 in the sequence, or the value of limits itself, can be set to None. The
2284 inclusive list/tuple determines whether the lower and upper limiting bounds
2285 (respectively) are open/exclusive (0) or closed/inclusive (1).
2286
2287 Usage: atstdev(a,limits=None,inclusive=(1,1))
2288 """
2289 return N.sqrt(tvar(a,limits,inclusive))
2290
2291
2292 - def atsem(a,limits=None,inclusive=(1,1)):
2293 """
2294 Returns the standard error of the mean for the values in an array,
2295 (i.e., using N for the denominator), ignoring values strictly outside
2296 the sequence passed to 'limits'. Note: either limit in the sequence,
2297 or the value of limits itself, can be set to None. The inclusive list/tuple
2298 determines whether the lower and upper limiting bounds (respectively) are
2299 open/exclusive (0) or closed/inclusive (1).
2300
2301 Usage: atsem(a,limits=None,inclusive=(1,1))
2302 """
2303 sd = tstdev(a,limits,inclusive)
2304 if limits == None or limits == [None,None]:
2305 n = float(len(N.ravel(a)))
2306 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem"
2307 if inclusive[0]: lowerfcn = N.greater_equal
2308 else: lowerfcn = N.greater
2309 if inclusive[1]: upperfcn = N.less_equal
2310 else: upperfcn = N.less
2311 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2312 raise ValueError, "No array values within given limits (atsem)."
2313 elif limits[0]==None and limits[1]<>None:
2314 mask = upperfcn(a,limits[1])
2315 elif limits[0]<>None and limits[1]==None:
2316 mask = lowerfcn(a,limits[0])
2317 elif limits[0]<>None and limits[1]<>None:
2318 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2319 term1 = N.add.reduce(N.ravel(a*a*mask))
2320 n = float(N.add.reduce(N.ravel(mask)))
2321 return sd/math.sqrt(n)
2322
2323
2324
2325
2326
2327
2328 - def amoment(a,moment=1,dimension=None):
2329 """
2330 Calculates the nth moment about the mean for a sample (defaults to the
2331 1st moment). Generally used to calculate coefficients of skewness and
2332 kurtosis. Dimension can equal None (ravel array first), an integer
2333 (the dimension over which to operate), or a sequence (operate over
2334 multiple dimensions).
2335
2336 Usage: amoment(a,moment=1,dimension=None)
2337 Returns: appropriate moment along given dimension
2338 """
2339 if dimension == None:
2340 a = N.ravel(a)
2341 dimension = 0
2342 if moment == 1:
2343 return 0.0
2344 else:
2345 mn = amean(a,dimension,1)
2346 s = N.power((a-mn),moment)
2347 return amean(s,dimension)
2348
2349
2351 """
2352 Returns the coefficient of variation, as defined in CRC Standard
2353 Probability and Statistics, p.6. Dimension can equal None (ravel array
2354 first), an integer (the dimension over which to operate), or a
2355 sequence (operate over multiple dimensions).
2356
2357 Usage: avariation(a,dimension=None)
2358 """
2359 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2360
2361
2362 - def askew(a,dimension=None):
2363 """
2364 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
2365 weight in left tail). Use askewtest() to see if it's close enough.
2366 Dimension can equal None (ravel array first), an integer (the
2367 dimension over which to operate), or a sequence (operate over multiple
2368 dimensions).
2369
2370 Usage: askew(a, dimension=None)
2371 Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2372 """
2373 denom = N.power(amoment(a,2,dimension),1.5)
2374 zero = N.equal(denom,0)
2375 if type(denom) == N.ArrayType and asum(zero) <> 0:
2376 print "Number of zeros in askew: ",asum(zero)
2377 denom = denom + zero
2378 return N.where(zero, 0, amoment(a,3,dimension)/denom)
2379
2380
2382 """
2383 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2384 heavier in the tails, and usually more peaked). Use akurtosistest()
2385 to see if it's close enough. Dimension can equal None (ravel array
2386 first), an integer (the dimension over which to operate), or a
2387 sequence (operate over multiple dimensions).
2388
2389 Usage: akurtosis(a,dimension=None)
2390 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2391 """
2392 denom = N.power(amoment(a,2,dimension),2)
2393 zero = N.equal(denom,0)
2394 if type(denom) == N.ArrayType and asum(zero) <> 0:
2395 print "Number of zeros in akurtosis: ",asum(zero)
2396 denom = denom + zero
2397 return N.where(zero,0,amoment(a,4,dimension)/denom)
2398
2399
2401 """
2402 Returns several descriptive statistics of the passed array. Dimension
2403 can equal None (ravel array first), an integer (the dimension over
2404 which to operate), or a sequence (operate over multiple dimensions).
2405
2406 Usage: adescribe(inarray,dimension=None)
2407 Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2408 """
2409 if dimension == None:
2410 inarray = N.ravel(inarray)
2411 dimension = 0
2412 n = inarray.shape[dimension]
2413 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray))
2414 m = amean(inarray,dimension)
2415 sd = astdev(inarray,dimension)
2416 skew = askew(inarray,dimension)
2417 kurt = akurtosis(inarray,dimension)
2418 return n, mm, m, sd, skew, kurt
2419
2420
2421
2422
2423
2424
2426 """
2427 Tests whether the skew is significantly different from a normal
2428 distribution. Dimension can equal None (ravel array first), an
2429 integer (the dimension over which to operate), or a sequence (operate
2430 over multiple dimensions).
2431
2432 Usage: askewtest(a,dimension=None)
2433 Returns: z-score and 2-tail z-probability
2434 """
2435 if dimension == None:
2436 a = N.ravel(a)
2437 dimension = 0
2438 b2 = askew(a,dimension)
2439 n = float(a.shape[dimension])
2440 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) )
2441 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) )
2442 W2 = -1 + N.sqrt(2*(beta2-1))
2443 delta = 1/N.sqrt(N.log(N.sqrt(W2)))
2444 alpha = N.sqrt(2/(W2-1))
2445 y = N.where(N.equal(y,0),1,y)
2446 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
2447 return Z, (1.0-zprob(Z))*2
2448
2449
2451 """
2452 Tests whether a dataset has normal kurtosis (i.e.,
2453 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None
2454 (ravel array first), an integer (the dimension over which to operate),
2455 or a sequence (operate over multiple dimensions).
2456
2457 Usage: akurtosistest(a,dimension=None)
2458 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
2459 """
2460 if dimension == None:
2461 a = N.ravel(a)
2462 dimension = 0
2463 n = float(a.shape[dimension])
2464 if n<20:
2465 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n
2466 b2 = akurtosis(a,dimension)
2467 E = 3.0*(n-1) /(n+1)
2468 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
2469 x = (b2-E)/N.sqrt(varb2)
2470 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/
2471 (n*(n-2)*(n-3)))
2472 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
2473 term1 = 1 -2/(9.0*A)
2474 denom = 1 +x*N.sqrt(2/(A-4.0))
2475 denom = N.where(N.less(denom,0), 99, denom)
2476 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0))
2477 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A))
2478 Z = N.where(N.equal(denom,99), 0, Z)
2479 return Z, (1.0-zprob(Z))*2
2480
2481
2483 """
2484 Tests whether skew and/OR kurtosis of dataset differs from normal
2485 curve. Can operate over multiple dimensions. Dimension can equal
2486 None (ravel array first), an integer (the dimension over which to
2487 operate), or a sequence (operate over multiple dimensions).
2488
2489 Usage: anormaltest(a,dimension=None)
2490 Returns: z-score and 2-tail probability
2491 """
2492 if dimension == None:
2493 a = N.ravel(a)
2494 dimension = 0
2495 s,p = askewtest(a,dimension)
2496 k,p = akurtosistest(a,dimension)
2497 k2 = N.power(s,2) + N.power(k,2)
2498 return k2, achisqprob(k2,2)
2499
2500
2501
2502
2503
2504
2506 """
2507 Returns a 2D array of item frequencies. Column 1 contains item values,
2508 column 2 contains their respective counts. Assumes a 1D array is passed.
2509
2510 Usage: aitemfreq(a)
2511 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
2512 """
2513 scores = pstat.aunique(a)
2514 scores = N.sort(scores)
2515 freq = N.zeros(len(scores))
2516 for i in range(len(scores)):
2517 freq[i] = N.add.reduce(N.equal(a,scores[i]))
2518 return N.array(pstat.aabut(scores, freq))
2519
2520
2522 """
2523 Usage: ascoreatpercentile(inarray,percent) 0<percent<100
2524 Returns: score at given percentile, relative to inarray distribution
2525 """
2526 percent = percent / 100.0
2527 targetcf = percent*len(inarray)
2528 h, lrl, binsize, extras = histogram(inarray)
2529 cumhist = cumsum(h*1)
2530 for i in range(len(cumhist)):
2531 if cumhist[i] >= targetcf:
2532 break
2533 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
2534 return score
2535
2536
2538 """
2539 Note: result of this function depends on the values used to histogram
2540 the data(!).
2541
2542 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2543 Returns: percentile-position of score (0-100) relative to inarray
2544 """
2545 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits)
2546 cumhist = cumsum(h*1)
2547 i = int((score - lrl)/float(binsize))
2548 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
2549 return pct
2550
2551
2552 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2553 """
2554 Returns (i) an array of histogram bin counts, (ii) the smallest value
2555 of the histogram binning, and (iii) the bin width (the last 2 are not
2556 necessarily integers). Default number of bins is 10. Defaultlimits
2557 can be None (the routine picks bins spanning all the numbers in the
2558 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the
2559 following: array of bin values, lowerreallimit, binsize, extrapoints.
2560
2561 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
2562 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
2563 """
2564 inarray = N.ravel(inarray)
2565 if (defaultlimits <> None):
2566 lowerreallimit = defaultlimits[0]
2567 upperreallimit = defaultlimits[1]
2568 binsize = (upperreallimit-lowerreallimit) / float(numbins)
2569 else:
2570 Min = N.minimum.reduce(inarray)
2571 Max = N.maximum.reduce(inarray)
2572 estbinwidth = float(Max - Min)/float(numbins) + 1
2573 binsize = (Max-Min+estbinwidth)/float(numbins)
2574 lowerreallimit = Min - binsize/2.0
2575 bins = N.zeros(numbins)
2576 extrapoints = 0
2577 for num in inarray:
2578 try:
2579 if (num-lowerreallimit) < 0:
2580 extrapoints = extrapoints + 1
2581 else:
2582 bintoincrement = int((num-lowerreallimit) / float(binsize))
2583 bins[bintoincrement] = bins[bintoincrement] + 1
2584 except:
2585 extrapoints = extrapoints + 1
2586 if (extrapoints > 0 and printextras == 1):
2587 print '\nPoints outside given histogram range =',extrapoints
2588 return (bins, lowerreallimit, binsize, extrapoints)
2589
2590
2591 - def acumfreq(a,numbins=10,defaultreallimits=None):
2592 """
2593 Returns a cumulative frequency histogram, using the histogram function.
2594 Defaultreallimits can be None (use all data), or a 2-sequence containing
2595 lower and upper limits on values to include.
2596
2597 Usage: acumfreq(a,numbins=10,defaultreallimits=None)
2598 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2599 """
2600 h,l,b,e = histogram(a,numbins,defaultreallimits)
2601 cumhist = cumsum(h*1)
2602 return cumhist,l,b,e
2603
2604
2605 - def arelfreq(a,numbins=10,defaultreallimits=None):
2606 """
2607 Returns a relative frequency histogram, using the histogram function.
2608 Defaultreallimits can be None (use all data), or a 2-sequence containing
2609 lower and upper limits on values to include.
2610
2611 Usage: arelfreq(a,numbins=10,defaultreallimits=None)
2612 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2613 """
2614 h,l,b,e = histogram(a,numbins,defaultreallimits)
2615 h = N.array(h/float(a.shape[0]))
2616 return h,l,b,e
2617
2618
2619
2620
2621
2622
2659
2660
2661 - def asamplevar (inarray,dimension=None,keepdims=0):
2662 """
2663 Returns the sample standard deviation of the values in the passed
2664 array (i.e., using N). Dimension can equal None (ravel array first),
2665 an integer (the dimension over which to operate), or a sequence
2666 (operate over multiple dimensions). Set keepdims=1 to return an array
2667 with the same number of dimensions as inarray.
2668
2669 Usage: asamplevar(inarray,dimension=None,keepdims=0)
2670 """
2671 if dimension == None:
2672 inarray = N.ravel(inarray)
2673 dimension = 0
2674 if dimension == 1:
2675 mn = amean(inarray,dimension)[:,N.NewAxis]
2676 else:
2677 mn = amean(inarray,dimension,keepdims=1)
2678 deviations = inarray - mn
2679 if type(dimension) == ListType:
2680 n = 1
2681 for d in dimension:
2682 n = n*inarray.shape[d]
2683 else:
2684 n = inarray.shape[dimension]
2685 svar = ass(deviations,dimension,keepdims) / float(n)
2686 return svar
2687
2688
2690 """
2691 Returns the sample standard deviation of the values in the passed
2692 array (i.e., using N). Dimension can equal None (ravel array first),
2693 an integer (the dimension over which to operate), or a sequence
2694 (operate over multiple dimensions). Set keepdims=1 to return an array
2695 with the same number of dimensions as inarray.
2696
2697 Usage: asamplestdev(inarray,dimension=None,keepdims=0)
2698 """
2699 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2700
2701
2703 """
2704 Calculates signal-to-noise. Dimension can equal None (ravel array
2705 first), an integer (the dimension over which to operate), or a
2706 sequence (operate over multiple dimensions).
2707
2708 Usage: asignaltonoise(instack,dimension=0):
2709 Returns: array containing the value of (mean/stdev) along dimension, or 0 when stdev=0
2710 """
2711 m = mean(instack,dimension)
2712 sd = stdev(instack,dimension)
2713 return N.where(N.equal(sd,0),0,m/sd)
2714
2715
2716 - def avar (inarray, dimension=None,keepdims=0):
2717 """
2718 Returns the estimated population variance of the values in the passed
2719 array (i.e., N-1). Dimension can equal None (ravel array first), an
2720 integer (the dimension over which to operate), or a sequence (operate
2721 over multiple dimensions). Set keepdims=1 to return an array with the
2722 same number of dimensions as inarray.
2723
2724 Usage: avar(inarray,dimension=None,keepdims=0)
2725 """
2726 if dimension == None:
2727 inarray = N.ravel(inarray)
2728 dimension = 0
2729 mn = amean(inarray,dimension,1)
2730 deviations = inarray - mn
2731 if type(dimension) == ListType:
2732 n = 1
2733 for d in dimension:
2734 n = n*inarray.shape[d]
2735 else:
2736 n = inarray.shape[dimension]
2737 var = ass(deviations,dimension,keepdims)/float(n-1)
2738 return var
2739
2740
2741 - def astdev (inarray, dimension=None, keepdims=0):
2742 """
2743 Returns the estimated population standard deviation of the values in
2744 the passed array (i.e., N-1). Dimension can equal None (ravel array
2745 first), an integer (the dimension over which to operate), or a
2746 sequence (operate over multiple dimensions). Set keepdims=1 to return
2747 an array with the same number of dimensions as inarray.
2748
2749 Usage: astdev(inarray,dimension=None,keepdims=0)
2750 """
2751 return N.sqrt(avar(inarray,dimension,keepdims))
2752
2753
2754 - def asterr (inarray, dimension=None, keepdims=0):
2755 """
2756 Returns the estimated population standard error of the values in the
2757 passed array (i.e., N-1). Dimension can equal None (ravel array
2758 first), an integer (the dimension over which to operate), or a
2759 sequence (operate over multiple dimensions). Set keepdims=1 to return
2760 an array with the same number of dimensions as inarray.
2761
2762 Usage: asterr(inarray,dimension=None,keepdims=0)
2763 """
2764 if dimension == None:
2765 inarray = N.ravel(inarray)
2766 dimension = 0
2767 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2768
2769
2770 - def asem (inarray, dimension=None, keepdims=0):
2771 """
2772 Returns the standard error of the mean (i.e., using N) of the values
2773 in the passed array. Dimension can equal None (ravel array first), an
2774 integer (the dimension over which to operate), or a sequence (operate
2775 over multiple dimensions). Set keepdims=1 to return an array with the
2776 same number of dimensions as inarray.
2777
2778 Usage: asem(inarray,dimension=None, keepdims=0)
2779 """
2780 if dimension == None:
2781 inarray = N.ravel(inarray)
2782 dimension = 0
2783 if type(dimension) == ListType:
2784 n = 1
2785 for d in dimension:
2786 n = n*inarray.shape[d]
2787 else:
2788 n = inarray.shape[dimension]
2789 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1)
2790 return s
2791
2792
2793 - def az (a, score):
2794 """
2795 Returns the z-score of a given input score, given thearray from which
2796 that score came. Not appropriate for population calculations, nor for
2797 arrays > 1D.
2798
2799 Usage: az(a, score)
2800 """
2801 z = (score-amean(a)) / asamplestdev(a)
2802 return z
2803
2804
2806 """
2807 Returns a 1D array of z-scores, one for each score in the passed array,
2808 computed relative to the passed array.
2809
2810 Usage: azs(a)
2811 """
2812 zscores = []
2813 for item in a:
2814 zscores.append(z(a,item))
2815 return N.array(zscores)
2816
2817
2818 - def azmap (scores, compare, dimension=0):
2819 """
2820 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
2821 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0
2822 of the compare array.
2823
2824 Usage: azs(scores, compare, dimension=0)
2825 """
2826 mns = amean(compare,dimension)
2827 sstd = asamplestdev(compare,0)
2828 return (scores - mns) / sstd
2829
2830
2831
2832
2833
2834
2836 """
2837 Rounds all values in array a to 'digits' decimal places.
2838
2839 Usage: around(a,digits)
2840 Returns: a, where each value is rounded to 'digits' decimals
2841 """
2842 def ar(x,d=digits):
2843 return round(x,d)
2844
2845 if type(a) <> N.ArrayType:
2846 try:
2847 a = N.array(a)
2848 except:
2849 a = N.array(a,'O')
2850 shp = a.shape
2851 if a.typecode() in ['f','F','d','D']:
2852 b = N.ravel(a)
2853 b = N.array(map(ar,b))
2854 b.shape = shp
2855 elif a.typecode() in ['o','O']:
2856 b = N.ravel(a)*1
2857 for i in range(len(b)):
2858 if type(b[i]) == FloatType:
2859 b[i] = round(b[i],digits)
2860 b.shape = shp
2861 else:
2862 b = a*1
2863 return b
2864
2865
2866 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2867 """
2868 Like Numeric.clip() except that values <threshmid or >threshmax are replaced
2869 by newval instead of by threshmin/threshmax (respectively).
2870
2871 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0)
2872 Returns: a, with values <threshmin or >threshmax replaced with newval
2873 """
2874 mask = N.zeros(a.shape)
2875 if threshmin <> None:
2876 mask = mask + N.where(N.less(a,threshmin),1,0)
2877 if threshmax <> None:
2878 mask = mask + N.where(N.greater(a,threshmax),1,0)
2879 mask = N.clip(mask,0,1)
2880 return N.where(mask,newval,a)
2881
2882
2884 """
2885 Slices off the passed proportion of items from BOTH ends of the passed
2886 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
2887 'rightmost' 10% of scores. You must pre-sort the array if you want
2888 "proper" trimming. Slices off LESS if proportion results in a
2889 non-integer slice index (i.e., conservatively slices off
2890 proportiontocut).
2891
2892 Usage: atrimboth (a,proportiontocut)
2893 Returns: trimmed version of array a
2894 """
2895 lowercut = int(proportiontocut*len(a))
2896 uppercut = len(a) - lowercut
2897 return a[lowercut:uppercut]
2898
2899
2900 - def atrim1 (a,proportiontocut,tail='right'):
2901 """
2902 Slices off the passed proportion of items from ONE end of the passed
2903 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
2904 10% of scores). Slices off LESS if proportion results in a non-integer
2905 slice index (i.e., conservatively slices off proportiontocut).
2906
2907 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left'
2908 Returns: trimmed version of array a
2909 """
2910 if string.lower(tail) == 'right':
2911 lowercut = 0
2912 uppercut = len(a) - int(proportiontocut*len(a))
2913 elif string.lower(tail) == 'left':
2914 lowercut = int(proportiontocut*len(a))
2915 uppercut = len(a)
2916 return a[lowercut:uppercut]
2917
2918
2919
2920
2921
2922
2924 """
2925 Computes the covariance matrix of a matrix X. Requires a 2D matrix input.
2926
2927 Usage: acovariance(X)
2928 Returns: covariance matrix of X
2929 """
2930 if len(X.shape) <> 2:
2931 raise TypeError, "acovariance requires 2D matrices"
2932 n = X.shape[0]
2933 mX = amean(X,0)
2934 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2935
2936
2938 """
2939 Computes the correlation matrix of a matrix X. Requires a 2D matrix input.
2940
2941 Usage: acorrelation(X)
2942 Returns: correlation matrix of X
2943 """
2944 C = acovariance(X)
2945 V = N.diagonal(C)
2946 return C / N.sqrt(N.multiply.outer(V,V))
2947
2948
2950 """
2951 Interactively determines the type of data in x and y, and then runs the
2952 appropriated statistic for paired group data.
2953
2954 Usage: apaired(x,y) x,y = the two arrays of values to be compared
2955 Returns: appropriate statistic name, value, and probability
2956 """
2957 samples = ''
2958 while samples not in ['i','r','I','R','c','C']:
2959 print '\nIndependent or related samples, or correlation (i,r,c): ',
2960 samples = raw_input()
2961
2962 if samples in ['i','I','r','R']:
2963 print '\nComparing variances ...',
2964
2965 r = obrientransform(x,y)
2966 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
2967 if p<0.05:
2968 vartype='unequal, p='+str(round(p,4))
2969 else:
2970 vartype='equal'
2971 print vartype
2972 if samples in ['i','I']:
2973 if vartype[0]=='e':
2974 t,p = ttest_ind(x,y,None,0)
2975 print '\nIndependent samples t-test: ', round(t,4),round(p,4)
2976 else:
2977 if len(x)>20 or len(y)>20:
2978 z,p = ranksums(x,y)
2979 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
2980 else:
2981 u,p = mannwhitneyu(x,y)
2982 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
2983
2984 else:
2985 if vartype[0]=='e':
2986 t,p = ttest_rel(x,y,0)
2987 print '\nRelated samples t-test: ', round(t,4),round(p,4)
2988 else:
2989 t,p = ranksums(x,y)
2990 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
2991 else:
2992 corrtype = ''
2993 while corrtype not in ['c','C','r','R','d','D']:
2994 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
2995 corrtype = raw_input()
2996 if corrtype in ['c','C']:
2997 m,b,r,p,see = linregress(x,y)
2998 print '\nLinear regression for continuous variables ...'
2999 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
3000 pstat.printcc(lol)
3001 elif corrtype in ['r','R']:
3002 r,p = spearmanr(x,y)
3003 print '\nCorrelation for ranked variables ...'
3004 print "Spearman's r: ",round(r,4),round(p,4)
3005 else:
3006 r,p = pointbiserialr(x,y)
3007 print '\nAssuming x contains a dichotomous variable ...'
3008 print 'Point Biserial r: ',round(r,4),round(p,4)
3009 print '\n\n'
3010 return None
3011
3012
3014 """
3015 Calculates a Pearson correlation coefficient and returns p. Taken
3016 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
3017
3018 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays
3019 Returns: Pearson's r, two-tailed p-value
3020 """
3021 TINY = 1.0e-20
3022 n = len(x)
3023 xmean = amean(x)
3024 ymean = amean(y)
3025 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3026 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3027 r = (r_num / r_den)
3028 df = n-2
3029 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3030 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose)
3031 return r,prob
3032
3033
3035 """
3036 Calculates a Spearman rank-order correlation coefficient. Taken
3037 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
3038
3039 Usage: aspearmanr(x,y) where x,y are equal-length arrays
3040 Returns: Spearman's r, two-tailed p-value
3041 """
3042 TINY = 1e-30
3043 n = len(x)
3044 rankx = rankdata(x)
3045 ranky = rankdata(y)
3046 dsq = N.add.reduce((rankx-ranky)**2)
3047 rs = 1 - 6*dsq / float(n*(n**2-1))
3048 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
3049 df = n-2
3050 probrs = abetai(0.5*df,0.5,df/(df+t*t))
3051
3052
3053 return rs, probrs
3054
3055
3057 """
3058 Calculates a point-biserial correlation coefficient and the associated
3059 probability value. Taken from Heiman's Basic Statistics for the Behav.
3060 Sci (1st), p.194.
3061
3062 Usage: apointbiserialr(x,y) where x,y are equal length arrays
3063 Returns: Point-biserial r, two-tailed p-value
3064 """
3065 TINY = 1e-30
3066 categories = pstat.aunique(x)
3067 data = pstat.aabut(x,y)
3068 if len(categories) <> 2:
3069 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()."
3070 else:
3071 codemap = pstat.aabut(categories,N.arange(2))
3072 recoded = pstat.arecode(data,codemap,0)
3073 x = pstat.alinexand(data,0,categories[0])
3074 y = pstat.alinexand(data,0,categories[1])
3075 xmean = amean(pstat.acolex(x,1))
3076 ymean = amean(pstat.acolex(y,1))
3077 n = len(data)
3078 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
3079 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust
3080 df = n-2
3081 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
3082 prob = abetai(0.5*df,0.5,df/(df+t*t))
3083 return rpb, prob
3084
3085
3087 """
3088 Calculates Kendall's tau ... correlation of ordinal data. Adapted
3089 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@
3090
3091 Usage: akendalltau(x,y)
3092 Returns: Kendall's tau, two-tailed p-value
3093 """
3094 n1 = 0
3095 n2 = 0
3096 iss = 0
3097 for j in range(len(x)-1):
3098 for k in range(j,len(y)):
3099 a1 = x[j] - x[k]
3100 a2 = y[j] - y[k]
3101 aa = a1 * a2
3102 if (aa):
3103 n1 = n1 + 1
3104 n2 = n2 + 1
3105 if aa > 0:
3106 iss = iss + 1
3107 else:
3108 iss = iss -1
3109 else:
3110 if (a1):
3111 n1 = n1 + 1
3112 else:
3113 n2 = n2 + 1
3114 tau = iss / math.sqrt(n1*n2)
3115 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
3116 z = tau / math.sqrt(svar)
3117 prob = erfcc(abs(z)/1.4142136)
3118 return tau, prob
3119
3120
3122 """
3123 Calculates a regression line on two arrays, x and y, corresponding to x,y
3124 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels
3125 and splits data into x,y pairs along that dim.
3126
3127 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array
3128 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate
3129 """
3130 TINY = 1.0e-20
3131 if len(args) == 1:
3132 args = args[0]
3133 if len(args) == 2:
3134 x = args[0]
3135 y = args[1]
3136 else:
3137 x = args[:,0]
3138 y = args[:,1]
3139 else:
3140 x = args[0]
3141 y = args[1]
3142 n = len(x)
3143 xmean = amean(x)
3144 ymean = amean(y)
3145 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3146 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3147 r = r_num / r_den
3148 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
3149 df = n-2
3150 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3151 prob = abetai(0.5*df,0.5,df/(df+t*t))
3152 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
3153 intercept = ymean - slope*xmean
3154 sterrest = math.sqrt(1-r*r)*asamplestdev(y)
3155 return slope, intercept, r, prob, sterrest
3156
3157
3158
3159
3160
3161
3162 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3163 """
3164 Calculates the t-obtained for the independent samples T-test on ONE group
3165 of scores a, given a population mean. If printit=1, results are printed
3166 to the screen. If printit='filename', the results are output to 'filename'
3167 using the given writemode (default=append). Returns t-value, and prob.
3168
3169 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
3170 Returns: t-value, two-tailed prob
3171 """
3172 if type(a) != N.ArrayType:
3173 a = N.array(a)
3174 x = amean(a)
3175 v = avar(a)
3176 n = len(a)
3177 df = n-1
3178 svar = ((n-1)*v) / float(df)
3179 t = (x-popmean)/math.sqrt(svar*(1.0/n))
3180 prob = abetai(0.5*df,0.5,df/(df+t*t))
3181
3182 if printit <> 0:
3183 statname = 'Single-sample T-test.'
3184 outputpairedstats(printit,writemode,
3185 'Population','--',popmean,0,0,0,
3186 name,n,x,v,N.minimum.reduce(N.ravel(a)),
3187 N.maximum.reduce(N.ravel(a)),
3188 statname,t,prob)
3189 return t,prob
3190
3191
3192 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3193 """
3194 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
3195 a, and b. From Numerical Recipies, p.483. If printit=1, results are
3196 printed to the screen. If printit='filename', the results are output
3197 to 'filename' using the given writemode (default=append). Dimension
3198 can equal None (ravel array first), or an integer (the dimension over
3199 which to operate on a and b).
3200
3201 Usage::
3202 attest_ind (a,b,dimension=None,printit=0,
3203 Name1='Samp1',Name2='Samp2',writemode='a')
3204 Returns: t-value, two-tailed p-value
3205 """
3206 if dimension == None:
3207 a = N.ravel(a)
3208 b = N.ravel(b)
3209 dimension = 0
3210 x1 = amean(a,dimension)
3211 x2 = amean(b,dimension)
3212 v1 = avar(a,dimension)
3213 v2 = avar(b,dimension)
3214 n1 = a.shape[dimension]
3215 n2 = b.shape[dimension]
3216 df = n1+n2-2
3217 svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
3218 zerodivproblem = N.equal(svar,0)
3219 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))
3220 t = N.where(zerodivproblem,1.0,t)
3221 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3222
3223 if type(t) == N.ArrayType:
3224 probs = N.reshape(probs,t.shape)
3225 if len(probs) == 1:
3226 probs = probs[0]
3227
3228 if printit <> 0:
3229 if type(t) == N.ArrayType:
3230 t = t[0]
3231 if type(probs) == N.ArrayType:
3232 probs = probs[0]
3233 statname = 'Independent samples T-test.'
3234 outputpairedstats(printit,writemode,
3235 name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
3236 N.maximum.reduce(N.ravel(a)),
3237 name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
3238 N.maximum.reduce(N.ravel(b)),
3239 statname,t,probs)
3240 return
3241 return t, probs
3242
3243
3244 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3245 """
3246 Calculates the t-obtained T-test on TWO RELATED samples of scores, a
3247 and b. From Numerical Recipies, p.483. If printit=1, results are
3248 printed to the screen. If printit='filename', the results are output
3249 to 'filename' using the given writemode (default=append). Dimension
3250 can equal None (ravel array first), or an integer (the dimension over
3251 which to operate on a and b).
3252
3253 Usage::
3254 attest_rel(a,b,dimension=None,printit=0,
3255 name1='Samp1',name2='Samp2',writemode='a')
3256 Returns: t-value, two-tailed p-value
3257 """
3258 if dimension == None:
3259 a = N.ravel(a)
3260 b = N.ravel(b)
3261 dimension = 0
3262 if len(a)<>len(b):
3263 raise ValueError, 'Unequal length arrays.'
3264 x1 = amean(a,dimension)
3265 x2 = amean(b,dimension)
3266 v1 = avar(a,dimension)
3267 v2 = avar(b,dimension)
3268 n = a.shape[dimension]
3269 df = float(n-1)
3270 d = (a-b).astype('d')
3271
3272 denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df)
3273 zerodivproblem = N.equal(denom,0)
3274 t = N.add.reduce(d,dimension) / denom
3275 t = N.where(zerodivproblem,1.0,t)
3276 t = N.where(zerodivproblem,1.0,t)
3277 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3278 if type(t) == N.ArrayType:
3279 probs = N.reshape(probs,t.shape)
3280 if len(probs) == 1:
3281 probs = probs[0]
3282
3283 if printit <> 0:
3284 statname = 'Related samples T-test.'
3285 outputpairedstats(printit,writemode,
3286 name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
3287 N.maximum.reduce(N.ravel(a)),
3288 name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
3289 N.maximum.reduce(N.ravel(b)),
3290 statname,t,probs)
3291 return
3292 return t, probs
3293
3294
3296 """
3297 Calculates a one-way chi square for array of observed frequencies and returns
3298 the result. If no expected frequencies are given, the total N is assumed to
3299 be equally distributed across all groups.
3300
3301 Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq.
3302 Returns: chisquare-statistic, associated p-value
3303 """
3304
3305 k = len(f_obs)
3306 if f_exp == None:
3307 f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float)
3308 f_exp = f_exp.astype(N.Float)
3309 chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
3310 return chisq, chisqprob(chisq, k-1)
3311
3312
3314 """
3315 Computes the Kolmogorov-Smirnof statistic on 2 samples. Modified from
3316 Numerical Recipies in C, page 493. Returns KS D-value, prob. Not ufunc-
3317 like.
3318
3319 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays
3320 Returns: KS D-value, p-value
3321 """
3322 j1 = 0
3323 j2 = 0
3324 fn1 = 0.0
3325 fn2 = 0.0
3326 n1 = data1.shape[0]
3327 n2 = data2.shape[0]
3328 en1 = n1*1
3329 en2 = n2*1
3330 d = N.zeros(data1.shape[1:],N.Float)
3331 data1 = N.sort(data1,0)
3332 data2 = N.sort(data2,0)
3333 while j1 < n1 and j2 < n2:
3334 d1=data1[j1]
3335 d2=data2[j2]
3336 if d1 <= d2:
3337 fn1 = (j1)/float(en1)
3338 j1 = j1 + 1
3339 if d2 <= d1:
3340 fn2 = (j2)/float(en2)
3341 j2 = j2 + 1
3342 dt = (fn2-fn1)
3343 if abs(dt) > abs(d):
3344 d = dt
3345 try:
3346 en = math.sqrt(en1*en2/float(en1+en2))
3347 prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
3348 except:
3349 prob = 1.0
3350 return d, prob
3351
3352
3354 """
3355 Calculates a Mann-Whitney U statistic on the provided scores and
3356 returns the result. Use only when the n in each condition is < 20 and
3357 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
3358 significant if the u-obtained is LESS THAN or equal to the critical
3359 value of U.
3360
3361 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions
3362 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3363 """
3364 n1 = len(x)
3365 n2 = len(y)
3366 ranked = rankdata(N.concatenate((x,y)))
3367 rankx = ranked[0:n1]
3368 ranky = ranked[n1:]
3369 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
3370 u2 = n1*n2 - u1
3371 bigu = max(u1,u2)
3372 smallu = min(u1,u2)
3373 T = math.sqrt(tiecorrect(ranked))
3374 if T == 0:
3375 raise ValueError, 'All numbers are identical in amannwhitneyu'
3376 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
3377 z = abs((bigu-n1*n2/2.0) / sd)
3378 return smallu, 1.0 - zprob(z)
3379
3380
3382 """
3383 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3384 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3385 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c
3386 code.
3387
3388 Usage: atiecorrect(rankvals)
3389 Returns: T correction factor for U or H
3390 """
3391 sorted,posn = ashellsort(N.array(rankvals))
3392 n = len(sorted)
3393 T = 0.0
3394 i = 0
3395 while (i<n-1):
3396 if sorted[i] == sorted[i+1]:
3397 nties = 1
3398 while (i<n-1) and (sorted[i] == sorted[i+1]):
3399 nties = nties +1
3400 i = i +1
3401 T = T + nties**3 - nties
3402 i = i+1
3403 T = T / float(n**3-n)
3404 return 1.0 - T
3405
3406
3408 """
3409 Calculates the rank sums statistic on the provided scores and returns
3410 the result.
3411
3412 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions
3413 Returns: z-statistic, two-tailed p-value
3414 """
3415 n1 = len(x)
3416 n2 = len(y)
3417 alldata = N.concatenate((x,y))
3418 ranked = arankdata(alldata)
3419 x = ranked[:n1]
3420 y = ranked[n1:]
3421 s = sum(x)
3422 expected = n1*(n1+n2+1) / 2.0
3423 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
3424 prob = 2*(1.0 -zprob(abs(z)))
3425 return z, prob
3426
3427
3429 """
3430 Calculates the Wilcoxon T-test for related samples and returns the
3431 result. A non-parametric T-test.
3432
3433 Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions
3434 Returns: t-statistic, two-tailed p-value
3435 """
3436 if len(x) <> len(y):
3437 raise ValueError, 'Unequal N in awilcoxont. Aborting.'
3438 d = x-y
3439 d = N.compress(N.not_equal(d,0),d)
3440 count = len(d)
3441 absd = abs(d)
3442 absranked = arankdata(absd)
3443 r_plus = 0.0
3444 r_minus = 0.0
3445 for i in range(len(absd)):
3446 if d[i] < 0:
3447 r_minus = r_minus + absranked[i]
3448 else:
3449 r_plus = r_plus + absranked[i]
3450 wt = min(r_plus, r_minus)
3451 mn = count * (count+1) * 0.25
3452 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
3453 z = math.fabs(wt-mn) / se
3454 z = math.fabs(wt-mn) / se
3455 prob = 2*(1.0 -zprob(abs(z)))
3456 return wt, prob
3457
3458
3460 """
3461 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3462 groups, requiring at least 5 subjects in each group. This function
3463 calculates the Kruskal-Wallis H and associated p-value for 3 or more
3464 independent samples.
3465
3466 Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions
3467 Returns: H-statistic (corrected for ties), associated p-value
3468 """
3469 assert len(args) == 3, "Need at least 3 groups in stats.akruskalwallish()"
3470 args = list(args)
3471 n = [0]*len(args)
3472 n = map(len,args)
3473 all = []
3474 for i in range(len(args)):
3475 all = all + args[i].tolist()
3476 ranked = rankdata(all)
3477 T = tiecorrect(ranked)
3478 for i in range(len(args)):
3479 args[i] = ranked[0:n[i]]
3480 del ranked[0:n[i]]
3481 rsums = []
3482 for i in range(len(args)):
3483 rsums.append(sum(args[i])**2)
3484 rsums[i] = rsums[i] / float(n[i])
3485 ssbn = sum(rsums)
3486 totaln = sum(n)
3487 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
3488 df = len(args) - 1
3489 if T == 0:
3490 raise ValueError, 'All numbers are identical in akruskalwallish'
3491 h = h / float(T)
3492 return h, chisqprob(h,df)
3493
3494
3496 """
3497 Friedman Chi-Square is a non-parametric, one-way within-subjects
3498 ANOVA. This function calculates the Friedman Chi-square test for
3499 repeated measures and returns the result, along with the associated
3500 probability value. It assumes 3 or more repeated measures. Only 3
3501 levels requires a minimum of 10 subjects in the study. Four levels
3502 requires 5 subjects per level(??).
3503
3504 Usage: afriedmanchisquare(*args) args are separate arrays for 2+ conditions
3505 Returns: chi-square statistic, associated p-value
3506 """
3507 k = len(args)
3508 if k < 3:
3509 raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n'
3510 n = len(args[0])
3511 data = apply(pstat.aabut,args)
3512 data = data.astype(N.Float)
3513 for i in range(len(data)):
3514 data[i] = arankdata(data[i])
3515 ssbn = asum(asum(args,1)**2)
3516 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
3517 return chisq, chisqprob(chisq,k-1)
3518
3519
3520
3521
3522
3523
3525 """
3526 Returns the (1-tail) probability value associated with the provided chi-square
3527 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can
3528 handle multiple dimensions.
3529
3530 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom
3531 """
3532 BIG = 200.0
3533 def ex(x):
3534 BIG = 200.0
3535 exponents = N.where(N.less(x,-BIG),-BIG,x)
3536 return N.exp(exponents)
3537
3538 if type(chisq) == N.ArrayType:
3539 arrayflag = 1
3540 else:
3541 arrayflag = 0
3542 chisq = N.array([chisq])
3543 if df < 1:
3544 return N.ones(chisq.shape,N.float)
3545 probs = N.zeros(chisq.shape,N.Float)
3546 probs = N.where(N.less_equal(chisq,0),1.0,probs)
3547 a = 0.5 * chisq
3548 if df > 1:
3549 y = ex(-a)
3550 if df%2 == 0:
3551 even = 1
3552 s = y*1
3553 s2 = s*1
3554 else:
3555 even = 0
3556 s = 2.0 * azprob(-N.sqrt(chisq))
3557 s2 = s*1
3558 if (df > 2):
3559 chisq = 0.5 * (df - 1.0)
3560 if even:
3561 z = N.ones(probs.shape,N.Float)
3562 else:
3563 z = 0.5 *N.ones(probs.shape,N.Float)
3564 if even:
3565 e = N.zeros(probs.shape,N.Float)
3566 else:
3567 e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float)
3568 c = N.log(a)
3569 mask = N.zeros(probs.shape)
3570 a_big = N.greater(a,BIG)
3571 a_big_frozen = -1 *N.ones(probs.shape,N.Float)
3572 totalelements = N.multiply.reduce(N.array(probs.shape))
3573 while asum(mask)<>totalelements:
3574 e = N.log(z) + e
3575 s = s + ex(c*z-a-e)
3576 z = z + 1.0
3577 print z, e, s
3578 newmask = N.greater(z,chisq)
3579 a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
3580 mask = N.clip(newmask+mask,0,1)
3581 if even:
3582 z = N.ones(probs.shape,N.Float)
3583 e = N.ones(probs.shape,N.Float)
3584 else:
3585 z = 0.5 *N.ones(probs.shape,N.Float)
3586 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float)
3587 c = 0.0
3588 mask = N.zeros(probs.shape)
3589 a_notbig_frozen = -1 *N.ones(probs.shape,N.Float)
3590 while asum(mask)<>totalelements:
3591 e = e * (a/z.astype(N.Float))
3592 c = c + e
3593 z = z + 1.0
3594 print '#2', z, e, c, s, c*y+s2
3595 newmask = N.greater(z,chisq)
3596 a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
3597 c*y+s2, a_notbig_frozen)
3598 mask = N.clip(newmask+mask,0,1)
3599 probs = N.where(N.equal(probs,1),1,
3600 N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
3601 return probs
3602 else:
3603 return s
3604
3605
3607 """
3608 Returns the complementary error function erfc(x) with fractional error
3609 everywhere less than 1.2e-7. Adapted from Numerical Recipies. Can
3610 handle multiple dimensions.
3611
3612 Usage: aerfcc(x)
3613 """
3614 z = abs(x)
3615 t = 1.0 / (1.0+0.5*z)
3616 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
3617 return N.where(N.greater_equal(x,0), ans, 2.0-ans)
3618
3619
3621 """
3622 Returns the area under the normal curve 'to the left of' the given z value.
3623 Thus, ::
3624 for z<0, zprob(z) = 1-tail probability
3625 for z>0, 1.0-zprob(z) = 1-tail probability
3626 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3627 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.
3628
3629 Usage: azprob(z) where z is a z-value
3630 """
3631 def yfunc(y):
3632 x = (((((((((((((-0.000045255659 * y
3633 +0.000152529290) * y -0.000019538132) * y
3634 -0.000676904986) * y +0.001390604284) * y
3635 -0.000794620820) * y -0.002034254874) * y
3636 +0.006549791214) * y -0.010557625006) * y
3637 +0.011630447319) * y -0.009279453341) * y
3638 +0.005353579108) * y -0.002141268741) * y
3639 +0.000535310849) * y +0.999936657524
3640 return x
3641
3642 def wfunc(w):
3643 x = ((((((((0.000124818987 * w
3644 -0.001075204047) * w +0.005198775019) * w
3645 -0.019198292004) * w +0.059054035642) * w
3646 -0.151968751364) * w +0.319152932694) * w
3647 -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
3648 return x
3649
3650 Z_MAX = 6.0
3651 x = N.zeros(z.shape,N.Float)
3652 y = 0.5 * N.fabs(z)
3653 x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0))
3654 x = N.where(N.greater(y,Z_MAX*0.5),1.0,x)
3655 prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
3656 return prob
3657
3658
3660 """
3661 Returns the probability value for a K-S statistic computed via ks_2samp.
3662 Adapted from Numerical Recipies. Can handle multiple dimensions.
3663
3664 Usage: aksprob(alam)
3665 """
3666 if type(alam) == N.ArrayType:
3667 frozen = -1 *N.ones(alam.shape,N.Float64)
3668 alam = alam.astype(N.Float64)
3669 arrayflag = 1
3670 else:
3671 frozen = N.array(-1.)
3672 alam = N.array(alam,N.Float64)
3673 mask = N.zeros(alam.shape)
3674 fac = 2.0 *N.ones(alam.shape,N.Float)
3675 sum = N.zeros(alam.shape,N.Float)
3676 termbf = N.zeros(alam.shape,N.Float)
3677 a2 = N.array(-2.0*alam*alam,N.Float64)
3678 totalelements = N.multiply.reduce(N.array(mask.shape))
3679 for j in range(1,201):
3680 if asum(mask) == totalelements:
3681 break
3682 exponents = (a2*j*j)
3683 overflowmask = N.less(exponents,-746)
3684 frozen = N.where(overflowmask,0,frozen)
3685 mask = mask+overflowmask
3686 term = fac*N.exp(exponents)
3687 sum = sum + term
3688 newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
3689 N.less(abs(term),1.0e-8*sum), 1, 0)
3690 frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
3691 mask = N.clip(mask+newmask,0,1)
3692 fac = -fac
3693 termbf = abs(term)
3694 if arrayflag:
3695 return N.where(N.equal(frozen,-1), 1.0, frozen)
3696 else:
3697 return N.where(N.equal(frozen,-1), 1.0, frozen)[0]
3698
3699
3700 - def afprob (dfnum, dfden, F):
3701 """
3702 Returns the 1-tailed significance level (p-value) of an F statistic
3703 given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3704 of freedom for the denominator (dfF). Can handle multiple dims for F.
3705
3706 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
3707 """
3708 if type(F) == N.ArrayType:
3709 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
3710 else:
3711 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3712
3713
3715 """
3716 Evaluates the continued fraction form of the incomplete Beta function,
3717 betai. (Adapted from: Numerical Recipies in C.) Can handle multiple
3718 dimensions for x.
3719
3720 Usage: abetacf(a,b,x,verbose=1)
3721 """
3722 ITMAX = 200
3723 EPS = 3.0e-7
3724
3725 arrayflag = 1
3726 if type(x) == N.ArrayType:
3727 frozen = N.ones(x.shape,N.Float) *-1
3728 else:
3729 arrayflag = 0
3730 frozen = N.array([-1])
3731 x = N.array([x])
3732 mask = N.zeros(x.shape)
3733 bm = az = am = 1.0
3734 qab = a+b
3735 qap = a+1.0
3736 qam = a-1.0
3737 bz = 1.0-qab*x/qap
3738 for i in range(ITMAX+1):
3739 if N.sum(N.ravel(N.equal(frozen,-1)))==0:
3740 break
3741 em = float(i+1)
3742 tem = em + em
3743 d = em*(b-em)*x/((qam+tem)*(a+tem))
3744 ap = az + d*am
3745 bp = bz+d*bm
3746 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
3747 app = ap+d*az
3748 bpp = bp+d*bz
3749 aold = az*1
3750 am = ap/bpp
3751 bm = bp/bpp
3752 az = app/bpp
3753 bz = 1.0
3754 newmask = N.less(abs(az-aold),EPS*abs(az))
3755 frozen = N.where(newmask*N.equal(mask,0), az, frozen)
3756 mask = N.clip(mask+newmask,0,1)
3757 noconverge = asum(N.equal(frozen,-1))
3758 if noconverge <> 0 and verbose:
3759 print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
3760 if arrayflag:
3761 return frozen
3762 else:
3763 return frozen[0]
3764
3765
3767 """
3768 Returns the gamma function of xx.::
3769 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
3770 Adapted from: Numerical Recipies in C. Can handle multiple dims ... but
3771 probably doesn't normally have to.
3772
3773 Usage: agammln(xx)
3774 """
3775 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3776 0.120858003e-2, -0.536382e-5]
3777 x = xx - 1.0
3778 tmp = x + 5.5
3779 tmp = tmp - (x+0.5)*N.log(tmp)
3780 ser = 1.0
3781 for j in range(len(coeff)):
3782 x = x + 1
3783 ser = ser + coeff[j]/x
3784 return -tmp + N.log(2.50662827465*ser)
3785
3786
3787 - def abetai(a,b,x,verbose=1):
3788 """
3789 Returns the incomplete beta function::
3790
3791 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3792
3793 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3794 function of a. The continued fraction formulation is implemented
3795 here, using the betacf function. (Adapted from: Numerical Recipies in
3796 C.) Can handle multiple dimensions.
3797
3798 Usage: abetai(a,b,x,verbose=1)
3799 """
3800 TINY = 1e-15
3801 if type(a) == N.ArrayType:
3802 if asum(N.less(x,0)+N.greater(x,1)) <> 0:
3803 raise ValueError, 'Bad x in abetai'
3804 x = N.where(N.equal(x,0),TINY,x)
3805 x = N.where(N.equal(x,1.0),1-TINY,x)
3806
3807 bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
3808 exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
3809 N.log(1.0-x) )
3810
3811 exponents = N.where(N.less(exponents,-740),-740,exponents)
3812 bt = N.exp(exponents)
3813 if type(x) == N.ArrayType:
3814 ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
3815 bt*abetacf(a,b,x,verbose)/float(a),
3816 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
3817 else:
3818 if x<(a+1)/(a+b+2.0):
3819 ans = bt*abetacf(a,b,x,verbose)/float(a)
3820 else:
3821 ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
3822 return ans
3823
3824
3825
3826
3827
3828
3829 import LinearAlgebra, operator
3830 LA = LinearAlgebra
3831
3832 - def aglm(data,para):
3833 """
3834 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
3835 from::
3836 Peterson et al. Statistical limitations in functional neuroimaging
3837 I. Non-inferential methods and statistical models. Phil Trans Royal Soc
3838 Lond B 354: 1239-1260.
3839
3840 Usage: aglm(data,para)
3841 Returns: statistic, p-value ???
3842 """
3843 if len(para) <> len(data):
3844 print "data and para must be same length in aglm"
3845 return
3846 n = len(para)
3847 p = pstat.aunique(para)
3848 x = N.zeros((n,len(p)))
3849 for l in range(len(p)):
3850 x[:,l] = N.equal(para,p[l])
3851 b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)),
3852 N.transpose(x)),
3853 data)
3854 diffs = (data - N.dot(x,b))
3855 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
3856
3857 if len(p) == 2:
3858 c = N.array([1,-1])
3859 df = n-2
3860 fact = asum(1.0/asum(x,0))
3861 t = N.dot(c,b) / N.sqrt(s_sq*fact)
3862 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3863 return t, probs
3864
3865
3867 """
3868 Performs a 1-way ANOVA, returning an F-value and probability given
3869 any number of groups. From Heiman, pp.394-7.
3870
3871 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per treatment group
3872 Returns: f-value, probability
3873 """
3874 na = len(args)
3875 means = [0]*na
3876 vars = [0]*na
3877 ns = [0]*na
3878 alldata = []
3879 tmp = map(N.array,args)
3880 means = map(amean,tmp)
3881 vars = map(avar,tmp)
3882 ns = map(len,args)
3883 alldata = N.concatenate(args)
3884 bign = len(alldata)
3885 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
3886 ssbn = 0
3887 for a in args:
3888 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
3889 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
3890 sswn = sstot-ssbn
3891 dfbn = na-1
3892 dfwn = bign - na
3893 msb = ssbn/float(dfbn)
3894 msw = sswn/float(dfwn)
3895 f = msb/msw
3896 prob = fprob(dfbn,dfwn,f)
3897 return f, prob
3898
3899
3901 """
3902 Returns an F-statistic given the following::
3903 ER = error associated with the null hypothesis (the Restricted model)
3904 EF = error associated with the alternate hypothesis (the Full model)
3905 dfR = degrees of freedom the Restricted model
3906 dfF = degrees of freedom associated with the Restricted model
3907 """
3908 return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
3909
3910
3912 Enum = round(Enum,3)
3913 Eden = round(Eden,3)
3914 dfnum = round(Enum,3)
3915 dfden = round(dfden,3)
3916 f = round(f,3)
3917 prob = round(prob,3)
3918 suffix = ''
3919 if prob < 0.001: suffix = ' ***'
3920 elif prob < 0.01: suffix = ' **'
3921 elif prob < 0.05: suffix = ' *'
3922 title = [['EF/ER','DF','Mean Square','F-value','prob','']]
3923 lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
3924 [Eden, dfden, round(Eden/float(dfden),3),'','','']]
3925 pstat.printcc(lofl)
3926 return
3927
3928
3930 """
3931 Returns an F-statistic given the following::
3932 ER = error associated with the null hypothesis (the Restricted model)
3933 EF = error associated with the alternate hypothesis (the Full model)
3934 dfR = degrees of freedom the Restricted model
3935 dfF = degrees of freedom associated with the Restricted model
3936 where ER and EF are matrices from a multivariate F calculation.
3937 """
3938 if type(ER) in [IntType, FloatType]:
3939 ER = N.array([[ER]])
3940 if type(EF) in [IntType, FloatType]:
3941 EF = N.array([[EF]])
3942 n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
3943 d_en = LA.determinant(EF) / float(dfden)
3944 return n_um / d_en
3945
3946
3947
3948
3949
3950
3952 """
3953 Usage: asign(a)
3954 Returns: array shape of a, with -1 where a<0 and +1 where a>=0
3955 """
3956 a = N.asarray(a)
3957 if ((type(a) == type(1.4)) or (type(a) == type(1))):
3958 return a-a-N.less(a,0)+N.greater(a,0)
3959 else:
3960 return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
3961
3962
3963 - def asum (a, dimension=None,keepdims=0):
3964 """
3965 An alternative to the Numeric.add.reduce function, which allows one to
3966 (1) collapse over multiple dimensions at once, and/or (2) to retain
3967 all dimensions in the original array (squashing one down to size.
3968 Dimension can equal None (ravel array first), an integer (the
3969 dimension over which to operate), or a sequence (operate over multiple
3970 dimensions). If keepdims=1, the resulting array will have as many
3971 dimensions as the input array.
3972
3973 Usage: asum(a, dimension=None, keepdims=0)
3974 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
3975 """
3976 if type(a) == N.ArrayType and a.typecode() in ['l','s','b']:
3977 a = a.astype(N.Float)
3978 if dimension == None:
3979 s = N.sum(N.ravel(a))
3980 elif type(dimension) in [IntType,FloatType]:
3981 s = N.add.reduce(a, dimension)
3982 if keepdims == 1:
3983 shp = list(a.shape)
3984 shp[dimension] = 1
3985 s = N.reshape(s,shp)
3986 else:
3987 dims = list(dimension)
3988 dims.sort()
3989 dims.reverse()
3990 s = a *1.0
3991 for dim in dims:
3992 s = N.add.reduce(s,dim)
3993 if keepdims == 1:
3994 shp = list(a.shape)
3995 for dim in dims:
3996 shp[dim] = 1
3997 s = N.reshape(s,shp)
3998 return s
3999
4000
4002 """
4003 Returns an array consisting of the cumulative sum of the items in the
4004 passed array. Dimension can equal None (ravel array first), an
4005 integer (the dimension over which to operate), or a sequence (operate
4006 over multiple dimensions, but this last one just barely makes sense).
4007
4008 Usage: acumsum(a,dimension=None)
4009 """
4010 if dimension == None:
4011 a = N.ravel(a)
4012 dimension = 0
4013 if type(dimension) in [ListType, TupleType, N.ArrayType]:
4014 dimension = list(dimension)
4015 dimension.sort()
4016 dimension.reverse()
4017 for d in dimension:
4018 a = N.add.accumulate(a,d)
4019 return a
4020 else:
4021 return N.add.accumulate(a,dimension)
4022
4023
4024 - def ass(inarray, dimension=None, keepdims=0):
4025 """
4026 Squares each value in the passed array, adds these squares & returns
4027 the result. Unfortunate function name. :-) Defaults to ALL values in
4028 the array. Dimension can equal None (ravel array first), an integer
4029 (the dimension over which to operate), or a sequence (operate over
4030 multiple dimensions). Set keepdims=1 to maintain the original number
4031 of dimensions.
4032
4033 Usage: ass(inarray, dimension=None, keepdims=0)
4034 Returns: sum-along-'dimension' for (inarray*inarray)
4035 """
4036 if dimension == None:
4037 inarray = N.ravel(inarray)
4038 dimension = 0
4039 return asum(inarray*inarray,dimension,keepdims)
4040
4041
4042 - def asummult (array1,array2,dimension=None,keepdims=0):
4043 """
4044 Multiplies elements in array1 and array2, element by element, and
4045 returns the sum (along 'dimension') of all resulting multiplications.
4046 Dimension can equal None (ravel array first), an integer (the
4047 dimension over which to operate), or a sequence (operate over multiple
4048 dimensions). A trivial function, but included for completeness.
4049
4050 Usage: asummult(array1,array2,dimension=None,keepdims=0)
4051 """
4052 if dimension == None:
4053 array1 = N.ravel(array1)
4054 array2 = N.ravel(array2)
4055 dimension = 0
4056 return asum(array1*array2,dimension,keepdims)
4057
4058
4060 """
4061 Adds the values in the passed array, squares that sum, and returns the
4062 result. Dimension can equal None (ravel array first), an integer (the
4063 dimension over which to operate), or a sequence (operate over multiple
4064 dimensions). If keepdims=1, the returned array will have the same
4065 NUMBER of dimensions as the original.
4066
4067 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0)
4068 Returns: the square of the sum over dim(s) in dimension
4069 """
4070 if dimension == None:
4071 inarray = N.ravel(inarray)
4072 dimension = 0
4073 s = asum(inarray,dimension,keepdims)
4074 if type(s) == N.ArrayType:
4075 return s.astype(N.Float)*s
4076 else:
4077 return float(s)*s
4078
4079
4081 """
4082 Takes pairwise differences of the values in arrays a and b, squares
4083 these differences, and returns the sum of these squares. Dimension
4084 can equal None (ravel array first), an integer (the dimension over
4085 which to operate), or a sequence (operate over multiple dimensions).
4086 keepdims=1 means the return shape = len(a.shape) = len(b.shape)
4087
4088 Usage: asumdiffsquared(a,b)
4089 Returns: sum[ravel(a-b)**2]
4090 """
4091 if dimension == None:
4092 inarray = N.ravel(a)
4093 dimension = 0
4094 return asum((a-b)**2,dimension,keepdims)
4095
4096
4098 """
4099 Shellsort algorithm. Sorts a 1D-array.
4100
4101 Usage: ashellsort(inarray)
4102 Returns: sorted-inarray, sorting-index-vector (for original array)
4103 """
4104 n = len(inarray)
4105 svec = inarray *1.0
4106 ivec = range(n)
4107 gap = n/2
4108 while gap >0:
4109 for i in range(gap,n):
4110 for j in range(i-gap,-1,-gap):
4111 while j>=0 and svec[j]>svec[j+gap]:
4112 temp = svec[j]
4113 svec[j] = svec[j+gap]
4114 svec[j+gap] = temp
4115 itemp = ivec[j]
4116 ivec[j] = ivec[j+gap]
4117 ivec[j+gap] = itemp
4118 gap = gap / 2
4119
4120 return svec, ivec
4121
4122
4124 """
4125 Ranks the data in inarray, dealing with ties appropritely. Assumes
4126 a 1D inarray. Adapted from Gary Perlman's |Stat ranksort.
4127
4128 Usage: arankdata(inarray)
4129 Returns: array of length equal to inarray, containing rank scores
4130 """
4131 n = len(inarray)
4132 svec, ivec = ashellsort(inarray)
4133 sumranks = 0
4134 dupcount = 0
4135 newarray = N.zeros(n,N.Float)
4136 for i in range(n):
4137 sumranks = sumranks + i
4138 dupcount = dupcount + 1
4139 if i==n-1 or svec[i] <> svec[i+1]:
4140 averank = sumranks / float(dupcount) + 1
4141 for j in range(i-dupcount+1,i+1):
4142 newarray[ivec[j]] = averank
4143 sumranks = 0
4144 dupcount = 0
4145 return newarray
4146
4147
4149 """
4150 Returns a binary vector, 1=within-subject factor, 0=between. Input
4151 equals the entire data array (i.e., column 1=random factor, last
4152 column = measured values.
4153
4154 Usage: afindwithin(data) data in |Stat format
4155 """
4156 numfact = len(data[0])-2
4157 withinvec = [0]*numfact
4158 for col in range(1,numfact+1):
4159 rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])
4160 if len(pstat.unique(pstat.colex(rows,0))) < len(rows):
4161 withinvec[col-1] = 1
4162 return withinvec
4163
4164
4165
4166
4167
4168
4169
4170
4171
4172 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
4173 (ageometricmean, (N.ArrayType,)) )
4174 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
4175 (aharmonicmean, (N.ArrayType,)) )
4176 mean = Dispatch ( (lmean, (ListType, TupleType)),
4177 (amean, (N.ArrayType,)) )
4178 median = Dispatch ( (lmedian, (ListType, TupleType)),
4179 (amedian, (N.ArrayType,)) )
4180 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
4181 (amedianscore, (N.ArrayType,)) )
4182 mode = Dispatch ( (lmode, (ListType, TupleType)),
4183 (amode, (N.ArrayType,)) )
4184 tmean = Dispatch ( (atmean, (N.ArrayType,)) )
4185 tvar = Dispatch ( (atvar, (N.ArrayType,)) )
4186 tstdev = Dispatch ( (atstdev, (N.ArrayType,)) )
4187 tsem = Dispatch ( (atsem, (N.ArrayType,)) )
4188
4189
4190 moment = Dispatch ( (lmoment, (ListType, TupleType)),
4191 (amoment, (N.ArrayType,)) )
4192 variation = Dispatch ( (lvariation, (ListType, TupleType)),
4193 (avariation, (N.ArrayType,)) )
4194 skew = Dispatch ( (lskew, (ListType, TupleType)),
4195 (askew, (N.ArrayType,)) )
4196 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
4197 (akurtosis, (N.ArrayType,)) )
4198 describe = Dispatch ( (ldescribe, (ListType, TupleType)),
4199 (adescribe, (N.ArrayType,)) )
4200
4201
4202
4203 skewtest = Dispatch ( (askewtest, (ListType, TupleType)),
4204 (askewtest, (N.ArrayType,)) )
4205 kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)),
4206 (akurtosistest, (N.ArrayType,)) )
4207 normaltest = Dispatch ( (anormaltest, (ListType, TupleType)),
4208 (anormaltest, (N.ArrayType,)) )
4209
4210
4211 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)),
4212 (aitemfreq, (N.ArrayType,)) )
4213 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)),
4214 (ascoreatpercentile, (N.ArrayType,)) )
4215 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)),
4216 (apercentileofscore, (N.ArrayType,)) )
4217 histogram = Dispatch ( (lhistogram, (ListType, TupleType)),
4218 (ahistogram, (N.ArrayType,)) )
4219 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)),
4220 (acumfreq, (N.ArrayType,)) )
4221 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)),
4222 (arelfreq, (N.ArrayType,)) )
4223
4224
4225 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)),
4226 (aobrientransform, (N.ArrayType,)) )
4227 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)),
4228 (asamplevar, (N.ArrayType,)) )
4229 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)),
4230 (asamplestdev, (N.ArrayType,)) )
4231 signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),)
4232 var = Dispatch ( (lvar, (ListType, TupleType)),
4233 (avar, (N.ArrayType,)) )
4234 stdev = Dispatch ( (lstdev, (ListType, TupleType)),
4235 (astdev, (N.ArrayType,)) )
4236 sterr = Dispatch ( (lsterr, (ListType, TupleType)),
4237 (asterr, (N.ArrayType,)) )
4238 sem = Dispatch ( (lsem, (ListType, TupleType)),
4239 (asem, (N.ArrayType,)) )
4240 z = Dispatch ( (lz, (ListType, TupleType)),
4241 (az, (N.ArrayType,)) )
4242 zs = Dispatch ( (lzs, (ListType, TupleType)),
4243 (azs, (N.ArrayType,)) )
4244
4245
4246 threshold = Dispatch( (athreshold, (N.ArrayType,)),)
4247 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)),
4248 (atrimboth, (N.ArrayType,)) )
4249 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)),
4250 (atrim1, (N.ArrayType,)) )
4251
4252
4253 paired = Dispatch ( (lpaired, (ListType, TupleType)),
4254 (apaired, (N.ArrayType,)) )
4255 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)),
4256 (apearsonr, (N.ArrayType,)) )
4257 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)),
4258 (aspearmanr, (N.ArrayType,)) )
4259 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)),
4260 (apointbiserialr, (N.ArrayType,)) )
4261 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)),
4262 (akendalltau, (N.ArrayType,)) )
4263 linregress = Dispatch ( (llinregress, (ListType, TupleType)),
4264 (alinregress, (N.ArrayType,)) )
4265
4266
4267 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)),
4268 (attest_1samp, (N.ArrayType,)) )
4269 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)),
4270 (attest_ind, (N.ArrayType,)) )
4271 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)),
4272 (attest_rel, (N.ArrayType,)) )
4273 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)),
4274 (achisquare, (N.ArrayType,)) )
4275 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)),
4276 (aks_2samp, (N.ArrayType,)) )
4277 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)),
4278 (amannwhitneyu, (N.ArrayType,)) )
4279 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)),
4280 (atiecorrect, (N.ArrayType,)) )
4281 ranksums = Dispatch ( (lranksums, (ListType, TupleType)),
4282 (aranksums, (N.ArrayType,)) )
4283 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)),
4284 (awilcoxont, (N.ArrayType,)) )
4285 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)),
4286 (akruskalwallish, (N.ArrayType,)) )
4287 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)),
4288 (afriedmanchisquare, (N.ArrayType,)) )
4289
4290
4291 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)),
4292 (achisqprob, (N.ArrayType,)) )
4293 zprob = Dispatch ( (lzprob, (IntType, FloatType)),
4294 (azprob, (N.ArrayType,)) )
4295 ksprob = Dispatch ( (lksprob, (IntType, FloatType)),
4296 (aksprob, (N.ArrayType,)) )
4297 fprob = Dispatch ( (lfprob, (IntType, FloatType)),
4298 (afprob, (N.ArrayType,)) )
4299 betacf = Dispatch ( (lbetacf, (IntType, FloatType)),
4300 (abetacf, (N.ArrayType,)) )
4301 betai = Dispatch ( (lbetai, (IntType, FloatType)),
4302 (abetai, (N.ArrayType,)) )
4303 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)),
4304 (aerfcc, (N.ArrayType,)) )
4305 gammln = Dispatch ( (lgammln, (IntType, FloatType)),
4306 (agammln, (N.ArrayType,)) )
4307
4308
4309 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)),
4310 (aF_oneway, (N.ArrayType,)) )
4311 F_value = Dispatch ( (lF_value, (ListType, TupleType)),
4312 (aF_value, (N.ArrayType,)) )
4313
4314
4315 incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), )
4316 sum = Dispatch ( (lsum, (ListType, TupleType)),
4317 (asum, (N.ArrayType,)) )
4318 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)),
4319 (acumsum, (N.ArrayType,)) )
4320 ss = Dispatch ( (lss, (ListType, TupleType)),
4321 (ass, (N.ArrayType,)) )
4322 summult = Dispatch ( (lsummult, (ListType, TupleType)),
4323 (asummult, (N.ArrayType,)) )
4324 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)),
4325 (asquare_of_sums, (N.ArrayType,)) )
4326 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)),
4327 (asumdiffsquared, (N.ArrayType,)) )
4328 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)),
4329 (ashellsort, (N.ArrayType,)) )
4330 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)),
4331 (arankdata, (N.ArrayType,)) )
4332 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)),
4333 (afindwithin, (N.ArrayType,)) )
4334
4335
4336
4337
4338
4339 except ImportError:
4340 pass
4341