Package Biskit :: Package Statistics :: Module pstat
Source Code for Module Biskit.Statistics.pstat

   1  # Copyright (c) 1999-2000 Gary Strangman; All Rights Reserved. 
   2  # 
   3  # This software is distributable under the terms of the GNU 
   4  # General Public License (GPL) v2, the text of which can be found at 
   5  # http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 
   6  # using this module constitutes acceptance of the terms of this License. 
   7  # 
   8  # Disclaimer 
   9  #  
  10  # This software is provided "as-is".  There are no expressed or implied 
  11  # warranties of any kind, including, but not limited to, the warranties 
  12  # of merchantability and fittness for a given application.  In no event 
  13  # shall Gary Strangman be liable for any direct, indirect, incidental, 
  14  # special, exemplary or consequential damages (including, but not limited 
  15  # to, loss of use, data or profits, or business interruption) however 
  16  # caused and on any theory of liability, whether in contract, strict 
  17  # liability or tort (including negligence or otherwise) arising in any way 
  18  # out of the use of this software, even if advised of the possibility of 
  19  # such damage. 
  20  # 
  21  # Comments and/or additions are welcome (send e-mail to: 
  22  # strang@nmr.mgh.harvard.edu). 
  23  #  
  24  """ 
  25  pstat.py module 
  26   
  27  :: 
  28   ################################################# 
  29   #######  Written by:  Gary Strangman  ########### 
  30   #######  Last modified:  Jun 29, 2001 ########### 
  31   ################################################# 
  32   
  33  This module provides some useful list and array manipulation routines 
  34  modeled after those found in the |Stat package by Gary Perlman, plus a 
  35  number of other useful list/file manipulation functions. The list-based 
  36  functions include:: 
  37   
  38        abut (source,*args) 
  39        simpleabut (source, addon) 
  40        colex (listoflists,cnums) 
  41        collapse (listoflists,keepcols,collapsecols,fcn1=None,fcn2=None,cfcn=None) 
  42        dm (listoflists,criterion) 
  43        flat (l) 
  44        linexand (listoflists,columnlist,valuelist) 
  45        linexor (listoflists,columnlist,valuelist) 
  46        linedelimited (inlist,delimiter) 
  47        lineincols (inlist,colsize)  
  48        lineincustcols (inlist,colsizes) 
  49        list2string (inlist) 
  50        makelol(inlist) 
  51        makestr(x) 
  52        printcc (lst,extra=2) 
  53        printincols (listoflists,colsize) 
  54        pl (listoflists) 
  55        printl(listoflists) 
  56        replace (lst,oldval,newval) 
  57        recode (inlist,listmap,cols='all') 
  58        remap (listoflists,criterion) 
  59        roundlist (inlist,num_digits_to_round_floats_to) 
  60        sortby(listoflists,sortcols) 
  61        unique (inlist) 
  62        duplicates(inlist) 
  63        writedelimited (listoflists, delimiter, file, writetype='w') 
  64   
  65  Some of these functions have alternate versions which are defined only if 
  66  Numeric (NumPy) can be imported.  These functions are generally named as 
  67  above, with an 'a' prefix.:: 
  68   
  69        aabut (source, *args) 
  70        acolex (a,indices,axis=1) 
  71        acollapse (a,keepcols,collapsecols,sterr=0,ns=0) 
  72        adm (a,criterion) 
  73        alinexand (a,columnlist,valuelist) 
  74        alinexor (a,columnlist,valuelist) 
  75        areplace (a,oldval,newval) 
  76        arecode (a,listmap,col='all') 
  77        arowcompare (row1, row2) 
  78        arowsame (row1, row2) 
  79        asortrows(a,axis=0) 
  80        aunique(inarray) 
  81        aduplicates(inarray) 
  82   
  83  Currently, the code is all but completely un-optimized.  In many cases, the 
  84  array versions of functions amount simply to aliases to built-in array 
  85  functions/methods.  Their inclusion here is for function name consistency. 
  86  """ 
  87   
  88  ## CHANGE LOG: 
  89  ## ========== 
  90  ## 01-11-15 ... changed list2string() to accept a delimiter 
  91  ## 01-06-29 ... converted exec()'s to eval()'s to make compatible with Py2.1 
  92  ## 01-05-31 ... added duplicates() and aduplicates() functions 
  93  ## 00-12-28 ... license made GPL, docstring and import requirements 
  94  ## 99-11-01 ... changed version to 0.3 
  95  ## 99-08-30 ... removed get, getstrings, put, aget, aput (into io.py) 
  96  ## 03/27/99 ... added areplace function, made replace fcn recursive 
  97  ## 12/31/98 ... added writefc function for ouput to fixed column sizes 
  98  ## 12/07/98 ... fixed import problem (failed on collapse() fcn) 
  99  ##              added __version__ variable (now 0.2) 
 100  ## 12/05/98 ... updated doc-strings 
 101  ##              added features to collapse() function 
 102  ##              added flat() function for lists 
 103  ##              fixed a broken asortrows()  
 104  ## 11/16/98 ... fixed minor bug in aput for 1D arrays 
 105  ## 
 106  ## 11/08/98 ... fixed aput to output large arrays correctly 
 107   
 108  import stats  # required 3rd party module 
 109  import string, copy 
 110  from types import * 
 111   
 112  __version__ = 0.4 
 113   
 114  ###===========================  LIST FUNCTIONS  ========================== 
 115  ### 
 116  ### Here are the list functions, DEFINED FOR ALL SYSTEMS. 
 117  ### Array functions (for NumPy-enabled computers) appear below. 
 118  ### 
 119   
 120 -def abut (source,*args): 
 121      """ 
 122  Like the |Stat abut command.  It concatenates two lists side-by-side 
 123  and returns the result.  '2D' lists are also accomodated for either argument 
 124  (source or addon).  CAUTION:  If one list is shorter, it will be repeated 
 125  until it is as long as the longest list.  If this behavior is not desired, 
 126  use pstat.simpleabut(). 
 127   
 128  Usage:   abut(source, args)   where args=any # of lists 
 129  Returns: a list of lists as long as the LONGEST list past, source on the 
 130  'left', lists in <args> attached consecutively on the 'right' 
 131  """ 
 132   
 133      if type(source) not in [ListType,TupleType]: 
 134          source = [source] 
 135      for addon in args: 
 136          if type(addon) not in [ListType,TupleType]: 
 137              addon = [addon] 
 138          if len(addon) < len(source):                # is source list longer? 
 139              if len(source) % len(addon) == 0:        # are they integer multiples? 
 140                  repeats = len(source)/len(addon)    # repeat addon n times 
 141                  origadd = copy.deepcopy(addon) 
 142                  for i in range(repeats-1): 
 143                      addon = addon + origadd 
 144              else: 
 145                  repeats = len(source)/len(addon)+1  # repeat addon x times, 
 146                  origadd = copy.deepcopy(addon)      #    x is NOT an integer 
 147                  for i in range(repeats-1): 
 148                      addon = addon + origadd 
 149                      addon = addon[0:len(source)] 
 150          elif len(source) < len(addon):                # is addon list longer? 
 151              if len(addon) % len(source) == 0:        # are they integer multiples? 
 152                  repeats = len(addon)/len(source)    # repeat source n times 
 153                  origsour = copy.deepcopy(source) 
 154                  for i in range(repeats-1): 
 155                      source = source + origsour 
 156              else: 
 157                  repeats = len(addon)/len(source)+1  # repeat source x times, 
 158                  origsour = copy.deepcopy(source)    #   x is NOT an integer 
 159                  for i in range(repeats-1): 
 160                      source = source + origsour 
 161                  source = source[0:len(addon)] 
 162   
 163          source = simpleabut(source,addon) 
 164      return source 
 165   
 166   
 167 -def simpleabut (source, addon): 
 168      """ 
 169  Concatenates two lists as columns and returns the result.  '2D' lists 
 170  are also accomodated for either argument (source or addon).  This DOES NOT 
 171  repeat either list to make the 2 lists of equal length.  Beware of list pairs 
 172  with different lengths ... the resulting list will be the length of the 
 173  FIRST list passed. 
 174   
 175  Usage:   simpleabut(source,addon)  where source, addon=list (or list-of-lists) 
 176  Returns: a list of lists as long as source, with source on the 'left' and 
 177  addon on the 'right' 
 178  """ 
 179      if type(source) not in [ListType,TupleType]: 
 180          source = [source] 
 181      if type(addon) not in [ListType,TupleType]: 
 182          addon = [addon] 
 183      minlen = min(len(source),len(addon)) 
 184      list = copy.deepcopy(source)                # start abut process 
 185      if type(source[0]) not in [ListType,TupleType]: 
 186          if type(addon[0]) not in [ListType,TupleType]: 
 187              for i in range(minlen): 
 188                  list[i] = [source[i]] + [addon[i]]        # source/addon = column 
 189          else: 
 190              for i in range(minlen): 
 191                  list[i] = [source[i]] + addon[i]        # addon=list-of-lists 
 192      else: 
 193          if type(addon[0]) not in [ListType,TupleType]: 
 194              for i in range(minlen): 
 195                  list[i] = source[i] + [addon[i]]        # source=list-of-lists 
 196          else: 
 197              for i in range(minlen): 
 198                  list[i] = source[i] + addon[i]        # source/addon = list-of-lists 
 199      source = list 
 200      return source 
 201   
 202   
 203 -def colex (listoflists,cnums): 
 204      """ 
 205  Extracts from listoflists the columns specified in the list 'cnums' 
 206  (cnums can be an integer, a sequence of integers, or a string-expression that 
 207  corresponds to a slice operation on the variable x ... e.g., 'x[3:]' will colex 
 208  columns 3 onward from the listoflists). 
 209   
 210  Usage:   colex (listoflists,cnums) 
 211  Returns: a list-of-lists corresponding to the columns from listoflists 
 212  specified by cnums, in the order the column numbers appear in cnums 
 213  """ 
 214      global index 
 215      column = 0 
 216      if type(cnums) in [ListType,TupleType]:   # if multiple columns to get 
 217          index = cnums[0] 
 218          column = map(lambda x: x[index], listoflists) 
 219          for col in cnums[1:]: 
 220              index = col 
 221              column = abut(column,map(lambda x: x[index], listoflists)) 
 222      elif type(cnums) == StringType:              # if an 'x[3:]' type expr. 
 223          evalstring = 'map(lambda x: x'+cnums+', listoflists)' 
 224          column = eval(evalstring) 
 225      else:                                     # else it's just 1 col to get 
 226          index = cnums 
 227          column = map(lambda x: x[index], listoflists) 
 228      return column 
 229   
 230   
 231 -def collapse (listoflists,keepcols,collapsecols,fcn1=None,fcn2=None,cfcn=None): 
 232       """ 
 233  Averages data in collapsecol, keeping all unique items in keepcols 
 234  (using unique, which keeps unique LISTS of column numbers), retaining the 
 235  unique sets of values in keepcols, the mean for each.  Setting fcn1 
 236  and/or fcn2 to point to a function rather than None (e.g., stats.sterr, len) 
 237  will append those results (e.g., the sterr, N) after each calculated mean. 
 238  cfcn is the collapse function to apply (defaults to mean, defined here in the 
 239  pstat module to avoid circular imports with stats.py, but harmonicmean or 
 240  others could be passed). 
 241   
 242  Usage:    collapse (listoflists,keepcols,collapsecols,fcn1=None,fcn2=None,cfcn=None) 
 243  Returns: a list of lists with all unique permutations of entries appearing in 
 244  columns ("conditions") specified by keepcols, abutted with the result of 
 245  cfcn (if cfcn=None, defaults to the mean) of each column specified by 
 246  collapsecols. 
 247  """ 
 248       def collmean (inlist): 
 249           s = 0 
 250           for item in inlist: 
 251               s = s + item 
 252           return s/float(len(inlist)) 
 253   
 254       if type(keepcols) not in [ListType,TupleType]: 
 255           keepcols = [keepcols] 
 256       if type(collapsecols) not in [ListType,TupleType]: 
 257           collapsecols = [collapsecols] 
 258       if cfcn == None: 
 259           cfcn = collmean 
 260       if keepcols == []: 
 261           means = [0]*len(collapsecols) 
 262           for i in range(len(collapsecols)): 
 263               avgcol = colex(listoflists,collapsecols[i]) 
 264               means[i] = cfcn(avgcol) 
 265               if fcn1: 
 266                   try: 
 267                       test = fcn1(avgcol) 
 268                   except: 
 269                       test = 'N/A' 
 270                       means[i] = [means[i], test] 
 271               if fcn2: 
 272                   try: 
 273                       test = fcn2(avgcol) 
 274                   except: 
 275                       test = 'N/A' 
 276                   try: 
 277                       means[i] = means[i] + [len(avgcol)] 
 278                   except TypeError: 
 279                       means[i] = [means[i],len(avgcol)] 
 280           return means 
 281       else: 
 282           values = colex(listoflists,keepcols) 
 283           uniques = unique(values) 
 284           uniques.sort() 
 285           newlist = [] 
 286           if type(keepcols) not in [ListType,TupleType]:  keepcols = [keepcols] 
 287           for item in uniques: 
 288               if type(item) not in [ListType,TupleType]:  item =[item] 
 289               tmprows = linexand(listoflists,keepcols,item) 
 290               for col in collapsecols: 
 291                   avgcol = colex(tmprows,col) 
 292                   item.append(cfcn(avgcol)) 
 293                   if fcn1 <> None: 
 294                       try: 
 295                           test = fcn1(avgcol) 
 296                       except: 
 297                           test = 'N/A' 
 298                       item.append(test) 
 299                   if fcn2 <> None: 
 300                       try: 
 301                           test = fcn2(avgcol) 
 302                       except: 
 303                           test = 'N/A' 
 304                       item.append(test) 
 305                   newlist.append(item) 
 306           return newlist 
 307   
 308   
 309 -def dm (listoflists,criterion): 
 310      """ 
 311  Returns rows from the passed list of lists that meet the criteria in 
 312  the passed criterion expression (a string as a function of x; e.g., 'x[3]>=9' 
 313  will return all rows where the 4th column>=9 and "x[2]=='N'" will return rows 
 314  with column 2 equal to the string 'N'). 
 315   
 316  Usage:   dm (listoflists, criterion) 
 317  Returns: rows from listoflists that meet the specified criterion. 
 318  """ 
 319      function = 'filter(lambda x: '+criterion+',listoflists)' 
 320      lines = eval(function) 
 321      return lines 
 322   
 323   
 324 -def flat(l): 
 325      """ 
 326  Returns the flattened version of a '2D' list.  List-correlate to the a.flat() 
 327  method of NumPy arrays. 
 328   
 329  Usage:    flat(l) 
 330  """ 
 331      newl = [] 
 332      for i in range(len(l)): 
 333          for j in range(len(l[i])): 
 334              newl.append(l[i][j]) 
 335      return newl 
 336   
 337   
 338 -def linexand (listoflists,columnlist,valuelist): 
 339      """ 
 340  Returns the rows of a list of lists where col (from columnlist) = val 
 341  (from valuelist) for EVERY pair of values (columnlist[i],valuelists[i]). 
 342  len(columnlist) must equal len(valuelist). 
 343   
 344  Usage:   linexand (listoflists,columnlist,valuelist) 
 345  Returns: the rows of listoflists where columnlist[i]=valuelist[i] for ALL i 
 346  """ 
 347      if type(columnlist) not in [ListType,TupleType]: 
 348          columnlist = [columnlist] 
 349      if type(valuelist) not in [ListType,TupleType]: 
 350          valuelist = [valuelist] 
 351      criterion = '' 
 352      for i in range(len(columnlist)): 
 353          if type(valuelist[i])==StringType: 
 354              critval = '\'' + valuelist[i] + '\'' 
 355          else: 
 356              critval = str(valuelist[i]) 
 357          criterion = criterion + ' x['+str(columnlist[i])+']=='+critval+' and' 
 358      criterion = criterion[0:-3]         # remove the "and" after the last crit 
 359      function = 'filter(lambda x: '+criterion+',listoflists)' 
 360      lines = eval(function) 
 361      return lines 
 362   
 363   
 364 -def linexor (listoflists,columnlist,valuelist): 
 365      """ 
 366  Returns the rows of a list of lists where col (from columnlist) = val 
 367  (from valuelist) for ANY pair of values (colunmlist[i],valuelist[i[). 
 368  One value is required for each column in columnlist.  If only one value 
 369  exists for columnlist but multiple values appear in valuelist, the 
 370  valuelist values are all assumed to pertain to the same column. 
 371   
 372  Usage:   linexor (listoflists,columnlist,valuelist) 
 373  Returns: the rows of listoflists where columnlist[i]=valuelist[i] for ANY i 
 374  """ 
 375      if type(columnlist) not in [ListType,TupleType]: 
 376          columnlist = [columnlist] 
 377      if type(valuelist) not in [ListType,TupleType]: 
 378          valuelist = [valuelist] 
 379      criterion = '' 
 380      if len(columnlist) == 1 and len(valuelist) > 1: 
 381          columnlist = columnlist*len(valuelist) 
 382      for i in range(len(columnlist)):          # build an exec string 
 383          if type(valuelist[i])==StringType: 
 384              critval = '\'' + valuelist[i] + '\'' 
 385          else: 
 386              critval = str(valuelist[i]) 
 387          criterion = criterion + ' x['+str(columnlist[i])+']=='+critval+' or' 
 388      criterion = criterion[0:-2]         # remove the "or" after the last crit 
 389      function = 'filter(lambda x: '+criterion+',listoflists)' 
 390      lines = eval(function) 
 391      return lines 
 392   
 393   
 394 -def linedelimited (inlist,delimiter): 
 395      """ 
 396  Returns a string composed of elements in inlist, with each element 
 397  separated by 'delimiter.'  Used by function writedelimited.  Use '\t' 
 398  for tab-delimiting. 
 399   
 400  Usage:   linedelimited (inlist,delimiter) 
 401  """ 
 402      outstr = '' 
 403      for item in inlist: 
 404          if type(item) <> StringType: 
 405              item = str(item) 
 406          outstr = outstr + item + delimiter 
 407      outstr = outstr[0:-1] 
 408      return outstr 
 409   
 410   
 411 -def lineincols (inlist,colsize): 
 412      """ 
 413  Returns a string composed of elements in inlist, with each element 
 414  right-aligned in columns of (fixed) colsize. 
 415   
 416  Usage:   lineincols (inlist,colsize)   where colsize is an integer 
 417  """ 
 418      outstr = '' 
 419      for item in inlist: 
 420          if type(item) <> StringType: 
 421              item = str(item) 
 422          size = len(item) 
 423          if size <= colsize: 
 424              for i in range(colsize-size): 
 425                  outstr = outstr + ' ' 
 426              outstr = outstr + item 
 427          else: 
 428              outstr = outstr + item[0:colsize+1] 
 429      return outstr 
 430   
 431   
 432 -def lineincustcols (inlist,colsizes): 
 433      """ 
 434  Returns a string composed of elements in inlist, with each element 
 435  right-aligned in a column of width specified by a sequence colsizes.  The 
 436  length of colsizes must be greater than or equal to the number of columns 
 437  in inlist. 
 438   
 439  Usage:   lineincustcols (inlist,colsizes) 
 440  Returns: formatted string created from inlist 
 441  """ 
 442      outstr = '' 
 443      for i in range(len(inlist)): 
 444          if type(inlist[i]) <> StringType: 
 445              item = str(inlist[i]) 
 446          else: 
 447              item = inlist[i] 
 448          size = len(item) 
 449          if size <= colsizes[i]: 
 450              for j in range(colsizes[i]-size): 
 451                  outstr = outstr + ' ' 
 452              outstr = outstr + item 
 453          else: 
 454              outstr = outstr + item[0:colsizes[i]+1] 
 455      return outstr 
 456   
 457   
 458 -def list2string (inlist,delimit=' '): 
 459      """ 
 460  Converts a 1D list to a single long string for file output, using 
 461  the string.join function. 
 462   
 463  Usage:   list2string (inlist,delimit=' ') 
 464  Returns: the string created from inlist 
 465  """ 
 466      stringlist = map(makestr,inlist) 
 467      return string.join(stringlist,delimit) 
 468   
 469   
 470 -def makelol(inlist): 
 471      """ 
 472  Converts a 1D list to a 2D list (i.e., a list-of-lists).  Useful when you 
 473  want to use put() to write a 1D list one item per line in the file. 
 474   
 475  Usage:   makelol(inlist) 
 476  Returns: if l = [1,2,'hi'] then returns [[1],[2],['hi']] etc. 
 477  """ 
 478      x = [] 
 479      for item in inlist: 
 480          x.append([item]) 
 481      return x 
 482   
 483   
 484 -def makestr (x): 
 485      if type(x) <> StringType: 
 486          x = str(x) 
 487      return x 
 488   
 489   
 490 -def printcc (lst,extra=2): 
 491      """ 
 492  Prints a list of lists in columns, customized by the max size of items 
 493  within the columns (max size of items in col, plus 'extra' number of spaces). 
 494  Use 'dashes' or '\n' in the list-of-lists to print dashes or blank lines, 
 495  respectively. 
 496   
 497  Usage:   printcc (lst,extra=2) 
 498  Returns: None 
 499  """ 
 500      if type(lst[0]) not in [ListType,TupleType]: 
 501          lst = [lst] 
 502      rowstokill = [] 
 503      list2print = copy.deepcopy(lst) 
 504      for i in range(len(lst)): 
 505          if lst[i] == ['\n'] or lst[i]=='\n' or lst[i]=='dashes' or lst[i]=='' or lst[i]==['']: 
 506              rowstokill = rowstokill + [i] 
 507      rowstokill.reverse()   # delete blank rows from the end 
 508      for row in rowstokill: 
 509          del list2print[row] 
 510      maxsize = [0]*len(list2print[0]) 
 511      for col in range(len(list2print[0])): 
 512          items = colex(list2print,col) 
 513          items = map(makestr,items) 
 514          maxsize[col] = max(map(len,items)) + extra 
 515      for row in lst: 
 516          if row == ['\n'] or row == '\n' or row == '' or row == ['']: 
 517              print 
 518          elif row == ['dashes'] or row == 'dashes': 
 519              dashes = [0]*len(maxsize) 
 520              for j in range(len(maxsize)): 
 521                  dashes[j] = '-'*(maxsize[j]-2) 
 522              print lineincustcols(dashes,maxsize) 
 523          else: 
 524              print lineincustcols(row,maxsize) 
 525      return None 
 526   
 527   
 528 -def printincols (listoflists,colsize): 
 529      """ 
 530  Prints a list of lists in columns of (fixed) colsize width, where 
 531  colsize is an integer. 
 532   
 533  Usage:   printincols (listoflists,colsize) 
 534  Returns: None 
 535  """ 
 536      for row in listoflists: 
 537          print lineincols(row,colsize) 
 538      return None 
 539   
 540   
 541 -def pl (listoflists): 
 542      """ 
 543  Prints a list of lists, 1 list (row) at a time. 
 544   
 545  Usage:   pl(listoflists) 
 546  Returns: None 
 547  """ 
 548      for row in listoflists: 
 549          if row[-1] == '\n': 
 550              print row, 
 551          else: 
 552              print row 
 553      return None 
 554   
 555   
 556 -def printl(listoflists): 
 557      """Alias for pl.""" 
 558      pl(listoflists) 
 559      return 
 560   
 561   
 562 -def replace (inlst,oldval,newval): 
 563      """ 
 564  Replaces all occurrences of 'oldval' with 'newval', recursively. 
 565   
 566  Usage:   replace (inlst,oldval,newval) 
 567  """ 
 568      lst = inlst*1 
 569      for i in range(len(lst)): 
 570          if type(lst[i]) not in [ListType,TupleType]: 
 571              if lst[i]==oldval: lst[i]=newval 
 572          else: 
 573              lst[i] = replace(lst[i],oldval,newval) 
 574      return lst 
 575   
 576   
 577 -def recode (inlist,listmap,cols=None): 
 578      """ 
 579  Changes the values in a list to a new set of values (useful when 
 580  you need to recode data from (e.g.) strings to numbers.  cols defaults 
 581  to None (meaning all columns are recoded). 
 582   
 583  Usage:   recode (inlist,listmap,cols=None)  cols=recode cols, listmap=2D list 
 584  Returns: inlist with the appropriate values replaced with new ones 
 585  """ 
 586      lst = copy.deepcopy(inlist) 
 587      if cols != None: 
 588          if type(cols) not in [ListType,TupleType]: 
 589              cols = [cols] 
 590          for col in cols: 
 591              for row in range(len(lst)): 
 592                  try: 
 593                      idx = colex(listmap,0).index(lst[row][col]) 
 594                      lst[row][col] = listmap[idx][1] 
 595                  except ValueError: 
 596                      pass 
 597      else: 
 598          for row in range(len(lst)): 
 599              for col in range(len(lst)): 
 600                  try: 
 601                      idx = colex(listmap,0).index(lst[row][col]) 
 602                      lst[row][col] = listmap[idx][1] 
 603                  except ValueError: 
 604                      pass 
 605      return lst 
 606   
 607   
 608 -def remap (listoflists,criterion): 
 609      """ 
 610  Remaps values in a given column of a 2D list (listoflists).  This requires 
 611  a criterion as a function of 'x' so that the result of the following is 
 612  returned ... map(lambda x: 'criterion',listoflists).   
 613   
 614  Usage:   remap(listoflists,criterion)    criterion=string 
 615  Returns: remapped version of listoflists 
 616  """ 
 617      function = 'map(lambda x: '+criterion+',listoflists)' 
 618      lines = eval(function) 
 619      return lines 
 620   
 621   
 622 -def roundlist (inlist,digits): 
 623      """ 
 624  Goes through each element in a 1D or 2D inlist, and applies the following 
 625  function to all elements of FloatType ... round(element,digits). 
 626   
 627  Usage:   roundlist(inlist,digits) 
 628  Returns: list with rounded floats 
 629  """ 
 630      if type(inlist[0]) in [IntType, FloatType]: 
 631          inlist = [inlist] 
 632      l = inlist*1 
 633      for i in range(len(l)): 
 634          for j in range(len(l[i])): 
 635              if type(l[i][j])==FloatType: 
 636                  l[i][j] = round(l[i][j],digits) 
 637      return l 
 638   
 639   
 640 -def sortby(listoflists,sortcols): 
 641      """ 
 642  Sorts a list of lists on the column(s) specified in the sequence 
 643  sortcols. 
 644   
 645  Usage:   sortby(listoflists,sortcols) 
 646  Returns: sorted list, unchanged column ordering 
 647  """ 
 648      newlist = abut(colex(listoflists,sortcols),listoflists) 
 649      newlist.sort() 
 650      try: 
 651          numcols = len(sortcols) 
 652      except TypeError: 
 653          numcols = 1 
 654      crit = '[' + str(numcols) + ':]' 
 655      newlist = colex(newlist,crit) 
 656      return newlist 
 657   
 658   
 659 -def unique (inlist): 
 660      """ 
 661  Returns all unique items in the passed list.  If the a list-of-lists 
 662  is passed, unique LISTS are found (i.e., items in the first dimension are 
 663  compared). 
 664   
 665  Usage:   unique (inlist) 
 666  Returns: the unique elements (or rows) in inlist 
 667  """ 
 668      uniques = [] 
 669      for item in inlist: 
 670          if item not in uniques: 
 671              uniques.append(item) 
 672      return uniques 
 673   
 674 -def duplicates(inlist): 
 675      """ 
 676  Returns duplicate items in the FIRST dimension of the passed list. 
 677   
 678  Usage:   duplicates (inlist) 
 679  """ 
 680      dups = [] 
 681      for i in range(len(inlist)): 
 682          if inlist[i] in inlist[i+1:]: 
 683              dups.append(inlist[i]) 
 684      return dups 
 685   
 686   
 687 -def nonrepeats(inlist): 
 688      """ 
 689  Returns items that are NOT duplicated in the first dim of the passed list. 
 690   
 691  Usage:   nonrepeats (inlist) 
 692  """ 
 693      nonrepeats = [] 
 694      for i in range(len(inlist)): 
 695          if inlist.count(inlist[i]) == 1: 
 696              nonrepeats.append(inlist[i]) 
 697      return nonrepeats 
 698   
 699   
 700  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 701  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 702  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 703  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 704  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 705  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 706  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 707  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 708  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 709  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 710  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 711  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 712  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 713  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 714  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 715  #===================   PSTAT ARRAY FUNCTIONS  ===================== 
 716   
 717  try:                         # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 
 718   import Numeric 
 719   N = Numeric 
 720   
 721 - def aabut (source, *args): 
 722      """ 
 723  Like the |Stat abut command.  It concatenates two arrays column-wise 
 724  and returns the result.  CAUTION:  If one array is shorter, it will be 
 725  repeated until it is as long as the other. 
 726   
 727  Usage:   aabut (source, args)    where args=any # of arrays 
 728  Returns: an array as long as the LONGEST array past, source appearing on the 
 729  'left', arrays in <args> attached on the 'right'. 
 730  """ 
 731      if len(source.shape)==1: 
 732          width = 1 
 733          source = N.resize(source,[source.shape[0],width]) 
 734      else: 
 735          width = source.shape[1] 
 736      for addon in args: 
 737          if len(addon.shape)==1: 
 738              width = 1 
 739              addon = N.resize(addon,[source.shape[0],width]) 
 740          else: 
 741              width = source.shape[1] 
 742          if len(addon) < len(source): 
 743              addon = N.resize(addon,[source.shape[0],addon.shape[1]]) 
 744          elif len(source) < len(addon): 
 745              source = N.resize(source,[addon.shape[0],source.shape[1]]) 
 746          source = N.concatenate((source,addon),1) 
 747      return source 
 748   
 749   
 750 - def acolex (a,indices,axis=1): 
 751      """ 
 752  Extracts specified indices (a list) from passed array, along passed 
 753  axis (column extraction is default).  BEWARE: A 1D array is presumed to be a 
 754  column-array (and that the whole array will be returned as a column). 
 755   
 756  Usage:   acolex (a,indices,axis=1) 
 757  Returns: the columns of a specified by indices 
 758  """ 
 759      if type(indices) not in [ListType,TupleType,N.ArrayType]: 
 760          indices = [indices] 
 761      if len(N.shape(a)) == 1: 
 762          cols = N.resize(a,[a.shape[0],1]) 
 763      else: 
 764          cols = N.take(a,indices,axis) 
 765      return cols 
 766   
 767   
 768 - def acollapse (a,keepcols,collapsecols,fcn1=None,fcn2=None,cfcn=None): 
 769      """ 
 770  Averages data in collapsecol, keeping all unique items in keepcols 
 771  (using unique, which keeps unique LISTS of column numbers), retaining 
 772  the unique sets of values in keepcols, the mean for each.  If stderror or 
 773  N of the mean are desired, set either or both parameters to 1. 
 774   
 775  Usage:   acollapse (a,keepcols,collapsecols,fcn1=None,fcn2=None,cfcn=None) 
 776  Returns: unique 'conditions' specified by the contents of columns specified 
 777  by keepcols, abutted with the mean(s) of column(s) specified by collapsecols 
 778  """ 
 779      def acollmean (inarray): 
 780          return N.sum(N.ravel(inarray)) 
 781   
 782      if cfcn == None: 
 783          cfcn = acollmean 
 784      if keepcols == []: 
 785          avgcol = acolex(a,collapsecols) 
 786          means = N.sum(avgcol)/float(len(avgcol)) 
 787          if fcn1<>None: 
 788              try: 
 789                  test = fcn1(avgcol) 
 790              except: 
 791                  test = N.array(['N/A']*len(means)) 
 792              means = aabut(means,test) 
 793          if fcn2<>None: 
 794              try: 
 795                  test = fcn2(avgcol) 
 796              except: 
 797                  test = N.array(['N/A']*len(means)) 
 798              means = aabut(means,test) 
 799          return means 
 800      else: 
 801          if type(keepcols) not in [ListType,TupleType,N.ArrayType]: 
 802              keepcols = [keepcols] 
 803          values = colex(a,keepcols)   # so that "item" can be appended (below) 
 804          uniques = unique(values)  # get a LIST, so .sort keeps rows intact 
 805          uniques.sort() 
 806          newlist = [] 
 807          for item in uniques: 
 808              if type(item) not in [ListType,TupleType,N.ArrayType]: 
 809                  item =[item] 
 810              tmprows = alinexand(a,keepcols,item) 
 811              for col in collapsecols: 
 812                  avgcol = acolex(tmprows,col) 
 813                  item.append(acollmean(avgcol)) 
 814                  if fcn1<>None: 
 815                      try: 
 816                          test = fcn1(avgcol) 
 817                      except: 
 818                          test = 'N/A' 
 819                      item.append(test) 
 820                  if fcn2<>None: 
 821                      try: 
 822                          test = fcn2(avgcol) 
 823                      except: 
 824                          test = 'N/A' 
 825                      item.append(test) 
 826                  newlist.append(item) 
 827          try: 
 828              new_a = N.array(newlist) 
 829          except TypeError: 
 830              new_a = N.array(newlist,'O') 
 831          return new_a 
 832   
 833   
 834 - def adm (a,criterion): 
 835      """ 
 836  Returns rows from the passed list of lists that meet the criteria in 
 837  the passed criterion expression (a string as a function of x). 
 838   
 839  Usage:   adm (a,criterion)   where criterion is like 'x[2]==37' 
 840  """ 
 841      function = 'filter(lambda x: '+criterion+',a)' 
 842      lines = eval(function) 
 843      try: 
 844          lines = N.array(lines) 
 845      except: 
 846          lines = N.array(lines,'O') 
 847      return lines 
 848   
 849   
 850 - def isstring(x): 
 851      if type(x)==StringType: 
 852          return 1 
 853      else: 
 854          return 0 
 855   
 856   
 857 - def alinexand (a,columnlist,valuelist): 
 858      """ 
 859  Returns the rows of an array where col (from columnlist) = val 
 860  (from valuelist).  One value is required for each column in columnlist. 
 861   
 862  Usage:   alinexand (a,columnlist,valuelist) 
 863  Returns: the rows of a where columnlist[i]=valuelist[i] for ALL i 
 864  """ 
 865      if type(columnlist) not in [ListType,TupleType,N.ArrayType]: 
 866          columnlist = [columnlist] 
 867      if type(valuelist) not in [ListType,TupleType,N.ArrayType]: 
 868          valuelist = [valuelist] 
 869      criterion = '' 
 870      for i in range(len(columnlist)): 
 871          if type(valuelist[i])==StringType: 
 872              critval = '\'' + valuelist[i] + '\'' 
 873          else: 
 874              critval = str(valuelist[i]) 
 875          criterion = criterion + ' x['+str(columnlist[i])+']=='+critval+' and' 
 876      criterion = criterion[0:-3]         # remove the "and" after the last crit 
 877      return adm(a,criterion) 
 878   
 879   
 880 - def alinexor (a,columnlist,valuelist): 
 881      """ 
 882  Returns the rows of an array where col (from columnlist) = val (from 
 883  valuelist).  One value is required for each column in columnlist. 
 884  The exception is if either columnlist or valuelist has only 1 value, 
 885  in which case that item will be expanded to match the length of the 
 886  other list. 
 887   
 888  Usage:   alinexor (a,columnlist,valuelist) 
 889  Returns: the rows of a where columnlist[i]=valuelist[i] for ANY i 
 890  """ 
 891      if type(columnlist) not in [ListType,TupleType,N.ArrayType]: 
 892          columnlist = [columnlist] 
 893      if type(valuelist) not in [ListType,TupleType,N.ArrayType]: 
 894          valuelist = [valuelist] 
 895      criterion = '' 
 896      if len(columnlist) == 1 and len(valuelist) > 1: 
 897          columnlist = columnlist*len(valuelist) 
 898      elif len(valuelist) == 1 and len(columnlist) > 1: 
 899          valuelist = valuelist*len(columnlist) 
 900      for i in range(len(columnlist)): 
 901          if type(valuelist[i])==StringType: 
 902              critval = '\'' + valuelist[i] + '\'' 
 903          else: 
 904              critval = str(valuelist[i]) 
 905          criterion = criterion + ' x['+str(columnlist[i])+']=='+critval+' or' 
 906      criterion = criterion[0:-2]         # remove the "or" after the last crit 
 907      return adm(a,criterion) 
 908   
 909   
 910 - def areplace (a,oldval,newval): 
 911      """ 
 912  Replaces all occurrences of oldval with newval in array a. 
 913   
 914  Usage:   areplace(a,oldval,newval) 
 915  """ 
 916      newa = N.not_equal(a,oldval)*a 
 917      return newa+N.equal(a,oldval)*newval 
 918   
 919   
 920 - def arecode (a,listmap,col='all'): 
 921      """ 
 922  Remaps the values in an array to a new set of values (useful when 
 923  you need to recode data from (e.g.) strings to numbers as most stats 
 924  packages require.  Can work on SINGLE columns, or 'all' columns at once. 
 925   
 926  Usage:   arecode (a,listmap,col='all') 
 927  Returns: a version of array a where listmap[i][0] = (instead) listmap[i][1] 
 928  """ 
 929      ashape = a.shape 
 930      if col == 'all': 
 931          work = a.flat 
 932      else: 
 933          work = acolex(a,col) 
 934          work = work.flat 
 935      for pair in listmap: 
 936          if type(pair[1]) == StringType or work.typecode()=='O' or a.typecode()=='O': 
 937              work = N.array(work,'O') 
 938              a = N.array(a,'O') 
 939              for i in range(len(work)): 
 940                  if work[i]==pair[0]: 
 941                      work[i] = pair[1] 
 942              if col == 'all': 
 943                  return N.reshape(work,ashape) 
 944              else: 
 945                  return N.concatenate([a[:,0:col],work[:,N.NewAxis],a[:,col+1:]],1) 
 946          else:   # must be a non-Object type array and replacement 
 947              work = N.where(N.equal(work,pair[0]),pair[1],work) 
 948              return N.concatenate([a[:,0:col],work[:,N.NewAxis],a[:,col+1:]],1) 
 949   
 950   
 951 - def arowcompare(row1, row2): 
 952      """ 
 953  Compares two rows from an array, regardless of whether it is an 
 954  array of numbers or of python objects (which requires the cmp function). 
 955   
 956  Usage:   arowcompare(row1,row2) 
 957  Returns: an array of equal length containing 1s where the two rows had 
 958  identical elements and 0 otherwise 
 959  """ 
 960      if row1.typecode()=='O' or row2.typecode=='O': 
 961          cmpvect = N.logical_not(abs(N.array(map(cmp,row1,row2)))) # cmp fcn gives -1,0,1 
 962      else: 
 963          cmpvect = N.equal(row1,row2) 
 964      return cmpvect 
 965   
 966   
 967 - def arowsame(row1, row2): 
 968      """ 
 969  Compares two rows from an array, regardless of whether it is an 
 970  array of numbers or of python objects (which requires the cmp function). 
 971   
 972  Usage:   arowsame(row1,row2) 
 973  Returns: 1 if the two rows are identical, 0 otherwise. 
 974  """ 
 975      cmpval = N.alltrue(arowcompare(row1,row2)) 
 976      return cmpval 
 977   
 978   
 979 - def asortrows(a,axis=0): 
 980      """ 
 981  Sorts an array "by rows".  This differs from the Numeric.sort() function, 
 982  which sorts elements WITHIN the given axis.  Instead, this function keeps 
 983  the elements along the given axis intact, but shifts them 'up or down' 
 984  relative to one another. 
 985   
 986  Usage:   asortrows(a,axis=0) 
 987  Returns: sorted version of a 
 988  """ 
 989      if axis != 0: 
 990          a = N.swapaxes(a, axis, 0) 
 991      l = a.tolist() 
 992      l.sort()           # or l.sort(_sort) 
 993      y = N.array(l) 
 994      if axis != 0: 
 995          y = N.swapaxes(y, axis, 0) 
 996      return y 
 997   
 998   
 999 - def aunique(inarray): 
1000      """ 
1001  Returns unique items in the FIRST dimension of the passed array. Only 
1002  works on arrays NOT including string items. 
1003   
1004  Usage:   aunique (inarray) 
1005  """ 
1006      uniques = N.array([inarray[0]]) 
1007      if len(uniques.shape) == 1:            # IF IT'S A 1D ARRAY 
1008          for item in inarray[1:]: 
1009              if N.add.reduce(N.equal(uniques,item).flat) == 0: 
1010                  try: 
1011                      uniques = N.concatenate([uniques,N.array[N.NewAxis,:]]) 
1012                  except TypeError: 
1013                      uniques = N.concatenate([uniques,N.array([item])]) 
1014      else:                                  # IT MUST BE A 2+D ARRAY 
1015          if inarray.typecode() != 'O':  # not an Object array 
1016              for item in inarray[1:]: 
1017                  if not N.sum(N.alltrue(N.equal(uniques,item),1)): 
1018                      try: 
1019                          uniques = N.concatenate( [uniques,item[N.NewAxis,:]] ) 
1020                      except TypeError:    # the item to add isn't a list 
1021                          uniques = N.concatenate([uniques,N.array([item])]) 
1022                  else: 
1023                      pass  # this item is already in the uniques array 
1024          else:   # must be an Object array, alltrue/equal functions don't work 
1025              for item in inarray[1:]: 
1026                  newflag = 1 
1027                  for unq in uniques:  # NOTE: cmp --> 0=same, -1=<, 1=> 
1028                      test = N.sum(abs(N.array(map(cmp,item,unq)))) 
1029                      if test == 0:   # if item identical to any 1 row in uniques 
1030                          newflag = 0 # then not a novel item to add 
1031                          break 
1032                  if newflag == 1: 
1033                      try: 
1034                          uniques = N.concatenate( [uniques,item[N.NewAxis,:]] ) 
1035                      except TypeError:    # the item to add isn't a list 
1036                          uniques = N.concatenate([uniques,N.array([item])]) 
1037      return uniques 
1038   
1039   
1040 - def aduplicates(inarray): 
1041      """ 
1042  Returns duplicate items in the FIRST dimension of the passed array. Only 
1043  works on arrays NOT including string items. 
1044   
1045  Usage:   aunique (inarray) 
1046  """ 
1047      inarray = N.array(inarray) 
1048      if len(inarray.shape) == 1:            # IF IT'S A 1D ARRAY 
1049          dups = [] 
1050          inarray = inarray.tolist() 
1051          for i in range(len(inarray)): 
1052              if inarray[i] in inarray[i+1:]: 
1053                  dups.append(inarray[i]) 
1054          dups = aunique(dups) 
1055      else:                                  # IT MUST BE A 2+D ARRAY 
1056          dups = [] 
1057          aslist = inarray.tolist() 
1058          for i in range(len(aslist)): 
1059              if aslist[i] in aslist[i+1:]: 
1060                  dups.append(aslist[i]) 
1061          dups = unique(dups) 
1062          dups = N.array(dups) 
1063      return dups 
1064   
1065  except ImportError:    # IF NUMERIC ISN'T AVAILABLE, SKIP ALL arrayfuncs 
1066   pass 
1067