Biskit.Mod.SequenceSearcher.SequenceSearcher

localBlast: uses Bio.Blast.NCBIStandalone.blastall (Biopython) to perform the search.

localPSIBlast: uses Bio.Blast.NCBIStandalone.blastpgp (Biopython).

remoteBlast: uses Bio.Blast.NCBIWWW.qblast (Biopython), which performs a BLAST search using the QBLAST server at NCBI.

Note: the BLAST sequence database has to be built with the -o option, e.g.: cat db1.dat db2.dat | formatdb -i stdin -o T -n indexed_db
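The command in the note can also be assembled from Python, for example when the list of database files is only known at run time. The following is a minimal sketch that only rebuilds the pipeline string given in the note. The helper name formatdb_pipeline is hypothetical and not part of Biskit, and the code targets Python 3 (shlex.quote).

```python
import shlex


def formatdb_pipeline(db_files, name):
    """Return the shell pipeline from the note as a single string.

    db_files -- list of sequence files to concatenate
    name     -- name of the indexed database (formatdb -n)
    """
    # Quote every argument so that unusual file names survive the shell.
    files = " ".join(shlex.quote(f) for f in db_files)
    return "cat %s | formatdb -i stdin -o T -n %s" % (files, shlex.quote(name))


print(formatdb_pipeline(["db1.dat", "db2.dat"], "indexed_db"))
```

The resulting string could be handed to subprocess.run(cmd, shell=True, check=True), provided the formatdb binary is on the PATH.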

To Do: copy blast output
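As an illustration of how the methods listed on this page might be combined, here is a hedged sketch of a small driver function. It only uses method names and parameters documented here. The order of the calls, and the helper name search_templates, are assumptions of this sketch and are not taken from the Biskit sources.

```python
def search_templates(searcher, target_fasta, db, e=0.01):
    """Search, cluster and write homologues with a SequenceSearcher-like object.

    searcher     -- object offering the SequenceSearcher methods used below
    target_fasta -- FASTA file with the query sequence
    db           -- name of the (indexed) BLAST database
    """
    # 1. local BLAST search (needs local BLAST binaries and databases)
    searcher.localBlast(target_fasta, db, method='blastp', e=e)
    # 2. write all hits (assumed here to be needed before clustering)
    searcher.writeFastaAll()
    # 3. cluster until fewer clusters than clusterLimit remain
    searcher.clusterFastaIterative()
    # 4. write one representative per cluster
    searcher.writeFastaClustered()
    return searcher.getClusteredRecords()
```

With Biskit installed, the searcher would presumably be created as SequenceSearcher(outFolder='.', clusterLimit=50) after importing the class from Biskit.Mod.SequenceSearcher.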

Instance Methods

__init__(self, outFolder='.', clusterLimit=50, verbose=0, log=None)

prepareFolders(self)
Create needed output folders if not there.

[Bio.Fasta.Record] getRecords(self)
Get all homologues.

[Bio.Fasta.Record] getClusteredRecords(self)
Get representative of each cluster.

__blast2dict(self, parsed_blast, db)
Convert parsed blast result into dictionary of FastaRecords indexed by sequence ID.

remoteBlast(self, seqFile, db, method, e=0.01, **kw)
Perform a remote BLAST search using the QBLAST server at NCBI.

localBlast(self, seqFile, db, method='blastp', resultOut=None, e=0.01, **kw)
Perform a local BLAST search (requires that the BLAST binaries and databases are installed locally).

localPSIBlast(self, seqFile, db, resultOut=None, e=0.01, **kw)
Perform a local PSI-BLAST search (requires that the BLAST binaries and databases are installed locally).

[str] getSequenceIDs(self, blast_records)
Extract sequence ids from BlastParser result.

Bio.Fasta.Record fastaRecordFromId(self, db, id)
Use:

{str: Bio.Fasta.Record} fastaFromIds(self, db, id_lst, fastaOut=None)
Use:

copyClusterOut(self, raw=None)
Write clustering results to file.

reportClustering(self, raw=None)
Report the clustering result.

clusterFasta(self, fastaIn=None, simCut=1.75, lenCut=0.9, ncpu=1)
Cluster sequences.

clusterFastaIterative(self, fastaIn=None, simCut=1.75, lenCut=0.9, ncpu=1)
Run clusterFasta iteratively, with tighter clustering settings, until the number of clusters is less than self.clusterLimit.

str selectFasta(self, ids_from_cluster)
Select one member of a cluster of sequences.

writeFasta(self, frecords, fastaOut)
Create fasta file for given set of records.

writeFastaAll(self, fastaOut=None)
Write all found template sequences to fasta file.

writeFastaClustered(self, fastaOut=None)
Write non-redundant set of template sequences to fasta file.

__writeBlastResult(self, parsed_blast, outFile)
Write the result from the blast search to file (similar to the output produced by a regular blast run).

writeClusteredBlastResult(self, allFile, clustFile, selection)
Read the blast.out file and keep only the cluster centers.
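The iterative strategy described for clusterFastaIterative (re-cluster with adjusted settings until fewer than clusterLimit clusters remain) can be sketched as a simple loop. This is an illustration only: the clustering step is passed in as a callable, and the way the cutoff is changed between rounds (sim_step, lowering simCut) as well as the max_rounds guard are assumptions, not Biskit's actual schedule.

```python
def cluster_iteratively(cluster_once, sim_cut=1.75, len_cut=0.9,
                        cluster_limit=50, sim_step=0.25, max_rounds=20):
    """Repeat clustering until fewer than cluster_limit clusters remain.

    cluster_once -- callable (sim_cut, len_cut) -> list of clusters;
                    stands in for a single clusterFasta run
    Returns the final list of clusters and the sim_cut that produced it.
    """
    clusters = cluster_once(sim_cut, len_cut)
    rounds = 1
    while len(clusters) >= cluster_limit and rounds < max_rounds:
        # Assumption: a lower similarity cutoff merges more sequences
        # into each cluster and so reduces the number of clusters.
        sim_cut -= sim_step
        clusters = cluster_once(sim_cut, len_cut)
        rounds += 1
    return clusters, sim_cut
```

The max_rounds guard is there so the sketch terminates even if the cluster count never drops below the limit.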

Class Variables

F_RESULT_FOLDER = '/sequences'

F_FASTA_ALL = F_RESULT_FOLDER + '/all.fasta'

F_FASTA_NR = F_RESULT_FOLDER + '/nr.fasta'

F_CLUSTER_RAW = F_RESULT_FOLDER + '/cluster_raw.out'

F_CLUSTER_LOG = F_RESULT_FOLDER + '/cluster_result.out'

F_BLAST_OUT = F_RESULT_FOLDER + '/blast.out'

F_CLUSTER_BLAST_OUT = F_RESULT_FOLDER + '/cluster_blast.out'

F_FASTA_TARGET = '/target.fasta'
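All of these constants start with a slash, which suggests that they are appended to the outFolder given to the constructor. The following sketch shows the resulting paths under that assumption. The helper result_path is hypothetical, and the constants are copied here only to keep the example self-contained.

```python
# Constants copied from the class variables listed above.
F_RESULT_FOLDER = '/sequences'
F_FASTA_ALL = F_RESULT_FOLDER + '/all.fasta'
F_FASTA_NR = F_RESULT_FOLDER + '/nr.fasta'
F_BLAST_OUT = F_RESULT_FOLDER + '/blast.out'
F_FASTA_TARGET = '/target.fasta'


def result_path(out_folder, constant):
    """Join an output folder and one of the F_* constants.

    Assumption: the class simply concatenates outFolder and the constant.
    """
    return out_folder + constant


print(result_path('.', F_FASTA_ALL))     # ./sequences/all.fasta
print(result_path('.', F_FASTA_TARGET))  # ./target.fasta
```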
