cblaster.classes module¶

This module stores the classes (Organism, Scaffold, Hit) used in cblaster.

class cblaster.classes.Cluster(indices=None, subjects=None, intermediate_genes=None, query_sequence_order=None, score=None, start=None, end=None, number=None)¶

Bases: cblaster.classes.Serializer

A cluster of subjects on the same scaffold

indices¶

indexes of the subjects in the list of subjects

Type:	list

of the parent scaffold

subjects¶

Subject objects that are in this cluster. Note:

Type:	list

These are not serialised for this cluster

intermediate_genes¶

Type:	list

start¶

The start coordinate of the cluster on the parent scaffold

Type:	int

end¶

The end coordinate of the cluster on the parent scaffold

Type:	int

number¶

number that is unique for each cluster in order to identify them

Type:	int

NUMBER = count(0)¶

calculate_score(query_sequence_order=None)¶

Calculate the score of the current cluster

The score is based on accumulated blastbitscore, total amount of hits against the query and a synteny score if query sequence order is provided. If there are multiple hits in a subject the hit with the top bitscore is selected for the calculation.

Parameters:	query_sequence_order (list) – list of sequences of the order in the query file, is provided if the query has a meningfull order (only) –
Returns:	a float

classmethod from_dict(d, subjects=None)¶: Loads class from dict.

intermediate_end¶: The end of the cluster taking the intermediate genes into account

intermediate_start¶: The start of the cluster taking the intermediate genes into account

names¶

remove_subject(subject, scaffold_index)¶

Safely remove a subject from a cluster.

This is important when subjects become empty when recomputing a session with different treshholds

Parameters:	subject (Subject) – cblaster Subject object scaffold_index (int) – the index of the subject in the scaffold it is saved in

sequences¶

to_dict(save_subjects=False)¶: Serialises class to dict.

class cblaster.classes.Hit(query, subject, identity, coverage, evalue, bitscore)¶

Bases: cblaster.classes.Serializer

A BLAST hit identified during a cblaster search.

This class is first instantiated when parsing BLAST results, and is then updated with genomic coordinates after querying either the Identical Protein Groups (IPG) resource on NCBI, or a local JSON database.

query¶

Name of query sequence.

Type:	str

subject¶

Name of subject sequence.

Type:	str

identity¶

Percentage identity (%) of hit.

Type:	float

coverage¶

Query coverage (%) of hit.

Type:	float

evalue¶

E-value of hit.

Type:	float

bitscore¶

Bitscore of hit.

Type:	float

copy(**kwargs)¶: Creates a copy of this Hit with any additional args.

classmethod from_dict(d)¶: Loads class from dict.

to_dict()¶: Serialises class to dict.

values(decimals=4)¶

Formats hit attributes for printing.

Parameters:	decimals (int) – Total decimal places to show in score values.
Returns:	List of formatted attribute strings.

class cblaster.classes.Organism(name, strain, scaffolds=None)¶

Bases: cblaster.classes.Serializer

A unique organism containing hits found in a cblaster search.

Every strain (or lack thereof) is a unique Organism, and will be reported separately in cblaster results.

name¶

Organism name, typically the genus and species epithet.

Type:	str

strain¶

Strain name of this organism, e.g. CBS 536.65.

Type:	str

scaffolds¶

Scaffold objects belonging to this organism.

Type:	dict

clusters¶

classmethod from_dict(d)¶: Loads class from dict.

full_name¶: The full name (including strain) of the organism. Note: if strain found in name, returns just name.

summary(decimals=4, hide_headers=True, delimiter=None)¶

to_dict()¶: Serialises class to dict.

total_hit_clusters¶: Counts total amount of hit clusters in this Organism.

class cblaster.classes.Scaffold(accession, clusters=None, subjects=None)¶

Bases: cblaster.classes.Serializer

A genomic scaffold containing hits found in a cblaster search.

accession¶

Name of this scaffold, typically NCBI accession.

Type:	str

subjects¶

Subject objects located on this scaffold.

Type:	list

clusters¶

Clusters of hits identified on this scaffold.

Type:	list

add_clusters(subject_lists, query_sequence_order=None)¶

Add clusters to this scaffold

After clusters are added they are sorted based on score

Parameters:	subject_lists (list) – a list of lists of Subject objects that are a clusters (form) – query_sequence_order (list) – list of sequences of the order in the query file, is provided if the query has a meningfull order (only) –

classmethod from_dict(d)¶: Loads class from dict.

remove_subject(subject)¶

Safely remove a subject from a cluster by removing it from the cluster as well.

Parameters:	subject (Subject) – cblaster Subject object

summary(hide_headers=False, delimiter=None, decimals=4)¶

to_dict()¶: Serialises class to dict.

class cblaster.classes.Serializer¶

Bases: abc.ABC

JSON serialisation mixin class.

Classes that inherit from this class should implement to_dict and from_dict methods.

classmethod from_dict(d)¶: Loads class from dict.

classmethod from_json(js)¶: Instantiates class from JSON handle.

to_dict()¶: Serialises class to dict.

to_json(fp=None, **kwargs)¶: Serialises class to JSON.

class cblaster.classes.Session(queries=None, sequences=None, params=None, organisms=None, query=None)¶

Bases: cblaster.classes.Serializer

Stores the state of a cblaster search.

This class stores query proteins, search parameters, Organism objects created during searches, as well as methods for generating summary tables. It can also be dumped to/loaded from JSON for re-filtering, plotting, etc.

>>> s = Session()
>>> with open("session.json", "w") as fp:
...     s.to_json(fp)
>>> with open("session.json") as fp:
...     s2 = Session.from_json(fp)
>>> s == s2
True

queries¶

Names of query sequences.

Type:	list

params¶

Search parameters.

Type:	dict

organisms¶

Organism objects created in a search.

Type:	list

sequences¶

Query sequence translations

Type:	dict

query¶

cblaster Cluster object for query

Type:	Cluster

format(form, fp=None, **kwargs)¶

Generates a summary table.

Parameters:	form (str) – Type of table to generate (‘summary’ or ‘binary’). fp (file handle) – File handle to write to.
Raises:	`ValueError` – form not ‘binary’ or ‘summary’
Returns:	Summary table.

classmethod from_dict(d)¶: Loads class from dict.

classmethod from_file(file)¶

classmethod from_files(files)¶

to_dict()¶: Serialises class to dict.

class cblaster.classes.Subject(id=None, hits=None, name=None, ipg=None, start=None, end=None, strand=None, sequence=None)¶

Bases: cblaster.classes.Serializer

A sequence representing one or more BLAST hits.

This class is instantiated during the contextual lookup stage. It is important since it allows for subject sequences which hit >1 of the query sequences, while still staying non-redundant.

hits¶

Hit objects referencing this subject sequence.

Type:	list

ipg¶

NCBI Identical Protein Group (IPG) id.

Type:	int

start¶

Start of sequence on parent scaffold.

Type:	int

end¶

End of sequence on parent scaffold.

Type:	int

strand¶

Strandedness of the sequence (‘+’ or ‘-‘).

Type:	str

classmethod from_dict(d)¶: Loads class from dict.

to_dict()¶: Serialises class to dict.

values(decimals=4)¶

cblaster.context module¶

cblaster.database module¶

cblaster.extract module¶

cblaster.extract_clusters module¶

cblaster.formatters module¶

cblaster result formatters.

cblaster.formatters.add_field_whitespace(rows, lengths)¶: Fills table fields with whitespace to specified lengths.

cblaster.formatters.binary(session, hide_headers=False, delimiter=None, key=<built-in function len>, attr='identity', decimals=4, sort_clusters=False)¶: Generates a binary summary table from a Session object.

cblaster.formatters.generate_header_string(text, symbol='-')¶

Generates a 2-line header string with underlined text.

>>> header = generate_header_string("header string", symbol="*")
>>> print(header)
header string
*************

cblaster.formatters.get_cell_values(queries, subjects, key=<built-in function len>, attr=None)¶

Generates the values of cells in the binary matrix.

This function calls some specified key function (def. max) against all values of a specified attribute (def. None) from Hits inside Subjects which match each query. By default, this function will just count all matching Hits (i.e. len() is called on all Hits whose query attribute matches). To find maximum identities, for example, provide key=max and attr=’identity’ to this function.

Parameters:	queries (list) – Names of query sequences. subjects (list) – Subject objects to generate vlaues for. key (callable) – Some callable that takes a list and produces a value. attr (str) – A Hit attribute to calculate values with in key function.

cblaster.formatters.get_maximum_row_lengths(rows)¶: Finds the longest lengths of fields per column in a collection of rows.

cblaster.formatters.gne_summary(data, hide_headers=False, delimiter=None, decimals=4)¶

cblaster.formatters.humanise(rows)¶: Formats a collection of fields as human-readable.

cblaster.formatters.set_decimals(value, decimals=4)¶

cblaster.formatters.summarise_cluster(cluster, decimals=4, hide_headers=True, delimiter=None)¶

Generates a summary table for a hit cluster.

Parameters:	cluster (Cluster) – collection of Subject objects decimals (int) – number of decimal points to show hide_headers (bool) – hide column headers in output delimiter (str) – delimiting string between the subjects
Returns:	summary table

cblaster.formatters.summarise_gne(data, hide_headers=False, delimiter=None, decimals=4)¶

cblaster.formatters.summarise_organism(organism, hide_headers=True, delimiter=None, decimals=4)¶

cblaster.formatters.summarise_scaffold(scaffold, hide_headers=True, delimiter=None, decimals=4)¶

cblaster.formatters.summary(session, hide_headers=False, delimiter=None, decimals=4, sort_clusters=False)¶

cblaster.genome_parsers module¶

cblaster.genome_parsers.find_fasta(gff_path)¶: Finds a FASTA file corresponding to the given GFF path.

cblaster.genome_parsers.find_feature(array, ftype)¶

cblaster.genome_parsers.find_files(paths, recurse=True, level=0)¶

cblaster.genome_parsers.find_gene_name(qualifiers)¶: Finds a gene name in a dictionary of feature qualifiers.

cblaster.genome_parsers.find_overlapping_location(feature, locations)¶

Finds the index of a gene location containing feature.

Parameters:	feature (SeqFeature) – Feature being matched to a location locations (list) – Start and end coordinates of gene features
Returns:	Index of matching start/end, if any None: No match found
Return type:	int

cblaster.genome_parsers.find_regions(directives)¶: Looks for ##sequence-region directives in a list of GFF3 directives.

cblaster.genome_parsers.find_translation(record, feature)¶

cblaster.genome_parsers.iter_overlapping_features(features)¶

cblaster.genome_parsers.merge_cds_features(features)¶

cblaster.genome_parsers.organisms_to_tuples(organisms)¶

Generates insertion tuples from parsed organisms.

Parameters:	organisms (list) – Organism dictionaries parsed by parse_file
Returns:	SQLite3 database insertion tuples for all genes
Return type:	list

cblaster.genome_parsers.parse_cds_features(features, record_start)¶

cblaster.genome_parsers.parse_file(path, to_tuples=False)¶

Dispatches a given file path to the correct parser given its extension.

Parameters:	path (str) – Path to genome file to_tuples (bool) – Generate insertion tuples from parsed SeqRecords
Returns:	File name and list of SeqRecord objects corresponding to scaffolds in file
Return type:	dict

cblaster.genome_parsers.parse_gff(path)¶

Parses GFF and corresponding FASTA using GFFutils.

Parameters:	path (str) – Path to GFF file. Should have a corresponding FASTA file of the same name with a valid FASTA suffix (.fa, .fasta, .fsa, .fna, .faa).
Returns:	SeqRecord objects corresponding to each scaffold in the file
Return type:	list

cblaster.genome_parsers.parse_infile(path, format)¶

cblaster.genome_parsers.return_file_handle(input_file)¶: Handles compressed and uncompressed files.

cblaster.genome_parsers.seqrecord_to_tuples(record, source)¶

cblaster.helpers module¶

cblaster.helpers.dict_to_cluster(sequences, spacing=500)¶: Creates a mock Cluster from a sequence dictionary.

cblaster.helpers.efetch_sequences(headers)¶

Retrieve protein sequences from NCBI for supplied accessions.

This function uses EFetch from the NCBI E-utilities to retrieve the sequences for all synthases specified in headers. The calls to EFetch can not exceed 500 accessions this means that the calls have to be limited. It then calls fasta.parse to parse the returned response; note that extra processing has to occur because the returned FASTA will contain a full sequence description in the header line after the accession.

Parameters:	headers (list) – Valid NCBI sequence identifiers (accession, GI, etc.).
Returns:	a dictionary of sequences keyed on header id

cblaster.helpers.fasta_seqrecords_to_cluster(records, spacing=500)¶: Creates a mock Cluster from a SeqIO FASTA parser handle.

cblaster.helpers.find_sqlite_db(path)¶

cblaster.helpers.form_command(parameters)¶: Flatten a dictionary to create a command list for use in subprocess.run()

cblaster.helpers.get_program_path(aliases)¶

Get programs path given a list of program names.

Parameters:	aliases (list) – Program aliases, e.g. [“diamond”, “diamond-aligner”]
Raises:	`ValueError` – Could not find any of the given aliases on system $PATH.
Returns:	Path to program executable.

cblaster.helpers.get_project_root()¶

cblaster.helpers.get_sequences(query_file=None, query_ids=None, query_profiles=None)¶

Convenience function to get dictionary of query sequences from file or IDs.

Parameters:	query_file (str) – Path to FASTA genbank or EMBL file containing query sequences. (protein) – query_ids (list) – NCBI sequence accessions. query_profiles (list) – Pfam profile accessions.
Raises:	`ValueError` – Did not receive values for query_file or query_ids.
Returns:	Dictionary of query sequences keyed on accession.
Return type:	sequences (dict)

cblaster.helpers.parse_query_sequences(query_file=None, query_ids=None, query_profiles=None)¶

Creates a Cluster object from query sequences.

If EMBL/GenBank, Cluster will use exact genomic coordinates parsed from file. Otherwise, a fake Cluster will be created where genes are drawn to scale, but always on positive strand and with fixed intergenic distance.

cblaster.helpers.seqrecord_to_cluster(record)¶: Creates a Cluster object from a SeqIO GenBank/EMBL parser handle.

cblaster.helpers.sequences_to_fasta(sequences)¶: Formats sequence dictionary as FASTA.

cblaster.hmm_search module¶

Hmmfetch and hmmsearch implementation

cblaster.hmm_search.check_pfam_db(path)¶

Check if Pfam-A db exists else download

Parameters:	path – String, path where to check

cblaster.hmm_search.fetch_pfam_profiles(hmm, keys)¶

Fetch hmm profiles from db and save in a file

Parameters:	db_path – String, path where db are stored keys_ls – String, Path to file with acc-nr
Returns:	List, strings with acc-numbers
Return type:	ls_keys

cblaster.hmm_search.get_pfam_accession(dat_path: Union[str, pathlib.Path], keys: Collection[str]) → Tuple[Set[str], Set[str]]¶

Get full accession number of Pfam profiles

Looks for keys in ID and AC attributes, such that accessions can be retrieved by name or accession.

Parameters:	keys – Strings of accession profiles numbers db_path – Path to dat.gz file with the full acc-nr
Returns:	List, string of full acc-number
Return type:	key_lines

cblaster.hmm_search.get_profile_names(profiles: Collection[str]) → Collection[str]¶: Extracts names from profile HMMs using regular expressions.

cblaster.hmm_search.group_profiles(profiles: Collection[str]) → tuple¶

Group input query profile HMMs by Pfam, custom or invalid.

If the profile is found on disk, it will be loaded directly. If not found locally, but starts with PF, will try to extract from Pfam. Otherwise, marked as invalid.

cblaster.hmm_search.parse_hmmer_output(results)¶

Parse hmmsearch output

Parameters:	file_list – List, string of file name of results that need parsing
Returns:	list of class objects, with information query, subject, identity, coverage, e-value, bit score
Return type:	hit_info

cblaster.hmm_search.perform_hmmer(fasta: str, query_profiles: List[str], pfam: str, session: cblaster.classes.Session, hmm_out: str = None) → Optional[Collection[cblaster.classes.Hit]]¶

Main of running a hmmer search

Parameters:	fasta – Path to database FASTA file query_profiles – Pfam names/accessions, or paths to profile HMM files pfam – Path to folder containing Pfam database session – cblaster search session
Returns:	List of class objects with the hits

cblaster.hmm_search.read_profiles(files: Collection[str]) → Collection[str]¶: Reads in profile HMMs from a list of files.

cblaster.hmm_search.run_hmmsearch(fasta, query)¶

Run the hmmsearch command

Parameters:	path_pfam – String, Path to the pfam database path_db – String, Path to db that will be searched for profiles ls_keys – List, string of pfam profile names
Returns:	List, String of result file names
Return type:	temp_res

cblaster.intermediate_genes module¶

cblaster.local module¶

cblaster.local.diamond(fasta, database, max_evalue=0.01, min_identity=30, min_coverage=50, hitlist_size=5000, cpus=None, sensitivity='fast')¶

Launch a local DIAMOND search against a database.

Parameters:	fasta (str) – Path to FASTA format query file database (str) – Path to DIAMOND database generated with cblaster makedb max_evalue (float) – Maximum e-value threshold min_identity (float) – Minimum identity (%) cutoff min_coverage (float) – Minimum coverage (%) cutoff hitlist_size (int) – Maximum number of hits to save cpus (int) – Number of CPU threads for DIAMOND to use
Returns:	Rows from DIAMOND search result table (split by newline)
Return type:	list

cblaster.local.parse(results, min_identity=30, min_coverage=50, max_evalue=0.01)¶

Parse a string containing results of a BLAST/DIAMOND search.

Parameters:	results (list) – Results returned by diamond() or blastp() min_identity (float) – Minimum identity (%) cutoff min_coverage (float) – Minimum coverage (%) cutoff max_evalue (float) – Maximum e-value threshold
Returns:	Hit objects representing hits that surpass scoring thresholds
Return type:	list

cblaster.local.search(database, sequences=None, query_file=None, query_ids=None, blast_file=None, dmnd_sensitivity='fast', min_identity=30, min_coverage=50, max_evalue=0.01, hitlist_size=5000, **kwargs)¶

Launch a new BLAST search using either DIAMOND or command-line BLASTp (remote).

Parameters:	database (str) – Path to DIAMOND database sequences (dict) – Query sequences query_file (str) – Path to FASTA file containing query sequences query_ids (list) – NCBI sequence accessions blast_file (str) – Path to the file blast results are written to
Raises:	`ValueError` – No value given for query_file or query_ids
Returns:	Parsed rows with hits from DIAMOND results table
Return type:	list

cblaster.main module¶

cblaster.parsers module¶

Argument parsers.

cblaster.parsers.add_binary_arguments(group)¶

cblaster.parsers.add_clustering_group(search)¶

cblaster.parsers.add_config_subparser(subparsers)¶

cblaster.parsers.add_extract_clusters_subparser(subparsers)¶

cblaster.parsers.add_extract_subparser(subparsers)¶

cblaster.parsers.add_filtering_group(search)¶

cblaster.parsers.add_gne_output_group(parser)¶

cblaster.parsers.add_gne_params_group(parser)¶

cblaster.parsers.add_gne_subparser(subparsers)¶

cblaster.parsers.add_gui_subparser(subparsers)¶

cblaster.parsers.add_input_group(search)¶

cblaster.parsers.add_intermediate_genes_group(search)¶

cblaster.parsers.add_makedb_subparser(subparsers)¶

cblaster.parsers.add_output_arguments(group)¶

cblaster.parsers.add_output_group(search)¶

cblaster.parsers.add_plot_clusters_subparser(subparsers)¶

cblaster.parsers.add_search_subparser(subparsers)¶

cblaster.parsers.add_searching_group(search)¶

cblaster.parsers.full_database_path(database, *acces_modes)¶

Make sure the database path is also correct, but do not check when providing one of the NCBI databases

Parameters:	database (str) – a string that is the path to the database creation files or a NCBI database identifier acces_modes (List) – a list of integers of acces modes for which at least one should be allowed
Returns:	a string that is the full path to the database file or a NCBI database identifier

cblaster.parsers.full_path(file_path, *acces_modes, dir=False)¶

Test if a file path or directory exists and has the correct permissions and create a full path

For reading acces the file has to be pressent and there has to be read acces. For writing acces the directory with the file has to be present and there has to be write acces in that directory.

Parameters:	file_path (str) – relative or absoluete path to a file acces_modes (List) – a list of integers of acces modes for which at least one should be allowed dir (bool) – if the path is to a directory or not
Returns:	A string that is the full path to the provided file_path
Raises:	argparse.ArgumentTypeError when the provided path does not exist or the file does not have the correct permissions to be accessed

cblaster.parsers.get_parser()¶

cblaster.parsers.max_cpus(value)¶

Ensure that the cpu’s do not go above the available amount. Setting to high cpu’s will crash database creation badly

Parameters:	value (int) – number of cpu’s as provided by the user
Returns:	value as an integer with 1 <= value <= multiprocessing.cpu_count()

cblaster.parsers.parse_args(args)¶

cblaster.plot module¶

cblaster.plot_clusters module¶

cblaster.remote module¶

This module handles all interaction with NCBI’s BLAST API, including launching new remote searches, polling for completion status, and retrieval of results.

cblaster.remote.check(rid)¶

Check completion status of a BLAST search given a Request Identifier (RID).

Parameters:	rid (str) – NCBI BLAST search request identifier (RID)
Returns:	Search has completed successfully and hits were reported
Return type:	bool
Raises:	`ValueError` – Search has failed. This is caused either by program error (in which case, NCBI requests you submit an error report with the RID) or expiration of the RID (only stored for 24 hours). `ValueError` – Search has completed successfully, but no hits were reported.

cblaster.remote.parse(handle, sequences=None, query_file=None, query_ids=None, max_evalue=0.01, min_identity=30, min_coverage=50)¶

Parse Tabular results from remote BLAST search performed via API.

Since the API provides no option for returning query coverage, which is a metric we want to use for filtering hits, query sequences must be passed to this function so that their lengths can be compared to the alignment length.

Parameters:	handle (list) – File handle (or file handle-like) object corresponding to BLAST results. Note that this function expects an iterable of tab-delimited lines and performs no validation/error checking sequences (dict) – Query sequences query_file (str) – Path to FASTA format query file query_ids (list) – NCBI sequence identifiers max_evalue (float) – Maximum e-value min_identity (float) – Minimum percent identity min_coverage (float) – Minimum percent query coverage
Returns:	Hit objects corresponding to criteria passing BLAST hits
Return type:	list

cblaster.remote.poll(rid, delay=60, max_retries=-1)¶

Poll BLAST API with given Request Identifier (RID) until results are returned.

As per NCBI usage guidelines, this function will only poll once per minute; this is calculated each time such that wait is constant (i.e. accounts for differing response time on the status check).

Parameters:	rid (str) – NCBI BLAST search request identifier (RID) delay (int) – Total delay (seconds) between polling max_retries (int) – Maximum number of polling attempts (-1 for unlimited)
Returns:	BLAST search results split by newline
Return type:	list

cblaster.remote.retrieve(rid, hitlist_size=5000)¶

Retrieve BLAST results corresponding to a given Request Identifier (RID).

Parameters:	rid (str) – NCBI BLAST search request identifiers (RID) hitlist_size (int) – Total number of hits to retrieve
Returns:	BLAST search results split by newline, with HTML parts removed
Return type:	list

cblaster.remote.search(rid=None, sequences=None, query_file=None, query_ids=None, min_identity=0.3, min_coverage=0.5, max_evalue=0.01, blast_file=None, hitlist_size=500, **kwargs)¶

Perform a remote BLAST search via the NCBI’s BLAST API.

This function launches a new search given a query FASTA file or list of valid NCBI identifiers, polls the API to check the completion status of the search, then retrieves and parses the results.

It is also possible to call other BLAST variants using the program argument.

Parameters:	rid (str) – NCBI BLAST search request identifier (RID) sequences (dict) – Query sequences query_file (str) – Path to FASTA format query file query_ids (list) – NCBI sequence identifiers min_identity (float) – Minimum percent identity min_coverage (float) – Minimum percent query coverage max_evalue (float) – Maximum e-value blast_file (str) – Path to file blast results are written to hitlist_size (int) – Number of database sequences to keep
Returns:	Hit objects corresponding to criteria passing BLAST hits
Return type:	list

cblaster.remote.start(sequences=None, query_file=None, query_ids=None, database='nr', program='blastp', megablast=False, filtering='F', evalue=0.1, nucl_reward=None, nucl_penalty=None, gap_costs='11 1', matrix='BLOSUM62', hitlist_size=500, threshold=11, word_size=6, comp_based_stats=2, entrez_query=None)¶

Launch a remote BLAST search using NCBI BLAST API.

Note that the HITLIST_SIZE, ALIGNMENTS and DESCRIPTIONS parameters must all be set together in order to mimic max_target_seqs behaviour.

Usage guidelines:

Don’t contact server more than once every 10 seconds
Don’t poll for a single RID more than once a minute
Use URL parameter email/tool
Run scripts weekends or 9pm-5am Eastern time on weekdays if >50 searches

For a full description of the parameters, see:

BLAST API documentation<https://ncbi.github.io/blast-cloud/dev/api.html>

BLAST documentation <https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp>

Parameters:	sequences (dict) – Query sequence dict generated by helpers.get_sequences() query_file (str) – Path to a query FASTA file query_ids (list) – Collection of NCBI sequence identifiers database (str) – Target NCBI BLAST database program (str) – BLAST variant to run megablast (bool) – Enable megaBLAST option (only with BLASTn) filtering (str) – Low complexity filtering evalue (float) – E-value cutoff nucl_reward (int) – Reward for matching bases (only with BLASTN/megaBLAST) nucl_penalty (int) – Penalty for mismatched bases (only with BLASTN/megaBLAST) gap_costs (str) – Gap existence and extension costs matrix (str) – Scoring matrix name hitlist_size (int) – Number of database sequences to keep threshold (int) – Neighbouring score for initial words word_size (int) – Size of word for initial matches comp_based_stats (int) – Composition based statistics algorithm entrez_query (str) – NCBI Entrez search term for pre-filtering the BLAST database
Returns:	Request Identifier (RID) assigned to the search rtoe (int): Request Time Of Execution (RTOE), estimated run time of the search
Return type:	rid (str)

cblaster.classes module¶

cblaster.context module¶

cblaster.database module¶

cblaster.extract module¶

cblaster.extract_clusters module¶

cblaster.formatters module¶

cblaster.genome_parsers module¶

cblaster.helpers module¶

cblaster.hmm_search module¶

cblaster.intermediate_genes module¶

cblaster.local module¶

cblaster.main module¶

cblaster.parsers module¶

cblaster.plot module¶

cblaster.plot_clusters module¶

cblaster.remote module¶

cblaster.sql module¶