Remote

This module handles all interaction with NCBI’s BLAST API, including launching new remote searches, polling for completion status, and retrieval of results.

cblaster.remote.check(rid)

Check completion status of a BLAST search given a Request Identifier (RID).

Parameters:

rid (str) – NCBI BLAST search request identifier (RID)

Returns:

True if the search has completed successfully and hits were reported

Return type:

bool

Raises:
  • ValueError – Search has failed. This is caused either by program error (in which case, NCBI requests you submit an error report with the RID) or expiration of the RID (only stored for 24 hours).
  • ValueError – Search has completed successfully, but no hits were reported.
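The difference between the return value and the two error cases can be sketched as follows. This is a minimal illustration, not cblaster's implementation: the function name interpret_search_info is hypothetical, it assumes the Status= and ThereAreHits= fields that the NCBI API reports for a SearchInfo request, and it assumes that a search which is still running maps to False.

```python
import re


def interpret_search_info(text):
    """Map a SearchInfo response body to True/False, or raise ValueError."""
    match = re.search(r"Status=(\w+)", text)
    status = match.group(1) if match else None

    if status == "WAITING":
        return False  # assumed: still running, so poll again later
    if status in ("FAILED", "UNKNOWN"):
        # FAILED: program error; UNKNOWN: the RID has expired
        raise ValueError(f"Search failed or expired (status={status})")
    if status == "READY":
        if "ThereAreHits=yes" in text:
            return True
        raise ValueError("Search completed, but no hits were reported")
    raise ValueError(f"Unrecognised status: {status}")
```

Callers would typically wrap the check in a try/except ValueError block so that a failed or empty search can be reported separately from one that is still running.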
cblaster.remote.parse(handle, query_file=None, query_ids=None, max_evalue=0.01, min_identity=30, min_coverage=50)

Parse tabular results from a remote BLAST search performed via the API.

Since the API provides no option for returning query coverage, which is a metric we want to use for filtering hits, query sequences must be passed to this function so that their lengths can be compared to the alignment length.

Parameters:
  • handle (list) – File handle (or file handle-like) object corresponding to BLAST results. Note that this function expects an iterable of tab-delimited lines and performs no validation or error checking.
  • query_file (str) – Path to FASTA format query file
  • query_ids (list) – NCBI sequence identifiers
  • max_evalue (float) – Maximum e-value
  • min_identity (float) – Minimum percent identity
  • min_coverage (float) – Minimum percent query coverage
Returns:

Hit objects corresponding to BLAST hits that pass the filtering criteria

Return type:

list
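The query coverage filter described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the function's actual code: the name filter_hits, the six-column layout (query, subject, identity, qstart, qend, evalue) and the query_lengths dictionary are all hypothetical, and plain tuples stand in for Hit objects.

```python
def filter_hits(lines, query_lengths, max_evalue=0.01, min_identity=30, min_coverage=50):
    """Keep rows that pass e-value, identity and query-coverage thresholds."""
    passed = []
    for line in lines:
        # Hypothetical column order: query, subject, identity, qstart, qend, evalue
        query, subject, identity, qstart, qend, evalue = line.rstrip("\n").split("\t")

        # Coverage = aligned span of the query as a percentage of its full length
        aligned = int(qend) - int(qstart) + 1
        coverage = 100 * aligned / query_lengths[query]

        if (
            float(evalue) <= max_evalue
            and float(identity) >= min_identity
            and coverage >= min_coverage
        ):
            passed.append((query, subject, float(identity), coverage, float(evalue)))
    return passed


rows = [
    "q1\ts1\t45.0\t1\t80\t1e-20",  # aligned over 80 of 100 residues
    "q1\ts2\t90.0\t1\t20\t1e-5",   # aligned over only 20 of 100 residues
]
hits = filter_hits(rows, {"q1": 100})
```

In this example only the first row is kept; the second has high identity but covers too little of the query, which is why the query lengths are needed.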

cblaster.remote.poll(rid, delay=60, max_retries=-1)

Poll BLAST API with given Request Identifier (RID) until results are returned.

As per NCBI usage guidelines, this function polls at most once per minute. The wait is recalculated on each iteration so that the interval between polls stays constant, i.e. it accounts for the varying response time of the status check.

Parameters:
  • rid (str) – NCBI BLAST search request identifier (RID)
  • delay (int) – Total delay (in seconds) between polling attempts
  • max_retries (int) – Maximum number of polling attempts (-1 for unlimited)
Returns:

BLAST search results split by newline

Return type:

list
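The constant-interval wait can be sketched as follows. This is a minimal illustration rather than the module's own code: poll_at_fixed_interval is a hypothetical name, the check argument stands in for a status check such as cblaster.remote.check, and the clock and sleep arguments are added here only so the timing can be replaced in tests. Unlike poll, the sketch returns a flag instead of the search results.

```python
import time


def poll_at_fixed_interval(check, delay=60, max_retries=-1,
                           clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True, keeping `delay` seconds between calls."""
    attempts = 0
    while max_retries < 0 or attempts < max_retries:
        started = clock()
        if check():
            return True
        attempts += 1
        # Subtract the time the status check itself took from the wait,
        # so consecutive checks start `delay` seconds apart
        elapsed = clock() - started
        sleep(max(0, delay - elapsed))
    return False
```

If a status check takes 5 seconds and delay is 60, the sketch sleeps for 55 seconds, so requests are still issued one minute apart.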

cblaster.remote.retrieve(rid, hitlist_size=500)

Retrieve BLAST results corresponding to a given Request Identifier (RID).

Parameters:
  • rid (str) – NCBI BLAST search request identifier (RID)
  • hitlist_size (int) – Total number of hits to retrieve
Returns:

BLAST search results split by newline, with HTML parts removed

Return type:

list

cblaster.remote.search(rid=None, query_file=None, query_ids=None, min_identity=0.3, min_coverage=0.5, max_evalue=0.01, **kwargs)

Perform a remote BLAST search via NCBI’s BLAST API.

This function launches a new search given a query FASTA file or list of valid NCBI identifiers, polls the API to check the completion status of the search, then retrieves and parses the results.

It is also possible to call other BLAST variants using the program argument, which is accepted through **kwargs (see cblaster.remote.start).

Parameters:
  • rid (str) – NCBI BLAST search request identifier (RID)
  • query_file (str) – Path to FASTA format query file
  • query_ids (list) – NCBI sequence identifiers
  • min_identity (float) – Minimum percent identity
  • min_coverage (float) – Minimum percent query coverage
  • max_evalue (float) – Maximum e-value
Returns:

Hit objects corresponding to BLAST hits that pass the filtering criteria

Return type:

list

cblaster.remote.start(query_file=None, query_ids=None, database='nr', program='blastp', megablast=False, filtering='F', evalue=10, nucl_reward=None, nucl_penalty=None, gap_costs='11 1', matrix='BLOSUM62', hitlist_size=500, threshold=11, word_size=6, comp_based_stats=2, entrez_query=None)

Launch a remote BLAST search using the NCBI BLAST API.

Note that the HITLIST_SIZE, ALIGNMENTS and DESCRIPTIONS parameters must all be set together in order to mimic max_target_seqs behaviour.

Usage guidelines:

  1. Don’t contact server more than once every 10 seconds
  2. Don’t poll for a single RID more than once a minute
  3. Use URL parameter email/tool
  4. Run scripts weekends or 9pm-5am Eastern time on weekdays if >50 searches

For a full description of the parameters, see:

  1. BLAST API documentation <https://ncbi.github.io/blast-cloud/dev/api.html>
  2. BLAST documentation <https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp>

Parameters:
  • query_file (str) – Path to a query FASTA file
  • query_ids (list) – Collection of NCBI sequence identifiers
  • database (str) – Target NCBI BLAST database
  • program (str) – BLAST variant to run
  • megablast (bool) – Enable megaBLAST option (only with BLASTN)
  • filtering (str) – Low complexity filtering
  • evalue (float) – E-value cutoff
  • nucl_reward (int) – Reward for matching bases (only with BLASTN/megaBLAST)
  • nucl_penalty (int) – Penalty for mismatched bases (only with BLASTN/megaBLAST)
  • gap_costs (str) – Gap existence and extension costs
  • matrix (str) – Scoring matrix name
  • hitlist_size (int) – Number of database sequences to keep
  • threshold (int) – Neighbouring score for initial words
  • word_size (int) – Size of word for initial matches
  • comp_based_stats (int) – Composition based statistics algorithm
  • entrez_query (str) – NCBI Entrez search term for pre-filtering the BLAST database
Returns:

  • rid (str) – Request Identifier (RID) assigned to the search
  • rtoe (int) – Request Time Of Execution (RTOE), estimated run time of the search in seconds

Return type:

tuple (str, int)
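The note about HITLIST_SIZE, ALIGNMENTS and DESCRIPTIONS, and the RID and RTOE values returned on submission, can be sketched as follows. Both helper names are hypothetical and only a subset of the request fields is shown. The sketch assumes the Put response body reports the values as "RID = ..." and "RTOE = ..." lines, as described in the BLAST API documentation. It is not the code that start itself runs.

```python
import re


def build_put_parameters(query, database="nr", program="blastp", hitlist_size=500):
    """Assemble form fields for a CMD=Put request (illustrative subset only)."""
    return {
        "CMD": "Put",
        "QUERY": query,
        "DATABASE": database,
        "PROGRAM": program,
        # Set all three together to mimic max_target_seqs behaviour
        "HITLIST_SIZE": hitlist_size,
        "ALIGNMENTS": hitlist_size,
        "DESCRIPTIONS": hitlist_size,
    }


def parse_rid_and_rtoe(text):
    """Extract the RID and RTOE from the body of a Put response."""
    rid = re.search(r"RID = (\S+)", text)
    rtoe = re.search(r"RTOE = (\d+)", text)
    if not rid or not rtoe:
        raise ValueError("Response did not contain an RID and RTOE")
    return rid.group(1), int(rtoe.group(1))
```

The RTOE is only an estimate, so it is best treated as a hint for how long to wait before the first status check rather than as a guarantee that results will be ready.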