Helpers

cblaster.helpers.efetch_sequences(headers)

Retrieve protein sequences from NCBI for supplied accessions.

This function uses EFetch from the NCBI E-utilities to retrieve the sequences for all identifiers specified in headers. It then calls fasta.parse to parse the returned response. Extra processing is needed because each header line in the returned FASTA contains a full sequence description after the accession.

Parameters:headers (list) – Valid NCBI sequence identifiers (accession, GI, etc.).
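
The extra header processing described above can be illustrated with a minimal sketch. The function names accession_from_header and rekey_on_accession and the example record are hypothetical, not part of cblaster; the sketch only assumes that the accession is the first whitespace-delimited token of the header line.

```python
def accession_from_header(header):
    """Return the first whitespace-delimited token of a FASTA header line."""
    # Drop a leading ">" if present, then keep only the identifier.
    return header.lstrip(">").split(maxsplit=1)[0]


def rekey_on_accession(records):
    """Re-key a {header: sequence} dictionary on accession only."""
    return {accession_from_header(h): seq for h, seq in records.items()}


# Made-up record: the header carries a description after the accession.
records = {"XYZ_0001.1 hypothetical protein [Example sp.]": "MKVLA"}
print(rekey_on_accession(records))
```
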
cblaster.helpers.efetch_sequences_request(headers)

Launch an EFetch request for a list of sequence accessions.

Parameters:headers (list) – NCBI sequence accessions.
Raises:requests.HTTPError – Received bad status code from NCBI.
Returns:Response returned by requests library.
Return type:requests.models.Response
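
As a rough illustration of what such a request might look like, the sketch below only builds an EFetch URL and does not send it. The helper name build_efetch_url and the choice of db, rettype and retmode values are assumptions; the parameters cblaster actually sends are not stated here.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI EFetch utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="protein"):
    """Build (but do not send) an EFetch URL for a list of accessions."""
    if not accessions:
        raise ValueError("Expected at least one accession")
    params = {
        "db": db,                     # assumed: protein database
        "id": ",".join(accessions),   # EFetch accepts comma-separated IDs
        "rettype": "fasta",           # assumed: FASTA-formatted records
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


print(build_efetch_url(["XYZ_0001.1", "XYZ_0002.1"]))
```

With the requests library, calling raise_for_status() on the returned Response is one way to obtain a requests.HTTPError for a bad status code.
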
cblaster.helpers.form_command(parameters)

Flatten a dictionary to create a command list for use in subprocess.run().
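
The exact flattening rules are not documented here, so the following is only a sketch of one plausible convention. The name flatten_parameters is hypothetical, and the handling of booleans, None and lists is an assumption rather than the behaviour of form_command.

```python
def flatten_parameters(parameters):
    """Flatten {option: value} pairs into a list of command-line tokens."""
    command = []
    for key, value in parameters.items():
        # Assumed convention: None and False mean "omit this option".
        if value is None or value is False:
            continue
        command.append(str(key))
        # Assumed convention: True means a bare flag with no argument.
        if value is True:
            continue
        if isinstance(value, (list, tuple)):
            command.extend(str(v) for v in value)
        else:
            command.append(str(value))
    return command


# Hypothetical options; every token must be a string for subprocess.run().
options = {"--query": "query.faa", "--threads": 4, "--sensitive": True}
print(flatten_parameters(options))
```
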

cblaster.helpers.get_program_path(aliases)

Get a program's path given a list of program names.

Parameters:aliases (list) – Program aliases, e.g. ["diamond", "diamond-aligner"].
Raises:ValueError – Could not find any of the given aliases on system $PATH.
Returns:Path to program executable.
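
A lookup of this kind can be sketched with shutil.which from the standard library. The name find_program is hypothetical, and this is an illustration of the documented behaviour rather than the library's own code.

```python
import shutil
import sys


def find_program(aliases):
    """Return the path of the first alias that resolves to an executable."""
    for alias in aliases:
        path = shutil.which(alias)  # None if the alias cannot be resolved
        if path:
            return path
    raise ValueError(f"Could not find any of {aliases} on system $PATH")


# The first alias is deliberately bogus; the running Python interpreter
# stands in for a real program so the example works on any machine.
print(find_program(["no-such-program-xyz-123", sys.executable]))
```
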
cblaster.helpers.get_sequences(query_file=None, query_ids=None)

Convenience function to get a dictionary of query sequences from a file or from IDs.

Parameters:
  • query_file (str) – Path to FASTA file containing query protein sequences.
  • query_ids (list) – NCBI sequence accessions.
Raises:ValueError – Did not receive values for query_file or query_ids.
Returns:Dictionary of query sequences keyed on accession.
Return type:dict

cblaster.helpers.parse_fasta(handle)

Parse sequences in a FASTA file.

Returns:Sequences in FASTA file keyed on their headers (i.e. the > line).
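
A minimal sketch of this kind of parser is shown below. The name parse_fasta_sketch is hypothetical, and details such as skipping blank lines and stripping the leading > from each key are assumptions.

```python
import io


def parse_fasta_sketch(handle):
    """Parse FASTA text from a file-like object into {header: sequence}."""
    chunks = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # assumed: blank lines are ignored
        if line.startswith(">"):
            header = line[1:]  # key on the full header, without ">"
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)  # sequences may span several lines
    return {h: "".join(parts) for h, parts in chunks.items()}


# Made-up two-record FASTA held in memory instead of a file on disk.
handle = io.StringIO(">seq1 example protein\nMKV\nLAT\n>seq2\nGGA\n")
print(parse_fasta_sketch(handle))
```
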