Helpers
-
cblaster.helpers.efetch_sequences(headers)
Retrieve protein sequences from NCBI for supplied accessions.
This function uses EFetch from the NCBI E-utilities to retrieve the sequences for all accessions specified in headers. It then calls fasta.parse to parse the response. Extra processing is needed because each header line in the returned FASTA contains a full sequence description after the accession.
Parameters: headers (list) – Valid NCBI sequence identifiers (accession, GI, etc.).
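The extra header processing mentioned above could look roughly like the following. This is a minimal sketch, not cblaster's own code: the function name `key_on_accession` is hypothetical, the accessions in the usage lines are made up, and it assumes the parsed FASTA is already a dictionary keyed on full header lines.

```python
def key_on_accession(parsed):
    """Re-key parsed FASTA records on the accession alone.

    Assumes `parsed` maps full header text (accession followed by a
    free-text description) to sequence strings.
    """
    trimmed = {}
    for header, sequence in parsed.items():
        # Drop a leading ">" if present, then split off the description.
        fields = header.lstrip(">").split(maxsplit=1)
        if not fields:
            continue  # skip records with an empty header
        trimmed[fields[0]] = sequence
    return trimmed


# Made-up records for illustration only.
parsed = {
    "AAA11111.1 hypothetical protein [Some organism]": "MKTLLV",
    ">BBB22222.1 another protein": "MSS",
}
by_accession = key_on_accession(parsed)
```

Splitting on the first run of whitespace keeps any version suffix (such as `.1`) as part of the key.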
-
cblaster.helpers.efetch_sequences_request(headers)
Launch an EFetch request for a list of sequence accessions.
Parameters: headers (list) – NCBI sequence accessions.
Raises: requests.HTTPError – Received bad status code from NCBI.
Returns: Response returned by requests library.
Return type: requests.models.Response
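As a rough illustration of such a request, the sketch below posts a comma-separated ID list to the public EFetch endpoint and raises on a bad status code. The function names, the `timeout` argument and the use of form-encoded POST data are assumptions made for this example; the exact parameters cblaster sends are not shown in this reference.

```python
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_params(headers):
    """Build EFetch parameters for protein FASTA retrieval (hypothetical helper)."""
    return {
        "db": "protein",           # NCBI protein database
        "rettype": "fasta",        # ask for FASTA-formatted records
        "id": ",".join(headers),   # comma-separated sequence identifiers
    }


def efetch_request_sketch(headers, timeout=30):
    """Send the request; raises requests.HTTPError on a bad status code."""
    # Third-party dependency, imported here so the parameter helper
    # above can be used without it installed.
    import requests

    response = requests.post(
        EFETCH_URL, data=build_efetch_params(headers), timeout=timeout
    )
    response.raise_for_status()
    return response
```

Calling `efetch_request_sketch` needs network access; `response.text` would then hold the FASTA payload.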
-
cblaster.helpers.form_command(parameters)
Flatten a dictionary to create a command list for use in subprocess.run().
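The flattening rules are not spelled out here, so the following is only one plausible sketch. It assumes that keys become `--key` flags, that `True` emits a bare flag, that `None` and `False` are skipped, and that lists expand to several values. The name `form_command_sketch` and the example parameters are hypothetical.

```python
def form_command_sketch(parameters):
    """Flatten a parameter dictionary into a list of command-line tokens."""
    command = []
    for key, value in parameters.items():
        flag = f"--{key}"
        if value is None or value is False:
            continue                      # option not set, emit nothing
        if value is True:
            command.append(flag)          # boolean switch, flag only
        elif isinstance(value, (list, tuple)):
            command.append(flag)          # flag followed by each item
            command.extend(str(item) for item in value)
        else:
            command.extend([flag, str(value)])
    return command


# Hypothetical parameters for illustration.
tokens = form_command_sketch(
    {"query": "query.faa", "threads": 4, "quiet": True, "db": None}
)
```

The resulting list can be appended to a program path and passed to `subprocess.run()` without `shell=True`, so no shell quoting is required.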
-
cblaster.helpers.get_program_path(aliases)
Get a program's path given a list of program names.
Parameters: aliases (list) – Program aliases, e.g. ["diamond", "diamond-aligner"].
Raises: ValueError – Could not find any of the given aliases on system $PATH.
Returns: Path to program executable.
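A lookup like this can be sketched with the standard library's `shutil.which`, trying each alias in order. This is an assumption about how such a helper might work, not necessarily how cblaster implements it, and `find_program` is a hypothetical name.

```python
import shutil


def find_program(aliases):
    """Return the path of the first alias found on the system $PATH."""
    for alias in aliases:
        path = shutil.which(alias)  # None when the alias is not on $PATH
        if path:
            return path
    raise ValueError(f"Could not find any of {aliases} on system $PATH")
```

For example, `find_program(["diamond", "diamond-aligner"])` would return whichever executable is found first, or raise `ValueError` if neither is installed.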
-
cblaster.helpers.get_sequences(query_file=None, query_ids=None)
Convenience function to get dictionary of query sequences from file or IDs.
Parameters: - query_file (str) – Path to FASTA file containing query protein sequences.
- query_ids (list) – NCBI sequence accessions.
Raises: ValueError – Did not receive values for query_file or query_ids.
Returns: Dictionary of query sequences keyed on accession.
Return type: sequences (dict)
-
cblaster.helpers.parse_fasta(handle)
Parse sequences in a FASTA file.
Returns: Sequences in FASTA file keyed on their headers (i.e. > line)
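For readers unfamiliar with the format, a parser of this kind can be sketched as below. It assumes `handle` is any iterable of text lines (such as an open file) and that multi-line sequences should be joined. The name `parse_fasta_sketch` and the records are hypothetical, and the library's own implementation may differ.

```python
import io


def parse_fasta_sketch(handle):
    """Return a dict of sequences keyed on their header (the '>' line)."""
    sequences = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            header = line[1:]             # header text without the '>'
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line     # join wrapped sequence lines
    return sequences


# In-memory FASTA text stands in for an open file here.
handle = io.StringIO(">query1 first protein\nMKT\nLLV\n\n>query2\nMSS\n")
records = parse_fasta_sketch(handle)
```

With a file on disk, the same function could be called as `parse_fasta_sketch(open_file)` inside a `with open(...)` block.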