Creating local sequence databases with the makedb moduleΒΆ

The makedb module is used to generate the databases used in local cblaster searches from genome files. makedb only takes two arguments: the genome files being used to build the databases, and some name to use when saving them. For example, generating a database from a set of genomes is as simple as:

$ cblaster makedb one.gbk two.gbk three.gbk four.gbk myDb

This will read in each GenBank file, then generate the files:

myDb.sqlite3 Local database used for looking up genomic context of hits
myDb.dmnd DIAMOND sequence search database
myDb.fasta All protein sequences parsed from genomes; used for HMMER searches

cblaster can also build databases from GFF3 files as above:

$ cblaster makedb one.gff two.gff three.gff four.gff myDb

In this case, cblaster will expect matching FASTA format files containing the nucleotide sequences for each sequence region in the corresponding GFF. For instance, in the above example, the working directory must also contain one.fasta, two.fasta, three.fasta, four.fasta.

Typically it is easiest to have all your genome files within a folder and use a wildcard to avoid having to type every file name, like so:

$ cblaster makedb genomes/*.gbk myDb

The shell will expand this automatically into a command that is functionally equivalent to the previous one. However, on Windows, we have run into some issues with this behaviour. For windows, instead use the command:

$ cblaster makedb (ls *.gbk | \% FullName) myDb