NuVs

NuVs is a workflow for discovering potential novel viral sequences in a sample library.

Prerequisites

This tutorial assumes you have already done the following:

You will use these data sources to run a workflow for discovering potential novel virus sequences in a sample.

How Does it Work?

NuVs complements the rapid detection of known viruses provided by the Pathoscope workflow. It can find novel virus sequences that cannot be detected using BLAST, read mapping, or other approaches.

NuVs relies on profile hidden Markov models (pHMMs) to predict viral domains in sequences assembled from sample libraries.

The first step of the NuVs workflow is to eliminate reads associated with known OTUs or a host genome. Reads are first mapped against a Virtool reference, and any matching reads are discarded. The remaining reads are then mapped against a host subtraction genome, and matching reads are similarly removed.
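The two-stage subtraction can be sketched in a few lines of Python. This is a minimal illustration rather than Virtool's implementation: it assumes a read mapper has already reported which read IDs aligned to the reference and to the host genome, and the function name, argument names, and example reads are all hypothetical.

```python
def subtract_reads(reads, mapped_to_reference, mapped_to_host):
    """Return reads that mapped to neither the reference nor the host.

    reads: dict of read ID -> sequence (hypothetical in-memory form).
    mapped_to_reference / mapped_to_host: sets of read IDs reported as
    aligned by a read mapper (the mapping itself is not shown here).
    """
    # Stage 1: discard reads that matched a known OTU in the reference.
    after_reference = {
        read_id: seq
        for read_id, seq in reads.items()
        if read_id not in mapped_to_reference
    }
    # Stage 2: discard reads that matched the host subtraction genome.
    return {
        read_id: seq
        for read_id, seq in after_reference.items()
        if read_id not in mapped_to_host
    }


reads = {"r1": "ACGT", "r2": "GGCC", "r3": "TTAA", "r4": "CATG"}
remaining = subtract_reads(reads, {"r1"}, {"r3"})
print(sorted(remaining))  # ['r2', 'r4']
```

Only the reads that survive both stages are passed on to assembly.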

The next step is to assemble the remaining reads using SPAdes.
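As a rough sketch of what an assembly call might look like, the snippet below builds a SPAdes command line without running it. It uses the standard SPAdes options -s (unpaired reads), -o (output directory), and -t (threads). The file names are hypothetical, and the exact options Virtool passes to SPAdes are not stated here.

```python
import shlex


def build_spades_command(reads_path, output_dir, threads=4):
    """Build (but do not run) a SPAdes command for unpaired reads."""
    return [
        "spades.py",
        "-s", str(reads_path),  # unpaired (single) reads
        "-o", str(output_dir),  # output directory for the assembly
        "-t", str(threads),     # number of threads
    ]


# Hypothetical file names, for illustration only.
command = build_spades_command("unmapped.fq", "spades_out")
print(shlex.join(command))  # spades.py -s unmapped.fq -o spades_out -t 4
```

Where SPAdes is installed, the list can be passed to subprocess.run(command, check=True) to run the assembly.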

HMMs

Profile hidden Markov models (HMMs) are used in Virtool to discover potential novel viruses in sequencing data. The models identify known viral motifs in translated open reading frames derived from sequencing data. The method depends on our NuVs bioinformatic workflow.
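To make "translated open reading frames" concrete, here is a simplified sketch that translates a nucleotide sequence in all six frames and collects stop-delimited stretches beginning at the first methionine. It is an illustration under stated assumptions, not the ORF caller used by NuVs: the function names and the min_len threshold are hypothetical, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate codon by codon; codons with unknown bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_len=4):
    """Return translated ORFs of at least min_len residues from all six frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand in (seq, reverse_complement):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Stop codons ("*") delimit candidate ORFs.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append(segment[start:])
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # ['MKFG']
```

Amino acid sequences produced this way are the kind of input that is searched against the profile HMMs.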

In Virtool, profile HMMs are stored in the file <data_path>/hmm/profiles.hmm. This file is generated by HMMER, a piece of software used for generating and searching with profile HMMs. The profiles.hmm file contains a number of models generated from clustered amino acid sequences sourced from GenBank.
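To get a feel for what such a file contains, the sketch below lists the models in HMMER3 ASCII profile text. It assumes the HMMER3 save format, in which each record carries NAME and LENG header lines and ends with a line containing only //. The function name is hypothetical, and the example records are heavily abbreviated with made-up model names rather than real vFam profiles.

```python
def list_profiles(hmm_text):
    """Return (name, length) pairs for each model in HMMER3 ASCII text."""
    profiles = []
    name, length = None, None
    for line in hmm_text.splitlines():
        if line.startswith("NAME"):
            name = line.split(maxsplit=1)[1].strip()
        elif line.startswith("LENG"):
            length = int(line.split()[1])
        elif line.strip() == "//":
            # A "//" line closes the current profile record.
            profiles.append((name, length))
            name, length = None, None
    return profiles


# Abbreviated, made-up records; real profiles contain many more lines.
example = """HMMER3/f [3.1b2 | February 2015]
NAME  cluster_1
LENG  120
//
HMMER3/f [3.1b2 | February 2015]
NAME  cluster_2
LENG  85
//
"""
print(list_profiles(example))  # [('cluster_1', 120), ('cluster_2', 85)]
```

For a real file, read its contents into a string first and pass that to list_profiles.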

To make Virtool as easy to use as possible, we distribute an official set of models derived from the vFam project. Since these models do not carry biological annotations, we also provide a comprehensive set of annotations for these models calculated from metadata associated with the amino acid sequences used to build the models.

Exploring and Managing HMMs in Virtool