KvarQ testsuites

Testsuites define the positions to scan for as well as how to interpret mutations. They have to be loaded (see Loading Testsuites) and selected prior to scanning, and also before analyzing .json data with the explorer. A .json file generated with a certain combination of testsuites can only be analyzed in the explorer if testsuites with the same version are used. This is because .json files contain only the names of the SNPs, whereas their locations within genes are stored in the testsuites.

A KvarQ testsuite is a Python source file that defines a kvarq.genes.Testsuite with the same name as the Python file. Several testsuites can be grouped together in a single directory. Any number of such directories can be stored in a well-defined location, from which the testsuites are then discovered in a particular order.

For example, the testsuites spoligo, resistance, and phylo are grouped together in the directory MTBC/ and can be found in the directory testsuites/ of KvarQ:

  • testsuites/MTBC/_util.py : files whose names start with an underscore are not loaded by KvarQ when it loads testsuites from a directory, but they can still be used from other Python modules in the same directory, e.g. via from _util import ancestor (which loads the hypothetical ancestor genome from a data file in the same directory)
  • testsuites/MTBC/phylo.py : scans for phylogenetic markers in MTBC
  • testsuites/MTBC/resistance.py : tests for some common resistance mutations in MTBC
  • testsuites/MTBC/spoligo.py : in silico spoligo typing of MTBC
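
The loading rule described above can be sketched in a few lines. This is only an illustration and not KvarQ's actual loader: the function name discover_testsuites is hypothetical, and the sketch assumes that every .py file without a leading underscore is a testsuite whose name is the file name up to the first dash (the naming rule stated in the annotated example's comments).

```python
import os
import tempfile


def discover_testsuites(directory):
    """Map testsuite names to file paths (hypothetical helper, not KvarQ API)."""
    found = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.py'):
            continue  # only python source files can be testsuites
        if filename.startswith('_'):
            continue  # helper modules such as _util.py are skipped
        stem = filename[:-len('.py')]
        # assumption: the testsuite name is the file name up to the first dash
        name = stem.split('-', 1)[0]
        found[name] = os.path.join(directory, filename)
    return found


# demo with empty throwaway files that mirror the MTBC/ listing above
demo_dir = tempfile.mkdtemp()
for filename in ['_util.py', 'phylo.py', 'resistance.py', 'spoligo.py']:
    open(os.path.join(demo_dir, filename), 'w').close()
demo_names = sorted(discover_testsuites(demo_dir))
print(demo_names)
```

In the demo, _util.py is left out of the result while the three testsuite files are kept.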

Rolling your own testsuite

KvarQ makes it very simple to write new testsuites. It is probably easiest to take a pre-existing testsuite and adapt it to your needs. All testsuites shipped with KvarQ are well annotated and there are some articles in the tutorial section that show how to adapt the testsuites in the testsuites/example/ directory.

Versions, Compatibility

The following problems can arise when different versions of testsuites are used:

  • A testsuite is not compatible with the KvarQ version that loads it. To detect this, the testsuite's module global GENES_COMPATIBILITY is compared with the module global kvarq.genes.COMPATIBILITY of the running KvarQ. The first number must match exactly and the second number must be equal to or smaller than the one defined in the genes package. Whenever KvarQ introduces new features that break backwards compatibility with the testsuites, the first number is increased.
  • A testsuite is loaded to display data from a .json file that was generated by a testsuite with a different version. The module global VERSION gives the version of the testsuite defining it. Upon a backwards-compatible change (e.g. deletion of a previous test), the minor number is increased by one. Note that introducing new tests is not backwards compatible, because the new version of the testsuite would look for tests that do not exist in data generated with an old version.
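
The GENES_COMPATIBILITY rule from the first item can be written down as a small check. This is a minimal sketch and not the code KvarQ runs: the function name is_compatible is hypothetical, and the sketch assumes that both version strings have exactly the form 'major.minor'.

```python
def is_compatible(testsuite_compat, kvarq_compat):
    """Apply the rule: same first number, second number equal or smaller.

    testsuite_compat -- GENES_COMPATIBILITY of the testsuite, e.g. '0.0'
    kvarq_compat     -- kvarq.genes.COMPATIBILITY of the running KvarQ
    (hypothetical helper; assumes 'major.minor' strings)
    """
    ts_major, ts_minor = (int(part) for part in testsuite_compat.split('.'))
    kq_major, kq_minor = (int(part) for part in kvarq_compat.split('.'))
    return ts_major == kq_major and ts_minor <= kq_minor


print(is_compatible('0.0', '0.1'))  # testsuite older than KvarQ, same major
print(is_compatible('0.2', '0.1'))  # testsuite needs a newer minor version
print(is_compatible('1.0', '0.5'))  # major numbers differ
```

Comparing the numbers as integers rather than as strings matters once a component reaches two digits ('0.10' is newer than '0.9').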

Annotated example

The following is a dump of the extensively annotated testsuite testsuites/examples/example.py included with KvarQ:

# this is an example testsuite that illustrates how to write simple
# SNP/region based testsuite to be used with kvarq

# the testsuite can be included during the scanning by using the
# command line parameter '-t' or in the configuration window in the GUI

# see the kvarq documentation for more information:
# http://kvarq.readthedocs.org/en/latest/testsuites.html


# the version specifies the version of the testsuite itself; this version
# string is included in the .json scan results
# the minor number should be increased every time the file changes. the major
# number should be increased when the changes are not backwards compatible
# (e.g. when a new test is added)

VERSION = '0.1'

# this version is compared against the COMPATIBILITY global module variable
# defined in kvarq.genes
# as before, compatibility is warranted if the first number is equal and the
# second equal or lower (than the one defined in kvarq.genes)

GENES_COMPATIBILITY = '0.0'


# we use these classes to define our testsuite
from kvarq.genes import Genotype, Gene, Test, Testsuite, Reference, SNP, TemplateFromGenome

# load hypothetical MTB ancestor genome from '../MTBC' directory
# (shipped together with KvarQ)
from kvarq.genes import Genome
import os.path
MTBC_dir = os.path.join(os.path.dirname(__file__), os.pardir, 'MTBC')
ancestor_path = os.path.join(MTBC_dir, 'MTB_ancestor_reference.bases')
ancestor = Genome(ancestor_path, 'MTB ancestor')

# use this for logging (displayed on console / in main GUI window)
from kvarq.log import lo


# references tell where more information on the mutations can be found
tbdream = Reference('TBDReamDB : see http://tbdreamdb.com/')

# the first genotype simply signals isoniazid resistance
inhA = Genotype('Isoniazid resistance')
# the second genotype also signals isoniazid resistance but indicates
# the gene to which it belongs -- this enables output of resistance
# mutation in the familiar gene.XposX format
katG = Genotype('Isoniazid resistance', Gene(ancestor,'katG', 2153889, 2156111, plus_strand=False))

# define two SNPs : 1673432TA and 1673432TC -- only specified mutations will
# be found (i.e. 1673432TATG would not be reported)
# note that the SNP is simply the "template" that will be used when scanning
# for mutations in the FastQ reads; the "test" as a whole defines a template,
# a genotype (inhA in this case) and the resource from which the information
# is drawn
SNP1 = Test(SNP(genome=ancestor, pos=1673432, orig='T', base='A'), inhA, tbdream)
SNP2 = Test(SNP(genome=ancestor, pos=1673432, orig='T', base='C'), inhA, tbdream)

# define a (short) region that should be scanned for ANY mutations; here we're
# interested in the codon 2155167-2155169. by specifying the strand the gene is
# read from (minus strand) and the position of the amino acid that is produced
# by this codon (in this case the gene starts at 2153889, therefore the amino
# acid is ((2155167-2153889)/3 +1)=427) it is later possible to check for
# (non)synonymous mutations. as before, the "test" consists of a template, a
# genotype and a resource (but the "template" is a region and not a SNP)
katG_codon = Test(TemplateFromGenome(genome=ancestor, start=2155167,
    stop=2155169, direction='-', aa_pos0=(2155167-2153889)//3 +1), katG,
    tbdream)


# it's important to NAME the testsuite the SAME AS THE FILENAME up to the first
# dash !  (e.g. it's possible to rename this file to "example-0.1.py")
example = Testsuite([SNP1, SNP2, katG_codon], VERSION)


# note that this testsuite is very simple and will simply report any mutations
# found in the FastQ file -- often it makes sense to subclass the Testsuite
# class (and redefine the _analyse method) to get a fine-grained control on how
# the mutations are synthesized into a result... see the source code of
# kvarq.genes.phylo as an example
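
The codon arithmetic used for aa_pos0 in the example can be checked in isolation. The sketch below only reproduces the calculation given in the example's comments; the helper name codon_number is hypothetical and not part of kvarq.genes. It uses integer division (//) so that the result stays an integer under both Python 2 and Python 3, whereas a plain / yields a float under Python 3.

```python
def codon_number(codon_start, gene_start):
    """Return the 1-based codon number, following the example's arithmetic.

    codon_start -- genome position of the first base of the codon
    gene_start  -- genome position at which the gene starts
    (hypothetical helper, not part of kvarq.genes)
    """
    offset = codon_start - gene_start  # distance in bases from the gene start
    return offset // 3 + 1             # three bases per codon, counted from 1


# values taken from the katG example: (2155167 - 2153889) // 3 + 1
katG_codon_number = codon_number(2155167, 2153889)
print(katG_codon_number)  # 427
```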