Mgtaxa: MGT::SampDb::HdfSampInd Class Reference

Detailed Description

HDF data type to describe one sequence sample.
A sequence sample starts at some position in one sequence, possibly spans several
full consecutive sequences, and can end at an arbitrary position in the last sequence.
Although the sample length is typically constant, we define it inside this type
to allow for variable-size samples in the more general case. For example, this covers WGS contigs
for which we want to predict taxonomy without shredding them into constant-size chunks.
The indSeq, begin and sampLen fields are sufficient to pull the sample out of the sequence database.
The nSeq field is an optimization: it adds one byte to the size of the data structure.
If the sample spans fewer than 256 sequences (including the sequences where it starts and where it ends),
nSeq holds that number. Otherwise, nSeq is set to zero. Knowing the number of sequences allows
for simpler and faster Python code that pulls the sequence data.
The reason for having this datatype at all, instead of just creating another SeqInd, is that
we need to insert spacers when we pull a sample spanning several sequences.
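
The record layout and the nSeq rule described above can be illustrated with a minimal sketch. It is not the code from SampDb.py: a NumPy structured dtype stands in for the HDF type, the integer widths of indSeq, begin and sampLen are assumptions (the text only states that nSeq takes one byte), the helpers make_samp_ind and pull_sample are hypothetical, the sequence database is modeled as a plain list of strings, sampLen is assumed not to count spacers, and "N" is an assumed spacer character.

```python
import numpy as np

# Stand-in for the HDF record type. Only the one-byte nSeq width comes from
# the description; the other field widths are assumptions.
samp_ind_dtype = np.dtype([
    ("indSeq", np.int64),   # index of the sequence where the sample starts
    ("begin", np.int64),    # start offset inside that sequence
    ("sampLen", np.int64),  # sample length (assumed to exclude spacers)
    ("nSeq", np.uint8),     # number of sequences spanned, 0 if 256 or more
])


def make_samp_ind(seq_lens, ind_seq, begin, samp_len):
    """Hypothetical helper: build one record and fill in nSeq."""
    remaining, offset, i, n_seq = samp_len, begin, ind_seq, 0
    while remaining > 0:
        remaining -= seq_lens[i] - offset  # characters taken from sequence i
        offset = 0
        i += 1
        n_seq += 1
    if n_seq >= 256:
        n_seq = 0  # count does not fit into one byte
    return np.array((ind_seq, begin, samp_len, n_seq), dtype=samp_ind_dtype)


def pull_sample(seqs, rec, spacer="N"):
    """Hypothetical helper: extract the sample, joining pieces with a spacer."""
    i, begin = int(rec["indSeq"]), int(rec["begin"])
    samp_len, n_seq = int(rec["sampLen"]), int(rec["nSeq"])
    if n_seq > 0:
        # Known span: slice all sequences at once, then trim the last piece.
        pieces = [seqs[i][begin:]] + list(seqs[i + 1:i + n_seq])
        used = sum(len(p) for p in pieces[:-1])
        pieces[-1] = pieces[-1][:samp_len - used]
    else:
        # Unknown span (nSeq == 0): walk the sequences one by one.
        pieces, remaining = [], samp_len
        while remaining > 0:
            piece = seqs[i][begin:begin + remaining]
            pieces.append(piece)
            remaining -= len(piece)
            begin = 0
            i += 1
    return spacer.join(pieces)


seqs = ["ACGT", "GG", "TTTTT"]
rec = make_samp_ind([len(s) for s in seqs], ind_seq=0, begin=2, samp_len=6)
print(int(rec["nSeq"]), pull_sample(seqs, rec))  # prints: 3 GTNGGNTT
```

With nSeq available, the extraction reduces to one slice over the sequence list plus a trim of the last piece; the loop is needed only for the rare records where nSeq is zero.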

The documentation for this class was generated from the following file:

mgtaxa/MGT/SampDb.py