Class that computes k-mer frequencies for k = 1...n at the same time.
#include <kmers.hpp>
Public Member Functions | |
KmerCounterLadder (int kmerLen, const AbcConvCharToInt *pAbcConv=0, RC_POLICY revCompPolicy=RC_DIRECT) | |
A constructor. | |
void | doCNuc (CNuc cnuc) |
This method is called to process the input sequence. | |
template<class IterCNuc > | |
void | process (IterCNuc first, IterCNuc last) |
Accumulate counts for a CNuc sequence represented by an iterator range. | |
template<class IterInd , class IterVal > | |
void | counts (IterVal valObs, IterVal valExp, IterInd ind, IterInd sizes) |
Extract the counts accumulated so far and reset the internal state for the next round of accumulation. | |
template<class IterInd > | |
int | numKmers (IterInd sizes) const |
Store into the output iterator the number of unique k-mers found for each k and return their sum. | |
template<class IterInd > | |
int | maxNumKmers (ULong seqLen, IterInd sizes) const |
Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length. | |
Protected Member Functions | |
void | makeTopDownKmerInd () |
Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix. | |
Protected Attributes | |
PKmerCounterArray | m_counters |
KmerCounter pointer array [0,m_kmerLen], with elements [1,m_kmerLen] initialized. | |
std::vector< indvec > | m_topBotDep |
m_topBotDep[k_mer_index]->(k-1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized | |
std::vector< indvec > | m_topOneDep |
m_topOneDep[k_mer_index]->(1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized | |
RC_POLICY | m_revCompPolicy |
How reverse complements are treated - one of RC_XXX. |
Class that computes k-mer frequencies for k = 1...n at the same time.
Observed frequencies for a given k are normalized by expected frequencies, which are calculated from the observed frequencies for (k-1) and 1. The implementation:
1. Compute counts for k = 1...n. Currently this is done independently for each k through calls to doCNuc(). A future implementation may count only the n-mers and then derive the (n-1)-mer counts by incrementing the count of the (n-1)-mer prefix of each n-mer, with special handling of the n-mer at the right boundary, and repeat recursively for (n-2) and so on. That would be more efficient for long sequences.
2. Compute expected counts for each n-mer found in (1).
3. Store normalized frequencies into the output array.
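The description above does not give the formula for the expected counts. As a minimal sketch, assuming the expected count of a k-mer is the observed count of its (k-1)-mer prefix multiplied by the relative frequency of its 1-mer suffix, step 2 might look like the code below. The helpers `countKmers` and `expectedCount` are hypothetical and are not part of MGT; they use plain strings instead of the library's k-mer indices.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not an MGT function): count every k-mer of length k
// in seq with a sliding window.
std::map<std::string, long> countKmers(const std::string& seq, std::size_t k) {
    assert(k >= 1);
    std::map<std::string, long> counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];
    }
    return counts;
}

// Assumed model, not confirmed by the source:
//   E(kmer) = N(prefix of length k-1) * N(last character) / (sequence length)
double expectedCount(const std::string& kmer, const std::string& seq) {
    assert(kmer.size() >= 2 && !seq.empty());
    const std::map<std::string, long> prefixCounts = countKmers(seq, kmer.size() - 1);
    const std::map<std::string, long> oneCounts = countKmers(seq, 1);
    const std::string prefix = kmer.substr(0, kmer.size() - 1);
    const std::string suffix = kmer.substr(kmer.size() - 1);
    const std::map<std::string, long>::const_iterator p = prefixCounts.find(prefix);
    const std::map<std::string, long>::const_iterator s = oneCounts.find(suffix);
    if (p == prefixCounts.end() || s == oneCounts.end()) {
        return 0.0;  // prefix or suffix never observed
    }
    return static_cast<double>(p->second) * s->second / seq.size();
}
```

Under this assumption, the normalized value stored in step 3 would be the observed count divided by `expectedCount`. The real class may use a different estimator.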
MGT::KmerCounterLadder::KmerCounterLadder | ( | int | kmerLen, |
const AbcConvCharToInt * | pAbcConv = 0 , |
||
RC_POLICY | revCompPolicy = RC_DIRECT |
||
) |
A constructor.
kmerLen | is the maximum length of a k-mer. K-mers of length 1 to kmerLen will be counted. In the current implementation, all k-mers are precalculated and stored in memory, and the number of possible k-mers grows exponentially with k, so keep this parameter small. |
pAbcConv | is an alphabet converter (stored inside this object but not managed by it). |
revCompPolicy | what to do about reverse complement k-mers (currently only RC_DIRECT is supported). |
void MGT::KmerCounterLadder::counts | ( | IterVal | valObs, |
IterVal | valExp, | ||
IterInd | ind, | ||
IterInd | sizes | ||
) |
Extract the counts accumulated so far and reset the internal state for the next round of accumulation.
ind | output iterator for observed k-mer ID values |
valObs | output iterator for observed k-mer counts |
valExp | output iterator for expected counts for each observed k-mer |
sizes | output iterator for the number of unique observed k-mers for each k |
First, all unique observed k-mers for k=kmerLen are appended to ind, valObs and valExp, and the number of items written for this k is appended to sizes. This is then repeated for k=kmerLen-1 and so forth, down to k=1 inclusive. Expected counts for k=1 are assigned assuming equal probability. This choice is arbitrary and does not affect the expected counts for higher-order k-mers.
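Because counts() writes all k levels into the same flat output ranges, the caller needs sizes to find where each level starts. The following sketch shows one way to split such output back into per-k blocks, assuming the results were collected into `std::vector` containers with `int` IDs and `double` counts. The element types, the `KmerLevel` struct and the `splitByK` function are illustrative assumptions and are not part of MGT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-level view of the flat output; not an MGT type.
struct KmerLevel {
    int k;                         // k-mer length of this block
    std::vector<int> ids;          // k-mer IDs written through ind
    std::vector<double> observed;  // values written through valObs
    std::vector<double> expected;  // values written through valExp
};

// Split flat arrays into blocks using the documented order:
// k = kmerLen first, then kmerLen-1, ..., down to 1.
std::vector<KmerLevel> splitByK(int kmerLen,
                                const std::vector<int>& ind,
                                const std::vector<double>& valObs,
                                const std::vector<double>& valExp,
                                const std::vector<int>& sizes) {
    assert(kmerLen >= 1);
    assert(sizes.size() == static_cast<std::size_t>(kmerLen));
    assert(ind.size() == valObs.size() && ind.size() == valExp.size());
    std::vector<KmerLevel> levels;
    std::ptrdiff_t offset = 0;
    for (int k = kmerLen; k >= 1; --k) {
        const std::ptrdiff_t n = sizes[static_cast<std::size_t>(kmerLen - k)];
        assert(n >= 0 && static_cast<std::size_t>(offset + n) <= ind.size());
        KmerLevel level;
        level.k = k;
        level.ids.assign(ind.begin() + offset, ind.begin() + offset + n);
        level.observed.assign(valObs.begin() + offset, valObs.begin() + offset + n);
        level.expected.assign(valExp.begin() + offset, valExp.begin() + offset + n);
        levels.push_back(level);
        offset += n;
    }
    return levels;
}
```

The first element of the returned vector then holds the block for k=kmerLen and the last one holds the block for k=1.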
void MGT::KmerCounterLadder::doCNuc | ( | CNuc | cnuc ) | [inline] |
This method is called to process the input sequence.
A series of calls to this method is interleaved with calls to result extraction methods such as counts().
cnuc | one nucleotide character value (such as 'A') |
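To illustrate the feed-then-extract pattern described here, the sketch below uses a toy stand-in class rather than the real MGT types. `ToyLadder`, `push` and `take` are hypothetical names. The string-keyed maps and the clearing of the sliding window on extraction are assumptions made for the example and may differ from what KmerCounterLadder does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for illustration only; it does not reproduce MGT::KmerCounterLadder.
class ToyLadder {
public:
    explicit ToyLadder(int kmerLen)
        : m_kmerLen(static_cast<std::size_t>(kmerLen)), m_counts(m_kmerLen + 1) {
        assert(kmerLen >= 1);
    }

    // Feed one nucleotide; every k-mer (k = 1..kmerLen) that ends at this
    // position is counted independently, mirroring step 1 of the description.
    void push(char nuc) {
        m_window.push_back(nuc);
        if (m_window.size() > m_kmerLen) {
            m_window.erase(0, 1);  // keep only the last kmerLen characters
        }
        for (std::size_t k = 1; k <= m_window.size(); ++k) {
            ++m_counts[k][m_window.substr(m_window.size() - k)];
        }
    }

    // Return the counts indexed by k (index 0 is unused) and reset the state
    // so that the next round of accumulation starts empty.
    std::vector<std::map<std::string, long> > take() {
        std::vector<std::map<std::string, long> > out(m_kmerLen + 1);
        out.swap(m_counts);
        m_window.clear();
        return out;
    }

private:
    std::size_t m_kmerLen;
    std::vector<std::map<std::string, long> > m_counts;
    std::string m_window;
};
```

A caller would invoke `push` once per character of a sequence, call `take` to collect the results, and then start feeding the next sequence.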
void MGT::KmerCounterLadder::makeTopDownKmerInd | ( | ) | [protected] |
Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix.
Must be called from the constructor.
int MGT::KmerCounterLadder::maxNumKmers | ( | ULong | seqLen, |
IterInd | sizes | ||
) | const |
Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length.
seqLen | sequence length |
sizes | output iterator; will hold the maximum number of unique k-mers for k, k-1, ..., 1 |
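The source does not state how the maximum is computed. One natural upper bound, assuming a four-letter alphabet and RC_DIRECT (no merging of reverse complements), is the smaller of the number of possible k-mers (4^k) and the number of windows in the sequence (seqLen - k + 1). The sketch below implements that bound; `maxUniqueKmers` and `maxUniqueKmersLadder` are hypothetical helpers and not MGT functions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Upper bound on distinct k-mers in a sequence of length seqLen:
// min(alphabetSize^k, seqLen - k + 1), or 0 if the sequence is shorter than k.
unsigned long maxUniqueKmers(unsigned long seqLen, int k,
                             unsigned long alphabetSize = 4) {
    assert(k >= 1 && alphabetSize >= 1);
    if (seqLen < static_cast<unsigned long>(k)) {
        return 0;
    }
    const unsigned long windows = seqLen - static_cast<unsigned long>(k) + 1;
    unsigned long distinct = 1;
    for (int i = 0; i < k; ++i) {
        if (distinct >= windows) {
            break;  // the window count is already the binding limit
        }
        distinct *= alphabetSize;
    }
    return std::min(distinct, windows);
}

// Fill sizes for k = kmerLen, kmerLen-1, ..., 1 (the order used by the
// documented output iterators) and return the sum.
unsigned long maxUniqueKmersLadder(unsigned long seqLen, int kmerLen,
                                   std::vector<unsigned long>& sizes) {
    assert(kmerLen >= 1);
    sizes.clear();
    unsigned long total = 0;
    for (int k = kmerLen; k >= 1; --k) {
        sizes.push_back(maxUniqueKmers(seqLen, k));
        total += sizes.back();
    }
    return total;
}
```

A sum of this kind could be used to preallocate the output arrays before calling counts(), if the real maxNumKmers() follows the same bound.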
int MGT::KmerCounterLadder::numKmers | ( | IterInd | sizes ) | const |
Store into the output iterator the number of unique k-mers found for each k and return their sum.
sizes | output iterator; will hold the number of unique k-mers for k, k-1, ..., 1 |
RC_POLICY MGT::KmerCounterLadder::m_revCompPolicy [protected] |
How reverse complements are treated - one of RC_XXX.