Class that computes k-mer frequencies for k = 1...n at the same time.
#include <kmers.hpp>
Public Member Functions | |
| KmerCounterLadder (int kmerLen, const AbcConvCharToInt *pAbcConv=0, RC_POLICY revCompPolicy=RC_DIRECT) | |
| A constructor. | |
| void | doCNuc (CNuc cnuc) |
| This method is called to process the input sequence. | |
| template<class IterCNuc > | |
| void | process (IterCNuc first, IterCNuc last) |
| Accumulate counts for a CNuc sequence represented by an iterator range. | |
| template<class IterInd , class IterVal > | |
| void | counts (IterVal valObs, IterVal valExp, IterInd ind, IterInd sizes) |
| Extract the counts accumulated so far and reset the internal state for the next round of accumulation. | |
| template<class IterInd > | |
| int | numKmers (IterInd sizes) const |
| Store into the output iterator the number of unique k-mers found for each k and return their sum. | |
| template<class IterInd > | |
| int | maxNumKmers (ULong seqLen, IterInd sizes) const |
| Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length. | |
Protected Member Functions | |
| void | makeTopDownKmerInd () |
| Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix. | |
Protected Attributes | |
| PKmerCounterArray | m_counters |
| KmerCounter pointer array [0,m_kmerLen], with elements [1,m_kmerLen] initialized. | |
| std::vector< indvec > | m_topBotDep |
| m_topBotDep[k_mer_index]->(k-1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized | |
| std::vector< indvec > | m_topOneDep |
| m_topOneDep[k_mer_index]->(1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized | |
| RC_POLICY | m_revCompPolicy |
| How reverse complements are treated - one of RC_XXX. | |
Class that computes k-mer frequencies for k = 1...n at the same time.
It normalizes the observed frequencies for a given k by expected frequencies calculated from the observed frequencies for (k-1) and 1. The implementation:
1. Compute counts for k = 1...n. Currently this is done independently for each k through calls to doCNuc(). A future implementation may compute n-mer counts first and then derive the (n-1)-mer counts by incrementing by one the count of the (n-1)-mer prefix of each n-mer, with special handling of the n-mer at the right boundary (and repeating recursively for (n-2) and so on). That would be more efficient for long sequences.
2. Compute expected counts for each n-mer found in (1).
3. Store normalized frequencies into the output array.
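The counting and expectation steps above can be illustrated with a minimal, standalone sketch. It does not use the MGT classes: `countKmers` and `expectedCount` are hypothetical helpers, and the formula (count of the (k-1)-mer prefix multiplied by the frequency of the 1-mer suffix) is an assumption about how "expected" is derived from the (k-1) and 1 counts, not a statement of what the library computes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of MGT): count every k-mer of length k in seq.
std::map<std::string, long> countKmers(const std::string& seq, std::size_t k) {
    std::map<std::string, long> counts;
    if (k == 0 || seq.size() < k) return counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];
    }
    return counts;
}

// Hypothetical helper: expected count of a k-mer (k >= 2), ASSUMING
//   expected = count((k-1)-mer prefix) * count(1-mer suffix) / totalOneMers.
// Returns 0 if the prefix or the suffix was never observed.
double expectedCount(const std::string& kmer,
                     const std::map<std::string, long>& prefixCounts,
                     const std::map<std::string, long>& oneMerCounts,
                     long totalOneMers) {
    assert(kmer.size() >= 2 && totalOneMers > 0);
    std::map<std::string, long>::const_iterator p =
        prefixCounts.find(kmer.substr(0, kmer.size() - 1));
    std::map<std::string, long>::const_iterator s =
        oneMerCounts.find(kmer.substr(kmer.size() - 1));
    if (p == prefixCounts.end() || s == oneMerCounts.end()) return 0.0;
    return static_cast<double>(p->second) * s->second / totalOneMers;
}
```

For the sequence "ACGTAC", the 2-mer "AC" is observed twice, while under the assumed formula its expected count is 2 * 2 / 6, so a normalized value would compare 2 against roughly 0.67.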
| MGT::KmerCounterLadder::KmerCounterLadder | ( | int | kmerLen, |
| const AbcConvCharToInt * | pAbcConv = 0, |
||
| RC_POLICY | revCompPolicy = RC_DIRECT |
||
| ) |
A constructor.
| kmerLen | the maximum k-mer length. K-mers of length 1 to kmerLen will be counted. In the current implementation, all k-mers are precalculated and stored in memory, so choose this parameter conservatively. |
| pAbcConv | an alphabet converter (the pointer is stored inside this object but not managed by it). |
| revCompPolicy | what to do about reverse complement k-mers (currently only RC_DIRECT is supported). |
| void MGT::KmerCounterLadder::counts | ( | IterVal | valObs, |
| IterVal | valExp, | ||
| IterInd | ind, | ||
| IterInd | sizes | ||
| ) |
Extract the counts accumulated so far and reset the internal state for the next round of accumulation.
| ind | output iterator for observed k-mer ID values |
| valObs | output iterator for observed k-mer counts |
| valExp | output iterator for expected counts for each observed k-mer |
| sizes | output iterator for number of unique observed k-mers for each k. First all unique observed k-mers for k=kmerLen will be appended to ind,valObs and valExp, and the number of written items for this k will be appended to sizes, and then it will be repeated for k=kmerLen-1 and so forth, up to k=1 inclusive. Expected counts for k=1 will be assigned assuming equal probability. This choice is arbitrary and does not affect the expected counts for higher order k-mers. |
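Because all k values share the same flat output arrays, a caller has to use sizes to find where each k begins. The following is a minimal sketch of that bookkeeping, assuming the layout described above (the block for k=kmerLen first, then kmerLen-1, down to 1). `KmerBlock` and `splitBlocks` are hypothetical names introduced for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record describing one per-k block of the flat output.
struct KmerBlock {
    int k;              // k-mer length this block belongs to
    std::size_t begin;  // offset of the block's first entry in ind/valObs/valExp
    std::size_t size;   // number of unique observed k-mers for this k
};

// Split the flat output into per-k blocks.
// ASSUMES sizes[0] belongs to k = kmerLen, sizes[1] to kmerLen-1, ..., down to k = 1.
std::vector<KmerBlock> splitBlocks(int kmerLen, const std::vector<int>& sizes) {
    assert(kmerLen >= 1 && sizes.size() == static_cast<std::size_t>(kmerLen));
    std::vector<KmerBlock> blocks;
    std::size_t offset = 0;
    int k = kmerLen;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        KmerBlock b;
        b.k = k;
        b.begin = offset;
        b.size = static_cast<std::size_t>(sizes[i]);
        blocks.push_back(b);
        offset += b.size;  // the next block starts right after this one
        --k;
    }
    return blocks;
}
```

With kmerLen = 3 and sizes = {5, 3, 2}, the 3-mers occupy entries [0,5), the 2-mers [5,8) and the 1-mers [8,10) of ind, valObs and valExp.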
| void MGT::KmerCounterLadder::doCNuc | ( | CNuc | cnuc ) | [inline] |
This method is called to process the input sequence.
A series of calls to this method is interleaved with calls to result extraction methods such as counts().
| cnuc | one nucleotide character value (such as 'A') |
| void MGT::KmerCounterLadder::makeTopDownKmerInd | ( | ) | [protected] |
Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix.
Must be called from the constructor.
| int MGT::KmerCounterLadder::maxNumKmers | ( | ULong | seqLen, |
| IterInd | sizes | ||
| ) | const |
Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length.
| seqLen | sequence length |
| sizes | output iterator - will hold the number of unique k-mers for k,k-1,...,1 |
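The documentation does not state how the maximum is derived. One natural upper bound, shown here purely as an assumption, is the smaller of the number of k-long windows in the sequence (seqLen - k + 1) and the number of distinct k-mers over the alphabet. The sketch assumes a 4-letter alphabet and no reverse-complement merging; `maxUniqueKmers`, `maxUniqueKmersLadder` and the `ULong` typedef are stand-ins, not the library's definitions.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ULong;  // stand-in; MGT provides its own ULong type

// Hypothetical bound: min(alphabetSize^k, seqLen - k + 1), or 0 if seqLen < k.
ULong maxUniqueKmers(ULong seqLen, int k, ULong alphabetSize = 4) {
    assert(k >= 1 && alphabetSize >= 1);
    if (seqLen < static_cast<ULong>(k)) return 0;
    const ULong windows = seqLen - static_cast<ULong>(k) + 1;
    ULong distinct = 1;
    for (int i = 0; i < k; ++i) {
        distinct *= alphabetSize;
        // Stop as soon as the alphabet bound exceeds the window bound;
        // this also keeps 'distinct' from growing without limit for large k.
        if (distinct >= windows) return windows;
    }
    return distinct;
}

// Fill 'sizes' in the order k = kmerLen, kmerLen-1, ..., 1 and return the sum,
// mirroring the output order described for maxNumKmers().
ULong maxUniqueKmersLadder(ULong seqLen, int kmerLen, std::vector<ULong>& sizes) {
    sizes.clear();
    ULong total = 0;
    for (int k = kmerLen; k >= 1; --k) {
        const ULong m = maxUniqueKmers(seqLen, k);
        sizes.push_back(m);
        total += m;
    }
    return total;
}
```

A bound of this kind can be used to preallocate the arrays later passed to counts(), since the number of unique observed k-mers cannot exceed it.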
| int MGT::KmerCounterLadder::numKmers | ( | IterInd | sizes ) | const |
Store into the output iterator the number of unique k-mers found for each k and return their sum.
| sizes | output iterator - will hold the number of unique k-mers for k,k-1,...,1 |
RC_POLICY MGT::KmerCounterLadder::m_revCompPolicy [protected] |
How reverse complements are treated - one of RC_XXX.