
MGT::KmerCounterLadder Class Reference

Class that computes k-mer frequencies for k = 1...n at the same time.

#include <kmers.hpp>

Inheritance diagram for MGT::KmerCounterLadder:
MGT::Kmers::KmerLadderSparseFeatures


Public Member Functions

 KmerCounterLadder (int kmerLen, const AbcConvCharToInt *pAbcConv=0, RC_POLICY revCompPolicy=RC_DIRECT)
 A constructor.
void doCNuc (CNuc cnuc)
 This method is called to process the input sequence.
template<class IterCNuc >
void process (IterCNuc first, IterCNuc last)
 Accumulate counts for a CNuc sequence represented by an iterator range.
template<class IterInd , class IterVal >
void counts (IterVal valObs, IterVal valExp, IterInd ind, IterInd sizes)
 Extract the counts accumulated so far and reset the internal state for the next round of accumulation.
template<class IterInd >
int numKmers (IterInd sizes) const
 Store into the output iterator the number of unique k-mers found for each k and return their sum.
template<class IterInd >
int maxNumKmers (ULong seqLen, IterInd sizes) const
 Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length.

Protected Member Functions

void makeTopDownKmerInd ()
 Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix.

Protected Attributes

PKmerCounterArray m_counters
 KmerCounter pointer array [0,m_kmerLen], with elements [1,m_kmerLen] initialized.
std::vector< indvec > m_topBotDep
 m_topBotDep[k_mer_index]->(k-1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized
std::vector< indvec > m_topOneDep
 m_topOneDep[k_mer_index]->(1)_mer_index array [0,m_kmerLen], with elements [1,m_kmerLen] initialized
RC_POLICY m_revCompPolicy
 How reverse complements are treated - one of RC_XXX.

Detailed Description

Class that computes k-mer frequencies for k = 1...n at the same time.

It normalizes observed frequencies for a given k with expected frequencies calculated from the observed frequencies for (k-1) and 1. The implementation:

1. Compute counts for k = 1...n. Currently this is done independently for each k through calls to doCNuc(). A future implementation may compute n-mer counts and then derive the (n-1)-mer counts by incrementing by one the count for the (n-1)-mer prefix of each n-mer, plus special handling of the right boundary n-mer (and repeating recursively for (n-2) and so on). That will be more efficient for long sequences.
2. Compute expected counts for each n-mer found in (1).
3. Store normalized frequencies into the output array.
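The steps above can be sketched in a few lines of standalone C++. This is only an illustration, not the MGT implementation: `countKmers` and `expectedCounts` are hypothetical helpers, k-mers are kept as strings in a `std::map` for readability, and the formula used for the expected count (count of the (k-1)-mer prefix multiplied by the observed frequency of the final nucleotide) is an assumption about what "based on observed frequencies for (k-1) and 1" means.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of MGT): count every substring of length k.
std::map<std::string, long> countKmers(const std::string& seq, std::size_t k) {
    std::map<std::string, long> counts;
    if (k == 0 || seq.size() < k) return counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];
    }
    return counts;
}

// Hypothetical helper (not part of MGT): expected count for each observed k-mer.
// Assumed model: expected(w) = count(prefix of length k-1) * freq(last letter),
// where freq is the observed 1-mer frequency in the same sequence.
std::map<std::string, double> expectedCounts(const std::string& seq, std::size_t k) {
    std::map<std::string, double> expected;
    if (k < 2 || seq.size() < k) return expected;

    const auto observed = countKmers(seq, k);      // step 1, level k
    const auto prefixes = countKmers(seq, k - 1);  // step 1, level k-1
    const auto singles  = countKmers(seq, 1);      // step 1, level 1
    const double total  = static_cast<double>(seq.size());

    // Step 2: one expected value per k-mer that was actually observed.
    for (const auto& kv : observed) {
        const std::string& kmer = kv.first;
        const double prefixCount = static_cast<double>(prefixes.at(kmer.substr(0, k - 1)));
        const double suffixFreq  = singles.at(kmer.substr(k - 1, 1)) / total;
        expected[kmer] = prefixCount * suffixFreq;
    }
    return expected;
}
```

For the sequence "ACGT" and k = 2, the 2-mer "AC" has prefix "A" (seen once) and suffix "C" (frequency 1/4), so its expected count under this assumed model is 0.25.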


Constructor & Destructor Documentation

MGT::KmerCounterLadder::KmerCounterLadder ( int  kmerLen,
const AbcConvCharToInt *  pAbcConv = 0,
RC_POLICY  revCompPolicy = RC_DIRECT 
)

A constructor.

Parameters:
kmerLen  Maximum k-mer length. K-mers of every length from 1 to kmerLen will be counted. In the current implementation, all k-mers are precalculated and stored in memory, so choose a moderate value for this parameter.
pAbcConv  Pointer to an alphabet converter (stored inside this object but not managed by it).
revCompPolicy  What to do about reverse-complement k-mers (currently only RC_DIRECT is supported).
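To get a feel for how quickly memory use grows with kmerLen, the following sketch adds up the number of distinct k-mers over all levels of the ladder. It assumes a four-letter nucleotide alphabet and one stored entry per possible k-mer at every level; `totalKmerSlots` is a hypothetical helper, and the actual per-entry cost inside the class is not documented here.

```cpp
#include <cstdint>

// Hypothetical helper (not part of MGT): number of distinct k-mers summed over
// k = 1..kmerLen for an alphabet of alphabetSize letters (4 for A, C, G, T).
// Assumes the result fits into 64 bits, which holds for kmerLen <= 31 when
// alphabetSize is 4.
std::uint64_t totalKmerSlots(int kmerLen, std::uint64_t alphabetSize = 4) {
    std::uint64_t total = 0;
    std::uint64_t perLevel = 1;
    for (int k = 1; k <= kmerLen; ++k) {
        perLevel *= alphabetSize;  // alphabetSize^k distinct k-mers at level k
        total += perLevel;
    }
    return total;
}
```

Under these assumptions kmerLen = 8 gives 87380 entries, while kmerLen = 12 already gives 22369620, so each extra level roughly quadruples the total.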

Member Function Documentation

template<class IterInd , class IterVal >
void MGT::KmerCounterLadder::counts ( IterVal  valObs,
IterVal  valExp,
IterInd  ind,
IterInd  sizes 
)

Extract the counts accumulated so far and reset the internal state for the next round of accumulation.

Parameters:
ind  output iterator for observed k-mer ID values
valObs  output iterator for observed k-mer counts
valExp  output iterator for expected counts for each observed k-mer
sizes  output iterator for the number of unique observed k-mers for each k

First, all unique observed k-mers for k=kmerLen are appended to ind, valObs and valExp, and the number of items written for this k is appended to sizes. The same is then repeated for k=kmerLen-1 and so forth, down to k=1 inclusive. Expected counts for k=1 are assigned assuming equal probability. This choice is arbitrary and does not affect the expected counts for higher order k-mers.
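Because the output is a flat concatenation ordered from k=kmerLen down to k=1, a caller has to use sizes to find where each level starts. The sketch below shows one way to do that. `LevelSlice` and `sliceByLevel` are hypothetical names introduced for illustration, and sizes is assumed to have been collected into a `std::vector<int>`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record (not part of MGT): the half-open range [begin, end)
// inside ind / valObs / valExp that belongs to one k level.
struct LevelSlice {
    int k;
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper: split the flat output into per-k slices.
// sizes[0] belongs to k = kmerLen, sizes[1] to k = kmerLen-1, and so on.
std::vector<LevelSlice> sliceByLevel(int kmerLen, const std::vector<int>& sizes) {
    std::vector<LevelSlice> slices;
    std::size_t offset = 0;
    int k = kmerLen;
    for (int n : sizes) {
        const std::size_t count = static_cast<std::size_t>(n);
        slices.push_back(LevelSlice{k, offset, offset + count});
        offset += count;
        --k;
    }
    return slices;
}
```

With kmerLen = 3 and sizes = {5, 3, 2}, the 3-mers occupy positions 0 to 4, the 2-mers positions 5 to 7, and the 1-mers positions 8 and 9 of each output array.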
void MGT::KmerCounterLadder::doCNuc ( CNuc  cnuc ) [inline]

This method is called to process the input sequence.

A series of calls to this method is interleaved with calls to result extraction methods such as counts().

Parameters:
cnuc  one nucleotide character value (such as 'A')
void MGT::KmerCounterLadder::makeTopDownKmerInd (  ) [protected]

Create internal indices that connect each top level k-mer with (k-1)-mer prefix and 1-mer suffix.

Must be called from the constructor.
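The prefix and suffix relationship can be illustrated with a simple base-4 encoding. This is a sketch under assumptions, not the class's actual index layout: it assumes A=0, C=1, G=2, T=3 with the first letter most significant, and `encodeKmer`, `PrefixSuffix` and `splitKmerIndex` are hypothetical names. Under that encoding, the (k-1)-mer prefix index is the k-mer index divided by 4 and the 1-mer suffix index is the remainder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of MGT): base-4 index of a k-mer,
// assuming A=0, C=1, G=2, T=3 and the first letter most significant.
std::uint64_t encodeKmer(const std::string& kmer) {
    static const std::string alphabet = "ACGT";
    std::uint64_t index = 0;
    for (char c : kmer) {
        const std::string::size_type code = alphabet.find(c);
        assert(code != std::string::npos);  // only A, C, G, T are handled
        index = index * 4 + code;
    }
    return index;
}

// Hypothetical record: indices of the (k-1)-mer prefix and the 1-mer suffix.
struct PrefixSuffix {
    std::uint64_t prefix;
    std::uint64_t suffix;
};

// Under the assumed encoding, dropping the last letter is integer division
// by 4, and the last letter itself is the remainder.
PrefixSuffix splitKmerIndex(std::uint64_t kmerIndex) {
    return PrefixSuffix{kmerIndex / 4, kmerIndex % 4};
}
```

For example, "ACG" encodes to 6, which splits into prefix index 1 ("AC") and suffix index 2 ("G"). Precomputing such pairs once, as the class does in its constructor, avoids repeating the lookup for every k-mer when expected counts are calculated.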

template<class IterInd >
int MGT::KmerCounterLadder::maxNumKmers ( ULong  seqLen,
IterInd  sizes 
) const

Store into the output iterator the maximum number of unique k-mers that can be found for each k for a given sequence length.

Parameters:
seqLen  sequence length
sizes  output iterator; will hold the number of unique k-mers for k, k-1, ..., 1
Returns:
sum of values stored into sizes
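A natural upper bound for one level is the smaller of the number of possible k-mers and the number of windows of length k in the sequence. The sketch below computes that bound for each level in the order k, k-1, ..., 1. It assumes a four-letter alphabet and k below 32; `maxUniqueKmers` and `maxUniqueKmersLadder` are hypothetical helpers, and whether the class uses exactly this bound is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MGT): min(4^k, seqLen - k + 1),
// or 0 when the sequence is shorter than k. Assumes k < 32.
std::uint64_t maxUniqueKmers(std::uint64_t seqLen, int k) {
    if (k <= 0 || seqLen < static_cast<std::uint64_t>(k)) return 0;
    const std::uint64_t windows = seqLen - static_cast<std::uint64_t>(k) + 1;
    std::uint64_t possible = 1;
    for (int i = 0; i < k; ++i) {
        possible *= 4;
        // Once 4^i reaches the window count, 4^k can only be larger,
        // so the window count is the bound.
        if (possible >= windows) return windows;
    }
    return possible;  // 4^k is smaller than the number of windows
}

// Hypothetical helper: fill sizes for k = kmerLen, kmerLen-1, ..., 1
// and return the sum of the stored values.
std::uint64_t maxUniqueKmersLadder(std::uint64_t seqLen, int kmerLen,
                                   std::vector<std::uint64_t>& sizes) {
    sizes.clear();
    std::uint64_t sum = 0;
    for (int k = kmerLen; k >= 1; --k) {
        sizes.push_back(maxUniqueKmers(seqLen, k));
        sum += sizes.back();
    }
    return sum;
}
```

For seqLen = 10 and kmerLen = 3 this gives 8, 9 and 4 for k = 3, 2 and 1, with a sum of 21. A caller could use such a sum to size the output buffers before extracting counts.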
template<class IterInd >
int MGT::KmerCounterLadder::numKmers ( IterInd  sizes ) const

Store into the output iterator the number of unique k-mers found for each k and return their sum.

Parameters:
sizes  output iterator; will hold the number of unique k-mers for k, k-1, ..., 1

Member Data Documentation

RC_POLICY MGT::KmerCounterLadder::m_revCompPolicy [protected]

How reverse complements are treated - one of RC_XXX.


The documentation for this class was generated from the following files: