Public Member Functions
def	__init__
def	defRecDtype
def	makeRecords
def	fromIdFile
def	fromFields
def	union
def	getSplits
def	join
def	selBySplits
def	selByLabs
def	selById
def	selByCondition
def	selDataInd
def	selData
def	setDataLab
def	balance
def	selAcrossSplits
def	remapIds

Detailed Description

Maps unique sample IDs to classification labels (a many-to-one relationship) and to their splits.

Constructor & Destructor Documentation

def MGT::Svm::IdLabels::__init__	(	self,
		records = `None`,
		fileName = `None`,
		initMaps = `True`
	)

Constructor.
@param records Numpy record array with fields id, label, split. Conversion to recarray will be attempted if necessary.
@param fileName file created by a save() call
@pre Either records or fileName must be None. The id field must be unique, e.g. obtained with the UUID module.

Member Function Documentation

def MGT::Svm::IdLabels::balance	(	self,
		maxCount = `0`,
		targets = `{}`,
		rndSeed = `None`
	)

Return a new IdLabels object that has id counts balanced within each split across labels.
@param maxCount default maximum count value for each (split,label) combination, @see balance() 
at the module level for the meaning of zero and negative values.
@param targets dict {split : { "maxCount" : value, "labTargets" : value },...} where each value is optional
and overrides the maxCount parameter for a specific split or specific (split,label) pair.
@return new IdLabels balanced object.
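
As a rough illustration of the balancing idea, the sketch below caps the number of IDs in every (split, label) group by random subsampling. It assumes a positive maxCount only; the meaning of zero and negative values is defined by the module-level balance() and is not modeled here, and per-split targets are omitted. The name balance_records and the tuple representation of records are hypothetical, not part of MGT.

```python
import random
from collections import defaultdict

def balance_records(records, max_count, rnd_seed=None):
    """Cap the number of records in every (split, label) group at max_count.

    records: list of (id, label, split) tuples, a hypothetical stand-in
    for the record array. Only a positive max_count is modeled here.
    """
    if max_count <= 0:
        raise ValueError("this sketch only models a positive max_count")
    rng = random.Random(rnd_seed)
    groups = defaultdict(list)
    for rec in records:
        _id, label, split = rec
        groups[(split, label)].append(rec)
    balanced = []
    for key in sorted(groups):
        group = groups[key]
        if len(group) > max_count:
            # Randomly subsample oversized groups down to the cap.
            group = rng.sample(group, max_count)
        balanced.extend(group)
    return balanced

recs = [("a", 1, 0), ("b", 1, 0), ("c", 1, 0), ("d", 2, 0), ("e", 1, 1)]
out = balance_records(recs, max_count=2, rnd_seed=42)
```

Here the (split 0, label 1) group is cut from three records to two, while the smaller groups are kept whole.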

def MGT::Svm::IdLabels::defRecDtype ( klass )

Return a default numpy dtype that can be used to construct new record arrays

def MGT::Svm::IdLabels::fromFields	(	klass,
		id,
		label = `0`,
		split = `0`
	)

Create new object from ID and other field arrays.
@param id ID array
@param label Label array (or a constant value)
@param split Split array (or a constant value)

def MGT::Svm::IdLabels::fromIdFile	(	klass,
		featFile,
		label = `0`,
		split = `0`
	)

Create new object from IDs extracted from a data file, setting the other fields to constant values.
@param featFile the name of the data file. The IDs will be loaded from the accompanying id file.

def MGT::Svm::IdLabels::getSplits ( self )

Return dict(split -> IdLabels)
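
The grouping behind such a mapping can be sketched as follows. This is an assumption-laden illustration using plain (id, label, split) tuples; the helper group_by_split is hypothetical, and the real method returns IdLabels objects rather than lists.

```python
from collections import defaultdict

def group_by_split(records):
    """Map each split value to the list of records that carry it."""
    by_split = defaultdict(list)
    for rec in records:
        split = rec[2]  # records are (id, label, split) tuples in this sketch
        by_split[split].append(rec)
    return dict(by_split)

recs = [("a", 1, 0), ("b", 2, 1), ("c", 1, 0)]
splits = group_by_split(recs)
```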

def MGT::Svm::IdLabels::join	(	self,
		other,
		colOther = `"split"`,
		requireMatch = `False`
	)

Perform inner join with another IdLabels object or a record array.
@param other IdLabels instance or record array with at least fields "id" and colOther
@param colOther Data column to take from other - the rest will be taken from self
@param requireMatch If True, will expect every ID from self to be present in other
@return new IdLabels instance resulting from join operation
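
The join semantics described above can be sketched on plain dictionaries. The function inner_join is a hypothetical stand-in, and raising KeyError when requireMatch is violated is an assumption: the source does not state how a missing ID is reported.

```python
def inner_join(self_recs, other_recs, col_other="split", require_match=False):
    """Inner join on "id": take col_other from other_recs, the rest from self_recs.

    Both inputs are lists of dicts (a hypothetical stand-in for record arrays).
    """
    other_by_id = {rec["id"]: rec[col_other] for rec in other_recs}
    joined = []
    for rec in self_recs:
        if rec["id"] in other_by_id:
            new_rec = dict(rec)
            new_rec[col_other] = other_by_id[rec["id"]]
            joined.append(new_rec)
        elif require_match:
            # Assumed failure mode; the source does not say what happens here.
            raise KeyError("id %r from self is missing in other" % (rec["id"],))
    return joined

mine = [{"id": "a", "label": 1, "split": 0}, {"id": "b", "label": 2, "split": 0}]
theirs = [{"id": "a", "split": 3}]
joined = inner_join(mine, theirs)
```

With requireMatch left at False, the unmatched record "b" is simply dropped, as in any inner join.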

def MGT::Svm::IdLabels::makeRecords	(	klass,
		nrec,
		dtype = `None`
	)

Return a zeroed out new record array of a given size.
The caller is responsible for filling it with values.
@param nrec size of returned array
@param dtype numpy dtype for array records, if None, the result of defRecDtype() will be used
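
A minimal sketch of allocating and then filling such an array with numpy. The dtype below is an assumption made for illustration; the actual result of defRecDtype() is not shown in this documentation, and make_records is a hypothetical free function rather than the class method.

```python
import numpy as np

# Assumed layout; the real default dtype from defRecDtype() may differ.
ASSUMED_DTYPE = np.dtype([("id", "U36"), ("label", "i4"), ("split", "i4")])

def make_records(nrec, dtype=None):
    """Return a zeroed record array with nrec entries."""
    if dtype is None:
        dtype = ASSUMED_DTYPE
    return np.zeros(nrec, dtype=dtype).view(np.recarray)

# The caller fills in the values afterwards.
recs = make_records(3)
recs["id"] = ["s1", "s2", "s3"]
recs["label"] = [1, 2, 1]
```

The split field is left untouched and therefore stays at zero for every record.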

def MGT::Svm::IdLabels::remapIds	(	self,
		idMap
	)

Return a new IdLabels object that has new id values created from the idMap argument.
Because idMap represents a 1-to-N old-to-new relation, the returned object will have
multiple copies of the same record, each with a different new id value.
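
The 1-to-N expansion can be sketched as below, assuming the map is a dict from an old ID to a list of new IDs. That representation, the helper name remap_ids, and the choice to drop records whose ID is absent from the map are all assumptions; the actual IdMap type is not described here.

```python
def remap_ids(records, id_map):
    """Emit one copy of each record per new ID mapped from its old ID.

    records: list of (id, label, split) tuples.
    id_map: dict old_id -> list of new ids (assumed representation).
    """
    out = []
    for old_id, label, split in records:
        # Records with no entry in id_map are dropped in this sketch.
        for new_id in id_map.get(old_id, []):
            out.append((new_id, label, split))
    return out

recs = [("a", 1, 0), ("b", 2, 1)]
remapped = remap_ids(recs, {"a": ["a.1", "a.2"], "b": ["b.1"]})
```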

def MGT::Svm::IdLabels::selAcrossSplits ( self )

Return a new IdLabels object in which every remaining label is present in every split.
Rationale: if we do cross-validation and a label is absent from the training split but
present in the testing split, we will get zero true positives for that label during testing.
Therefore, it makes sense to do CV only when every label is present in every split.
Alternatively, we could skip during testing all samples with labels not present in
training, but that would negatively bias our specificity estimate (because the corresponding
classes would have a chance of showing false positives only).
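
The selection amounts to intersecting the label sets of all splits and keeping only records whose label survives. A minimal sketch on (id, label, split) tuples, with the hypothetical helper name sel_across_splits:

```python
def sel_across_splits(records):
    """Keep only records whose label occurs in every split."""
    labels_by_split = {}
    for _id, label, split in records:
        labels_by_split.setdefault(split, set()).add(label)
    if not labels_by_split:
        return []
    # Labels that appear in all splits.
    common = set.intersection(*labels_by_split.values())
    return [rec for rec in records if rec[1] in common]

recs = [("a", 1, 0), ("b", 2, 0), ("c", 1, 1), ("d", 3, 1)]
kept = sel_across_splits(recs)
```

Labels 2 and 3 each occur in only one split, so only the label 1 records remain.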

def MGT::Svm::IdLabels::selByCondition	(	self,
		condition
	)

Return a new IdLabels object that contains only records for which 'condition(record)' returns True. 
@param condition unary predicate that will be applied to every record
@return new IdLabels object
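
A sketch of predicate-based selection, assuming records are exposed as dict-like objects with id, label and split fields. The helper sel_by_condition is hypothetical and returns a plain list instead of an IdLabels object.

```python
def sel_by_condition(records, condition):
    """Return the records for which condition(record) is true."""
    return [rec for rec in records if condition(rec)]

recs = [
    {"id": "a", "label": 1, "split": 0},
    {"id": "b", "label": 2, "split": 1},
    {"id": "c", "label": 1, "split": 1},
]
# Select records of label 1 that fall into split 1.
picked = sel_by_condition(recs, lambda r: r["label"] == 1 and r["split"] == 1)
```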

def MGT::Svm::IdLabels::selById	(	self,
		ids,
		rawRec = `False`
	)

Return a new IdLabels object (or, if rawRec is True, only the records array) that contains only records whose id is within a given ids sequence.

def MGT::Svm::IdLabels::selByLabs	(	self,
		labs
	)

Return a new IdLabels object that contains only records within a given labels sequence.
@todo write a single selByField() method and call that from other selByXXX().

def MGT::Svm::IdLabels::selBySplits	(	self,
		splits
	)

Return a new IdLabels object that contains only records with 'split' value within a given list
@param splits sequence of allowed split values
@return new IdLabels object

def MGT::Svm::IdLabels::selData	(	self,
		data,
		setLab = `True`
	)

Return records from data array that are referenced here.
@param data feature data array
@param setLab if True, labels of returned data records will be updated from this object

def MGT::Svm::IdLabels::selDataInd	(	self,
		data
	)

Return array of row indices for data rows that are referenced here
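
One way to picture the index selection, assuming each data row carries an ID that can be matched against the IDs held by this object. The helper sel_data_ind, the list-based inputs, and the choice to return indices in data order are assumptions made for illustration.

```python
def sel_data_ind(label_ids, data_ids):
    """Return indices of data rows whose ID is among label_ids.

    label_ids: IDs referenced by the labels object.
    data_ids: the ID of each data row, in row order.
    """
    wanted = set(label_ids)
    return [i for i, data_id in enumerate(data_ids) if data_id in wanted]

ind = sel_data_ind(["b", "d"], ["a", "b", "c", "d"])
```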

def MGT::Svm::IdLabels::setDataLab	(	self,
		data
	)

Update labels in the data feature array from this object

def MGT::Svm::IdLabels::union	(	klass,
		idLabs
	)

Return a union of several IdLabels objects.
@param idLabs a sequence of IdLabels objects.
@return IdLabels object that is a union.
@pre if there are identical ID values in input arrays, the corresponding records must be fully identical too.
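
The precondition can be made concrete with a small sketch that merges several record lists by ID and checks that repeated IDs carry identical records. The helper union_records is hypothetical, and raising ValueError on a conflict is an assumption: the source only states the precondition, not how a violation is handled.

```python
def union_records(record_lists):
    """Merge lists of (id, label, split) tuples, keeping one record per ID."""
    merged = {}
    for records in record_lists:
        for rec in records:
            prev = merged.setdefault(rec[0], rec)
            if prev != rec:
                # Assumed behavior for a violated precondition.
                raise ValueError("conflicting records for id %r" % (rec[0],))
    return list(merged.values())

first = [("a", 1, 0), ("b", 2, 0)]
second = [("b", 2, 0), ("c", 1, 1)]
merged = union_records([first, second])
```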

The documentation for this class was generated from the following file:

mgtaxa/MGT/Svm.py

MGT::Svm::IdLabels Class Reference