Cross Validation

Leave-one-out and k-fold

mlpy.cv_kfold(n, k, strat=None, seed=0)

Returns train and test indexes for k-fold cross-validation.

Parameters:

n : int (n > 1)
    number of indexes
k : int (k > 1)
    number of iterations (folds). The case k = n is known as leave-one-out cross-validation.
strat : None or 1d array_like integer (of length n)
    labels for stratification. If strat is not None, returns 'stratified' k-fold CV indexes, where each subsample has roughly the same label proportions as strat.
seed : int
    random seed

Returns:

idx : list of tuples
    list of k tuples containing the train and test indexes

Example:

>>> import mlpy
>>> idx = mlpy.cv_kfold(n=12, k=3)
>>> for tr, ts in idx: tr, ts
... 
(array([2, 8, 1, 7, 9, 3, 0, 5]), array([ 6, 11,  4, 10]))
(array([ 6, 11,  4, 10,  9,  3,  0,  5]), array([2, 8, 1, 7]))
(array([ 6, 11,  4, 10,  2,  8,  1,  7]), array([9, 3, 0, 5]))
>>> strat = [0,0,0,0,0,0,0,0,1,1,1,1]
>>> idx = mlpy.cv_kfold(12, k=4, strat=strat)
>>> for tr, ts in idx: tr, ts
... 
(array([ 1,  7,  3,  0,  5,  4,  8, 10,  9]), array([ 6,  2, 11]))
(array([ 6,  2,  3,  0,  5,  4, 11, 10,  9]), array([1, 7, 8]))
(array([ 6,  2,  1,  7,  5,  4, 11,  8,  9]), array([ 3,  0, 10]))
(array([ 6,  2,  1,  7,  3,  0, 11,  8, 10]), array([5, 4, 9]))
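The returned index arrays can be used directly to slice data arrays via NumPy fancy indexing. A minimal sketch, assuming a hypothetical feature matrix x and label vector y (the data below are illustrative, not part of the mlpy API):

>>> import mlpy
>>> import numpy as np
>>> x = np.random.rand(12, 5)        # hypothetical feature matrix: 12 samples, 5 features
>>> y = np.random.randint(0, 2, 12)  # hypothetical binary labels
>>> idx = mlpy.cv_kfold(n=12, k=3)
>>> for tr, ts in idx:
...     xtr, ytr = x[tr], y[tr]      # training split for this fold
...     xts, yts = x[ts], y[ts]      # test split for this fold
...     xtr.shape, xts.shape
... 
((8, 5), (4, 5))
((8, 5), (4, 5))
((8, 5), (4, 5))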

Random Subsampling (aka Monte Carlo)

mlpy.cv_random(n, k, p, strat=None, seed=0)

Returns train and test indexes for random subsampling (repeated random splits) cross-validation. The train/test proportion is fixed by p and does not depend on the number of iterations k.

Parameters:

n : int (n > 1)
    number of indexes
k : int (k > 0)
    number of iterations
p : float (0 <= p <= 100)
    percentage of indexes assigned to the test set
strat : None or 1d array_like integer (of length n)
    labels for stratification. If strat is not None, returns 'stratified' random subsampling CV indexes, where each subsample has roughly the same label proportions as strat.
seed : int
    random seed

Returns:

idx : list of tuples
    list of k tuples containing the train and test indexes

Example:

>>> import mlpy
>>> idx = mlpy.cv_random(n=12, k=4, p=30)
>>> for tr, ts in idx: tr, ts
... 
(array([ 6, 11,  4, 10,  2,  8,  1,  7,  9]), array([3, 0, 5]))
(array([ 5,  2,  3,  4,  9,  0, 11,  7,  6]), array([ 1, 10,  8]))
(array([ 6,  1, 10,  2,  7,  5, 11,  0,  3]), array([4, 9, 8]))
(array([2, 4, 8, 9, 5, 6, 1, 0, 7]), array([10, 11,  3]))
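As with cv_kfold, passing strat yields test sets that roughly preserve the label proportions. The test-set size appears to be the truncated value of n * p / 100 (3 indexes in the example above). A minimal sketch that tallies the labels drawn into each test set (illustrative only; the exact index arrays depend on mlpy's internal shuffling):

>>> import numpy as np
>>> strat = np.array([0,0,0,0,0,0,0,0,1,1,1,1])
>>> idx = mlpy.cv_random(n=12, k=4, p=25, strat=strat)
>>> counts = [np.bincount(strat[ts]) for tr, ts in idx]  # label counts per test set

With 8 zeros, 4 ones, and a test size of 3, each entry of counts should come out near array([2, 1]), preserving the 2:1 label ratio.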

All Combinations

mlpy.cv_all(n, p)

Returns train and test indexes for all-combinations cross-validation: one split for every possible choice of the test indexes.

Parameters:

n : int (n > 1)
    number of indexes
p : float (0 <= p <= 100)
    percentage of indexes assigned to the test set

Returns:

idx : list of tuples
    list of tuples containing the train and test indexes

Example:

>>> import mlpy
>>> idx = mlpy.cv_all(n=4, p=50)
>>> for tr, ts in idx: tr, ts
... 
(array([2, 3]), array([0, 1]))
(array([1, 3]), array([0, 2]))
(array([1, 2]), array([0, 3]))
(array([0, 3]), array([1, 2]))
(array([0, 2]), array([1, 3]))
(array([0, 1]), array([2, 3]))
>>> idx = mlpy.cv_all(n=4, p=10) # raises ValueError: p must be >= 25.000
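The number of tuples follows from combinatorics: with n=4 and p=50 the test set holds 2 indexes, giving C(4, 2) = 6 splits, exactly the six tuples above. p=10 fails because the test set would be empty, so here p must be at least 100/n = 25. A minimal sketch reproducing the count with the standard library (an illustration, not mlpy's implementation):

>>> from itertools import combinations
>>> n, p = 4, 50
>>> t = n * p // 100                      # test-set size: 2 indexes
>>> len(list(combinations(range(n), t)))  # C(4, 2) splits, matching the output above
6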
