Tutorial

If you are new to Python and NumPy, see: http://docs.python.org/tutorial/, http://www.scipy.org/Tentative_NumPy_Tutorial and http://matplotlib.sourceforge.net/.

A learning problem usually considers a set of p-dimensional samples (observations) and tries to predict properties of unseen data.

Tutorial 1 - Iris Dataset

The well-known Iris dataset contains 150 observations of 3 kinds of Iris flowers, each described by 4 attributes: sepal length, sepal width, petal length and petal width.

Dimensionality reduction and learning tasks can be performed with the mlpy library in just a few commands.

Download the Iris dataset.
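
If the CSV file is not at hand, one way to build it is to fetch the raw data from the UCI Machine Learning Repository, which stores the class as a string (a sketch; the URL and the integer label encoding below are assumptions, not part of the original tutorial):

>>> import urllib.request
>>> url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
>>> raw = urllib.request.urlopen(url).read().decode().strip().splitlines()
>>> labels = {'Iris-setosa': '1', 'Iris-versicolor': '2', 'Iris-virginica': '3'}
>>> with open('iris.csv', 'w') as f:
...     for line in raw:
...         *feats, name = line.split(',')         # 4 attributes, then the class name
...         _ = f.write(','.join(feats + [labels[name]]) + '\n')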

Load the modules:

>>> import numpy as np
>>> import mlpy
>>> import matplotlib.pyplot as plt # required for plotting

Load the Iris dataset:

>>> iris = np.loadtxt('iris.csv', delimiter=',')
>>> x, y = iris[:, :4], iris[:, 4].astype(int) # x: (observations x attributes) matrix, y: classes (1: setosa, 2: versicolor, 3: virginica)
>>> x.shape
(150, 4)
>>> y.shape
(150,)
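
As a quick sanity check (this step is not in the original tutorial), count the samples per class:

>>> np.bincount(y)[1:] # 50 samples each for classes 1, 2 and 3
array([50, 50, 50])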

Dimensionality reduction by Principal Component Analysis (PCA)

>>> pca = mlpy.PCA() # new PCA instance
>>> pca.learn(x) # learn from data
>>> z = pca.transform(x, k=2) # embed x into the k=2 dimensional subspace
>>> z.shape
(150, 2)
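
Under the hood, the projection is just centering followed by a change of basis. A minimal NumPy sketch of the same computation (the signs of the components may differ from mlpy's output):

>>> xc = x - x.mean(axis=0) # center each attribute
>>> U, s, Vt = np.linalg.svd(xc, full_matrices=False) # singular value decomposition
>>> z_np = xc.dot(Vt[:2].T) # project onto the first two principal components
>>> z_np.shape
(150, 2)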

Plot the principal components:

>>> plt.set_cmap(plt.cm.Paired)
>>> fig1 = plt.figure(1)
>>> title = plt.title("PCA on iris dataset")
>>> plot = plt.scatter(z[:, 0], z[:, 1], c=y)
>>> labx = plt.xlabel("First component")
>>> laby = plt.ylabel("Second component")
>>> plt.show()
[Figure: PCA on the Iris dataset (iris_pca1.png)]

Learning by Kernel Support Vector Machines (SVMs) on principal components:

>>> linear_svm = mlpy.LibSvm(kernel_type='linear') # new linear SVM instance
>>> linear_svm.learn(z, y) # learn from principal components
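
As an optional check (not in the original tutorial), the pred() method, used below for the grid, also lets us measure how well the classifier fits the training points:

>>> yp_train = linear_svm.pred(z) # predictions on the training points
>>> accuracy = np.mean(yp_train == y) # fraction of correctly classified training samples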

For plotting purposes, we build the grid where we will compute the predictions (zgrid):

>>> xmin, xmax = z[:,0].min()-0.1, z[:,0].max()+0.1
>>> ymin, ymax = z[:,1].min()-0.1, z[:,1].max()+0.1
>>> xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.01), np.arange(ymin, ymax, 0.01))
>>> zgrid = np.c_[xx.ravel(), yy.ravel()]
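
zgrid stacks the two coordinate grids into an (n_points, 2) matrix, the same layout as the training data z:

>>> zgrid.shape == (xx.size, 2)
True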

Now we perform the predictions on the grid. The pred() method returns the prediction for each point in zgrid:

>>> yp = linear_svm.pred(zgrid)

Plot the predictions:

>>> plt.set_cmap(plt.cm.Paired)
>>> fig2 = plt.figure(2)
>>> title = plt.title("SVM (linear kernel) on principal components")
>>> plot1 = plt.pcolormesh(xx, yy, yp.reshape(xx.shape))
>>> plot2 = plt.scatter(z[:, 0], z[:, 1], c=y)
>>> labx = plt.xlabel("First component")
>>> laby = plt.ylabel("Second component")
>>> limx = plt.xlim(xmin, xmax)
>>> limy = plt.ylim(ymin, ymax)
>>> plt.show()
[Figure: linear-kernel SVM decision regions on the principal components (iris_svm_linear1.png)]

We can try different kernels to obtain, for instance:

[Figures: Gaussian-kernel and polynomial-kernel SVM decision regions (iris_svm_gaussian1.png, iris_svm_poly1.png)]
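
For example (a sketch, assuming mlpy exposes libsvm's standard kernel names, where 'rbf' is the Gaussian kernel; the gamma and degree values are illustrative, not from the original tutorial):

>>> gaussian_svm = mlpy.LibSvm(kernel_type='rbf', gamma=10) # Gaussian (RBF) kernel
>>> gaussian_svm.learn(z, y)
>>> poly_svm = mlpy.LibSvm(kernel_type='poly', degree=3, gamma=1) # polynomial kernel
>>> poly_svm.learn(z, y)

The plotting code above can then be reused with gaussian_svm.pred(zgrid) or poly_svm.pred(zgrid) in place of linear_svm.pred(zgrid).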
