SVM is a very versatile model. The general idea is to build a hyperplane that maximizes the margin between classes. We will cover the theory in a separate post, but you can think of it as drawing a line in 2-D or placing a sheet of paper in 3-D.
There are a few different SVM models in scikit-learn. We will start with the simple
LinearSVC class. SVMs also require some tuning parameters. For now, we just specify the default
C (cost) of 1.0.
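# Load libraries
from sklearn.svm import LinearSVC
from sklearn import datasets

# Load data with only two classes and two features
iris = datasets.load_iris()
features = iris.data[:100, :2]
target = iris.target[:100]

# Create and train the support vector classifier
svc = LinearSVC(C=1.0)
model = svc.fit(features, target)

# Mean accuracy on the training data (score needs the data and the labels)
print(model.score(features, target))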
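To see what the C parameter does in practice, here is a minimal sketch that tries a few values and compares cross-validated accuracy. The particular values of C, the 5-fold split, and the raised `max_iter` are assumptions chosen for illustration, not recommendations.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Same two-class, two-feature subset of iris as above
iris = datasets.load_iris()
features = iris.data[:100, :2]
target = iris.target[:100]

# Illustrative values only: smaller C means stronger regularization
results = {}
for C in [0.01, 1.0, 100.0]:
    svc = LinearSVC(C=C, max_iter=10000)
    scores = cross_val_score(svc, features, target, cv=5)
    results[C] = scores.mean()

for C, mean_accuracy in results.items():
    print(f"C={C}: mean accuracy {mean_accuracy:.3f}")
```

A small C tolerates more misclassified points in exchange for a wider margin, while a large C penalizes them more heavily. On a dataset this easy the differences may be small.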
To get a better idea of what is happening, let's plot a 2-D version of our SVM to see what it creates.
# Load libraries
import numpy as np
from matplotlib import pyplot as plt

# Plot data points and color using their class
color = ["black" if c == 0 else "lightgrey" for c in target]
plt.scatter(features[:, 0], features[:, 1], c=color)

# Create the hyperplane: w[0] * x + w[1] * y + intercept = 0
w = svc.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(features[:, 0].min(), features[:, 0].max())
yy = a * xx - svc.intercept_[0] / w[1]

# Plot the hyperplane
plt.plot(xx, yy)
plt.axis("off")
plt.show()
Here, you can see that the model draws a line separating the two classes of observations. Classification won't always be this easy, but this gives you a good idea of the intent.
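The line in the plot is where the model's score w · x + b equals zero, and points are classified by which side they fall on. As a rough sketch, we can recompute that score by hand and compare it with what scikit-learn reports. The names `manual_scores` and `manual_pred` are just for this illustration, and `max_iter` is raised only to help the solver converge on unscaled data.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

# Same two-class, two-feature subset of iris as above
iris = datasets.load_iris()
features = iris.data[:100, :2]
target = iris.target[:100]

svc = LinearSVC(C=1.0, max_iter=10000).fit(features, target)

# Weights and intercept of the fitted hyperplane
w = svc.coef_[0]
b = svc.intercept_[0]

# Signed score for every observation: positive on one side, negative on the other
manual_scores = features @ w + b

# Positive scores are assigned to class 1, the rest to class 0
manual_pred = (manual_scores > 0).astype(int)

scores_match = np.allclose(manual_scores, svc.decision_function(features))
preds_match = np.array_equal(manual_pred, svc.predict(features))
print(scores_match, preds_match)
```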
In the example above, we left out preprocessing in case someone is just looking for the example code. However, it is usually a good idea to at least standardize your data, especially for SVM. Here is the example again with standardization.
# Load libraries
from sklearn.svm import LinearSVC
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load data with only two classes and two features
iris = datasets.load_iris()
features = iris.data[:100, :2]
target = iris.target[:100]

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create support vector classifier
svc = LinearSVC(C=1.0)

# Train model
model = svc.fit(features_standardized, target)
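If you want to evaluate the model on held-out data, one common approach is to put the scaler and the classifier in a single pipeline so the scaler is fit only on the training folds. This is a minimal sketch of that idea using scikit-learn's `make_pipeline` and `cross_val_score`. The 5-fold split is an assumption made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Same two-class, two-feature subset of iris as above
iris = datasets.load_iris()
features = iris.data[:100, :2]
target = iris.target[:100]

# Standardization and the classifier are fit together on each training fold
pipeline = make_pipeline(StandardScaler(), LinearSVC(C=1.0))

# 5-fold cross-validated accuracy (fold count chosen for illustration)
scores = cross_val_score(pipeline, features, target, cv=5)
print(scores.mean())
```

Once fit, the pipeline also applies the same scaling to any new observations passed to `pipeline.predict`, so you don't have to remember to transform them yourself.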