Introduction
Any other Machine Learning Classifier's Decision Boundary
Support Vector Machine's Decision Boundary
Playground
You have opted for the Linear Kernel to find the decision boundary. A linear kernel is not really a kernel at all; it is simply the plain SVM. As the name suggests, it finds a linear decision boundary between the classes and works well when the classes are linearly separable. Since the points here are not linearly separable, it cannot find a perfect decision boundary. What we can do instead is allow the SVM to make some misclassifications in order to reach the best possible decision boundary. This is where the parameter C comes in. C is a universal parameter shared by all kernels: it controls how many misclassifications the SVM is allowed to make in order to complete the task.
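As a minimal sketch of this idea (assuming scikit-learn as the library; the playground itself may use a different implementation), the snippet below fits a linear SVM on data that is not linearly separable and varies C to show the effect of a softer or harder margin:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable, so some misclassification is unavoidable.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Smaller C tolerates more misclassifications (softer margin); larger C penalises them harder.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:<6} training accuracy={clf.score(X, y):.3f}")
```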
You have opted for the Polynomial Kernel to find the decision boundary. Polynomial kernels are useful in the specific case where you know the degree of the decision boundary, because the kernel can then exploit its advantage of mapping to a finite-dimensional space, unlike the Gaussian RBF kernel. Apart from the universal parameter C, which allows misclassifications for what is called Soft-Margin Classification, it uses two parameters: 'a' and the degree. The degree controls the dimension of the space in which the kernel computes its dot product, while 'a' is merely a constant that mixes lower-order terms into that space. A higher degree tends to overfit the data, whereas a higher 'a' does not contribute much to the bias-variance tradeoff. The polynomial kernel is often used with a = 1.
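A rough sketch of the same trade-off (assuming scikit-learn, where the constant 'a' corresponds to the coef0 argument of the polynomial kernel): higher degrees produce a more flexible, more overfit-prone boundary.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Polynomial kernel: (gamma * <x, x'> + coef0)^degree; coef0 plays the role of 'a' here.
for degree in (2, 3, 5):
    clf = SVC(kernel='poly', degree=degree, coef0=1.0, C=1.0).fit(X, y)
    print(f"degree={degree} training accuracy={clf.score(X, y):.3f}")
```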
You have opted for the Gaussian (RBF) Kernel to find the decision boundary. The Gaussian kernel is a special case of the RBF kernel. It implicitly maps the features into a very high-dimensional space and can therefore produce highly non-linear boundaries, which is why it is the most popular kernel in use. Its behaviour is controlled by two parameters, C and sigma. C is the usual universal parameter that allows misclassifications for what is called Soft-Margin Classification. Sigma, on the other hand, is specific to the Gaussian kernel: it controls the spread (variance) of the Gaussian placed around each point. If sigma is higher, a point influences and occupies a larger region around itself, so increasing sigma makes the decision boundary smoother. As a result, the SVM makes a few more misclassifications, but this is not a bad thing: Gaussian kernels are powerful enough to have a strong tendency to overfit the data, so increasing sigma tends to reduce overfitting.
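A minimal sketch of the sigma effect (assuming scikit-learn, which parameterises the RBF kernel with gamma; the usual correspondence gamma = 1 / (2 * sigma^2) is assumed here):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Larger sigma -> smaller gamma -> each point influences a wider region -> smoother boundary.
for sigma in (0.1, 1.0, 5.0):
    gamma = 1.0 / (2.0 * sigma ** 2)
    clf = SVC(kernel='rbf', C=1.0, gamma=gamma).fit(X, y)
    print(f"sigma={sigma:<4} gamma={gamma:.3f} training accuracy={clf.score(X, y):.3f}")
```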
You have opted for the Sigmoid Kernel to find the decision boundary. It is interesting to note that an SVM using a sigmoid kernel is equivalent to a two-layer perceptron neural network, and this origin in neural network theory is what made the kernel popular for support vector machines. Apart from the universal parameter C, which allows misclassifications for what is called Soft-Margin Classification, this kernel is controlled by two parameters: c and \(\alpha\). c is simply a constant added to the dot product to shift it, and it does not contribute much to the bias-variance tradeoff. \(\alpha\), on the other hand, is a scaling parameter applied to the dot product, and increasing it increases the influence of each point. In practice, the sigmoid kernel is usually not as good as the RBF kernel.
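A small sketch of these two parameters (assuming scikit-learn, where gamma plays the role of \(\alpha\) and coef0 the role of c in the sigmoid kernel):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Sigmoid kernel: K(x, x') = tanh(alpha * <x, x'> + c), with gamma = alpha and coef0 = c.
for alpha in (0.1, 0.5, 1.0):
    clf = SVC(kernel='sigmoid', gamma=alpha, coef0=0.0, C=1.0).fit(X, y)
    print(f"alpha={alpha} training accuracy={clf.score(X, y):.3f}")
```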
You can see the Training statistics below.