Data that can be separated by a line or in general, a hyperplane is know as linearly separable data. The hyperplane acts as a linear classifier.
SVM can perfectly handle data which are linearly separable. How does SVM handle the case where the data in absolutely not linearly separable ?
So what do we do now ?
Project the data into a space where it is linearly separable and find a hyperplane in that space. This relates to the concept of kernel. A SVM kernel function gives the possibility to transform non-linear spaces into linear spaces. Most SVM libraries already come pre-packaged with some popular kernels like
The kernel trick involves mapping the input data into a higher-dimensional space where it is possible to find a linear separating hyperplane. This is based on the idea that while data might not be linearly separable in the original space, it could be linearly separable in a higher-dimensional space.