What is Bias?
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the underlying relationship. This leads to high error on both training and test data.
Error Due to Bias
The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course, you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off, in general, these models’ predictions are from the correct value.
What is Variance?
Variance is the variability of a model’s prediction for a given data point: it tells us how much that prediction would spread if the model were trained on different data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
Error Due to Variance
The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.
Essentially, bias is how removed a model’s predictions are from correctness, while variance is the degree to which these predictions vary between model iterations.
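The "repeat the whole model building process" idea can be sketched as a small simulation. This is only an illustration under stated assumptions: the true function `true_fn`, the noise level, the query point `x0`, and the helper `estimate_bias_variance` are all hypothetical choices made for this example, and a polynomial least-squares fit stands in for "the model".

```python
import numpy as np

def true_fn(x):
    # Hypothetical "correct value" function, assumed only for illustration.
    return np.sin(2 * np.pi * x)

def estimate_bias_variance(degree, x0=0.25, n_repeats=200,
                           n_samples=30, noise_sd=0.3, seed=0):
    """Estimate bias and variance of a polynomial model's prediction at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Gather a fresh, noisy training set each time.
        x = rng.uniform(0.0, 1.0, n_samples)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_samples)
        # Build a new model: least-squares polynomial of the given degree.
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x0)
    # Bias: average prediction minus the correct value.
    bias = preds.mean() - true_fn(x0)
    # Variance: spread of the predictions across the repeated models.
    variance = preds.var()
    return bias, variance

bias, variance = estimate_bias_variance(degree=1)
print(f"degree 1: bias={bias:.3f}, variance={variance:.3f}")
```

Calling the function with a higher `degree` should shrink the bias at this point, since a more flexible polynomial can follow the curve more closely.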
But why is there a Bias-Variance Trade-off?
Let’s do a thought experiment:
- Imagine you’ve collected 5 different training sets for the same problem.
- Now imagine using one algorithm to train 5 models, one for each of your training sets.
- Bias vs. variance refers to the accuracy vs. consistency of the models trained by your algorithm.
Low variance (high bias) algorithms tend to be less complex, with a simple or rigid underlying structure.
- They train models that are consistent, but inaccurate on average.
- These include linear or parametric algorithms such as regression and naive Bayes.
On the other hand, low bias (high variance) algorithms tend to be more complex, with a flexible underlying structure.
- They train models that are accurate on average, but inconsistent.
- These include non-linear or non-parametric algorithms such as decision trees and nearest neighbors.
a. Low variance algorithms tend to be less complex, with a simple or rigid underlying structure. Examples: regression, naive Bayes, and linear or parametric algorithms in general. A regression, for instance, can be regularized to further reduce complexity. Algorithms that are not complex enough produce underfit models that can’t learn the signal from the data.
b. Low bias algorithms tend to be more complex, with a flexible underlying structure. Examples: decision trees, nearest neighbors, and non-linear or non-parametric algorithms in general. Decision trees, for instance, can be pruned to reduce complexity. Algorithms that are too complex produce overfit models that memorize the noise instead of the signal.
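The thought experiment with 5 training sets can be tried out directly. The sketch below is a hedged illustration, not a benchmark: the data-generating function `true_fn`, the noise level, and the helper `predictions_across_sets` are assumptions made for this example. A straight-line fit plays the rigid algorithm and a 1-nearest-neighbor rule plays the flexible one.

```python
import numpy as np

def true_fn(x):
    # Hypothetical ground truth, assumed only for this illustration.
    return np.sin(2 * np.pi * x)

def predictions_across_sets(n_sets=5, n_samples=30, x0=0.25,
                            noise_sd=0.3, seed=0):
    """Train a linear model and a 1-NN model on each training set."""
    rng = np.random.default_rng(seed)
    linear_preds = np.empty(n_sets)
    knn_preds = np.empty(n_sets)
    for i in range(n_sets):
        x = rng.uniform(0.0, 1.0, n_samples)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_samples)
        # Rigid algorithm: straight-line least-squares fit.
        slope, intercept = np.polyfit(x, y, 1)
        linear_preds[i] = slope * x0 + intercept
        # Flexible algorithm: copy the label of the closest training point.
        knn_preds[i] = y[np.argmin(np.abs(x - x0))]
    return linear_preds, knn_preds

linear_preds, knn_preds = predictions_across_sets()
target = true_fn(0.25)
for name, preds in [("linear", linear_preds), ("1-NN", knn_preds)]:
    print(f"{name}: mean error={preds.mean() - target:+.3f}, "
          f"spread={preds.std():.3f}")
```

With only 5 training sets the printed numbers are noisy. Raising `n_sets` gives a steadier picture of consistency (spread) versus accuracy on average (mean error).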
To get good predictions, you will need to find a balance of Bias and Variance that minimizes total error.
Total Error = Bias^2 + Variance + Irreducible Error
(Irreducible error is “noise” that can’t be reduced by your choice of algorithm. It can sometimes be reduced by better data cleaning.)
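For reference, the total error formula can be written out more formally. The version below is a sketch under common textbook assumptions that the text does not spell out: squared-error loss, targets generated as a true function plus zero-mean noise, and noise independent of the training set. The symbols f, \hat{f}, \varepsilon and \sigma^2 are notation introduced here, and the expectation is taken over training sets and noise.

```latex
\begin{aligned}
y &= f(x) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Var}(\varepsilon) = \sigma^2 \\
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  &= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
   + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
   + \underbrace{\sigma^2}_{\text{Irreducible Error}}
\end{aligned}
```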
In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model or when we try to fit a linear model to nonlinear data. These kinds of models, such as linear and logistic regression, are too simple to capture complex patterns in the data.
In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model extensively on a noisy dataset. These models have low bias and high variance. They are very complex, like decision trees, which are prone to overfitting.
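One common way to spot underfitting and overfitting is to compare training error with error on held-out data as model complexity grows. The following is a minimal sketch under assumptions: the synthetic data, the degrees tried, and the helper `train_test_mse` are hypothetical choices for illustration, and polynomial degree stands in for model complexity.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    # Hypothetical nonlinear pattern, assumed only for this illustration.
    return np.sin(2 * np.pi * x)

def train_test_mse(degrees=(1, 3, 12), n_train=20, n_test=1000,
                   noise_sd=0.3, seed=0):
    """Return {degree: (train_mse, test_mse)} for polynomial fits."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y_test = true_fn(x_test) + rng.normal(0.0, noise_sd, n_test)
    results = {}
    for degree in degrees:
        # Polynomial.fit rescales x internally, which keeps
        # higher-degree fits numerically better behaved.
        model = Polynomial.fit(x_train, y_train, degree)
        train_mse = np.mean((model(x_train) - y_train) ** 2)
        test_mse = np.mean((model(x_test) - y_test) ** 2)
        results[degree] = (train_mse, test_mse)
    return results

for degree, (train_mse, test_mse) in train_test_mse().items():
    print(f"degree {degree:2d}: train MSE={train_mse:.3f}, "
          f"test MSE={test_mse:.3f}")
```

A model whose training and test errors are both high is a candidate for underfitting. A model with low training error but noticeably higher test error is a candidate for overfitting.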
The Bias-Variance Trade-off is relevant for supervised machine learning, specifically for predictive modeling. It’s a way to diagnose the performance of an algorithm by breaking down its prediction error.
In machine learning, an algorithm is simply a repeatable process used to train a model from a given set of training data. You have many algorithms to choose from, such as Linear Regression, Decision Trees, Neural Networks, SVMs, and so on.
There are 3 types of prediction error: bias, variance, and irreducible error.
Irreducible error is also known as “noise,” and it can’t be reduced by your choice of algorithm. It typically comes from inherent randomness, a mis-framed problem, or an incomplete feature set.
The other two types of errors, however, can be reduced because they stem from your algorithm choice.