Machine Learning for Beginners


VII. Regularization (Week 3)

The Problem of Overfitting

Regularization is designed to address the problem of overfitting.

Underfitting, or high bias, occurs when the form of our hypothesis function maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features.

At the other extreme, overfitting, or high variance, occurs when the hypothesis function fits the available training data closely but does not generalize well to new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the underlying trend.

This terminology is applied to both linear and logistic regression.
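
To make the two failure modes concrete, below is a minimal illustrative sketch (not part of the original notes; the synthetic data, noise level, and polynomial degrees are assumptions chosen for illustration). A degree-1 fit underfits the curved data, while a very high-degree fit overfits: its training error is low, but its error on held-out points is much larger.

    import numpy as np
    from numpy.polynomial import Polynomial

    rng = np.random.default_rng(0)

    # Synthetic data: a quadratic trend plus noise (an arbitrary choice for illustration).
    x = np.linspace(-3, 3, 40)
    y = 0.5 * x**2 + rng.normal(scale=1.0, size=x.shape)

    # Hold out every other point to measure how well each fit generalizes.
    x_train, x_test = x[::2], x[1::2]
    y_train, y_test = y[::2], y[1::2]

    for degree in (1, 2, 12):
        # Fit a polynomial hypothesis of the given degree to the training points only.
        model = Polynomial.fit(x_train, y_train, degree)
        train_mse = np.mean((model(x_train) - y_train) ** 2)
        test_mse = np.mean((model(x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")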

There are two main options to address the issue of overfitting:

  1. Reduce the number of features.
    • Manually select which features to keep.
    • Use a model selection algorithm (studied later in the course).
  2. Regularization
    • Keep all the features, but reduce the magnitude of the parameters θj.

Regularization works well when we have a lot of slightly useful features.
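
As a rough sketch of the second option (the regularized normal equation used here is one standard way to do this; the synthetic data, the degree-8 feature expansion, and the λ values are arbitrary choices for illustration), the snippet below fits a high-degree polynomial with and without an L2 penalty on the parameters. All of the features are kept, but a larger λ pulls the θj toward zero, which tames the wiggly curves that cause overfitting.

    import numpy as np

    rng = np.random.default_rng(1)

    # Noisy quadratic data and a degree-8 polynomial feature expansion (illustrative assumptions).
    x = np.linspace(-1, 1, 30)
    y = 1.0 - x**2 + rng.normal(scale=0.1, size=x.shape)
    X = np.vander(x, N=9, increasing=True)  # columns: 1, x, x^2, ..., x^8

    def fit(X, y, lam):
        # Regularized normal equation: theta = (X'X + lam*L)^(-1) X'y,
        # where L is the identity with a 0 in the corner so the intercept theta_0 is not penalized.
        L = np.eye(X.shape[1])
        L[0, 0] = 0.0
        return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

    for lam in (0.0, 1.0, 10.0):
        theta = fit(X, y, lam)
        # Every parameter is kept; a larger lambda just shrinks them toward zero.
        print(f"lambda={lam:5.1f}  max|theta_j| for j >= 1: {np.max(np.abs(theta[1:])):.3f}")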

Cost Function

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.
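
As a concrete sketch of this idea (the fourth-degree hypothesis and the constant 1000 are the usual illustrative choices, not something fixed by the method), suppose the hypothesis is θ0 + θ1·x + θ2·x² + θ3·x³ + θ4·x⁴ and we want to dampen the cubic and quartic terms without removing them. We can inflate their cost:

    \min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2

Because any sizeable θ3 or θ4 now adds a large amount to the cost, the minimizer drives them close to zero, effectively flattening the high-order terms of the hypothesis. Applying the same penalty to all parameters at once gives the standard regularized cost function for linear regression:

    J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

Here λ, the regularization parameter, controls the trade-off between fitting the training data and keeping the parameters small; by convention θ0 is left out of the penalty. If λ is chosen too large, the hypothesis is smoothed so much that it underfits.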

 
