1. Problem and Loss Function

Linear regression is a supervised learning algorithm with input matrix X and output labels Y. We train a model to produce a hypothesis that we hope is as close to Y as possible. The model for linear regression is:

h_{θ}(X)=θ^{T}X

From its initial state, the model is probably very poor (it may output only zeros). Using X and Y for training, we try to derive better parameters θ. The training (learning) process can be time-consuming, because the algorithm updates the parameters only a little on every training step.
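As a minimal sketch of the hypothesis (the feature values below are made up for illustration; x[0] = 1 supplies the intercept):

```python
import numpy as np

theta = np.array([0.5, 1.0, -2.0, 0.3])  # parameters, intercept term first
x = np.array([1.0, 2.0, 0.5, 4.0])       # one training example; x[0] = 1 for the intercept

h = theta @ x  # h_theta(x) = theta^T x, the model's prediction
```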

2. Cost Function

Suppose we are driving from somewhere to Toronto: it is easy to know the coordinates of Toronto, but it is even more important to know where we are now! The cost function tells us how different the hypothesis is from the labels Y, so that we can steer toward the target. For regression problems, we use MSE as the cost function.
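A minimal sketch of the MSE cost, using the common 1/(2m) convention (the extra 1/2 just cancels when differentiating); the toy data below are made up for illustration:

```python
import numpy as np

def mse_cost(theta, X, y):
    """J(theta) = (1/2m) * sum((X @ theta - y)^2) -- MSE with a 1/2 factor."""
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# Toy data from y = 1 + x; the first column of X is the intercept.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(mse_cost(np.array([1.0, 1.0]), X, y))  # perfect fit -> 0.0
```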

This can be understood from another perspective. Suppose the difference between Y and H is ε, with ε ~ N(0, σ^{2}). Then y ~ N(θ^{T}X, σ^{2}), and applying maximum likelihood estimation yields the same cost function. (https://stats.stackexchange.com/questions/253345/relationship-between-mle-and-least-squares-in-case-of-linear-regression)
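The MLE argument can be sketched in two lines, using the standard Gaussian density:

```latex
p\left(y^{(i)} \mid x^{(i)}; \theta\right)
  = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left( -\frac{\left(y^{(i)} - \theta^{T} x^{(i)}\right)^{2}}{2\sigma^{2}} \right)

\log L(\theta)
  = \sum_{i=1}^{m} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right)
  = m \log \frac{1}{\sqrt{2\pi}\,\sigma}
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2}
```

Maximizing log L(θ) over θ means minimizing the sum of squared errors, which is exactly the MSE cost above (up to a constant factor).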

3. Gradient Descent

The process of GD is much like walking downhill along the steepest direction in every dimension.

We take the partial derivative of the cost along every dimension j:

∂J(θ)/∂θ_{j} = (1/m) Σ_{i=1}^{m} (h_{θ}(x^{(i)}) − y^{(i)}) x_{j}^{(i)}

Then we update all θ_{j} simultaneously with a small learning rate α:

θ_{j} := θ_{j} − α · ∂J(θ)/∂θ_{j}
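The two steps above (gradient, then simultaneous update) can be sketched as follows; the toy data and hyperparameters are illustrative choices, not part of the original text:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, n_steps=1000):
    """Gradient descent on the MSE cost; X includes an intercept column."""
    m, n = X.shape
    theta = np.zeros(n)                       # start from the all-zero model
    for _ in range(n_steps):
        gradient = X.T @ (X @ theta - y) / m  # dJ/dtheta_j for all j at once
        theta -= alpha * gradient             # simultaneous update of every theta_j
    return theta

# Toy data generated from y = 1 + 2x (hypothetical example).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = batch_gradient_descent(X, y)
print(theta)  # converges toward [1, 2], the parameters that generated the data
```

Note that the gradient is computed once from all m examples before any parameter changes, which is what makes the update "simultaneous".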

4. Batch, Stochastic, and Mini-Batch Gradient Descent

Above, we used all the training examples together to compute the cost function and the gradient. This method is called 'Batch Gradient Descent'. The issue is: what if the data set is extremely large? Training can then take very long. A variant called Stochastic Gradient Descent (also 'Online Learning') uses only a single training example per step, which may produce a very zigzagged learning curve. Finally, the most popular version, 'Mini-Batch Gradient Descent', uses a small group of training examples per step, so the speed is acceptable and the learning curve is smoother.
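A minimal sketch of mini-batch gradient descent under the same setup (toy data and hyperparameters are illustrative; setting batch_size=1 gives stochastic GD, and batch_size=m recovers batch GD):

```python
import numpy as np

def minibatch_sgd(X, y, alpha=0.1, batch_size=2, n_epochs=500, seed=0):
    """Mini-batch gradient descent on the MSE cost: each step uses a small
    random subset of the data instead of the full training set."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        order = rng.permutation(m)                 # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]  # one mini-batch of examples
            grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
            theta -= alpha * grad
    return theta

# Toy data generated from y = 1 + 2x (hypothetical example).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta_mb = minibatch_sgd(X, y)
print(theta_mb)  # converges toward [1, 2] since the data are noise-free
```

Each step here touches only batch_size examples, which is why a pass over a huge data set is much cheaper per update than full-batch descent.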
