Wednesday, February 14, 2018

Linear Regression

Linear regression is one of the simplest supervised machine learning algorithms, and its predicted output is continuous. It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables.

So, let us dive into the details of linear regression.

We will start with the basic linear equation model.

The general linear model describes a relation between an independent variable (x) and a dependent variable (Y).

So, what does this mean?

Case 1.) Y = x

if,
       x=3  then Y=3
       x=6  then Y=6
       x=9  then Y=9

Case 2.) Now multiply the independent variable x by a coefficient (a).
The equation becomes Y = ax. Assume that a = 2.

if,
       x=3  then Y=6
       x=6  then Y=12
       x=9  then Y=18


Case 3.) Now add a constant value b to the equation from case 2. The new equation is Y = ax + b. Assume that a = 2 and b = 3.

if,
       x=3  then Y=9
       x=6  then Y=15
       x=9  then Y=21

Linear regression can be represented by the equation below:
                              Y = ax + b
Here,

Y = Dependent variable
a = Regression coefficient (slope)
x = Independent variable
b = Constant (intercept)
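
The three cases above can be reproduced with a short Python sketch. The function name `linear` and its default arguments are illustrative choices for this example, not something defined elsewhere in the post.

```python
def linear(x, a=1, b=0):
    """Return Y = a*x + b for a single input x."""
    return a * x + b

xs = [3, 6, 9]

case1 = [linear(x) for x in xs]            # case 1: Y = x
case2 = [linear(x, a=2) for x in xs]       # case 2: Y = 2x
case3 = [linear(x, a=2, b=3) for x in xs]  # case 3: Y = 2x + 3

print(case1)  # [3, 6, 9]
print(case2)  # [6, 12, 18]
print(case3)  # [9, 15, 21]
```

Changing a alters how steeply Y grows with x, while changing b shifts every Y value by the same amount.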

Now, let us look at basic linear regression, where the independent variable x is called the "feature" and the dependent variable y is called the "response".




We will create a scatter plot of the above feature and response.


Now, the best-fit line is the one that fits this scatter plot most closely.

The above line is called the regression line.

The equation representing the regression line is:

h(x_i) = b_0 + b_1 * x_i





Here,
h(x_i) represents the predicted response for the i-th observation.
b_0 and b_1 are the regression coefficients (the intercept and the slope, respectively).

Now, let us consider that every observation has some residual error associated with it.
 
Writing e_i for the residual error of the i-th observation, the above equation can be written as:

y_i = b_0 + b_1 * x_i + e_i = h(x_i) + e_i

Hence, the residual error is:

e_i = y_i - h(x_i)
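
To make this concrete: the residual of an observation is its observed response minus the predicted response h(x_i). The minimal sketch below computes the residuals for one candidate line. The data is the training set used in the implementation later in this post; the coefficients b_0 = 1 and b_1 = 1 are an arbitrary guess chosen only for illustration, and the names `h` and `residuals` are hypothetical.

```python
# Training data taken from the implementation later in this post.
x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [4, 6, 7, 10, 12, 15, 17, 19]

# Arbitrary guess for the coefficients (an assumption, not fitted values).
b_0, b_1 = 1.0, 1.0

def h(x_i):
    """Predicted response for a single feature value."""
    return b_0 + b_1 * x_i

# Residual = observed response minus predicted response.
residuals = [y_i - h(x_i) for x_i, y_i in zip(x, y)]
print(residuals)  # [1.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

A good line keeps these residuals small across all observations, not just for one point.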

 
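Now, the focus should be on minimizing this error, so we take the sum of squared errors (SSE).
The cost function, or squared error, can be defined by the equation below:

SSE = sum over i of (y_i - h(x_i))^2 = sum over i of (y_i - b_0 - b_1 * x_i)^2

Minimizing this with respect to b_0 and b_1 gives:

b_1 = SS_xy / SS_xx
b_0 = mean(y) - b_1 * mean(x)

where SS_xy = sum over i of (x_i - mean(x)) * (y_i - mean(y)) and SS_xx = sum over i of (x_i - mean(x))^2.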
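
The minimization can also be carried out by hand. The sketch below assumes the standard closed-form least-squares solution for a single feature (slope = SS_xy / SS_xx, intercept = mean(y) - slope * mean(x)) and applies it to the training data from the implementation later in this post. The function names `fit_line` and `sse` are illustrative and do not come from any library.

```python
def fit_line(x, y):
    """Return (b_0, b_1) minimizing the sum of squared errors for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-deviation of x and y, and squared deviation of x.
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    b_1 = ss_xy / ss_xx
    b_0 = mean_y - b_1 * mean_x
    return b_0, b_1

def sse(x, y, b_0, b_1):
    """Sum of squared errors of the line b_0 + b_1*x on the data."""
    return sum((yi - (b_0 + b_1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [4, 6, 7, 10, 12, 15, 17, 19]

b_0, b_1 = fit_line(x, y)
prediction = b_0 + b_1 * 22  # predicted response for x = 22

print(b_0, b_1)
print(sse(x, y, b_0, b_1))
print(prediction)
```

For x = 22 this should give roughly 25.64, the same value the scikit-learn implementation in this post reports.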


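Python implementation of the above:

import matplotlib.pyplot as plt
from sklearn import linear_model
# Each sample is a one-element list because scikit-learn expects 2D input
x_train = [[2], [4], [6], [8], [10], [12], [14], [16]]
y_train = [[4], [6], [7], [10], [12], [15], [17], [19]]
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
# predict() also needs a 2D array, so pass [[22]] rather than the bare number 22
y_output = regr.predict([[22]])
print(y_output)
# Scatter plot of the training data with the fitted line drawn through it
plt.scatter(x_train, y_train)
plt.plot(x_train, regr.predict(x_train))
plt.show()
Output: [[25.64285714]]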

Further reading: https://en.wikipedia.org/wiki/Simple_linear_regression

You can download the code from my GitHub: https://github.com/sangam92/Machine-Learning-tutorials
