Correlation describes the association between two or more variables: when variables are correlated, a change in one tends to be accompanied by a change in the other. This does not by itself show that one variable causes the other to change.
It is normally used when we want to explore how two variables are related to each other.
Types of Correlation :-
Correlation can be classified in several ways, but most commonly it is divided along three lines.
(i) Positive and Negative
(ii) Linear and Non Linear
(iii) Simple, partial and multiple
Positive and Negative :- Correlation can be positive or negative. If both variables move in the same direction (one increases as the other increases), the correlation is positive. If they move in opposite directions (one increases as the other decreases), the correlation is negative.
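To make the sign of a correlation concrete, here is a minimal sketch in plain Python. The helper `pearson` and the sample data (`hours_studied`, `exam_score`, `errors_made`) are made up for illustration and are not part of any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: scores rise with study time, errors fall with it.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 58.0, 65.0, 71.0, 80.0]
errors_made = [9.0, 8.0, 6.0, 5.0, 2.0]

r_positive = pearson(hours_studied, exam_score)   # same direction
r_negative = pearson(hours_studied, errors_made)  # opposite direction

print("hours vs score :", r_positive)
print("hours vs errors:", r_negative)
```

The first coefficient comes out greater than zero and the second less than zero, matching the two cases described above.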
Linear and Non Linear :- If a change in one variable is accompanied by a change in the other variable in a constant ratio, the correlation is linear. If the amount of change in one variable does not bear a constant ratio to the change in the other, the correlation is non-linear, also called curvilinear.
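The difference can be seen numerically. The sketch below, which assumes a hand-written `pearson` helper and made-up data, compares a relationship with a constant ratio (y = 3x + 1) with a curvilinear one (y = x squared) over a range of x values that is symmetric around zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_linear = [3.0 * v + 1.0 for v in x]  # constant ratio: each step in x adds 3 to y
y_curved = [v ** 2 for v in x]         # the ratio changes from step to step

r_linear = pearson(x, y_linear)
r_curved = pearson(x, y_curved)

print("linear relationship     :", r_linear)
print("curvilinear relationship:", r_curved)
```

For this particular data the linear case gives a coefficient of 1 and the curved case gives 0, even though y is completely determined by x in both. Pearson's coefficient measures only the linear part of a relationship, so a value near zero does not rule out a non-linear one.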
Simple, partial and multiple :- If only two variables are involved in a study, the correlation is called simple correlation. When three or more variables are involved, the problem is one of either partial or multiple correlation. In multiple correlation, three or more variables are studied simultaneously. In partial correlation, we consider the relationship between only two of the variables while the effect of the other variable(s) is held constant.
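As an illustration of partial correlation, the sketch below uses the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The helper names and the data are hypothetical: x and y were constructed so that both follow a third variable z and are otherwise unrelated.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with the linear effect of zs held constant."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: x and y each equal z plus a small deviation,
# and the two deviations are unrelated to z and to each other.
z = [1.0, 2.0, 3.0, 4.0]
x = [1.5, 1.5, 2.5, 4.5]
y = [1.1, 1.7, 3.3, 3.9]

r_simple = pearson(x, y)           # simple correlation, ignoring z
r_partial = partial_corr(x, y, z)  # partial correlation, z held constant

print("simple  r_xy  :", r_simple)
print("partial r_xy.z:", r_partial)
```

Here the simple correlation between x and y is high (roughly 0.9), but the partial correlation is essentially zero: once z is held constant, nothing is left linking x and y.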
Let us implement this in a simple Python program using PySpark :-
from pyspark import SparkContext, SparkConf
from pyspark.mllib.stat import Statistics
import numpy as np

conf = SparkConf().setAppName("test")
sc = SparkContext(conf=conf)

seriesX = sc.parallelize([1.0, 2.0, 3.0, 3.0, 5.0])  # a series
# seriesY must have the same number of partitions and cardinality as seriesX
seriesY = sc.parallelize([11.0, 22.0, 33.0, 33.0, 555.0])

# Compute the correlation using Pearson's method. Enter "spearman" for Spearman's method.
# If a method is not specified, Pearson's method will be used by default.
print("Correlation is: " + str(Statistics.corr(seriesX, seriesY, method="pearson")))

data = sc.parallelize(
    [np.array([1.0, 10.0, 100.0]), np.array([2.0, 20.0, 200.0]), np.array([5.0, 33.0, 366.0])]
)  # an RDD of Vectors

# Calculate the correlation matrix using Pearson's method. Use "spearman" for Spearman's method.
# If a method is not specified, Pearson's method will be used by default.
print(Statistics.corr(data, method="pearson"))
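The result for the two series can be cross-checked without a Spark cluster. The following is a minimal sketch in plain Python, not Spark's implementation. The helpers `pearson` and `average_ranks` are hypothetical names written for this example, and giving tied values the average of their positions is an assumed (though common) convention for Spearman's method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

series_x = [1.0, 2.0, 3.0, 3.0, 5.0]
series_y = [11.0, 22.0, 33.0, 33.0, 555.0]

r_pearson = pearson(series_x, series_y)
# Spearman's coefficient is Pearson's coefficient computed on the ranks.
r_spearman = pearson(average_ranks(series_x), average_ranks(series_y))

print("Pearson :", r_pearson)
print("Spearman:", r_spearman)
```

Pearson's coefficient for these series is about 0.85, pulled below 1 by the outlier 555.0, while the rank-based Spearman coefficient is 1 because seriesY never decreases when seriesX increases.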