Tuesday, January 30, 2018

Euclidean Distance using Python


As per Wikipedia, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space.

The Euclidean distance between two points A and B is the length of the line segment AB joining them.

In a simple 2-dimensional space, the Euclidean distance can be calculated with the formula below:

Let P be a point with coordinates (p1, p2) and Q another point with coordinates (q1, q2).
Then the Euclidean distance PQ is

    d(P, Q) = sqrt((q1 - p1)^2 + (q2 - p2)^2)



Similarly, for points in 3-dimensional space with coordinates P(p1, p2, p3) and Q(q1, q2, q3),
the distance is

    d(P, Q) = sqrt((q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2)


Let us take a simple example to understand it :-

The Euclidean distance between the points (2, -1) and (-2, 2) is found like this :-

    d = sqrt((-2 - 2)^2 + (2 - (-1))^2) = sqrt(16 + 9) = sqrt(25) = 5



Python implementation of Euclidean Distance using scipy :-

from scipy.spatial import distance
a = (1,2,3)
b = (4,5,6)
dst = distance.euclidean(a,b)
print(dst)

#output :-5.19615242271
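
To see what the scipy call computes, the same distance can be written in plain Python as the square root of the sum of squared coordinate differences. This is only a minimal sketch: the function name euclidean_distance is my own choice and is not part of scipy.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("both points must have the same number of coordinates")
    # Sum the squared differences coordinate by coordinate, then take the root.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((2, -1), (-2, 2)))      # 5.0
print(euclidean_distance((1, 2, 3), (4, 5, 6)))  # about 5.196, the square root of 27
```

The function works for any number of dimensions as long as both points have the same length.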

Further reading :-

https://en.wikipedia.org/wiki/Euclidean_distance

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.euclidean.html


Monday, January 22, 2018

An Introduction To Pandas DataFrames (Part 1)

Pandas is one of the most widely used Python libraries for data science projects. The DataFrame is a concept that is also used in languages like R. It is an efficient way of handling data in rows and columns, much like a table in SQL. It is a 2-dimensional labeled data structure whose columns can be of different types. We can optionally pass index and column labels.

So, let us start our journey: read a file and convert it into a DataFrame.

Example:-
import pandas as pd
read_file =pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile.csv')
print(read_file)
Note :- The data file is available in my GitHub account: https://github.com/sangam92/python_tutorials

Output :-

    ID    Name  Age

0    1     Ram    2
1    1     Ram    2
2    1     Ram    2
3    1     Ram    2
4    2   Shyam    3
5    2   Shyam    3
6    2   Shyam    3
7    2   Shyam    3
8    2   Shyam    3
9    3  Laxman    5
10   3  Laxman    5
11   3  Laxman    5
12   3  Laxman    5
13   3  Laxman    5
14   3  Laxman    5
15   3  Laxman    5
16   3  Laxman    5

Similarly, we can read other file formats (such as Excel and JSON). We can also create a dataframe directly from data held in Python.

Example :-
import pandas as pd
data =[2,3,4]
print(pd.DataFrame(data,index=['a','c','v'],columns=['test']))

Output :-
   test
a     2
c     3
v     4

Here, we have optionally assigned the index (a, c, v) and the column name (test). While creating a dataframe, we can give the index values ourselves or let pandas assign them automatically.

How can we select a particular column and index from a dataframe?

Let us read a file and try to find the value present at a specific index.
For this we have taken a csv file, which is available in my GitHub account.
https://github.com/sangam92/python_tutorials

Example :-
import pandas as pd
read_file =pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file)
print(read_file.iloc[0, 1])        # row position 0, column position 1
print(read_file.loc[0:3, 'Age'])   # rows labelled 0 to 3 (inclusive), column 'Age'

Output :-

Output of the first print statement :-
   ID First_Name Last_Name  Age
0   1        Ram     Kumar   34
1   2    Praveen     Singh   27
2   3     Manish     Kumar   26
Output of the second print statement :-
Ram
Output of the third print statement :-
0    34
1    27
2    26
Name: Age, dtype: int64

Example :-
import pandas as pd
read_file =pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[0:2,['First_Name','ID']])
Output :-
  First_Name  ID
0        Ram   1
1    Praveen   2
2     Manish   3

Example:-
import pandas as pd
read_file =pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[:,['First_Name','ID','Age']])

Output :-

  First_Name  ID  Age
0        Ram   1   34
1    Praveen   2   27
2     Manish   3   26

Difference between loc, iloc, at and iat :-

loc :- fetches rows (and columns) by label. In the above case, it fetches the rows labelled 0 to 3 (both ends inclusive, so all three rows here) from the column labelled Age.
iloc :- fetches rows (and columns) by integer position. In the above scenario, it fetches the value at row position 0 and column position 1, which is First_Name.
at :- a variant of loc that can fetch only a single value.
iat :- a variant of iloc that can fetch only a single value.
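
The four accessors can be compared side by side. The sketch below builds the dataframe in code instead of reading Datafile2.csv, so it runs without the file; the values are copied from the output shown earlier, and the variable names are my own.

```python
import pandas as pd

# Same rows as Datafile2.csv above, built in code so no file is needed.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "First_Name": ["Ram", "Praveen", "Manish"],
    "Last_Name": ["Kumar", "Singh", "Kumar"],
    "Age": [34, 27, 26],
})

name_by_label = df.at[1, "First_Name"]           # single value by row label and column label
name_by_position = df.iat[1, 1]                  # single value by row position and column position
ages_by_label = df.loc[0:1, "Age"].tolist()      # label slice: both ends are included
ages_by_position = df.iloc[0:1, 3].tolist()      # position slice: the end is excluded

print(name_by_label)      # Praveen
print(name_by_position)   # Praveen
print(ages_by_label)      # [34, 27]
print(ages_by_position)   # [34]
```

Note the difference in slicing: loc includes the end label, while iloc follows the usual Python rule and excludes the end position.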

Adding rows and columns to a dataframe:-

Adding rows and columns to a dataframe relies mainly on three indexers defined by pandas: loc, iloc and ix. We will have a glimpse of all three. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix examples below run only on older versions.
So, let us start our journey with loc :-

Example :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.loc['a']= 3
print(b)

Output :-
   test
a     3
c     3
v     4

In the above example, the row labelled 'a' has been assigned the value 3, so the output now shows 3 instead of 2 for 'a'.

Now, take the example of iloc :-

Example  :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.iloc[2]= 3
print(b)

Output :-
   test
a     2
c     3
v     3

Here, the row at index position 2 has been assigned the new value 3, so the old value 4 has been replaced.

The Curious case of ix :- 

ix behaves like either loc or iloc, depending on the situation: it is primarily label based, but it falls back to integer position when the index is not of integer type. So, let us take an example of both cases :-

Case 1 :- when ix works like loc :
Example :-
import pandas as pd   # needs pandas older than 1.0, where ix still exists
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=[1,2,3])   # integer labels
b.ix[2]= 3
print(b)
Output :-
   test
1    20
2     3
3    40

Here the index holds integer labels, so ix treats 2 as a label: the row labelled 2 (value 30) is assigned the value 3, just as loc would do.

Case 2 :- when ix works like iloc :

import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.ix[1]= 3
print(b)

Output :-
    test
10    20
20     3
30    40

Here the index labels are strings, so the integer 1 cannot be a label. ix falls back to index position 1 (value 30) and assigns it the value 3, just as iloc would do.
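
Since ix is no longer available in current pandas, the two behaviours can be written explicitly with loc and iloc. This is a sketch of the replacement under my own choice of data, using integer labels for the label-based case and string labels for the position-based case.

```python
import pandas as pd

# Label-based assignment: the row whose label is 2 gets the new value.
b = pd.DataFrame([20, 30, 40], columns=["test"], index=[1, 2, 3])
b.loc[2] = 3

# Position-based assignment: the row at position 1 gets the new value.
c = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])
c.iloc[1] = 3

print(b)
print(c)
```

Writing loc or iloc states the intent directly, so the result no longer depends on the type of the index.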

So, to add an extra row, use loc with a label that is not yet in the index. Let us take an example :-

Example :-

import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc[50]= 3
print(b)

Output :-
    test
10    20
20    30
30    40
50     3

Adding Column To Your Dataframe :-

Adding a column to a dataframe is not a big task; again we take the help of loc.
So, let us add a column to the dataframe :-

Example :-

import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc[:,'test2']= 3
print(b)

Output :-

    test  test2
10    20      3
20    30      3
30    40      3

In this case, we add a column "test2" to our dataframe, with the value 3 in every row.
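
A column can also be added by plain assignment to a new column name, which is a common alternative to loc. The sketch below reuses the same dataframe; the column name double is my own and is only there to show a column computed from an existing one.

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])

b["test2"] = 3                  # same value in every row
b["double"] = b["test"] * 2     # new column computed from an existing column

print(b)
```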

We will come across more DataFrame functionality in our next tutorials.




Monday, January 15, 2018

Classification versus Regression

“Machine Learning” is the new devil in the market. Every day we hear some fancy words and algorithm names. We have heard that people are able to classify animal images and to predict stock prices.
We need to understand a few pieces of jargon related to ML.
So, let us start our discussion with the topic Classification vs Regression.

Classification :-

The main goal of classification is to predict the target class (for example Yes/No). Suppose we want to know whether a student fails or passes. The output is a discrete label rather than a number.
Examples :- To find whether a mail is spam or not.
            Whether an image belongs to a cat or a dog.

Types of Classification:-

Binary Classification :- When we have only two target class labels to predict.
Example :- Pass or Fail

Multi-Class Classification :- When there are more than two class labels to predict.
Example :- image classification problems where there can be thousands of classes (cat, dog, fish, car, …).

Algorithms For Classification:-
  •   KNN(K Nearest Neighbor)
  •   SVC(Support Vector Classifier)
  •  Decision Tree etc.

Regression :-

In regression problems, we try to predict a continuous-valued output.
Example :- Given a stock, predict its value over the next few months.

Algorithms For Regression :-
  •  Linear Regression
  •  SVR (Support Vector Regression) etc.

Whenever we come across a machine learning problem, we should first decide whether it is a classification or a regression problem. We can tell by analyzing the target variable (Y): discrete class labels mean classification, continuous values mean regression.
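
The difference shows up directly in the target variable passed to the model. The sketch below assumes scikit-learn is installed and uses a tiny made-up data set (hours studied, pass/fail result, exam score), so the numbers are purely illustrative.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: hours studied by six students.
hours = [[1], [2], [3], [6], [7], [8]]

# Classification: the target is a discrete label.
results = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]
classifier = KNeighborsClassifier(n_neighbors=3).fit(hours, results)
predicted_label = classifier.predict([[7.5]])[0]
print(predicted_label)            # a class label

# Regression: the target is a continuous value.
scores = [15, 25, 35, 65, 75, 85]
regressor = LinearRegression().fit(hours, scores)
predicted_score = regressor.predict([[5]])[0]
print(round(predicted_score, 2))  # a number
```

The input is the same in both cases; only the kind of target (labels versus numbers) decides which family of algorithms applies.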




Thursday, January 11, 2018

File Handling Operations in Python


RAM (Random Access Memory) is volatile memory: all the information in it is wiped out once the system is shut down.

In such a scenario, a file is used to store the information permanently on disk (hard disk), so that it remains available after a system shutdown.

File handling in Python consists of three steps :-
1.) Opening the file.
2.) Reading/Writing the file.
3.) Closing the file.

The closing step is essential as it frees the resources allocated to the file.
So, let us begin our journey on the topic called File.

Opening a file in Python.

openfile = open("irisdata.txt")
Mode   Description
r      Open text file for reading. The stream is positioned at the beginning of the file (default).
w      Truncate file to zero length, or create a text file for writing if the file is not present. The stream is positioned at the beginning of the file.
a      Open the file for appending. Creates the file if it is not present.
+      Open a file for both reading and writing (used together with another mode, for example r+).


Courtesy :- Stackoverflow.com
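
The effect of the modes is easiest to see by writing to the same file several times. This is a small sketch that uses a temporary directory and a hypothetical file name modes_demo.txt, so it does not touch any existing file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:      # creates the file
    f.write("first\n")
with open(path, "w") as f:      # truncates it: "first" is lost
    f.write("second\n")
with open(path, "a") as f:      # appends at the end
    f.write("third\n")
with open(path, "r") as f:      # read only
    content = f.read()
print(repr(content))            # 'second\nthird\n'

with open(path, "r+") as f:     # read and write, no truncation
    f.write("SECOND")           # overwrites the first six characters
    f.seek(0)
    updated = f.read()
print(repr(updated))            # 'SECOND\nthird\n'
```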

Closing  the file in python:-

After reading the file, the best practice is to close the file. Closing the file will release the resources attached with it.

openfile = open("irisdata.txt")
openfile.close()

However, this method is not safe: if an exception is raised between open() and close(), the file is never closed.
So, we need some other way to handle it.

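openfile = open("irisdata.txt")
try:
    data = openfile.read()   # work with the file here
finally:
    openfile.close()         # runs even if an exception is raised above

Python has its own way to handle file closing without calling close() explicitly: the with statement, which closes the file automatically when the block ends. Let us check this with an example :-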

with open('test.txt','w') as f :
    f.write('hi,this is me')
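
That the with statement really closes the file can be checked through the closed attribute of the file object. The sketch below writes to a temporary directory rather than the current folder; the variable names are my own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w") as writer:
    writer.write("hi,this is me")
    open_inside_block = not writer.closed   # still open inside the block

print(open_inside_block)   # True
print(writer.closed)       # True: closed automatically after the block

with open(path) as reader:
    content = reader.read()
print(content)             # hi,this is me
```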

Reading the file in Python :-
There are various methods available for reading a file in Python. We can read the file line by line using a for loop.

Example:-
openfile = open('testdata.txt')
for i in openfile:
    print(i, end='')   # each line already ends with a newline

#Output :-
This is a test.
We are learning.
python is a good programming language.

The second way is to read via the read() method. With read(), we can specify a size: read(30) reads the first 30 characters.

Example :-
openfile = open('testdata.txt')
print(openfile.read(6))

#Output :-
This i
It means that read(6) reads the first 6 characters (This + space + i).
However, if we do not mention the size, it reads the whole file.

Example :-
openfile = open('testdata.txt')
print(openfile.read())

#Output :-
This is a test.
We are learning.
python is a good programming language.

Now, there are two more interesting methods associated with a file: tell() and seek().
tell() returns the current cursor position, while seek() changes the current cursor position.

Example :-
openfile = open('testdata.txt')
print(openfile.read(15))
print(openfile.tell())
openfile.seek(0)
print(openfile.read(4))

#Output :-
This is a test.
15
This

In the above example, the first 15 characters are read. To find the cursor position we use the tell() method, which tells us that the cursor is now at position 15. seek(0) brings the cursor back to position 0.

readline() reads individual lines of a file. It reads up to the next newline and includes the newline character in the result.

Example :-
openfile = open('testdata.txt')
print(openfile.readline())

#Output :
This is a test.

The readlines() method returns a list of the remaining lines of the file. All these reading methods return empty values (an empty string or an empty list) when the end of file (EOF) is reached.

Example :-
openfile = open('testdata.txt')
print(openfile.readlines())

#output:-
['This is a test.\n', 'We are learning.\n', 'python is a good programming language.']
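
The behaviour at the end of the file can be checked with a short sketch. It first creates its own copy of testdata.txt in a temporary directory, with the three lines used in the examples above, so it runs without the original file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "testdata.txt")
with open(path, "w") as f:
    f.write("This is a test.\nWe are learning.\npython is a good programming language.")

with open(path) as f:
    first_line = f.readline()          # one line, newline included
    remaining = f.readlines()          # only the lines not read yet
    at_eof_read = f.read()             # cursor is at EOF now
    at_eof_readline = f.readline()
    at_eof_readlines = f.readlines()

print(repr(first_line))      # 'This is a test.\n'
print(remaining)             # ['We are learning.\n', 'python is a good programming language.']
print(repr(at_eof_read))     # ''
print(repr(at_eof_readline)) # ''
print(at_eof_readlines)      # []
```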


Friday, January 5, 2018

A Tutorial On Dictionary(Python)

A dictionary is a collection of items stored as key:value pairs. Before Python 3.7 the collection was unordered; since Python 3.7 a dictionary preserves insertion order.

In a dictionary, each item has a key and its corresponding value. The items are enclosed in curly braces.

Dictionaries are mutable.

#Initialization of Dictionary:

dict1={} #empty dictionary

dict2={1:"test",2:"rest"} #dictionary with integer keys

dict3={1:"test","name":"rest"} #dictionary with mixed type

dict4= dict({1:"test",2:"rest"}) # dictionary with dict built in function

dict5 = dict([(1,'apple'), (2,'ball')]) #from sequence having each item as a pair


#Accessing value in dictionary:

Unlike lists and tuples, dictionaries are accessed via key rather than index.

dict2={1:"test",2:"rest"}

print(dict2[2])

#output : rest

dict2={1:"test",2:"rest"}

print(dict2.get(2))

#output : rest

Note : accessing a key through get() is similar to using square brackets; the difference shows up when the key is not present.

In that case, get() returns None rather than raising a KeyError.

Let us understand this with an example :

dict2={1:"test",2:"rest"}

print(dict2[3])

#output: KeyError: 3

dict2={1:"test",2:"rest"}

print(dict2.get(3))

#output: None
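
get() also accepts a second argument that is returned instead of None when the key is missing. A small sketch with the same dictionary (the variable names are my own):

```python
dict2 = {1: "test", 2: "rest"}

found = dict2.get(2, "not found")      # key exists, so the default is ignored
missing = dict2.get(3, "not found")    # key is missing, so the default is returned

print(found)       # rest
print(missing)     # not found
print(3 in dict2)  # False: get() does not add the missing key
```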


#Adding or Changing an element in dictionary:

Elements can be added or changed using the assignment operator.

If the key is not present, a new key:value pair is added; otherwise the existing value is changed.

#updated :
dict2={1:"test",2:"rest"}

dict2[1]= "set"

print(dict2)

#output:{1: 'set', 2: 'rest'}

#Added :

dict2={1:"test",2:"rest"}

dict2[3]= "set"

print(dict2)

#output : {1: 'test', 2: 'rest', 3: 'set'}


# Removing or deleting item from a dictionary:

There are several ways in which an element can be removed or deleted from a dictionary.

1.) pop()
2.) popitem()
3.) del
4.) clear()

Now, let us go one by one and check each of them with some examples.


#pop() :- To remove a particular item from a dictionary, we can use pop(). It takes a key as input, deletes the corresponding item and returns its value.

dict2={1:"test",2:"rest"}

dict2.pop(2)

print(dict2)

#output :-  {1: 'test'}
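
pop() hands back the removed value, and it raises a KeyError for a missing key unless a default is supplied. A short sketch of both cases, with variable names of my own:

```python
dict2 = {1: "test", 2: "rest"}

removed = dict2.pop(2)           # returns the value that was stored under key 2
fallback = dict2.pop(5, None)    # key 5 is missing, so the default is returned

try:
    dict2.pop(5)                 # no default given: raises KeyError
    raised = False
except KeyError:
    raised = True

print(removed)    # rest
print(fallback)   # None
print(raised)     # True
print(dict2)      # {1: 'test'}
```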

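#popitem() :- The popitem() method removes and returns one (key, value) pair from the dictionary. Before Python 3.7 the pair was arbitrary; since Python 3.7 it is the last inserted pair.

dict2={1:"test",2:"rest"}

dict2.popitem()

print(dict2)

#output :- {1: 'test'}   (Python 3.7+; older versions may remove a different item)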


#del :- The del statement can remove an individual item or the whole dictionary.

dict2={1:"test",2:"rest"}

del dict2[2]

print(dict2)

#Output :- {1: 'test'}

dict2={1:"test",2:"rest"}

del dict2

print(dict2)

#Output :- NameError: name 'dict2' is not defined

#clear() :- All the items can be removed at once using the clear() method.

dict2={1:"test",2:"rest"}


dict2.clear()

print(dict2)

#Output :- {}

#Dictionary membership test :-

We can test whether a key is present in a dictionary using the in keyword. The test applies to keys only, not to values.

dict2={1:"test",2:"rest"}

print(1 in dict2)

#output :- True
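
Because in looks only at keys, a value has to be tested through values(). A brief sketch with the same dictionary:

```python
dict2 = {1: "test", 2: "rest"}

key_present = 1 in dict2                     # checks the keys
value_as_key = "test" in dict2               # "test" is a value, not a key
value_present = "test" in dict2.values()     # checks the values
key_absent = 3 not in dict2                  # negated membership test

print(key_present)     # True
print(value_as_key)    # False
print(value_present)   # True
print(key_absent)      # True
```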

Hadoop - What is a Job in Hadoop ?

In the field of computer science , a job just means a piece of program and the same rule applies to the Hadoop ecosystem as wel...