Monday, January 22, 2018

An Introduction To Pandas DataFrames (Part 1)

Pandas is one of the most widely used Python libraries for data science projects. The DataFrame is a concept that also appears in languages such as R. It is an efficient way of handling data in rows and columns, much like a table in SQL. A DataFrame is a two-dimensional labeled data structure whose columns can be of different types. We can optionally pass index and column labels when creating one.
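To make the "columns of different types" point concrete, here is a minimal sketch that builds a small DataFrame from a dictionary. The variable name `people` and the sample values (including the `Height` column) are made up for illustration and are not part of the data files used later.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
people = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "Name": ["Ram", "Shyam", "Laxman"],
        "Height": [1.62, 1.75, 1.80],
    }
)

# Each column keeps its own data type (integer, text, float).
print(people.dtypes)
# The frame is two-dimensional: (rows, columns).
print(people.shape)
```

Printing `dtypes` is a quick way to check how pandas has interpreted each column.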

So, let us start our journey: read a file and convert it into a DataFrame.

Example:-
import pandas as pd
# Path on the author's machine; point this at your own copy of Datafile.csv.
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile.csv')
print(read_file)
Note :- The data file is available in my GitHub account: https://github.com/sangam92/python_tutorials

Output :-

    ID    Name  Age
0    1     Ram    2
1    1     Ram    2
2    1     Ram    2
3    1     Ram    2
4    2   Shyam    3
5    2   Shyam    3
6    2   Shyam    3
7    2   Shyam    3
8    2   Shyam    3
9    3  Laxman    5
10   3  Laxman    5
11   3  Laxman    5
12   3  Laxman    5
13   3  Laxman    5
14   3  Laxman    5
15   3  Laxman    5
16   3  Laxman    5

Similarly, we can read other file formats such as Excel and JSON. We can also create a DataFrame directly from data held in Python.
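As a rough sketch of reading another format, the snippet below parses JSON with `pd.read_json`. The JSON text is a made-up stand-in for a file on disk, and `io.StringIO` is used only so the example runs without any external file. Excel files can be read in a similar way with `pd.read_excel`, which additionally needs an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

# Hypothetical JSON records, standing in for a file on disk.
json_text = (
    '[{"ID": 1, "Name": "Ram", "Age": 2},'
    ' {"ID": 2, "Name": "Shyam", "Age": 3}]'
)

# read_json accepts a path or a file-like object; StringIO wraps the string.
from_json = pd.read_json(io.StringIO(json_text))
print(from_json)
```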

Example :-
import pandas as pd
data =[2,3,4]
print(pd.DataFrame(data,index=['a','c','v'],columns=['test']))

Output :-
   test
a     2
c     3
v     4

Here, we have optionally assigned the index (a, c, v) and the column name (test). When creating a DataFrame, we can supply the index values ourselves, or pandas will assign them automatically.
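For comparison, here is a small sketch of what happens when no index or column names are given. The variable name `default_frame` is just an illustrative choice.

```python
import pandas as pd

data = [2, 3, 4]

# No index or columns given: pandas numbers both starting from 0.
default_frame = pd.DataFrame(data)
print(default_frame)
print(list(default_frame.index))    # [0, 1, 2]
print(list(default_frame.columns))  # [0]
```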

How can we select a particular column and index from a DataFrame?

Let us read a file and find the value present at a specific index.
For this we use a CSV file, which is available in my GitHub account:
https://github.com/sangam92/python_tutorials

Example :-
import pandas as pd
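read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file)
# Row position 0, column position 1 (First_Name).
print(read_file.iloc[0, 1])
# Rows labelled 0 through 3 (end label included), column 'Age'.
print(read_file.loc[0:3, 'Age'])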

Output :-

Output of the first print statement:
   ID First_Name Last_Name  Age
0   1        Ram     Kumar   34
1   2    Praveen     Singh   27
2   3     Manish     Kumar   26
Output of the second print statement:
Ram
Output of the third print statement:
0    34
1    27
2    26
Name: Age, dtype: int64

Example :-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[0:2,['First_Name','ID']])
Output :-
  First_Name  ID
0        Ram   1
1    Praveen   2
2     Manish   3

Example:-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[:,['First_Name','ID','Age']])

Output :-

  First_Name  ID  Age
0        Ram   1   34
1    Praveen   2   27
2     Manish   3   26
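One detail worth seeing in code is how slices behave: a loc slice includes its end label, while an iloc slice excludes its end position. The sketch below rebuilds the three rows printed above as an inline DataFrame (so no CSV file is needed); the variable names are illustrative only.

```python
import pandas as pd

# Inline copy of the rows shown above, so the example needs no file.
people = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "First_Name": ["Ram", "Praveen", "Manish"],
        "Last_Name": ["Kumar", "Singh", "Kumar"],
        "Age": [34, 27, 26],
    }
)

# loc slices by label and INCLUDES the end label: rows 0 and 1.
by_label = people.loc[0:1, ["First_Name", "Age"]]
# iloc slices by position and EXCLUDES the end position: row 0 only.
by_position = people.iloc[0:1, [1, 3]]

print(len(by_label))     # 2
print(len(by_position))  # 1
```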

Difference between loc, iloc, at and iat :-

loc :- Selects rows (and columns) by label. In the example above, it fetches the rows with index labels 0 to 2 for the column labelled Age.
iloc :- Selects rows (and columns) by integer position. In the example above, it fetches the value at row position 0 and column position 1, which is First_Name.
at :- A variant of loc that can fetch only a single value, addressed by row label and column label.
iat :- A variant of iloc that can fetch only a single value, addressed by row position and column position.
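Since at and iat are described above but not demonstrated, here is a short sketch of both. It uses an inline DataFrame with the same rows as Datafile2.csv as printed earlier; the variable name `people` is an illustrative choice.

```python
import pandas as pd

# Inline copy of the rows from the earlier output.
people = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "First_Name": ["Ram", "Praveen", "Manish"],
        "Last_Name": ["Kumar", "Singh", "Kumar"],
        "Age": [34, 27, 26],
    }
)

# at: one value, addressed by row label and column label.
print(people.at[1, "First_Name"])  # Praveen
# iat: one value, addressed by row position and column position.
print(people.iat[1, 1])            # Praveen

# Both can also be used to set a single value.
people.at[2, "Age"] = 30
print(people.loc[2, "Age"])        # 30
```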

Adding rows and columns to a dataframe:-

Adding rows and columns to a DataFrame relies mainly on three indexers provided by pandas: loc, iloc and ix. We will take a brief look at each of them.
So, let us start with loc :-

Example :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.loc['a']= 3
print(b)

Output :-
   test
a     3
c     3
v     4

In the above example, the row labelled 'a' has been assigned the value 3, so the output now shows 3 for 'a'.

Now, take the example of iloc :-

Example  :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.iloc[2]= 3
print(b)

Output :-
   test
a     2
c     3
v     3

Here, index position 2 has been assigned the new value 3, so the old value 4 has been replaced.

The Curious case of ix :- 

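ix can behave like either loc or iloc, depending on the type of the index and of the key. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the ix examples here run only on older versions. Let us look at both cases :-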

Case 1 :- when ix works like loc :
Example :-
import pandas as pd
a = [20,30,40]
# Integer index labels, so an integer key is treated as a label.
b = pd.DataFrame(a,columns=['test'],index=[1,2,3])
b.ix[2]= 3
print(b)
Output :-
   test
1    20
2     3
3    40

Here, the index holds integers, so ix looks up the index label 2, whose value is 30, and assigns it the value 3, just as loc would.

Case 2 :- when ix works like iloc :
Example :-
import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.ix[1]= 3
print(b)

Output :-
    test
10    20
20     3
30    40

Here, the index holds strings, so the integer 1 cannot be a label. ix falls back to index position 1, whose value is 30, and assigns it the value 3, just as iloc would.
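Because ix was removed in pandas 1.0.0, the same updates on a current version have to be written with loc or iloc explicitly. The following is a minimal sketch of that, reusing the string-indexed frame from the example above; the values assigned (1 and 3) are arbitrary.

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])

# Label-based: the key must be an index label, here the string "10".
b.loc["10"] = 1
# Position-based: the key is always a 0-based row position.
b.iloc[1] = 3

print(b)
```

Choosing loc or iloc explicitly removes the guesswork about whether an integer key means a label or a position.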

So, to add an extra row, use loc. Let us take an example :-

Example :-

import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc[50]= 3
print(b)

Output :-
    test
10    20
20    30
30    40
50     3
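When the DataFrame has more than one column, the same loc technique accepts a list with one value per column. This is a small sketch with made-up column names and values (`scores`, `test`, `test2`), not data from the tutorial files.

```python
import pandas as pd

scores = pd.DataFrame({"test": [20, 30], "test2": [1, 2]}, index=["10", "20"])

# A label that does not exist yet is appended as a new row;
# the list supplies one value per column, in column order.
scores.loc["30"] = [40, 3]

print(scores)
```

Using a string label here keeps the new row's label the same type as the existing index labels.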

Adding Column To Your Dataframe :-

Adding a column to a DataFrame is not a big task; again we use loc to do it.
So, let us add a column to the DataFrame :-

Example :-

import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc[:,'test2']= 3
print(b)

Output :-

    test  test2
10    20      3
20    30      3
30    40      3

In this case, we add a column "test2" to our DataFrame, with the value 3 in every row.
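Besides loc, a column can also be added by plain bracket assignment. The sketch below shows three common variants on the same frame; the extra column names `test3` and `double` are illustrative only.

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])

# A scalar is broadcast to every row.
b["test2"] = 3
# A list must supply exactly one value per row.
b["test3"] = [7, 8, 9]
# A new column can be computed from an existing one.
b["double"] = b["test"] * 2

print(b)
```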

We will come across more DataFrame functionality in the next tutorials.



