Pandas is one of the most widely used Python libraries for data science projects. Its central structure, the DataFrame, is a concept also found in languages such as R. A DataFrame is an efficient way of handling data in rows and columns, much like a table in SQL. It is a two-dimensional labeled data structure whose columns can be of different types. We can optionally pass index and column labels when creating one.
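To make the idea of columns with different types concrete, here is a minimal sketch. The names and values are made up for illustration and do not come from any data file:

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column label.
people = pd.DataFrame(
    {
        "Name": ["Ram", "Shyam", "Laxman"],  # text column
        "Age": [2, 3, 5],                    # integer column
        "Score": [9.5, 7.0, 8.25],           # float column
    },
    index=["r1", "r2", "r3"],  # optional row labels
)

print(people.dtypes)  # one dtype per column
print(people.shape)   # (rows, columns)
```

Each column keeps its own type, while the row labels r1, r2 and r3 are shared by all columns.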
So, let us start our journey: read a file and convert it into a DataFrame.
Example :-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile.csv')
print(read_file)
Note :- The data file is available in my GitHub repository:
https://github.com/sangam92/python_tutorials
Output :-
    ID    Name  Age
0    1     Ram    2
1    1     Ram    2
2    1     Ram    2
3    1     Ram    2
4    2   Shyam    3
5    2   Shyam    3
6    2   Shyam    3
7    2   Shyam    3
8    2   Shyam    3
9    3  Laxman    5
10   3  Laxman    5
11   3  Laxman    5
12   3  Laxman    5
13   3  Laxman    5
14   3  Laxman    5
15   3  Laxman    5
16   3  Laxman    5
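If the CSV file is not at hand, the same call can be tried on text held in memory, because read_csv accepts a file-like object as well as a path. A minimal sketch, assuming a hypothetical three-row stand-in for Datafile.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for Datafile.csv: a few rows held in a string.
csv_text = "ID,Name,Age\n1,Ram,2\n2,Shyam,3\n3,Laxman,5\n"

# read_csv accepts a path or any file-like object.
read_file = pd.read_csv(io.StringIO(csv_text))

print(read_file)
print(read_file.shape)  # (rows, columns)
```

The first line of the text is used as the column labels, and pandas assigns the row index automatically.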
Similarly, we can read other file formats, such as Excel and JSON. We can also create a DataFrame directly from data held in memory.
Example :-
import pandas as pd
data =[2,3,4]
print(pd.DataFrame(data,index=['a','c','v'],columns=['test']))
Output :-
   test
a     2
c     3
v     4
Here, we have optionally assigned the index (a, c, v) and the column (test). While creating a DataFrame, we can supply the index values ourselves or let pandas assign them automatically.
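To see the automatic index, here is a minimal sketch that builds the same data twice, once without an index and once with the labels used above. The variable names are only illustrative:

```python
import pandas as pd

data = [2, 3, 4]

# No index given: pandas assigns the positions 0, 1, 2 as labels.
default_df = pd.DataFrame(data, columns=["test"])

# Explicit labels, as in the example above.
labelled_df = pd.DataFrame(data, index=["a", "c", "v"], columns=["test"])

print(default_df.index.tolist())
print(labelled_df.index.tolist())
```

Both DataFrames hold the same values; only the row labels differ.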
How can we select a particular column and index from a DataFrame?
Let us read a file and find the value present at a specific index.
For this we use a CSV file, which is available in my GitHub repository:
https://github.com/sangam92/python_tutorials
Example :-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file)
print(read_file.iloc[0, 1])
print(read_file.loc[0:3, 'Age'])
Output :-
Output of the first print statement
   ID  First_Name  Last_Name  Age
0   1         Ram      Kumar   34
1   2     Praveen      Singh   27
2   3      Manish      Kumar   26
Output of the second print statement
Ram
Output of the third print statement
0    34
1    27
2    26
Name: Age, dtype: int64
Example :-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[0:2,['First_Name','ID']])
Output :-
   First_Name  ID
0         Ram   1
1     Praveen   2
2      Manish   3
Example :-
import pandas as pd
read_file = pd.read_csv('C:/Users/sangam/Desktop/python_tutorial/Datafile2.csv')
print(read_file.loc[:,['First_Name','ID','Age']])
Output :-
   First_Name  ID  Age
0         Ram   1   34
1     Praveen   2   27
2      Manish   3   26
Difference between loc, iloc, at and iat :-
loc :- selects rows and columns by label. In the example above, it fetched the Age column for the rows labelled 0 to 2.
iloc :- selects rows and columns by integer position. In the example above, it fetched the value at row position 0 and column position 1, which is First_Name.
at :- a variant of loc that fetches only a single value.
iat :- a variant of iloc that fetches only a single value.
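The four accessors can be compared side by side. The sketch below assumes a hypothetical in-memory copy of the Datafile2.csv rows, with made-up string row labels (x, y, z) so that labels and positions are easy to tell apart:

```python
import pandas as pd

# Hypothetical copy of the Datafile2.csv rows, with string row labels.
people = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "First_Name": ["Ram", "Praveen", "Manish"],
        "Last_Name": ["Kumar", "Singh", "Kumar"],
        "Age": [34, 27, 26],
    },
    index=["x", "y", "z"],
)

# loc: by label. A label slice includes its end label ("y").
by_label = people.loc["x":"y", ["First_Name", "Age"]]

# iloc: by position. A position slice excludes its end (row 2).
by_position = people.iloc[0:2, [1, 3]]

# at / iat: a single value, by label or by position.
one_by_label = people.at["y", "Age"]
one_by_position = people.iat[1, 3]

print(by_label)
print(by_position)
print(one_by_label, one_by_position)
```

Both slices return the same two rows, and both single-value lookups return Praveen's age. The point to watch is the slice end: loc includes it, iloc does not.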
Adding rows and columns to a DataFrame :-
Adding rows and columns to a DataFrame relies mainly on three indexers defined by pandas: loc, iloc and ix (ix exists only in older pandas versions). We will take a quick look at all three.
So, let us start our journey with loc :-
Example :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.loc['a'] = 3
print(b)
Output :-
   test
a     3
c     3
v     4
In the above example, the row labelled 'a' has been assigned the value 3, so the output now shows 3 for 'a'.
Now, take the example of iloc :-
Example :-
import pandas as pd
a = [2,3,4]
b = pd.DataFrame(a,columns=['test'],index=['a','c','v'])
b.iloc[2] = 3
print(b)
Output :-
   test
a     2
c     3
v     3
Here, index position 2 has been assigned the new value 3, so the older value 4 has been replaced.
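The curious case of ix :-
ix can behave like either loc or iloc, depending on the situation: with an integer index it looks up labels, and with any other kind of index an integer key is treated as a position. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix examples run only on older versions. Let us take an example of each case :-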
Case 1 :- when ix works like loc :
Example :-
import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=[1,2,3])
b.ix[2] = 3
print(b)
Output :-
   test
1    20
2     3
3    40
Here the index holds the integers 1, 2 and 3, so ix treats 2 as a label. The row labelled 2, which held 30, is assigned the value 3, just as loc would do.
Case 2 :- when ix works like iloc :
Example :-
import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.ix[1] = 3
print(b)
Output :-
    test
10    20
20     3
30    40
Here the index holds strings, so ix cannot match the integer 1 as a label and treats it as a position instead. The row at position 1, which held 30, is assigned the value 3, just as iloc would do.
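On current pandas versions, where ix is no longer available, the same two kinds of assignment can be written explicitly with loc and iloc. A minimal sketch, assuming an integer index 1, 2, 3 chosen only for illustration:

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=[1, 2, 3])

# Label-based: the row whose label is 2 (the second row).
b.loc[2, "test"] = 3

# Position-based: the row at position 2 (the third row), column position 0.
b.iloc[2, 0] = 5

print(b)
```

Spelling out loc or iloc removes the guesswork: the same key 2 reaches a different row depending on which indexer is used.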
To add an extra row, use loc with a label that does not exist yet. Let us take an example :-
Example :-
import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc['50'] = 3
print(b)
Output :-
    test
10    20
20    30
30    40
50     3
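When several rows have to be added at once, assigning them one by one with loc becomes tedious. One alternative, sketched here with made-up values and labels, is to put the new rows in their own DataFrame and join the two with pd.concat:

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])

# Hypothetical extra rows, built as a second DataFrame with the same column.
new_rows = pd.DataFrame([3, 7], columns=["test"], index=["50", "60"])

# concat returns a new DataFrame; b itself is left unchanged.
combined = pd.concat([b, new_rows])

print(combined)
```

Unlike the loc assignment, which modifies b in place, pd.concat leaves the original untouched and returns the combined result.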
Adding a column to your DataFrame :-
Adding a column to a DataFrame is not a big task; again, loc does the job.
So, let us add a column to the DataFrame :-
Example :-
import pandas as pd
a = [20,30,40]
b = pd.DataFrame(a,columns=['test'],index=['10','20','30'])
b.loc[:,'test2'] = 3
print(b)
Output :-
    test  test2
10    20      3
20    30      3
30    40      3
In this case, we add a column "test2" to our DataFrame, with the value 3 in every row.
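A column can also be added by plain assignment with square brackets, which is a common shorthand for the loc form. The sketch below uses illustrative column names (test3 and double are made up) to show a constant, a list with one value per row, and a column computed from an existing one:

```python
import pandas as pd

b = pd.DataFrame([20, 30, 40], columns=["test"], index=["10", "20", "30"])

b["test2"] = 3               # the same value in every row
b["test3"] = [1, 2, 3]       # one value per row, in row order
b["double"] = b["test"] * 2  # computed from an existing column

print(b)
```

When a list is assigned, its length has to match the number of rows in the DataFrame.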
We will come across more DataFrame functionality in the next tutorials.