Sunday, December 31, 2017

A Tutorial on Measures of Central Tendency (Mean, Median, Mode)

As beginners in the field of data science, we should strengthen our foundations, and that requires some basic knowledge of statistics.

What is a Measure of Central Tendency?
The main idea behind computing a measure of central tendency is to find a single value that is typical of a given set of values.

The three common measures of central tendency are the arithmetic mean, the median and the mode.

Mean:-

The arithmetic mean, or simply the mean, is more commonly known as the
average of a set of values. It is calculated by adding up all the values and dividing by the number of values.

The mean of a population is denoted by the Greek letter mu (μ), while the mean of a sample
is typically denoted by a bar over the variable symbol x and pronounced "x-bar".

Now, let us look at an example :-

Suppose we have the batting scores of Virat Kohli in his last 5 matches.

23,134,78,03,176

and we need to calculate the mean of his scores. For this, we add up all the scores from the 5 matches and divide the total by 5.


x-bar = (23 + 134 + 78 + 03 + 176)/5 = 414/5 = 82.8
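
The same calculation can be sketched in Python. This is only an illustration: the variable names are made up for this example, and statistics.mean comes from Python's standard library.

```python
from statistics import mean

# The five scores from the example above
scores = [23, 134, 78, 3, 176]

# Mean by hand: add up all the values and divide by how many there are
manual_mean = sum(scores) / len(scores)

# The same result from the standard library
library_mean = mean(scores)

print(manual_mean)   # 82.8
print(library_mean)  # 82.8
```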

The mean is the easiest measure of central tendency to compute. However, it is not the best measure for every data set. Problems arise when the data contains outliers, that is, extreme values that pull the mean toward them.

One way to lessen the influence of outliers is to calculate a trimmed mean. As the name implies, a trimmed mean is calculated by trimming, or discarding, a certain percentage of the extreme values in a distribution and calculating the mean of the remaining values.
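
A minimal sketch of a trimmed mean is shown below. The function name trimmed_mean and the choice to drop the same share of values from each end are assumptions made for this illustration; other definitions of trimming exist.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    `proportion` is a fraction such as 0.2 (20 percent per side).
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and below 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # how many values to drop per side
    kept = ordered[k:len(ordered) - k]   # the values that remain
    return sum(kept) / len(kept)

scores = [23, 134, 78, 3, 176]

# With 20 percent trimmed from each side, 3 and 176 are discarded
# and the mean is taken over 23, 78 and 134.
print(trimmed_mean(scores, 0.2))
```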


Median:- 

The median of a data set is the middle value when the values are ranked in ascending or descending order. If there is an odd number n of values, the median is formally defined as the ((n+1)/2)th value. If n = 9, the middle value is the ((9+1)/2)th, or fifth, value. If there is an even number of values, the median is the average of the two middle values. This is formally defined as the average of the (n/2)th and ((n/2)+1)th values.
The median is a better measure of central tendency than the mean for data that is asymmetrical or contains outliers. This is because the median is based on the ranks of the data points rather than their actual values: 50 percent of the data values in a distribution lie below the median and 50 percent above it, without regard to the actual values in question. Therefore it does not matter if the data set contains some extremely large or small values, because they will not affect the median more than less extreme values do.
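
The rank-based rule can be sketched in Python as follows. The function median_by_rank and the extra score of 1000 are hypothetical, added only to show how an outlier moves the mean far more than the median.

```python
from statistics import mean

def median_by_rank(values):
    """Median computed from the ranked values, as described above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th value (1-based) sits at index n // 2
        return ordered[mid]
    # Even n: average of the (n/2)th and ((n/2)+1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [23, 134, 78, 3, 176]
with_outlier = scores + [1000]   # hypothetical extreme score

print(median_by_rank(scores))        # 78
print(median_by_rank(with_outlier))  # 106.0
print(mean(scores))                  # 82.8
print(mean(with_outlier))            # about 235.67
```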

Mode:-

The mode is the most frequently occurring value in a given data set. It is most useful for describing categorical data.

Example :- 2,2,3,3,4,4,4,4,4,4,5,5,5

So, the mode of the above data set is 4, because 4 occurs more often (six times) than any other value.
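
The mode of the example data can be checked with a short Python sketch. Counter and statistics.mode are part of the standard library; the list of colours is a made-up categorical example.

```python
from collections import Counter
from statistics import mode

data = [2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]

# Count how often each value occurs and pick the most frequent one
value, frequency = Counter(data).most_common(1)[0]
print(value, frequency)  # 4 6

# The standard library gives the same answer
print(mode(data))        # 4

# The mode also works for categorical data (hypothetical values)
colours = ['red', 'blue', 'red', 'green']
print(mode(colours))     # red
```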

Please share your suggestions so that I can improve my tutorials.

Thanks :)

Thursday, December 28, 2017

Input/Output In Python

Python provides a convenient way to handle input and output. The built-in functions input() and print() are widely used for standard input and output operations, respectively.

print('This is my tutorial')
# Output: This is my tutorial

a = 3

print('The value of a is', a)
# Output: The value of a is 3


The actual syntax of the print() function is:

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
 
objects :- The values to be printed.
sep :- The separator placed between the values. The default is a space.
end :- The text printed after the last value. The default is a newline ('\n').
file :- The object to which the values are written. The default is sys.stdout.
flush :- Whether the output is flushed immediately. The default is False.


print(1,2,3,4)
# Output: 1 2 3 4

print(1,2,3,4,sep='*')
# Output: 1*2*3*4

print(1,2,3,4,sep='#',end='&')
# Output: 1#2#3#4&
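
The file parameter can be illustrated in the same way. In this sketch the output is written to an in-memory io.StringIO object instead of sys.stdout; the names buffer and captured are chosen just for the example.

```python
import io

# An in-memory text stream that stands in for a file
buffer = io.StringIO()

# Nothing appears on the screen: the text goes into the buffer
print(1, 2, 3, sep='-', end='!', file=buffer)

captured = buffer.getvalue()
print(repr(captured))
# Output: '1-2-3!'
```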


In Python, we can also format the output. This can be done with the str.format() method
and the placeholder {}.

print('I eat {0} and {1}'.format('bread','rice'))
# Output: I eat bread and rice

print('I love {1} and {0}'.format('bread','rice'))
# Output: I love rice and bread

print("This is {an} first {cn}".format(an="my",cn="Tutorial"))
# Output: This is my first Tutorial
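
A placeholder can also carry a format specification after a colon. The short sketch below uses made-up values to show two common cases: fixing the number of decimal places and padding an integer with zeros.

```python
average = 82.8

# {:.1f} prints a float with one digit after the decimal point
line = 'Average score: {:.1f} in {} matches'.format(average, 5)
print(line)
# Output: Average score: 82.8 in 5 matches

# {:03d} pads an integer with leading zeros to a width of 3
padded = 'Player {:03d}'.format(7)
print(padded)
# Output: Player 007
```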
 
 
 

Python Input : 

In Python, the input() function reads input from the user. The syntax for input() is
 
input([prompt]) 
 
>>> num = input('Enter a number: ')
Enter a number: 5
>>> num
'5'
 
 
We can see that the entered value 5 is a string, not a number. To convert it into a number, we can use the int() or float() function.
 
Example:- 
a= int(input())

print(a**3)
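
If the user types something that is not a number, int() raises a ValueError. One possible way to guard against that is sketched below. The helper to_int is hypothetical, and it takes the text as an argument so that it can be tried without typing anything; in a real program the text would come from input().

```python
def to_int(text):
    """Return text converted to an int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int('5'))     # 5
print(to_int('five'))  # None

# Typical use with input():
# a = to_int(input('Enter a number: '))
```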
 

Saturday, December 23, 2017

List in Python

Lists are another kind of data structure in Python. They resemble arrays in other languages, but they are not arrays. A list is created by placing all the items (elements) inside square brackets [ ], separated by commas. It can have any number of items, and they may be of different types (integer, float, string, etc.).

#1.)Initializing an empty list.

a=[]

print(a)

# The output will be  [].

#Initializing a list with some elements.

a=[2,3,4]

print(a)

# The output will be  [2, 3, 4].

Another way of creating a list is a list comprehension. We will see this later.

# 2.) Indexing with List.

a=[2,3,4]

print(a[2])

#output : 4

#Negative Indexing :

a=[2,3,4]

print(a[-2])

#output : 3

# 3).Slicing in List

a=[2,3,4]

print(a[:2])

#output : [2, 3]

a=[2,3,4,6,9,0,5]

print(a[2:4]) 

#output : [4, 6]

a=[2,3,4,6,9,0,5]

print(a[:])

#output : [2, 3, 4, 6, 9, 0, 5]

a=[2,3,4,6,9,0,5]

print(a[:-2])

#output :[2, 3, 4, 6, 9]

#4.) Nesting in List.

#We can create a list inside a list and there is no limit on the number of nesting.

a=[2,3,4,[6,9],0,5]

print(a[3])

#output : [6, 9]

a=[2,3,4,[6,9],0,5]

print(a[2:4])

#output : [4, [6, 9]]

#5.) Changing and updating a List.

#With the help of the assignment operator, we can change the contents of a list, unlike a tuple.

a=[2,3,4,[6,9],0,5]

a[2] = 7

print(a)

#output : [2, 3, 7, [6, 9], 0, 5]

a=[2,3,4,[6,9],0,5]

a[2:4] = [7,5,6]

print(a)

#output : [2, 3, 7, 5, 6, 0, 5]


#We can also use + operator to combine two lists. This is also called concatenation.


a =[2,34,4]

b =[4,5,6]

print(a + b)

#output= [2, 34, 4, 4, 5, 6]

#The * operator repeats a list for the given number of times.

b =["san"]

print(b*3)

#output= ['san', 'san', 'san']


#Append in the List.

b =[2,3,4,5,6,7,8]
b.append(34)
print(b)

#output : [2, 3, 4, 5, 6, 7, 8, 34]

#count in the List.

b =[2,3,4,5,6,7,8,4]
print(b.count(4))

#output : 2

#Insert in the List at a specified index.

b =[2,3,4,5,6,7,8,4]
b.insert(4,4)
print(b)

#output :[2, 3, 4, 5, 4, 6, 7, 8, 4]

# Adding one List into another one.

a=[2,3,4]
b =[2,3,4,5,6,7,8,4]
b.extend(a)
print(b)

#output : [2, 3, 4, 5, 6, 7, 8, 4, 2, 3, 4]

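#Note : Python has no list method that inserts all the items of another list at a specified position. insert() adds its argument as a single element, but slice assignment such as b[i:i] = a can be used instead.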
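
Although no list method does this directly, assigning to an empty slice inserts all the items of one list at a chosen position. The values below are made up for the illustration.

```python
a = [2, 3, 4]
b = [10, 20, 30]

# b[1:1] is an empty slice at index 1, so nothing is replaced
# and the items of a are inserted there
b[1:1] = a
print(b)

#output : [10, 2, 3, 4, 20, 30]

# For comparison, insert() adds the whole list as one nested element
c = [10, 20, 30]
c.insert(1, a)
print(c)

#output : [10, [2, 3, 4], 20, 30]
```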

#Delete the first occurrence of an element in a list.

b =[2,3,4,5,6,7,8,4]
b.remove(4)
print(b)

#Output : [2, 3, 5, 6, 7, 8, 4]

#Index of the first occurrence of a number.

b =[2,3,4,5,6,7,8,4]

print(b.index(4))

#output : 2

Note : if the number is not available in the List , we will get a ValueError.

b =[2,3,4,5,6,7,8,4]

print(b.index(33))

#output : ValueError: 33 is not in list

#Sorting in the List.

b =[2,3,4,5,6,7,8,4]
b.sort()
print(b)

#Output : [2, 3, 4, 4, 5, 6, 7, 8]

#reverse  the order in the List.

b =[2,3,4,5,6,7,8,4]
b.reverse()
print(b)

#output : [4, 8, 7, 6, 5, 4, 3, 2]
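
Both sort() and reverse() change the list in place and return None. When the original order has to be kept, the built-in sorted() function returns a new list instead, as this short sketch shows.

```python
b = [2, 3, 4, 5, 6, 7, 8, 4]

# sorted() builds a new list and leaves b untouched
descending = sorted(b, reverse=True)
print(descending)

#output : [8, 7, 6, 5, 4, 4, 3, 2]

print(b)

#output : [2, 3, 4, 5, 6, 7, 8, 4]

# sort() works in place, so its return value is None
result = b.sort()
print(result)

#output : None
```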

Friday, December 22, 2017

Tuples in Python


The main data structures in Python are tuples, lists, dicts and sets. Each of these types behaves differently and suits a particular kind of data, which we will see in detail. The main concept to keep in mind while handling such data structures is mutability. Mutable means the object can be changed after it is created, while immutable means it cannot.

#Tuples :- Tuples are immutable, ordered sequences of elements, and the individual elements can be of any type. They are written with parentheses.

#How to initialize a tuple ?
#empty tuple
a =()

#Tuple with values.

a =(2,"san",6)

#Tuple with one value.

a= (5,)

Note :- A tuple with a single value must have a trailing comma.
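
The effect of the comma can be checked with type(). Without it, the parentheses are treated as ordinary grouping and the result is not a tuple. The variable names are chosen just for this illustration.

```python
with_comma = (5,)
without_comma = (5)

print(type(with_comma).__name__)
# The Output will be tuple.

print(type(without_comma).__name__)
# The Output will be int.
```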

#2.) Accessing values in tuple.

#The values in a tuple can be accessed via indexing and slicing. Indexing starts from zero.

#Example :-

a =(2,"san",6,7,9)

print(a[2])

# The Output will be 6.

print(a[0])

# The Output will be 2.

print(a[2:4])

# The Output will be (6, 7).

#3.) Nested Tuples.
#We can have a tuple inside a tuple, but accessing its elements takes an extra index.
nestedtuple = (2,3,(4,6,7))
print(nestedtuple[2])
# The output will be (4,6,7).

nestedtuple = (2,3,(4,6,(3,5),7))
print(nestedtuple[2][0])
#The output will be 4.

#Negative Indexing:
nestedtuple = (2,3,(4,6,(3,5),7))
print(nestedtuple[-2])
#The output will be 3.

#Slicing a tuple:

a =(2,"san",6,7,9)
print(a[2:4])

#The output is (6, 7).

print(a[:3])

#The output is(2, 'san', 6).

print(a[:])

#The output is (2,"san",6,7,9)

print(a[:-1])

#The output is (2, 'san', 6, 7).

# Tuples are immutable, but if an element inside a tuple is mutable, that element can be changed.

a =(2,"san",6,7,9)

a[2]= 4

#we get the below error;
#TypeError: 'tuple' object does not support item assignment.
#but if the element inside the tuple is a list, the result is different.

my_tuple = (4, 2, 3, [6, 5])


my_tuple[3][1]=9

print(my_tuple)


#The output is (4, 2, 3, [6, 9]).
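
Both behaviours can be seen side by side in one runnable sketch. The try/except block is added here only so that the script keeps running after the error; the names message and error are chosen for the example.

```python
my_tuple = (4, 2, 3, [6, 5])

# Replacing an element of the tuple itself is not allowed
try:
    my_tuple[0] = 9
except TypeError as error:
    message = str(error)
    print(message)

# Changing the list stored inside the tuple is allowed,
# because the tuple still refers to the same list object
my_tuple[3].append(7)
print(my_tuple)

#The output is (4, 2, 3, [6, 5, 7]).
```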

#deleting a tuple.



my_tuple = (4, 2, 3, [6, 5])


del my_tuple

print(my_tuple)

# The output is NameError: name 'my_tuple' is not defined.


#concatenation of Tuples.


a =(2,3,4)

b =(4,5,6)

print(a + b)

# The output is (2, 3, 4, 4, 5, 6)

#  The count function to find the number of a particular element.
a =(2,3,4,4,5)

print(a.count(4))

# The output is 2.

# To find the index of any particular element.
a =(2,3,4,4,5)

print(a.index(3))

# The output is 1.
 

 

Hadoop - What is a Job in Hadoop ?

In the field of computer science , a job just means a piece of program and the same rule applies to the Hadoop ecosystem as wel...