Sumit Chawan

Student id : 18200549

Question :

Suppose a hospital tested the age and body fat data for 18 randomly selected adults with the following results.

Age 23 23 27 27 39 41 47 49 50

%Fat 9.5 26.5 7.8 17.8 31 4 25.9 27.4 31.2

Age 52 54 54 56 57 58 58 60 61

%Fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7

Q1) Calculate the mean, median and standard deviation of age and %fat.

Mean(Age) = (1 / Total Number of Records) * (Sum of Ages)

Mean(Age) = (23+23+27+27+39+41+47+49+50+52+54+54+56+57+58+58+60+61) / 18

Mean(Age) = 836 / 18

Mean(Age) = 46.4444

Mean(%Fat) = (1 / Total Number of Records) * (Sum of %Fat)

Mean(%Fat) = (9.5+26.5+7.8+17.8+31+4+25.9+27.4+31.2+34.6+42.5+28.8+33.4+30.2+34.1+32.9+41.2+35.7) / 18

Mean(%Fat) = 494.5 / 18

Mean(%Fat) = 27.4722

Median : The median is the middle value of the ordered list of data.

Note : The Data must be ordered before making the calculations for median.

Here, as the number of records is even, the median is calculated as the average of the 9th and 10th terms of the ordered data.

Median(Age) = (9th Term + 10th Term) / 2 = (50 + 52) / 2 = 51

Median(%Fat) = (9th Term + 10th Term) / 2 = (30.2 + 31) / 2 = 30.6

Standard Deviation

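Standard deviation measures how widely the values of a data set spread around its mean. The calculations here use the population form, which divides the sum of squared deviations by the number of records (18).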

Std(age) = sqrt( ( (23-46.44)^2 + (23-46.44)^2 + (27-46.44)^2 + (27-46.44)^2 + (39-46.44)^2 + (41-46.44)^2 + (47-46.44)^2 + (49-46.44)^2 + (50-46.44)^2 + (52-46.44)^2 + (54-46.44)^2 + (54-46.44)^2 + (56-46.44)^2 + (57-46.44)^2 + (58-46.44)^2 + (58-46.44)^2 + (60-46.44)^2 + (61-46.44)^2 ) / 18 ) = sqrt(2970.44 / 18) = 12.85 (Approx)

Standard Deviation of % Fat

Std(%fat) = sqrt( ( (9.5-27.47)^2 + (26.5-27.47)^2 + (7.8-27.47)^2 + (17.8-27.47)^2 + (31-27.47)^2 + (4-27.47)^2 + (25.9-27.47)^2 + (27.4-27.47)^2 + (31.2-27.47)^2 + (34.6-27.47)^2 + (42.5-27.47)^2 + (28.8-27.47)^2 + (33.4-27.47)^2 + (30.2-27.47)^2 + (34.1-27.47)^2 + (32.9-27.47)^2 + (41.2-27.47)^2 + (35.7-27.47)^2 ) / 18 ) = sqrt(2034.78 / 18) = 10.63 (Approx)
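The Question 1 statistics can be cross-checked with a short Python sketch. It assumes the 18 values are exactly as listed in the question (with 31 and 4 as separate %Fat readings, as in the mean calculation) and that the population standard deviation is wanted, matching the division by 18. The names `age`, `fat` and `summarize` are illustrative only.

```python
import statistics

# Data copied from the question, in the order given (assumed exact).
age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31, 4, 25.9, 27.4, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def summarize(values):
    """Return (mean, median, population standard deviation)."""
    return (
        statistics.fmean(values),
        statistics.median(values),   # sorts internally, averages the two middle terms
        statistics.pstdev(values),   # divides by n, not n - 1
    )


for name, values in (("Age", age), ("%Fat", fat)):
    mean, median, std = summarize(values)
    print(f"{name}: mean={mean:.4f} median={median:.1f} std={std:.2f}")
```

Swapping `statistics.pstdev` for `statistics.stdev` would give the sample standard deviation (division by 17) instead.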

Q2) Box Plot

Box plot Summary for Age :

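Q1 = 36, Q3 = 57.25, Mean = 46.44, Median = 51 (as calculated in Question 1), Min value = 23, Max value = 61. The quartiles are interpolated at positions 0.25 * (n + 1) and 0.75 * (n + 1) of the ordered data, where n = 18.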

Box plot Summary for %Fat :

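Q1 = 23.875, Q3 = 34.225, Mean = 27.47, Median = 30.6 (as calculated in Question 1), Min value = 9.5, Max value = 42.5. Here 9.5 is the end of the lower whisker: the readings 4 and 7.8 lie below the lower fence Q1 - 1.5 * IQR = 23.875 - 1.5 * 10.35 = 8.35 and are treated as outliers.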
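The box plot summaries can be reproduced with the sketch below. It rests on two assumptions that are not stated in the answer itself: that quartiles use the exclusive (n + 1) interpolation method, which is what yields 36 and 57.25 for Age, and that whiskers follow the common 1.5 * IQR rule. The helper name `box_summary` is hypothetical.

```python
import statistics

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31, 4, 25.9, 27.4, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def box_summary(values):
    """Quartiles plus whisker ends and outliers under the 1.5 * IQR rule."""
    # method="exclusive" interpolates at positions i * (n + 1) / 4.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


print(box_summary(age))
print(box_summary(fat))
```

Other quartile conventions (for example `method="inclusive"`) give slightly different Q1 and Q3 values for the same data.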

Q3) Scatter Plot based on Age and %Fat.

Q-Q plot based on Age and %Fat

Code Snippet for Q-Q plot in R-Programming

Q-Q Plot
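As a rough illustration of what a Q-Q plot compares, the sketch below computes the points of a normal Q-Q plot for Age in Python. It assumes the plot is of the kind produced by R's `qqnorm` (the function listed in the references) and uses plotting positions (i - 0.5) / n, which is one common convention; both are assumptions, and `normal_qq_points` is a hypothetical helper, not the R snippet used for the figure.

```python
from statistics import NormalDist

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]


def normal_qq_points(values):
    """Pair each sorted sample value with a standard-normal quantile."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n for i = 1..n (assumed convention).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


# Each row is (theoretical quantile, sample quantile); plotting the pairs
# gives the Q-Q plot. Points near a straight line suggest near-normal data.
for t, v in normal_qq_points(age):
    print(f"{t:7.3f}  {v}")
```

The same function applied to the %Fat values gives the points for the second variable.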

Q4) Normalize the two variables based on Z-Score Normalization.

Z-Score Normalization : Z-score normalization converts all the values in a dataset to a common scale with a mean of zero and a standard deviation of 1.

The Z-score is calculated as : z = (x - μ) / σ

Where μ = mean of x and σ = standard deviation of x.

Example : For x = 23, Mean of Age = 46.44, Std. of Age = 12.85

Z = (23 - 46.44) / 12.85 = -1.82

Similarly, we can calculate the Z-score for every value of the two variables. The results are populated in the table below.
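A minimal Python sketch of the normalization, assuming the population standard deviation (division by n) is used as in Question 1. The function name `z_scores` is illustrative.

```python
import statistics

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31, 4, 25.9, 27.4, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def z_scores(values):
    """Return (x - mean) / std for every x, using the population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Print the normalized values side by side, one record per row.
for a, za, f, zf in zip(age, z_scores(age), fat, z_scores(fat)):
    print(f"{a:3d} {za:8.4f}   {f:5.1f} {zf:8.4f}")
```

After normalization each column has mean 0 and standard deviation 1, so the two variables can be compared on the same scale.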

Q5) Calculate the correlation coefficient.

The Pearson correlation coefficient measures the strength of the linear relationship between two variables. It is calculated as :

r = Σ[(xi - xmean) * (yi - ymean)] / sqrt( Σ(xi - xmean)^2 * Σ(yi - ymean)^2 )

Where r = correlation coefficient

xmean = mean of x-variables

ymean = mean of y-variables
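The formula can be sketched in Python as follows, assuming each age is paired with the %Fat value in the same position in the question. The function name `pearson_r` is illustrative.

```python
import math
import statistics

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31, 4, 25.9, 27.4, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def pearson_r(xs, ys):
    """Pearson correlation of paired samples xs[i], ys[i]."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    # Numerator: sum of products of paired deviations.
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    den = math.sqrt(sum((x - x_mean) ** 2 for x in xs)
                    * sum((y - y_mean) ** 2 for y in ys))
    return num / den


print(f"r = {pearson_r(age, fat):.2f}")
```

Because `zip` pairs values by position, both lists must stay in their original order: sorting either column on its own would break the (age, %Fat) pairing and inflate r.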

xmean = 46.4 and ymean = 27.47 (calculated in Question 1)

Σ(xi - xmean)^2 * Σ(yi - ymean)^2 = 6044262.007

Square root of ( Σ(xi - xmean)^2 * Σ(yi - ymean)^2 ) = 2458.508086

The sum of the products, Σ[(xi - xmean) * (yi - ymean)], taken over the 18 (Age, %Fat) pairs in the order given in the question, = 1822.22

Hence, r = 1822.22 / 2458.508086 = 0.74

Q6) Are “age” and “%Fat” positively or negatively correlated?

Ans:

1) The two variables “Age” and “%Fat” are positively correlated.

2) The Pearson correlation coefficient of 0.74 indicates a fairly strong positive linear relationship between the two variables, i.e. %Fat tends to increase as age increases.

References :

1) http://onlinestatbook.com/2/describing_bivariate_data/pearson.html

2) http://howto.commetrics.com/methodology/statistics/normalization/

3) https://stat.ethz.ch/R-manual/R-devel/library/stats/html/qqnorm.html