Unit 2 Summarizing and Describing the Numerical Data

Measures of central tendency

An “average” is a single value that represents the entire distribution. It lies between the two extreme observations (i.e. the largest and smallest observations) and gives an idea of the concentration of values in the central part of the distribution. Such a single value is known as a “measure of central tendency” or “measure of location”. Thus, measures of central tendency describe the middle or centre of a data set.

Various Measures of Central Tendency

The following are the measures of central tendency or measures of location:

1. Arithmetic mean

(i) Simple arithmetic mean ($\bar{X}$)

(ii) Weighted arithmetic mean ($\bar{X}_w$)

2. Median (Md)

3. Mode (Mo)

4. Geometric mean (G.M.) and

5. Harmonic mean (H.M.)

Note: The geometric mean (G.M.) and harmonic mean (H.M.) are beyond our syllabus.

Arithmetic Mean (A.M.)

The arithmetic mean is the most popular and widely used measure of central tendency. It is also simply called ‘the mean’ or ‘the average’. It is regarded as an ideal (and the best-known) measure of central tendency because it satisfies almost all the requisites of an ideal measure of central tendency laid down by Prof. Yule.

Arithmetic mean may either be

(i) Simple arithmetic mean or (ii) Weighted arithmetic mean

Simple arithmetic mean

In the simple arithmetic mean, all items in the distribution are treated as equally important. It is denoted by $\bar{X}$ (X bar).

Calculation of Arithmetic mean:

Individual Series

(i) Direct method

$\bar{X} = \dfrac{\sum X}{n}$

where $\sum X$ = the sum of observations and n = the number of observations.

(ii) Short-cut method or assumed mean method or change of origin method

$\bar{X} = a + \dfrac{\sum d}{n}$

where a = assumed mean (assumed value), d = X - a = deviations of the items from the assumed mean, and n = no. of observations.

There is no hard and fast rule for selecting a, but it is better to take a value between the highest and lowest observations.
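As a quick check, the direct and short-cut formulas can be compared in a few lines of Python. This is only an illustrative sketch: the function names `mean_direct` and `mean_shortcut` are hypothetical, the data are those of Example 1 below, and the assumed mean a = 50 is an arbitrary choice.

```python
def mean_direct(xs):
    """Direct method: X-bar = sum(X) / n."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, a):
    """Short-cut method: X-bar = a + sum(d) / n, where d = X - a."""
    d = [x - a for x in xs]
    return a + sum(d) / len(xs)


# Data of Example 1 below
data = [55, 39, 45, 55, 41, 35, 60, 40, 55, 35, 37, 55, 55, 65]
print(mean_direct(data))        # 48.0
print(mean_shortcut(data, 50))  # 48.0 (the choice of a does not change the result)
```

Changing `a` only changes the size of the deviations d; the final mean is the same.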

Discrete Series

(i) Direct method

$\bar{X} = \dfrac{\sum fX}{N}$

where N = ∑f = total frequency.

(ii) Short-cut method or assumed mean method or coding method or change of origin method

$\bar{X} = a + \dfrac{\sum fd}{N}$

where a = assumed mean, d = X - a = deviation of the items from the assumed mean, and N = ∑f = total frequency.

Continuous Series (Grouped Data)

(i) Direct method

$\bar{X} = \dfrac{\sum fX}{N}$

where X = mid-value of the class interval and N = ∑f = total frequency, with

mid-value (X) = $\dfrac{\text{lower limit} + \text{upper limit}}{2}$

(ii) Short-cut method or assumed mean method or coding method or change of origin method

$\bar{X} = a + \dfrac{\sum fd}{N}$

where a = assumed mean, d = X - a = deviation of the items (mid-values) from the assumed mean, and N = ∑f = total frequency.

(iii) Step-deviation method or change of origin and scale method or coding method

$\bar{X} = a + \dfrac{\sum fd'}{N} \times h$

where $d' = \dfrac{X - a}{h}$, X = mid-value, a = assumed mean, h = class size (class width) and N = ∑f = total frequency.

Note:

(i) For unequal class sizes, h is taken as a common factor of the deviations d = X - a.

(ii) For the mean, the classes need not be of equal size, nor need they be exclusive (i.e. adjusted) classes.
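A small Python sketch can confirm that the direct and step-deviation methods give the same grouped mean. It assumes the mid-values and frequencies of Example 6 below, an arbitrarily chosen assumed mean a = 4.245 and h = 0.1; the function names are illustrative only.

```python
def grouped_mean_direct(mids, freqs):
    """Direct method: X-bar = sum(f*X) / N, with X = class mid-value."""
    N = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mids)) / N


def grouped_mean_step(mids, freqs, a, h):
    """Step-deviation method: X-bar = a + (sum(f*d') / N) * h, d' = (X - a) / h."""
    N = sum(freqs)
    d_prime = [(x - a) / h for x in mids]
    return a + sum(f * d for f, d in zip(freqs, d_prime)) / N * h


# Mid-values and frequencies of Example 6 (VGA cable lengths)
mids = [3.845, 3.945, 4.045, 4.145, 4.245, 4.345, 4.445, 4.545]
freqs = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_mean_direct(mids, freqs), 4))                # 4.2256
print(round(grouped_mean_step(mids, freqs, a=4.245, h=0.1), 4))  # 4.2256
```

The step-deviation version works with the small whole numbers d' = -4, ..., 3, which is why it is convenient by hand.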

Weighted Arithmetic Mean

The simple arithmetic mean is based on the assumption that all items in the distribution are equally important. In practice, this may not be so: some items are relatively more important than others. When each item is assigned a weight according to its relative importance (or priority), the arithmetic mean calculated with these weights is called the weighted arithmetic mean.

The weighted arithmetic mean is given by

$\bar{X}_w = \dfrac{\sum wX}{\sum w}$

where X = value of the variable (e.g. a rate) and w = given weight (a proportion or frequency).
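The following Python sketch applies the formula to hypothetical data (three rates weighted by quantities bought); the numbers and the function name `weighted_mean` are assumptions for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*X) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical rates (Rs. per kg) of three items, weighted by kg bought
rates = [40, 60, 100]
weights = [5, 3, 2]
print(weighted_mean(rates, weights))  # 58.0
print(sum(rates) / len(rates))        # simple mean (about 66.67), for comparison
```

The weighted mean is pulled towards the rate with the largest weight, whereas the simple mean treats all three rates alike.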

Combined mean or Mean of combined Series

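For two groups or two series

Combined mean $(\bar{X}_{12}) = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$

For three groups or three series

Combined mean $(\bar{X}_{123}) = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3}{n_1 + n_2 + n_3}$

where

n₁ = size of first group, n₂ = size of second group

n₃ = size of third group

$\bar{X}_1$ = mean of first group

$\bar{X}_2$ = mean of second group

$\bar{X}_3$ = mean of third group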
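A minimal Python sketch of the combined-mean formula, which handles any number of groups. The group sizes and means used here are hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical data: 40 students averaging 62 marks and 60 students averaging 72
print(combined_mean([40, 60], [62, 72]))  # 68.0
```

The result equals the mean of all the pooled observations, which is a convenient way to check the formula.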

Corrected mean

Correct mean $(\bar{X}) = \dfrac{\text{Correct } \sum X}{n}$

where,

Incorrect ∑X = n × incorrect mean

Correct ∑X = Incorrect ∑X - incorrect items + correct items
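The correction steps can be sketched in Python as follows. The scenario (20 observations with mean 45, where 64 was wrongly copied as 46) is hypothetical.

```python
def corrected_mean(n, wrong_mean, wrong_items, correct_items):
    """Correct mean = (incorrect sum - incorrect items + correct items) / n."""
    wrong_sum = n * wrong_mean                                     # incorrect sum of X
    correct_sum = wrong_sum - sum(wrong_items) + sum(correct_items)
    return correct_sum / n


# Hypothetical: mean of 20 observations was found to be 45,
# but one value 64 was wrongly copied as 46.
print(corrected_mean(20, 45, wrong_items=[46], correct_items=[64]))  # 45.9
```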

Median or Positional average (Md)

The variate value which divides the total number of observations into two equal parts is called the median. It is denoted by Md.

— Md is a suitable measure of central tendency (average) for qualitative characteristics such as knowledge, intelligence, beauty, honesty, talent, good, bad, defective, etc.

— It is also the more appropriate average (measure of central tendency) for data classified with open-ended classes.

Note:

(i) The classes should be of the exclusive type.

(ii) For calculating Md, the classes need not be of equal size.

The calculation of the median depends on the type of series.

For Individual series

First, arrange the given observations in ascending order of magnitude. Then

Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

where n = no. of observations.

In discrete series:

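— First, arrange the given data in ascending order of magnitude.

— Obtain the less than cumulative frequencies (c.f.).

— Locate the c.f. equal to or just greater than (N+1)/2; the corresponding value of X is the median.

Median (Md) = value of the $\left(\dfrac{N+1}{2}\right)^{\text{th}}$ item

where N = ∑f = total frequency.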

For continuous series

— Prepare the less than cumulative frequency distribution.

— Find N/2.

— Locate the cumulative frequency equal to or just greater than N/2 and note the corresponding class.

— This class contains the median value and is called the median class.

$M_d = L + \dfrac{\frac{N}{2} - \text{c.f.}}{f} \times h$

where,

N = ∑f = total frequency, L = lower limit of the median class, f = frequency of the median class, h = width (class size) of the median class,

c.f. = less than cumulative frequency of the class preceding the median class.
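The steps above can be sketched in Python. This is a minimal illustration, assuming exclusive classes of equal width h given by their lower limits in ascending order; the function name is hypothetical. The data are those of Example 6 below.

```python
def grouped_median(lowers, freqs, h):
    """Md = L + (N/2 - c.f.) / f * h for exclusive classes of equal width h.

    lowers: lower limits of the exclusive classes, in ascending order.
    """
    N = sum(freqs)
    half = N / 2
    cf = 0  # less than c.f. of the classes before the current one
    for L, f in zip(lowers, freqs):
        if cf + f >= half:  # first class whose c.f. is equal to or just greater than N/2
            return L + (half - cf) / f * h
        cf += f
    raise ValueError("frequencies must not be empty")


# Example 6 below: exclusive classes 3.795-3.895, 3.895-3.995, ..., 4.495-4.595
lowers = [3.795 + 0.1 * i for i in range(8)]
freqs = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_median(lowers, freqs, h=0.1), 2))  # 4.23
```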


Mode or Modal value or Most repeated value or most usual value (Mo)

Mode is the variate value that occurs the maximum number of times.

It is used, for example, to find the most common size of pen drives, shoes, T-shirts and other ready-made garments.

Calculation of mode

The mode for various distributions is given below.

For Individual series:

Mode = Value of variable X which repeats maximum number of times

For discrete series

Case I: If the distribution is regular and unimodal (i.e. only one maximum frequency).

Mode = Value of variable X corresponding to maximum frequency

Case II: When the distribution is regular and bimodal or multimodal, the mode can be determined by using the empirical relation

$M_o = 3M_d - 2\bar{X}$

Case III: When the distribution is irregular, mode is determined by using grouping method.

For continuous series (Grouped frequency distribution)

Case I: If the distribution is regular and unimodal, the mode is calculated by the following formula.

$M_o = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

where

f₁ = maximum frequency (modal class frequency), f₀ = frequency of the class preceding the modal class, f₂ = frequency of the class following the modal class,

L = lower limit of the modal class,

h = class size (width) of the modal class.
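A minimal Python sketch of the Case I formula, using the data of Example 6 below. Taking f₀ = 0 or f₂ = 0 when the modal class is the first or last class is an assumption of this sketch, not something stated in the notes; the function name is hypothetical.

```python
def grouped_mode(lowers, freqs, h):
    """Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * h for a regular, unimodal
    distribution with exclusive classes of equal width h."""
    i = freqs.index(max(freqs))                      # modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # assumption: 0 if no following class
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h


lowers = [3.795 + 0.1 * i for i in range(8)]   # Example 6 below
freqs = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_mode(lowers, freqs, h=0.1), 2))  # 4.24
```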

Case II: When the distribution is regular and bimodal or multimodal, the mode can be determined by using the empirical relation

$M_o = 3M_d - 2\bar{X}$

Case III: When the distribution is irregular, mode is determined by using grouping method.

Note: Case III is beyond our syllabus.

Note: For the mode, the class intervals must be of equal size and exclusive.

Note: Constructing class intervals when mid-values are given

If the mid-values of the distribution are given, we first need to construct the class intervals. The class size is

h = difference between two successive mid-values

Subtract h/2 from the first mid-value to get the lower limit of the first class interval, and add h/2 to the same mid-value to get its upper limit. The other class intervals are constructed in the same way.
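The rule can be sketched in Python for equally spaced mid-values. The mid-values 5, 15, 25, 35 are hypothetical, as is the function name.

```python
def intervals_from_mids(mids):
    """Build exclusive class intervals from equally spaced mid-values."""
    h = mids[1] - mids[0]  # class size = difference between successive mid-values
    return [(m - h / 2, m + h / 2) for m in mids]


print(intervals_from_mids([5, 15, 25, 35]))
# [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
```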

The Partition Values

The values which divide the total number of observations into a number of equal parts are called partition values. The median may also be regarded as a particular partition value, because it divides the given data into two equal parts.

Depending on the number of equal parts, the most important partition values are

— Quartiles (4 equal parts)

— Deciles (10 equal parts)

— Percentiles (100 equal parts)

Note: (i) For all series and all partition values, first arrange the given data in ascending order of magnitude.

(ii) For partition values, the classes need not be of equal size, but they must be exclusive.

(iii) $Q_1 = P_{25}$, $M_d = Q_2 = D_5 = P_{50}$, $Q_3 = P_{75}$

Quartiles

Individual series

After arranging the given data in ascending order of magnitude, the quartiles can be obtained by the following formula:

$Q_i$ = value of the $\left(\dfrac{i(n+1)}{4}\right)^{\text{th}}$ item

where i = 1, 2, 3 and n = no. of observations.
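The (n+1)-based position rule, with linear interpolation for fractional positions as used in the worked examples, can be sketched in Python. The same function covers quartiles (parts = 4), deciles (10) and percentiles (100). Returning the first or last item when the position falls outside 1..n is an assumption of this sketch; the function name is hypothetical.

```python
def partition_value(data, k, parts):
    """Value of the (k(n+1)/parts)-th item of the sorted data.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles. A fractional
    position is handled by linear interpolation between neighbouring items.
    """
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1) / parts   # e.g. 2.25 for Q1 when n = 8
    whole = int(pos)            # 2
    frac = pos - whole          # 0.25
    if whole < 1:               # position before the first item (assumption)
        return xs[0]
    if whole >= n:              # position at or beyond the last item (assumption)
        return xs[-1]
    # 1-based item `whole` is xs[whole - 1]; the next item is xs[whole]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


data = [8, 6, 5, 4, 10, 15, 3, 16]  # Example 2 below
print(round(partition_value(data, 1, 4), 2))     # Q1  = 4.25
print(round(partition_value(data, 3, 10), 2))    # D3  = 4.7
print(round(partition_value(data, 65, 100), 2))  # P65 = 9.7
```

Other software may use different position conventions (for example n-1 based), so its quartiles can differ slightly from these hand-computed values.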

Discrete series

$Q_i$ = value of the $\left(\dfrac{i(N+1)}{4}\right)^{\text{th}}$ item

where N = ∑f = total frequency and i = 1, 2, 3.
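For a discrete series, the c.f. lookup used in Examples 3, 4 and 5 below can be sketched in Python. The function name is hypothetical; `itertools.accumulate` produces the less than cumulative frequencies.

```python
from itertools import accumulate


def discrete_partition_value(values, freqs, k, parts):
    """X whose less than c.f. is equal to or just greater than k(N+1)/parts.

    values must be in ascending order; parts = 4, 10 or 100.
    """
    N = sum(freqs)
    pos = k * (N + 1) / parts
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= pos:
            return x
    return values[-1]   # only reached if pos exceeds N (e.g. high percentiles, tiny N)


X = list(range(1, 12))                       # Example 4 below
f = [2, 5, 8, 10, 12, 8, 6, 4, 3, 2, 1]
print(discrete_partition_value(X, f, 3, 4))     # Q3  -> 7
print(discrete_partition_value(X, f, 9, 10))    # D9  -> 9
print(discrete_partition_value(X, f, 77, 100))  # P77 -> 7
```

With k = 1 and parts = 2 the same function gives the median of a discrete series.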

Continuous series or Grouped frequency distribution

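$Q_i = L + \dfrac{\frac{iN}{4} - \text{c.f.}}{f} \times h$, where i = 1, 2, 3

iN/4 = the size (position) used to locate the i^th quartile class

L = lower limit of the i^th quartile class, f = frequency of the i^th quartile class, h = class size (width) of the i^th quartile class

c.f. = less than c.f. of the class preceding the i^th quartile class.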

Deciles:

Individual series

After arranging the given data in ascending order of magnitude,

$D_j$ = value of the $\left(\dfrac{j(n+1)}{10}\right)^{\text{th}}$ item

where j = 1, 2, 3, ..., 9 and n = no. of observations.

Discrete series

$D_j$ = value of the $\left(\dfrac{j(N+1)}{10}\right)^{\text{th}}$ item, where N = ∑f = total frequency and j = 1, 2, 3, ..., 9.

Continuous series or Grouped frequency distribution

$D_j = L + \dfrac{\frac{jN}{10} - \text{c.f.}}{f} \times h$, where j = 1, 2, 3, ..., 9

where,

jN/10 = the size (position) used to locate the j^th decile class

L = lower limit of the j^th decile class

f = frequency of the j^th decile class, h = class size (width) of the j^th decile class

c.f. = less than c.f. of the class preceding the j^th decile class.

Percentiles:

The variate values which divide the total number of observations into 100 equal parts are called percentiles.

Case I: To find the highest (maximum) value for a bottom group, such as those who failed, the lowest earners, the poorest, the flattest or the shortest.

e.g. The highest income of the poorest 40% of the people is given by the 40^th percentile, i.e. P₄₀.

Case II: To find the limits (range) of a middle group.

e.g. The limits of income of the middle 50% of families are given by the 25^th and 75^th percentiles, i.e. P₂₅ and P₇₅.

Case III: To find the lowest (minimum) value for a top group, such as those who passed, the richest, the highest earners, the longest or the tallest.

e.g. The lowest income of the richest 40% of the people is given by the 60^th percentile, i.e. P₆₀.

Individual series

After arranging the given data in ascending order of magnitude,

$P_k$ = value of the $\left(\dfrac{k(n+1)}{100}\right)^{\text{th}}$ item

where k = 1, 2, 3, ..., 99 and n = no. of observations.

Discrete series

$P_k$ = value of the $\left(\dfrac{k(N+1)}{100}\right)^{\text{th}}$ item

where N = ∑f = total frequency and k = 1, 2, 3, ..., 99.

Continuous series or Grouped frequency distribution

$P_k = L + \dfrac{\frac{kN}{100} - \text{c.f.}}{f} \times h$, where k = 1, 2, 3, ..., 99

where,

kN/100 = the size (position) used to locate the k^th percentile class

L = lower limit of the k^th percentile class, f = frequency of the k^th percentile class, h = class size (width) of the k^th percentile class

c.f. = less than c.f. of the class preceding the k^th percentile class.
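One Python function can cover the grouped formulas for quartiles, deciles and percentiles, since they differ only in kN/4, kN/10 or kN/100. This is a sketch with a hypothetical function name; it assumes exclusive classes described by their lower limits and widths (widths may differ, as in Example 8 below).

```python
def grouped_partition_value(lowers, widths, freqs, k, parts):
    """L + (kN/parts - c.f.) / f * h for the class whose less than c.f.
    first equals or exceeds kN/parts (parts = 4, 10 or 100)."""
    N = sum(freqs)
    target = k * N / parts          # kN/parts
    cf = 0                          # less than c.f. before the current class
    for L, h, f in zip(lowers, widths, freqs):
        if cf + f >= target:        # first class whose c.f. reaches the target
            return L + (target - cf) / f * h
        cf += f
    raise ValueError("k must lie between 0 and parts")


# Example 8 below: classes 10-20, 20-40, 40-70, 70-90, 90-100
lowers = [10, 20, 40, 70, 90]
widths = [10, 20, 30, 20, 10]
freqs = [15, 20, 30, 20, 15]
for k in (30, 60, 80, 25, 75):
    print(f"P{k} =", grouped_partition_value(lowers, widths, freqs, k, 100))
```

For Example 8 this prints 35.0, 65.0, 85.0, 30.0 and 80.0 marks for P₃₀, P₆₀, P₈₀, P₂₅ and P₇₅.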


Measure of Variation (Measures of Dispersion)

The variability or scatter of the items about a central value is called dispersion, and its measure is called the measure of dispersion or the measure of variation.

Thus, measures of dispersion are descriptive statistical measures used to measure the variation, spread, scatter or deviation of data from the central value. They therefore give an idea of the homogeneity or heterogeneity of the distribution.

Measures of Dispersion

The various measures of dispersion are as follows.

1. Range

2. Quartile deviation or Semi-interquartile range

3. Mean deviation or Average deviation.

4. Standard deviation

5. Lorenz curve

6. Gini coefficient

Note: Mean deviation (average deviation), the Lorenz curve and the Gini coefficient are beyond our syllabus.

Range

Range is the simplest of all the measures of dispersion. It is defined as the difference between the largest (maximum) and smallest (minimum) values of the given observations.

For all series

Range (R) = L – S

Where, L = Largest item or observation

S = Smallest item or observation

Its relative measure of dispersion is known as the coefficient of range, given by

Coefficient of range = $\dfrac{L - S}{L + S}$
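A minimal Python sketch of the range and its coefficient, using the waiting-time data of the PU 2014 (Spring) problem below; the function name is illustrative.

```python
def range_and_coefficient(data):
    """Range R = L - S and coefficient of range (L - S) / (L + S)."""
    L, S = max(data), min(data)
    return L - S, (L - S) / (L + S)


# Waiting times (minutes) from the PU 2014 (Spring) problem below
waits = [10, 1, 13, 9, 5, 9, 2, 10, 3, 8, 6, 17, 2, 10, 15]
R, coeff = range_and_coefficient(waits)
print(R, round(coeff, 4))  # 16 0.8889
```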

Quartile Deviation or Semi-interquartile Range (Q.D.)

Quartile deviation is a measure of dispersion based on the upper quartile Q₃ and the lower quartile Q₁. The difference between the upper quartile Q₃ and the lower quartile Q₁ is known as the inter-quartile range.

Inter-quartile range = Q₃-Q₁

The half of the inter-quartile range is called semi-interquartile range, which is also known as quartile deviation.

Quartile Deviation (Q.D.) = $\dfrac{Q_3 - Q_1}{2}$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Note:

— Quartile deviation (Q.D.) is the most suitable measure of dispersion for open-ended classes.
— The smaller the coefficient of Q.D., the greater the uniformity (i.e. the less the variability).
— The greater the coefficient of Q.D., the less the uniformity (i.e. the greater the variability).

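For individual series

Quartile deviation (Q.D.) = $\dfrac{Q_3 - Q_1}{2}$

where $Q_1$ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item,

$Q_3$ = value of the $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ item and n = no. of observations.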

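For discrete series

Quartile deviation (Q.D.) = $\dfrac{Q_3 - Q_1}{2}$

where $Q_1$ = value of the $\left(\dfrac{N+1}{4}\right)^{\text{th}}$ item,

$Q_3$ = value of the $\left(\dfrac{3(N+1)}{4}\right)^{\text{th}}$ item

and N = ∑f = total frequency.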

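For continuous series

Quartile deviation (Q.D.) = $\dfrac{Q_3 - Q_1}{2}$

where

$Q_1 = L + \dfrac{\frac{N}{4} - \text{c.f.}}{f} \times h$ and $Q_3 = L + \dfrac{\frac{3N}{4} - \text{c.f.}}{f} \times h$

(L, f, h and c.f. referring to the respective quartile class), and N = ∑f = total frequency.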

Its relative measure is known as the coefficient of quartile deviation and is given by

Coefficient of quartile deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
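For an individual series, Q.D. and its coefficient can be sketched in Python with the (n+1)-based positions and linear interpolation used in the worked examples. The helper names are hypothetical; the data are those of the PU 2018 (Spring) problem below.

```python
def nth_item(xs, pos):
    """Value of the pos-th item (1-based) of sorted xs, interpolating linearly."""
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return xs[0]
    if whole >= len(xs):
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


def quartile_deviation(data):
    """Q.D. = (Q3 - Q1) / 2 and coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)."""
    xs = sorted(data)
    n = len(xs)
    q1 = nth_item(xs, (n + 1) / 4)
    q3 = nth_item(xs, 3 * (n + 1) / 4)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]  # PU 2018 (Spring) data
qd, coeff = quartile_deviation(accounts)
print(qd, round(coeff, 4))  # 6.0 0.1263
```

Here Q₁ = 41.5 and Q₃ = 53.5; the outlying value 105 does not affect Q.D., which is why Q.D. suits open-ended or extreme data.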

Standard Deviation:

Standard deviation is defined as “the positive square root of the arithmetic mean of the squares of the deviations of the given set of observations from their arithmetic mean.” It is usually denoted by the Greek letter σ (sigma).

Standard deviation is said to be the best measure of dispersion (or ideal measure of dispersion) as it satisfies almost all the requisites (or characteristics) of an ideal or a good measure of dispersion.

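For Individual Series

(i) Direct method: S.D. $(\sigma) = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2}$

(ii) Short-cut method: S.D. $(\sigma) = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$

where d = X - a, a = assumed mean and n = number of observations.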

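For Discrete Series

(i) Direct method: S.D. $(\sigma) = \sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}$

(ii) Short-cut method: S.D. $(\sigma) = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$

where d = X - a, a = assumed mean and N = ∑f = total frequency.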

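For Continuous Series

(i) Direct method: S.D. $(\sigma) = \sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}$

(ii) Short-cut method: S.D. $(\sigma) = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$

(iii) Step-deviation method: S.D. $(\sigma) = \sqrt{\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2} \times h$

where d = X - a, d' = (X - a)/h, X = mid-value, a = assumed mean, N = ∑f = total frequency and h = class size (class width).

Note: For unequal class sizes, h is taken as a common factor.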
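The following Python sketch checks that the direct and step-deviation formulas agree. Treating an individual series as a frequency distribution with every f = 1, and the short-cut method as the step-deviation method with h = 1, are choices of this sketch; the function names are hypothetical. The data come from the T.U. 2017 problem and Example 6 below.

```python
from math import sqrt


def sd_direct(mids, freqs):
    """sigma = sqrt(sum(f X^2)/N - (sum(f X)/N)^2)."""
    N = sum(freqs)
    m1 = sum(f * x for f, x in zip(freqs, mids)) / N
    m2 = sum(f * x * x for f, x in zip(freqs, mids)) / N
    return sqrt(m2 - m1 ** 2)


def sd_step_deviation(mids, freqs, a, h):
    """sigma = sqrt(sum(f d'^2)/N - (sum(f d')/N)^2) * h, with d' = (X - a)/h.
    With h = 1 this is the short-cut method."""
    N = sum(freqs)
    d = [(x - a) / h for x in mids]
    m1 = sum(f * di for f, di in zip(freqs, d)) / N
    m2 = sum(f * di * di for f, di in zip(freqs, d)) / N
    return sqrt(m2 - m1 ** 2) * h


# Individual series = every frequency equal to 1 (T.U. 2017 temperatures)
temps = [78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4]
print(round(sd_direct(temps, [1] * len(temps)), 4))  # 0.6534

# Continuous series: mid-values and frequencies of Example 6
mids = [3.845, 3.945, 4.045, 4.145, 4.245, 4.345, 4.445, 4.545]
freqs = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(sd_direct(mids, freqs), 4))                          # 0.1724
print(round(sd_step_deviation(mids, freqs, a=4.245, h=0.1), 4))  # 0.1724
```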

Variance

The square of the standard deviation is known as the variance. It is denoted by σ², i.e. σ² = V(X), where V(X) = variance of the variable X.

⟹ σ = √V(X)

Coefficient of Variation (C.V.)

The coefficient of standard deviation is $\sigma/\bar{X}$, and 100 times it is called the coefficient of variation. In other words, the coefficient of standard deviation expressed as a percentage is known as the coefficient of variation. Symbolically,

C.V. = $\dfrac{\sigma}{\bar{X}} \times 100\%$

It is a relative measure of dispersion, so it is independent of the units of measurement and is always expressed as a percentage. Therefore, C.V. is well suited for comparing two or more distributions with regard to their variability, consistency, uniformity, homogeneity, equitability, stability, etc.

When two or more distributions (series) are compared by their C.V.:

Less C.V. is considered as	More C.V. is considered as
More consistent	Less consistent
More homogeneous	Less homogeneous
More uniform	Less uniform
More stable	Less stable
More representative to mean	Less representative to mean
More equitable	Less equitable
Less variable	More variable
Less disparity	More disparity
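A short Python sketch of such a comparison, using the standard library's `statistics.pstdev` (population standard deviation, i.e. divisor n as in σ above). The batting scores are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """C.V. = (sigma / X-bar) * 100, in percent (population s.d. as in the text)."""
    return pstdev(data) / mean(data) * 100


# Hypothetical runs scored by two batsmen in five innings
batsman_a = [50, 52, 48, 51, 49]
batsman_b = [10, 90, 40, 70, 40]
cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
print(round(cv_a, 2), round(cv_b, 2))  # 2.83 55.14
print("A is more consistent" if cv_a < cv_b else "B is more consistent")
```

Both batsmen average 50 runs, so the means alone cannot separate them; the smaller C.V. marks A as the more consistent (more uniform, less variable) batsman.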

Sample standard deviation (s):

A standard deviation which is based on sample observations is called sample standard deviation. It is denoted by ‘s’.

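(i) $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

(ii) $s = \sqrt{\dfrac{1}{n - 1}\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)}$

Sample coefficient of variation (C.V.) = $\dfrac{s}{\bar{X}} \times 100\%$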

Sample variance (s²):

The square of sample standard deviation is called sample variance. It is denoted by s².
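Python's standard `statistics` module provides both versions: `pstdev` divides by n (σ) and `stdev` divides by n - 1 (s); `variance` gives s². The sketch below applies them to the T.U. 2017 temperatures from the worked problems.

```python
import math
import statistics

# T.U. 2017 (Spring) temperatures from the worked problems below
data = [78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4]
sigma = statistics.pstdev(data)  # divisor n     -> sigma (population formula)
s = statistics.stdev(data)       # divisor n - 1 -> s (sample formula)
print(round(sigma, 4), round(s, 4))                            # 0.6534 0.7058
print(round(s ** 2, 4), round(statistics.variance(data), 4))  # 0.4981 0.4981

# The two are linked by s = sigma * sqrt(n / (n - 1))
n = len(data)
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))        # True
```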

Combined Standard deviation:

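For two groups (two series)

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

where $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$, $\bar{X}_{12}$ is the combined mean and σ₁, σ₂ are the standard deviations of the groups.

For three groups (three series)

The combined standard deviation is

$\sigma_{123} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + n_3(\sigma_3^2 + d_3^2)}{n_1 + n_2 + n_3}}$

where $d_1 = \bar{X}_1 - \bar{X}_{123}$, $d_2 = \bar{X}_2 - \bar{X}_{123}$, $d_3 = \bar{X}_3 - \bar{X}_{123}$.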
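The formula can be verified in Python by comparing it with the standard deviation of the pooled raw data. The two groups below are hypothetical, and `combined_sd` is an illustrative name; the function accepts any number of groups.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_sd(sizes, means, sds):
    """sigma_combined = sqrt(sum n_i (sigma_i^2 + d_i^2) / sum n_i),
    where d_i = mean_i - combined mean. Works for 2, 3 or more groups."""
    N = sum(sizes)
    xbar = sum(n * m for n, m in zip(sizes, means)) / N   # combined mean
    total = sum(n * (s ** 2 + (m - xbar) ** 2)
                for n, m, s in zip(sizes, means, sds))
    return sqrt(total / N)


# Hypothetical raw data, used only to check the formula against pooling
groups = [[2, 4, 6], [10, 12, 14, 16]]
sizes = [len(g) for g in groups]
means = [mean(g) for g in groups]
sds = [pstdev(g) for g in groups]
pooled = [x for g in groups for x in g]
print(round(combined_sd(sizes, means, sds), 4), round(pstdev(pooled), 4))  # equal values
```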

Five-Number Summary

The five-number summary provides five descriptive measures of the given data set: the smallest value (X_smallest), the first or lower quartile (Q₁), the median (Md or Q₂), the third or upper quartile (Q₃) and the largest value (X_largest). Therefore, the five-number summary is

(X_smallest, Q₁, Median, Q₃, X_largest)

The Box-and-Whisker Plot

A five-number summary can be represented in a diagram known as a box-and-whisker plot. Thus, a box-and-whisker plot is a graphical representation of the data based on the five-number summary: the smallest value, Q₁, Md, Q₃ and the largest value. It is a graphical method of assessing the skewness of the distribution.

The vertical line at the left side of the box marks the location of Q₁, and the vertical line at the right side of the box marks the location of Q₃. Thus, the box contains the middle 50% of the values. The lower 25% of the data are represented by a line (known as a whisker) connecting the left side of the box to the smallest value, X_smallest. Similarly, the upper 25% of the data are represented by a whisker connecting the right side of the box to X_largest.

Comparison	Left skewed	Right skewed	Symmetric
1. Distance from X_smallest to the median versus distance from the median to X_largest	The distance from X_smallest to the median is greater than the distance from the median to X_largest	The distance from X_smallest to the median is less than the distance from the median to X_largest	Both distances are the same
2. Distance from X_smallest to Q₁ versus distance from Q₃ to X_largest	The distance from X_smallest to Q₁ is greater than the distance from Q₃ to X_largest	The distance from X_smallest to Q₁ is less than the distance from Q₃ to X_largest	Both distances are the same
3. Distance from Q₁ to the median versus distance from the median to Q₃	The distance from Q₁ to the median is greater than the distance from the median to Q₃	The distance from Q₁ to the median is less than the distance from the median to Q₃	Both distances are the same
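The five-number summary and the three comparisons in the table above can be sketched in Python. The (n+1)-based quartile positions with linear interpolation follow the worked problems below; the function names are hypothetical, and the data are those of the PU 2018 (Spring) problem.

```python
def nth_item(xs, pos):
    """Value of the pos-th item (1-based) of sorted xs, interpolating linearly."""
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return xs[0]
    if whole >= len(xs):
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


def five_number_summary(data):
    """(X_smallest, Q1, Md, Q3, X_largest) using the (n+1) position rules."""
    xs = sorted(data)
    n = len(xs)
    return (xs[0], nth_item(xs, (n + 1) / 4), nth_item(xs, (n + 1) / 2),
            nth_item(xs, 3 * (n + 1) / 4), xs[-1])


def skew_hint(left, right):
    """Reading of one comparison row: a longer left side suggests left skew."""
    if left > right:
        return "left skewed"
    if left < right:
        return "right skewed"
    return "symmetric"


def box_plot_comparisons(summary):
    lo, q1, md, q3, hi = summary
    pairs = [(md - lo, hi - md),   # 1. X_smallest..Md vs Md..X_largest
             (q1 - lo, hi - q3),   # 2. the two whiskers
             (md - q1, q3 - md)]   # 3. the two halves of the box
    return [skew_hint(left, right) for left, right in pairs]


accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]  # PU 2018 (Spring)
summary = five_number_summary(accounts)
print(summary)                        # (10, 41.5, 47.5, 53.5, 105)
print(box_plot_comparisons(summary))  # ['right skewed', 'right skewed', 'symmetric']
```

When the three rows disagree, the data do not show a clear single direction of skewness.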

Numerical problems

Example 1: Compute the mean, median and mode of the following data.

55	39	45	55	41	35	60
40	55	35	37	55	55	65

Solution:

Arranging given data in ascending order:

X: 35, 35, 37, 39, 40, 41, 45, 55, 55, 55, 55, 55, 60, 65

Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{672}{14}$

= 48

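Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{14+1}{2}\right)^{\text{th}}$ item

= value of the 7.5^th item

= $\dfrac{7^{\text{th}} \text{ item} + 8^{\text{th}} \text{ item}}{2} = \dfrac{45 + 55}{2}$

= 50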

Mode (M_o) = Value of variable X which repeats maximum number of times

= 55

Example 2: Find Q₁, D₃ and P₆₅ from the given data: 8, 6, 5, 4, 10, 15, 3, 16

Solution

Here, the number of observations is n = 8.

First, the data are arranged in ascending order: 3, 4, 5, 6, 8, 10, 15, 16.

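Q₁ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{8+1}{4}\right)^{\text{th}}$ item

= value of the 2.25^th item

= 2^nd item + 0.25 (3^rd item - 2^nd item)

Q₁ = 4 + 0.25 (5 - 4) = 4.25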

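D₃ = value of the $\left(\dfrac{3(n+1)}{10}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(8+1)}{10}\right)^{\text{th}}$ item

= value of the 2.7^th item

= value of the 2^nd item + 0.7 (3^rd item - 2^nd item)

D₃ = 4 + 0.7 (5 - 4) = 4.7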

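P₆₅ = value of the $\left(\dfrac{65(n+1)}{100}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{65(8+1)}{100}\right)^{\text{th}}$ item

= value of the 5.85^th item

= value of the 5^th item + 0.85 (6^th item - 5^th item)

= 8 + 0.85 (10 - 8) = 9.7

P₆₅ = 9.7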

Example 3: The numbers of telephone calls received at an exchange in 200 successive one-minute intervals are given below.

No. of calls	0	1	2	3	4	5	6	Total
Frequency	15	22	28	35	42	34	24	200

Compute the mean, median and mode.

Solution:

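No. of calls (X)	Frequency (f)	Less than c.f.	fX
0	15	15	0
1	22	37	22
2	28	65	56
3	35	100	105
4	42	142	168
5	34	176	170
6	24	200	144
	N = ∑f = 200		∑fX = 665

Mean, $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{665}{200}$

= 3.325

Median (Md) = value of the $\left(\dfrac{N+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{200+1}{2}\right)^{\text{th}}$ item

= value of the 100.5^th item

The c.f. just greater than 100.5 is 142, so

Md = 4

Mode (Mo) = value of variable X corresponding to the maximum frequency (42)

= 4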

Example 4: Find the upper quartile and the upper decile from the given data. Also obtain P₇₇.

X	1	2	3	4	5	6	7	8	9	10	11
f	2	5	8	10	12	8	6	4	3	2	1

Solution

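Calculation of partition values

X	1	2	3	4	5	6	7	8	9	10	11
f	2	5	8	10	12	8	6	4	3	2	1
Less than c.f.	2	7	15	25	37	45	51	55	58	60	61

N = ∑f = 61

For Q₃,

Q₃ = value of the $\left(\dfrac{3(N+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(61+1)}{4}\right)^{\text{th}}$ item

= value of the 46.5^th item. The c.f. just greater than 46.5 is 51.

So, the upper quartile Q₃ = 7.

For D₉,

D₉ = value of the $\left(\dfrac{9(N+1)}{10}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{9(61+1)}{10}\right)^{\text{th}}$ item

= value of the 55.8^th item. The c.f. just greater than 55.8 is 58.

So, D₉ = 9.

For P₇₇,

P₇₇ = value of the $\left(\dfrac{77(N+1)}{100}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{77(61+1)}{100}\right)^{\text{th}}$ item

= value of the 47.74^th item. The c.f. just greater than 47.74 is 51.

So, P₇₇ = 7.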

Example 5: The lengths of power failures (in minutes) are recorded in the following table.

Power Failure time	22	23	24	25	26	27	28	Total
Frequency	2	5	7	10	4	3	2	33

Find Q₃, D₂ and P₄₀ and interpret the results.

Solution:

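Power failure time (X)	Frequency (f)	Less than c.f.
22	2	2
23	5	7
24	7	14
25	10	24
26	4	28
27	3	31
28	2	33

N = ∑f = 33

For Q₃,

Q₃ = value of the $\left(\dfrac{3(N+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(33+1)}{4}\right)^{\text{th}}$ item

= value of the 25.5^th item. The c.f. just greater than 25.5 is 28.

So, the upper quartile Q₃ = 26 minutes.

For D₂,

D₂ = value of the $\left(\dfrac{2(N+1)}{10}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{2(33+1)}{10}\right)^{\text{th}}$ item

= value of the 6.8^th item

The c.f. just greater than 6.8 is 7.

So, D₂ = 23 minutes.

For P₄₀,

P₄₀ = value of the $\left(\dfrac{40(N+1)}{100}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{40(33+1)}{100}\right)^{\text{th}}$ item

= value of the 13.6^th item

The c.f. just greater than 13.6 is 14.

P₄₀ = 24 minutes

Interpretation: about 75% of the power failures lasted 26 minutes or less, about 20% lasted 23 minutes or less, and about 40% lasted 24 minutes or less.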

Example 6: The lengths (in meters) of 108 VGA cables used in a company were measured to the nearest 0.01 meter, and the results are given below.

Length in meter	Frequency	Length in meter	Frequency
3.80-3.89	3	4.20-4.29	28
3.90-3.99	8	4.30-4.39	18
4.00-4.09	14	4.40-4.49	10
4.10-4.19	19	4.50-4.59	8

Find the value of mean, mode and median.

Solution:

Correction factor = $\dfrac{3.90 - 3.89}{2}$ = 0.005

Length in meter	Frequency (f)	Less than c.f.	Mid-value (X)	fX
3.80-3.89	3	3	3.845	11.535
3.90-3.99	8	11	3.945	31.56
4.00-4.09	14	25	4.045	56.63
4.10-4.19	19	44	4.145	78.755
4.20-4.29	28	72	4.245	118.86
4.30-4.39	18	90	4.345	78.21
4.40-4.49	10	100	4.445	44.45
4.50-4.59	8	108	4.545	36.36
	N = ∑f = 108			∑fX = 456.36

Mean $(\bar{X}) = \dfrac{\sum fX}{N} = \dfrac{456.36}{108}$

≈ 4.226 meters

For Mode:

Since the given frequency distribution is regular and unimodal with maximum frequency 28, the modal class is 4.20-4.29, whose exclusive class is 4.195-4.295.

L = 4.195, h = 0.1, f₁ = 28, f₀ = 19, f₂ = 18

Mode (Mo) = $L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

= $4.195 + \dfrac{28 - 19}{2 \times 28 - 19 - 18} \times 0.1$

≈ 4.24 meters

For median (Md):

N/2 = 108/2 = 54, and the c.f. just greater than 54 is 72.

∴ The median class is 4.20-4.29, whose exclusive class is 4.195-4.295. L = 4.195, f = 28, h = 0.1, c.f. = 44

Median (Md) = $L + \dfrac{\frac{N}{2} - \text{c.f.}}{f} \times h$

= $4.195 + \dfrac{54 - 44}{28} \times 0.1$

≈ 4.23 meters

Example 7: The percentage age distribution of the urban male population of Nepal from the 2011 census is given below:

Age group	Male population (%)	Age group	Male population (%)
0-4	11.8	35-39	6.2
5-9	12.9	40-44	4.7
10-14	12.5	45-49	4.0
15-19	11.2	50-54	2.9
20-24	10.7	55-59	2.3
25-29	8.9	60 and above	4.7
30-34	7.2		

Compute the first and third quartiles, 8^th decile and 70^th percentile.

Solution:

Correction factor = $\dfrac{5 - 4}{2}$ = 0.5

Age group	Male population (f)	Less than c.f.
0-4	11.8	11.8
5-9	12.9	24.7
10-14	12.5	37.2
15-19	11.2	48.4
20-24	10.7	59.1
25-29	8.9	68.0
30-34	7.2	75.2
35-39	6.2	81.4
40-44	4.7	86.1
45-49	4.0	90.1
50-54	2.9	93.0
55-59	2.3	95.3
60 & above	4.7	100

N = ∑f = 100

For the lower or first quartile (Q₁)

N/4 = 25, so Q₁ lies in class 10-14, whose exclusive (adjusted) class is 9.5-14.5.

L = 9.5, f = 12.5, c.f. = 24.7, h = 5

$Q_1 = L + \dfrac{\frac{N}{4} - \text{c.f.}}{f} \times h = 9.5 + \dfrac{25 - 24.7}{12.5} \times 5$

= 9.62

For the third quartile (Q₃)

3N/4 = 75, so Q₃ lies in class 30-34, whose exclusive (adjusted) class is 29.5-34.5.

L = 29.5, f = 7.2, c.f. = 68, h = 5

$Q_3 = 29.5 + \dfrac{75 - 68}{7.2} \times 5$

= 34.36

8^th decile (D₈)

8N/10 = 80, so D₈ lies in class 35-39, whose exclusive class is 34.5-39.5.

L = 34.5, f = 6.2, c.f. = 75.2, h = 5

$D_8 = 34.5 + \dfrac{80 - 75.2}{6.2} \times 5$

= 38.37

70^th percentile (P₇₀)

70N/100 = 70, so P₇₀ lies in class 30-34, whose exclusive class is 29.5-34.5.

L = 29.5, f = 7.2, c.f. = 68, h = 5

$P_{70} = 29.5 + \dfrac{70 - 68}{7.2} \times 5$

= 30.89

Example 8: The marks distribution of 100 students of a college is as follows.

Marks	10-20	20-40	40-70	70-90	90-100
No. of students	15	20	30	20	15

(i) Find the highest mark of the weakest 30% of the students.

(ii) Find the lowest mark of top 40 % of the students.

(iii) Find the lowest marks of top 20% of the students.

(iv) Find the limits and range of marks of the middle 50% of students.

Solution:

Marks	No. of students (f)	Less than c.f.
10-20	15	15
20-40	20	35
40-70	30	65
70-90	20	85
90-100	15	100

N = ∑f = 100

(i) The highest mark of the weakest 30% of the students is given by P₃₀.

30^th percentile (P₃₀)

30N/100 = 30, so P₃₀ lies in class 20-40.

L = 20, f = 20, c.f. = 15, h = 20

$P_{30} = L + \dfrac{\frac{30N}{100} - \text{c.f.}}{f} \times h = 20 + \dfrac{30 - 15}{20} \times 20$

= 35 marks

(ii) The lowest mark of the top 40% of the students is given by P₆₀.

60^th percentile (P₆₀)

60N/100 = 60, so P₆₀ lies in class 40-70.

L = 40, f = 30, c.f. = 35, h = 30

$P_{60} = 40 + \dfrac{60 - 35}{30} \times 30$

= 65 marks

(iii) The lowest mark of the top 20% of the students is given by P₈₀.

80^th percentile (P₈₀)

80N/100 = 80, so P₈₀ lies in class 70-90.

L = 70, f = 20, c.f. = 65, h = 20

$P_{80} = 70 + \dfrac{80 - 65}{20} \times 20$

= 85 marks

(iv) The limits of marks of the middle 50% of students are given by P₂₅ and P₇₅.

25^th percentile (P₂₅)

25N/100 = 25, so P₂₅ lies in class 20-40.

L = 20, f = 20, c.f. = 15, h = 20

$P_{25} = 20 + \dfrac{25 - 15}{20} \times 20$

= 30 marks

75^th percentile (P₇₅)

75N/100 = 75, so P₇₅ lies in class 70-90.

L = 70, f = 20, c.f. = 65, h = 20

$P_{75} = 70 + \dfrac{75 - 65}{20} \times 20$

= 80 marks

Lower limit, P₂₅ = 30 marks

Upper limit, P₇₅ = 80 marks

Range = P₇₅ - P₂₅ = 80 - 30 = 50 marks

T.U. 2017 (Spring)

1. (b) The temperature in a chemical reactor was measured every half hour under the same conditions. The results were 78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4. Calculate the mean, median, lower quartile, upper quartile, standard deviation and coefficient of variation.

Solution: Arranging the given data in ascending order of magnitude:

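Temperature (X)	X²
78.1	6099.61
78.3	6130.89
78.8	6209.44
78.9	6225.21
79.2	6272.64
79.4	6304.36
80.2	6432.04
∑X = 552.9	∑X² = 43674.19

Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{552.9}{7}$

≈ 78.986

Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{7+1}{2}\right)^{\text{th}}$ item

= value of the 4^th item

= 78.9

Mode (M_o) = value of variable X which repeats the maximum number of times

= no mode (no value repeats)

Lower quartile, Q₁ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{7+1}{4}\right)^{\text{th}}$ item

= value of the 2^nd item = 78.3

Upper quartile, Q₃ = value of the $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(7+1)}{4}\right)^{\text{th}}$ item

= value of the 6^th item = 79.4

Standard deviation, $\sigma = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2} = \sqrt{\dfrac{43674.19}{7} - \left(\dfrac{552.9}{7}\right)^2}$

= √(6239.17 - 6238.743)

≈ 0.6534

Coefficient of variation, C.V. = $\dfrac{\sigma}{\bar{X}} \times 100\% = \dfrac{0.6534}{78.986} \times 100\%$

≈ 0.8272%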

PU 2018 (Spring)

1. (a) The following data set represents the number of new computer accounts registered during ten consecutive days.

43, 37, 50, 51, 58, 105, 52, 45, 45, 10

i. Compute the mean, median and standard deviation.

ii. Draw a box-and-whisker plot and identify whether the distribution is skewed.

Solution:

Arranging the given data in ascending order of magnitude.

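No. of new computer accounts (X)	X²
10	100
37	1369
43	1849
45	2025
45	2025
50	2500
51	2601
52	2704
58	3364
105	11025
∑X = 496	∑X² = 29562

(i) Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{496}{10}$

= 49.6

Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{10+1}{2}\right)^{\text{th}}$ item

= value of the 5.5^th item

= $\dfrac{5^{\text{th}} \text{ item} + 6^{\text{th}} \text{ item}}{2} = \dfrac{45 + 50}{2}$ = 47.5

OR, equivalently,

= value of the 5^th item + 0.5 (6^th item - 5^th item)

= 45 + 0.5 (50 - 45)

= 45 + 0.5 × 5

= 47.5

Mode (M_o) = value of variable X which repeats the maximum number of times = 45

Standard deviation, $\sigma = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2} = \sqrt{\dfrac{29562}{10} - (49.6)^2}$

= √(2956.2 - 2460.16)

= √496.04

≈ 22.272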

(ii) To construct the box-and-whisker plot:

First, we find the five-number summary.

Smallest value = 10

Largest value = 105

Lower quartile, Q₁ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{10+1}{4}\right)^{\text{th}}$ item

= value of the 2.75^th item

= value of the 2^nd item + 0.75 (3^rd item - 2^nd item)

= 37 + 0.75 (43 - 37)

= 37 + 0.75 × 6

= 41.5

Upper quartile, Q₃ = value of the $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(10+1)}{4}\right)^{\text{th}}$ item

= value of the 8.25^th item

= value of the 8^th item + 0.25 (9^th item - 8^th item)

= 52 + 0.25 (58 - 52)

= 52 + 0.25 × 6

= 53.5

Hence, the five-number summary (smallest, Q₁, Md, Q₃, largest) is

(10, 41.5, 47.5, 53.5, 105)

(i) Length of left whisker (i.e. the distance from the smallest value to Q₁) = 41.5 - 10 = 31.5

Length of right whisker (i.e. the distance from Q₃ to the largest value) = 105 - 53.5 = 51.5

(ii) The distance from the smallest value to Md = 47.5 - 10 = 37.5

The distance from Md to the largest value = 105 - 47.5 = 57.5

Since the length of the left whisker < the length of the right whisker, and the distance from the smallest value to Md < the distance from Md to the largest value, the distribution is positively skewed (i.e. right skewed).

PU 2017 (Spring)

Q.No.1 (a) Over a period of 40 days the percentage relative humidity in a vegetable storage building was measured. Mean daily values were recorded as shown below:

60	63	64	71	67	73	79	80	83	81
86	90	96	98	98	99	89	80	77	78
71	79	74	84	85	82	90	78	79	79
78	80	82	83	86	81	80	76	66	74

(i) Prepare a stem and leaf display for these data. Show the leaves sorted in order of increasing magnitude on each stem.

(ii) Draw a box plot for these data and interpret the data in a practical manner.

Solution:

(i) Arranging the given data in ascending order of magnitude:

Percentage relative humidity(X):

60, 63, 64, 66, 67, 71, 71, 73, 74, 74, 76, 77, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 82, 82, 83, 83, 84, 85, 86, 86, 89, 90, 90, 96, 98, 98, 99

Stem and leaf display

Stem	Leaves
6	0 3 4 6 7
7	1 1 3 4 4 6 7 8 8 8 9 9 9 9
8	0 0 0 0 1 1 2 2 3 3 4 5 6 6 9
9	0 0 6 8 8 9

The stem-and-leaf display shows the ordered values from the smallest to the largest (i.e. the leaves sorted in increasing order on each stem) and shows where the data are concentrated.
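The display above can be reproduced with a short Python sketch. It assumes two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit (suits two-digit whole numbers)."""
    rows = defaultdict(list)
    for x in sorted(data):          # sorting first keeps each row's leaves in order
        rows[x // 10].append(x % 10)
    return dict(sorted(rows.items()))


humidity = [60, 63, 64, 71, 67, 73, 79, 80, 83, 81,
            86, 90, 96, 98, 98, 99, 89, 80, 77, 78,
            71, 79, 74, 84, 85, 82, 90, 78, 79, 79,
            78, 80, 82, 83, 86, 81, 80, 76, 66, 74]
for stem, leaves in stem_and_leaf(humidity).items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
```

The printed rows match the stems 6 to 9 and their sorted leaves shown above.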

(ii) To construct the box-and-whisker plot (box plot):

First, we find the five-number summary.

Smallest value = 60, Largest value = 99

Q₁ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{40+1}{4}\right)^{\text{th}}$ item

= value of the 10.25^th item

= value of the 10^th item + 0.25 (11^th item - 10^th item)

= 74 + 0.25 (76 - 74)

= 74 + 0.25 × 2

= 74.5

Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{40+1}{2}\right)^{\text{th}}$ item

= value of the 20.5^th item

= value of the 20^th item + 0.5 (21^st item - 20^th item)

= 80 + 0.5 (80 - 80)

= 80 + 0.5 × 0

= 80

Upper quartile, Q₃ = value of the $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(40+1)}{4}\right)^{\text{th}}$ item

= value of the 30.75^th item

= value of the 30^th item + 0.75 (31^st item - 30^th item)

= 84 + 0.75 (85 - 84)

= 84 + 0.75 × 1

= 84.75

Hence, the five-number summary (smallest, Q₁, Md, Q₃, largest) is (60, 74.5, 80, 84.75, 99).

(i) Length of left whisker (i.e. the distance from the smallest value to Q₁) = 74.5 - 60 = 14.5

Length of right whisker (i.e. the distance from Q₃ to the largest value) = 99 - 84.75 = 14.25

(ii) The distance from the smallest value to Md = 80 - 60 = 20

The distance from Md to the largest value = 99 - 80 = 19

Since the length of the left whisker > the length of the right whisker, and the distance from the smallest value to Md > the distance from Md to the largest value, the distribution is negatively skewed (i.e. left skewed). In practical terms, most of the daily relative humidity values in the vegetable storage building are concentrated at the higher end, while a few comparatively low values form a longer tail on the left.

PU 2014 (Spring)

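Q.No.2: The following are the numbers of minutes that a person had to wait for the bus to work on 15 working days:

10, 1, 13, 9, 5, 9, 2, 10, 3, 8, 6, 17, 2, 10, 15

Draw a box plot and interpret the result.

Solution: To construct the box-and-whisker plot (box plot), we first find the five-number summary.

Arranging the given data in ascending order of magnitude:

1, 2, 2, 3, 5, 6, 8, 9, 9, 10, 10, 10, 13, 15, 17

Smallest value = 1 minute

Largest value = 17 minutes

Q₁ = value of the $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{15+1}{4}\right)^{\text{th}}$ item

= value of the 4^th item = 3

Median (Md) = value of the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{15+1}{2}\right)^{\text{th}}$ item

= value of the 8^th item

= 9

Q₃ = value of the $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ item

= value of the $\left(\dfrac{3(15+1)}{4}\right)^{\text{th}}$ item

= value of the 12^th item

= 10

Hence, the five-number summary (smallest, Q₁, Md, Q₃, largest) is (1, 3, 9, 10, 17).

(i) Length of left whisker (i.e. the distance from the smallest value to Q₁) = 3 - 1 = 2

Length of right whisker (i.e. the distance from Q₃ to the largest value) = 17 - 10 = 7

(ii) The distance from the smallest value to Md = 9 - 1 = 8

The distance from Md to the largest value = 17 - 9 = 8

Since the length of the left whisker < the length of the right whisker, while the distance from the smallest value to Md = the distance from Md to the largest value, the distribution is not symmetrical. Half of the waiting times were 9 minutes or less, and the middle 50% of the waiting times lie between 3 and 10 minutes.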

PU 2016(Fall)

1. (b) A random sample was taken of the thickness of insulation in transformer windings, and the following thicknesses (in millimetres) were recorded:

18 21 22 29 25 31 37 38 41 39 44 48 54 56 56 57 47 38 35 36 29 37 32 42 43 40 48 36 37 37

(i) Prepare a stem-and-leaf display for these data.

(ii) Prepare a box plot for these data.

Solution: Arranging the given data in ascending order of magnitude:

18, 21, 22, 25, 29, 29, 31, 32, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 40, 41, 42, 43, 44, 47, 48, 48, 54, 56, 56, 57

(i) Stem and leaf display

Stem	Leaves
1	8
2	1 2 5 9 9
3	1 2 5 6 6 7 7 7 7 8 8 9
4	0 1 2 3 4 7 8 8
5	4 6 6 7
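A stem-and-leaf display like the one above can be generated mechanically. This is a minimal Python sketch assuming non-negative whole-number data, with the tens digit as the stem and the units digit as the leaf; the function names are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Build {stem: [leaves]} for non-negative whole numbers.

    Stem = tens digit(s), leaf = units digit; leaves are kept in ascending order.
    """
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(int(value), 10)
        groups[stem].append(leaf)
    # Keep every stem between the smallest and largest, even if it has no leaves
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}


def print_display(display):
    print("Stem\tLeaves")
    for stem, leaves in display.items():
        print(f"{stem}\t{' '.join(map(str, leaves))}")


thickness = [18, 21, 22, 29, 25, 31, 37, 38, 41, 39, 44, 48, 54, 56, 56,
             57, 47, 38, 35, 36, 29, 37, 32, 42, 43, 40, 48, 36, 37, 37]
print_display(stem_and_leaf(thickness))
```

Keeping empty stems matters for data with gaps, because a missing stem row hides the gap in the shape of the display.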

(ii) To construct the box-and-whisker plot (box plot), we first find the five-number summary.

Smallest value = 18 millimetres

Largest value = 57 millimetres

Q₁ = Value of ((n + 1)/4)^th item

= Value of ((30 + 1)/4)^th item

= Value of 7.75^th item

= Value of 7^th item + 0.75 (8^th item - 7^th item )

= 31 + 0.75 (32-31)

= 31 + 0.75 × 1

= 31.75 millimetres

Median (Md) = Value of ((n + 1)/2)^th item

= Value of ((30 + 1)/2)^th item

= Value of 15.5^th item

= Value of 15^th item + 0.5 (16^th item - 15^th item )

= 37 + 0.5 (38-37)

= 37+ 0.5 × 1 = 37.5 millimetres

Q₃ = Value of (3(n + 1)/4)^th item

= Value of (3 × (30 + 1)/4)^th item

= Value of 23.25^th item

= Value of 23^rd item + 0.25 (24^th item - 23^rd item)

= 44 + 0.25 (47-44)

= 44 + 0.25 × 3

= 44.75 millimetres

Hence, the five-number summary (smallest, Q₁, Md, Q₃, largest) is (18, 31.75, 37.5, 44.75, 57).

(i) Length of left whisker (i.e. the distance from the smallest value to Q₁) = 31.75 - 18 = 13.75

Length of right whisker (i.e. the distance from Q₃ to the largest value) = 57 - 44.75 = 12.25

(ii) The distance from the smallest value to the Md = 37.5 - 18 = 19.5

The distance from the Md to the largest value = 57 - 37.5 = 19.5

Since the length of the left whisker > the length of the right whisker, but the distance from the smallest value to the Md = the distance from the Md to the largest value, the distribution is slightly left-skewed (i.e. it is not symmetrical).

PU 2018 (Fall)

1 (a) An investigator wants to study the speed of cars on the Araniko Highway. He recorded the speeds of 30 vehicles as follows:

35, 37, 42, 45, 47, 48, 50, 55, 67, 70, 75, 80, 90, 95, 94, 48, 55, 60, 71, 63, 70, 65, 80, 55, 40, 35, 36, 85, 79, 30.

(i) Present the above data in a stem-and-leaf display.

(ii) Construct a continuous frequency distribution using Sturges' rule.

(iii) Construct the cumulative frequency curve (ogive) and find the median speed, the speed of the first 25% of vehicles and the speed of the first 75% of vehicles; also compute the percentage of vehicles whose speed lies between 40 and 70 km.

Solution: Arranging the given data in ascending order of magnitude:

Speed (in km) X: 30, 35, 35, 36, 37, 40, 42, 45, 47, 48, 48, 50, 55, 55, 55, 60, 63, 65, 67, 70, 70, 71, 75, 79, 80, 80, 85, 90, 94, 95

Stem and leaf display

Stem	Leaves
3	0 5 5 6 7
4	0 2 5 7 8 8
5	0 5 5 5
6	0 3 5 7
7	0 0 1 5 9
8	0 0 5
9	0 4 5

(ii) Since the class size (h) is not given, we first find the approximate number of class intervals (k) and the class size (h).

Number of observations, n = 30

S = smallest value = 30 L = Largest value = 95

By Sturges' formula,

Number of classes, k = 1 + 3.322 log n

= 1 + 3.322 log 30

= 1 + 3.322 × 1.4771

= 5.9069 ≈ 6

Class width or class size, h = (L - S)/k = (95 - 30)/6 = 10.83 ≈ 11

Continuous frequency distribution:

Speed (in km)	Tally bar	Frequency (f)
30 – 41	|||| |	6
41 – 52	|||| |	6
52 – 63	||||	4
63 – 74	|||| |	6
74 – 85	||||	4
85 – 96	||||	4
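Sturges' rule, the class width and the tally step can be sketched in Python as follows. The sketch assumes exclusive classes (lower limit included, upper limit excluded), as in the tables of these notes; the helper names are illustrative.

```python
import math


def sturges_classes(n):
    """Approximate number of classes by Sturges' rule: k = 1 + 3.322 log10(n)."""
    return round(1 + 3.322 * math.log10(n))


def class_frequencies(data, start, width, k):
    """Frequencies of k classes [start, start + width), [start + width, ...), ...

    A value equal to an upper limit is counted in the next class, which is how
    63 falls in 63 - 74 rather than in 52 - 63.
    """
    counts = [0] * k
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts


speeds = [35, 37, 42, 45, 47, 48, 50, 55, 67, 70, 75, 80, 90, 95, 94,
          48, 55, 60, 71, 63, 70, 65, 80, 55, 40, 35, 36, 85, 79, 30]
k = sturges_classes(len(speeds))                 # 6
h = (max(speeds) - min(speeds)) / k              # 10.83..., taken as 11
print(k, round(h, 2))
print(class_frequencies(speeds, start=30, width=11, k=k))  # [6, 6, 4, 6, 4, 4]
```

Rounding h up (here to 11) rather than down guarantees that k classes cover the whole range from the smallest to the largest value.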

Less than cumulative frequency distribution

Speed (in km)	Less than c.f.
Less than 41	6
Less than 52	12
Less than 63	16
Less than 74	22
Less than 85	26
Less than 96	30

Median speed, Md = 58 km

The speed of first 25% vehicles is given by

P₂₅ = Q₁ = 44 km

& the speed of first 75% vehicles is given by

P₇₅= Q₃ = 76 km

The number of vehicles whose speed lies between 40 and 70 km

= 4 + 5 + 5

= 14

∴ The percentage of vehicles whose speed lies between 40 and 70 km

= (14/30) × 100%

= 46.67%

Note: Q₁ = P₂₅, Md = Q₂ = D₅ = P₅₀, Q₃ = P₇₅

PU 2018 (Spring)

Q.No.1.(b): After the implementation of an economic program to uplift the economic condition of a community, the following information was found.

Monthly income (Rs. 000)	4-6	6-8	8-10	10-12	12-14	14-16	16-18
After the plan (no. of families)

Construct an ogive and use it to find:

(i) the number of families whose monthly income is between Rs. 8,000 and Rs. 14,000;

(ii) the number of families whose monthly income is above Rs. 12,000.

Solution: Less than cumulative frequency distribution

Monthly income (Rs. 000)	Less than c.f.
Less than 6	8
Less than 8	73
Less than 10	110
Less than 12	125
Less than 14	140
Less than 16	145
Less than 18	150

(i) The number of families whose monthly income is between Rs. 8,000 to Rs. 14,000 = 9+ 20+20+20+1 = 70

(ii) The number of families whose monthly income is above Rs. 12,000 = 17 +10 =27 OR

The number of families whose monthly income is above Rs. 12,000

= 150-123

= 27

PU 2015 (Spring)

Q.No.1 (a): The following table shows the lengths (in metres) of eighty bally bridges:

68	84	73	82	68	90	62	88	76	93
73	79	75	73	60	93	71	59	85	75
61	65	88	87	74	62	95	78	63	72
66	78	75	75	94	77	69	74	68	60
96	78	82	61	75	95	60	79	83	71
79	62	89	97	78	85	76	65	71	75
65	80	67	57	88	78	62	76	53	74
86	67	73	81	72	63	76	75	85	77

With the reference of above table.

(i) Construct the grouped frequency distribution having class width 10.

(ii) Draw the less than ogive and the more than ogive on the same graph and hence locate the median.

(iii) With the help of the less than ogive, find the number of bridges having length less than 65 metres.

Solution: (i) Grouped frequency distribution having class width 10

Length of bally bridge	Tally bar	Frequency (f)
50-60	|||	3
60-70	|||| |||| |||| |||| |	21
70-80	|||| |||| |||| |||| |||| |||| |||	33
80-90	|||| |||| ||||	15
90-100	|||| |||	8
		N = ∑ f = 80

(ii) Less than and more than cumulative frequency distribution

Length of bally bridge	Less than c.f.	Length of bally bridge	More than c.f.
Less than 60	3	More than 50	80
Less than 70	24	More than 60	77
Less than 80	57	More than 70	56
Less than 90	72	More than 80	23
Less than 100	80	More than 90	8

From ogive curve median (Md) = 75 metres

(iii) With the help of the less than ogive, the number of bridges having length less than 65 metres = 10 + 3 = 13 bridges

PU 2017 (Fall)

Q.No.1(a): Following data represents the tensile strength of steel-rod manufactured by company A located at Biratnagar.

65	36	49	84	79	56	28	43	67	36
43	78	37	40	68	72	70	55	62	82
88	50	60	56	57	46	39	57	22	65
59	48	76	74	80	69	51	40	56	45
35	21	62	52	63	32	86	64	53	34

Construct a frequency distribution and represent the data by means of a cumulative frequency curve. Identify the median and the first quartile from the curve. Also interpret the first quartile.

Solution:

Since the class size (h) is not given, we first find the approximate number of class intervals (k) and the class size (h).

Number of observations, n = 50

S = smallest value = 21 L = Largest value = 88

By Sturges' formula,

Number of classes, k = 1 + 3.322 log n

= 1 + 3.322 log 50

= 1 + 3.322 × 1.6990

= 6.644 ≈ 7

Class width or class size, h = (L - S)/k = (88 - 21)/7 = 9.57 ≈ 10

Frequency distribution

Tensile strength of steel rod	Tally bar	Frequency (f)
20 – 30	|||	3
30 – 40	|||| ||	7
40 – 50	|||| |||	8
50 – 60	|||| |||| |	11
60 – 70	|||| ||||	10
70 – 80	|||| |	6
80 – 90	||||	5
		N = ∑ f = 50

Less than cumulative frequency distribution

Tensile strength of steel rod	Less than c.f.
Less than 30	3
Less than 40	10
Less than 50	18
Less than 60	29
Less than 70	39
Less than 80	45
Less than 90	50

Less than ogive curve (or Less than cumulative frequency curve)

From ogive curve,

Median (Md) = 59 First quartile (Q₁) = 44

Thus, the first quartile (Q₁ = 44) indicates that 25% of the steel rods have a tensile strength below 44, and 75% have a tensile strength above 44.

PU 2013 (Fall)

(1) (a) The following information shows the daily wages of workers of a certain hydropower company. Prepare a suitable ogive that helps to answer the following questions.

Daily wages	0-20	20-40	40-60	60-80	80-100
No. of workers	41	51	64	38	7

(i) Above what wage do 50% of the workers earn?

(ii) What are the daily wage limits of the middle 30% of the workers?

Additional questions:

(iii) If 20% of the workers are the lowest earners, find the highest wage among them.

(iv) If 10% of the workers are the highest earners, find the lowest wage among them.

Solution:

Less than cumulative frequency distribution

Daily wages	No. of workers (Less than c.f.)
Less than 20	41
Less than 40	92
Less than 60	156
Less than 80	194
Less than 100	201

(i) From ogive curve,

The wage above which 50% of the workers earn is the median, Md = Rs. 42.

(ii) The daily wage limits of middle 30% workers are given by P₃₅ and P₆₅

From ogive curve,

P₃₅ = Rs. 28

P₆₅ = Rs.52

Lower limit, P₃₅ = Rs. 28

Upper limit, P₆₅ = Rs.52

(iii) The highest wage of 20% lowest earners (workers) is given by P₂₀

From ogive curve,

P₂₀ = Rs. 19

(iv) The lowest wage of highest 10 % of the workers is given by P₉₀.

∴ P₉₀ = Rs. 73

PU 2013 (Fall)

1. (b) The following information shows the income distribution.

Income ($’000)	0-10	10-20	20-30	30-40	40-50	50-60
No. of persons	5	10	18	23	7	6

Construct a less than ogive. Also use it to find (i) the number of persons having income less than $35,000 and (ii) the percentage of persons having income between $20,000 and $50,000.

Additional:

(iii) The percentage of persons having income more than $40,000

Solution:

Less than cumulative frequency distribution

Income ($ 000)	No. of persons(Less than c.f.)
Less than 10	5
Less than 20	15
Less than 30	33
Less than 40	56
Less than 50	63
Less than 60	69

(i) The number of persons having income less than $35000 = 10 +10+10+10+ 5 = 45

(ii) The number of persons having income between $20,000 and $50,000 = (less than c.f. at 50) - (less than c.f. at 20) = 63 - 15 = 48

∴ The percentage of persons having income between $20,000 and $50,000

= (48/69) × 100%

= 69.57%

(iii) The number of persons having income more than $40,000

= 69 - (less than c.f. at 40)

= 69 - 56

= 13

∴ The percentage of persons having income more than $40,000

= (13/69) × 100%

= 18.84%
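Reading a less than ogive amounts to joining the plotted (class boundary, c.f.) points with straight lines and interpolating. The sketch below, with a hypothetical helper `ogive_cf`, does this for the income data; a hand-drawn smooth curve can give slightly different readings between class boundaries, while at the boundaries themselves it simply returns the tabulated c.f.

```python
def ogive_cf(boundaries, less_than_cf, x):
    """Less than c.f. at x, reading the ogive as straight lines between points.

    boundaries   : class boundaries [b0, b1, ..., bk]
    less_than_cf : c.f. at each boundary [0, F1, ..., N] (0 at the lowest boundary)
    """
    points = list(zip(boundaries, less_than_cf))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (x - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("x lies outside the range of the ogive")


income = [0, 10, 20, 30, 40, 50, 60]      # class boundaries in $'000
cf = [0, 5, 15, 33, 56, 63, 69]           # less than c.f. at each boundary
N = cf[-1]

below_35 = ogive_cf(income, cf, 35)                            # 44.5, about 45 persons
between = ogive_cf(income, cf, 50) - ogive_cf(income, cf, 20)  # persons from $20,000 to $50,000
above_40 = N - ogive_cf(income, cf, 40)                        # persons above $40,000
print(below_35, between, round(100 * between / N, 2), round(100 * above_40 / N, 2))
```

A "more than" count is obtained as N minus the less than c.f., which is why `above_40` subtracts from the total.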

PU 2016(spring)

Q.No.1 (a) From the following frequency distribution,

Income (Rs 000)	0-10	10-20	20-30	30-40	40-50	50-60
No. of persons	5	10	18	23	7	6

Construct an ogive and use it to find the number of persons having income:

(i) less than Rs. 35,000;

(ii) between Rs. 20,000 and Rs. 50,000;

(iii) more than Rs. 25,000.

Solution:

Less than cumulative frequency distribution

Income (Rs.000)	No. of persons(Less than c.f.)
Less than 10	5
Less than 20	15
Less than 30	33
Less than 40	56
Less than 50	63
Less than 60	69

Less than ogive curve (or Less than cumulative frequency curve)

From ogive curve,

(i) The number of persons having income less than Rs. 35000 = 10 +10+10+10+ 5 = 45

(ii) The number of persons having income between Rs. 20,000 and Rs. 50,000 = (less than c.f. at 50) - (less than c.f. at 20) = 63 - 15 = 48

(iii) The number of persons having income more than Rs.25000

= 7+10+10+10+9= 46

PU 2015 1(a):

The test scores of the students in Probability and Statistics are listed below. Construct a stem-and-leaf plot of the scores.

92 78 73 89 98 89 83 75 83 94 99 69 71 96 67 81 73 88 86 82 63 73 76 82 84 89 92 95 78 87

Also, find the lowest score of the best 25% of the students.

Solution:

Arranging the given data in ascending order of magnitude:

63, 67, 69, 71, 73, 73, 73, 75, 76, 78, 78, 81, 82, 82, 83, 83, 84, 86, 87, 88, 89, 89, 89, 92, 92, 94, 95, 96, 98, 99

Stem and leaf plot

Stem	Leaves
6	3 7 9
7	1 3 3 3 5 6 8 8
8	1 2 2 3 3 4 6 7 8 9 9 9
9	2 2 4 5 6 8 9

The lowest score of the best 25% of the students is given by P₇₅

P₇₅ = Value of (75(n + 1)/100)^th item

= Value of (75 × (30 + 1)/100)^th item

= Value of 23.25^th item

= Value of 23^rd item + 0.25 (24^th item - 23^rd item)

= 89 + 0.25 (92-89)

= 89 + 0.25 × 3

= 89.75 scores

PU 2013(Spring)

1. (a) The weights (in lbs) of 40 boys in a class are as follows:

138 172 145 147 150 119 158 152 168 142

157 147 102 144 165 136 164 163 128 135

126 150 146 148 145 125 146 153 138 156

173 140 135 149 140 144 132 154 142 135

(i) Construct a frequency distribution.

(ii) Draw a less than ogive and find the number of boys whose weight is less than 165 lbs.

Solution:

Since the class size (h) is not given, we first find the approximate number of class intervals (k) and the class size (h).

Number of observations, n = 40

S = smallest value = 102 L = Largest value = 173

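By Sturges' formula,

Number of classes, k = 1 + 3.322 log n

= 1 + 3.322 log 40

= 1 + 3.322 × 1.6021

= 6.322 ≈ 6

Class width or class size, h = (L - S)/k = (173 - 102)/6 = 11.83; a class width of 11 is used in the table, so 7 classes are needed to cover the largest value (173).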

Frequency distribution

Weight (in lbs)	Tally bar	No.of boys(f)
102 – 113	|	1
113 – 124	|	1
124 – 135	||||	4
135 – 146	|||| |||| ||||	14
146 – 157	|||| |||| ||	12
157 – 168	||||	5
168 – 179	|||	3
		N = ∑ f = 40

(ii) Less than cumulative frequency distribution

Weight (in lbs)	Less than c.f.
Less than 113	1
Less than 124	2
Less than 135	6
Less than 146	20
Less than 157	32
Less than 168	37
Less than 179	40

PU 2013(spring)

Q.No.1.(b): From the following distribution of marks of 500 students of a college, find the minimum pass mark if only 20% of the students failed, and also the minimum mark obtained by the top 25% of the students. Represent the data by a histogram.

Marks	0-20	20-40	40-50	50-60	60-80	80-100
No. of students	50	100	150	90	60	50

Marks	No. of students (f)	Less than c.f.
0-20	50	50
20-40	100	150
40-50	150	300
50-60	90	390
60-80	60	450
80-100	50	500
	N = ∑ f = 500

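If 20% of the students failed (i.e. 80% of the students passed), the minimum pass mark is the mark below which the lowest 20% of the students lie, which is given by P₂₀.

20^th percentile (P₂₀)

20N/100 = (20 × 500)/100 = 100^th student, so P₂₀ lies in class 20-40,

L = 20, f = 100, c.f. = 50, h = 20

P₂₀ = L + ((20N/100 - c.f.)/f) × h

= 20 + ((100 - 50)/100) × 20

= 30 marks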

The minimum mark obtained by the top 25% of the students is given by P₇₅.

75^th percentile (P₇₅)

75N/100 = (75 × 500)/100 = 375^th student, so P₇₅ lies in class 50-60,

L = 50, f = 90, c.f. = 300, h = 10

P₇₅ = L + ((75N/100 - c.f.)/f) × h

= 50 + ((375 - 300)/90) × 10

= 58.33 marks
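The interpolation formula P_p = L + ((pN/100 - c.f.)/f) × h for a continuous grouped distribution can be written as a small Python function. This is a sketch, not a library API; class widths may be unequal because each class uses its own width. The minimum pass mark when 20% fail corresponds to p = 20. For the speed data of PU 2018 (Fall) the same function gives P₅₀ = 60.25 km, which suggests that values read off a hand-drawn ogive should be treated as approximations.

```python
def grouped_percentile(boundaries, freqs, p):
    """P_p = L + ((p*N/100 - c.f.) / f) * h for a continuous grouped distribution.

    boundaries: class boundaries [b0, b1, ..., bk] (class widths may differ)
    freqs     : class frequencies [f1, ..., fk]
    """
    N = sum(freqs)
    target = p * N / 100          # position of P_p among the N items
    cf = 0                        # less than c.f. before the current class
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("p must lie between 0 and 100")


marks = [0, 20, 40, 50, 60, 80, 100]
students = [50, 100, 150, 90, 60, 50]
print(grouped_percentile(marks, students, 20))            # minimum pass mark
print(round(grouped_percentile(marks, students, 75), 2))  # lowest mark of the top 25%
```

Quartiles and the median are the special cases p = 25, 50 and 75.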

For histogram

This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the third and fourth class intervals is 10, while that of the first, second, fifth and sixth is 20, which is double of 10. So, the frequencies of the first, second, fifth and sixth classes are divided by 2, i.e. 50/2 = 25, 100/2 = 50, 60/2 = 30 and 50/2 = 25.

PU2014 (fall)

1. (a) Represent the following data by means of a histogram, frequency curve and frequency polygon.

Salaries	300-310	310-320	320-330	330-350	350-370	370-400
No. of workers

Solution:

This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the first three class intervals is 10, while that of the fourth and fifth is 20, which is double of 10. So, the frequencies of the fourth and fifth classes are divided by 2, i.e. 15/2 = 7.5 and 12/2 = 6. Also, the class size of the last class is 30, which is 3 times 10, so the frequency of the last class is divided by 3.

1. (a) The daily wages of workers of a factory are given below:

Wages (Rs.)	300-310	310-320	320-330	330-350	350-370	370-410
No. of workers	8	10	20	18	16	12

(i) Construct a histogram and frequency polygon for the data.

(ii) Draw an ogive for the data and estimate the median wage.

Solution:

(i) This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the first three class intervals is 10, while that of the fourth and fifth is 20, which is double of 10. So, the frequencies of the fourth and fifth classes are divided by 2, i.e. 18/2 = 9 and 16/2 = 8. Also, the class size of the last class is 40, which is 4 times 10, so the frequency of the last class is divided by 4, i.e. 12/4 = 3.

Wages (Rs.)	No. of workers(Less than c.f.)
Less than 310	8
Less than 320	18
Less than 330	38
Less than 350	56
Less than 370	72
Less than 410	84

Note: For an ogive, the class sizes need not be equal, so no adjustment is needed.

From the ogive, Md = Rs. 337
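The frequency adjustment for unequal class intervals described above can be sketched as follows: each frequency is scaled by (base width / class width), with the smallest width as the base by default. The function name and the `base_width` parameter are illustrative assumptions.

```python
def adjusted_frequencies(boundaries, freqs, base_width=None):
    """Rescale frequencies to a common class width for drawing a histogram.

    Each frequency is multiplied by base_width / (its class width); by default
    the smallest class width is used as the base.
    """
    widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
    base = base_width if base_width is not None else min(widths)
    return [f * base / w for f, w in zip(freqs, widths)]


wages = [300, 310, 320, 330, 350, 370, 410]
workers = [8, 10, 20, 18, 16, 12]
print(adjusted_frequencies(wages, workers))  # [8.0, 10.0, 20.0, 9.0, 8.0, 3.0]
```

The adjusted heights keep bar areas proportional to the original frequencies, which is what a histogram with unequal widths needs; the ogive uses the unadjusted c.f. values.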

Example 9: From the following data, obtain the interquartile range, Q.D. and coefficient of Q.D.

Daily production: 25, 20, 23, 18, 22, 17, 26

Solution:

Arranging the given data in ascending order, we get

17, 18, 20, 22, 23, 25, 26

Now,

Q₁ = value of ((n + 1)/4)^th item

= value of ((7 + 1)/4)^th item

= value of 2^nd item = 18

Q₃ = value of (3(n + 1)/4)^th item

= value of (3 × (7 + 1)/4)^th item

= value of 6^th item

= 25

Interquartile range = Q₃ - Q₁

= 25 - 18 = 7

Quartile deviation or semi-interquartile range (Q.D.) = (Q₃ - Q₁)/2 = 7/2

= 3.5

Coefficient of Q.D. = (Q₃ - Q₁)/(Q₃ + Q₁) = (25 - 18)/(25 + 18) = 7/43 = 0.163

Example 10: Find the interquartile range, Q.D. and coefficient of Q.D. from the following series: X: 9, 10, 5, 6, 7, 2, 8, 4

Solution: Arranging the given data in ascending order, X: 2, 4, 5, 6, 7, 8, 9, 10

Q₁ = value of ((n + 1)/4)^th item

= value of ((8 + 1)/4)^th item

= value of 2.25^th item

= value of 2^nd item + 0.25 (3^rd item - 2^nd item)

= 4 + 0.25 (5 - 4)

= 4.25

Q₃ = value of (3(n + 1)/4)^th item

= value of (3 × (8 + 1)/4)^th item

= value of 6.75^th item

= value of 6^th item + 0.75 (7^th item - 6^th item)

= 8 + 0.75 (9 - 8) = 8.75

Interquartile range = Q₃ - Q₁

= 8.75 - 4.25

= 4.5

Quartile deviation or semi-interquartile range (Q.D.) = (Q₃ - Q₁)/2 = 4.5/2

= 2.25

Coefficient of Q.D. = (Q₃ - Q₁)/(Q₃ + Q₁) = (8.75 - 4.25)/(8.75 + 4.25) = 4.5/13 = 0.346
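The interquartile range, quartile deviation and coefficient of Q.D. for an individual series can be checked with the sketch below, which reuses the (n + 1) positional rule with interpolation. The helper name is hypothetical.

```python
def quartile_deviation(data):
    """Return (IQR, Q.D., coefficient of Q.D.) using the (n + 1) positional rule."""
    x = sorted(data)
    n = len(x)

    def value_at(position):
        # Interpolate between neighbouring items for fractional positions
        k = int(position)
        frac = position - k
        if frac == 0:
            return x[k - 1]
        return x[k - 1] + frac * (x[k] - x[k - 1])

    q1 = value_at((n + 1) / 4)
    q3 = value_at(3 * (n + 1) / 4)
    iqr = q3 - q1
    return iqr, iqr / 2, iqr / (q3 + q1)


print(quartile_deviation([25, 20, 23, 18, 22, 17, 26]))  # IQR 7, Q.D. 3.5, coefficient about 0.163
print(quartile_deviation([9, 10, 5, 6, 7, 2, 8, 4]))     # IQR 4.5, Q.D. 2.25, coefficient about 0.346
```

The coefficient of Q.D. is unit-free, so it can be used to compare the relative spread of series measured in different units.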

Example 11: Compute the quartile deviation of the following distribution of the screen sizes of laptops available in the Nepalese laptop market.

Size of screen (cm)	9.5	10.0	10.5	11.0	11.5	12.0	12.5	13.0
No. of laptops	1	8	20	30	50	95	110	150

Size of screen (cm)	13.5	14.0	14.5	15.0	15.5	16.0	16.5	17.0
No. of laptops	200	250	280	245	80	40	35	5

Solution:

Size of screen (cm) X	No. of laptops (f)	Less than c.f.
9.5	1	1
10	8	9
10.5	20	29
11	30	59
11.5	50	109
12	95	204
12.5	110	314
13	150	464
13.5	200	664
14	250	914
14.5	280	1194
15	245	1439
15.5	80	1519
16	40	1559
16.5	35	1594
17	5	1599
	N = ∑ f = 1599

Q₁ = value of ((N + 1)/4)^th item

= value of ((1599 + 1)/4)^th item

= value of 400^th item = 13 cm

Q₃ = value of (3(N + 1)/4)^th item

= value of (3 × (1599 + 1)/4)^th item

= value of 1200^th item = 15 cm

Quartile deviation (Q.D.) = (Q₃ - Q₁)/2 = (15 - 13)/2

= 1 cm
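For a discrete frequency distribution, the position-th item is the first value whose less than c.f. equals or exceeds that position, which is the rule applied in the table above. A minimal sketch with an illustrative helper name:

```python
def discrete_item(values, freqs, position):
    """Value of the position-th item of a discrete frequency distribution:
    the first value whose less than c.f. is at least `position`."""
    cf = 0
    for x, f in zip(values, freqs):
        cf += f
        if cf >= position:
            return x
    raise ValueError("position exceeds N")


sizes = [9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17]
laptops = [1, 8, 20, 30, 50, 95, 110, 150, 200, 250, 280, 245, 80, 40, 35, 5]
N = sum(laptops)                                       # 1599
q1 = discrete_item(sizes, laptops, (N + 1) / 4)        # 400th item
q3 = discrete_item(sizes, laptops, 3 * (N + 1) / 4)    # 1200th item
print(q1, q3, (q3 - q1) / 2)
```

Unlike an individual series, no interpolation is used here, because many observations share the same value.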

Example12. The following frequency distribution represents the weight of 200 laptops.

Weight in lbs	4-5	5-6	6-7	7-8	8-9	9-10	10-11	11-12
Frequency

Compute the first three quartiles and the quartile deviation.

Solution:

Weight in lbs	Frequency (f)	Less than c.f.
4-5
5-6
6-7
7-8
8-9
9-10
10-11
11-12

N= ∑ f = 193

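For the lower quartile or first quartile (Q₁)

N/4 = 193/4 = 48.25, so Q₁ lies in class 6-7,

L = 6, f = 35, c.f. = 44, h = 1

Q₁ = L + ((N/4 - c.f.)/f) × h = 6 + ((48.25 - 44)/35) × 1

= 6.12 lbs

For the 2^nd quartile (Q₂)

N/2 = 193/2 = 96.5, so Q₂ lies in class 7-8,

L = 7, f = 48, c.f. = 79, h = 1

Q₂ = L + ((N/2 - c.f.)/f) × h = 7 + ((96.5 - 79)/48) × 1

= 7.36 lbs

For the 3^rd quartile (Q₃)

3N/4 = 3 × 193/4 = 144.75, so Q₃ lies in class 8-9,

L = 8, f = 32, c.f. = 127, h = 1

Q₃ = L + ((3N/4 - c.f.)/f) × h = 8 + ((144.75 - 127)/32) × 1

= 8.55 lbs

Quartile deviation (Q.D.) = (Q₃ - Q₁)/2 = (8.55 - 6.12)/2

= 1.215 lbs

Coefficient of Q.D. = (Q₃ - Q₁)/(Q₃ + Q₁) = (8.55 - 6.12)/(8.55 + 6.12) = 2.43/14.67 = 0.166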

Example 13: The scores obtained by 10 students in Statistics I of an IT college are given below. Compute the range and standard deviation.

55 35 60 55 55 65 40 45 35 42

Solution:

Score (X)	X²
55	3025
35	1225
60	3600
55	3025
55	3025
65	4225
40	1600
45	2025
35	1225
42	1764
∑ X = 487	∑ X² = 24739

Range (R) = L - S

= 65 - 35

= 30 scores

Standard deviation (σ) = √(∑X²/n - (∑X/n)²)

= √(24739/10 - (487/10)²)

= √(2473.9 - 2371.69) = √102.21

= 10.109 scores
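The range and standard deviation (divisor n, as in these notes) can be verified with Python's standard library. `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by n - 1; the variable names below are illustrative.

```python
import math
import statistics

scores = [55, 35, 60, 55, 55, 65, 40, 45, 35, 42]
n = len(scores)
sum_x = sum(scores)                       # 487
sum_x2 = sum(x * x for x in scores)       # 24739

data_range = max(scores) - min(scores)    # L - S
sd = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

# pstdev also divides by n, so it agrees with the formula above
assert math.isclose(sd, statistics.pstdev(scores))
print(data_range, round(sd, 3))
```

The shortcut form √(∑X²/n - (∑X/n)²) avoids computing each deviation from the mean, which is why the notes tabulate X².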

Example 14: Find standard deviation (S.D.) and variance of the following data.

Variable (X)	10	14	15	18	20
Frequency (f)	3	5	7	6	4

Solution:

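Variable (X)	Frequency (f)	fX	fX²
10	3	30	300
14	5	70	980
15	7	105	1575
18	6	108	1944
20	4	80	1600
	N = ∑ f = 25	∑ fX = 393	∑ fX² = 6399

Standard deviation (σ) = √(∑fX²/N - (∑fX/N)²)

= √(6399/25 - (393/25)²)

= √(255.96 - 247.1184)

= √8.8416 = 2.973

Variance (σ²) = 8.841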
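For a discrete frequency distribution, the same shortcut formula weights each value by its frequency. A minimal sketch with an illustrative helper name, using divisor N as in the notes:

```python
import math


def frequency_mean_sd(values, freqs):
    """Mean, variance and S.D. (divisor N) of a discrete frequency distribution."""
    N = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / N
    variance = sum_fx2 / N - mean ** 2
    return mean, variance, math.sqrt(variance)


mean, var, sd = frequency_mean_sd([10, 14, 15, 18, 20], [3, 5, 7, 6, 4])
print(round(mean, 2), round(var, 4), round(sd, 3))
```

For grouped data, the same function can be applied with class mid-values in place of `values`.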

Example 15: The frequency distribution of time required to open the operating system of 200 computers is given below.

Time in seconds	0-4	5-9	10-14	15-19	20-24	25-29	30-34	35-39
No. of computers

Compute the standard deviation.

Solution: Let a = 17.5 and h = 5, so that d′ = (X - 17.5)/5.

Customer service time (in minutes)	No. of customers (f)	Mid-value (X)	d′	fd′	fd′²
0-5	2	2.5	-3	-6	18
5-10	8	7.5	-2	-16	32
10-15	26	12.5	-1	-26	26
15-20	30	17.5	0	0	0
20-25	28	22.5	1	28	28
25-30	6	27.5	2	12	24
	N = ∑ f = 100			∑ fd′ = -8	∑ fd′² = 128

Standard deviation (σ) = √(∑fd′²/N - (∑fd′/N)²) × h

= √(128/100 - (-8/100)²) × 5

= 5.64 minutes

Example 16: The following data give the temperature of Kathmandu for a week in summer. Compute the range and quartile deviation.

Day	Sun	Mon	Tue	Wed	Thu	Fri	Sat
Temp. (°C)	34	35	32	35	36	34	35

Solution:

Range (R) = L – S

= 36 – 32

= 4

Arranging the given data in ascending order, Temp. (°C) X: 32, 34, 34, 35, 35, 35, 36

Q₁ = value of ((n + 1)/4)^th item

= value of ((7 + 1)/4)^th item

= value of 2^nd item = 34

Q₃ = value of (3(n + 1)/4)^th item

= value of (3 × (7 + 1)/4)^th item

= value of 6^th item

= 35

Quartile deviation (Q.D.) = (Q₃ - Q₁)/2 = (35 - 34)/2

= 0.5

Example 17: The numbers of runs scored by two groups of cricket players in a test match are:

Group A	10	25	85	72	115	80	52	45	30	10
Group B	120	15	30	35	42	65	80	34	25	15

Test which group is more consistent.

Solution:

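For Group A

No. of runs (X)	X²
10	100
25	625
85	7225
72	5184
115	13225
80	6400
52	2704
45	2025
30	900
10	100
∑ X = 524	∑ X² = 38488

X̄ = ∑X/n = 524/10 = 52.4

S.D. (σ) = √(∑X²/n - (∑X/n)²) = √(38488/10 - (52.4)²) = √1103.04 = 33.21

C.V. (Group A) = (σ/X̄) × 100% = (33.21/52.4) × 100%

= 63.38%

For Group B

No. of runs (X)	X²
120	14400
15	225
30	900
35	1225
42	1764
65	4225
80	6400
34	1156
25	625
15	225
∑ X = 461	∑ X² = 31145

X̄ = ∑X/n = 461/10 = 46.1

S.D. (σ) = √(∑X²/n - (∑X/n)²) = √(31145/10 - (46.1)²) = √989.29 = 31.45

C.V. (Group B) = (σ/X̄) × 100% = (31.45/46.1) × 100%

= 68.23%

Since C.V. (Group A) < C.V. (Group B), group A is more consistent.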
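The coefficient-of-variation comparison can be sketched with the standard `statistics` module (Python 3.8 or later for `fmean`). `pstdev` matches the divisor n used in these notes; the helper name is illustrative.

```python
import statistics


def coefficient_of_variation(data):
    """C.V. = (S.D. with divisor n / mean) x 100."""
    return statistics.pstdev(data) / statistics.fmean(data) * 100


group_a = [10, 25, 85, 72, 115, 80, 52, 45, 30, 10]
group_b = [120, 15, 30, 35, 42, 65, 80, 34, 25, 15]
cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
print(round(cv_a, 2), round(cv_b, 2))
print("Group A" if cv_a < cv_b else "Group B", "is more consistent")
```

The group with the smaller C.V. is the more consistent one, because C.V. measures spread relative to the mean.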


Example 18: The following data represent the scores made in an intelligence test by two groups of students from section A and section B of a college.

Students no.	Section A	Section B

Test which group is more consistent.

Example 19: What are the roles of measures of dispersion in descriptive statistics? The following table gives the frequency distribution of the thickness of computer chips (in nanometres) manufactured by two companies.

Thickness of computer chips		5	10	15	20	25	30
Number of chips by	Company A	10	15	24	20	18	13
Number of chips by	Company B	12	18	20	22	24	4

Which company may be considered more consistent in terms of thickness of computer chips? Apply appropriate descriptive statistics.

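Solution: For company A

Thickness of computer chips (X)	No. of chips (f)	d = X - 15	fd	fd²
5	10	-10	-100	1000
10	15	-5	-75	375
15	24	0	0	0
20	20	5	100	500
25	18	10	180	1800
30	13	15	195	2925
	N = ∑ f = 100		∑ fd = 300	∑ fd² = 6600

Mean (X̄) = a + ∑fd/N = 15 + 300/100 = 18

S.D. (σ) = √(∑fd²/N - (∑fd/N)²) = √(6600/100 - (300/100)²) = √57 = 7.55

C.V. (Company A) = (σ/X̄) × 100% = (7.55/18) × 100%

= 41.94%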

For company B

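Thickness of computer chips (X)	No. of chips (f)	d = X - 15	fd	fd²
5	12	-10	-120	1200
10	18	-5	-90	450
15	20	0	0	0
20	22	5	110	550
25	24	10	240	2400
30	4	15	60	900
	N = ∑ f = 100		∑ fd = 200	∑ fd² = 5500

Mean (X̄) = a + ∑fd/N = 15 + 200/100 = 17

S.D. (σ) = √(∑fd²/N - (∑fd/N)²) = √(5500/100 - (200/100)²) = √51 = 7.14

C.V. (Company B) = (σ/X̄) × 100% = (7.14/17) × 100%

= 42.01%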

Since C.V. (Company A) < C.V. (Company B), company A may be considered more consistent than company B in terms of the thickness of computer chips.

Example 20: The following table shows the monthly expenditure of families in ward no. 1 and ward no. 2 of Kathmandu Metropolitan City in a certain locality.

Expenditure (in 000 Rs.)	0-5	5-10	10-15	15-20	20-25	25-30
No. of families (ward no.1)	5	12	50	20	10	3
No. of families (ward no.2)	7	15	40	18	12	8

Which ward's people have more uniform expenditure?

Solution: The ward whose coefficient of variation (C.V.) is smaller has the more uniform expenditure.

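For ward no. 1 (a = 12.5, h = 5, d′ = (x - 12.5)/5)

Expenditure (in 000 Rs.)	No. of families (ward no. 1) f	Mid-value (x)	d′	fd′	fd′²
0-5	5	2.5	-2	-10	20
5-10	12	7.5	-1	-12	12
10-15	50	12.5	0	0	0
15-20	20	17.5	1	20	20
20-25	10	22.5	2	20	40
25-30	3	27.5	3	9	27
	N = 100			∑ fd′ = 27	∑ fd′² = 119

X̄ = a + (∑fd′/N) × h = 12.5 + (27/100) × 5 = 13.85

σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(119/100 - (27/100)²) × 5 = 5.285

C.V. (Ward no. 1) = (σ/X̄) × 100% = (5.285/13.85) × 100% = 38.16%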

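For ward no. 2 (a = 12.5, h = 5, d′ = (x - 12.5)/5)

Expenditure (in 000 Rs.)	No. of families (ward no. 2) f	Mid-value (x)	d′	fd′	fd′²
0-5	7	2.5	-2	-14	28
5-10	15	7.5	-1	-15	15
10-15	40	12.5	0	0	0
15-20	18	17.5	1	18	18
20-25	12	22.5	2	24	48
25-30	8	27.5	3	24	72
	N = 100			∑ fd′ = 37	∑ fd′² = 181

X̄ = a + (∑fd′/N) × h = 12.5 + (37/100) × 5 = 14.35

σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(181/100 - (37/100)²) × 5 = 6.467

C.V. (Ward no. 2) = (σ/X̄) × 100% = (6.467/14.35) × 100% = 45.07%

Since C.V. (Ward no. 1) < C.V. (Ward no. 2), the people of ward no. 1 have more uniform expenditure than those of ward no. 2.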
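The step-deviation method used in Examples 20 and 21 (d′ = (x - a)/h) can be sketched as one function returning the mean, S.D. and C.V. The function name is hypothetical, and the S.D. uses divisor N, as in the notes.

```python
import math


def step_deviation_stats(mids, freqs, a, h):
    """Mean, S.D. and C.V. of grouped data by the step-deviation method.

    d' = (x - a) / h
    mean = a + (sum f d' / N) * h
    S.D. = sqrt(sum f d'^2 / N - (sum f d' / N)^2) * h
    """
    N = sum(freqs)
    d = [(x - a) / h for x in mids]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = a + sum_fd / N * h
    sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * h
    return mean, sd, sd / mean * 100


mids = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
ward1 = step_deviation_stats(mids, [5, 12, 50, 20, 10, 3], a=12.5, h=5)
ward2 = step_deviation_stats(mids, [7, 15, 40, 18, 12, 8], a=12.5, h=5)
for name, (mean, sd, cv) in [("Ward no. 1", ward1), ("Ward no. 2", ward2)]:
    print(f"{name}: mean = {mean:.2f}, S.D. = {sd:.3f}, C.V. = {cv:.2f}%")
```

Any convenient a and h give the same mean and S.D.; the choice only keeps the hand arithmetic small.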

Example 21: The following table gives the lives of two bike models:

Life (in years)		0-2	2-4	4-6	6-8	8-10
No. of bikes	Model T ₁	1	9	12	11	8
No. of bikes	Model T₂	5	7	11	19	9

Which model of bike has greater uniformity?

Solution:

We have to compute coefficient of variation (C.V.) to determine the uniformity.

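Computation of sums for the mean and S.D. (a = 5, h = 2, d′ = (x - 5)/2)

For Model T₁

Life (in years)	No. of bikes (f)	Mid-value (x)	d′	fd′	fd′²
0-2	1	1	-2	-2	4
2-4	9	3	-1	-9	9
4-6	12	5	0	0	0
6-8	11	7	1	11	11
8-10	8	9	2	16	32
	N = ∑ f = 41			∑ fd′ = 16	∑ fd′² = 56

X̄ = a + (∑fd′/N) × h = 5 + (16/41) × 2 = 5 + 0.78 = 5.78 years

σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(56/41 - (16/41)²) × 2 = 2.203 years

C.V. (Model T₁) = (σ/X̄) × 100% = (2.2032/5.7805) × 100% = 38.12%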

For Model T₂

Life (in years)	No. of bikes (f)	Mid-value (x)	d′	fd′	fd′²
0-2	5	1	-2	-10	20
2-4	7	3	-1	-7	7
4-6	11	5	0	0	0
6-8	19	7	1	19	19
8-10	9	9	2	18	36
	N = ∑ f = 51			∑ fd′ = 20	∑ fd′² = 82

X̄ = a + (∑fd′/N) × h = 5 + (20/51) × 2 = 5 + 0.7843 = 5.7843 years

σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(82/51 - (20/51)²) × 2 = 2.4116 years

C.V. (Model T₂) = (σ/X̄) × 100% = (2.4116/5.7843) × 100% = 41.69%

Since, C.V. (Model T₁) < C.V. (Model T₂). Therefore, model T₁ of bike has greater uniformity than model T₂

PU 2014 (Spring), 2015 (Spring), 2018 (Fall)

b) The lives of two models (A and B) of refrigerators in a recent survey are shown below:

Life (No. of years)	0-2	2-4	4-6	6-8	8-10	10-12
No. of refrigerators: Model A	5	16	13	7	5	4
No. of refrigerators: Model B	2	7	12	19	9	1

i. What is the average life of each model of these refrigerators?

ii. Which model has greater uniformity?

Solution: We have to compute coefficient of variation (C.V.) to determine the uniformity.

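For Model A

Life (in years)	No. of refrigerators (f)	Mid-value (x)	fx	fx²
0-2	5	1	5	5
2-4	16	3	48	144
4-6	13	5	65	325
6-8	7	7	49	343
8-10	5	9	45	405
10-12	4	11	44	484
	N = ∑ f = 50		∑ fx = 256	∑ fx² = 1706

X̄ = ∑fx/N = 256/50 = 5.12 years

σ = √(∑fx²/N - (∑fx/N)²) = √(1706/50 - (5.12)²) = √7.9056 = 2.812 years

C.V. (Model A) = (σ/X̄) × 100% = (2.812/5.12) × 100% = 54.92%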

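For Model B

Life (in years)	No. of refrigerators (f)	Mid-value (x)	fx	fx²
0-2	2	1	2	2
2-4	7	3	21	63
4-6	12	5	60	300
6-8	19	7	133	931
8-10	9	9	81	729
10-12	1	11	11	121
	N = ∑ f = 50		∑ fx = 308	∑ fx² = 2146

X̄ = ∑fx/N = 308/50 = 6.16 years

σ = √(∑fx²/N - (∑fx/N)²) = √(2146/50 - (6.16)²) = √4.9744 = 2.2303 years

C.V. (Model B) = (σ/X̄) × 100% = (2.2303/6.16) × 100% = 36.21%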

(i) The average lives of the two models of refrigerators are

X̄ (Model A) = 5.12 years

and X̄ (Model B) = 6.16 years

(ii) Since C.V. (Model A) > C.V. (Model B), model B of refrigerator has greater uniformity than model A.

PU 2015 (Fall)

1. b) Lives of two models A & B of objects in a recent survey are:

Life	0-2	2-4	4-6	6-8	8-10	10-12
Model A	5	16	13	7	5	4
Model B	2	7	12	19	9	1

Which model has greater uniformity?

PU 2016 (Fall)

1. (a) For a computer-controlled lathe whose performance was below par, workers recorded the following causes and their frequencies:

Power fluctuation 6

Controller not stable 22

Operator error 13

Worn tool not replaced 2

Other 5

Construct a Pareto chart.

(i) What percentage of the cases is due to an unstable controller?

(ii) What percentage of the cases is due to either unstable controller or operator error?

Solution

Arrange the data in descending order and obtain the cumulative frequencies and percentage cumulative frequencies as follows:

Categories	Frequency	Cumulative frequency	% cumulative frequency
Controller not stable	22	22	45.83
Operator error	13	35	72.92
Power fluctuation	6	41	85.42
Other	5	46	95.83
Worn tool not replaced	2	48	100

(i) The percentage of the cases due to an unstable controller = (22/48) × 100 = 45.83%

(ii) The number of cases due to either an unstable controller or operator error = 22 + 13 = 35

∴ The percentage of the cases due to either an unstable controller or operator error = (35/48) × 100

= 72.92%
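The Pareto table (categories in descending order of frequency, with cumulative frequency and cumulative percentage) can be built as follows. This is a sketch with an illustrative function name; the chart itself would be drawn from these rows as bars plus a cumulative-percentage line.

```python
def pareto_table(counts):
    """Sort categories by frequency (descending) and add cumulative
    frequency and cumulative percentage columns."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    for category, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += freq
        rows.append((category, freq, cumulative, round(100 * cumulative / total, 2)))
    return rows


causes = {
    "Power fluctuation": 6,
    "Controller not stable": 22,
    "Operator error": 13,
    "Worn tool not replaced": 2,
    "Other": 5,
}
for row in pareto_table(causes):
    print(*row, sep="\t")
```

Reading down the cumulative-percentage column shows that the first two causes account for about 73% of the cases.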

PU 2016 (Spring)

1.(b) An analysis of monthly wages paid to the workers in two firms A and B belonging to the same industry gives the following results: (use population)

	Firm A	Firm B
No. of workers	500	600
Average monthly wages (Rs)	186	175
Variance of distribution of wages (Rs)	81	100

i. Which firm, A or B, has a larger wage bill?

ii. In which firm, A or B, is there greater variability in individual wages?

iii. Calculate (a) the average monthly wage and (b) the variance of the distribution of wages of all the workers in firms A and B taken together.

Solution:

For firm A	For firm B
n₁ = 500	n₂ = 600
X̄₁ = Rs. 186	X̄₂ = Rs. 175
σ₁² = 81	σ₂² = 100
σ₁ = √81 = 9	σ₂ = √100 = 10

(i) For firm A, the total wage bill is

∑X₁ = n₁ × X̄₁ = 500 × 186 = Rs. 93000

For firm B,

∑X₂ = n₂ × X̄₂ = 600 × 175 = Rs. 105000

Since ∑X₁ < ∑X₂, firm B has a larger wage bill than firm A.

(ii)

C.V. (Firm A) = (σ₁/X̄₁) × 100% = (9/186) × 100%

= 4.839%

C.V. (Firm B) = (σ₂/X̄₂) × 100% = (10/175) × 100%

= 5.714%

Since C.V. (Firm A) < C.V. (Firm B), there is greater variability in individual wages in firm B than in firm A.

(iii) (a) The average monthly wage of all the workers in firms A and B taken together is given by

X̄₁₂ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = (500 × 186 + 600 × 175)/(500 + 600)

= Rs. 180

(b) The variance of the distribution of wages of all the workers in firms A and B taken together is

σ₁₂² = [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)]/(n₁ + n₂) = [500(81 + 36) + 600(100 + 25)]/1100

= 121.364

where d₁ = X̄₁ - X̄₁₂ = 186 - 180 = 6

d₂ = X̄₂ - X̄₁₂ = 175 - 180 = -5
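The combined mean and combined variance formulas can be wrapped in a small function. The sketch assumes population variances (divisor n), as the question states, and the function name is illustrative.

```python
def combined_mean_variance(groups):
    """Combined mean and variance of several groups.

    groups: list of (n, mean, variance) tuples with population variances.
    combined variance = sum n_i * (variance_i + d_i**2) / sum n_i,
    where d_i = mean_i - combined mean.
    """
    N = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / N
    variance = sum(n * (v + (m - mean) ** 2) for n, m, v in groups) / N
    return mean, variance


mean, var = combined_mean_variance([(500, 186, 81), (600, 175, 100)])
print(mean, round(var, 3))   # 180.0 121.364
```

The d_i² terms account for the spread between the group means, which is why the combined variance can exceed both group variances.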

Example

For a group of 200 candidates, the mean and standard deviation were found to be 40 and 15 respectively. Later on, it was discovered that the score 53 was misread as 35. Find the correct mean and standard deviation.

Solution:

We have given,

n = 200, Mean (X̄) = 40

Standard deviation (σ) = 15

Wrong observation (i.e. wrong score) = 35

Correct observation (i.e. correct score) = 53

Corrected mean (X̄_correct) = ?

Corrected standard deviation (σ_correct) = ?

We know,

X̄ = ∑X/n

or, 40 = ∑X/200

or, ∑X = 200 × 40

or, ∑X = 8000

Corrected ∑X = ∑X - Wrong observation + Correct observation

= 8000 - 35 + 53 = 8018

Correct mean (X̄_correct) = Corrected ∑X/n = 8018/200 = 40.09

Again,

S.D. (σ) = √(∑X²/n - (X̄)²)

or, 15 = √(∑X²/200 - (40)²)

Squaring both sides,

or, 225 = ∑X²/200 - 1600

or, 225 + 1600 = ∑X²/200

or, 1825 = ∑X²/200

or, ∑X² = 1825 × 200 = 365000

Corrected ∑X² = ∑X² - (Wrong observation)² + (Correct observation)² = 365000 - (35)² + (53)² = 366584

Corrected S.D. (σ_correct) = √(Corrected ∑X²/n - (X̄_correct)²) = √(366584/200 - (40.09)²)

= 15.02
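The correction procedure (recover ∑X and ∑X² from the reported mean and S.D., swap the wrong values for the correct ones, then recompute) can be sketched as follows, assuming the S.D. uses divisor n as in these notes. The helper name is hypothetical.

```python
import math


def corrected_mean_sd(n, mean, sd, wrong, correct):
    """Correct the mean and S.D. (divisor n) after replacing wrongly recorded values.

    Uses sum X = n * mean and sum X^2 = n * (sd^2 + mean^2).
    """
    sum_x = n * mean - sum(wrong) + sum(correct)
    sum_x2 = (n * (sd ** 2 + mean ** 2)
              - sum(w * w for w in wrong) + sum(c * c for c in correct))
    new_mean = sum_x / n
    new_sd = math.sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd


print(corrected_mean_sd(200, 40, 15, wrong=[35], correct=[53]))
print(corrected_mean_sd(100, 40, 12, wrong=[23, 15], correct=[43, 18]))
```

Several wrong observations are handled by passing them all in the `wrong` and `correct` lists, as in the next example.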

Example

The mean and standard deviation of a set of 100 observations on workers were found to be 40 and 12 respectively. On checking, it was found that two observations were wrongly taken as 23 and 15 instead of 43 and 18.

Calculate the correct mean and standard deviation. Also, find the correct variance.

Solution:

We have given,

Total no. of observations (n) = 100

Mean (𝑋̅)= 40

Standard deviation = 12

Wrong observations = 23 and 15

Correct observations = 43 and 18

We know,

X̄ = ∑X/n

or, 40 = ∑X/100

or, ∑X = 100 × 40

or, ∑X = 4000

Corrected ∑X = ∑X - Wrong observations + Correct observations = 4000 - 23 - 15 + 43 + 18 = 4023

Correct mean (X̄_correct) = Corrected ∑X/n = 4023/100 = 40.23

Again,

S.D. (σ) = √(∑X²/n - (X̄)²)

or, 12 = √(∑X²/100 - (40)²)

Squaring both sides,

or, 144 = ∑X²/100 - 1600

or, 144 + 1600 = ∑X²/100

or, 1744 = ∑X²/100

or, ∑X² = 1744 × 100 = 174400

Corrected ∑X² = ∑X² - (Wrong observations)² + (Correct observations)²

= 174400 - (23)² - (15)² + (43)² + (18)²

= 174400 - 529 - 225 + 1849 + 324 = 175819

Corrected S.D. (σ_correct) = √(Corrected ∑X²/n - (X̄_correct)²) = √(175819/100 - (40.23)²) = √139.737

= 11.82

Correct variance (σ²_correct) = 175819/100 - (40.23)² = 139.737

Hence, corrected mean (X̄) = 40.23

Corrected standard deviation = 11.82

Correct variance (σ²_correct) = 139.737

Additional question

A factory produces two types of CFL bulbs, A and B. The following results were obtained relating to their lives:

	Bulb A	Bulb B
No. of bulbs	100	90
Average length of life	900 hours	1000 hours
Variance	121	144

(a) Compare the variability of life of two types of CFL bulbs.

(b) Calculate the standard deviation of both types of CFL bulbs taken together.

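(a)	For bulb A	For bulb B
	n₁ = 100	n₂ = 90
	X̄₁ = 900 hours	X̄₂ = 1000 hours
	σ₁ = √121 = 11	σ₂ = √144 = 12

C.V. (Bulb A) = (σ₁/X̄₁) × 100% = (11/900) × 100% = 1.222%

C.V. (Bulb B) = (σ₂/X̄₂) × 100% = (12/1000) × 100% = 1.2%

Since C.V. (Bulb A) > C.V. (Bulb B), the life of bulb A shows more variability than that of bulb B; that is, the life of bulb B is more consistent than that of bulb A.

(b) The standard deviation of both types of CFL bulbs taken together (i.e. the combined standard deviation) is

σ₁₂ = √{[n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)]/(n₁ + n₂)}

where,

X̄₁₂ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = (100 × 900 + 90 × 1000)/190 = 947.3684

d₁ = X̄₁ - X̄₁₂ = 900 - 947.3684 = -47.3684

d₂ = X̄₂ - X̄₁₂ = 1000 - 947.3684 = 52.6316

∴ σ₁₂ = √{[100(121 + 2243.77) + 90(144 + 2770.08)]/190} = √2624.97 = 51.23 hours

(c) The coefficient of variation of both types of CFL bulbs taken together (i.e. the combined C.V.) is

Combined C.V. = (σ₁₂/X̄₁₂) × 100% = (51.23/947.3684) × 100%

= 5.41%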

PU2014 (fall)

1. (b) The first of two groups has 100 items with mean 45 and variance 49. If the combined group has 250 items with mean 51 and variance 130, find the mean and standard deviation of the second group.

Solution:
First group	Second group	Combined group
n₁ = 100	n₂ = 150	n₁ + n₂ = 250
X̄₁ = 45	X̄₂ = ?	X̄₁₂ = 51
σ₁² = 49	σ₂ = ?	σ₁₂² = 130

X̄₁₂ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂)

or, 51 = (100 × 45 + 150X̄₂)/250

or, 4500 + 150X̄₂ = 12750

or, 150X̄₂ = 12750 - 4500

or, 150X̄₂ = 8250

or, X̄₂ = 8250/150

∴ X̄₂ = 55

And

σ₁₂² = [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)]/(n₁ + n₂)

or, 130 = [100(49 + 36) + 150(σ₂² + 16)]/250

or, 10900 + 150σ₂² = 32500

or, 150σ₂² = 32500 - 10900

or, 150σ₂² = 21600

or, σ₂² = 21600/150

or, σ₂² = 144

∴ σ₂ = 12

where,

d₁ = X̄₁ - X̄₁₂ = 45 - 51 = -6

d₂ = X̄₂ - X̄₁₂ = 55 - 51 = 4
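Rearranging the combined mean and combined variance formulas gives the second group's statistics directly. A minimal sketch, assuming variances with divisor n and an illustrative function name:

```python
import math


def second_group(n1, mean1, var1, n, mean, var):
    """Mean, variance and S.D. of the second group, given the first group
    and the combined group (all variances with divisor n)."""
    n2 = n - n1
    mean2 = (n * mean - n1 * mean1) / n2
    d1, d2 = mean1 - mean, mean2 - mean
    # From n * var = n1 * (var1 + d1**2) + n2 * (var2 + d2**2)
    var2 = (n * var - n1 * (var1 + d1 ** 2)) / n2 - d2 ** 2
    return mean2, var2, math.sqrt(var2)


print(second_group(100, 45, 49, 250, 51, 130))   # (55.0, 144.0, 12.0)
```

Note that d₂ depends on the second group's mean, so the mean must be found before the variance.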

The End.

Notes of pokhara university


Thursday, August 8, 2024

Probability and statistics Chapter-2
