Unit 2 Summarizing and Describing the Numerical Data
Measures of central tendency
An “average” is a single value that represents the entire distribution. It lies between the two extreme observations (i.e. the largest and smallest observations) of the distribution and gives us an idea of the concentration of the values in the central part of the distribution. Measures of such a single value are known as “measures of central tendency” or “measures of location”. Thus, measures of central tendency are used to describe the middle or centre of a data set.
Various Measures of Central Tendency
The following are the measures of central tendency or measures of location:
1. Arithmetic mean
(i) Simple arithmetic mean (X̄)
(ii) Weighted arithmetic mean (X̄w)
2. Median (Md)
3. Mode (Mo)
4. Geometric mean (G.M.) and
5. Harmonic mean (H.M.)
Note: Geometric mean (G.M.) and harmonic mean (H.M.) are beyond our syllabus.
Arithmetic Mean (A.M.)
The arithmetic mean is the most popular and widely used measure of central tendency. It is also called simply ‘the mean’ or ‘the average’. It is considered an ideal measure of central tendency, and the best-known one, because it satisfies almost all the requisites of an ideal measure of central tendency laid down by Prof. Yule.
Arithmetic mean may be either (i) simple arithmetic mean or (ii) weighted arithmetic mean.
Simple arithmetic mean
In the case of the simple arithmetic mean, all the items in the distribution are equally important. It is denoted by X̄ (X bar).
Calculation of Arithmetic mean:
Individual Series
(i) Direct method
$\bar{X} = \dfrac{\sum X}{n}$
where ∑X = the sum of observations and n = the number of observations.
(ii) Short-cut method (assumed mean method or change of origin method)
$\bar{X} = a + \dfrac{\sum d}{n}$
where a = assumed mean (assumed value), d = X − a = deviations of the items from the assumed mean, and n = no. of observations.
There is no hard and fast rule for the selection of 'a', but it is better to take a value between the highest and lowest values.
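To see that both formulas for an individual series give the same answer, here is a minimal Python sketch (the function names are illustrative, not from the text). It applies the direct and the short-cut method to the data of Example 1 in this unit; any choice of the assumed mean a gives the same mean.

```python
def mean_direct(xs):
    """Direct method: X-bar = sum(X) / n."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, a):
    """Short-cut method: X-bar = a + sum(d) / n, where d = X - a."""
    deviations = [x - a for x in xs]
    return a + sum(deviations) / len(xs)


# Data of Example 1 (individual series)
data = [55, 39, 45, 55, 41, 35, 60, 40, 55, 35, 37, 55, 55, 65]
print(mean_direct(data))          # 48.0
print(mean_shortcut(data, a=50))  # 48.0 (the choice of a does not matter)
```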
Discrete Series
(i) Direct method
$\bar{X} = \dfrac{\sum fX}{N}$
where N = ∑f = total frequency.
(ii) Short-cut method (assumed mean method, coding method or change of origin method)
$\bar{X} = a + \dfrac{\sum fd}{N}$
where a = assumed mean, d = X − a = deviation of the items from the assumed mean, and N = ∑f = total frequency.
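The discrete-series formulas can be sketched the same way in Python; this is only an illustration with hypothetical function names, using the telephone-call data of Example 3. The short-cut result matches the direct one for any assumed mean a.

```python
def discrete_mean_direct(xs, fs):
    """Direct method: X-bar = sum(fX) / N, with N = sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def discrete_mean_shortcut(xs, fs, a):
    """Short-cut method: X-bar = a + sum(fd) / N, with d = X - a."""
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))
    return a + sum_fd / sum(fs)


# Data of Example 3: number of calls (X) and frequency (f)
calls = [0, 1, 2, 3, 4, 5, 6]
freq = [15, 22, 28, 35, 42, 34, 24]
print(round(discrete_mean_direct(calls, freq), 3))         # 3.325
print(round(discrete_mean_shortcut(calls, freq, a=3), 3))  # 3.325
```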
Continuous Series (Grouped Data)
(i) Direct method
$\bar{X} = \dfrac{\sum fX}{N}$
where X = mid-point of the class interval and N = ∑f = total frequency, with
mid-value $X = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
(ii) Short-cut method (assumed mean method, coding method or change of origin method)
$\bar{X} = a + \dfrac{\sum fd}{N}$
where a = assumed mean, d = X − a = deviation of the mid-values from the assumed mean, and N = ∑f = total frequency.
(iii) Step-deviation method (change of origin and scale method or coding method)
$\bar{X} = a + \dfrac{\sum fd'}{N} \times h$
where $d' = \dfrac{X - a}{h}$, X = mid-value, a = assumed mean and h = class size (class width).
Note:
(i) For unequal class sizes, h is taken as a common factor of the deviations.
(ii) For the mean, the classes need not be of equal size, nor do they need to be converted into exclusive (adjusted) classes.
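The three grouped-data methods are algebraically equivalent, which a short Python sketch can confirm. The function names are hypothetical; the data are the class limits and frequencies of Example 6, with a = 4.245 and h = 0.1 chosen only for illustration.

```python
def grouped_mean_direct(mids, fs):
    """Direct method: X-bar = sum(fX) / N, X = mid-value of each class."""
    return sum(f * x for x, f in zip(mids, fs)) / sum(fs)


def grouped_mean_shortcut(mids, fs, a):
    """Short-cut method: X-bar = a + sum(fd) / N, d = X - a."""
    return a + sum(f * (x - a) for x, f in zip(mids, fs)) / sum(fs)


def grouped_mean_step(mids, fs, a, h):
    """Step-deviation method: X-bar = a + (sum(fd') / N) * h, d' = (X - a) / h."""
    d_prime = [(x - a) / h for x in mids]
    return a + sum(f * d for d, f in zip(d_prime, fs)) / sum(fs) * h


# Example 6: class limits and frequencies
classes = [(3.80, 3.89), (3.90, 3.99), (4.00, 4.09), (4.10, 4.19),
           (4.20, 4.29), (4.30, 4.39), (4.40, 4.49), (4.50, 4.59)]
freq = [3, 8, 14, 19, 28, 18, 10, 8]
mids = [(lo + hi) / 2 for lo, hi in classes]   # mid-value X of each class

for value in (grouped_mean_direct(mids, freq),
              grouped_mean_shortcut(mids, freq, a=4.245),
              grouped_mean_step(mids, freq, a=4.245, h=0.1)):
    print(round(value, 3))   # 4.226 each time
```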
Weighted Arithmetic Mean
The simple arithmetic mean is based on the assumption that all the items in the distribution are equally important. In practice, this may not be so: some items in a distribution are more important than others. When weights are assigned to the individual items according to their relative importance (or priorities), the arithmetic mean calculated with respect to these weights is called the weighted arithmetic mean.
The weighted arithmetic mean is given by
$\bar{X}_w = \dfrac{\sum wX}{\sum w}$
where X = value of the variable (often a rate or per-unit value) and w = given weight, proportion or frequency.
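A minimal Python sketch of the weighted mean formula follows. The marks and credit weights are hypothetical numbers chosen only to show how the weights pull the mean towards the more important items.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: X-bar_w = sum(wX) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: marks in three subjects weighted by credit hours
marks = [70, 80, 90]
credits = [2, 3, 5]
print(weighted_mean(marks, credits))   # 83.0
print(sum(marks) / len(marks))         # 80.0 (the simple mean ignores the weights)
```

With equal weights the weighted mean reduces to the simple arithmetic mean.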
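Combined mean or mean of combined series
For two groups or two series
Combined mean $\bar{X}_{12} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$
For three groups or three series
Combined mean $\bar{X}_{123} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3}{n_1 + n_2 + n_3}$
where
n1 = size of first group, n2 = size of second group, n3 = size of third group
X̄1 = mean of first group, X̄2 = mean of second group, X̄3 = mean of third group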
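The combined-mean formula generalizes to any number of groups. The sketch below (hypothetical function name and hypothetical group sizes and means) computes it for two and three groups.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * X-bar_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical groups (sizes and means are made up for illustration)
print(combined_mean([50, 30], [60, 70]))           # 63.75
print(combined_mean([20, 30, 50], [40, 50, 60]))   # 53.0
```

The result equals the mean of all observations pooled together, because n_i × X̄_i is just the sum of the ith group.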
Corrected mean
Correct mean $= \dfrac{\text{Correct} \sum X}{n}$
where
Incorrect ∑X = n × X̄ = n × incorrect mean
Correct ∑X = Incorrect ∑X − incorrect items + correct items
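The correction steps can be written as a small Python function. The scenario in the usage lines (20 observations, mean 45, one value copied as 64 instead of 46) is hypothetical.

```python
def corrected_mean(n, incorrect_mean, wrong_items, correct_items):
    """Correct mean = correct sum(X) / n, where
    correct sum(X) = n * incorrect mean - wrong items + correct items."""
    incorrect_sum = n * incorrect_mean
    correct_sum = incorrect_sum - sum(wrong_items) + sum(correct_items)
    return correct_sum / n


# Hypothetical case: the mean of 20 observations was found to be 45,
# but one value was copied as 64 instead of 46.
print(corrected_mean(20, 45, wrong_items=[64], correct_items=[46]))  # 44.1
```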
Median or Positional average (Md)
The variate value which divides the total number of observations into two equal parts is called the median. It is denoted by Md.
Md is a suitable measure of central tendency (or average) for qualitative characteristics such as knowledge, intelligence, beauty, honesty, talent, good, bad, defective, etc.
It is also the more appropriate (or suitable) average (or measure of central tendency) for open-ended classified data.
Note:
(i) The classes should be of exclusive type.
(ii) For the calculation of Md, the classes need not be of equal size.
Calculation of the median depends upon the given series.
For Individual series
First, arrange the given set of observations (data) in ascending order of magnitude.
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
where n = no. of observations.
In discrete series:
First, arrange the given data in ascending order of their magnitudes.
Obtain the less than cumulative frequency (c.f.).
Median (Md) = Value of $\left(\dfrac{N+1}{2}\right)$th item, i.e. the value of X whose c.f. is equal to or just greater than $\dfrac{N+1}{2}$
where N = ∑f = total frequency.
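The positional rules above can be sketched in Python as follows (function names are illustrative). For an individual series the ((n + 1)/2)th position is interpolated between neighbouring items when it is fractional; for a discrete series the first X whose c.f. reaches (N + 1)/2 is returned. The data come from Examples 1 and 3.

```python
import math


def median_individual(xs):
    """Md = value of the ((n + 1) / 2)th item of the sorted data,
    interpolating between neighbours when the position is fractional."""
    s = sorted(xs)
    pos = (len(s) + 1) / 2            # 1-based position of the median
    k = math.floor(pos)
    frac = pos - k
    if frac == 0:
        return s[k - 1]
    return s[k - 1] + frac * (s[k] - s[k - 1])


def median_discrete(xs, fs):
    """Md = value of X whose less-than c.f. first reaches (N + 1) / 2.
    xs must already be in ascending order."""
    pos = (sum(fs) + 1) / 2
    cf = 0
    for x, f in zip(xs, fs):
        cf += f
        if cf >= pos:
            return x


print(median_individual([55, 39, 45, 55, 41, 35, 60, 40, 55, 35, 37, 55, 55, 65]))  # 50.0
print(median_discrete([0, 1, 2, 3, 4, 5, 6], [15, 22, 28, 35, 42, 34, 24]))     # 4
```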
For continuous series
Prepare the less than cumulative frequency distribution.
Find $\dfrac{N}{2}$.
Look for the cumulative frequency equal to or just greater than $\dfrac{N}{2}$ and note the corresponding class.
The corresponding class contains the median value and is called the median class.
$Md = L + \dfrac{\frac{N}{2} - c.f.}{f} \times h$
where
N = ∑f = total frequency, L = lower limit of the median class, f = frequency of the median class, h = width (class size) of the median class, and
c.f. = less than cumulative frequency preceding the median class.
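A minimal Python sketch of the grouped median formula is given below. It assumes the classes are supplied as exclusive boundaries (here the boundaries of Example 6, obtained with the correction factor 0.005); the function name is hypothetical.

```python
def grouped_median(boundaries, fs):
    """Md = L + ((N/2 - c.f.) / f) * h for the first class whose
    less-than c.f. is equal to or greater than N/2.
    boundaries: list of (lower, upper) exclusive class boundaries."""
    half_n = sum(fs) / 2
    cf = 0                                   # c.f. preceding the current class
    for (lower, upper), f in zip(boundaries, fs):
        if cf + f >= half_n:
            h = upper - lower
            return lower + (half_n - cf) / f * h
        cf += f


# Example 6 with exclusive class boundaries (correction factor 0.005)
bounds = [(3.795, 3.895), (3.895, 3.995), (3.995, 4.095), (4.095, 4.195),
          (4.195, 4.295), (4.295, 4.395), (4.395, 4.495), (4.495, 4.595)]
freq = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_median(bounds, freq), 2))   # 4.23
```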
Mode or Modal value or Most repeated value or Most usual value (Mo)
Mode is the variate value which repeats the maximum number of times.
It is used to find the most common size of pen drives, shoes, T-shirts and other ready-made garments.
Calculation of mode
The mode for various distributions is given below.
For Individual series:
Mode = Value of variable X which repeats the maximum number of times
For discrete series
Case I: If the distribution is regular and unimodal (i.e. has only one maximum frequency):
Mode = Value of variable X corresponding to the maximum frequency
Case II: When the distribution is regular and bimodal or multimodal, the mode can be determined using the empirical relation
Mo = 3Md − 2X̄
Case III: When the distribution is irregular, the mode is determined using the grouping method.
For continuous series (grouped frequency distribution)
Case I: If the distribution is regular and unimodal, the mode is calculated using the following formula:
$Mo = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
where
f1 = maximum frequency (modal class frequency), f0 = frequency of the class preceding the modal class, f2 = frequency of the class following the modal class, L = lower limit of the modal class, and h = class size (width) of the modal class.
Case II: When the distribution is regular and bimodal or multimodal, the mode can be determined using the empirical relation
Mo = 3Md − 2X̄
Case III: When the distribution is irregular, the mode is determined using the grouping method.
Note: Case III is beyond our syllabus.
Note: For the mode, the class intervals must be of equal size and of exclusive type.
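The Case I formula for grouped data can be sketched in Python as below (hypothetical function name). It picks the class with the maximum frequency as the modal class. As a labelled assumption not stated in the notes, a missing neighbouring class (when the modal class is the first or last) is given frequency 0. The check uses the exclusive classes of Example 6.

```python
def grouped_mode(boundaries, fs):
    """Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * h for a regular, unimodal
    distribution with equal, exclusive classes.
    Assumption: a missing neighbour class (modal class first or last) has f = 0."""
    i = fs.index(max(fs))                    # modal class = class with maximum frequency
    f1 = fs[i]
    f0 = fs[i - 1] if i > 0 else 0
    f2 = fs[i + 1] if i + 1 < len(fs) else 0
    lower, upper = boundaries[i]
    h = upper - lower
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h


# Example 6 with exclusive class boundaries
bounds = [(3.795, 3.895), (3.895, 3.995), (3.995, 4.095), (4.095, 4.195),
          (4.195, 4.295), (4.295, 4.395), (4.395, 4.495), (4.495, 4.595)]
freq = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_mode(bounds, freq), 2))   # 4.24
```

The formula divides by 2f1 − f0 − f2, so it is undefined when the modal class and both neighbours have the same frequency, which is consistent with requiring a unimodal distribution.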
Note: To construct class intervals when mid-values are given
If the mid-values of the distribution are given, we first need to construct the class intervals.
Class size (h) = difference between two successive mid-values
= second mid-value − first mid-value
Subtract $\dfrac{h}{2}$ from the first mid-value to get the lower limit of the first class interval, and add $\dfrac{h}{2}$ to the same mid-value to get its upper limit. The other class intervals are constructed in the same fashion.
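The construction rule can be sketched in a few lines of Python. The function name and the mid-values in the usage line are hypothetical, and the mid-values are assumed to be equally spaced.

```python
def classes_from_midpoints(mids):
    """Build class intervals from equally spaced mid-values:
    h = difference between successive mid-values,
    each class is (mid - h/2, mid + h/2)."""
    h = mids[1] - mids[0]
    return [(m - h / 2, m + h / 2) for m in mids]


# Hypothetical mid-values
print(classes_from_midpoints([5, 15, 25, 35]))
# [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
```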
The Partition Values
The values which divide the total number of observations into a number of equal parts are called partition values. Thus, the median may also be regarded as a particular partition value, because it divides the given data into two equal parts.
Depending upon the number of equal parts, the important partition values are
Quartiles (4 equal parts)
Deciles (10 equal parts)
Percentiles (100 equal parts)
Note:
(i) For all series, first arrange the given data in ascending order of magnitude.
(ii) For all partition values, the classes need not be of equal size, but they must be of exclusive type.
(iii) Q1 = P25, Md = Q2 = D5 = P50, P75 = Q3
Quartiles
Individual series
After arranging the given data in ascending order of magnitude, quartiles can be obtained by the following formula:
Qi = Value of $\left(\dfrac{i(n+1)}{4}\right)$th item
where i = 1, 2, 3 and n = no. of observations.
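The position rule i(n + 1)/4, with linear interpolation for fractional positions as in the worked examples, can be sketched in Python. The same function gives deciles (parts = 10) and percentiles (parts = 100). The function name is hypothetical, and clamping positions outside 1..n to the smallest or largest item is an assumption the notes do not state. The data are those of Example 2.

```python
import math


def partition_value(xs, k, parts):
    """k-th partition value of an individual series using the
    position k(n + 1)/parts with linear interpolation:
    parts=4 -> quartiles, parts=10 -> deciles, parts=100 -> percentiles."""
    s = sorted(xs)
    n = len(s)
    pos = k * (n + 1) / parts                # 1-based position
    if pos <= 1:
        return s[0]                          # clamp below the first item (assumption)
    if pos >= n:
        return s[-1]                         # clamp above the last item (assumption)
    i = math.floor(pos)
    return s[i - 1] + (pos - i) * (s[i] - s[i - 1])


data = [8, 6, 5, 4, 10, 15, 3, 16]           # Example 2
print(partition_value(data, 1, 4))           # Q1 = 4.25
print(partition_value(data, 3, 10))          # D3, about 4.7
print(partition_value(data, 65, 100))        # P65, about 9.7
```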
Discrete series
Qi = Value of $\left(\dfrac{i(N+1)}{4}\right)$th item
where N = ∑f = total frequency and i = 1, 2, 3.
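For a discrete series the partition value is the X whose less-than c.f. is equal to or just greater than the position, as in Examples 4 and 5. A minimal Python sketch (hypothetical name; returning the last value when the position exceeds N is an assumption):

```python
def partition_value_discrete(xs, fs, k, parts):
    """Value of X whose less-than c.f. is equal to or just greater than
    k(N + 1)/parts. xs must be in ascending order."""
    pos = k * (sum(fs) + 1) / parts
    cf = 0
    for x, f in zip(xs, fs):
        cf += f
        if cf >= pos:
            return x
    return xs[-1]      # position beyond N: take the last value (assumption)


# Example 4
xs = list(range(1, 12))
fs = [2, 5, 8, 10, 12, 8, 6, 4, 3, 2, 1]
print(partition_value_discrete(xs, fs, 3, 4))     # Q3 = 7
print(partition_value_discrete(xs, fs, 9, 10))    # D9 = 9
print(partition_value_discrete(xs, fs, 77, 100))  # P77 = 7
```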
Continuous series or grouped frequency distribution
$Q_i = L + \dfrac{\frac{iN}{4} - c.f.}{f} \times h$
where i = 1, 2, 3 and
$\dfrac{iN}{4}$ = the size (position) used to locate the ith quartile's class,
L = lower limit of the ith quartile's class, f = frequency of the ith quartile's class, h = class size (width) of the ith quartile's class, and
c.f. = c.f. preceding the ith quartile's class.
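The grouped formula has the same shape for quartiles, deciles and percentiles; only kN/parts changes. The Python sketch below (hypothetical function name) applies it to the unequal but exclusive classes of Example 8.

```python
def grouped_partition_value(boundaries, fs, k, parts):
    """L + ((kN/parts - c.f.) / f) * h, where the class is the first one whose
    less-than c.f. is equal to or greater than kN/parts.
    parts=4, 10, 100 gives quartiles, deciles, percentiles."""
    target = k * sum(fs) / parts
    cf = 0                                   # c.f. preceding the current class
    for (lower, upper), f in zip(boundaries, fs):
        if cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f


# Example 8 (unequal but exclusive classes)
bounds = [(10, 20), (20, 40), (40, 70), (70, 90), (90, 100)]
freq = [15, 20, 30, 20, 15]
for k in (30, 60, 80, 25, 75):
    print(k, round(grouped_partition_value(bounds, freq, k, 100), 2))
# prints 30 35.0, 60 65.0, 80 85.0, 25 30.0 and 75 80.0 on separate lines
```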
Deciles:
The variate values which divide the total number of observations into 10 equal parts are called deciles.
Individual series
After arranging the given data in ascending order of magnitude,
Dj = Value of $\left(\dfrac{j(n+1)}{10}\right)$th item
where j = 1, 2, 3, …, 9 and n = no. of observations.
Discrete series
Dj = Value of $\left(\dfrac{j(N+1)}{10}\right)$th item
where N = ∑f = total frequency and j = 1, 2, 3, …, 9.
Continuous series or grouped frequency distribution
$D_j = L + \dfrac{\frac{jN}{10} - c.f.}{f} \times h$
where j = 1, 2, 3, …, 9 and
$\dfrac{jN}{10}$ = the size (position) used to locate the jth decile's class,
L = lower limit of the jth decile's class, f = frequency of the jth decile's class, h = class size (width) of the jth decile's class, and
c.f. = c.f. preceding the jth decile's class.
Percentiles:
The variate values which divide the total number of observations into 100 equal parts are called percentiles.
Case I: To find the highest (maximum) value of a bottom x% group (e.g. % failed, lowest earners, poorest, flattest, shortest etc.).
For example, the highest income of the poorest 40% of the people is given by the 40th percentile, i.e. P40.
Case II: To find the limits (range) of the middle x%.
For example, the limits of income of the middle 50% of families are given by the 25th and 75th percentiles, i.e. P25 and P75.
Case III: To find the lowest (minimum) value of a top x% group (e.g. % top, passed, richest, highest earners, longest, tallest etc.).
For example, the lowest income of the richest 40% of the people is given by the 60th percentile, i.e. P60.
Individual series
After arranging the given data in ascending order of magnitude,
Pk = Value of $\left(\dfrac{k(n+1)}{100}\right)$th item
where k = 1, 2, 3, …, 99 and n = no. of observations.
Discrete series
Pk = Value of $\left(\dfrac{k(N+1)}{100}\right)$th item
where N = ∑f = total frequency and k = 1, 2, 3, …, 99.
Continuous series or grouped frequency distribution
$P_k = L + \dfrac{\frac{kN}{100} - c.f.}{f} \times h$
where k = 1, 2, 3, …, 99 and
$\dfrac{kN}{100}$ = the size (position) used to locate the kth percentile's class,
L = lower limit of the kth percentile's class, f = frequency of the kth percentile's class, h = class size (width) of the kth percentile's class, and
c.f. = c.f. preceding the kth percentile's class.
Note: Q1 = P25, Md = Q2 = D5 = P50, P75 = Q3
Measures of Variation (Measures of Dispersion)
The variability or scatter of the items about the central value is called dispersion, and its measure is called the measure of dispersion or the measure of variation.
Thus, measures of dispersion are descriptive statistical tools used to measure the variation, spread, scatter or deviation of the data from the central value. They therefore give an idea of the homogeneity or heterogeneity of the distribution.
Measures of Dispersion
The various measures of dispersion are as follows.
1. Range
2. Quartile deviation or semi-interquartile range
3. Mean deviation or average deviation
4. Standard deviation
5. Lorenz curve
6. Gini's coefficient
Note: Mean deviation (average deviation), the Lorenz curve and Gini's coefficient are beyond our syllabus.
Range
Range is the simplest of all the measures of dispersion. It is defined as the difference between the largest (maximum) value and the smallest (minimum) value of the given observations of the distribution.
For all series
Range (R) = L − S
where L = largest item or observation and S = smallest item or observation.
Its relative measure of dispersion is known as the coefficient of range, and it is given by
Coefficient of range $= \dfrac{L - S}{L + S}$
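Both the range and its coefficient need only the largest and smallest observations. A minimal Python sketch (hypothetical function name) using the data of Example 1:

```python
def range_and_coefficient(xs):
    """Range R = L - S and coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(xs), min(xs)
    return largest - smallest, (largest - smallest) / (largest + smallest)


data = [55, 39, 45, 55, 41, 35, 60, 40, 55, 35, 37, 55, 55, 65]   # Example 1
print(range_and_coefficient(data))   # (30, 0.3)
```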
Quartile Deviation or Semi-interquartile Range (Q.D.)
Quartile deviation is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1.
The difference between the upper quartile Q3 and the lower quartile Q1 is known as the inter-quartile range.
Inter-quartile range = Q3 − Q1
Half of the inter-quartile range is called the semi-interquartile range, which is also known as the quartile deviation.
Quartile Deviation (Q.D.) $= \dfrac{Q_3 - Q_1}{2}$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
Note:
(i) Quartile deviation (Q.D.) is the most suitable (appropriate) measure of dispersion for open-ended classes.
(ii) A smaller coefficient of Q.D. implies greater uniformity (less variability).
(iii) A greater coefficient of Q.D. implies less uniformity (greater variability).
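For individual series
Quartile deviation (Q.D.) $= \dfrac{Q_3 - Q_1}{2}$
where Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item, Q3 = Value of $\left(\dfrac{3(n+1)}{4}\right)$th item and n = no. of observations.
For discrete series
Quartile deviation (Q.D.) $= \dfrac{Q_3 - Q_1}{2}$
where Q1 = Value of $\left(\dfrac{N+1}{4}\right)$th item, Q3 = Value of $\left(\dfrac{3(N+1)}{4}\right)$th item and N = ∑f = total frequency.
For continuous series
Quartile deviation (Q.D.) $= \dfrac{Q_3 - Q_1}{2}$
where $Q_1 = L + \dfrac{\frac{N}{4} - c.f.}{f} \times h$, $Q_3 = L + \dfrac{\frac{3N}{4} - c.f.}{f} \times h$ and N = ∑f = total frequency.
Its relative measure is known as the coefficient of quartile deviation and is given by
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$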
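For an individual series, the quartile deviation and its coefficient follow directly from Q1 and Q3. The Python sketch below (hypothetical names; clamping out-of-range positions is an assumption) uses the PU 2018 (Spring) account data, for which Q1 = 41.5 and Q3 = 53.5.

```python
import math


def quartile(xs, i):
    """Qi = value of the (i(n + 1)/4)th item, interpolating linearly."""
    s = sorted(xs)
    n = len(s)
    pos = i * (n + 1) / 4                    # 1-based position of Qi
    k = math.floor(pos)
    if k < 1:
        return s[0]                          # clamp (assumption)
    if k >= n:
        return s[-1]                         # clamp (assumption)
    return s[k - 1] + (pos - k) * (s[k] - s[k - 1])


def quartile_deviation(xs):
    """Q.D. = (Q3 - Q1)/2 and coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1)."""
    q1, q3 = quartile(xs, 1), quartile(xs, 3)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]   # PU 2018 (Spring) data
qd, coeff = quartile_deviation(accounts)
print(qd, round(coeff, 4))   # 6.0 0.1263
```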
Standard Deviation:
Standard deviation is defined as “the positive square root of the arithmetic mean of the squares of the deviations of the given set of observations from their arithmetic mean.” It is usually denoted by the Greek letter σ (sigma).
Standard deviation is said to be the best (or ideal) measure of dispersion, as it satisfies almost all the requisites (characteristics) of an ideal or good measure of dispersion.
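For Individual Series
(i) S.D. (σ) $= \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2}$ (direct method)
(ii) S.D. (σ) $= \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$ (short-cut method)
where d = X − a, a = assumed mean and n = number of observations.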
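The two individual-series formulas can be checked against each other with a short Python sketch (hypothetical function names) on the temperature data of T.U. 2017 (Spring). Taking a close to the mean keeps the deviations d small, so the short-cut form also avoids subtracting two large, nearly equal numbers.

```python
import math


def sd_direct(xs):
    """sigma = sqrt(sum(X^2)/n - (sum(X)/n)^2)."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


def sd_shortcut(xs, a):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = X - a."""
    n = len(xs)
    d = [x - a for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


temps = [78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4]   # T.U. 2017 (Spring) data
print(round(sd_direct(temps), 4))          # 0.6534
print(round(sd_shortcut(temps, a=79), 4))  # 0.6534
```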
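For Discrete Series
(i) S.D. (σ) $= \sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}$ (direct method)
(ii) S.D. (σ) $= \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$ (short-cut method)
where d = X − a, a = assumed mean and N = ∑f = total frequency.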
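For Continuous Series
(i) S.D. (σ) $= \sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}$ (direct method)
(ii) S.D. (σ) $= \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$ (short-cut method)
(iii) S.D. (σ) $= \sqrt{\dfrac{\sum fd'^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2} \times h$ (step-deviation method)
where X = mid-value, a = assumed mean, d = X − a, $d' = \dfrac{X - a}{h}$, N = ∑f = total frequency and h = class size (class width).
Note: For unequal class sizes, h is taken as a common factor of the deviations.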
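The step-deviation form simply rescales the deviations, so it must agree with the direct method. A minimal Python sketch (hypothetical function names) compares both on the mid-values and frequencies of Example 6, with a = 4.245 and h = 0.1 chosen for illustration.

```python
import math


def grouped_sd_direct(mids, fs):
    """sigma = sqrt(sum(fX^2)/N - (sum(fX)/N)^2), X = mid-values."""
    N = sum(fs)
    m1 = sum(f * x for x, f in zip(mids, fs)) / N
    m2 = sum(f * x * x for x, f in zip(mids, fs)) / N
    return math.sqrt(m2 - m1 ** 2)


def grouped_sd_step(mids, fs, a, h):
    """sigma = sqrt(sum(fd'^2)/N - (sum(fd')/N)^2) * h, d' = (X - a)/h."""
    N = sum(fs)
    d = [(x - a) / h for x in mids]
    m1 = sum(f * v for v, f in zip(d, fs)) / N
    m2 = sum(f * v * v for v, f in zip(d, fs)) / N
    return math.sqrt(m2 - m1 ** 2) * h


# Mid-values and frequencies of Example 6
mids = [3.845, 3.945, 4.045, 4.145, 4.245, 4.345, 4.445, 4.545]
freq = [3, 8, 14, 19, 28, 18, 10, 8]
print(round(grouped_sd_direct(mids, freq), 4))                # 0.1724
print(round(grouped_sd_step(mids, freq, a=4.245, h=0.1), 4))  # 0.1724
```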
Variance
The square of the standard deviation is known as the variance. It is denoted by σ² and given by σ² = V(X), where V(X) = variance of the variable X.
⟹ σ = √V(X)
Coefficient of Variation (C.V.)
100 times the coefficient of standard deviation (σ/X̄) is called the coefficient of variation. In other words, the coefficient of standard deviation expressed as a percentage is known as the coefficient of variation. Symbolically,
C.V. $= \dfrac{\sigma}{\bar{X}} \times 100\%$
It is a relative measure of dispersion, so it is independent of the units of measurement. It is always expressed as a percentage.
Therefore, C.V. is better suited for comparing two or more distributions with regard to their variability, consistency, uniformity, homogeneity, equitability, stability etc.
Coefficient of variation (C.V.) is applied to compare the variability of two or more distributions (series) as follows:
Less C.V. is considered as | More C.V. is considered as
More consistent | Less consistent
More homogeneous | Less homogeneous
More uniform | Less uniform
More stable | Less stable
More representative of the mean | Less representative of the mean
More equitable | Less equitable
Less variable | More variable
Less disparity | More disparity
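A minimal Python sketch of C.V. and the comparison rule in the table follows (hypothetical function name, population standard deviation σ). It compares the temperature data of T.U. 2017 (Spring) with the account data of PU 2018 (Spring) purely to exercise the rule; because C.V. is unit-free, the comparison is possible even though the units differ.

```python
import math


def coefficient_of_variation(xs):
    """C.V. = sigma / mean * 100 (%), with the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sigma / mean * 100


temps = [78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4]     # T.U. 2017 (Spring)
accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]   # PU 2018 (Spring)
cv_t, cv_a = coefficient_of_variation(temps), coefficient_of_variation(accounts)
print(round(cv_t, 4), round(cv_a, 2))   # 0.8272 44.9
print("temperatures" if cv_t < cv_a else "accounts", "are more consistent")
```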
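Sample standard deviation (s):
A standard deviation based on sample observations is called the sample standard deviation. It is denoted by ‘s’.
(i) $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
(ii) $s = \sqrt{\dfrac{1}{n-1}\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]}$
(iii) $s = \sqrt{\dfrac{1}{n-1}\left[\sum d^2 - \dfrac{(\sum d)^2}{n}\right]}$, where d = X − a
Sample coefficient of variation (C.V.) $= \dfrac{s}{\bar{X}} \times 100\%$
Sample variance (s²):
The square of the sample standard deviation is called the sample variance. It is denoted by s².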
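The only difference between s and σ is the divisor (n − 1 instead of n). Python's standard library exposes both: `statistics.stdev` (divisor n − 1) and `statistics.pstdev` (divisor n). The sketch below (hypothetical function name `sample_sd`) checks formula (ii) against `statistics.stdev` on the PU 2018 (Spring) data.

```python
import math
import statistics


def sample_sd(xs):
    """s = sqrt( (sum(X^2) - (sum(X))^2 / n) / (n - 1) )."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]   # PU 2018 (Spring) data
print(round(sample_sd(accounts), 3))            # s, divisor n - 1
print(round(statistics.stdev(accounts), 3))     # same value from the standard library
print(round(statistics.pstdev(accounts), 3))    # sigma, divisor n (22.272)
```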
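Combined Standard deviation:
For two groups (two series)
$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$
For three groups (three series)
Combined standard deviation is
$\sigma_{123} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + n_3(\sigma_3^2 + d_3^2)}{n_1 + n_2 + n_3}}$
where $d_1 = \bar{X}_1 - \bar{X}_{123}$, $d_2 = \bar{X}_2 - \bar{X}_{123}$, $d_3 = \bar{X}_3 - \bar{X}_{123}$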
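The combined standard deviation formula extends to any number of groups. The Python sketch below (hypothetical function name; the two small groups are made up) confirms that it reproduces the population standard deviation of the pooled data.

```python
import math
import statistics


def combined_sd(sizes, means, sds):
    """Combined S.D. of k groups:
    sqrt( sum n_i * (sigma_i^2 + d_i^2) / sum n_i ), d_i = X-bar_i - combined mean."""
    N = sum(sizes)
    combined_mean = sum(n * m for n, m in zip(sizes, means)) / N
    total = sum(n * (s ** 2 + (m - combined_mean) ** 2)
                for n, m, s in zip(sizes, means, sds))
    return math.sqrt(total / N)


# Hypothetical groups, used only to check the formula against the pooled data
g1, g2 = [2, 4, 6], [8, 10]
sizes = [len(g1), len(g2)]
means = [statistics.mean(g1), statistics.mean(g2)]
sds = [statistics.pstdev(g1), statistics.pstdev(g2)]   # population S.D. (sigma)
print(round(combined_sd(sizes, means, sds), 4))        # 2.8284
print(round(statistics.pstdev(g1 + g2), 4))            # 2.8284 (same value)
```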
Five-Number Summary
The five-number summary provides five descriptive measures of the given data set. It consists of the smallest value (X smallest), the first or lower quartile (Q1), the median (Md or Q2), the third or upper quartile (Q3) and the largest value (X largest).
Therefore, the five-number summary is
(X smallest, Q1, Median, Q3, X largest)
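In Python, `statistics.quantiles` with its default `method='exclusive'` places the quartiles at positions i(n + 1)/4 with linear interpolation, which matches the rule used in this unit for data sets where those positions fall between the first and last items. The sketch below (hypothetical function name) builds the five-number summary of the PU 2018 (Spring) data.

```python
import statistics


def five_number_summary(xs):
    """(X smallest, Q1, Md, Q3, X largest), quartiles located at i(n + 1)/4."""
    q1, md, q3 = statistics.quantiles(xs, n=4)   # method='exclusive' is the default
    return min(xs), q1, md, q3, max(xs)


accounts = [43, 37, 50, 51, 58, 105, 52, 45, 45, 10]   # PU 2018 (Spring)
print(five_number_summary(accounts))   # (10, 41.5, 47.5, 53.5, 105)
```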
The Box-and-Whisker Plot
A five-number summary can be represented in a diagram known as a box-and-whisker plot. Therefore, a box-and-whisker plot is a graphical representation of the data based on the five-number summary, that is, the smallest value, Q1, Md, Q3 and the largest value. It is a graphical method of judging the skewness of the distribution.
The vertical line drawn at the left side of the box represents the location of Q1, and the vertical line at the right side of the box represents the location of Q3. Thus, the box contains the middle 50% of the values. The lower 25% of the data are represented by a line (known as a whisker) connecting the left side of the box to the location of the smallest value, X smallest. Similarly, the upper 25% of the data are represented by a line (known as a whisker) connecting the right side of the box to X largest.
Comparison | Left skewed | Right skewed | Symmetric
1. Distance from X smallest to the median versus distance from the median to X largest | The distance from X smallest to the median is greater than the distance from the median to X largest | The distance from X smallest to the median is less than the distance from the median to X largest | Both distances are the same
2. Distance from X smallest to Q1 versus distance from Q3 to X largest | The distance from X smallest to Q1 is greater than the distance from Q3 to X largest | The distance from X smallest to Q1 is less than the distance from Q3 to X largest | Both distances are the same
3. Distance from Q1 to the median versus distance from the median to Q3 | The distance from Q1 to the median is greater than the distance from the median to Q3 | The distance from Q1 to the median is less than the distance from the median to Q3 | Both distances are the same
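The three comparisons in the table can be applied mechanically to a five-number summary. This Python sketch (hypothetical function name) returns one verdict per row. It compares distances exactly, so for measured data you may want a small tolerance before calling two distances "the same".

```python
def skewness_checks(summary):
    """Apply the three box-plot comparisons to a five-number summary
    (X smallest, Q1, Md, Q3, X largest). Each comparison returns
    'left', 'right' or 'symmetric'."""
    smallest, q1, md, q3, largest = summary

    def verdict(left_distance, right_distance):
        if left_distance > right_distance:
            return "left"
        if left_distance < right_distance:
            return "right"
        return "symmetric"

    return (verdict(md - smallest, largest - md),   # 1. X smallest-Md vs Md-X largest
            verdict(q1 - smallest, largest - q3),   # 2. X smallest-Q1 vs Q3-X largest
            verdict(md - q1, q3 - md))              # 3. Q1-Md vs Md-Q3


print(skewness_checks((10, 41.5, 47.5, 53.5, 105)))   # ('right', 'right', 'symmetric')
print(skewness_checks((60, 74.5, 80, 84.75, 99)))     # ('left', 'left', 'left')
```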
Numerical problems
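Example 1: Compute the mean, median and mode of the following data:
55, 39, 45, 55, 41, 35, 60, 40, 55, 35, 37, 55, 55, 65
Solution:
Arranging the given data in ascending order:
X: 35, 35, 37, 39, 40, 41, 45, 55, 55, 55, 55, 55, 60, 65
Here, n = 14 and ∑X = 672.
Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{672}{14} = 48$
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
= Value of $\left(\dfrac{14+1}{2}\right)$th item
= Value of 7.5th item
= Value of $\dfrac{7\text{th item} + 8\text{th item}}{2} = \dfrac{45 + 55}{2}$
= 50
Mode (Mo) = Value of variable X which repeats the maximum number of times
= 55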
Example 2: Find Q1, D3 and P65 from the given data: 8, 6, 5, 4, 10, 15, 3, 16
Solution
Here, the number of observations is n = 8.
First, the data are arranged in ascending order: 3, 4, 5, 6, 8, 10, 15, 16.
Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item
= Value of $\left(\dfrac{8+1}{4}\right)$th item
= Value of 2.25th item
= 2nd item + 0.25 (3rd item − 2nd item)
Q1 = 4 + 0.25 (5 − 4) = 4.25
D3 = Value of $\left(\dfrac{3(n+1)}{10}\right)$th item
= Value of $\left(\dfrac{3(8+1)}{10}\right)$th item
= Value of 2.7th item
= Value of 2nd item + 0.7 (3rd item − 2nd item)
D3 = 4 + 0.7 (5 − 4) = 4.7
P65 = Value of $\left(\dfrac{65(n+1)}{100}\right)$th item
= Value of $\left(\dfrac{65(8+1)}{100}\right)$th item
= Value of 5.85th item
= Value of 5th item + 0.85 (6th item − 5th item)
= 8 + 0.85 (10 − 8) = 9.7
P65 = 9.7
Example 3: The number of telephone calls received at an exchange in 200 successive one-minute intervals is given below.
No. of calls | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Total
Frequency | 15 | 22 | 28 | 35 | 42 | 34 | 24 | 200
Compute the mean, median and mode.
Solution:
No. of calls (X) | Frequency (f) | Less than c.f. | fX
0 | 15 | 15 | 0
1 | 22 | 37 | 22
2 | 28 | 65 | 56
3 | 35 | 100 | 105
4 | 42 | 142 | 168
5 | 34 | 176 | 170
6 | 24 | 200 | 144
Total | N = ∑f = 200 | | ∑fX = 665
Mean, $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{665}{200} = 3.325$
Median (Md) = Value of $\left(\dfrac{N+1}{2}\right)$th item
= Value of $\left(\dfrac{200+1}{2}\right)$th item
= Value of 100.5th item
The c.f. just greater than 100.5 is 142, so
Median (Md) = 4
Mode (Mo) = Value of variable X corresponding to the maximum frequency
= 4
Example 4: Find the upper quartile and upper decile from the given data. Also obtain P77.
X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
f | 2 | 5 | 8 | 10 | 12 | 8 | 6 | 4 | 3 | 2 | 1
Solution
Calculation of partition values
X | f | Less than c.f.
1 | 2 | 2
2 | 5 | 7
3 | 8 | 15
4 | 10 | 25
5 | 12 | 37
6 | 8 | 45
7 | 6 | 51
8 | 4 | 55
9 | 3 | 58
10 | 2 | 60
11 | 1 | 61
Total | N = 61 |
For Q3:
Q3 = Value of $\left(\dfrac{3(N+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(61+1)}{4}\right)$th item
= Value of 46.5th item. The c.f. just greater than 46.5 is 51.
So, the upper quartile Q3 = 7.
For D9:
D9 = Value of $\left(\dfrac{9(N+1)}{10}\right)$th item
= Value of $\left(\dfrac{9(61+1)}{10}\right)$th item
= Value of 55.8th item
The c.f. just greater than 55.8 is 58.
So, D9 = 9.
For P77:
P77 = Value of $\left(\dfrac{77(N+1)}{100}\right)$th item
= Value of $\left(\dfrac{77(61+1)}{100}\right)$th item
= Value of 47.74th item.
The c.f. just greater than 47.74 is 51.
So, P77 = 7.
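Example 5: The lengths of power failures (in minutes) are recorded in the following table.
Power failure time (minutes) | 22 | 23 | 24 | 25 | 26 | 27 | 28 | Total
Frequency | 2 | 5 | 7 | 10 | 4 | 3 | 2 | 33
Find Q3, D2 and P40 and interpret the results.
Solution:
Power failure time (X) | Frequency (f) | Less than c.f.
22 | 2 | 2
23 | 5 | 7
24 | 7 | 14
25 | 10 | 24
26 | 4 | 28
27 | 3 | 31
28 | 2 | 33
Total | N = 33 |
For Q3:
Q3 = Value of $\left(\dfrac{3(N+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(33+1)}{4}\right)$th item
= Value of 25.5th item.
The c.f. just greater than 25.5 is 28.
So, the upper quartile Q3 = 26 minutes.
For D2:
D2 = Value of $\left(\dfrac{2(N+1)}{10}\right)$th item
= Value of $\left(\dfrac{2(33+1)}{10}\right)$th item
= Value of 6.8th item
The c.f. just greater than 6.8 is 7.
So, D2 = 23 minutes.
For P40:
P40 = Value of $\left(\dfrac{40(N+1)}{100}\right)$th item
= Value of $\left(\dfrac{40(33+1)}{100}\right)$th item
= Value of 13.6th item.
The c.f. just greater than 13.6 is 14.
So, P40 = 24 minutes.
Interpretation: Q3 = 26 minutes means that about 75% of the power failures lasted 26 minutes or less. Similarly, about 20% lasted 23 minutes or less (D2) and about 40% lasted 24 minutes or less (P40).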
Example 6: The lengths in metres of 100 VGA cables used in a company are measured to the nearest 0.01 metre, and the results are given below.
Length in metres | Frequency | Length in metres | Frequency
3.80-3.89 | 3 | 4.20-4.29 | 28
3.90-3.99 | 8 | 4.30-4.39 | 18
4.00-4.09 | 14 | 4.40-4.49 | 10
4.10-4.19 | 19 | 4.50-4.59 | 8
Find the values of the mean, mode and median.
Solution:
Correction factor $= \dfrac{3.90 - 3.89}{2} = 0.005$
Length in metres | Frequency (f) | Less than c.f. | Mid-value (X) | fX
3.80-3.89 | 3 | 3 | 3.845 | 11.535
3.90-3.99 | 8 | 11 | 3.945 | 31.56
4.00-4.09 | 14 | 25 | 4.045 | 56.63
4.10-4.19 | 19 | 44 | 4.145 | 78.755
4.20-4.29 | 28 | 72 | 4.245 | 118.86
4.30-4.39 | 18 | 90 | 4.345 | 78.21
4.40-4.49 | 10 | 100 | 4.445 | 44.45
4.50-4.59 | 8 | 108 | 4.545 | 36.36
Total | N = ∑f = 108 | | | ∑fX = 456.36
Mean, $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{456.36}{108} \approx 4.226$ metres
For Mode:
Since the given frequency distribution is regular and unimodal and the maximum frequency is 28, the modal class is 4.20-4.29, whose exclusive class is 4.195-4.295.
L = 4.195, h = 0.1, f1 = 28, f0 = 19, f2 = 18
Mode (Mo) $= L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
$= 4.195 + \dfrac{28 - 19}{2 \times 28 - 19 - 18} \times 0.1$
≈ 4.24 metres
For median (Md):
$\dfrac{N}{2} = \dfrac{108}{2} = 54$. The c.f. just greater than 54 is 72.
∴ The median class is 4.20-4.29, whose exclusive class is 4.195-4.295.
L = 4.195, f = 28, h = 0.1, c.f. = 44
Median (Md) $= L + \dfrac{\frac{N}{2} - c.f.}{f} \times h$
$= 4.195 + \dfrac{54 - 44}{28} \times 0.1$
≈ 4.23 metres
Example 7: The percentage age distribution of the urban male population of Nepal from the 2011 census is given below:
Age group | Male population (%) | Age group | Male population (%)
0-4 | 11.8 | 35-39 | 6.2
5-9 | 12.9 | 40-44 | 4.7
10-14 | 12.5 | 45-49 | 4.0
15-19 | 11.2 | 50-54 | 2.9
20-24 | 10.7 | 55-59 | 2.3
25-29 | 8.9 | 60 and above | 4.7
30-34 | 7.2 | |
Compute the first and third quartiles, the 8th decile and the 70th percentile.
Solution:
Correction factor $= \dfrac{5 - 4}{2} = 0.5$
Age group | Male population (f) | Less than c.f.
0-4 | 11.8 | 11.8
5-9 | 12.9 | 24.7
10-14 | 12.5 | 37.2
15-19 | 11.2 | 48.4
20-24 | 10.7 | 59.1
25-29 | 8.9 | 68.0
30-34 | 7.2 | 75.2
35-39 | 6.2 | 81.4
40-44 | 4.7 | 86.1
45-49 | 4.0 | 90.1
50-54 | 2.9 | 93.0
55-59 | 2.3 | 95.3
60 & above | 4.7 | 100
Total | N = ∑f = 100 |
For the lower or first quartile (Q1):
$\dfrac{N}{4} = \dfrac{100}{4} = 25$. The c.f. just greater than 25 is 37.2, so Q1 lies in the class 10-14, whose exclusive (adjusted) class is 9.5-14.5.
L = 9.5, f = 12.5, c.f. = 24.7, h = 5
$Q_1 = L + \dfrac{\frac{N}{4} - c.f.}{f} \times h = 9.5 + \dfrac{25 - 24.7}{12.5} \times 5 = 9.62$
For the third quartile (Q3):
$\dfrac{3N}{4} = 75$. The c.f. just greater than 75 is 75.2, so Q3 lies in the class 30-34, whose exclusive (adjusted) class is 29.5-34.5.
L = 29.5, f = 7.2, c.f. = 68, h = 5
$Q_3 = 29.5 + \dfrac{75 - 68}{7.2} \times 5 \approx 34.36$
8th decile (D8):
$\dfrac{8N}{10} = 80$. The c.f. just greater than 80 is 81.4, so D8 lies in the class 35-39, whose exclusive class is 34.5-39.5.
L = 34.5, f = 6.2, c.f. = 75.2, h = 5
$D_8 = 34.5 + \dfrac{80 - 75.2}{6.2} \times 5 \approx 38.37$
70th percentile (P70):
$\dfrac{70N}{100} = 70$. The c.f. just greater than 70 is 75.2, so P70 lies in the class 30-34, whose exclusive class is 29.5-34.5.
L = 29.5, f = 7.2, c.f. = 68, h = 5
$P_{70} = 29.5 + \dfrac{70 - 68}{7.2} \times 5 \approx 30.89$
Example 8: The marks distribution of 100 students of a college is as follows.
Marks | 10-20 | 20-40 | 40-70 | 70-90 | 90-100
No. of students | 15 | 20 | 30 | 20 | 15
(i) Find the highest mark of the weakest 30% of the students.
(ii) Find the lowest mark of the top 40% of the students.
(iii) Find the lowest mark of the top 20% of the students.
(iv) Find the limits and range of marks of the middle 50% of students.
Solution:
Marks | No. of students (f) | Less than c.f.
10-20 | 15 | 15
20-40 | 20 | 35
40-70 | 30 | 65
70-90 | 20 | 85
90-100 | 15 | 100
Total | N = ∑f = 100 |
(i) The highest mark of the weakest 30% of the students is given by P30.
30th percentile (P30):
$\dfrac{30N}{100} = 30$, so P30 lies in the class 20-40.
L = 20, f = 20, c.f. = 15, h = 20
$P_{30} = 20 + \dfrac{30 - 15}{20} \times 20 = 35$ marks
(ii) The lowest mark of the top 40% of the students is given by P60.
60th percentile (P60):
$\dfrac{60N}{100} = 60$, so P60 lies in the class 40-70.
L = 40, f = 30, c.f. = 35, h = 30
$P_{60} = 40 + \dfrac{60 - 35}{30} \times 30 = 65$ marks
(iii) The lowest mark of the top 20% of the students is given by P80.
80th percentile (P80):
$\dfrac{80N}{100} = 80$, so P80 lies in the class 70-90.
L = 70, f = 20, c.f. = 65, h = 20
$P_{80} = 70 + \dfrac{80 - 65}{20} \times 20 = 85$ marks
(iv) The limits of marks of the middle 50% of students are given by P25 and P75.
25th percentile (P25):
$\dfrac{25N}{100} = 25$, so P25 lies in the class 20-40.
L = 20, f = 20, c.f. = 15, h = 20
$P_{25} = 20 + \dfrac{25 - 15}{20} \times 20 = 30$ marks
75th percentile (P75):
$\dfrac{75N}{100} = 75$, so P75 lies in the class 70-90.
L = 70, f = 20, c.f. = 65, h = 20
$P_{75} = 70 + \dfrac{75 - 65}{20} \times 20 = 80$ marks
Lower limit, P25 = 30 marks
Upper limit, P75 = 80 marks
Range = P75 − P25 = 80 − 30 = 50 marks
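T.U. 2017 (Spring)
1. (b) The temperature in a chemical reactor was measured every half hour under the same conditions. The results were 78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4. Calculate the mean, median, lower quartile, upper quartile, standard deviation and coefficient of variation.
Solution: Arranging the given data in ascending order of magnitude:
Temperature (X) | X²
78.1 | 6099.61
78.3 | 6130.89
78.8 | 6209.44
78.9 | 6225.21
79.2 | 6272.64
79.4 | 6304.36
80.2 | 6432.04
∑X = 552.9 | ∑X² = 43674.19
Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{552.9}{7} \approx 78.986$
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
= Value of $\left(\dfrac{7+1}{2}\right)$th item
= Value of 4th item
= 78.9
Mode (Mo) = Value of variable X which repeats the maximum number of times
= no mode (every value occurs only once)
Lower quartile, Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item
= Value of $\left(\dfrac{7+1}{4}\right)$th item
= Value of 2nd item = 78.3
Upper quartile, Q3 = Value of $\left(\dfrac{3(n+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(7+1)}{4}\right)$th item
= Value of 6th item = 79.4
Standard deviation, $\sigma = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2} = \sqrt{\dfrac{43674.19}{7} - \left(\dfrac{552.9}{7}\right)^2}$
$= \sqrt{6239.17 - 6238.743}$
≈ 0.6534
Coefficient of variation, C.V. $= \dfrac{\sigma}{\bar{X}} \times 100\% = \dfrac{0.6534}{78.986} \times 100\%$
≈ 0.8272%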
PU 2018 (Spring)
1. (a) The following data set represents the number of new computer accounts registered during ten consecutive days.
43, 37, 50, 51, 58, 105, 52, 45, 45, 10
i. Compute the mean, median and standard deviation.
ii. Draw a box-and-whisker plot and identify whether it is skewed or not.
Solution:
Arranging the given data in ascending order of magnitude:
No. of new computer accounts (X) | X²
10 | 100
37 | 1369
43 | 1849
45 | 2025
45 | 2025
50 | 2500
51 | 2601
52 | 2704
58 | 3364
105 | 11025
∑X = 496 | ∑X² = 29562
(i) Mean, $\bar{X} = \dfrac{\sum X}{n} = \dfrac{496}{10} = 49.6$
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
= Value of $\left(\dfrac{10+1}{2}\right)$th item
= Value of 5.5th item
= Value of $\dfrac{5\text{th item} + 6\text{th item}}{2} = \dfrac{45 + 50}{2}$
= 47.5
OR
Median (Md) = Value of 5.5th item
= Value of 5th item + 0.5 (6th item − 5th item)
= 45 + 0.5 (50 − 45)
= 45 + 0.5 × 5
= 47.5
Mode (Mo) = Value of variable X which repeats the maximum number of times
= 45
Standard deviation, $\sigma = \sqrt{\dfrac{\sum X^2}{n} - \bar{X}^2} = \sqrt{\dfrac{29562}{10} - 49.6^2}$
= √(2956.2 − 2460.16)
= √496.04
≈ 22.272
(ii) To construct the box-and-whisker plot, we first find the five-number summary.
Smallest value = 10
Largest value = 105
Lower quartile, Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item
= Value of $\left(\dfrac{10+1}{4}\right)$th item
= Value of 2.75th item
= Value of 2nd item + 0.75 (3rd item − 2nd item)
= 37 + 0.75 (43 − 37)
= 37 + 0.75 × 6
= 41.5
Upper quartile, Q3 = Value of $\left(\dfrac{3(n+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(10+1)}{4}\right)$th item
= Value of 8.25th item
= Value of 8th item + 0.25 (9th item − 8th item)
= 52 + 0.25 (58 − 52)
= 52 + 0.25 × 6
= 53.5
Hence, the five-number summary (smallest, Q1, Md, Q3, largest) is (10, 41.5, 47.5, 53.5, 105).
(i) Length of left whisker (i.e. the distance from the smallest value to Q1) = 41.5 − 10 = 31.5
Length of right whisker (i.e. the distance from Q3 to the largest value) = 105 − 53.5 = 51.5
(ii) The distance from the smallest value to Md = 47.5 − 10 = 37.5
The distance from Md to the largest value = 105 − 47.5 = 57.5
Since the length of the left whisker < the length of the right whisker, and the distance from the smallest value to Md < the distance from Md to the largest value, the distribution is positively skewed (i.e. right skewed).
PU 2017 (Spring)
Q.No.1 (a) Over a period of 40 days, the percentage relative humidity in a vegetable storage building was measured. Mean daily values were recorded as shown below:
60 | 63 | 64 | 71 | 67 | 73 | 79 | 80 | 83 | 81
86 | 90 | 96 | 98 | 98 | 99 | 89 | 80 | 77 | 78
71 | 79 | 74 | 84 | 85 | 82 | 90 | 78 | 79 | 79
78 | 80 | 82 | 83 | 86 | 81 | 80 | 76 | 66 | 74
(i) Prepare a stem-and-leaf display for these data. Show the leaves sorted in order of increasing magnitude on each stem.
(ii) Draw a box plot for these data and interpret the data in a practical manner.
Solution:
(i) Arranging the given data in ascending order of magnitude:
Percentage relative humidity (X):
60, 63, 64, 66, 67, 71, 71, 73, 74, 74, 76, 77, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 82, 82, 83, 83, 84, 85, 86, 86, 89, 90, 90, 96, 98, 98, 99
Stem-and-leaf display
Stem | Leaves
6 | 0 3 4 6 7
7 | 1 1 3 4 4 6 7 8 8 8 9 9 9 9
8 | 0 0 0 0 1 1 2 2 3 3 4 5 6 6 9
9 | 0 0 6 8 8 9
The stem-and-leaf display shows the ordered values from the smallest to the largest (i.e. the leaves sorted in order of increasing magnitude on each stem) and where the data are concentrated.
(ii) To construct the box-and-whisker plot (box plot), we first find the five-number summary.
Smallest value = 60, Largest value = 99, n = 40
Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item
= Value of $\left(\dfrac{40+1}{4}\right)$th item
= Value of 10.25th item
= Value of 10th item + 0.25 (11th item − 10th item)
= 74 + 0.25 (76 − 74)
= 74 + 0.25 × 2
= 74.5
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
= Value of $\left(\dfrac{40+1}{2}\right)$th item
= Value of 20.5th item
= Value of 20th item + 0.5 (21st item − 20th item)
= 80 + 0.5 (80 − 80)
= 80 + 0.5 × 0
= 80
Upper quartile, Q3 = Value of $\left(\dfrac{3(n+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(40+1)}{4}\right)$th item
= Value of 30.75th item
= Value of 30th item + 0.75 (31st item − 30th item)
= 84 + 0.75 (85 − 84)
= 84 + 0.75 × 1
= 84.75
Hence, the five-number summary (smallest, Q1, Md, Q3, largest) is (60, 74.5, 80, 84.75, 99).
(i) Length of left whisker (i.e. the distance from the smallest value to Q1) = 74.5 − 60 = 14.5
Length of right whisker (i.e. the distance from Q3 to the largest value) = 99 − 84.75 = 14.25
(ii) The distance from the smallest value to Md = 80 − 60 = 20
The distance from Md to the largest value = 99 − 80 = 19
Since the length of the left whisker > the length of the right whisker, and the distance from the smallest value to Md > the distance from Md to the largest value, the distribution is negatively skewed (i.e. left skewed). It indicates that the high values of percentage relative humidity in the vegetable storage building occur more frequently and are concentrated on the right side, while the low values are less frequent and lie in the left tail. In other words, there is a high frequency of high values and a low frequency of low values of percentage relative humidity in the building.
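PU 2014 (Spring)
Q.No.2: The following are the numbers of minutes that a person had to wait for the bus to work on 15 working days:
10, 1, 13, 9, 5, 9, 2, 10, 3, 8, 6, 17, 2, 10, 15
Draw a box plot and interpret the result.
Solution: To construct the box-and-whisker plot (box plot), we first find the five-number summary.
Arranging the given data in ascending order of magnitude:
1, 2, 2, 3, 5, 6, 8, 9, 9, 10, 10, 10, 13, 15, 17
Smallest value = 1 minute
Largest value = 17 minutes
Q1 = Value of $\left(\dfrac{n+1}{4}\right)$th item
= Value of $\left(\dfrac{15+1}{4}\right)$th item
= Value of 4th item = 3 minutes
Median (Md) = Value of $\left(\dfrac{n+1}{2}\right)$th item
= Value of $\left(\dfrac{15+1}{2}\right)$th item
= Value of 8th item
= 9 minutes
Q3 = Value of $\left(\dfrac{3(n+1)}{4}\right)$th item
= Value of $\left(\dfrac{3(15+1)}{4}\right)$th item
= Value of 12th item
= 10 minutes
Hence, the five-number summary (smallest, Q1, Md, Q3, largest) is (1, 3, 9, 10, 17).
(i) Length of left whisker (i.e. the distance from the smallest value to Q1) = 3 − 1 = 2
Length of right whisker (i.e. the distance from Q3 to the largest value) = 17 − 10 = 7
(ii) The distance from the smallest value to Md = 9 − 1 = 8
The distance from Md to the largest value = 17 − 9 = 8
(iii) The distance from Q1 to Md = 9 − 3 = 6
The distance from Md to Q3 = 10 − 9 = 1
The left whisker is shorter than the right whisker (a sign of right skewness), while the distance from Q1 to Md is greater than the distance from Md to Q3 (a sign of left skewness), and the distances from the smallest value to Md and from Md to the largest value are equal. Therefore, the distribution is not symmetric (it is not uniformly distributed), and the comparisons do not point to a single direction of skewness.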
PU 2016(Fall)
1.(b) A random sample was taken of the thickness of insulation in transformer windings, and the following thicknesses (in millimetres) were recorded:
18, 21, 22, 29, 25, 31, 37, 38, 41, 39, 44, 48, 54, 56, 56, 57, 47, 38, 35, 36, 29, 37, 32, 42, 43, 40, 48, 36, 37, 37
(i) Prepare a stem-and-leaf display for these data.
(ii) Prepare a box plot for these data.
Solution: Arranging the given data in ascending order of magnitude (n = 30):
18, 21, 22, 25, 29, 29, 31, 32, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 40, 41, 42, 43, 44, 47, 48, 48, 54, 56, 56, 57
(i) Stem-and-leaf display
Stem | Leaves
1 | 8
2 | 1 2 5 9 9
3 | 1 2 5 6 6 7 7 7 7 8 8 9
4 | 0 1 2 3 4 7 8 8
5 | 4 6 6 7
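A stem-and-leaf display like the one above can be generated mechanically: split each value into a tens-digit stem and a units-digit leaf, then list the sorted leaves for each stem. The sketch below assumes non-negative whole-number data, as in these examples; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return 'stem | leaves' lines; stem = tens digit, leaf = units digit.
    Assumes non-negative integers."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(data):                 # sorting first keeps each row's leaves ordered
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    lines = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):  # show empty stems too
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

thickness_mm = [18, 21, 22, 29, 25, 31, 37, 38, 41, 39, 44, 48, 54, 56, 56,
                57, 47, 38, 35, 36, 29, 37, 32, 42, 43, 40, 48, 36, 37, 37]
for line in stem_and_leaf(thickness_mm):
    print(line)
```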
(ii) To construct a box-and-whisker plot (box plot), we first find the five-number summary.
Smallest value = 18 millimetres
Largest value = 57 millimetres
Q1 = Value of ((n + 1)/4)th item
= Value of ((30 + 1)/4)th item
= Value of 7.75th item
= Value of 7th item + 0.75 (8th item - 7th item)
= 31 + 0.75 (32 - 31)
= 31 + 0.75 × 1
= 31.75 millimetres
Median (Md) = Value of ((n + 1)/2)th item
= Value of ((30 + 1)/2)th item
= Value of 15.5th item
= Value of 15th item + 0.5 (16th item - 15th item)
= 37 + 0.5 (38 - 37)
= 37 + 0.5 × 1 = 37.5 millimetres
Q3 = Value of (3(n + 1)/4)th item
= Value of (3 × (30 + 1)/4)th item
= Value of 23.25th item
= Value of 23rd item + 0.25 (24th item - 23rd item)
= 44 + 0.25 (47 - 44)
= 44 + 0.25 × 3
= 44.75 millimetres
Hence, the five-number summary (smallest, Q1, Md, Q3, largest) is (18, 31.75, 37.5, 44.75, 57).
(i) Length of left whisker (i.e. the distance from the smallest value to Q1) = 31.75 - 18 = 13.75
Length of right whisker (i.e. the distance from Q3 to the largest value) = 57 - 44.75 = 12.25
(ii) The distance from the smallest value to the Md = 37.5 - 18 = 19.5
The distance from the Md to the largest value = 57 - 37.5 = 19.5
Since the length of the left whisker > the length of the right whisker, but the distance from the smallest value to the Md = the distance from the Md to the largest value,
therefore, the distribution is slightly left skewed (i.e. the distribution is not uniformly distributed).
PU 2018 (Fall)
1 (a) An investigator wants to study the speed of cars on the Araniko Highway. He collected the speeds (in km/h) of 30 vehicles, which were:
35, 37, 42, 45, 47, 48, 50, 55, 67, 70, 75, 80, 90, 95, 94, 48, 55, 60, 71, 63, 70, 65, 80, 55, 40, 35, 36, 85, 79, 30
(i) Present the above data in a stem-and-leaf display.
(ii) Construct a continuous frequency distribution using Sturges' rule. Construct the cumulative frequency curve and find the median speed, the speed of the first 25% of vehicles and the speed of the first 75% of vehicles. Also compute the percentage of vehicles whose speed lies between 40 and 70 km/h.
Solution: Arranging the given data in ascending order of magnitude:
Speed (in km/h) X: 30, 35, 35, 36, 37, 40, 42, 45, 47, 48, 48, 50, 55, 55, 55, 60, 63, 65, 67, 70, 70, 71, 75, 79, 80, 80, 85, 90, 94, 95
(i) Stem-and-leaf display
Stem | Leaves
3 | 0 5 5 6 7
4 | 0 2 5 7 8 8
5 | 0 5 5 5
6 | 0 3 5 7
7 | 0 0 1 5 9
8 | 0 0 5
9 | 0 4 5
(ii) Since the class size (h) is not given, we first find the approximate number of classes (k) and the class size (h).
Number of observations, n = 30
S = smallest value = 30, L = largest value = 95
By Sturges' formula,
Number of classes, k = 1 + 3.322 log n
= 1 + 3.322 log 30
= 1 + 3.322 × 1.4771
= 5.9069 ≈ 6
Class width or class size, h = (L - S)/k = (95 - 30)/5.9069 = 11.004 ≈ 11
Continuous frequency distribution:
Speed (in km/h)   Tally bar    Frequency (f)
30 – 41           ||||/ |      6
41 – 52           ||||/ |      6
52 – 63           ||||         4
63 – 74           ||||/ |      6
74 – 85           ||||         4
85 – 96           ||||         4
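The steps of Sturges' rule and the tally count can be sketched in Python as follows. Assumptions: k and h are computed as in the text (h = (L - S)/k with the unrounded k, then rounded), classes start at the smallest value, and an upper class limit is counted in the next class (exclusive method). The `max(...)` line adds classes when k × h does not reach the largest value, which happens with the weight data later in this unit. Function names are illustrative.

```python
import math
from itertools import accumulate

def sturges_k(n):
    """Sturges' rule as written in the text: k = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_frequencies(data, start, width, k):
    """Frequencies of the classes [start, start + width), [start + width, start + 2*width), ...
    An upper class limit is counted in the next class (exclusive method)."""
    counts = [0] * k
    for value in data:
        i = int((value - start) // width)
        if 0 <= i < k:
            counts[i] += 1
    return counts

speeds = [35, 37, 42, 45, 47, 48, 50, 55, 67, 70, 75, 80, 90, 95, 94,
          48, 55, 60, 71, 63, 70, 65, 80, 55, 40, 35, 36, 85, 79, 30]
S, L = min(speeds), max(speeds)
k_raw = sturges_k(len(speeds))             # about 5.907
h = round((L - S) / k_raw)                 # (95 - 30)/5.907, rounded to 11
k = max(round(k_raw), (L - S) // h + 1)    # enough classes to reach the largest value
freq = class_frequencies(speeds, S, h, k)
less_than_cf = list(accumulate(freq))
for i, (f, cf) in enumerate(zip(freq, less_than_cf)):
    lower = S + i * h
    print(f"{lower} - {lower + h}: f = {f}, less than c.f. = {cf}")
```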
Less than cumulative frequency distribution
Speed (in km/h) | Less than c.f.
Less than 41 | 6
Less than 52 | 12
Less than 63 | 16
Less than 74 | 22
Less than 85 | 26
Less than 96 | 30
From the less than ogive (cumulative frequency curve),
Median speed, Md = 58 km/h
The speed of the first 25% of vehicles is given by
P25 = Q1 = 44 km/h
& the speed of the first 75% of vehicles is given by
P75 = Q3 = 76 km/h
The number of vehicles whose speed lies between 40 and 70 km/h (read from the ogive)
= 4 + 5 + 5
= 14
∴ The percentage of vehicles whose speed lies between 40 and 70 km/h
= (14/30) × 100% = 46.67%
Note: Q1 = P25, Md = Q2 = D5 = P50, P75 = Q3
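Reading values off an ogive by eye is approximate. One alternative, sketched below in Python, is to interpolate linearly between the plotted points (upper class boundary, less than c.f.). The function names are illustrative. For the speed data this gives P25 ≈ 43.75, Md ≈ 60.25 and P75 ≈ 75.4 km/h, and about 14.4 vehicles between 40 and 70 km/h. The graphical readings above should therefore be treated as rough estimates, since a hand-drawn smooth curve can differ from straight-line segments.

```python
def linear_interp(xp, fp, x):
    """Piecewise-linear interpolation through (xp[i], fp[i]); xp must be non-decreasing."""
    for i in range(1, len(xp)):
        if xp[i - 1] <= x <= xp[i] and xp[i] > xp[i - 1]:
            t = (x - xp[i - 1]) / (xp[i] - xp[i - 1])
            return fp[i - 1] + t * (fp[i] - fp[i - 1])
    raise ValueError("value lies outside the ogive")

# Less than ogive for the speed data: (upper class boundary, less than c.f.)
boundaries = [30, 41, 52, 63, 74, 85, 96]
cum_freq = [0, 6, 12, 16, 22, 26, 30]
N = cum_freq[-1]

def value_at_percent(p):
    """Speed below which p% of the vehicles lie (read across from the c.f. axis)."""
    return linear_interp(cum_freq, boundaries, p * N / 100)

def cf_at(speed):
    """Number of vehicles slower than `speed` (read up from the speed axis)."""
    return linear_interp(boundaries, cum_freq, speed)

print("P25 =", value_at_percent(25))
print("Md  =", value_at_percent(50))
print("P75 =", value_at_percent(75))
between = cf_at(70) - cf_at(40)
print("Between 40 and 70 km/h:", round(between, 2), "vehicles =",
      round(100 * between / N, 2), "%")
```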
PU 2018 (Spring)
Q.No.1.(b): After the implementation of an economic program to uplift the economic condition of a community, the following information was found.
Monthly income (Rs. 000) | 4-6 | 6-8 | 8-10 | 10-12 | 12-14 | 14-16 | 16-18
After the plan (no. of families) | 8 | 65 | 37 | 15 | 15 | 5 | 5
Construct an ogive and use it to:
(i) Find the number of families whose monthly income is between Rs. 8,000 and Rs. 14,000.
(ii) Find the number of families whose monthly income is above Rs. 12,000.
Solution:
Less than cumulative frequency distribution
Monthly income (Rs. 000) | Less than c.f.
Less than 6 | 8
Less than 8 | 73
Less than 10 | 110
Less than 12 | 125
Less than 14 | 140
Less than 16 | 145
Less than 18 | 150
(i) The number of families whose monthly income is between Rs. 8,000 and Rs. 14,000
= (c.f. less than 14) - (c.f. less than 8)
= 140 - 73
= 67
(ii) The number of families whose monthly income is above Rs. 12,000
= 150 - (c.f. less than 12)
= 150 - 125
= 25
PU 2015 (Spring)
Q.No.1 (a): The following table shows the lengths (in metres) of eighty bally bridges:
68, 84, 73, 82, 68, 90, 62, 88, 76, 93
73, 79, 75, 73, 60, 93, 71, 59, 85, 75
61, 65, 88, 87, 74, 62, 95, 78, 63, 72
66, 78, 75, 75, 94, 77, 69, 74, 68, 60
96, 78, 82, 61, 75, 95, 60, 79, 83, 71
79, 62, 89, 97, 78, 85, 76, 65, 71, 75
65, 80, 67, 57, 88, 78, 62, 76, 53, 74
86, 67, 73, 81, 72, 63, 76, 75, 85, 77
With reference to the above table:
(i) Construct the grouped frequency distribution having class width 10.
(ii) Draw the less than ogive and the more than ogive on the same graph and hence locate the median.
(iii) With the help of the less than ogive, find the number of bridges having length less than 65 metres.
Solution: (i) Grouped frequency distribution having class width 10
Length of bally bridge (m)   Tally bar                                 Frequency (f)
50 – 60                      |||                                       3
60 – 70                      ||||/ ||||/ ||||/ ||||/ |                 21
70 – 80                      ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |||   33
80 – 90                      ||||/ ||||/ ||||/                         15
90 – 100                     ||||/ |||                                 8
                                                                       N = ∑ f = 80
(ii) Less than and more than cumulative frequency distribution
Length of bally bridge | Less than c.f. | Length of bally bridge | More than c.f.
Less than 60 | 3 | More than 50 | 80
Less than 70 | 24 | More than 60 | 77
Less than 80 | 57 | More than 70 | 56
Less than 90 | 72 | More than 80 | 23
Less than 100 | 80 | More than 90 | 8
The median lies at the point where the less than and more than ogives intersect. From the ogive curves, median (Md) = 75 metres.
(iii) With the help of the less than ogive, the number of bridges having length less than 65 metres = 3 + 10 = 13 bridges
PU 2017 (Fall)
Q.No.1(a): The following data represent the tensile strength of steel rods manufactured by company A located at Biratnagar.
65, 36, 49, 84, 79, 56, 28, 43, 67, 36
43, 78, 37, 40, 68, 72, 70, 55, 62, 82
88, 50, 60, 56, 57, 46, 39, 57, 22, 65
59, 48, 76, 74, 80, 69, 51, 40, 56, 45
35, 21, 62, 52, 63, 32, 86, 64, 53, 34
Construct a frequency distribution and represent the data by means of a cumulative frequency curve. Identify the median and the first quartile from the curve. Also interpret the result for the first quartile.
Solution:
Since the class size (h) is not given, we first find the approximate number of classes (k) and the class size (h).
Number of observations, n = 50
S = smallest value = 21, L = largest value = 88
By Sturges' formula,
Number of classes, k = 1 + 3.322 log n
= 1 + 3.322 log 50
= 1 + 3.322 × 1.6990
= 6.644 ≈ 7
Class width or class size, h = (L - S)/k = (88 - 21)/6.644 = 10.08 ≈ 10
Frequency distribution
Tensile strength of steel rod   Tally bar       Frequency (f)
20 – 30                         |||             3
30 – 40                         ||||/ ||        7
40 – 50                         ||||/ |||       8
50 – 60                         ||||/ ||||/ |   11
60 – 70                         ||||/ ||||/     10
70 – 80                         ||||/ |         6
80 – 90                         ||||/           5
                                                N = ∑ f = 50
Less than cumulative frequency distribution
Tensile strength of steel rod | Less than c.f.
Less than 30 | 3
Less than 40 | 10
Less than 50 | 18
Less than 60 | 29
Less than 70 | 39
Less than 80 | 45
Less than 90 | 50
Less than ogive curve (or Less than cumulative frequency curve)
From ogive curve,
Median (Md) = 59
First quartile (Q1) = 44
Thus, the first quartile indicates that 25% of the steel rods have a tensile strength of less than 44.
PU 2013 (Fall)
(1) (a) The following information shows the daily wages of workers of a certain hydropower company. Prepare a suitable ogive that helps to answer the following questions:
Daily wages (Rs.) | 0-20 | 20-40 | 40-60 | 60-80 | 80-100
No. of workers | 41 | 51 | 64 | 38 | 7
(i) Above about what wage do 50% of the workers earn?
(ii) What would be the daily wage limits of the middle 30% of workers?
Additional questions:
(iii) If 20% of the workers are the lowest earners, find the highest wage among them.
(iv) If 10% of the workers are the highest earners, find the lowest wage among the highest 10% of the workers.
Solution:
Less than cumulative frequency distribution
Daily wages | No. of workers (Less than c.f.)
Less than 20 | 41
Less than 40 | 92
Less than 60 | 156
Less than 80 | 194
Less than 100 | 201
(i) From the ogive curve, the wage above which 50% of the workers earn (i.e. the median, P50) is Rs. 42.
(ii) The daily wage limits of the middle 30% of workers are given by P35 and P65.
From the ogive curve,
Lower limit, P35 = Rs. 28
Upper limit, P65 = Rs. 52
(iii) The highest wage of the lowest 20% of earners is given by P20.
From the ogive curve,
P20 = Rs. 19
(iv) The lowest wage of the highest 10% of the workers is given by P90.
From the ogive curve,
P90 = Rs. 73
PU 2013 (Fall)
1. (b) The following information shows the income distribution.
Income ($'000) | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60
No. of persons | 5 | 10 | 18 | 23 | 7 | 6
Construct a less than ogive. Also use it to find (i) the number of persons having income less than $35,000 and (ii) the percentage of persons having income between $20,000 and $50,000.
Additional:
(iii) Percentage of persons having income more than $40,000
Solution:
Less than cumulative frequency distribution
Income ($'000) | No. of persons (Less than c.f.)
Less than 10 | 5
Less than 20 | 15
Less than 30 | 33
Less than 40 | 56
Less than 50 | 63
Less than 60 | 69
(i) From the ogive, the number of persons having income less than $35,000 ≈ 45
(ii) The number of persons having income between $20,000 and $50,000
= (c.f. less than 50) - (c.f. less than 20)
= 63 - 15 = 48
The percentage of persons having income between $20,000 and $50,000
= (48/69) × 100% = 69.57%
(iii) The number of persons having income more than $40,000
= 69 - (c.f. less than 40)
= 69 - 56
= 13
∴ The percentage of persons having income more than $40,000
= (13/69) × 100% = 18.84%
PU 2016(spring)
Q.No.1 (a) From the following frequency distribution,
Income (Rs 000) | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60
No. of persons | 5 | 10 | 18 | 23 | 7 | 6
construct an ogive and use it to find the number of persons having income:
(i) less than Rs. 35,000
(ii) between Rs. 20,000 and Rs. 50,000
(iii) more than Rs. 25,000
Solution:
Less than cumulative frequency distribution
Income (Rs. 000) | No. of persons (Less than c.f.)
Less than 10 | 5
Less than 20 | 15
Less than 30 | 33
Less than 40 | 56
Less than 50 | 63
Less than 60 | 69
Less than ogive curve (or less than cumulative frequency curve)
From the ogive curve,
(i) The number of persons having income less than Rs. 35,000 ≈ 45
(ii) The number of persons having income between Rs. 20,000 and Rs. 50,000 = (c.f. less than 50) - (c.f. less than 20) = 63 - 15 = 48
(iii) The number of persons having income more than Rs. 25,000 = 69 - (c.f. less than 25) ≈ 69 - 24 = 45
PU 2015 1(a): The test scores of the students in probability and statistics are listed below. Construct a stem-and-leaf plot of the scores.
92, 78, 73, 89, 98, 89, 83, 75, 83, 94, 99, 69, 71, 96, 67, 81, 73, 88, 86, 82, 63, 73, 76, 82, 84, 89, 92, 95, 78, 87
Also, find the lowest score of the best 25% of the students.
Solution:
Arranging the given data in ascending order of magnitude (n = 30):
63, 67, 69, 71, 73, 73, 73, 75, 76, 78, 78, 81, 82, 82, 83, 83, 84, 86, 87, 88, 89, 89, 89, 92, 92, 94, 95, 96, 98, 99
Stem-and-leaf plot
Stem | Leaves
6 | 3 7 9
7 | 1 3 3 3 5 6 8 8
8 | 1 2 2 3 3 4 6 7 8 9 9 9
9 | 2 2 4 5 6 8 9
The lowest score of the best 25% of the students is given by P75.
P75 = Value of (75(n + 1)/100)th item
= Value of (75 × (30 + 1)/100)th item
= Value of 23.25th item
= Value of 23rd item + 0.25 (24th item - 23rd item)
= 89 + 0.25 (92 - 89)
= 89 + 0.25 × 3
= 89.75
PU 2013(Spring)
1. (a) The weights (in lbs) of 40 boys in a class are as follows:
138, 172, 145, 147, 150, 119, 158, 152, 168, 142, 157, 147, 102, 144, 165, 136, 164, 163, 128, 135, 126, 150, 146, 148, 145, 125, 146, 153, 138, 156, 173, 140, 135, 149, 140, 144, 132, 154, 142, 135
(i) Construct a frequency distribution.
(ii) Draw a less than ogive and find the number of boys whose weight is less than 165 lbs.
Solution:
Since the class size (h) is not given, we first find the approximate number of classes (k) and the class size (h).
Number of observations, n = 40
S = smallest value = 102, L = largest value = 173
By Sturges' formula,
Number of classes, k = 1 + 3.322 log n
= 1 + 3.322 log 40
= 1 + 3.322 × 1.6021
= 6.322 ≈ 6
Class width or class size, h = (L - S)/k = (173 - 102)/6.322 = 11.23 ≈ 11
With h = 11 and the first class starting at 102, seven classes are needed to include the largest value, 173.
Frequency distribution
Weight (in lbs)   Tally bar          No. of boys (f)
102 – 113         |                  1
113 – 124         |                  1
124 – 135         ||||               4
135 – 146         ||||/ ||||/ ||||   14
146 – 157         ||||/ ||||/ ||     12
157 – 168         ||||/              5
168 – 179         |||                3
                                     N = ∑ f = 40
(ii) Less than cumulative frequency distribution
Weight (in lbs) | Less than c.f.
Less than 113 | 1
Less than 124 | 2
Less than 135 | 6
Less than 146 | 20
Less than 157 | 32
Less than 168 | 37
Less than 179 | 40
From the less than ogive, the number of boys whose weight is less than 165 lbs ≈ 36.
PU 2013(spring)
Q.No.1.(b): From the following distribution of marks of 500 students of a college, find the minimum pass mark if only 20% of the students failed, and also the minimum mark obtained by the top 25% of the students.
Represent the data by a histogram.
Marks | 0-20 | 20-40 | 40-50 | 50-60 | 60-80 | 80-100
No. of students | 50 | 100 | 150 | 90 | 60 | 50
Solution:
Marks | No. of students (f) | Less than c.f.
0-20 | 50 | 50
20-40 | 100 | 150
40-50 | 150 | 300
50-60 | 90 | 390
60-80 | 60 | 450
80-100 | 50 | 500
N = ∑ f = 500
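If 20% of the students failed (i.e. 80% passed), the minimum pass mark is the mark below which the lowest 20% of the students lie, which is given by P20.
20th percentile (P20):
Position = 20N/100 = 20 × 500/100 = 100th student, so P20 lies in the class 20-40.
L = 20, f = 100, c.f. = 50, h = 20
P20 = L + ((20N/100 - c.f.)/f) × h
= 20 + ((100 - 50)/100) × 20
= 30 marks
& the minimum mark obtained by the top 25% of the students is given by P75.
75th percentile (P75):
Position = 75N/100 = 75 × 500/100 = 375th student, so P75 lies in the class 50-60.
L = 50, f = 90, c.f. = 300, h = 10
P75 = L + ((75N/100 - c.f.)/f) × h
= 50 + ((375 - 300)/90) × 10
= 58.33 marks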
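The percentile formula used above, P_k = L + ((kN/100 - c.f.)/f) × h, can be sketched as a small Python function. It is an illustrative helper (the name `grouped_percentile` is ours) that takes class boundaries, so classes of unequal width such as 40-50 and 60-80 are handled; the percentile class is the first class whose less than c.f. reaches kN/100. Quartiles of grouped data are the special cases P25, P50 and P75.

```python
def grouped_percentile(bounds, freqs, k):
    """P_k = L + ((k*N/100 - c.f.)/f) * h for a grouped frequency distribution.

    bounds: class boundaries (len(freqs) + 1 values); widths may be unequal.
    c.f.: cumulative frequency of the classes before the percentile class.
    """
    N = sum(freqs)
    target = k * N / 100
    cf = 0
    for i, f in enumerate(freqs):
        if f > 0 and cf + f >= target:         # first class whose c.f. reaches k*N/100
            L, h = bounds[i], bounds[i + 1] - bounds[i]
            return L + (target - cf) / f * h
        cf += f
    raise ValueError("k must lie between 0 and 100")

marks_bounds = [0, 20, 40, 50, 60, 80, 100]
students = [50, 100, 150, 90, 60, 50]
print("P20 =", round(grouped_percentile(marks_bounds, students, 20), 2))
print("P75 =", round(grouped_percentile(marks_bounds, students, 75), 2))
```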
For histogram
This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the third and fourth class intervals is 10, while that of the first, second, fifth and sixth is 20, which is double 10. So, the frequencies of the first, second, fifth and sixth classes are divided by 2, i.e. 50/2 = 25, 100/2 = 50, 60/2 = 30 and 50/2 = 25.
Adjusted frequencies (heights of the rectangles): 25, 50, 150, 90, 30, 25
PU2014 (fall)
1. (a) Represent the following data by means of a histogram, frequency curve and frequency polygon.
Salaries | 300-310 | 310-320 | 320-330 | 330-350 | 350-370 | 370-400
No. of workers | 7 | 19 | 28 | 15 | 12 | 12
Solution:
This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the first three class intervals is 10, while that of the fourth and fifth is 20, which is double 10. So, the frequencies of the fourth and fifth classes are divided by 2, i.e. 15/2 = 7.5 and 12/2 = 6. Also, the class size of the last class is 30, which is 3 times 10, so the frequency of the last class is divided by 3, i.e. 12/3 = 4.
1. (a) The daily wages of workers of a factory are given below:
Wages (Rs.) | 300-310 | 310-320 | 320-330 | 330-350 | 350-370 | 370-410
No. of workers | 8 | 10 | 20 | 18 | 16 | 12
(i) Construct a histogram and frequency polygon for the data.
(ii) Draw an ogive for the data and estimate the median wage.
Solution:
(i) This is a case of unequal class intervals; therefore, the frequencies must be adjusted. The class size of the first three class intervals is 10, while that of the fourth and fifth is 20, which is double 10. So, the frequencies of the fourth and fifth classes are divided by 2, i.e. 18/2 = 9 and 16/2 = 8. Also, the class size of the last class is 40, which is 4 times 10, so the frequency of the last class is divided by 4, i.e. 12/4 = 3.
Adjusted frequencies (heights of the rectangles): 8, 10, 20, 9, 8, 3
(ii) Less than cumulative frequency distribution
Wages (Rs.) | No. of workers (Less than c.f.)
Less than 310 | 8
Less than 320 | 18
Less than 330 | 38
Less than 350 | 56
Less than 370 | 72
Less than 410 | 84
Note: An ogive does not require equal class sizes, so no adjustment of the frequencies is needed.
From the ogive curve, Md = Rs. 337
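The frequency adjustment for unequal class intervals divides each frequency by (class width / base width). Below is a minimal sketch that assumes the smallest class width is the base width, as in the examples above; `adjusted_heights` is an illustrative name. Only the histogram needs this, because the ogive uses the unadjusted cumulative frequencies.

```python
def adjusted_heights(bounds, freqs, base_width=None):
    """Heights of histogram rectangles for unequal classes:
    each frequency is rescaled to the base width, i.e. f * base / width."""
    widths = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
    base = base_width or min(widths)           # default: the smallest class width
    return [f * base / w for f, w in zip(freqs, widths)]

wage_bounds = [300, 310, 320, 330, 350, 370, 410]
workers = [8, 10, 20, 18, 16, 12]
print(adjusted_heights(wage_bounds, workers))
```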
Example 9: From the following data, obtain the interquartile range, Q.D. and coefficient of Q.D.
Daily production: 25, 20, 23, 18, 22, 17, 26
Solution:
Arranging the given data in ascending order, we get (n = 7):
17, 18, 20, 22, 23, 25, 26
Now,
Q1 = value of ((n + 1)/4)th item
= value of ((7 + 1)/4)th item
= value of 2nd item = 18
Q3 = value of (3(n + 1)/4)th item
= value of (3 × (7 + 1)/4)th item
= value of 6th item
= 25
Interquartile range = Q3 - Q1
= 25 - 18
= 7
Quartile deviation or semi-interquartile range (Q.D.) = (Q3 - Q1)/2 = 7/2
= 3.5
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = (25 - 18)/(25 + 18) = 7/43 = 0.163
Example 10: Find the interquartile range, Q.D. and coefficient of Q.D. from the following series: X: 9, 10, 5, 6, 7, 2, 8, 4
Solution: Arranging the given data in ascending order (n = 8): X: 2, 4, 5, 6, 7, 8, 9, 10
Q1 = value of ((n + 1)/4)th item
= value of ((8 + 1)/4)th item
= value of 2.25th item
= value of 2nd item + 0.25 (3rd item - 2nd item)
= 4 + 0.25 (5 - 4)
= 4.25
Q3 = value of (3(n + 1)/4)th item
= value of (3 × (8 + 1)/4)th item
= value of 6.75th item
= value of 6th item + 0.75 (7th item - 6th item)
= 8 + 0.75 (9 - 8) = 8.75
Interquartile range = Q3 - Q1
= 8.75 - 4.25
= 4.5
Quartile deviation or semi-interquartile range (Q.D.) = (Q3 - Q1)/2 = 4.5/2
= 2.25
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = 4.5/(8.75 + 4.25) = 4.5/13 = 0.346
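For individual series, the interquartile range, Q.D. and coefficient of Q.D. can be computed with Python's standard `statistics.quantiles`. Its default `method="exclusive"` uses the same ((n + 1)p)th-item rule with interpolation as Examples 9 and 10. The wrapper `quartile_deviation` is an illustrative name, not a library function.

```python
import statistics

def quartile_deviation(data):
    """Return (IQR, Q.D., coefficient of Q.D.) using the ((n + 1)p)th-item rule,
    which is what statistics.quantiles does with its default method="exclusive"."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return iqr, iqr / 2, iqr / (q3 + q1)

daily_production = [25, 20, 23, 18, 22, 17, 26]
series_x = [9, 10, 5, 6, 7, 2, 8, 4]
for name, data in (("Example 9", daily_production), ("Example 10", series_x)):
    iqr, qd, coeff = quartile_deviation(data)
    print(f"{name}: IQR = {iqr}, Q.D. = {qd}, coefficient of Q.D. = {coeff:.3f}")
```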
Example 11: Compute the quartile deviation of the following distribution giving the screen sizes of laptops available in the Nepalese laptop market.
Size of screen (cm) | 9.5 | 10.0 | 10.5 | 11.0 | 11.5 | 12.0 | 12.5 | 13.0 | 13.5 | 14.0 | 14.5 | 15.0 | 15.5 | 16.0 | 16.5 | 17.0
No. of laptops | 1 | 8 | 20 | 30 | 50 | 95 | 110 | 150 | 200 | 250 | 280 | 245 | 80 | 40 | 35 | 5
Solution:
Size of screen (cm) X | No. of laptops (f) | Less than c.f.
9.5 | 1 | 1
10.0 | 8 | 9
10.5 | 20 | 29
11.0 | 30 | 59
11.5 | 50 | 109
12.0 | 95 | 204
12.5 | 110 | 314
13.0 | 150 | 464
13.5 | 200 | 664
14.0 | 250 | 914
14.5 | 280 | 1194
15.0 | 245 | 1439
15.5 | 80 | 1519
16.0 | 40 | 1559
16.5 | 35 | 1594
17.0 | 5 | 1599
N = ∑ f = 1599
Quartile deviation (Q.D.) = (Q3 - Q1)/2
Q1 = value of ((N + 1)/4)th item
= value of ((1599 + 1)/4)th item
= value of 400th item = 13 cm (the c.f. just greater than 400 is 464, against X = 13)
Q3 = value of (3(N + 1)/4)th item
= value of (3 × (1599 + 1)/4)th item
= value of 1200th item = 15 cm (the c.f. just greater than 1200 is 1439, against X = 15)
Quartile deviation (Q.D.) = (15 - 13)/2
= 1 cm
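For a discrete series, the value of the item at a given position is the first value whose less than c.f. reaches that position. The sketch below uses the standard `bisect` and `itertools` modules. It assumes that the position is either a whole number or falls between two items with the same value, which is true for Example 11; otherwise textbooks interpolate between the two neighbouring values. `discrete_quantile` is an illustrative name.

```python
from bisect import bisect_left
from itertools import accumulate

def discrete_quantile(values, freqs, position):
    """Value of the position-th item of a discrete series: the first value
    whose less than c.f. is at least `position`."""
    cum = list(accumulate(freqs))
    return values[bisect_left(cum, position)]   # first index with cum[i] >= position

screen_cm = [9.5 + 0.5 * i for i in range(16)]          # 9.5, 10.0, ..., 17.0
laptops = [1, 8, 20, 30, 50, 95, 110, 150, 200, 250, 280, 245, 80, 40, 35, 5]
N = sum(laptops)                                         # 1599
q1 = discrete_quantile(screen_cm, laptops, (N + 1) / 4)        # 400th item
q3 = discrete_quantile(screen_cm, laptops, 3 * (N + 1) / 4)    # 1200th item
print("Less than c.f.:", list(accumulate(laptops)))
print("Q1 =", q1, "Q3 =", q3, "Q.D. =", (q3 - q1) / 2)
```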
Example 12: The following frequency distribution represents the weights of 193 laptops.
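Weight in lbs | 4-5 | 5-6 | 6-7 | 7-8 | 8-9 | 9-10 | 10-11 | 11-12
Frequency | 20 | 24 | 35 | 48 | 32 | 24 | 8 | 2
Compute the first three quartiles and the quartile deviation.
Solution:
Weight in lbs | Frequency (f) | Less than c.f.
4-5 | 20 | 20
5-6 | 24 | 44
6-7 | 35 | 79
7-8 | 48 | 127
8-9 | 32 | 159
9-10 | 24 | 183
10-11 | 8 | 191
11-12 | 2 | 193
N = ∑ f = 193
For the lower quartile or first quartile (Q1):
Position = N/4 = 193/4 = 48.25, so Q1 lies in the class 6-7.
L = 6, f = 35, c.f. = 44, h = 1
Q1 = L + ((N/4 - c.f.)/f) × h = 6 + ((48.25 - 44)/35) × 1
= 6.12 lbs
For the 2nd quartile (Q2):
Position = N/2 = 96.5, so Q2 lies in the class 7-8.
L = 7, f = 48, c.f. = 79, h = 1
Q2 = L + ((N/2 - c.f.)/f) × h = 7 + ((96.5 - 79)/48) × 1
= 7.36 lbs
For the 3rd quartile (Q3):
Position = 3N/4 = 144.75, so Q3 lies in the class 8-9.
L = 8, f = 32, c.f. = 127, h = 1
Q3 = L + ((3N/4 - c.f.)/f) × h = 8 + ((144.75 - 127)/32) × 1
= 8.55 lbs
Quartile deviation (Q.D.) = (Q3 - Q1)/2 = (8.55 - 6.12)/2
= 1.215 lbs
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = 2.43/14.67 = 0.166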
Example 13: The scores obtained by 10 students in Statistics I of an IT college are given below. Compute the range and standard deviation.
55, 35, 60, 55, 55, 65, 40, 45, 35, 42
Solution:
Score (X) | 55 | 35 | 60 | 55 | 55 | 65 | 40 | 45 | 35 | 42
X² | 3025 | 1225 | 3600 | 3025 | 3025 | 4225 | 1600 | 2025 | 1225 | 1764
∑X = 487, ∑X² = 24739
Range (R) = L - S
= 65 - 35
= 30
Standard deviation (σ) = √(∑X²/n - (∑X/n)²)
= √(24739/10 - (487/10)²)
= √(2473.9 - 2371.69)
= √102.21
= 10.11
Example 14: Find the standard deviation (S.D.) and variance of the following data.
Variable (X) | 10 | 14 | 15 | 18 | 20
Frequency (f) | 3 | 5 | 7 | 6 | 4
Solution:
Variable (X) | Frequency (f) | fX | fX²
10 | 3 | 30 | 300
14 | 5 | 70 | 980
15 | 7 | 105 | 1575
18 | 6 | 108 | 1944
20 | 4 | 80 | 1600
N = ∑ f = 25 | ∑ fX = 393 | ∑ fX² = 6399
Standard deviation (σ) = √(∑fX²/N - (∑fX/N)²)
= √(6399/25 - (393/25)²)
= √(255.96 - 247.1184)
= √8.8416 = 2.973
Variance (σ²) = ∑fX²/N - (∑fX/N)²
= 8.842
Example 15: The frequency distribution of the time (in seconds) required to open the operating system of 200 computers is given below.
Time in seconds | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39
No. of computers | 2 | 20 | 35 | 40 | 48 | 32 | 18 | 5
Compute the standard deviation.
Solution:
The step-deviation method is worked below for the following distribution of customer service times (in minutes) of 100 customers; the same steps apply to the computer data above.
Let a = 17.5 and h = 5, so that d′ = (X - a)/h.
Customer service time (in minutes) | No. of customers (f) | Mid-value (X) | d′ = (X - 17.5)/5 | fd′ | fd′²
0-5 | 2 | 2.5 | -3 | -6 | 18
5-10 | 8 | 7.5 | -2 | -16 | 32
10-15 | 26 | 12.5 | -1 | -26 | 26
15-20 | 30 | 17.5 | 0 | 0 | 0
20-25 | 28 | 22.5 | 1 | 28 | 28
25-30 | 6 | 27.5 | 2 | 12 | 24
N = ∑ f = 100 | ∑ fd′ = -8 | ∑ fd′² = 128
Standard deviation (σ) = √(∑fd′²/N - (∑fd′/N)²) × h
= √(128/100 - (-8/100)²) × 5
= √1.2736 × 5
= 5.64 minutes
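The step-deviation shortcut gives the same standard deviation as the direct formula with mid-values; it only simplifies hand arithmetic. The sketch below computes both for the customer service time table, assuming population formulas (divisor N) as in the text; the function names are illustrative.

```python
import math

def sd_step_deviation(mids, freqs, a, h):
    """sigma = h * sqrt(sum(f*d'^2)/N - (sum(f*d')/N)^2), with d' = (X - a)/h."""
    N = sum(freqs)
    d = [(x - a) / h for x in mids]
    s1 = sum(f * di for f, di in zip(freqs, d))
    s2 = sum(f * di * di for f, di in zip(freqs, d))
    return h * math.sqrt(s2 / N - (s1 / N) ** 2)

def sd_direct(mids, freqs):
    """Population S.D. computed directly from deviations about the mean."""
    N = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / N
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / N)

mids = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
customers = [2, 8, 26, 30, 28, 6]
print("Step deviation:", round(sd_step_deviation(mids, customers, a=17.5, h=5), 2))
print("Direct:        ", round(sd_direct(mids, customers), 2))
```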
Example 16: The following data give the temperature of Kathmandu for a week in summer. Compute the range and quartile deviation.
Day | Sun | Mon | Tue | Wed | Thu | Fri | Sat
Temp. (°C) | 34 | 35 | 32 | 35 | 36 | 34 | 35
Solution:
Range (R) = L - S
= 36 - 32
= 4 °C
Arranging the given data in ascending order (n = 7): Temp. (°C) X: 32, 34, 34, 35, 35, 35, 36
Q1 = value of ((n + 1)/4)th item
= value of ((7 + 1)/4)th item
= value of 2nd item = 34
Q3 = value of (3(n + 1)/4)th item
= value of (3 × (7 + 1)/4)th item
= value of 6th item
= 35
Quartile deviation (Q.D.) = (Q3 - Q1)/2 = (35 - 34)/2
= 0.5 °C
Example 17: The numbers of runs scored by two groups of cricket players in a test match are:
Group A | 10 | 25 | 85 | 72 | 115 | 80 | 52 | 45 | 30 | 10
Group B | 120 | 15 | 30 | 35 | 42 | 65 | 80 | 34 | 25 | 15
Test which group is more consistent.
Solution:
For Group A
No. of runs (X) | 10 | 25 | 85 | 72 | 115 | 80 | 52 | 45 | 30 | 10
X² | 100 | 625 | 7225 | 5184 | 13225 | 6400 | 2704 | 2025 | 900 | 100
∑X = 524, ∑X² = 38488
X̄ = ∑X/n = 524/10 = 52.4
S.D. (σ) = √(∑X²/n - X̄²) = √(3848.8 - 2745.76) = √1103.04 = 33.21
C.V. (Group A) = (σ/X̄) × 100% = (33.21/52.4) × 100%
= 63.38%
For Group B
No. of runs (X) | 120 | 15 | 30 | 35 | 42 | 65 | 80 | 34 | 25 | 15
X² | 14400 | 225 | 900 | 1225 | 1764 | 4225 | 6400 | 1156 | 625 | 225
∑X = 461, ∑X² = 31145
X̄ = ∑X/n = 461/10 = 46.1
S.D. (σ) = √(∑X²/n - X̄²) = √(3114.5 - 2125.21) = √989.29 = 31.45
C.V. (Group B) = (σ/X̄) × 100% = (31.45/46.1) × 100%
= 68.23%
Since C.V. (Group A) < C.V. (Group B), group A is more consistent.
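The coefficient of variation comparison can be reproduced with Python's `statistics.pstdev` and `statistics.mean`. `pstdev` is the population standard deviation (divisor n), matching the formula used here. The helper name is illustrative, and the same function can be applied to the section A and section B scores in Example 18.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """C.V. = (population S.D. / mean) * 100, as in the text (divisor n)."""
    return pstdev(data) / mean(data) * 100

group_a = [10, 25, 85, 72, 115, 80, 52, 45, 30, 10]
group_b = [120, 15, 30, 35, 42, 65, 80, 34, 25, 15]
cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
print(f"C.V. (Group A) = {cv_a:.2f}%")
print(f"C.V. (Group B) = {cv_b:.2f}%")
print("More consistent:", "Group A" if cv_a < cv_b else "Group B")
```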
Example 18: The following data represent the scores in an intelligence test of two groups of students from section A and section B of a college.
Student no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Section A | 9 | 8 | 10 | 6 | 7 | 8 | 5 | 6 | 7 | 8
Section B | 10 | 8 | 6 | 8 | 9 | 8 | 7 | 8 | 5 | 8
Test which group is more consistent.
Example 19: What are the roles of measures of dispersion in descriptive statistics? The following table gives the frequency distribution of the thickness of computer chips (in nanometres) manufactured by two companies.
Thickness of computer chips | 5 | 10 | 15 | 20 | 25 | 30
Number of chips by Company A | 10 | 15 | 24 | 20 | 18 | 13
Number of chips by Company B | 12 | 18 | 20 | 22 | 24 | 4
Which company may be considered more consistent in terms of the thickness of computer chips? Apply appropriate descriptive statistics.
Solution: For company A
Thickness of computer chips (X) | No. of chips (f) | d = X - 15 | fd | fd²
5 | 10 | -10 | -100 | 1000
10 | 15 | -5 | -75 | 375
15 | 24 | 0 | 0 | 0
20 | 20 | 5 | 100 | 500
25 | 18 | 10 | 180 | 1800
30 | 13 | 15 | 195 | 2925
N = ∑ f = 100 | ∑ fd = 300 | ∑ fd² = 6600
Mean (X̄) = a + ∑fd/N = 15 + 300/100 = 18
S.D. (σ) = √(∑fd²/N - (∑fd/N)²) = √(6600/100 - (300/100)²) = √57 = 7.550
C.V. (Company A) = (σ/X̄) × 100% = (7.550/18) × 100%
= 41.94%
For company B
Thickness of computer chips (X) | No. of chips (f) | d = X - 15 | fd | fd²
5 | 12 | -10 | -120 | 1200
10 | 18 | -5 | -90 | 450
15 | 20 | 0 | 0 | 0
20 | 22 | 5 | 110 | 550
25 | 24 | 10 | 240 | 2400
30 | 4 | 15 | 60 | 900
N = ∑ f = 100 | ∑ fd = 200 | ∑ fd² = 5500
Mean (X̄) = a + ∑fd/N = 15 + 200/100 = 17
S.D. (σ) = √(∑fd²/N - (∑fd/N)²) = √(5500/100 - (200/100)²) = √51 = 7.141
C.V. (Company B) = (σ/X̄) × 100% = (7.141/17) × 100%
= 42.01%
Since C.V. (Company A) < C.V. (Company B), company A is considered more consistent than company B in terms of the thickness of computer chips.
Example 20: The following table shows the monthly expenditure of families of ward no. 1 and ward no. 2 of Kathmandu Metropolitan City in a certain locality.
Expenditure (in 000 Rs.) | 0-5 | 5-10 | 10-15 | 15-20 | 20-25 | 25-30
No. of families (ward no. 1) | 5 | 12 | 50 | 20 | 10 | 3
No. of families (ward no. 2) | 7 | 15 | 40 | 18 | 12 | 8
Which ward's people have more uniform expenditure?
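Solution: The people of the ward whose coefficient of variation (C.V.) is smaller have the more uniform expenditure.
Let a = 12.5 and h = 5, so that d′ = (x - 12.5)/5.
For ward no. 1
Expenditure (in 000 Rs.) | No. of families (f) | Mid-value (x) | d′ | fd′ | fd′²
0-5 | 5 | 2.5 | -2 | -10 | 20
5-10 | 12 | 7.5 | -1 | -12 | 12
10-15 | 50 | 12.5 | 0 | 0 | 0
15-20 | 20 | 17.5 | 1 | 20 | 20
20-25 | 10 | 22.5 | 2 | 20 | 40
25-30 | 3 | 27.5 | 3 | 9 | 27
N = 100 | ∑ fd′ = 27 | ∑ fd′² = 119
X̄ = a + (∑fd′/N) × h = 12.5 + (27/100) × 5 = 13.85
σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(1.19 - 0.0729) × 5 = 5.285
C.V. (Ward no. 1) = (σ/X̄) × 100% = (5.285/13.85) × 100% = 38.16%
For ward no. 2
Expenditure (in 000 Rs.) | No. of families (f) | Mid-value (x) | d′ | fd′ | fd′²
0-5 | 7 | 2.5 | -2 | -14 | 28
5-10 | 15 | 7.5 | -1 | -15 | 15
10-15 | 40 | 12.5 | 0 | 0 | 0
15-20 | 18 | 17.5 | 1 | 18 | 18
20-25 | 12 | 22.5 | 2 | 24 | 48
25-30 | 8 | 27.5 | 3 | 24 | 72
N = 100 | ∑ fd′ = 37 | ∑ fd′² = 181
X̄ = a + (∑fd′/N) × h = 12.5 + (37/100) × 5 = 14.35
σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(1.81 - 0.1369) × 5 = 6.467
C.V. (Ward no. 2) = (σ/X̄) × 100% = (6.467/14.35) × 100% = 45.07%
Since C.V. (Ward no. 1) < C.V. (Ward no. 2), the people of ward no. 1 have more uniform expenditure than those of ward no. 2.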
Example 21: The following table gives two bike models and their corresponding lives:
Life (in years) | 0-2 | 2-4 | 4-6 | 6-8 | 8-10
No. of bikes, Model T1 | 1 | 9 | 12 | 11 | 8
No. of bikes, Model T2 | 5 | 7 | 11 | 19 | 9
Which model of bike has greater uniformity?
Solution:
We compute the coefficient of variation (C.V.) to determine the uniformity.
Computation of sums for the mean and S.D. (a = 5, h = 2, d′ = (x - 5)/2)
For Model T1
Life (in years) | No. of bikes (f) | Mid-value (x) | d′ | fd′ | fd′²
0-2 | 1 | 1 | -2 | -2 | 4
2-4 | 9 | 3 | -1 | -9 | 9
4-6 | 12 | 5 | 0 | 0 | 0
6-8 | 11 | 7 | 1 | 11 | 11
8-10 | 8 | 9 | 2 | 16 | 32
N = ∑ f = 41 | ∑ fd′ = 16 | ∑ fd′² = 56
X̄ = a + (∑fd′/N) × h = 5 + (16/41) × 2 = 5 + 0.78 = 5.78 years
σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(56/41 - (16/41)²) × 2 = 2.203 years
C.V. (Model T1) = (σ/X̄) × 100% = (2.203/5.78) × 100% = 38.11%
For Model T2
Life (in years) | No. of bikes (f) | Mid-value (x) | d′ | fd′ | fd′²
0-2 | 5 | 1 | -2 | -10 | 20
2-4 | 7 | 3 | -1 | -7 | 7
4-6 | 11 | 5 | 0 | 0 | 0
6-8 | 19 | 7 | 1 | 19 | 19
8-10 | 9 | 9 | 2 | 18 | 36
N = ∑ f = 51 | ∑ fd′ = 20 | ∑ fd′² = 82
X̄ = a + (∑fd′/N) × h = 5 + (20/51) × 2 = 5 + 0.7843 = 5.7843 years
σ = √(∑fd′²/N - (∑fd′/N)²) × h = √(82/51 - (20/51)²) × 2 = 2.4116 years
C.V. (Model T2) = (σ/X̄) × 100% = (2.4116/5.7843) × 100% = 41.69%
Since C.V. (Model T1) < C.V. (Model T2), model T1 of bike has greater uniformity than model T2.
PU 2014 (Spring), 2015 (Spring), 2018 (Fall)
b) The lives of two models (A and B) of refrigerators in a recent survey are shown below:
Life (No. of years) | 0-2 | 2-4 | 4-6 | 6-8 | 8-10 | 10-12
No. of refrigerators, Model A | 5 | 16 | 13 | 7 | 5 | 4
No. of refrigerators, Model B | 2 | 7 | 12 | 19 | 9 | 1
i. What is the average life of each model of these refrigerators?
ii. Which model has greater uniformity?
Solution: We compute the coefficient of variation (C.V.) to determine the uniformity.
For Model A
Life (in years) | No. of refrigerators (f) | Mid-value (x) | fx | fx²
0-2 | 5 | 1 | 5 | 5
2-4 | 16 | 3 | 48 | 144
4-6 | 13 | 5 | 65 | 325
6-8 | 7 | 7 | 49 | 343
8-10 | 5 | 9 | 45 | 405
10-12 | 4 | 11 | 44 | 484
N = ∑ f = 50 | ∑ fx = 256 | ∑ fx² = 1706
X̄ = ∑fx/N = 256/50 = 5.12 years
σ = √(∑fx²/N - X̄²) = √(1706/50 - 5.12²) = √7.9056 = 2.812 years
C.V. (Model A) = (σ/X̄) × 100% = (2.812/5.12) × 100% = 54.92%
For Model B
Life (in years) | No. of refrigerators (f) | Mid-value (x) | fx | fx²
0-2 | 2 | 1 | 2 | 2
2-4 | 7 | 3 | 21 | 63
4-6 | 12 | 5 | 60 | 300
6-8 | 19 | 7 | 133 | 931
8-10 | 9 | 9 | 81 | 729
10-12 | 1 | 11 | 11 | 121
N = ∑ f = 50 | ∑ fx = 308 | ∑ fx² = 2146
X̄ = ∑fx/N = 308/50 = 6.16 years
σ = √(∑fx²/N - X̄²) = √(2146/50 - 6.16²) = √4.9744 = 2.2303 years
C.V. (Model B) = (σ/X̄) × 100% = (2.2303/6.16) × 100% = 36.21%
(i) The average lives of the two models of refrigerators are
X̄ (Model A) = 5.12 years
& X̄ (Model B) = 6.16 years
(ii) Since C.V. (Model A) > C.V. (Model B), model B of refrigerator has greater uniformity than model A.
PU 2015 (Fall)
1. b) The lives of two models A and B of objects in a recent survey are:
Life | 0-2 | 2-4 | 4-6 | 6-8 | 8-10 | 10-12
Model A | 5 | 16 | 13 | 7 | 5 | 4
Model B | 2 | 7 | 12 | 19 | 9 | 1
Which model has greater uniformity?
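Below is a minimal sketch of the grouped-data mean, standard deviation and C.V. using mid-values (direct method, population formulas). It assumes the classes 0-2, 2-4, ..., 10-12 are represented by their mid-values 1, 3, ..., 11; `grouped_cv` is an illustrative name. The model with the lower C.V. is the more uniform one.

```python
import math

def grouped_cv(mids, freqs):
    """Mean, population S.D. and C.V. (%) of a grouped distribution using mid-values."""
    N = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    mean = sum_fx / N
    sd = math.sqrt(sum_fx2 / N - mean ** 2)
    return mean, sd, sd / mean * 100

mids = [1, 3, 5, 7, 9, 11]              # mid-values of 0-2, 2-4, ..., 10-12
model_a = [5, 16, 13, 7, 5, 4]
model_b = [2, 7, 12, 19, 9, 1]
for name, freqs in (("Model A", model_a), ("Model B", model_b)):
    mean, sd, cv = grouped_cv(mids, freqs)
    print(f"{name}: mean = {mean:.2f}, S.D. = {sd:.4f}, C.V. = {cv:.2f}%")
```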
PU 2016 (Fall)
1. (a) For a computer-controlled lathe whose performance was below par, workers recorded the following causes and their frequencies:
Power fluctuation | 6
Controller not stable | 22
Operator error | 13
Worn tool not replaced | 2
Other | 5
Construct a Pareto chart.
(i) What percentage of the cases is due to an unstable controller?
(ii) What percentage of the cases is due to either an unstable controller or operator error?
Solution
Arrange the data in descending order of frequency and obtain the cumulative frequencies and percentage cumulative frequencies as follows:
Categories | Frequency | Cumulative frequency | % cumulative frequency
Controller not stable | 22 | 22 | 46
Operator error | 13 | 35 | 73
Power fluctuation | 6 | 41 | 85
Other | 5 | 46 | 96
Worn tool not replaced | 2 | 48 | 100
(i) The percentage of the cases due to an unstable controller = (22/48) × 100 = 45.83%
(ii) The number of cases due to either an unstable controller or operator error = 22 + 13 = 35
∴ The percentage of the cases due to either an unstable controller or operator error = (35/48) × 100
= 72.92%
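The Pareto table can be produced by sorting the categories by frequency and accumulating. A sketch follows; `pareto_table` is an illustrative name. Many Pareto charts place an "Other" category last regardless of its size, but this sketch simply sorts by frequency, as in the table above.

```python
def pareto_table(causes):
    """Sort categories by descending frequency and add cumulative counts and percentages."""
    total = sum(causes.values())
    rows, running = [], 0
    for name, freq in sorted(causes.items(), key=lambda item: item[1], reverse=True):
        running += freq
        rows.append((name, freq, running, round(100 * running / total, 2)))
    return rows

lathe_causes = {"Power fluctuation": 6, "Controller not stable": 22,
                "Operator error": 13, "Worn tool not replaced": 2, "Other": 5}
for row in pareto_table(lathe_causes):
    print(row)
```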
PU 2016 (Spring)
1.(b) An analysis of monthly wages paid to the workers in two firms A and B belonging to the same industry gives the following results (use population formulas):
 | Firm A | Firm B
No. of workers | 500 | 600
Average monthly wages (Rs) | 186 | 175
Variance of distribution of wages (Rs²) | 81 | 100
i. Which firm, A or B, has a larger wage bill?
ii. In which firm, A or B, is there greater variability in individual wages?
iii. Calculate (a) the average monthly wage and (b) the variance of the distribution of wages of all the workers in firms A and B taken together.
Solution:
For firm A: n1 = 500, X̄1 = Rs. 186, σ1² = 81, σ1 = √81 = 9
For firm B: n2 = 600, X̄2 = Rs. 175, σ2² = 100, σ2 = √100 = 10
(i) For firm A, ∑X1 = n1 × X̄1 = 500 × 186 = Rs. 93000
For firm B, ∑X2 = n2 × X̄2 = 600 × 175 = Rs. 105000
Since ∑X1 < ∑X2, firm B has a larger wage bill than firm A.
(ii) C.V. (Firm A) = (σ1/X̄1) × 100% = (9/186) × 100% = 4.839%
C.V. (Firm B) = (σ2/X̄2) × 100% = (10/175) × 100% = 5.714%
Since C.V. (Firm A) < C.V. (Firm B), there is greater variability in individual wages in firm B than in firm A.
(iii) (a) The average monthly wage of all the workers in firms A and B taken together is given by
X̄12 = (n1X̄1 + n2X̄2)/(n1 + n2) = (500 × 186 + 600 × 175)/(500 + 600) = 198000/1100 = Rs. 180
(b) The variance of the distribution of wages of all the workers in firms A and B taken together is
σ12² = [n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2)
= [500(81 + 36) + 600(100 + 25)]/1100
= 133500/1100
= 121.36
where d1 = X̄1 - X̄12 = 186 - 180 = 6
d2 = X̄2 - X̄12 = 175 - 180 = -5
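The combined mean and variance formulas used in (iii) can be sketched as one Python function, assuming population variances as stated in the question; `combine_groups` is an illustrative name. The same function can be reused for the CFL bulb question later in this unit (take the square root of the combined variance for the combined S.D.).

```python
def combine_groups(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and (population) variance of two groups:
    mean12 = (n1*mean1 + n2*mean2)/(n1 + n2)
    var12  = [n1(var1 + d1^2) + n2(var2 + d2^2)]/(n1 + n2), with di = mean_i - mean12."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean12, mean2 - mean12
    var12 = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean12, var12

firm_mean, firm_var = combine_groups(500, 186, 81, 600, 175, 100)
print(f"Firms A and B together: mean = Rs. {firm_mean:.2f}, variance = {firm_var:.2f}")
```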
Example
For a group of 200 candidates, the mean and standard deviation were found to be 40 and 15. Later on, it was discovered that the score 53 was misread as 35. Find the correct mean and standard deviation corresponding to the correct figures.
Solution:
We have been given:
n = 200, Mean (X̄) = 40, Standard deviation (σ) = 15
Wrong observation (i.e. wrong score) = 35
Correct observation (i.e. correct score) = 53
Corrected mean (X̄correct) = ?
Corrected standard deviation (σcorrect) = ?
We know,
X̄ = ∑X/n
or, 40 = ∑X/200
or, ∑X = 200 × 40
or, ∑X = 8000
Corrected ∑X = ∑X - wrong observation + correct observation
= 8000 - 35 + 53 = 8018
Correct mean (X̄correct) = Corrected ∑X/n = 8018/200 = 40.09
Again,
S.D. (σ) = √(∑X²/n - X̄²)
or, 15 = √(∑X²/200 - 40²)
Squaring both sides,
or, 225 = ∑X²/200 - 1600
or, 225 + 1600 = ∑X²/200
or, 1825 = ∑X²/200
or, ∑X² = 1825 × 200 = 365000
Corrected ∑X² = ∑X² - (wrong observation)² + (correct observation)²
= 365000 - (35)² + (53)² = 366584
Corrected S.D. (σcorrect) = √(Corrected ∑X²/n - (X̄correct)²)
= √(366584/200 - (40.09)²)
= √(1832.92 - 1607.2081)
= 15.02
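The correction procedure (recover ∑X and ∑X² from n, the mean and the S.D., swap the wrong values for the correct ones, then recompute) can be sketched as below. It assumes the population S.D. (divisor n), as in these examples; `correct_mean_sd` is an illustrative name, and it handles any number of misread observations.

```python
import math

def correct_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and S.D. after replacing misread values (population formulas)."""
    sum_x = n * mean
    sum_x2 = n * (sd ** 2 + mean ** 2)          # from sd^2 = sum_x2/n - mean^2
    sum_x += sum(right) - sum(wrong)
    sum_x2 += sum(v * v for v in right) - sum(v * v for v in wrong)
    new_mean = sum_x / n
    new_sd = math.sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd

print(correct_mean_sd(200, 40, 15, wrong=[35], right=[53]))
print(correct_mean_sd(100, 40, 12, wrong=[23, 15], right=[43, 18]))
```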
Example
The mean and standard deviation of a set of 100 workers were found to be 40 and 12 respectively. On checking, it was found that two observations were wrongly taken as 23 and 15 instead of 43 and 18.
Calculate the correct mean and standard deviation. Also, find the correct variance.
Solution:
We have been given:
Total no. of observations (n) = 100
Mean (X̄) = 40
Standard deviation (σ) = 12
Wrong observations = 23 and 15
Correct observations = 43 and 18
We know,
X̄ = ∑X/n
or, 40 = ∑X/100
or, ∑X = 100 × 40
or, ∑X = 4000
Corrected ∑X = ∑X - wrong observations + correct observations = 4000 - 23 - 15 + 43 + 18 = 4023
Correct mean (X̄correct) = Corrected ∑X/n = 4023/100 = 40.23
Again,
S.D. (σ) = √(∑X²/n - X̄²)
or, 12 = √(∑X²/100 - 40²)
Squaring both sides,
or, 144 = ∑X²/100 - 1600
or, 144 + 1600 = ∑X²/100
or, 1744 = ∑X²/100
or, ∑X² = 1744 × 100 = 174400
Corrected ∑X² = ∑X² - (wrong observations)² + (correct observations)²
= 174400 - (23)² - (15)² + (43)² + (18)²
= 174400 - 529 - 225 + 1849 + 324
= 175819
Corrected S.D. (σcorrect) = √(Corrected ∑X²/n - (X̄correct)²)
= √(175819/100 - (40.23)²)
= √(1758.19 - 1618.4529)
= 11.82
Correct variance (σ²correct) = Corrected ∑X²/n - (X̄correct)²
= 1758.19 - 1618.4529 = 139.737
Corrected mean (X̄correct) = 40.23
Corrected standard deviation (σcorrect) = 11.82
& Correct variance (σ²correct) = 139.737
Additional question
A factory produces two types of CFL bulbs, A and B. The following results were obtained relating to their lives:
 | Bulb A | Bulb B
No. of bulbs | 100 | 90
Average length of life | 900 hours | 1000 hours
Variance | 121 | 144
(a) Compare the variability of the lives of the two types of CFL bulbs.
(b) Calculate the standard deviation of both types of CFL bulbs taken together.
(c) Also compute the coefficient of variation of both types of CFL bulbs taken together.
Solution:
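(a) For Bulb A: n1 = 100, X̄1 = 900 hours, σ1 = √121 = 11
For Bulb B: n2 = 90, X̄2 = 1000 hours, σ2 = √144 = 12
C.V. (Bulb A) = (σ1/X̄1) × 100% = (11/900) × 100% = 1.222%
C.V. (Bulb B) = (σ2/X̄2) × 100% = (12/1000) × 100% = 1.2%
Since C.V. (Bulb A) > C.V. (Bulb B), the life of bulb type A is more variable than that of bulb type B. That is, the life of bulb type B is more consistent than that of bulb type A.
(b) The standard deviation of both types of CFL bulbs taken together (i.e. the combined standard deviation) is
σ12 = √{[n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2)}
= √{[100(121 + 2243.767) + 90(144 + 2770.083)]/190}
= √(498744.2/190)
= √2624.97
= 51.23 hours
where,
X̄12 = (n1X̄1 + n2X̄2)/(n1 + n2) = (100 × 900 + 90 × 1000)/190 = 947.3684
d1 = X̄1 - X̄12 = 900 - 947.3684 = -47.3684
d2 = X̄2 - X̄12 = 1000 - 947.3684 = 52.6316
(c) The coefficient of variation of both types of CFL bulbs taken together (i.e. the combined C.V.) is
Combined C.V. = (σ12/X̄12) × 100%
= (51.23/947.3684) × 100%
= 5.41%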
PU2014 (fall)
1. (b) The first of two groups has 100 items with mean 45 and variance 49. If the combined group has 250 items with mean 51 and variance 130, find the mean and standard deviation of the second group.
Solution:
First group: n1 = 100, X̄1 = 45, σ1² = 49
Second group: n2 = 250 - 100 = 150, X̄2 = ?, σ2 = ?
Combined group: n1 + n2 = 250, X̄12 = 51, σ12² = 130
We know,
X̄12 = (n1X̄1 + n2X̄2)/(n1 + n2)
or, 51 = (100 × 45 + 150X̄2)/250
or, 4500 + 150X̄2 = 12750
or, 150X̄2 = 12750 - 4500
or, 150X̄2 = 8250
or, X̄2 = 55
And
σ12² = [n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2)
or, 130 = [100(49 + 36) + 150(σ2² + 16)]/250
or, 10900 + 150σ2² = 32500
or, 150σ2² = 32500 - 10900
or, 150σ2² = 21600
or, σ2² = 144
∴ σ2 = 12
where,
d1 = X̄1 - X̄12 = 45 - 51 = -6
d2 = X̄2 - X̄12 = 55 - 51 = 4
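Working backwards from a combined group uses the same two formulas, solved for the unknown group. Below is a minimal sketch assuming population variances; `second_group` is an illustrative name.

```python
import math

def second_group(n1, mean1, var1, n, mean12, var12):
    """Recover the size, mean and S.D. of group 2 from group 1 and the combined group."""
    n2 = n - n1
    mean2 = (n * mean12 - n1 * mean1) / n2
    d1, d2 = mean1 - mean12, mean2 - mean12
    # n*var12 = n1*(var1 + d1^2) + n2*(var2 + d2^2), solved for var2
    var2 = (n * var12 - n1 * (var1 + d1 ** 2)) / n2 - d2 ** 2
    return n2, mean2, math.sqrt(var2)

print(second_group(100, 45, 49, 250, 51, 130))
```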
The End.