QA 233 (Basic Business Statistics)

Solutions to Practice Problem Set II


Mr. Mooney is the President of the Commerce Bank. He has asked his assistant, Miss Jane Hathaway, to collect data from a random sample of his bank’s clientele. Mr. Mooney asks Miss Hathaway to collect data on Age, Gender, Number Of Televisions Owned, 1999 State Income Tax, and Current Checking Account Balance. Consider the following data set (collected on January 12, 2000 by Miss Hathaway for Mr. Mooney) for each of the subsequent questions:

Age   Gender   Number of TV’s Owned   1999 State Income Tax   Account Balance
25    M        2                      $1,295.23               $43.27
32    F        4                      $4,904.01               $853.71
47    F        1                      $3,488.81               $1739.22
26    M        5                      $1,561.90               -$64.85
59    F        2                      $1,391.24               $2255.77
34    M        4                      $7,729.68               $597.71
73    F        10                     $5,160.58               $1436.88
48    F        2                      $3,052.49               $477.60

  1. How many observations has Ms. Hathaway collected? On how many variables has she collected data? At what level is each variable measured?

Each bank customer from whom data have been collected represents one observation, so Ms. Hathaway has collected eight observations. The variables and the levels at which they are measured are:

Age: Ratio

Gender: Nominal

Number of TV’s Owned: Ratio

1999 State Income Tax: Interval if this is the balance owed at the end of the year (a taxpayer may owe additional taxes or may be due a refund at the end of the year), Ratio if this is the amount of state taxes paid in 1999

Account Balance: Ratio if we don't allow for overdrawn accounts or if the bank has a policy on the maximum amount an account can be overdrawn, Interval otherwise

  2. Please help Miss Hathaway by providing three measures of central location for Current Checking Account Balance. Which of these would you recommend that Miss Hathaway use? Why?

Ms. Hathaway could choose from any number of measures of central location. These include the midrange, arithmetic mean, median, and mode. The data array for Current Checking Account Balance is:

-$64.85 $43.27 $477.60 $597.71 $853.71 $1436.88 $1739.22 $2255.77

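so the values of these measures of central location are:

Midrange = (-$64.85 + $2255.77)/2 = $1,095.46

Arithmetic mean = $7,339.31/8 = $917.41

Median = ($597.71 + $853.71)/2 = $725.71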

All data values for Current Checking Account Balance are unique, so there is no mode to report.

The midrange may be misleading if the data include an extreme point (an unusually high or low Current Checking Account Balance), and the mode will usually not yield much information about such a disperse variable (it is unlikely that any Current Checking Account Balance is duplicated in such a small sample). The arithmetic mean is a reasonable choice (since there appear to be no wildly extreme Current Checking Account Balances), and the median would be an excellent measure of central location.
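As a cross-check, the measures of central location discussed above can be computed directly from the eight balances. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the original problem set.

```python
from statistics import mean, median, multimode

# Current Checking Account Balance for the eight customers in the data set.
balances = [43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60]

# Midrange: halfway between the smallest and largest values.
midrange = (min(balances) + max(balances)) / 2

# Arithmetic mean and median.
average = mean(balances)
middle = median(balances)

# multimode() returns every value tied for the highest frequency.
# If that is every distinct value, no value repeats and there is no mode to report.
modes = multimode(balances)
has_mode = len(modes) < len(set(balances))

print(f"midrange = {midrange:.2f}")
print(f"mean     = {average:.2f}")
print(f"median   = {middle:.2f}")
print("mode     =", modes if has_mode else "none (all values are unique)")
```

The mean lies above the median here, which is one reason the median is attractive for a variable such as an account balance.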

  3. What is the 30% trimmed mean of Current Checking Account Balance for the sample in the previous problem?

    The number of observations in this data set is n = 8, so the number of observations to trim from the lowest and highest values of the data array is

    The data array for Current Checking Account Balance is:

-$64.85 $43.27 $477.60 $597.71 $853.71 $1436.88 $1739.22 $2255.77

so the trimmed mean is:
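The worked figures for this answer are not reproduced here, so the following is only a sketch of how a trimmed mean can be computed. It assumes (and this is an assumption, not something the text confirms) that 30% of the observations are trimmed from each end of the data array and that the count is rounded down, so that floor(0.30 x 8) = 2 values are dropped from each end. A different rounding convention, or reading "30%" as the total amount trimmed, would give a different answer.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) values from EACH end.

    The rounding-down rule is an assumption made for this sketch.
    """
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # values trimmed per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion trims away every observation")
    return sum(kept) / len(kept)

# Current Checking Account Balance for the eight customers in the data set.
balances = [43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60]

result = trimmed_mean(balances, 0.30)
print(f"30% trimmed mean (2 values trimmed per end) = {result:.2f}")
```

Under this assumption the four middle values ($477.60, $597.71, $853.71, and $1436.88) are averaged.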

 

  4. Is this data time series or cross sectional in nature? Please explain your response.

The data are cross-sectional in nature. A single observation has been made on each entity included in the sample, and all observations were collected at the same point in time (January 12, 2000).

  5. Construct an appropriate graphical display of Number Of Television Sets Owned. Provide an interpretation of your display. Is this display useful? Why or why not?

The most appropriate displays of a quantitative (interval or ratio-scaled) variable such as Number of Televisions Owned are the histogram and the dot plot. A histogram is provided below:

The display provides evidence that the Number of Televisions Owned is skewed toward lower values (less than five television sets owned). However, it is of marginal use because the data set includes very few observations, so the resulting classes are sparsely populated.
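A simple text dot plot conveys the same information as the histogram. The sketch below is illustrative only (it is not the display from the original solution); it counts how many customers own each number of television sets and prints one asterisk per customer.

```python
from collections import Counter

# Number of TV's Owned for the eight customers in the data set.
tvs = [2, 4, 1, 5, 2, 4, 10, 2]

# Frequency of each distinct value.
counts = Counter(tvs)

# One row per possible value from the minimum to the maximum,
# so that gaps in the data (for example 6 through 9) remain visible.
rows = [f"{value:>2} | {'*' * counts[value]}"
        for value in range(min(tvs), max(tvs) + 1)]

print("\n".join(rows))
```

Six of the eight customers own fewer than five sets, and the single customer with ten sets stands well apart from the rest.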

  6. Have these data been collected observationally or experimentally? Please explain your response.

The data have been collected observationally. There is no indication that any attempt has been made to control the conditions under which the data are collected.

  7. Please help Miss Hathaway by providing four measures of dispersion for Current Checking Account Balance. Which of these would you recommend that Miss Hathaway use? Why?

Ms. Hathaway could choose from any number of measures of dispersion. These include the range, mean absolute deviation, variance, standard deviation, and coefficient of variation. The data array for Current Checking Account Balance is:

-$64.85 $43.27 $477.60 $597.71 $853.71 $1436.88 $1739.22 $2255.77

so the values of these measures of dispersion are:

The range may be misleading if the data include an extreme point (an unusually high or low Current Checking Account Balance) because it depends only on the two most extreme values, but the other measures use every observation and are excellent measures of dispersion. If we wish to compare the dispersion of this variable to the dispersion of some other variable, the coefficient of variation would be most appropriate.
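The measures of dispersion named above can be computed as in the following sketch. It assumes the sample formulas (n - 1 in the denominator) for the variance and standard deviation, and divides by n for the mean absolute deviation; the original worked values are not shown, so these conventions are assumptions.

```python
from statistics import mean, stdev, variance

# Current Checking Account Balance for the eight customers in the data set.
balances = [43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60]

n = len(balances)
center = mean(balances)

# Range: largest value minus smallest value.
data_range = max(balances) - min(balances)

# Mean absolute deviation about the mean (assumed divisor: n).
mad = sum(abs(b - center) for b in balances) / n

# Sample variance and standard deviation (divisor n - 1).
sample_variance = variance(balances)
sample_sd = stdev(balances)

# Coefficient of variation, expressed as a percentage of the mean.
cv_percent = sample_sd / center * 100

print(f"range     = {data_range:.2f}")
print(f"MAD       = {mad:.2f}")
print(f"variance  = {sample_variance:.2f}")
print(f"std. dev. = {sample_sd:.2f}")
print(f"CV        = {cv_percent:.2f}%")
```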

  8. Construct an appropriate graphical display that portrays the relationship between Age and Number Of Television Sets Owned. Provide an interpretation of your display. Also provide a numerical measurement of the strength of this relationship and interpret your result.

The most appropriate graphical display of the relationship between two variables is the scatter diagram. A scatter diagram of the relationship between Age and Number Of Television Sets Owned is provided below:

 

With the exception of one extreme point (outlier), it appears that older people own fewer television sets. The most appropriate numerical measure of the relationship between two variables is Pearson’s Product Moment Correlation Coefficient. However, we first must calculate the covariance between Age and Number Of Television Sets Owned in order to calculate Pearson’s Product Moment Correlation Coefficient. The covariance between Age (which we will call ‘x’) and Number Of Television Sets Owned (which we will call ‘y’) is provided below:

Once we have also calculated the standard deviations of Age (x) and Number Of Television Sets Owned (y)

  and 

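we can calculate Pearson’s Product Moment Correlation Coefficient for Age (x) and Number Of Television Sets Owned (y):

r = sxy/(sx sy) = 0.43

which indicates that a marginal positive relationship exists between Age (x) and Number Of Television Sets Owned (y). The positive sign is due largely to the one extreme point (the 73-year-old customer who owns ten television sets); among the other seven customers the relationship is negative, as the scatter diagram suggests.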
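The covariance, standard deviations, and correlation coefficient can be verified with a short calculation. The sketch below assumes the sample formulas (n - 1 in the denominator); the correlation coefficient itself does not depend on that choice because the divisor cancels.

```python
from math import sqrt

# Age (x) and Number of TV's Owned (y) for the eight customers.
ages = [25, 32, 47, 26, 59, 34, 73, 48]
tvs = [2, 4, 1, 5, 2, 4, 10, 2]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(tvs) / n

# Sample covariance: sum of cross-products of deviations over n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, tvs)) / (n - 1)

# Sample standard deviations.
s_x = sqrt(sum((x - x_bar) ** 2 for x in ages) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in tvs) / (n - 1))

# Pearson's product moment correlation coefficient.
r = s_xy / (s_x * s_y)

print(f"covariance = {s_xy:.2f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"r = {r:.2f}")
```

Rerunning the same calculation without the customer aged 73 gives a negative covariance, which shows how strongly a single extreme point can influence these measures in a sample of eight.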

  9. Please provide an appropriate summary measure for the values of Gender included in the data. Why is this measure appropriate?

Because Gender is measured on the Nominal scale, the sample proportion p is the most appropriate summary measure for the values of Gender included in the data. This can be expressed as either i) the proportion of males (which we could call ‘pm’) or ii) the proportion of females (which we could call ‘pf’). These values are:

pm = 3/8 = 0.375 and pf = 5/8 = 0.625

  10. Calculate the 23rd, 58th, and 81st percentiles for Current Checking Account Balance.

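In order to calculate the 23rd, 58th, and 81st percentiles for Current Checking Account Balance, we must first calculate their respective indices i = (p/100)n, where p is the percentile and n = 8 is the number of observations. These are:

i = (23/100)(8) = 1.84, i = (58/100)(8) = 4.64, and i = (81/100)(8) = 6.48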

Since none of these are integers, we can round each index up to the nearest integer to obtain the position in the data array of the respective percentiles. The data array is

-$64.85 $43.27 $477.60 $597.71 $853.71 $1436.88 $1739.22 $2255.77

so the percentiles are

23rd percentile =    $43.27

58th percentile =   $853.71

81st percentile = $1,739.22
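The index rule can be written as a small function. This sketch computes the index i = (p/100)n; when i is not an integer it rounds up to the next position, as described above. Averaging the values in positions i and i + 1 when i is an integer is an assumption inferred from the quartile values reported elsewhere in these solutions. The function name is illustrative.

```python
import math

def percentile(sorted_values, p):
    """p-th percentile using the index rule i = (p/100) * n."""
    n = len(sorted_values)
    index = p * n / 100
    if index.is_integer():
        k = int(index)
        if k == 0:
            return sorted_values[0]
        if k == n:
            return sorted_values[-1]
        # Integer index: average the values in positions k and k + 1.
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # Non-integer index: round up to the next position (1-based).
    return sorted_values[math.ceil(index) - 1]

# Data array (sorted) for Current Checking Account Balance.
data = sorted([43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60])

for p in (23, 58, 81):
    print(f"{p}th percentile = {percentile(data, p):.2f}")
```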

  11. Which variable, 1999 State Income Tax or Current Checking Account Balance, is more disperse?

The most appropriate measure for comparing the relative dispersion of two (or more) variables is the coefficient of variation. To calculate the coefficients of variation for State Income Tax and Current Checking Account Balance we will need their respective means and standard deviations. We have already calculated the mean and standard deviation of Current Checking Account Balance (which we will call ‘y’), but we still need the mean and standard deviation of State Income Tax (which we will call ‘x’):

and

...so the coefficients of variation for State Income Tax and Current Checking Account Balance are:

and

so Current Checking Account Balance (y) is relatively more disperse than State Income Tax (x).
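The comparison can be reproduced with the sketch below, which computes the coefficient of variation (standard deviation divided by mean, as a percentage) for both variables. It assumes the sample standard deviation (n - 1 in the denominator); the helper name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# 1999 State Income Tax (x) and Current Checking Account Balance (y).
tax = [1295.23, 4904.01, 3488.81, 1561.90, 1391.24, 7729.68, 5160.58, 3052.49]
balances = [43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60]

cv_tax = coefficient_of_variation(tax)
cv_balance = coefficient_of_variation(balances)

print(f"State Income Tax: mean = {mean(tax):.2f}, s = {stdev(tax):.2f}, CV = {cv_tax:.2f}%")
print(f"Account Balance:  mean = {mean(balances):.2f}, s = {stdev(balances):.2f}, CV = {cv_balance:.2f}%")
```

State Income Tax has the larger standard deviation in dollars, but relative to its mean it is the less disperse of the two variables.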

  12. Provide and interpret a box plot for Current Checking Account Balance.

The data array is

-$64.85 $43.27 $477.60 $597.71 $853.71 $1436.88 $1739.22 $2255.77

so the quartiles are

Q1 = 25th percentile =   $260.44

Q2 = 50th percentile =   $725.71

Q3 = 75th percentile = $1,588.05

Thus the IQR = Q3 - Q1 = $1,327.62, so the outer limits of the inner fences are Q1 – 1.5IQR = -$1,730.99 and Q3 + 1.5IQR = $3,579.47. The box plot is given by:

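The data appear to be skewed to the right (the upper half of the box and the upper whisker are longer than the lower half of the box and the lower whisker), with a middle value around $750.00. There are also no extreme points or outliers (by this definition of extreme point).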
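The quartiles, interquartile range, and inner fences behind the box plot can be checked numerically. This sketch reuses the index rule i = (p/100)n, averaging adjacent values when the index is an integer (an assumption that reproduces the quartiles listed above); the function and variable names are illustrative.

```python
import math

def percentile(sorted_values, p):
    """p-th percentile using the index rule i = (p/100) * n."""
    n = len(sorted_values)
    index = p * n / 100
    if index.is_integer():
        k = int(index)
        if k == 0:
            return sorted_values[0]
        if k == n:
            return sorted_values[-1]
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(index) - 1]

# Data array (sorted) for Current Checking Account Balance.
data = sorted([43.27, 853.71, 1739.22, -64.85, 2255.77, 597.71, 1436.88, 477.60])

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1

# Inner fences: 1.5 IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the inner fences would be flagged as an extreme point.
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"inner fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("outliers:", outliers)
```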


  13. Construct displays of the values of Age, Number Of Televisions Owned, 1999 State Income Tax, and Current Checking Account Balance for each observation in the data set. Suggest one means by which you could augment your displays to incorporate information on the Gender of each observation.

Star glyphs are the most appropriate means for displaying all four of these quantitative (measured on the interval or ratio-scale) variables for each observation. You can draw one star glyph for each respondent, and use rays emanating from the center of a star glyph in each of four directions to represent relative value on the four variables for each observation. You could also use color to indicate gender, or use symbols representing gender as the center points for the individual star glyphs.

  14. Suppose you are offered two potential investments, one that guarantees you a 45% return at the end of ten years and one that offers you an annual return of 4% for ten years. Which investment will yield the higher annual return?

The geometric mean annual return for the first investment is

(1 + 0.45)^(1/10) - 1 = 0.037855

...so the first investment will generate a mean annual return of 3.7855%. Thus, the second investment (which has an annual return of 4.00%) will yield the higher annual return.

  15. Suppose you are considering two seven-year investments; investment A offers you 4.6% annual return for seven years, while investment B guarantees you an 8% annual rate of return for the first three years of investment and a 3% annual rate of return for the last four years. Which investment will yield the highest annual rate of return?

The annual return for investment A is 4.60%. To find the annual return for investment B, we must find the geometric mean of the seven-year return, i.e.,

[(1.08)^3 (1.03)^4]^(1/7) - 1 = 0.0511

Thus, investment B will yield the higher annual rate of return at 5.11%.
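Both investment comparisons rest on the geometric mean of annual growth factors. The following sketch shows the calculation for the ten-year investment with a 45% total return and for investment B; the function name is illustrative.

```python
import math

def annualized_return(growth_factors):
    """Geometric mean of the yearly growth factors, minus 1."""
    years = len(growth_factors)
    return math.prod(growth_factors) ** (1 / years) - 1

# A 45% total return over ten years is a total growth factor of 1.45,
# so the equivalent constant annual return is 1.45^(1/10) - 1.
ten_year_guaranteed = 1.45 ** (1 / 10) - 1

# Investment B: 8% for the first three years, then 3% for the last four.
investment_b = annualized_return([1.08] * 3 + [1.03] * 4)

# Investment A and the 4% alternative pay the same rate every year,
# so their geometric mean return equals the stated rate.
investment_a = annualized_return([1.046] * 7)

print(f"45% over ten years: {ten_year_guaranteed:.4%} per year")
print(f"investment A:       {investment_a:.2%} per year")
print(f"investment B:       {investment_b:.2%} per year")
```

The arithmetic mean of investment B's yearly rates, (3(8%) + 4(3%))/7, would overstate its annual return slightly, which is why the geometric mean is used for compounded growth.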


 
