QA 233 (Basic Business Statistics)

Solutions to Practice Problem Set X


  1. Dee Zerta is the proprietor of Ice Dreams, a local chain of ice cream soda shops. Ms. Zerta frequently has difficulty estimating her shops’ daily demand, which in turn has led to problems with her inventory. Because vanilla ice cream is the basis for so many of Ice Dreams’ menu items (such as banana splits, parfaits, sundaes, shakes and malts), anticipating demand for this flavor is her greatest challenge. However, Ms. Zerta has noticed that her sales seem to increase as the outside temperature increases, and she believes she may be able to use this knowledge to alleviate the problem. She randomly selects twenty days from the previous year and uses the internet to find the official high temperature in her area for those days. She then uses her business records to find the amount of vanilla ice cream sold by the Ice Dreams shops on those days. Her results are given below.

 

Date       Gallons of Vanilla Ice Cream Sold   Official High Temperature
11-01-00                                 151                         64°
03-11-00                                  98                         58°
05-19-00                                 233                         82°
01-30-00                                  61                         48°
12-14-99                                  58                         54°
04-14-00                                 201                         75°
06-22-00                                 311                         93°
06-07-00                                 322                         89°
02-08-00                                 164                         61°
11-19-99                                 111                         57°
10-22-00                                 192                         78°
05-22-00                                 275                         90°
09-09-00                                 345                         99°
03-28-00                                 221                         71°
06-25-00                                 391                        103°
07-02-00                                 382                         98°
10-29-00                                 168                         77°
08-19-00                                 363                        101°
08-02-00                                 370                         97°
04-22-00                                 215                         80°

Please use these data to help Ms. Zerta determine the relationship between outside temperature and sales of vanilla ice cream at her shops.

First we recognize that we wish to be able to predict or estimate Gallons of Vanilla Ice Cream Sold, so this is the dependent variable y. Since we wish to use information about Official High Temperature in making such predictions or estimations, this is our independent variable x. We will need to perform some summary calculations:

 

Date       Gallons of Vanilla Ice Cream Sold (y)   Official High Temperature (x)       xy      x²        y²
11-01-00                                     151                             64°     9664    4096     22801
03-11-00                                      98                             58°     5684    3364      9604
05-19-00                                     233                             82°    19106    6724     54289
01-30-00                                      61                             48°     2928    2304      3721
12-14-99                                      58                             54°     3132    2916      3364
04-14-00                                     201                             75°    15075    5625     40401
06-22-00                                     311                             93°    28923    8649     96721
06-07-00                                     322                             89°    28658    7921    103684
02-08-00                                     164                             61°    10004    3721     26896
11-19-99                                     111                             57°     6327    3249     12321
10-22-00                                     192                             78°    14976    6084     36864
05-22-00                                     275                             90°    24750    8100     75625
09-09-00                                     345                             99°    34155    9801    119025
03-28-00                                     221                             71°    15691    5041     48841
06-25-00                                     391                            103°    40273   10609    152881
07-02-00                                     382                             98°    37436    9604    145924
10-29-00                                     168                             77°    12936    5929     28224
08-19-00                                     363                            101°    36663   10201    131769
08-02-00                                     370                             97°    35890    9409    136900
04-22-00                                     215                             80°    17200    6400     46225
Totals                                      4632                            1575   399471  129747   1296080

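Now we can use these summary calculations to compute the estimated slope coefficient b1 and the estimated y-intercept b0:

b1 = SSxy / SSxx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n] = [399471 − (1575)(4632)/20] / [129747 − (1575)²/20] = 34701 / 5715.75 = 6.07112

and

b0 = ȳ − b1·x̄ = 4632/20 − 6.07112(1575/20) = 231.6 − 6.07112(78.75) = −246.501

so the estimated regression equation is:

ŷ = −246.501 + 6.07112x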

So we estimate that i) when Official High Temperature increases by one unit (one degree Fahrenheit), Gallons of Vanilla Ice Cream Sold increases by 6.07 units (gallons), and ii) when Official High Temperature is zero units (zero degrees Fahrenheit), Gallons of Vanilla Ice Cream Sold is expected to be -246.501 units (this meaningless result is due to the fact that we had to extrapolate well below the observed temperatures to estimate the y-intercept term b0).
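
For readers who want to check the arithmetic, the slope and intercept can be recomputed from the column totals alone. The following is a minimal Python sketch, assuming the usual least-squares shortcut formulas b1 = SSxy/SSxx and b0 = ȳ − b1·x̄; the variable names are illustrative and are not part of the original solution.

```python
# Totals from the summary table for the Ice Dreams data (n = 20 days).
n = 20
sum_x = 1575        # total of Official High Temperature (x)
sum_y = 4632        # total of Gallons of Vanilla Ice Cream Sold (y)
sum_xy = 399471     # total of the xy column
sum_x2 = 129747     # total of the x-squared column

# Shortcut formulas for the corrected cross-product and sum of squares.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                  # estimated slope
b0 = sum_y / n - b1 * sum_x / n     # estimated y-intercept

print(f"b1 = {b1:.4f}, b0 = {b0:.3f}")
```

Running the sketch should reproduce the slope of about 6.07 gallons per degree and the intercept of about -246.501 quoted in the interpretation.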

-- QA 233 students - stop here for this problem! --

Although we were not asked to do so, we could also evaluate the correlation coefficient (r) and the coefficient of determination (r²) for this problem. We would first need to compute SSyy:

SSyy = Σy² − (Σy)²/n = 1296080 − (4632)²/20 = 223308.8

Now we can calculate the correlation coefficient (r), using SSxy = Σxy − (Σx)(Σy)/n = 34701 and SSxx = Σx² − (Σx)²/n = 5715.75 from the table totals:

r = SSxy / √(SSxx · SSyy) = 34701 / √((5715.75)(223308.8)) = 0.9713

The correlation coefficient is positive and close to 1.0, so we have evidence that a strong positive linear relationship exists between Gallons of Vanilla Ice Cream Sold and Official High Temperature. We can also calculate the coefficient of determination (r²):

r² = (0.9713)² = 0.943

Thus the regression explains 94.3% of the variation in the observed values of the dependent variable Gallons of Vanilla Ice Cream Sold in our sample data.
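
The correlation calculation can be checked the same way. This is a small Python sketch under the assumption that r is the Pearson correlation, r = SSxy / √(SSxx · SSyy), computed from the table totals; the variable names are made up for illustration.

```python
import math

# Totals from the summary table for the Ice Dreams data (n = 20 days).
n = 20
sum_x, sum_y = 1575, 4632
sum_xy, sum_x2, sum_y2 = 399471, 129747, 1296080

ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n     # corrected sum of squares for y

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # correlation coefficient
r_squared = r ** 2                     # coefficient of determination

print(f"SSyy = {ss_yy:.1f}, r = {r:.4f}, r^2 = {r_squared:.3f}")
```

The printed r² should round to 0.943, matching the 94.3% figure.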

 

  2. George Steinbrenner, the owner of the New York Yankees, is arguing with others about whether a baseball team’s attendance is related to how successful the fans expect the team to be. In an attempt to make his point, Mr. Steinbrenner has one of his staff members assemble data on the Yankees’ total attendance during each season from 1976 through 1990. He also has the staff member record, for each of these seasons, the number of wins the Yankees had achieved during the previous season. The data are:

    Season   Attendance (in millions of fans)   Wins During Previous Season
    1976                                  2.0                            83
    1977                                  2.1                            97
    1978                                  2.3                           100
    1979                                  2.5                           100
    1980                                  2.6                            89
    1981                                  2.4                           103
    1982                                  ***                           ***
    1983                                  2.3                            79
    1984                                  1.8                            91
    1985                                  2.2                            87
    1986                                  2.3                            97
    1987                                  2.4                            90
    1988                                  2.6                            89
    1989                                  2.2                            85
    1990                                  2.0                            74

    Mr. Steinbrenner’s assistant recognized that approximately one-third of the 1982 season was cancelled due to a players’ strike, and so has omitted that season from the data. Please use the data collected to determine if Mr. Steinbrenner has a supportable conjecture with regard to the Yankees. Also explain for Mr. Steinbrenner the nature of the relationship between the Yankees’ attendance and their level of success on the field during the previous season.

    First we recognize that we wish to be able to predict or estimate Attendance (in millions of fans), so this is the dependent variable y. Since we wish to use information about Wins During Previous Season in making such predictions or estimations, this is our independent variable x. We will need to perform some summary calculations:

     

Season   Attendance in Millions of Fans (y)   Wins During Previous Season (x)       xy       x²      y²
1976                                    2.0                                83    166.0     6889    4.00
1977                                    2.1                                97    203.7     9409    4.41
1978                                    2.3                               100    230.0    10000    5.29
1979                                    2.5                               100    250.0    10000    6.25
1980                                    2.6                                89    231.4     7921    6.76
1981                                    2.4                               103    247.2    10609    5.76
1982                                    ***                               ***      ***      ***     ***
1983                                    2.3                                79    181.7     6241    5.29
1984                                    1.8                                91    163.8     8281    3.24
1985                                    2.2                                87    191.4     7569    4.84
1986                                    2.3                                97    223.1     9409    5.29
1987                                    2.4                                90    216.0     8100    5.76
1988                                    2.6                                89    231.4     7921    6.76
1989                                    2.2                                85    187.0     7225    4.84
1990                                    2.0                                74    148.0     5476    4.00
Totals                                 31.7                              1264   2870.7   115050   72.49

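Now we can use these summary calculations to compute the estimated slope coefficient b1 and the estimated y-intercept b0 (n = 14 seasons, since 1982 is omitted):

b1 = SSxy / SSxx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n] = [2870.7 − (1264)(31.7)/14] / [115050 − (1264)²/14] = 8.6429 / 928.857 = 0.009305

and

b0 = ȳ − b1·x̄ = 31.7/14 − 0.009305(1264/14) = 2.2643 − 0.009305(90.2857) = 1.424

so the estimated regression equation is:

ŷ = 1.424 + 0.0093x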

So we estimate that i) when Wins During Previous Season increases by one unit (one game won), Attendance (in millions of fans) increases by 0.0093 units (millions of fans), or about 9,300 fans, and ii) when Wins During Previous Season is zero units (no wins), Attendance (in millions of fans) is expected to be 1.424 units, or 1,424,000 fans (this meaningless result is again due to the fact that we had to extrapolate to estimate the y-intercept term b0).
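
Because attendance is recorded in millions, the unit conversion for the slope is easy to get wrong. The sketch below recomputes the fit directly from the fourteen usable seasons and converts the slope to individual fans. It assumes the deviation form of the least-squares formulas, which is algebraically equivalent to the shortcut form based on column totals; the names `data` and `fans_per_win` are illustrative only.

```python
# (wins during previous season, attendance in millions); 1982 is omitted.
data = [(83, 2.0), (97, 2.1), (100, 2.3), (100, 2.5), (89, 2.6), (103, 2.4),
        (79, 2.3), (91, 1.8), (87, 2.2), (97, 2.3), (90, 2.4), (89, 2.6),
        (85, 2.2), (74, 2.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Deviation form of SSxy and SSxx.
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
ss_xx = sum((x - mean_x) ** 2 for x, _ in data)

b1 = ss_xy / ss_xx              # millions of fans per additional win
b0 = mean_y - b1 * mean_x       # millions of fans at zero wins (extrapolated)
fans_per_win = b1 * 1_000_000   # convert millions of fans to fans

print(f"b1 = {b1:.4f} million fans per win, about {fans_per_win:,.0f} fans")
print(f"b0 = {b0:.3f} million fans")
```

A slope of 0.0093 in millions of fans corresponds to roughly nine thousand fans per additional win in the previous season.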

-- QA 233 students - stop here for this problem! --

Although we were not asked to do so, we could also evaluate the correlation coefficient (r) and the coefficient of determination (r²) for this problem. We would first need to compute SSyy, using Σy² = 72.49 for the fourteen seasons:

SSyy = Σy² − (Σy)²/n = 72.49 − (31.7)²/14 = 0.7121

Now we can calculate the correlation coefficient (r), using SSxy = Σxy − (Σx)(Σy)/n = 8.6429 and SSxx = Σx² − (Σx)²/n = 928.857:

r = SSxy / √(SSxx · SSyy) = 8.6429 / √((928.857)(0.7121)) = 0.336

The correlation coefficient is positive but not close to 1.0, so we have evidence that only a somewhat weak positive linear relationship exists between Attendance (in millions of fans) and Wins During Previous Season. We can also calculate the coefficient of determination (r²):

r² = (0.336)² = 0.113

Thus the regression explains only 11.3% of the variation in the observed values of the dependent variable Attendance (in millions of fans) in our sample data.
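
The column totals themselves are worth verifying before they are plugged into the formulas. The following Python sketch rebuilds every total from the raw seasons and then computes r and r²; it assumes r is the Pearson correlation coefficient, and the helper name `pearson_r` is hypothetical.

```python
import math

# The fourteen usable seasons (1982 omitted).
wins = [83, 97, 100, 100, 89, 103, 79, 91, 87, 97, 90, 89, 85, 74]
attendance = [2.0, 2.1, 2.3, 2.5, 2.6, 2.4, 2.3, 1.8, 2.2, 2.3, 2.4, 2.6, 2.2, 2.0]

def pearson_r(xs, ys):
    """Correlation coefficient via the shortcut sums-of-squares formulas."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

sum_y2 = sum(y * y for y in attendance)   # total of the y-squared column
r = pearson_r(wins, attendance)

print(f"sum of y^2 = {sum_y2:.2f}")
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

An r² near 0.113 means that previous-season wins account for only about a ninth of the season-to-season variation in attendance in this sample.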

 
  3. Johnny Bravo, ace QA 233 student, wants to demonstrate the relationship between the number of hours a student spends studying and the grades that he/she earns. He randomly selects ten students and asks them to record the number of hours they spend studying throughout the quarter. After the quarter has concluded he also records the GPA each student earned during the quarter. His results follow:

Student ID Number   Total Hours Spent Studying During the Quarter   Grade Point Average
2196                                                          153                  3.75
2907                                                          122                  3.15
3108                                                           97                  2.40
2562                                                          182                  4.00
4680                                                          115                  2.33
1741                                                           75                  1.80
0850                                                          104                  2.50
9123                                                          119                  2.90
3408                                                           98                  2.50
2597                                                          122                  2.85

 

 Please use the data collected by Johnny to determine the nature of the relationship between the number of hours a student spends studying and the grades that he/she earns.

First we recognize that we wish to be able to predict or estimate Grade Point Average, so this is the dependent variable y. Since we wish to use information about Total Hours Spent Studying During the Quarter in making such predictions or estimations, this is our independent variable x. We will need to perform some summary calculations:

 

Student ID Number   Total Hours Spent Studying During the Quarter (x)   Grade Point Average (y)       xy      x²       y²
2196                                                              153                      3.75   573.75   23409  14.0625
2907                                                              122                      3.15   384.30   14884   9.9225
3108                                                               97                      2.40   232.80    9409   5.7600
2562                                                              182                      4.00   728.00   33124  16.0000
4680                                                              115                      2.33   267.95   13225   5.4289
1741                                                               75                      1.80   135.00    5625   3.2400
0850                                                              104                      2.50   260.00   10816   6.2500
9123                                                              119                      2.90   345.10   14161   8.4100
3408                                                               98                      2.50   245.00    9604   6.2500
2597                                                              122                      2.85   347.70   14884   8.1225
Totals                                                           1187                     28.18  3519.60  149141  83.4464

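Now we can use these summary calculations to compute the estimated slope coefficient b1 and the estimated y-intercept b0:

b1 = SSxy / SSxx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n] = [3519.60 − (1187)(28.18)/10] / [149141 − (1187)²/10] = 174.634 / 8244.1 = 0.021183

and

b0 = ȳ − b1·x̄ = 28.18/10 − 0.021183(1187/10) = 2.818 − 0.021183(118.7) = 0.304

so the estimated regression equation is:

ŷ = 0.304 + 0.0212x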

So we estimate that i) when Total Hours Spent Studying During the Quarter increases by one unit (one hour studied), the student's Grade Point Average increases by 0.0212, and ii) when Total Hours Spent Studying During the Quarter is zero units (no studying), the student's Grade Point Average is expected to be 0.304 (this result is again due to the fact that we had to extrapolate to estimate the y-intercept term b0).
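
The caution about extrapolation can be made concrete. The sketch below refits the line from the column totals and wraps it in a small prediction helper that reports whether a requested value of x lies inside the range of study hours actually observed (75 to 182). The function name `predict_gpa` and the range check are illustrative assumptions, not part of the original solution.

```python
# Totals from the summary table for Johnny's data (n = 10 students).
n = 10
sum_x, sum_y = 1187, 28.18          # hours studied, grade point average
sum_xy, sum_x2 = 3519.60, 149141

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

X_MIN, X_MAX = 75, 182   # smallest and largest study hours in the sample

def predict_gpa(hours):
    """Return (predicted GPA, True if hours lies within the sampled range)."""
    return b0 + b1 * hours, X_MIN <= hours <= X_MAX

for hours in (120, 0):
    gpa, in_range = predict_gpa(hours)
    note = "interpolation" if in_range else "extrapolation"
    print(f"{hours:>3} hours -> predicted GPA {gpa:.2f} ({note})")
```

A prediction at 120 hours sits comfortably inside the data, whereas the value at zero hours is simply the intercept b0 and relies entirely on extending the line beyond anything that was sampled.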

-- QA 233 students - stop here for this problem! --

Although we were not asked to do so, we could also evaluate the correlation coefficient (r) and the coefficient of determination (r²) for this problem. We would first need to compute SSyy:

SSyy = Σy² − (Σy)²/n = 83.4464 − (28.18)²/10 = 4.03516

Now we can calculate the correlation coefficient (r), using SSxy = Σxy − (Σx)(Σy)/n = 174.634 and SSxx = Σx² − (Σx)²/n = 8244.1:

r = SSxy / √(SSxx · SSyy) = 174.634 / √((8244.1)(4.03516)) = 0.9575

The correlation coefficient is positive and close to 1.0, so we have evidence that a strong positive linear relationship exists between Total Hours Spent Studying During the Quarter and the student's Grade Point Average. We can also calculate the coefficient of determination (r²):

r² = (0.9575)² = 0.917

Thus the regression explains 91.7% of the variation in the observed values of the dependent variable Grade Point Average in our sample data.
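
What "explained variation" means can be seen by computing r² a second way. For a least-squares line with an intercept, r² equals 1 − SSE/SSyy, where SSE is the sum of squared residuals around the fitted line. The Python sketch below uses that identity on the ten students' data; it is a cross-check under that assumption, and the variable names are illustrative.

```python
hours = [153, 122, 97, 182, 115, 75, 104, 119, 98, 122]
gpa = [3.75, 3.15, 2.40, 4.00, 2.33, 1.80, 2.50, 2.90, 2.50, 2.85]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(gpa) / n

# Least-squares fit in deviation form.
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))
ss_xx = sum((x - mean_x) ** 2 for x in hours)
b1 = ss_xy / ss_xx
b0 = mean_y - b1 * mean_x

# Total variation in y, and the part left unexplained by the line.
ss_yy = sum((y - mean_y) ** 2 for y in gpa)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, gpa))

r_squared = 1 - sse / ss_yy
print(f"SSyy = {ss_yy:.4f}, SSE = {sse:.4f}, r^2 = {r_squared:.3f}")
```

The unexplained share SSE/SSyy is the complement of r², so an r² of about 0.917 leaves roughly 8% of the variation in GPA unaccounted for by study hours.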


 
