Exercise 2, BA 3300 Business Statistics

Question:  Gender and Grades --is there a difference?

There is a common stereotype that females have lesser quantitative skills than males.
A Statistics instructor has a theory that females on average have higher grades than males in quantitative courses at UM-St. Louis.
These data points were chosen so that they consisted of pairs carefully matched on a variety of characteristics, such as age, experience, prior education and training, ambition, destructive activities, IQ of parents, and factors subject to free will.
He believes that, while the overall means of these two distributions are not vastly different, the small gender difference may be detected if enough of the other factors affecting the grades could be controlled and adjusted for (the Bill Cosby school of detecting causality).
First, we will just do a two-sample t-test to see if the means of the two groups are significantly different, and then we will look at a distribution of the differences between pairs (a one-sample t-test) to see if the differences have a mean significantly different from zero. We will compare that to the EXCEL paired two sample t-test procedure.

He suspects females may do better, but others think it might go the other way, so we will use two-tailed tests in the t-tests.

Being a lazy sort of fellow, he is willing to use built-in tools in EXCEL to do analyses--such as t-tests- rather than resort to his Monroe calculator and a lot of scratchpads. He has hired you as the lab assistant to do the work. He has provided a Comma-delimited ASCII file of his data as sexscore.txt .
Copy and paste the data into an EXCEL sheet and immediately do a SaveAs, naming it sexgrad.xls. (make sure file type is EXCEL.  It's not enough to just change the extension)

Your mission is to analyze these data, printing to one page wide by 2 pages tall maximum. do this by selecting a range that includes your analyses and only the top of the dataset (showing your student number used in the data). Print a selected range, preview / setup click the button to fit to 1 page by l page tall. use landscape or portrait to maximize print size on the one page. You can also adjust margins to help with this. Hand in a printout as exercise 2 and put a copy of the file in the digital dropbox on mygateway..

Before you go any further, insert a few rows at the top, type in your name, section time and student number so I can tell who did it.

The first few data have the letters A B C D in them.  Replace the letters with the last 4 numbers in your student number in order. For example, if your student number is 1036782:
these numbers, as used in the analysis, would be 41.6, 43.7, 37.8, 39.2
  • Use the following tools in EXCEL (you don't have to do them exactly in this order. If you are not sure about some of them, do the easy ones first):
    1. Derive tables of descriptive statistics for each of the two samples, male and female, including 95% confidence intervals. Derive a mean and variance for the combined data--you can do this using the functions, even though the data are spread over two columns. Comment on whether it looks like these two samples came from populations with different means, referring to the sample means, standard errors and confidence intervals. What happened to the variance when the data was split into two groups -versus combined?
    2. Construct histograms (histograms are frequency GRAPHs) on the two groups (males and females) separately,and the combined data, making sure the X axis has the right values as labels on the axis so you can align them for easy visual comparison. Use about 15-20 bins so you can see the general shapes of the distributions. Do these look like the samples came from populations with different means?
    3. Conduct an F test (alpha=0.05) on the variances of the two distributions and decide whether to do a t-test assuming equal or unequal variances. Comment on the resulting F value and Pvalue.  What is Ho in this test? is it rejected? What does the Pvalue mean?
    4. Conduct the appropriate (unpaired) t-test with alpha=0.05 to see if the means of the two distributions are significantly different. What is Ho? What is Ha?
    5. Comment on the results.  Is there a significant difference? What is the probability of getting this observed t-statistic if the Ho is true? Do you conclude from this analysis that males and females have different average grades in the parent populations?

    Retest using the pairing:

    The strategy of pairing values to control for other variables will allow us to detect differences that might not be seen in the midst of other effects. we will do this manually with functions first, then compare the result to what we get from the data analysis tools procedure. These should be exactly the same.
    1. Add a third column containing the differences between the pairs. Derive descriptive statistics and a confidence interval (95% confidence) on the column of differences.  Calculate a t value for the mean of this distribution to see if it is different from zero. You will have to calculate the t value manually to do this, and then get a critical t and a Pvalue by using the EXCEL functions TDIST and TINV.  Comment on the observed t statistic and the Pvalue in drawing a conclusion whether the mean of this set of differences is significantly different from zero.
    2. Conduct a Procedure (data analysis tools)  t-test for difference of the means of the two distributions as paired variables, with alpha =0.05 comment on the results. Are they "significant"? what does that mean?  How does this compare to the t-test on the column of differences?
    3. Why didn't we see the difference between males and females when we did the analysis without pairing the observations? What did the pairing do for us mathematically so that we could identify a difference with higher confidence?
    4. In all of these, why is it best to use a t test rather than a Z test even though there is a large number of observations?
    5. A preview of regression and correlation:

      highlight both original columns again, click on the chart icon and this time select an XY scatter chart. Click next a few times, making sure that the chart appears as an object in the same spreadsheet, then move it to a convenient location. Right click on the middle of the data and click add trendline. This will bring up a dialogue where you can select a linear trendline under type, and then, on the options tab, check the boxes for display equation on chart and don't set intercept = 0.  Consider the result.  How well do the data conform to the line (are there a lot of points far away from the line)? What does this mean in comparing male to female grades when values are paired this way? If you wanted to predict a female's grade knowing only the grade of her pair partner, how would you calculate your best guess?  (What is the prediction equation to use for this?)