Study of Sex Bias in Graduate Admissions at the University of California, Berkeley

The data are a partial dataset from the classic Berkeley graduate school admissions sex-bias case, in which admission proportions aggregated over departments gave a clear indication of bias toward men in graduate school admissions.
Only the seven largest departments are included here, but the example is representative of the original, larger dataset.

The videos show how to set this up for analysis in Excel, using a PivotTable to generate the crosstabulations and then a chi-squared test of independence between admission status and the sex of the applicants. The result is a very strong rejection of independence: the aggregate data suggest a strong bias in favor of male applicants. If the applicants are assumed to be equally qualified, this would constitute illegal discrimination against female applicants. However, adding Department as a page variable in the PivotTable allows filtering to give subgroup results, and testing for sex discrimination within departmental subsets gives a different result. This outcome is a classic example of Simpson's Paradox, in which aggregate data show a relationship reversed from the one seen in the subsets. The reversal is caused by lurking variables that are not controlled for in the aggregation. In the Berkeley case the lurking variable is department: women tended to apply to departments that admitted a smaller share of their applicants.
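The chi-squared test of independence compares each observed count with the count expected if admission were independent of sex (row total times column total, divided by the grand total). The sketch below, in Python, computes the Pearson statistic for a 2 x 2 table of counts, without a continuity correction, and gets the p-value from the closed form for 1 degree of freedom. The counts are hypothetical, invented only to mimic the pattern described above for two made-up departments X and Y; they are not the Berkeley figures. In Excel, CHITEST (CHISQ.TEST in Excel 2010 and later) returns the corresponding p-value from the observed and expected ranges.

```python
import math

# HYPOTHETICAL counts for illustration only (not the Berkeley data).
# Rows: [Male, Female]; columns: [admitted, rejected].
DEPT_X = [[240, 160], [65, 35]]   # made-up "easy" department
DEPT_Y = [[10, 90], [60, 340]]    # made-up "hard" department
# Aggregate table = cell-by-cell sum of the two departments.
AGGREGATE = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(DEPT_X, DEPT_Y)]


def chi_squared_independence(table):
    """Pearson chi-squared test of independence for a 2 x 2 table of counts.

    Returns (statistic, degrees_of_freedom, p_value). No Yates correction.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch handles 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # = 1 for a 2 x 2 table
    # With 1 df the statistic is Z**2, so P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, df, p_value


if __name__ == "__main__":
    tables = {"All departments": AGGREGATE, "Department X": DEPT_X, "Department Y": DEPT_Y}
    for label, table in tables.items():
        stat, df, p = chi_squared_independence(table)
        print(f"{label}: chi2 = {stat:.2f}, df = {df}, p = {p:.3g}")
```

With these made-up counts the aggregate table gives a statistic of about 66.7 and a vanishingly small p-value, while neither department on its own comes close to significance (statistics of about 0.84 and 1.66, both with p well above 0.05), and in each department the women's admission rate is the higher one.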

Since we usually don't measure everything when we examine relationships statistically, a disturbing question follows: "How can you be sure that a relationship seen in any crosstabulation or correlation is meaningful, and not just the result of some unmeasured variable or interaction that has not been taken into account?" The answer is that we can't. Statistical significance is therefore not "proof" of anything.
Even very high confidence in a conclusion is not "proof." Our understanding of a situation may change as we dig further into the data. Sometimes aggregating data is inappropriate because variables lurking in the subsets could change the interpretation substantially.
Despite its limitations, Microsoft Excel is available on most people's desktops and provides rapid analysis capabilities in an understandable interface.
Excel's PivotTable is a very convenient tool for crosstabulation and for comparing aggregates with subsets. The page variable allows drilling down into the data in a very intuitive way, using filters.
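A PivotTable with Department as the page variable is, in effect, summing counts by sex after an optional filter on department. The Python sketch below mimics that with a plain dictionary. The counts are hypothetical, invented numbers for two made-up departments X and Y, not the Berkeley data, and `admission_rates` is a hypothetical helper name, not part of Excel or any library.

```python
from collections import defaultdict

# HYPOTHETICAL counts for illustration only (not the Berkeley data):
# (department, sex) -> (applicants, admitted)
COUNTS = {
    ("X", "Male"): (400, 240),
    ("X", "Female"): (100, 65),
    ("Y", "Male"): (100, 10),
    ("Y", "Female"): (400, 60),
}


def admission_rates(counts, department=None):
    """Percent admitted by sex, optionally filtered to one department.

    department=None aggregates over all departments, like leaving the
    PivotTable page (filter) field set to show all items.
    """
    applied = defaultdict(int)
    admitted = defaultdict(int)
    for (dept, sex), (n_applied, n_admitted) in counts.items():
        if department is not None and dept != department:
            continue  # skip rows outside the selected department
        applied[sex] += n_applied
        admitted[sex] += n_admitted
    return {sex: 100.0 * admitted[sex] / applied[sex] for sex in applied}


if __name__ == "__main__":
    for dept in (None, "X", "Y"):
        label = "All departments" if dept is None else f"Department {dept}"
        rates = admission_rates(COUNTS, dept)
        print(label, {sex: round(pct, 1) for sex, pct in rates.items()})
```

Run as a script, it reports 50.0% of men and 25.0% of women admitted over all departments, yet within Department X (60.0% vs 65.0%) and Department Y (10.0% vs 15.0%) women are admitted at the higher rate. In this invented example the reversal arises because most women applied to the hard-to-enter Department Y.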

Materials and videos on setting up the PivotTable and the chi-squared analysis:

Screen captures showing formulas (Exposition.doc)
Dataset in Excel (sexbias2.xls)
Video: setting up a PivotTable (berkpivt.html)
Video: calculating percentages of admissions (berkpct.html)
Video: flexible chi-squared calculation and interpretation (berkchi2.html)

The partial dataset was obtained from Gerstman, B. B. (2000), Data Analysis with Epi Info: Binary Outcome, Stratified Analysis, available on the web at http://www.sjsu.edu/faculty/gerstman/EpiInfo/stratified.htm

Some other useful links on this topic:

http://plato.stanford.edu/entries/paradox-simpson/
http://www.google.com/search?hl=en&q=simpson%27s+paradox
http://core.ecu.edu/psyc/wuenschk/StatHelp/Reversal-Paradox.txt
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1014&context=wharton_research_scholars
http://wolfweb.unr.edu/homepage/jerryj/NNN/Aggregates.pdf
