INFO DASL DATASETS

Last updated 6 March 2016



 
No. 1
Abstract: Four measurements were made of male Egyptian skulls from five different time periods ranging from 4000 B.C. to 150 A.D. We wish to analyze the data to determine if there are any differences in the skull sizes between the time periods and if they show any changes with time. The researchers theorize that a change in skull size over time is evidence of the interbreeding of the Egyptians with immigrant populations over the years. Because there are four different measurements that characterize skull size, we must use multivariate techniques that allow multiple dependent variables. Our dependent variables are the measurements MB, BH, BL, and NH. The predictor variable is Year.
Two different analyses may be performed on these data. If we assume that Year is a discrete predictor variable, then we may analyze the data using multivariate analysis of variance (MANOVA). If we wish to determine whether there is a linear trend in the change in skull size, then we treat Year as a continuous predictor variable and analyze the data using multivariate regression.
A MANOVA analysis of the data shows that there is a significant difference between the multivariate measurements of skulls at different time periods at the 1% level of significance. If we look at the differences of individual measurements across the time periods, MB, BH, and BL all show significant differences at the 5% level. However, the NH measurement does not differ significantly across the time periods at the 5% level of significance.
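The per-measurement comparison across time periods can be illustrated with a one-way ANOVA F-ratio for a single measurement. This is only a minimal univariate sketch, not the MANOVA itself; the function name and the toy values are assumptions made for illustration and are not taken from the Egyptian Skulls data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Placeholder values, NOT measurements from the dataset.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_ratio, df_b, df_w = one_way_anova_f(toy_groups)
print(f_ratio, df_b, df_w)
```

With the real data, each inner list would hold the 30 values of one measurement (for example MB) for one time period, and the F-ratio would be compared with an F distribution on the returned degrees of freedom.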
The first plot below shows the difference in the MB measurement across the time periods. The shaded regions are the 95% confidence intervals of the median measurement at each time period. We can see that MB seems to increase over time. In the second plot NH does not appear to change as significantly across the time periods.
A multivariate regression of the data, treating Year as a continuous predictor, shows that, when all four measurements are taken together, the measurements appear to change over the years. This result agrees with the MANOVA result above. Looking at each measurement individually, all four measurements change significantly across the years. Recall that NH did not change significantly in the MANOVA analysis.
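Treating Year as a continuous predictor for one measurement at a time amounts to fitting a least-squares line. The sketch below shows the closed-form slope and intercept; the year codes and MB values are placeholders chosen so the answer is easy to check, not values from the dataset.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder year codes (negative = B.C.) and an exactly linear toy response.
years = [-4000, -3000, -2000, -1000, 150]
mb_toy = [130.0 + 0.001 * yr for yr in years]
slope, intercept = least_squares_line(years, mb_toy)
print(slope, intercept)
```

A positive slope corresponds to a measurement that increases over time and a negative slope to one that decreases; judging significance would additionally require the standard error of the slope.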
The third and fourth plots below are scatterplots of measurements vs. Year. The measurement MB appears to change more significantly with year than NH. Both MB and NH increase over time; however, plots of BH and BL vs. Year show that BH and BL decrease over time. In this study the direction of the change is not important since any change in skull size would be evidence of interbreeding.
Hypothesis tests in MANOVA and multivariate regression require that the dependent variables have a multivariate normal distribution. A plot matrix of the dependent variables shows univariate and bivariate normality. While this does not prove multivariate normality, since it does not check the three- and four-dimensional structure of the data, it does provide strong evidence of multivariate normality. Therefore, the hypothesis tests in the two analyses above should be valid.
In order to simplify the analysis, we can use a principal components analysis to reduce the number of dependent variables. The principal components analysis of the four measurement variables shows that the dimensionality of the dependent variables cannot be reduced; therefore, our analyses above cannot be simplified in this way.

Datafile Name: Egyptian Skulls
Datafile Subjects: Archeology, Biology
Story Names: Egyptian Skull Development
Reference: Thomson, A. and Randall-Maciver, R. (1905) Ancient Races of the Thebaid, Oxford: Oxford University Press. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, New York: Chapman & Hall, pp. 299-301. Manly, B.F.J. (1986) Multivariate Statistical Methods, New York: Chapman & Hall.
Authorization: Contact Authors
Description: Four measurements of male Egyptian skulls from 5 different time periods. Thirty skulls are measured from each time period.
Number of cases: 150
Variable Names:
1.MB: Maximal Breadth of Skull
2.BH: Basibregmatic Height of Skull
3.BL: Basialveolar Length of Skull
4.NH: Nasal Height of Skull
5.Year: Approximate Year of Skull Formation (negative = B.C., positive = A.D.)
Source & Copyright: DASL


 
No. 2
Abstract: Are the size and weight of your brain indicators of your mental capacity? In this study by Willerman et al. (1991) the researchers use Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects. The researchers take into account gender and body size to draw conclusions about the connection between brain size and intelligence. Willerman et al. (1991) conducted their study at a large southwestern university. They selected a sample of 40 right-handed Anglo introductory psychology students who had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These subjects were drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test Scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by allowing the administration of four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the University's research review board, students selected for MRI were required to obtain prorated full-scale IQs of greater than 130 or less than 103, and were equally divided by sex and IQ classification.
The MRI Scans were performed at the same facility for all 40 subjects. The scans consisted of 18 horizontal MR images. The computer counted all pixels with non-zero gray scale in each of the 18 images and the total count served as an index for brain size.
A straightforward method for evaluating the relationship between brain size and IQ scores is the correlation coefficient. For the 20 men in the study, the researchers report correlations between IQ scores and brain sizes before and after controlling for body size of r = 0.51 (p-value less than 0.05) and r = 0.65 (p-value less than 0.01) respectively. For the 20 women in the study, the researchers report the corresponding correlations to be r = 0.33 (p-value not significant) and r = 0.35 (p-value not significant). With both genders pooled, the correlation between IQ and adjusted brain size was r = 0.51 (p-value less than 0.05).
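The correlations above can be sketched as a Pearson coefficient plus an adjustment for a third variable. The text does not say how the researchers controlled for body size, so the first-order partial correlation used here is an assumption, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both.

    Assumed method: first-order partial correlation. The source does not
    state that the study used this formula.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

With the datafile, x could be FSIQ, y could be MRI_Count, and z a body-size variable such as Height or Weight; rows with withheld heights or weights would have to be dropped first.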

Datafile Name: Brain size
Datafile Subjects: Medical
Story Names: Brain Size and
Reference: Willerman, L., Schultz, R., Rutledge, J. N., and Bigler, E. (1991), "In Vivo Brain Size and Intelligence," Intelligence, 15, 223-228.
Authorization: Contact authors
Description: Willerman et al. (1991) collected a sample of 40 right-handed Anglo introductory psychology students at a large southwestern university. Subjects took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects. Information about gender and body size (height and weight) is also included. The researchers withheld the weights of two subjects and the height of one subject for reasons of confidentiality.
Number of cases: 40
Variable Names:
1.Gender: Male or Female
2.FSIQ: Full Scale IQ scores based on the four Wechsler (1981) subtests
3.VIQ: Verbal IQ scores based on the four Wechsler (1981) subtests
4.PIQ: Performance IQ scores based on the four Wechsler (1981) subtests
5.Weight: body weight in pounds
6.Height: height in inches
7.MRI_Count: total pixel Count from the 18 MRI scans

Source & Copyright: DASL


 
No. 3
Abstract: As cheddar cheese matures, a variety of chemical processes take place. The taste of matured cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters. Scatterplots, correlation, and simple regression can be used to examine the relationships among the individual variables. Simple regressions of taste on each of the chemical concentration variables show that all three chemicals are significant predictors of flavor. However, in regressions of taste on each pair of chemicals, acetic acid is no longer significant. Multicollinearity may be a problem in these regressions since the correlation between each pair of chemicals exceeds 0.60.
The best two-variable regression model is the regression of "taste" on "H2S" and "Lactic". Adding "acetic" to the model does not change the value of R-squared, and actually decreases the adjusted R-squared and the F-ratio. Scatterplots and normal probability plots of the residuals indicate no violations of the assumptions of the model. The images below correspond to this two-variable model.
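Why an extra predictor that leaves R-squared unchanged lowers both the adjusted R-squared and the F-ratio follows from the standard formulas, sketched below. The sample size of 30 comes from the dataset description; the R-squared value is a hypothetical placeholder, not the fitted value for these data.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F-ratio of a regression with p predictors fitted to n cases."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


N_SAMPLES = 30          # number of cheese samples in the dataset
R2_PLACEHOLDER = 0.65   # hypothetical value, for illustration only

for p in (2, 3):
    print(p, adjusted_r2(R2_PLACEHOLDER, N_SAMPLES, p),
          overall_f(R2_PLACEHOLDER, N_SAMPLES, p))
```

Holding R-squared fixed, moving from two to three predictors shrinks the residual degrees of freedom and spreads the explained variation over one more term, so both statistics fall.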

Datafile Name: Cheese
Datafile Subjects: Food, Science
Story Names: Cheddar Cheese Taste
Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics.
Description: As cheese ages, various chemical processes take place that determine the taste of the final product. This dataset contains concentrations of various chemicals in 30 samples of mature cheddar cheese, and a subjective measure of taste for each sample. The variables "Acetic" and "H2S" are the natural logarithms of the concentrations of acetic acid and hydrogen sulfide respectively. The variable "Lactic" has not been transformed.
Number of cases: 30
Variable Names:
1.Case: Sample number
2.Taste: Subjective taste test score, obtained by combining the scores of several tasters
3.Acetic: Natural log of concentration of acetic acid
4.H2S: Natural log of concentration of hydrogen sulfide
5.Lactic: Concentration of lactic acid
Source & Copyright: DASL



 
No. 4
Abstract: Clouds were randomly seeded or not with silver nitrate. Rainfall amounts were recorded from the clouds. The purpose of the experiment was to determine if cloud seeding increases rainfall. The rainfall distributions are more nearly symmetric after a log transformation. The log transformation also makes the variance of the two groups more nearly equal.
After a log transformation, a pooled t-test may be appropriate. Without a transformation, the pooled t-test is neither appropriate (the data fail both the normality and equal-variance assumptions) nor significant at the 0.05 level; a Mann-Whitney U test would be appropriate for the untransformed data.
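The log-then-pooled-t approach can be sketched as follows. The function names are hypothetical, the code returns only the t statistic and its degrees of freedom (a p-value would need a t distribution, which the standard library does not provide), and rainfall amounts are assumed to be strictly positive so that the logarithm is defined.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate, and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df


def log_pooled_t(seeded, unseeded):
    """Pooled t statistic on natural-log rainfall (values must be > 0)."""
    log_seeded = [math.log(v) for v in seeded]
    log_unseeded = [math.log(v) for v in unseeded]
    return pooled_t(log_seeded, log_unseeded)
```

With the datafile, `seeded` and `unseeded` would be the two columns of 26 rainfall amounts each, giving 50 degrees of freedom.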
A boxplot or the dotplot of rainfall for the two groups of clouds is helpful.

Datafile Name: Clouds
Datafile Subjects: Environment
Story Names: Cloud Seeding
Reference: Chambers, Cleveland, Kleiner, and Tukey. (1983). Graphical Methods for Data Analysis. Wadsworth International Group, Belmont, CA, 351. Original Source: Simpson, Olsen, and Eden. (1975). A Bayesian analysis of a multiplicative treatment effect in weather modification. Technometrics 17, 161-166.
Authorization: contact authors
Description: Rainfall from cloud seeding. The rainfall in acre-feet from 52 clouds, 26 of which were chosen at random and seeded with silver nitrate.
Number of cases: 26
Variable Names:
1.Unseeded_Clouds: Amount of rainfall from unseeded clouds (in acre-feet)
2.Seeded_Clouds: Amount of rainfall from clouds seeded with silver nitrate (in acre-feet)

Source & Copyright: DASL



 
No. 5
Abstract: Management of the growing mustang population on federal lands has been a controversial issue. A suggested method for controlling overpopulation is to sterilize the dominant male in each group. Eagle, Asa, and Garrott et al. (1993) conducted an experiment evaluating the effectiveness of sterilizing the dominant males as a way to reduce foaling (birth) rates for 2 or more years. The researchers chose two Herd Management Areas (HMAs), Flanigan in northwestern Nevada and Beaty Butte in southeastern Oregon, for this study. In December 1985, they rounded up the horses in bands and counted all individual horses, determined their sex, and estimated their ages by looking at tooth wear. They photographed all horses three years old or older and fitted them with numbered collars to assist in identification throughout the study. They identified the dominant male in each band, vasectomized it, and fitted it with a radio-transmitting collar. Finally, they released the band as a group. Between June 1986 and July 1988 they attempted to locate each sterilized male 3-4 times a year by aerial survey from helicopter. The researchers recorded the number of adults and foals in each group containing a sterilized male (treated groups), and in the groups without a sterilized male (untreated groups).
While the researchers could not record actual birthrates in the bands of horses, the number of foals per 100 adults in each band is a good substitute. Graphical methods are useful for illustrating the difference in foal-to-adult ratio between the treated and untreated groups. Different multiple regression models may be used to evaluate the effect of the treatment while controlling for other variables such as herd size.
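The foals-per-100-adults substitute can be computed per observation and averaged within the treated and untreated groups, as in the sketch below. The dictionary keys follow the variable names in the datafile; the helper names and the records used to try it out are hypothetical.

```python
from statistics import mean


def foals_per_100_adults(foals, adults):
    """Foaling-rate proxy: number of foals per 100 adults in a group."""
    return 100.0 * foals / adults


def mean_ratio_by_treatment(records):
    """Average the proxy within each Treatment value (1 = sterilized, 0 = untreated).

    `records` is an iterable of dicts with keys "Adults", "Foals", "Treatment".
    """
    ratios = {}
    for rec in records:
        ratio = foals_per_100_adults(rec["Foals"], rec["Adults"])
        ratios.setdefault(rec["Treatment"], []).append(ratio)
    return {treatment: mean(values) for treatment, values in ratios.items()}


# Placeholder records, NOT rows from the Wild Horses datafile.
toy_records = [
    {"Adults": 20, "Foals": 4, "Treatment": 0},
    {"Adults": 10, "Foals": 1, "Treatment": 1},
    {"Adults": 50, "Foals": 5, "Treatment": 1},
]
group_means = mean_ratio_by_treatment(toy_records)
print(group_means)
```

Averaging per-observation ratios weights small and large groups equally; pooling the foal and adult counts before dividing is an alternative design that weights by group size.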

Datafile Name: Wild Horses
Datafile Subjects: Nature
Story Names: Reining in the Wild Horses
Reference: Eagle, T. C., Asa, C., and Garrott, R. et al. (1993), "Efficacy of Dominant Male Sterilization To Reduce Reproduction in Feral Horses," Wildlife Society Bulletin, 21(2), 116-121.
Authorization: Contact authors
Description: The authors conducted an experiment evaluating whether sterilizing the dominant male in a herd of wild horses would reduce foaling (birth) rates for 2 or more years. In December 1985, they rounded up bands of horses from two Herd Management Areas. They vasectomized the dominant male in each band and released the horses. Between June 1986 and July 1988 they attempted to locate the bands of horses 3-4 times a year by aerial survey from a helicopter. The researchers recorded the number of adults and foals in each group containing a sterilized male (treated groups), and in the groups without a sterilized male (untreated groups).
Number of cases: 38
Variable Names:
1.Adults: total number of adults in the group
2.Sterile_Males: number of sterilized males counted in the group
3.Foals: number of foals counted in the group
4.Year: Year
5.Location: F if in Flanigan Herd Management Area; B if in Beaty Butte Herd Management Area
6.Date: date of the observation
7.Treatment: 1 if sterilized group; 0 if untreated group

Source & Copyright: DASL