Maria Cholifah (林玛丽亚): DESCRIPTIVE STATISTICS

Tuesday, 12 December 2017

DESCRIPTIVE STATISTICS

Descriptive statistics, such as measures of central tendency and variability, help us understand the typical cases in a sample and the distribution of a variable more clearly. Measures of central tendency include the mode, the median, and the mean, and they give us an idea of the typical or average value in a data set. The mode is the only one of the three that can be used with categorical data, because it simply identifies the most frequent value. The median should be reported when unusual values (outliers) are present in the data set, because it is not pulled toward extreme values. Otherwise, the mean should be reported, as it has statistically preferable properties.
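As a small illustration of how one unusual value affects the three measures, here is a minimal sketch using Python's standard `statistics` module. The `scores` and `colours` lists are hypothetical data made up for this example.

```python
import statistics

# Hypothetical sample: nine ordinary scores plus one unusually large value (40).
scores = [4, 5, 5, 5, 6, 6, 7, 7, 8, 40]

mean_value = statistics.mean(scores)      # pulled upward by the value 40
median_value = statistics.median(scores)  # middle of the sorted values
mode_value = statistics.mode(scores)      # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)

# The mode also works for categorical data, where mean and median are undefined.
colours = ["red", "blue", "red", "green"]
print("most common colour:", statistics.mode(colours))
```

Here the mean (9.3) is larger than nine of the ten scores, while the median (6.0) still describes a typical case, which is why the median is preferred when an outlier is present.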

Measures of variability include the range, the interquartile range, the variance, and the standard deviation. They describe how spread out the data are, and therefore how well a measure of central tendency represents the data set. The range should be treated only as a crude measure of variability, because it depends entirely on the two most extreme values and is therefore extremely sensitive to unusual data values. The interquartile range should be reported when unusual or outlying values are present in the data set. Otherwise, the standard deviation should be reported, as it has statistically preferable properties.
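The four measures of variability can be computed in a few lines. This is a sketch on a hypothetical `scores` list, not a prescribed procedure. Note that `statistics.quantiles` uses its default "exclusive" method here, and other software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical sample: nine ordinary scores plus one unusually large value (40).
scores = [4, 5, 5, 5, 6, 6, 7, 7, 8, 40]

# Range: depends only on the smallest and largest values.
value_range = max(scores) - min(scores)

# Interquartile range: spread of the middle half of the data.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Sample variance and standard deviation (both divide by n - 1).
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print("range:", value_range)
print("IQR:", iqr)
print("variance:", round(sample_variance, 2))
print("standard deviation:", round(sample_sd, 2))
```

The single value 40 inflates the range and the standard deviation, while the interquartile range stays close to the spread of the ordinary scores.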

The normal distribution is a very important probability distribution that can approximately represent many human characteristics, such as height, weight, and blood pressure. Skewness and kurtosis can be used to assess whether a variable is approximately normally distributed; as a common rule of thumb, both skewness and excess kurtosis should fall between -1 and +1. It is important to check whether the variables of interest are normally distributed, because many common statistical analyses assume a normal distribution.
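To make the skewness and kurtosis check concrete, the sketch below computes both from their moment definitions. It assumes the simple population formulas (dividing by n) and reports excess kurtosis, meaning kurtosis minus 3, so that a normal distribution scores 0. Statistical packages often apply small-sample corrections, so their values may differ slightly. The function names and both samples are hypothetical.

```python
import statistics


def skewness(values):
    """Third central moment divided by the cubed population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


def excess_kurtosis(values):
    """Fourth central moment divided by sd**4, minus 3 (0 for a normal curve)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum((x - mean) ** 4 for x in values) / (n * sd ** 4) - 3.0


# Hypothetical samples: one roughly bell-shaped, one with a large outlier.
bell_shaped = [2, 3, 3, 4, 4, 4, 5, 5, 6]
with_outlier = [4, 5, 5, 5, 6, 6, 7, 7, 8, 40]

for name, sample in (("bell_shaped", bell_shaped), ("with_outlier", with_outlier)):
    print(name, round(skewness(sample), 2), round(excess_kurtosis(sample), 2))
```

The symmetric sample has a skewness of 0 and an excess kurtosis inside the -1 to +1 band, whereas the sample with the outlier has a skewness well above +1.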

When a variable is normally distributed, about 68% of observations fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations. Any value that falls more than three standard deviations from the mean can be treated as an unusual value for the data set.
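These percentages can be checked directly from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

Only about 0.27% of a normal distribution lies beyond three standard deviations, which is why such values are flagged as unusual.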

Z-scores are a good example of standardized scores: they show where any given score falls in a normal distribution by expressing it as the number of standard deviations above or below the mean. We can use standardized scores to compare a single score, such as a result on a standardized test, with all other scores.
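A z-score is computed as z = (x - mean) / sd. The sketch below applies this to a hypothetical test with a mean of 500 and a standard deviation of 100 (numbers invented for illustration), and then uses the normal distribution to estimate the share of scores below the given one.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical test: mean 500, standard deviation 100, individual score 650.
z = z_score(650, mean=500, sd=100)

# Assuming the scores are normally distributed, the share of scores below 650.
share_below = NormalDist().cdf(z)

print("z-score:", z)                              # 1.5
print("share below:", round(share_below, 3))      # 0.933
```

A score of 650 is therefore 1.5 standard deviations above the mean and higher than roughly 93% of all scores, under the normality assumption.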

Instead of estimating an unknown population parameter with a single number, or point estimate, one can create an interval, called a confidence interval, as a different way of answering the question, “How well does the sample statistic represent an unknown population parameter?” A confidence interval is interpreted as an interval that will include the true parameter with a given confidence level, typically 90%, 95%, or 99%. As the confidence level goes up, the interval becomes wider, and so the likelihood that it includes the true population parameter increases.
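The effect of the confidence level on the interval can be sketched as follows. This example assumes a normal approximation (a z critical value) for the mean; with a small sample like this one, a t critical value would normally be used and would give a somewhat wider interval. The function name and the `heights` data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean, using a z critical value."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_critical * standard_error
    return mean - margin, mean + margin


# Hypothetical sample of heights in centimetres.
heights = [160, 162, 165, 167, 168, 170, 171, 173, 175, 179]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(heights, confidence=level)
    print(f"{level:.0%} CI: ({low:.1f}, {high:.1f})")
```

All three intervals are centred on the same sample mean; only the width changes, growing as the confidence level rises from 90% to 99%.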

...