Descriptive statistics, such as measures of central tendency and variability,
help us understand the typical case in a sample and the distribution of a
variable more clearly. Measures of central tendency include the mode, the
median, and the mean, and they give us an idea of the typical or average value
in the data set. The mode is the most frequent value and is the only one of the
three that can be used with categorical (nominal) data, as it simply counts
frequencies. The median should be reported when unusual data values (outliers)
are present in the data set, because it is not pulled toward extreme values.
Otherwise, the mean should be reported as it possesses statistically preferable
characteristics.
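
As a rough illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The `scores` and `colors` lists are made-up values chosen only to show how one unusual value moves the mean but not the median.

```python
import statistics

# Hypothetical sample with one unusually large value (95).
scores = [12, 15, 15, 16, 18, 19, 95]

mean_value = statistics.mean(scores)      # pulled upward by the unusual value
median_value = statistics.median(scores)  # middle value, resistant to it
mode_value = statistics.mode(scores)      # most frequent value

# The mode also works for categorical data, where mean and median are undefined.
colors = ["red", "blue", "red", "green"]
mode_color = statistics.mode(colors)

print(f"mean={mean_value:.2f} median={median_value} mode={mode_value}")
print(f"most common color: {mode_color}")
```

In this made-up sample the mean lands well above every value except 95, while the median stays with the bulk of the data, which is why the median is preferred when unusual values are present.
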
Measures of variability include the range, the interquartile range, the
variance, and the standard deviation, and they give us an idea of how widely the
data are spread and therefore how well a measure of central tendency represents
them. The range should be used only as a crude measure of variability as it is
extremely sensitive to the presence of unusual data values. The interquartile
range should be reported when unusual or outlying data values are present in the
data set. Otherwise, the standard deviation should be reported as it possesses
statistically preferable characteristics.
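
The four measures of variability can be sketched the same way. The `data` list below is hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other software may use a slightly different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical sample with one unusually large value (95).
data = [12, 15, 15, 16, 18, 19, 95]

# Range: depends only on the two most extreme values.
data_range = max(data) - min(data)

# Interquartile range: spread of the middle half of the data.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance and standard deviation (n - 1 in the denominator).
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(f"range={data_range} IQR={iqr} variance={sample_variance:.2f} SD={sample_sd:.2f}")
```

With these made-up numbers the range and the standard deviation are both inflated by the single large value, while the interquartile range is not.
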
A normal distribution is a very important probability distribution, which can
approximate the distribution of many human characteristics, such as height,
weight, and blood pressure. Skewness and kurtosis can be used to assess whether
a variable is approximately normally distributed; as a common rule of thumb,
both skewness and excess kurtosis should fall between -1 and +1. It is important
to check whether variables of interest are normally distributed, as many
statistical analyses assume a normal distribution.
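
To make the skewness and kurtosis check concrete, the sketch below computes both from their moment definitions. The helper names `skewness` and `excess_kurtosis` are hypothetical, the formulas are the simple population (moment-based) versions, and statistical packages often apply small-sample corrections that give slightly different numbers.

```python
import statistics


def skewness(values):
    """Third standardized moment (population version)."""
    center = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - center) / sd) ** 3 for x in values) / len(values)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    center = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - center) / sd) ** 4 for x in values) / len(values) - 3.0


# Hypothetical data: one symmetric sample, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [12, 15, 15, 16, 18, 19, 95]

for name, sample in (("symmetric", symmetric), ("right_skewed", right_skewed)):
    s = skewness(sample)
    k = excess_kurtosis(sample)
    print(f"{name}: skewness={s:.2f}, excess kurtosis={k:.2f}")
```

A skewness outside the -1 to +1 band, as for the right-skewed sample here, would suggest the variable departs noticeably from a normal shape.
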
When a variable is normally distributed, about 68% of observations fall within
one standard deviation of the mean, about 95% fall within two standard
deviations of the mean, and about 99.7% fall within three standard deviations of
the mean. Any value that falls outside the three-standard-deviation range can be
treated as an unusual value for the data set.
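
These percentages can be checked against the standard normal distribution with `statistics.NormalDist`. The helpers `share_within` and `is_unusual`, and the mean of 100 and standard deviation of 15 used in the example, are assumptions made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def is_unusual(value, mean, sd, k=3):
    """Flag a value lying more than k standard deviations from the mean."""
    return abs(value - mean) > k * sd


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# Hypothetical variable with mean 100 and standard deviation 15.
print(is_unusual(150, mean=100, sd=15))  # 150 is more than 3 SD above the mean
print(is_unusual(120, mean=100, sd=15))  # 120 is within 3 SD
```

The loop prints values close to 0.6827, 0.9545, and 0.9973, which round to the familiar 68%, 95%, and 99.7%.
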
Z-scores are a good example of how we can compute standardized scores to
determine where any given score falls in a normal distribution. A z-score
expresses a score as the number of standard deviations it lies above or below
the mean. We can use standardized scores to compare a single score, such as a
score on a standardized test, with all other scores.
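
A small sketch of that comparison follows. The two tests, their means and standard deviations, and the helper `z_score` are all hypothetical; the percentile step additionally assumes the scores are normally distributed.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical tests on different scales.
z_test_a = z_score(650, mean=500, sd=100)  # (650 - 500) / 100
z_test_b = z_score(26, mean=21, sd=5)      # (26 - 21) / 5

# Under a normal distribution, the CDF gives the share of scores at or below z.
share_below_a = NormalDist().cdf(z_test_a)

print(f"test A: z={z_test_a:.2f}, share of scores below: {share_below_a:.4f}")
print(f"test B: z={z_test_b:.2f}")
```

Although the raw scores are on different scales, the z-scores are directly comparable: the score on test A sits 1.5 standard deviations above its mean, while the score on test B sits 1.0 above its mean.
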
Instead of estimating an unknown population parameter with a single number, or
point estimate, one can create an interval, called a confidence interval, as a
different way of answering the question, “How well does the sample statistic
represent an unknown population parameter?” A confidence interval is interpreted
as an interval that will include the true parameter with a given confidence
level, commonly 90%, 95%, or 99%. As the confidence level goes up, the interval
becomes wider, so intervals constructed this way capture the true population
parameter more often.
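
The effect of the confidence level on the interval can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the helper `mean_confidence_interval` is hypothetical, and the interval uses a normal (z) critical value, which is reasonable for a large sample; for small samples a t critical value is usually used instead.

```python
import math
import random
import statistics
from statistics import NormalDist

# Simulated sample of 200 values (assumed population mean 100, SD 15).
rng = random.Random(42)
sample = [rng.gauss(100, 15) for _ in range(200)]


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    center = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_critical * standard_error
    return center - margin, center + margin


for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(sample, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

Each interval is centered on the same sample mean; only the margin changes, growing as the confidence level rises from 90% to 99%.
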