Tuesday, 12 December 2017

OTHER MULTIVARIATE DATA ANALYSIS: DISCRIMINANT ANALYSIS, CLUSTER ANALYSIS, OTHERS

Multivariate Data Analysis refers to any statistical technique used to analyze data that arises from more than one variable.

Canonical Correlation/Regression
Canonical correlation is also known as multiple multiple regression or multivariate multiple regression. All other multivariate techniques may be viewed as simplifications or special cases of this “fully multivariate general linear model.”
We have two sets of variables (set X and set Y). We wish to create a linear combination of the X variables (b1X1 + b2X2 + ... + bpXp), called a canonical variate, that is maximally correlated with a linear combination of the Y variables (a1Y1 + a2Y2 + ... + aqYq). The coefficients used to weight the X’s and the Y’s are chosen according to a single criterion: maximizing the correlation between the two linear combinations.
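The largest canonical correlation and its weights can be computed from the sample covariance matrices. Below is a minimal NumPy sketch, assuming the common "whiten both sets, then take the SVD of the cross-covariance" formulation; `inv_sqrt` and `first_canonical_correlation` are hypothetical helper names, and the data are made up for illustration. The returned `b` and `a` play the role of the b and a coefficients above.

```python
import numpy as np


def inv_sqrt(s):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(s)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def first_canonical_correlation(X, Y):
    """Return (rho, b, a): the largest canonical correlation and the
    weights b (X set) and a (Y set) whose linear combinations achieve it."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)    # center each variable
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)  # covariance within the X set
    Syy = Yc.T @ Yc / (n - 1)  # covariance within the Y set
    Sxy = Xc.T @ Yc / (n - 1)  # cross-covariance between the sets
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    b = Wx @ U[:, 0]   # weights b1..bp for the X variables
    a = Wy @ Vt[0, :]  # weights a1..aq for the Y variables
    return s[0], b, a


# Hypothetical data: 3 X variables, 2 Y variables partly driven by X
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.5, -1.0]]) + rng.normal(size=(200, 2))
rho, b, a = first_canonical_correlation(X, Y)
```

By construction, the correlation between `X @ b` and `Y @ a` equals `rho`, and no single X-Y pair of original variables correlates more strongly than `rho`, which is the "maximally correlated" property described above.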

Logistic Regression
Logistic regression is used to predict a categorical (usually dichotomous) variable from a set of predictor variables. With a categorical dependent variable, discriminant function analysis is usually employed if all of the predictors are continuous and nicely distributed, and logit analysis is usually employed if all of the predictors are categorical. Logistic regression is often chosen if the predictor variables are a mix of continuous and categorical variables and/or if they are not nicely distributed, because logistic regression makes no assumptions about the distributions of the predictor variables.
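A minimal sketch of how the coefficients are estimated: maximum likelihood, here solved with plain Newton-Raphson steps (also known as iteratively reweighted least squares). This is an illustration assuming NumPy only; `fit_logistic` and the simulated data are hypothetical, and production libraries add safeguards (convergence checks, handling of complete separation) that this sketch omits. The example deliberately mixes a continuous and a dichotomous predictor, the situation the paragraph above describes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.
    Returns the coefficient vector, intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ w)                       # predicted probabilities
        grad = X1.T @ (y - p)                     # gradient of the log-likelihood
        H = X1.T @ (X1 * (p * (1 - p))[:, None])  # information matrix (negative Hessian)
        w = w + np.linalg.solve(H, grad)          # Newton step
    return w


# Hypothetical data mixing a continuous and a dichotomous predictor
rng = np.random.default_rng(1)
n = 2000
x_cont = rng.normal(size=n)
x_bin = rng.integers(0, 2, size=n)
X = np.column_stack([x_cont, x_bin])
true_w = np.array([-0.5, 1.5, 1.0])
y = (rng.random(n) < sigmoid(true_w[0] + X @ true_w[1:])).astype(float)
w = fit_logistic(X, y)
```

Note that nothing in the fitting step assumes the predictors are normally distributed; only the relationship between the predictors and the log-odds of the outcome is modeled.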

Principal Components and Factor Analysis
Here we start out with one set of variables, which are generally correlated with one another. We wish to reduce the (large) number of variables to a smaller number of components or factors that capture most of the variance in the observed variables. Each factor (or component) is estimated as a linear (weighted) combination of the observed variables. We could extract as many factors as there are variables, but most of them would generally contribute little, so we try to retain a few factors that capture most of the variance. The initial extraction generally includes the restriction that the factors be orthogonal, that is, uncorrelated with one another.
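Principal components extraction can be sketched as an eigen-decomposition of the correlation matrix: each eigenvector gives the weights of one component, and each eigenvalue gives the variance it captures. This is a minimal NumPy illustration of principal components only (not of common factor analysis); `principal_components` and the two-factor simulated data are hypothetical.

```python
import numpy as np


def principal_components(X):
    """PCA on the correlation matrix. Returns eigenvalues (component
    variances), eigenvectors (weights), and component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    vals, vecs = np.linalg.eigh(R)                    # eigh: R is symmetric
    order = np.argsort(vals)[::-1]                    # largest variance first
    vals, vecs = vals[order], vecs[:, order]
    scores = Z @ vecs  # each component is a weighted sum of the variables
    return vals, vecs, scores


# Hypothetical data: 5 observed variables generated from 2 latent factors plus noise
rng = np.random.default_rng(2)
F = rng.normal(size=(500, 2))
L = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1]], dtype=float)
X = F @ L + 0.3 * rng.normal(size=(500, 5))
vals, vecs, scores = principal_components(X)
explained = vals / vals.sum()  # proportion of total variance per component
```

As many components as variables are extracted (five here), but because only two latent factors generate the data, the first two components capture most of the variance. The component scores are mutually uncorrelated, which is the orthogonality restriction mentioned above.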

Discriminant Function Analysis
Discriminant function analysis is used to predict group membership from a set of two or more continuous variables. The analysis creates a set of discriminant functions (weighted combinations of the predictors) that enable us to predict which group a case falls into, based on its scores on the predictor variables (usually continuous, but these may include dichotomous variables and dummy-coded categorical predictors). The maximum number of discriminant functions is the smaller of the number of groups minus one and the number of predictor variables.
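A minimal sketch of Fisher's classical formulation, assuming NumPy: the discriminant functions are eigenvectors of W⁻¹B, where W and B are the within-groups and between-groups sums-of-squares-and-cross-products matrices, and only min(groups − 1, predictors) of them are kept. The classification rule used here (assign a case to the nearest group centroid in discriminant space, equal priors) is one simple choice assumed for illustration; `discriminant_functions`, `classify`, and the data are hypothetical.

```python
import numpy as np


def discriminant_functions(X, groups):
    """Fisher's discriminant functions: leading eigenvectors of W^-1 B."""
    labels = np.unique(groups)
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-groups SSCP matrix
    B = np.zeros((p, p))  # between-groups SSCP matrix
    for g in labels:
        Xg = X[groups == g]
        mg = Xg.mean(axis=0)
        W += (Xg - mg).T @ (Xg - mg)
        d = (mg - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1]
    k = min(len(labels) - 1, p)  # number of discriminant functions
    vals, V = vals.real[order][:k], vecs.real[:, order][:, :k]
    # Scale each function to unit pooled within-group variance
    pooled = W / (n - len(labels))
    V = V / np.sqrt(np.diag(V.T @ pooled @ V))
    return vals, V


def classify(X, groups, V, X_new):
    """Assign each new case to the group whose centroid is nearest in discriminant space."""
    labels = np.unique(groups)
    centroids = np.array([(X[groups == g] @ V).mean(axis=0) for g in labels])
    Z = X_new @ V
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]


# Hypothetical data: 3 groups, 4 continuous predictors
rng = np.random.default_rng(3)
means = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]], dtype=float)
groups = np.repeat(np.arange(3), 100)
X = means[groups] + rng.normal(size=(300, 4))
vals, V = discriminant_functions(X, groups)
pred = classify(X, groups, V, X)
```

With three groups and four predictors, only two discriminant functions exist (3 − 1 = 2 is smaller than 4), so `V` has two columns.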

Multivariate Analysis of Variance (MANOVA)
In MANOVA the Y’s are weighted to maximize the correlation between their linear combination and the X’s. A different linear combination (canonical variate) is formed for each effect (main effect or interaction). In fact, a different linear combination is formed for each treatment df: if an independent variable consists of four groups (three df), three different linear combinations are constructed to represent that effect (assuming there are at least three Y’s), each orthogonal to the others. Standardized discriminant function coefficients (weights for predicting X from the Y’s) and loadings (for each linear combination of Y’s, the correlations between the linear combination and the Y’s themselves) may be used to better define the effects of the factors and their interactions. One may also do a “step-down analysis,” in which the Y’s are entered in an a priori order of importance (or based solely on statistical criteria, as in stepwise multiple regression). At each step one evaluates the contribution of the newly added Y, above and beyond that of the Y’s already entered.
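For a one-way design, the canonical variates come from the eigenvalues λ of E⁻¹H, where E is the error (within-groups) and H the hypothesis (between-groups) SSCP matrix; each canonical correlation is √(λ / (1 + λ)), and Wilks' lambda is det(E) / det(E + H). The NumPy sketch below is a minimal illustration under those standard definitions; `one_way_manova` and the data are hypothetical, and it covers only Wilks' lambda (other test statistics and the step-down analysis are not shown).

```python
import numpy as np


def one_way_manova(Y, groups):
    """One-way MANOVA: returns Wilks' lambda, the eigenvalues of E^-1 H
    (one per canonical variate), and the corresponding canonical correlations."""
    labels = np.unique(groups)
    p = Y.shape[1]
    grand_mean = Y.mean(axis=0)
    E = np.zeros((p, p))  # error (within-groups) SSCP matrix
    H = np.zeros((p, p))  # hypothesis (between-groups) SSCP matrix
    for g in labels:
        Yg = Y[groups == g]
        mg = Yg.mean(axis=0)
        E += (Yg - mg).T @ (Yg - mg)
        d = (mg - grand_mean)[:, None]
        H += len(Yg) * (d @ d.T)
    wilks = np.linalg.det(E) / np.linalg.det(E + H)
    s = min(len(labels) - 1, p)  # one canonical variate per treatment df (at most p)
    eig = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1][:s]
    canon_r = np.sqrt(eig / (1 + eig))
    return wilks, eig, canon_r


# Hypothetical data: an independent variable with four groups (3 df) and five Y's
rng = np.random.default_rng(4)
groups = np.repeat(np.arange(4), 30)
shift = np.array([[0, 0, 0, 0, 0],
                  [1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 0, 1, 0, 0]], dtype=float)
Y = shift[groups] + rng.normal(size=(120, 5))
wilks, eig, canon_r = one_way_manova(Y, groups)
```

Four groups give three treatment df, so three canonical variates (and three canonical correlations) are returned, matching the example in the paragraph above.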

Cluster Analysis

In a cluster analysis the goal is to group cases (research units) into clusters that share similar characteristics. Contrast this goal with that of principal components and factor analysis, where one groups variables into components or factors based on their having similar relationships with latent variables.
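The text does not name a particular clustering algorithm; k-means is used here purely as one common example (hierarchical clustering is another). A minimal NumPy sketch of Lloyd's k-means algorithm, with hypothetical function name, parameters, and data: cases are repeatedly assigned to the nearest cluster center, and each center is moved to the mean of its assigned cases.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns a cluster label per case and the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # start from k random cases
    for _ in range(n_iter):
        # Squared Euclidean distance from every case to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its cases (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups of cases measured on 2 variables
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(10, 1, size=(50, 2))])
labels, centers = kmeans(X, k=2)
```

Unlike principal components, which combine the columns (variables) of `X`, this groups the rows (cases).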
...
