Exploratory Factor Analysis
Lecture 5, Survey Research & Design in Psychology
James Neill, 2010
Overview
What is factor analysis?
Assumptions
Steps / Process
Examples
Summary
What is factor analysis?
Purpose
History
Types
Models
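Universe : Galaxy :: All variables : Factor (as a galaxy is a cluster within the universe, a factor is a cluster within the full set of variables)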
Conceptual model of factor analysis
FA uses correlations among many items to search for common clusters.
Factor analysis...
is used to identify clusters of inter-correlated variables (called 'factors').
is a family of multivariate statistical techniques for examining correlations amongst variables.
empirically tests theoretical data structures.
is commonly used in psychometric instrument development.
Purposes
There are two main applications of factor analytic techniques:
Data reduction : Reduce the number of variables to a smaller number of factors.
Theory development : Detect structure in the relationships between variables, that is, to classify variables.
Purposes: Data reduction
Simplifies data structure by revealing a smaller number of underlying factors
Helps to identify items for elimination or improvement:
redundant variables
unclear variables
irrelevant variables
Leads to calculating factor scores
Purposes: Theory development
Investigates the underlying correlational pattern shared by the variables in order to test theoretical models, e.g., how many personality factors are there?
The goal is to address a theoretical question as opposed to calculating factor scores.
History of factor analysis
Invented by Charles Spearman (1904)
Usage hampered by onerousness of hand calculation
Since the advent of computers, usage has thrived, esp. to develop:
Theory, e.g., determining the structure of personality
Practice, e.g., development of tens of thousands of psychological screening & measurement tests
Two main types of FA: Exploratory vs. confirmatory factor analysis
EFA = Exploratory Factor Analysis
explores & summarises underlying correlational structure for a data set
CFA = Confirmatory Factor Analysis
tests the correlational structure of a data set against a hypothesised structure and rates the “goodness of fit”
This (introductory) lecture focuses on Exploratory Factor Analysis (recommended for undergraduate level). Note, however, that Confirmatory Factor Analysis (and Structural Equation Modeling) is generally preferred, but is more advanced and recommended for graduate level.
Conceptual model - Simple model
e.g., 12 test items might actually tap only 3 underlying factors
Factors consist of relatively homogeneous variables.
[Figure: items grouped under Factor 1, Factor 2 and Factor 3]
Eysenck’s 3 personality factors
e.g., 12 items testing three underlying dimensions of personality:
Extraversion/introversion: talkative, shy, sociable, fun
Neuroticism: anxious, gloomy, relaxed, tense
Psychoticism: unconventional, nurturing, harsh, loner
Conceptual model - Simple model
[Figure: Questions 1 to 5, each linked to one of Factors 1 to 3]
Each question loads onto one factor
Conceptual model - Complex model
[Figure: Questions 1 to 5 linked to Factors 1 to 3, with some questions linked to more than one factor]
Questions may load onto more than one factor
Conceptual model – Area plot
[Figure: area plot of X1 and X2, labelled "Correlation between X1 and X2" and "A theoretical factor which is partly measured by the common aspects of X1 and X2"]
How many factors?
[Figure: the same items modelled as one factor, three factors, or nine factors (independent items)]
Example: Personality
Does personality consist of 2, 3, 5, 16, etc. factors? e.g., the “Big 5”:
Neuroticism
Extraversion
Agreeableness
Openness
Conscientiousness
Example: Intelligence
Does intelligence consist of separate factors, e.g.,
Verbal
Mathematical
Interpersonal, etc.?
...or is it one global factor (g)? ...or is it hierarchically structured?
Example: Essential facial features (Ivancevic et al., 2003)
Six orthogonal factors, representing 76.5% of the total variability in facial recognition (in order of importance):
upper-lip
eyebrow-position
nose-width
eye-position
eye/eyebrow-length
face-width
Assumptions
GIGO
Sample size
Levels of measurement
Normality
Linearity
Outliers
Factorability
Garbage. In. Garbage. Out.
Screen the data
Use variables that theoretically “go together”
Assumption testing: Sample size
Some guidelines:
Min.: N > 5 cases per variable
e.g., 20 variables, should have > 100 cases (1:5)
Ideal: N > 20 cases per variable
e.g., 20 variables, ideally have > 400 cases (1:20)
Total N > 200 preferable
Assumption testing: Sample size
Comrey and Lee (1992): 50 = very poor, 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000+ = excellent
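The sample-size guidelines above reduce to simple arithmetic. The Python sketch below is a minimal illustration, assuming the 1:5 and 1:20 cases-per-variable ratios and the Comrey and Lee (1992) benchmarks quoted above; `sample_size_targets` and `comrey_lee_rating` are hypothetical helper names, not functions from any statistics package.

```python
def sample_size_targets(n_variables):
    """Hypothetical helper: (minimum, ideal) N from the 1:5 and 1:20 cases-per-variable rules."""
    return 5 * n_variables, 20 * n_variables


def comrey_lee_rating(n):
    """Hypothetical helper: label a total sample size with Comrey and Lee's (1992) benchmarks."""
    # Benchmarks are checked from largest to smallest; the lecture gives no label below 50.
    benchmarks = [(1000, "excellent"), (500, "very good"), (300, "good"),
                  (200, "fair"), (100, "poor"), (50, "very poor")]
    for cutoff, label in benchmarks:
        if n >= cutoff:
            return label
    return "unrated (below 50)"


minimum, ideal = sample_size_targets(20)
print(minimum, ideal)            # 100 400
print(comrey_lee_rating(447))    # good
```

Treat the output as a screening aid only; the ratios are rules of thumb, not hard requirements.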
Assumption testing: Level of measurement
All variables must be suitable for correlational analysis, i.e., they should be ratio/metric data or at least Likert-type data with several response levels.
Assumption testing: Normality
FA is robust to violation of assumptions of normality
If the variables are normally distributed then the solution is enhanced
Assumption Testing: Linearity
Because FA is based on correlations between variables, it is important to check there are linear relations amongst the variables (i.e., check scatterplots)
Example factor analysis: Classroom behaviour
15 classroom behaviours of high-school children were rated by teachers using a 5-point scale.
Task: Identify groups of variables (behaviours) that are strongly inter-related & represent underlying factors.
Classroom behaviour items
Cannot concentrate ↔ can concentrate
Curious & enquiring ↔ little curiosity
Perseveres ↔ lacks perseverance
Irritable ↔ even-tempered
Easily excited ↔ not easily excited
Patient ↔ demanding
Easily upset ↔ contented
Control ↔ no control
Relates warmly to others ↔ disruptive
Persistent ↔ frustrated
Difficult ↔ easy
Restless ↔ relaxed
Lively ↔ settled
Purposeful ↔ aimless
Cooperative ↔ disputes
Assumption testing: Factorability
Check the factorability of the correlation matrix (i.e., how suitable are the data for factor analysis?) using one or more of the following methods:
Correlation matrix correlations > .3?
Anti-image matrix diagonals > .5?
Measures of sampling adequacy (MSAs)?
Bartlett’s sig.?
KMO > .5 or .6?
Assumption testing: Factorability (Correlations)
Are there SOME correlations over .3? If so, proceed with FA.
Checking by eye takes some effort with a large number of variables, but is accurate.
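Scanning a large correlation matrix for values over .3 by eye is tedious; a few lines of Python can list them. A minimal sketch, assuming the correlation matrix is already available as a nested list (the 4-item matrix below is hypothetical) and judging negative correlations by their absolute size; `strong_correlations` is a hypothetical helper name.

```python
def strong_correlations(r, threshold=0.3):
    """List off-diagonal pairs (i, j, r_ij) whose absolute correlation exceeds threshold."""
    p = len(r)
    return [(i, j, r[i][j])
            for i in range(p) for j in range(i + 1, p)
            if abs(r[i][j]) > threshold]


# Hypothetical 4-item correlation matrix (illustrative values, not real data).
R = [
    [1.00, 0.45, 0.38, 0.10],
    [0.45, 1.00, 0.41, 0.05],
    [0.38, 0.41, 1.00, 0.12],
    [0.10, 0.05, 0.12, 1.00],
]
pairs = strong_correlations(R)
print(len(pairs), "pairs above .3")   # 3 pairs above .3
# Items (0-based indices) with no correlation above .3 are poor candidates for any factor.
weak = [k for k in range(len(R)) if all(k not in (i, j) for i, j, _ in pairs)]
print("isolated items:", weak)        # isolated items: [3]
```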
Examine the diagonals on the anti-image correlation matrix (these are the MSAs)
Consider variables with diagonal values less than .5 for exclusion from the analysis – they lack sufficient correlation with other variables
Principal components (PC)
Used to reduce data to a set of factor scores for use in other analyses
Analyses all the variance in each variable
Principal axis factoring (PAF)
Used to uncover the structure of an underlying set of p original variables
More theoretical
Analyses only shared variance (i.e., leaves out unique variance)
[Figure: total variance of a variable, as analysed by Principal Components (PC) vs. Principal Axis Factoring (PAF)]
PC vs. PAF
Often there is little difference in the solutions for the two procedures.
If unsure, check your data using both techniques.
If you get different solutions from the two methods, try to work out why and decide which solution is more appropriate.
Communalities
Each variable has a communality =
the proportion of its variance explained by the extracted factors
sum of the squared loadings for the variable on each of the factors
Ranges between 0 and 1
If the communality for a variable is low (e.g., < .5), consider extracting more factors or removing the variable
High communalities (> .5): Extracted factors explain most of the variance in the variables being analysed
Low communalities (< .5): A variable has considerable variance unexplained by the extracted factors
May then need to extract MORE factors to explain the variance
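The communality definition above (sum of a variable's squared loadings across the extracted factors) can be checked directly. A minimal sketch with a hypothetical 3-item, 2-factor loading matrix; this identity assumes an orthogonal solution, and `communalities` is a hypothetical helper, not a package function.

```python
def communalities(loadings):
    """Communality h2 of each variable = sum of its squared loadings on the extracted factors.

    Assumes an orthogonal (uncorrelated-factor) solution, where this identity holds.
    """
    return [sum(l * l for l in row) for row in loadings]


# Hypothetical loadings: 3 variables (rows) on 2 extracted factors (columns).
L = [
    [0.80, 0.20],
    [0.70, 0.30],
    [0.30, 0.40],
]
for i, h2 in enumerate(communalities(L)):
    flag = "  <- low: consider more factors or dropping the item" if h2 < 0.5 else ""
    print(f"item {i + 1}: h2 = {h2:.2f}{flag}")
```

Item 3 (h2 = .25) is the one flagged: most of its variance is left unexplained by the two factors.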
Explained variance
A good factor solution is one that explains the most variance with the fewest factors
Realistically, 50-75% of the variance explained is satisfactory
Explained variance
3 factors explain 73.5% of the variance in the items
Eigenvalues
Each factor has an eigenvalue (EV)
Indicates the overall strength of the relationship between a factor and the variables
Equals the sum of the squared loadings (item-factor correlations) on that factor
Successive EVs have lower values
Rule of thumb: factors with eigenvalues over 1 are ‘stable’ (Kaiser's criterion)
Explained variance
The eigenvalues ranged between .16 and 9.35. Two factors satisfied Kaiser's criterion (EVs > 1), but the third EV is .93 and appears to be a useful factor.
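Because each standardised variable contributes a variance of 1, the total variance in a correlation matrix equals the number of variables, so a factor's percentage of explained variance is its eigenvalue divided by that number. A minimal sketch, assuming the full set of eigenvalues from a correlation matrix; the six eigenvalues are hypothetical (not the classroom data), and the helper names are illustrative.

```python
def variance_table(eigenvalues):
    """Rows of (factor, EV, % variance, cumulative %).

    Assumes all eigenvalues of a correlation matrix are supplied, so the total
    variance equals the number of variables (each standardised variable = 1).
    """
    p = len(eigenvalues)
    rows, cumulative = [], 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        pct = 100 * ev / p
        cumulative += pct
        rows.append((k, ev, pct, cumulative))
    return rows


def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue > 1 (Kaiser's criterion)."""
    return sum(ev > 1 for ev in eigenvalues)


# Hypothetical eigenvalues for 6 items (they sum to 6, the number of items).
evs = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]
for k, ev, pct, cum in variance_table(evs):
    print(f"factor {k}: EV = {ev:.2f}, {pct:5.1f}% (cumulative {cum:5.1f}%)")
print("Kaiser's criterion retains", kaiser_count(evs), "factors")   # 2
```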
Scree plot
A line graph of eigenvalues
Depicts amount of variance explained by each factor
Cut-off: Look for where additional factors fail to add appreciably to the cumulative explained variance
1st factor explains the most variance
Last factor explains the least amount of variance
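A scree plot is normally drawn by the statistics package, but the idea can be sketched in text: list the eigenvalues in order and see where successive drops level off. A minimal illustration with hypothetical eigenvalues; `text_scree` and `successive_drops` are hypothetical helpers, and reading the "elbow" remains a judgement call.

```python
def text_scree(eigenvalues, width=40):
    """Crude text scree plot: one bar per factor, scaled to the largest eigenvalue."""
    top = max(eigenvalues)
    lines = []
    for k, ev in enumerate(eigenvalues, start=1):
        bar = "#" * round(width * ev / top)
        lines.append(f"{k:>2} {ev:5.2f} {bar}")
    return lines


def successive_drops(eigenvalues):
    """Drop in eigenvalue from each factor to the next; the elbow is where drops level off."""
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


evs = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]   # hypothetical eigenvalues
print("\n".join(text_scree(evs)))
print([round(d, 2) for d in successive_drops(evs)])   # [1.5, 0.7, 0.2, 0.2, 0.1]
```

Here the drops flatten after the third eigenvalue, which would point to two or perhaps three factors; theory and interpretability still decide.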
How many factors?
A subjective process... Seek to explain maximum variance using the fewest factors, considering:
Theory – what is predicted/expected?
Eigenvalues > 1? (Kaiser’s criterion)
Scree plot – where does it drop off?
Interpretability of the last factor?
Try several different solutions (consider FA type, rotation, # of factors)
Factors must be able to be meaningfully interpreted & make theoretical sense
How many factors?
Aim for 50-75% of variance explained with 1/4 to 1/3 as many factors as variables/items.
Stop extracting factors when they no longer represent useful/meaningful clusters of variables
Keep checking/clarifying the meaning of each factor – make sure you are reading the full wording of each item.
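The two numerical rules of thumb above (50-75% of variance explained; roughly 1/4 to 1/3 as many factors as items) can be computed side by side. A minimal sketch, assuming correlation-matrix eigenvalues; the 12 eigenvalues are hypothetical and `ratio_range` and `factors_for_target` are illustrative names.

```python
import math


def ratio_range(n_items):
    """Rule-of-thumb range: about 1/4 to 1/3 as many factors as items."""
    return max(1, math.ceil(n_items / 4)), max(1, math.floor(n_items / 3))


def factors_for_target(eigenvalues, target_pct=50.0):
    """Smallest number of factors whose cumulative explained variance reaches target_pct.

    Assumes correlation-matrix eigenvalues, so total variance = number of variables.
    """
    p = len(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cumulative += 100 * ev / p
        if cumulative >= target_pct:
            return k
    return p


# Hypothetical eigenvalues for 12 items (they sum to 12).
evs = [4.1, 2.3, 1.6, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1]
print(ratio_range(12))                 # (3, 4)
print(factors_for_target(evs))         # 2
print(factors_for_target(evs, 75.0))   # 5
```

In this illustration the rules disagree (2 factors reach 50%, the ratio suggests 3 or 4), which is why the final choice rests on interpretability and theory.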
Initial solution: Unrotated factor structure
Factor loadings (FLs) indicate the relative importance of each item to each factor.
In the initial solution, each factor tries “selfishly” to grab maximum unexplained variance.
All variables will tend to load strongly on the 1st factor
Initial solution - Unrotated factor structure
Factors are weighted combinations of variables
A factor matrix shows variables in rows and factors in columns
1st factor extracted:
Best possible line of best fit through the original variables
Seeks to explain lion's share of all variance
A single factor, best summary of the variance in the whole set of items
Initial solution - Unrotated factor structure
Each subsequent factor tries to explain the maximum possible amount of remaining unexplained variance.
Second factor is orthogonal to first factor - seeks to maximise its own eigen value (i.e., tries to gobble up as much of the remaining unexplained variance as possible)
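To make "each factor grabs as much of the remaining variance as it can" concrete, here is a minimal principal-components sketch in plain Python: power iteration extracts the largest-eigenvalue factor, the variance it explains is subtracted (deflation), and the next factor is extracted from what remains. This is a teaching illustration on a hypothetical correlation matrix, not how SPSS or other packages compute their solutions; PAF would additionally replace the diagonal 1s with communality estimates.

```python
import math


def power_iteration(a, iters=500):
    """Largest eigenvalue and unit-length eigenvector of a symmetric PSD matrix."""
    n = len(a)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, av)), v   # Rayleigh quotient = eigenvalue


def deflate(a, eigenvalue, v):
    """Remove the variance a component explains: A - eigenvalue * v v^T."""
    n = len(a)
    return [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]


# Hypothetical correlation matrix: items 1-2 and items 3-4 form two clusters.
R = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
ev1, v1 = power_iteration(R)
ev2, v2 = power_iteration(deflate(R, ev1, v1))   # extracted from the remaining variance
load1 = [math.sqrt(ev1) * x for x in v1]   # PC loadings = eigenvector * sqrt(eigenvalue)
load2 = [math.sqrt(ev2) * x for x in v2]
print("factor 1: EV", round(ev1, 3), "loadings", [round(x, 2) for x in load1])
print("factor 2: EV", round(ev2, 3), "loadings", [round(x, 2) for x in load2])
print("v1 . v2 =", round(sum(x * y for x, y in zip(v1, v2)), 6))   # ~0: orthogonal
```

All four items load strongly on the first factor, while the second factor is bipolar (one cluster positive, the other negative), which is typical of an unrotated solution and motivates rotation.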
Initial solution - Unrotated factor structure
Vectors (Lines of best fit)
Initial solution: Unrotated factor structure
Seldom see a simple unrotated factor structure
Many variables load on 2 or more factors
Some variables may not load highly on any factors (check: low communality)
Until the FLs are rotated, they are difficult to interpret.
Rotation of the FL matrix helps to find a more interpretable factor structure.
Two basic types of factor rotation: Orthogonal (Varimax) and Oblique (Oblimin)
Orthogonal rotation minimises factor covariation, producing factors which are uncorrelated
Oblique (oblimin) rotation allows factors to covary, i.e., allows correlations between factors
Orthogonal rotation
Why rotate a factor loading matrix?
After rotation, the vectors (lines of best fit) are rearranged to optimally go through clusters of shared variance
Then the FLs and the factor they represent can be more readily interpreted
Why rotate a factor loading matrix?
A rotated factor structure is simpler & more easily interpretable
each variable loads strongly on only one factor
each factor shows at least 3 strong loadings
all loadings are either strong or weak; there are no intermediate loadings
Orthogonal vs. oblique rotations
Consider purpose of factor analysis
If in doubt, try both
Consider interpretability
Look at correlations between factors in oblique solution
if > .3, then go with oblique rotation (a factor correlation of .32 corresponds to about 10% shared variance between factors, since .32² ≈ .10)
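The rotation decision above turns on the size of the factor correlations from an oblique run. A minimal sketch, assuming the factor correlations have already been obtained (e.g., from an oblimin solution in your statistics package); the values and the helper names `shared_variance` and `suggest_rotation` are hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two factors share: the squared factor correlation."""
    return r * r


def suggest_rotation(factor_correlations, threshold=0.3):
    """Suggest oblique rotation if any factor correlation exceeds the threshold in size."""
    if any(abs(r) > threshold for r in factor_correlations):
        return "oblique"
    return "orthogonal"


phi = [0.12, 0.41, -0.08]   # hypothetical factor correlations from an oblimin solution
print(suggest_rotation(phi))               # oblique
print(f"{shared_variance(0.32):.1%}")      # 10.2%
```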
Interpretability
It is dangerous to be driven by factor loadings alone; think carefully and be guided by theory and common sense in selecting a factor structure.
You must be able to understand and interpret a factor if you’re going to extract it.
Interpretability
However, watch out for ‘seeing what you want to see’ when evidence might suggest a different, better solution.
There may be more than one good solution! e.g., in personality
2 factor model
5 factor model
16 factor model
Factor loadings & item selection
A factor structure is most interpretable when:
1. Each variable loads strongly (> .40 in absolute value) on only one factor
2. Each factor shows 3 or more strong loadings; more loadings = greater reliability
3. Most loadings are either high or low, with few intermediate values
These elements give a ‘simple’ factor structure.
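Criteria 1 and 2 can be checked mechanically on a rotated loading matrix. A minimal sketch, assuming a hypothetical 7-item, 2-factor rotated solution and the > .40 threshold; `simple_structure_report` is a hypothetical helper, and it does not assess criterion 3 or the meaning of the items.

```python
def simple_structure_report(loadings, strong=0.40):
    """Strong loadings per factor, plus items loading strongly on zero or several factors.

    loadings: rows = items, columns = factors (rotated solution assumed).
    """
    n_factors = len(loadings[0])
    per_factor = [sum(abs(row[f]) > strong for row in loadings) for f in range(n_factors)]
    problem_items = []
    for i, row in enumerate(loadings):
        n_strong = sum(abs(x) > strong for x in row)
        if n_strong != 1:
            problem_items.append((i, n_strong))   # (0-based item index, no. of strong loadings)
    return per_factor, problem_items


L = [  # hypothetical rotated loadings: 7 items x 2 factors
    [0.72, 0.10],
    [0.65, 0.05],
    [0.58, 0.22],
    [0.12, 0.70],
    [0.08, 0.66],
    [0.15, 0.61],
    [0.45, 0.48],   # cross-loading item
]
per_factor, problems = simple_structure_report(L)
print(per_factor)   # [4, 4]
print(problems)     # [(6, 2)]
```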
How many items per factor?
More items -> greater reliability
More items -> greater 'roundedness'
-> Law of diminishing returns
Typically, 4 to 10 items per factor is reasonable
How do I eliminate items?
A subjective process; consider:
Size of main loading (min = .4)
Size of cross loadings (max = .3?)
Meaning of item (face validity)
Contribution it makes to the factor
Number of items already in the factor
Eliminate 1 variable at a time, then re-run, before deciding which (if any) items to eliminate next
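The numeric parts of this checklist (main loading of at least .4, cross-loadings of at most .3) can flag candidate items, leaving face validity and the other judgements to you. A minimal sketch; `weakest_item`, its tie-breaking rule (smallest gap between main and cross loading) and the loading matrix are hypothetical. In keeping with the one-at-a-time advice, it nominates a single item; you would then re-run the analysis before checking again.

```python
def weakest_item(loadings, min_main=0.40, max_cross=0.30):
    """Return the 0-based index of the single most problematic item, or None if all pass.

    An item is flagged if its main (largest absolute) loading is below min_main or its
    largest cross-loading exceeds max_cross. Among flagged items, the one with the
    smallest gap between main and cross loading is nominated (hypothetical rule).
    """
    flagged = []
    for i, row in enumerate(loadings):
        ranked = sorted((abs(x) for x in row), reverse=True)
        main = ranked[0]
        cross = ranked[1] if len(ranked) > 1 else 0.0
        if main < min_main or cross > max_cross:
            flagged.append((main - cross, i))
    return min(flagged)[1] if flagged else None


L = [  # hypothetical loadings: 5 items x 2 factors
    [0.72, 0.10],
    [0.65, 0.05],
    [0.35, 0.20],   # weak main loading
    [0.12, 0.70],
    [0.45, 0.48],   # strong cross-loading
]
print(weakest_item(L))   # 4
```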
Factor loadings & item selection
Comrey & Lee (1992): loadings > .70 = excellent, > .63 = very good, > .55 = good, > .45 = fair, > .32 = poor
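The Comrey and Lee benchmarks amount to a lookup on the absolute size of a loading. A minimal sketch; `comrey_lee_label` is a hypothetical helper, and the three loadings come from the condom-use example in this lecture.

```python
def comrey_lee_label(loading):
    """Describe a loading's size using Comrey and Lee's (1992) benchmarks (absolute value)."""
    size = abs(loading)
    for cutoff, label in [(0.70, "excellent"), (0.63, "very good"), (0.55, "good"),
                          (0.45, "fair"), (0.32, "poor")]:
        if size > cutoff:
            return label
    return "below .32 (unrated)"


for fl in (0.86, 0.65, 0.56):
    print(fl, comrey_lee_label(fl))
# 0.86 excellent
# 0.65 very good
# 0.56 good
```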
Factor loadings & item selection
Cut-off for acceptable loadings:
Look for a gap in the loadings (e.g., .8, .7, .6, .3, .2)
Choose a cut-off such that factors can be interpreted above it, but not below it
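The gap heuristic can be automated by sorting the absolute loadings and finding the largest step between neighbours. A minimal sketch using the example loadings above; `gap_cutoff` is a hypothetical helper, and the cut-off it suggests still has to leave the factors interpretable.

```python
def gap_cutoff(loadings):
    """Place a cut-off in the largest gap between sorted absolute loadings.

    Returns (lower, upper): loadings at or above `upper` are kept,
    loadings at or below `lower` are not interpreted.
    """
    values = sorted((abs(x) for x in loadings), reverse=True)
    gaps = [(values[k] - values[k + 1], k) for k in range(len(values) - 1)]
    _, k = max(gaps)
    return values[k + 1], values[k]


print(gap_cutoff([0.8, 0.7, 0.6, 0.3, 0.2]))   # (0.3, 0.6)
```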
Other considerations: Normality of items
Check the item descriptives, e.g., if two items have similar factor loadings and reliability analysis results, consider selecting the item with the least skew and kurtosis. The more normally distributed the item scores, the better the distribution of the composite scores.
Factor analysis in practice
To find a good solution, consider each combination of:
PC-varimax
PC-oblimin
PAF-varimax
PAF-oblimin
Apply the above methods to a range of possible/likely factors, e.g., for 2, 3, 4, 5, 6, and 7 factors
Eliminate poor items one at a time, retesting the possible solutions
Check factor structure across sub-groups (e.g., gender) if there is sufficient data
You will probably come up with a different solution from someone else!
Check/consider reliability analysis
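Laying out the candidate solutions before running them helps keep the comparison systematic. A minimal sketch that only enumerates the combinations described above (two extraction methods, two rotations, 2 to 7 factors); fitting each solution is left to your statistics package, and the constant names are illustrative.

```python
from itertools import product

EXTRACTION = ["PC", "PAF"]
ROTATION = ["varimax", "oblimin"]
N_FACTORS = range(2, 8)   # 2 to 7 factors

# One label per candidate solution to fit and compare.
runs = [f"{m}-{r}, {k} factors" for m, r, k in product(EXTRACTION, ROTATION, N_FACTORS)]
print(len(runs))   # 24
print(runs[0])     # PC-varimax, 2 factors
```

Each item eliminated afterwards means revisiting this grid, so the number of runs grows quickly; narrowing the factor range using theory keeps it manageable.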
Example: Condom use
The Condom Use Self-Efficacy Scale (CUSES) was administered to 447 multicultural college students.
A PC factor analysis with varimax rotation was conducted.
Three distinct factors were extracted:
Appropriation
Sexually Transmitted Diseases
Partners' Disapproval
Factor loadings & item selection
Factor 1: Appropriation (FL = factor loading)
.56 I feel confident I could gracefully remove and dispose of a condom after sexual intercourse
.61 I feel confident I could remember to carry a condom with me should I need one
.65 I feel confident I could purchase condoms without feeling embarrassed
.75 I feel confident in my ability to put a condom on myself or my partner
Factor loadings & item selection
Factor 2: STDs (FL = factor loading)
.80 I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I thought they had a sexually transmitted disease
.86 I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I have a sexually transmitted disease
.72 I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I've had a past homosexual experience
Factor loadings & item selection
Factor 3: Partner's reaction (FL = factor loading)
.58 If my partner and I were to try to use a condom and did not succeed, I would feel embarrassed to try to use one again (e.g., not being able to unroll condom, putting it on backwards or awkwardness)
.65 If I were unsure of my partner's feelings about using condoms I would not suggest using one
.73 If I were to suggest using a condom to a partner, I would feel afraid that he or she would reject me
Summary
Introduction
Assumptions
Steps/Process
Introduction: Summary
Factor analysis is a family of multivariate correlational data analysis methods for summarising clusters of covariance.
FA summarises correlations amongst items.
The common clusters (called factors) are summary indicators of underlying fuzzy constructs.
Assumptions: Summary
Sample size
5+ cases per variable (ideally 20+ cases per variable)
N > 200
Bivariate & multivariate outliers
Factorability of correlation matrix (Measures of Sampling Adequacy)
Steps/Process: Summary
Select items (check factor loadings to identify which items belong in which factor; drop items one by one; repeat)
Name and define factors
Examine correlations amongst factors
Analyse internal reliability
Compute composite scores
Next lecture
Summary: Types of FA
PC: Data reduction
uses all variance
PAF: Theoretical data exploration
uses shared variance
Try both ways
Are solutions different? Why?
Summary: Rotation
Orthogonal (varimax)
perpendicular vectors
Oblique (oblimin)
angled vectors
Try both ways
Are solutions different? Why?
Summary: Factor extraction
No. of factors to extract?
Inspect EVs
look for > 1 or a sudden drop (inspect scree plot)
% of variance explained
aim for 50 to 75%
Interpretability / theory
References
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272-299.
Ivancevic, V., Kaine, A. K., McLindin, B. A., & Sunde, J. (2003). Factor analysis of essential facial features. In Proceedings of the 25th International Conference on Information Technology Interfaces (ITI) (pp. 187-191). Cavtat, Croatia.
Tabachnick, B. G., & Fidell, L. S. (2001). Principal components and factor analysis. In Using multivariate statistics (4th ed., pp. 582-633). Needham Heights, MA: Allyn & Bacon.
Open Office Impress
This presentation was made using Open Office Impress.
Free and open source software.
http://www.openoffice.org/product/impress.html