A Learning Objectives
Chapter 1: Introduction
- Identify and use statistical notation for variables, sample size, mean, standard deviation, variance, and correlation.
- Calculate and interpret frequencies, proportions, and percentages.
- Create and use frequency distribution plots to describe the central tendency, variability, and shape of a distribution.
- Calculate and interpret measures of central tendency and describe what they represent, including the mean, median, and mode.
- Calculate and interpret measures of variability and describe what they represent, including the standard deviation and variance.
- Apply and explain the process of rescaling variables using means and standard deviations for linear transformations.
- Calculate and interpret the correlation between two variables and describe what it represents in terms of the shape, direction, and strength of the linear relationship between the variables.
- Create and interpret a scatter plot, explaining what it indicates about the shape, direction, and strength of the linear relationship between two variables.
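The following is a minimal sketch of the descriptive statistics named in the objectives above, written in Python with the standard library; the score values are hypothetical, and any statistical software used in the course could produce the same quantities.

```python
import statistics

# Hypothetical scores on two variables for a small sample
x = [10, 12, 13, 15, 15, 16, 18, 20]
y = [22, 25, 24, 28, 30, 29, 33, 35]

n = len(x)                     # sample size
x_bar = statistics.mean(x)     # mean
s_x = statistics.stdev(x)      # sample standard deviation
s2_x = statistics.variance(x)  # sample variance

# Pearson correlation: sum of cross-products of deviations, scaled by the SDs
y_bar = statistics.mean(y)
cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r_xy = cross_products / ((n - 1) * s_x * statistics.stdev(y))

print(n, round(x_bar, 2), round(s_x, 2), round(s2_x, 2), round(r_xy, 2))
```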
Chapter 2: Measurement, Scales, and Scoring
- Define the process of measurement.
- Define the term construct and describe how constructs are used in measurement, with examples.
- Compare and contrast measurement scales, including nominal, ordinal, interval, and ratio, with examples, and identify their use in context.
- Compare and contrast dichotomous and polytomous scoring.
- Describe how rating scales are used to create composite scores.
- Explain the benefits of composites over component scores.
- Create a generic measurement model and define its components.
- Define norm referencing and identify contexts in which it is appropriate.
- Compare three examples of norm referencing: grade, age, and percentile norms.
- Define criterion referencing and identify contexts in which it is appropriate.
- Describe how standards and performance levels are used in criterion referencing with standardized state tests.
- Compare and contrast norm and criterion score referencing, and identify their uses in context.
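As a minimal illustration of the scoring and norm-referencing ideas in the objectives above, the sketch below sums hypothetical polytomous rating-scale responses into a composite score and then locates that composite in a hypothetical norm group, using one common definition of percentile rank (the percentage of the norm group scoring below).

```python
# Hypothetical responses to five rating-scale items scored 1-5 (polytomous);
# a dichotomous item would instead be scored 0 or 1
item_scores = [4, 5, 3, 4, 2]

# Composite score: the sum of the component item scores
composite = sum(item_scores)

# Norm referencing: percentile rank of the composite within a norm group,
# here defined as the percentage of the norm group scoring below it
norm_group = [12, 14, 15, 15, 16, 17, 18, 19, 20, 22]
percentile_rank = 100 * sum(score < composite for score in norm_group) / len(norm_group)

print(composite, percentile_rank)
```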
Chapter 3: Testing Applications
- Provide examples of how testing supports low-stakes and high-stakes decision-making in education and psychology.
- Describe the general purpose of aptitude testing and some common applications.
- Identify the distinctive features of aptitude tests and the main benefits and limitations in using aptitude tests to inform decision-making.
- Describe the general purpose of standardized achievement testing and some common applications.
- Identify the distinctive features of standardized achievement tests and the main benefits and limitations in using standardized achievement tests to inform decision-making.
- Compare and contrast different types of tests and test uses and identify examples of each, including summative, formative, mastery, and performance.
- Summarize how technology can be used to improve the testing process.
Chapter 4: Test Development
Cognitive
- Describe the purpose of a cognitive learning objective or learning outcome statement, and demonstrate the effective use of learning objectives in the item writing process.
- Describe how a test outline or test plan is used in cognitive test development to align the test to the content domain and learning objectives.
- Compare items assessing different cognitive levels or depth of knowledge, for example, higher-order thinking such as synthesizing and evaluating information versus lower-order thinking such as recall and definitional knowledge.
- Identify and provide examples of selected-response (SR) item types (multiple-choice, true/false, matching) and constructed-response (CR) item types (short-answer, essay).
- Compare and contrast SR and CR item types, describing the benefits and limitations of each type.
- Identify the main theme addressed in the item writing guidelines, and how each guideline supports this theme.
- Create and use a scoring rubric to evaluate answers to a CR question.
- Write and critique cognitive test items that match given learning objectives and depths of knowledge and that follow the item writing guidelines.
Noncognitive
- Define affective measurement and contrast it with cognitive measurement in terms of applications and purposes, the types of constructs assessed, and the test construction process.
- Compare affective test construction strategies, with examples of their use, and the strengths and limitations of each.
- Compare and contrast item types and response types used in affective measurement, describing the benefits and limitations of each type, and demonstrating their use.
- Define the main affective response sets, and demonstrate strategies for reducing their effects.
- Write and critique effective affective items using empirical guidelines.
Chapter 5: Reliability
CTT reliability
- Define reliability, including potential sources of reliability and unreliability in measurement, using examples.
- Describe the simplifying assumptions of the classical test theory (CTT) model and how they are used to obtain true scores and reliability.
- Identify the components of the CTT model (\(X\), \(T\), and \(E\)) and describe how they relate to one another, using examples.
- Describe the difference between systematic and random error, including examples of each.
- Explain the relationship between the reliability coefficient and standard error of measurement, and identify how the two are distinguished in practice.
- Calculate the standard error of measurement and describe it conceptually.
- Compare and contrast the three main ways of assessing reliability (test-retest, parallel forms, and internal consistency), using examples, and identify appropriate applications of each.
- Compare and contrast the four reliability study designs, based on one or two test forms and one or two testing occasions, in terms of the sources of error that each design accounts for, and identify appropriate applications of each.
- Use the Spearman-Brown formula to predict how reliability changes when a test is lengthened or shortened, as sketched in the example following this list.
- Describe the formula for coefficient alpha, the assumptions it is based on, and what factors impact it as an estimate of reliability.
- Estimate different forms of reliability using statistical software, and interpret the results.
- Describe factors related to the test, the test administration, and the examinees that affect reliability.
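A minimal sketch of three of the reliability calculations listed above, in Python with hypothetical values; the statistical software referenced in these objectives yields the same quantities.

```python
import math

# Spearman-Brown: predicted reliability after changing test length by factor k
def spearman_brown(rel, k):
    return k * rel / (1 + (k - 1) * rel)

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
def sem(sd, rel):
    return sd * math.sqrt(1 - rel)

# Coefficient alpha from item variances and the total-score variance
def coefficient_alpha(item_variances, total_variance):
    k = len(item_variances)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print(round(spearman_brown(0.70, k=2), 3))                      # doubling test length
print(round(sem(sd=10, rel=0.91), 3))                           # SEM on a scale with SD = 10
print(round(coefficient_alpha([0.8, 1.1, 0.9, 1.0], 10.5), 3))  # alpha for four items
```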
Interrater reliability
- Describe the purpose of measuring interrater reliability, and how interrater reliability differs from traditional reliability.
- Describe the difference between interrater agreement and interrater reliability, using examples.
- Calculate and interpret indices of interrater agreement and reliability, including proportion agreement, Kappa, Pearson correlation, and g coefficients.
- Identify appropriate uses of each interrater index, including the benefits and drawbacks of each.
- Describe the three main considerations involved in using g coefficients.
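The sketch below computes two of the interrater indices named above, proportion agreement and Cohen's kappa, for hypothetical dichotomous ratings from two raters.

```python
# Hypothetical pass (1) / fail (0) ratings from two raters on ten examinees
rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
rater2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
n = len(rater1)

# Proportion agreement: share of examinees given the same rating by both raters
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# Agreement expected by chance, based on each rater's marginal proportions
p1, p2 = sum(rater1) / n, sum(rater2) / n
p_e = p1 * p2 + (1 - p1) * (1 - p2)

# Cohen's kappa: observed agreement corrected for chance agreement
kappa = (p_o - p_e) / (1 - p_e)

print(round(p_o, 2), round(kappa, 2))
```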
Chapter 6: Item Analysis
- Explain how item bias and measurement error negatively impact the quality of an item, and how item analysis, in general, can be used to address these issues.
- Describe general guidelines for collecting pilot data for item analysis, including how following these guidelines can improve item analysis results.
- Identify items that may have been keyed or scored incorrectly.
- Recode variables to reverse their scoring or keyed direction.
- Use the appropriate terms to describe the process of item analysis with cognitive versus noncognitive constructs.
- Calculate and interpret item difficulties and compare items in terms of difficulty.
- Calculate and interpret item discrimination indices, and describe what they represent and how they are used in item analysis.
- Describe the relationship between item difficulty and item discrimination and identify the practical implications of this relationship.
- Calculate and interpret alpha-if-item-deleted.
- Use item analysis to distinguish between items that function well in a set and items that do not.
- Remove items from an item set to achieve a target level of reliability.
- Evaluate selected-response options using option analysis.
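As a rough illustration of the item statistics described above, the sketch below computes item difficulty (proportion correct) and a simple discrimination index (the item-rest correlation) for hypothetical scored responses; it uses statistics.correlation, available in Python 3.10 and later.

```python
import statistics

# Hypothetical scored (1 = correct, 0 = incorrect) responses:
# rows are examinees, columns are items
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

for j in range(len(scores[0])):
    item = [person[j] for person in scores]
    # Item difficulty: proportion of examinees answering the item correctly
    p = statistics.mean(item)
    # Discrimination: correlation between the item and the rest of the test
    # (total score with the item itself removed)
    rest = [sum(person) - person[j] for person in scores]
    r_ir = statistics.correlation(item, rest)
    print(f"Item {j + 1}: difficulty = {p:.2f}, discrimination = {r_ir:.2f}")
```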
Chapter 7: Item Response Theory
- Compare and contrast IRT and CTT in terms of their strengths and weaknesses.
- Identify the two main assumptions that are made when using a traditional IRT model, regarding dimensionality and functional form (the number of model parameters).
- Identify key terms in IRT, including probability of correct response, logistic curve, theta, and functions for item response, test response, standard error, and information.
- Define the three item parameters and one ability parameter in the traditional IRT models, and describe the role of each in modeling performance.
- Distinguish between the 1PL, 2PL, and 3PL IRT models in terms of assumptions made, benefits and limitations, and applications of each.
- Describe how IRT is utilized in item analysis, test development, item banking, and computer adaptive testing.
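The sketch below implements one common logistic form of the three-parameter (3PL) item response function for the parameters named above; the parameter values are hypothetical, and the 1.7 scaling constant sometimes included is omitted for simplicity.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response given
    ability theta, discrimination a, difficulty b, and lower asymptote
    (pseudo-guessing) c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# The 2PL fixes c = 0; the 1PL additionally fixes a to a common value
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_3pl(theta, a=1.2, b=0.5, c=0.20), 3))
```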
Chapter 8: Factor Analysis
- Compare and contrast the factor analytic model with other measurement models, including CTT and IRT, in terms of their applications in instrument development.
- Describe the differing purposes of exploratory and confirmatory factor analysis.
- Explain how an EFA is implemented, including the type of data required and steps in setting up and fitting the model.
- Interpret EFA results, including factor loadings and eigenvalues.
- Use a scree plot to visually compare factors in an EFA.
- Explain how a CFA is implemented, including the type of data required and steps in setting up and fitting the model.
- Interpret CFA results, including factor loadings and fit indices.
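As a partial illustration of the EFA output described above, the sketch below extracts the eigenvalues of a hypothetical item correlation matrix, the quantities typically examined in a scree plot; it assumes numpy is available, and a full EFA or CFA would be fit with dedicated software.

```python
import numpy as np

# Hypothetical correlation matrix for five items: items 1-3 and items 4-5
# form two loosely related clusters
R = np.array([
    [1.00, 0.55, 0.50, 0.20, 0.15],
    [0.55, 1.00, 0.60, 0.25, 0.20],
    [0.50, 0.60, 1.00, 0.15, 0.10],
    [0.20, 0.25, 0.15, 1.00, 0.45],
    [0.15, 0.20, 0.10, 0.45, 1.00],
])

# Eigenvalues in decreasing order, as plotted in a scree plot; a sharp drop
# after the first one or two values suggests how many factors dominate
eigenvalues = np.linalg.eigvalsh(R)[::-1]
print(np.round(eigenvalues, 2))
```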
Chapter 9: Validity
- Define validity in terms of test score interpretation and use, and identify and describe examples of this definition in context.
- Compare and contrast three main sources of validity evidence (content, criterion, and construct), with examples of how each type is established, including the validation process involved with each.
- Explain the structure and function of a test outline, and how it is used to provide evidence of content validity.
- Calculate and interpret a validity coefficient, describing what it represents and how it supports criterion validity.
- Describe how unreliability can attenuate a correlation, and how to correct for attenuation in a validity coefficient.
- Identify appropriate sources of validity evidence for given testing applications and describe how certain sources are more appropriate than others for certain applications.
- Describe the unified view of validity and how it differs from and improves upon the traditional view of validity.
- Identify threats to validity, including features of a test, testing process, or score interpretation or use that impact validity. Consider, for example, the issues of content underrepresentation and misrepresentation, and construct-irrelevant variance.
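The sketch below applies the correction for attenuation mentioned above to a hypothetical validity coefficient, dividing the observed test-criterion correlation by the square root of the product of the two reliabilities.

```python
import math

r_xy = 0.45                # hypothetical observed validity coefficient
rel_x, rel_y = 0.85, 0.70  # hypothetical reliabilities of the test and the criterion

# Correction for attenuation: the correlation estimated if both measures
# were perfectly reliable
r_corrected = r_xy / math.sqrt(rel_x * rel_y)

print(round(r_corrected, 3))
```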
Chapter 10: Test Evaluation
- Review and critique the documentation contained in a test review, test manual, or technical report, including:
  - Data collection and test administration designs,
  - Reliability analysis results,
  - Validity analysis results,
  - Scoring and reporting guidelines,
  - Recommendations for test use.
- Compare and contrast tests using reported information.
- Use information reported for a test to determine the appropriateness of the test for a given application.