Comparisons of and Concerns about Two Testing Application Chapters in the 2014 Standards for Educational and Psychological Testing

This paper compares two "Testing Application" chapters of the 2014 Standards for Educational and Psychological Testing (hereinafter, Standards), jointly published by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The two chapters are Chapters 10 and 12 of the Standards: "Psychological Testing and Assessment" (PTA) and "Educational Testing and Assessment" (ETA). The paper specifically aims to raise some overarching issues related to these chapters. An in-depth comparative analysis was conducted based on specific similarities of and differences between these two chapters. Both PTA and ETA cover the background of and standards regarding test administration, score interpretation, and the use of scores. However, PTA focuses more on test selection and test security, whereas ETA covers test design and development in more depth. The overarching issues, questions, and concerns related to both chapters are discussed along with the results of the analysis. The paper concludes with a description of the differences between the two chapters of the current 2014 Standards and those of the previous 1999 version, along with some plausible explanations for such discrepancies. The summary and analysis may be useful to test users and graduate students in the psychology and education fields whose interests revolve around testing and assessment practices.


Introduction
The 2014 Standards is a joint publication of three sponsoring associations, the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). It was published to update the documentation in the previous 1999 edition. Linn (2006) noted that the 1999 version was "widely recognized as the most authoritative statement of professional consensus regarding the development and evaluation of educational and psychological tests" (p. 27). Hence, the 2014 publication primarily serves two purposes (AERA et al., 2014, p. 1): "[T]o provide criteria for the development and evaluation of tests and testing practices," and "to provide guidelines for assessing the validity of interpretations of test scores for the intended test uses." The Standards (AERA et al., 2014) is intended for professionals who specify, develop, or select tests and for those who interpret or evaluate the technical quality of test results. Other audiences include professional test sponsors. The 2014 edition is the latest in a series of such documents (AERA et al., 1966, 1974, 1985, 1999). This paper compares two "Testing Application" chapters of the 2014 Standards: Chapter 10, "Psychological Testing and Assessment" (PTA), and Chapter 12, "Educational Testing and Assessment" (ETA). An in-depth comparative analysis was conducted based on similarities of and differences between these two chapters. The paper also discusses overarching issues, questions, and concerns related to both chapters along with the results of the analysis. It concludes with a description of the differences between the two chapters of the current 2014 Standards and those of the previous 1999 version, along with some plausible explanations for such discrepancies.

Summary of the Two Chapters
PTA and ETA share several similarities in terms of their standards and their background information. Both PTA and ETA cover the background of and standards regarding test administration, score interpretation, and the use of scores (pp. 152-168, 183-200). However, they differ in the emphasis placed on certain themes. PTA focuses more on test selection (p. 152) and test security (p. 168), whereas ETA covers test design and development in more depth (p. 184).
Chapter 10 of the Standards (AERA et al., 2014) focuses on the standards for PTA. The chapter is based on the premise that a psychological test (Bornstein, 2017; Camara, Nathan, & Puente, 2000; Groth-Marnat & Wright, 2016) will meet its intended purpose(s) through meaningful interaction between the stakeholders involved in the setting in which the test is used and a relevant set of sophisticated professional activities to produce useful results, inventories, and interpretations. Chapter 12 delineates standards in the context of ETA. The purpose of the standards and the target audience to whom the standards apply are stated at the beginning of each chapter. The content of the standards from both chapters is summarized and listed in Table 1.

Specific Similarities of the Two Chapters
Although PTA and ETA are conducted in separate fields, they share similar test and assessment guidelines with respect to testing practices. In this section, a few key standards from both fields will be highlighted and discussed.
Standards 10.1 and 10.3 from PTA are closely related to Standard 12.15 from ETA. The two PTA standards suggest that professionals should receive proper training and acquire the certification required to administer and interpret tests. In the ETA chapter, Standard 12.15 stresses the importance of considering the professional qualifications of the school personnel responsible for interpreting test scores for the purpose of informing instruction and making decisions within the school context. These standards speak to the appropriateness of the skills and knowledge of professionals and supervisees when administering, scoring, and interpreting assessments. Specifically, according to both the ETA and the PTA chapters, professionals and supervisees should be qualified to administer, score, and interpret tests and assessments, as demonstrated by their education, training, experience, and credentials.
Standard 10.15 from PTA states that interpretation of test results for diagnostic purposes should be based on multiple sources of data. Similarly, Standard 12.10 from ETA states that decisions which may have a major impact on students should take into consideration relevant information beyond single test scores. Both standards suggest the importance of providing collateral information when using test scores to make decisions that will have a major impact on an individual. In most cases, multiple data sources will enhance the appropriateness of decision making.
Both ETA and PTA emphasize that test users should be familiar with the evidence of validity and reliability for a test when drawing inferences from its scores. Standard 10.2 requires test users to know the validity and reliability evidence for the test scores, supported by logical analysis. Standard 12.14 from the ETA chapter likewise states that professionals should be familiar with the reliability and validity of the test scores for their intended purpose, as well as with test fairness.
Whether tests and assessments are used in combination for multiple purposes or packaged together for purposes of administration, evidence should be provided for each purpose of the test or for the package of tests. Standard 10.16 states that, when tests are used in combination with one another, professionals should review the evidence for combining tests, and Standard 12.2 states that one source of evidence is not sufficient when a test is used for multiple purposes. It falls to testing professionals to obtain evidence for every instance of the use of an assessment.
Finally, Standards 10.4 and 12.4 focus on the relationship of the test with the construct/knowledge the test is supposed to capture. Both standards state that evidence should be provided to support the extent to which the test samples the range of knowledge it is intended to assess and whether it meets its intended purposes. It is also important to state explicitly the target domain that the test represents and those aspects that the test fails to represent (as in Standard 12.4).

Specific Differences between the Two Chapters
Despite the similarities that ETA and PTA share, there are also many differences between them. Judging from their content, Chapter 10 (PTA) places more emphasis on test security than does Chapter 12 (ETA). Test security is the sole topic of Standard 10.18 of PTA, which emphasizes that obsolete versions of any test should not be kept available to the public. Standard 12.16 is the only part of Chapter 12 that discusses protocols for test security. The commentary note in Standard 10.18 of PTA provides extensive instruction on how to keep test information secure, but no explicit standard and example are mentioned in ETA. However, although the ETA standards do not provide details about test security procedures, testing organizations provide such information at the operational level. The North Carolina Department of Public Instruction (NCDPI, October 2019), for example, provides testing security protocols in extensive detail. This may indicate that the required level of test security in educational settings depends largely on the stakes of the test, whereas the stakes of assessment in psychological settings are usually very high.
ETA describes the general procedures of test design and development (TDD) in educational assessment settings, whereas PTA emphasizes the selection of psychological tests for a particular test taker. ETA addresses topics such as minimizing the negative consequences of a test (Standard 12.1) and obtaining evidence of validity and reliability for a test with multiple purposes (Standard 12.2). The aspects discussed in the TDD cluster of standards in the ETA chapter also overlap with the general TDD chapter, Chapter 4 of the Standards (AERA et al., 2014). For example, that chapter provides general guidance for specifying a test (p. 83), developing and reviewing items (p. 85), and revising the test (p. 87). However, the TDD standards specific to the ETA chapter are not mentioned in Chapter 4, given the nature and context of the educational field. In contrast, no explicit cluster on TDD appears in Chapter 10, as Chapter 4 has already described the relevant standards for PTA. PTA emphasizes test selection and addresses topics related to choosing suitable tests for test takers (Standard 10.5) and choosing credible tests for the diagnosis of different groups (Standard 10.6). There is no topic on test selection in the ETA chapter or in the related chapters under the "Part II: Operations" section of the Standards (AERA et al., 2014). This is not the case for the PTA chapter, because professionals in the psychology field need to know the intended reasons for using a test in the assessment of a test taker (cf., Haladyna

Discussion
This section explores some of the issues and questions that may have occurred to the reader. Given the scope of this summary, I chose to focus on two of these issues.

Comparisons of the Chapters between the 1999 and 2014 Standards (AERA, APA, & NCME, 1999, 2014)
Retrospective comparisons with the Standards published in 1999 indicate that the standards in the two chapters of the 2014 version are better organized, in light of the categorization of standards into particular themes of testing practices (see Table 1). Previous standards without commentary notes have been extended with the addition of elaborative comments regarding the stated standards, and several separate yet repetitive standards from 1999 have been consolidated. As stated in the introductory chapter of the Standards (AERA et al., 2014), the ETA chapter was "rewritten to attend to the issues associated with uses of tests for educational accountability purposes" (p. 4). In some of the standards in the ETA chapter, repetitive wording in the 1999 version was revised, and specific terminology (e.g., dates of administration, test takers' age) was generalized (e.g., contextual information, demographics). One instance of a profound revision in the 2014 chapter concerns the issue of fairness in testing in ETA. The newly added Standard 12.3 and the revised Standard 12.13 call for targeted subgroups to have an equal chance to demonstrate proficiency on a test by considering all relevant steps of the testing process, such as the use of technology, accommodations, and modifications, which was not highlighted in the 1999 Standards (AERA et al., 1999). Also, the topics on reporting test scores have been significantly revised and polished, likely in an attempt to achieve more accountable testing practices (Smith & Fey, 2000) (cf., Battistin, 2016).

Questions about and Issues concerning the Two Chapters
As mentioned previously, the ETA chapter does not have a standard dedicated to test security as the PTA chapter does. With the increasing trend of computer-based assessments (Boevé et al.; Lissitz & Jiao, 2012), test security deserves more attention in educational settings. This is an issue that should be addressed now (cf., the International Test Commission (ITC) guidelines on Computer-Based and Internet Delivered Testing (ITC, July 2005) and on the Security of Tests, Examinations, and Other Assessments (ITC, July 2014)), but it should also be considered for future versions of the Standards.
Although both chapters present a comprehensive summary of the standards for their respective fields, there may exist standards in one field that should or could be explored in the other. Worth noting are standards that provide value not only to their intended area but can also be generalized to others. For example, Standard 12.12 of ETA mentions that when two test scores from a person are compared, the extent to which the two scores are similar and the standard error of the scores should be considered. This issue is not mentioned in Chapter 10, which raises the question of whether it should be addressed in PTA, as professionals may use multiple tests to make a decision about an individual (Standard 10.6).

Conclusions
This paper's cross-chapter comparisons of the 2014 Standards (AERA et al., 2014) aim to provide test users and graduate students interested in testing and assessment with some holistic insights into the focus, similarities, and differences of testing practices in the psychology and education fields. With some of the issues and concerns raised, it is hoped that graduate students will be able to pursue their research and professional endeavors by adhering to the latest standards and guidelines. It is also hoped that they will be able to generate significant research questions that can contribute to the field of testing and assessment in education and psychology. Future work may extend the comparison to Chapter 11, "Workplace Testing and Credentialing," as discussion of certification/licensure and employment testing is often integrated with educational and psychological testing and assessment, especially in the context of higher education (Sackett et al., 2008).
The brief comparison of the 1999 and 2014 versions may prove useful for test users and graduate students, so that they will be aware of the progression of the Standards and also cognizant of the latest testing standards, developments, and agenda, especially in the North American testing context. Although the Standards (AERA et al., 2014) may be further improved in the future, it is important for all novice and experienced testing practitioners, scholars, and users in the discussed fields to refer consistently to the standards and comply with most, if not all, of the provided guidelines to ensure sound and practical testing practices.