Test (assessment)







A test or examination (informally, exam) is an assessment intended to measure a test taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs). A test may be administered orally, on paper, on a computer, or in a confined area that requires a test taker to physically perform a set of skills. Tests vary in style, rigor and requirements. For example, in a closed book test, a test taker is often required to rely upon memory to respond to specific items, whereas in an open book test, a test taker may use one or more supplementary tools such as a reference book or calculator when responding to an item. A test may be administered formally or informally. An example of an informal test would be a reading test administered by a parent to a child. An example of a formal test would be a final examination administered by a teacher in a classroom or an I.Q. test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score. A test score may be interpreted with regard to a norm or criterion, or occasionally both. The norm may be established independently, or by statistical analysis of a large number of participants.

A standardized test is any test that is administered and scored in a consistent manner to ensure legal defensibility. Standardized tests are often used in education, professional certification, psychology (e.g., MMPI), the military, and many other fields.

A non-standardized test is usually flexible in scope and format, and variable in difficulty and significance. Since these tests are usually developed by individual instructors, their format and difficulty may not be widely adopted or used by other instructors or institutions. A non-standardized test may be used to determine the proficiency level of students, to motivate students to study, and to provide feedback to students. In some instances, a teacher may develop non-standardized tests that resemble standardized tests in scope, format, and difficulty for the purpose of preparing their students for an upcoming standardized test. Finally, the frequency and setting in which non-standardized tests are administered are highly variable and are usually constrained by the duration of the class period. A class instructor may, for example, administer a test on a weekly basis or just twice a semester. Depending on the policy of the instructor or institution, each test may last anywhere from five minutes to an entire class period.

In contrast to non-standardized tests, standardized tests are widely used, fixed in terms of scope, difficulty and format, and are usually significant in consequences. Standardized tests are usually held on fixed dates as determined by the test developer, educational institution, or governing body. They may or may not be administered by the instructor, held within the classroom, or constrained by the classroom period. Although there is little variability between different copies of the same type of standardized test (e.g., SAT or GRE), there is variability between different types of standardized tests.

Any test with important consequences for the individual test taker is referred to as a high-stakes test.

A test may be developed and administered by an instructor, a clinician, a governing body, or a test provider. In some instances, the developer of the test may not be directly responsible for its administration. For example, Educational Testing Service (ETS), a nonprofit educational testing and assessment organization, develops standardized tests such as the SAT but may not be directly involved in the administration or proctoring of these tests. As with the development and administration of educational tests, the format and level of difficulty of the tests themselves are highly variable, and there is no general consensus or invariable standard for test formats and difficulty. Often, the format and difficulty of a test depend upon the educational philosophy of the instructor, subject matter, class size, policy of the educational institution, and requirements of accreditation or governing bodies. In general, tests developed and administered by individual instructors are non-standardized whereas tests developed by testing organizations are standardized.

Early history
Ancient China was the first country in the world to implement a nationwide standardized test, which was called the imperial examination. The main purpose of this examination was to select able candidates for specific governmental positions. The imperial examination was established by the Sui Dynasty in 605 AD and was abolished by the Qing Dynasty 1300 years later, in 1905. England adopted this examination system in 1806 to select specific candidates for positions in Her Majesty's Civil Service, modeled on the Chinese imperial examination. This examination system was later applied to education, and it started to influence other parts of the world as it became a prominent standard for delivering standardized tests (e.g., regulations to prevent the markers from knowing the identity of candidates).

However, with this notable exception, examinations throughout the world, to the extent that they existed, tended to be in oral form, where the examinee would have to either recite a dissertation or respond to a series of questions. This system was indicative of the status of education in the premodern world: it tended to be restricted to the elites, and therefore the system of examination was more personalised.

Civil service
As the profession transitioned to the modern mass-education system, the style of examination became fixed, with the stress on standardized papers to be sat by large numbers of students. Leading the way in this regard was the burgeoning civil service, which began to move toward a meritocratic basis for selection in the mid-19th century in England.

As early as 1806, the Honourable East India Company established a college near London to train and examine administrators of the Company's territories in India. Examinations for the Indian 'civil service', a term coined by the Company, were introduced in 1829.

In 1853 the Chancellor of the Exchequer, William Gladstone, commissioned Sir Stafford Northcote and Charles Trevelyan to look into the operation and organisation of the Civil Service. The Northcote-Trevelyan Report of 1854 made four principal recommendations: that recruitment should be on the basis of merit determined through standardized written examination, that candidates should have a solid general education to enable inter-departmental transfers, that recruits should be graded into a hierarchy, and that promotion should be through achievement rather than 'preferment, patronage or purchase'. A Civil Service Commission was also set up in 1855 to oversee open recruitment and end patronage, and most of the other Northcote-Trevelyan recommendations were implemented over some years.

The Northcote-Trevelyan model of meritocratic examination remained essentially stable for a hundred years. This was a tribute to its success in removing corruption, delivering public services (even under the stress of two world wars), and responding effectively to political change. It also had a great international influence and was adapted by members of the Commonwealth. The Pendleton Civil Service Reform Act established a similar system in the United States.

Education
This trend began to influence the method of examination in British universities from the 1850s, where oral examination had been the norm since the Middle Ages. There was a rapid switchover to a written style of examination from the mid-century. In the US, the transition happened under the influence of the educational reformer Horace Mann. This shift decisively helped to move education into the modern era, by standardizing expanding curricula in the sciences and humanities, creating a rationalized method for the evaluation of teachers and institutions and creating a basis for the streaming of students according to ability.

This examination system was later applied to primary and secondary education, and it started to influence other parts of the world as it became a prominent standard for delivering standardized tests (e.g., regulations to prevent the markers from knowing the identity of candidates).

Both World War I and World War II demonstrated the necessity of standardized testing and the benefits associated with these tests. Tests were used to determine the mental aptitude of recruits to the military. The US Army used the Stanford-Binet Intelligence Scale to test the IQ of the soldiers.

After the War, industry began using tests to evaluate applicants for various jobs based on performance. In 1952, the first Advanced Placement (AP) test was administered to begin closing the gap between high schools and colleges.

Education
Some countries such as the United Kingdom and France require all their secondary school students to take a standardized test on individual subjects, such as the General Certificate of Secondary Education (GCSE) in England and the Baccalauréat in France, as a requirement for graduation. These tests are used primarily to assess a student's proficiency in specific subjects such as mathematics, science, or literature. In contrast, high school students in other countries such as the United States may not be required to take a standardized test to graduate. Moreover, students in these countries usually take standardized tests only to apply for a position in a university program and are typically given the option of taking different standardized tests such as the ACT or SAT, which are used primarily to measure a student's reasoning skill. High school students in the United States may also take Advanced Placement tests on specific subjects to earn university-level credit. Depending on the policies of the test maker or country, standardized tests may be administered in a large hall, classroom, or testing center. A proctor or invigilator may also be present during the testing period to provide instructions, to answer questions, or to prevent cheating.

Grades or test scores from standardized tests may also be used by universities to determine whether a student applicant should be admitted into one of their academic or professional programs. For example, universities in the United Kingdom admit applicants into their undergraduate programs based primarily or solely on an applicant's grades on pre-university qualifications such as the GCE A-levels or Cambridge Pre-U. In contrast, universities in the United States use an applicant's test score on the SAT or ACT as just one of many admission criteria to determine whether an applicant should be admitted into one of their undergraduate programs. The other criteria in this case may include the applicant's grades from high school, extracurricular activities, personal statement, and letters of recommendation. Once admitted, undergraduate students in the United Kingdom or United States may be required by their respective programs to take a comprehensive examination as a requirement for passing their courses or for graduating from their respective programs.

Standardized tests are sometimes used by certain countries to manage the quality of their educational institutions. For example, the No Child Left Behind Act in the United States requires individual states to develop assessments for students in certain grades. In practice, these assessments typically appear in the form of standardized tests. Test scores of students in specific grades of an educational institution are then used to determine the status of that educational institution, i.e., whether it should be allowed to continue to operate in the same way or to receive funding.

Finally, standardized tests are sometimes used to compare proficiencies of students from different institutions or countries. For example, the Organisation for Economic Co-operation and Development (OECD) uses the Programme for International Student Assessment (PISA) to evaluate certain skills and knowledge of students from different participating countries.

Licensing and certification
Standardized tests are sometimes used by certain governing bodies to determine if a test taker is allowed to practice a profession, to use a specific job title, or to claim competency in a specific set of skills. For example, a test taker who intends to become a lawyer is usually required by a governing body, such as a governmental bar licensing agency, to pass a bar exam.

Immigration and naturalization
Standardized tests are also used in certain countries to regulate immigration. For example, intended immigrants to Australia are legally required to pass a citizenship test as part of that country's naturalization process.

Competitions
Tests are sometimes used as a tool to select for participants that have potential to succeed in a competition such as a sporting event. For example, serious skaters who wish to participate in figure skating competitions in the United States must pass official U.S. Figure Skating tests just to qualify.

Group memberships
Tests are sometimes used by a group to select for certain types of individuals to join the group. For example, Mensa International is a high I.Q. society that requires individuals to score at the 98th percentile or higher on a standardized, supervised IQ test.

Written tests
Written tests are tests that are administered on paper or on a computer. A test taker who takes a written test could respond to specific items by writing or typing within a given space of the test or on a separate form or document.

In some tests where knowledge of many constants or technical terms is required to answer questions effectively, such as in chemistry or biology, the test developer may allow every test taker to bring a cheat sheet.

A test developer's choice of which style or format to use when developing a written test is usually arbitrary given that there is no single invariant standard for testing. Be that as it may, certain test styles and formats have become more widely used than others. Below is a list of those formats of test items that are widely used by educators and test developers to construct paper or computer-based tests. These tests may consist of only one type of test item format (e.g., multiple choice test, essay test) or may have a combination of different test item formats (e.g., a test that has multiple choice and essay items).

Multiple choice
In a test that has items formatted as multiple choice questions, a candidate is given a number of set answers for each question, and the candidate must choose which answer or group of answers is correct. There are two families of multiple choice questions. The first family is known as the True/False question and it requires a test taker to choose all answers that are appropriate. The second family is known as the One-Best-Answer question and it requires a test taker to choose only one answer from a list of answers.

There are several reasons for using multiple choice questions in tests. In terms of administration, multiple choice questions usually require less time for test takers to answer, are easy to score and grade, provide greater coverage of material, allow for a wide range of difficulty, and can easily diagnose a test taker's difficulty with certain concepts. As an educational tool, multiple choice items test many levels of learning as well as a test taker's ability to integrate information, and they provide feedback to the test taker about why distractors were wrong and why correct answers were right. Nevertheless, there are difficulties associated with the use of multiple choice questions. In administrative terms, multiple choice items that are effective usually take a great deal of time to construct. As an educational tool, multiple choice items do not allow test takers to demonstrate knowledge beyond the choices provided and may even encourage guessing or approximation due to the presence of at least one correct answer. For instance, a test taker might not work out explicitly that $$6.14 \times 7.95 = 48.813$$, but knowing that $$6 \times 8 = 48$$, they would choose an answer close to 48. Moreover, test takers may misinterpret these items and, in the process, perceive these items to be tricky or picky. Finally, multiple choice items do not test a test taker's attitudes towards learning because correct responses can be easily faked.
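The approximation strategy described above can be illustrated with a minimal sketch. The answer choices and the helper name `closest_option` are hypothetical, invented for illustration only; the point is that rounding each factor is enough to single out the right choice without doing the full multiplication.

```python
def closest_option(estimate, options):
    """Return the answer choice nearest to a rough estimate."""
    return min(options, key=lambda value: abs(value - estimate))

# Hypothetical answer choices for the item "6.14 x 7.95 = ?"
options = [4.8813, 38.813, 48.813, 58.813]

rough = 6 * 8                      # round each factor to the nearest integer
picked = closest_option(rough, options)
exact = round(6.14 * 7.95, 3)      # the full calculation, for comparison

print("estimate:", rough)
print("picked:  ", picked)
print("exact:   ", exact)
```

A test developer who wants to discourage this shortcut could place the distractors closer to the correct value, so that the rough estimate no longer separates them.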

Alternative response
True/False questions present candidates with a binary choice: a statement is either true or false. This method presents problems because, depending on the number of questions, a significant number of candidates could get 100% just by guesswork, and guessing candidates should on average get 50%.
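The effect of guessing can be quantified with a short sketch. It assumes every question is answered by an independent fair coin flip, which is a simplifying assumption rather than a claim about real test takers; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def score_distribution(num_questions):
    """Probability of each possible number of correct guesses (0..n)."""
    half = Fraction(1, 2)
    return [comb(num_questions, k) * half ** num_questions
            for k in range(num_questions + 1)]

def expected_percent(num_questions):
    """Average score, in percent, of a candidate who guesses every item."""
    dist = score_distribution(num_questions)
    mean_correct = sum(k * p for k, p in enumerate(dist))
    return 100 * mean_correct / num_questions

for n in (5, 10, 20):
    perfect = score_distribution(n)[-1]   # chance of guessing all n correctly
    print(n, "questions: P(100%) =", perfect, "expected % =", expected_percent(n))
```

Under this assumption the chance of a perfect score halves with every added question, while the expected score stays at 50% regardless of test length.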

Matching type
A matching item is an item that provides a defined term and requires a test taker to match identifying characteristics to the correct term.

Completion type
A fill-in-the-blank item provides a test taker with identifying characteristics and requires the test taker to recall the correct term. There are two types of fill-in-the-blank tests. The easier version provides a word bank of possible words that will fill in the blanks. For some exams, all words in the word bank are used exactly once. If a teacher wanted to create a test of medium difficulty, they would provide a test with a word bank in which some words may be used more than once and others not at all. The hardest variety of such a test is a fill-in-the-blank test in which no word bank is provided at all. This generally requires a higher level of understanding and memory than a multiple choice test. Because of this, fill-in-the-blank tests with no word bank are often feared by students.

Essay
Items such as short answer or essay typically require a test taker to write a response to fulfill the requirements of the item. In administrative terms, essay items take less time to construct. As an assessment tool, essay items can test complex learning objectives as well as the processes used to answer the question. The items can also provide a more realistic and generalizable task for the test. Finally, these items make it difficult for test takers to guess the correct answers and require test takers to demonstrate their writing skills as well as correct spelling and grammar.

The difficulties with essay items are primarily administrative. For one, these items take more time for test takers to answer. When these questions are answered, the answers themselves are usually poorly written because test takers may not have time to organize and proofread their answers. In turn, it takes more time to score or grade these items. When these items are being scored or graded, the grading process itself becomes subjective, as non-test-related information may influence the process. Thus, considerable effort is required to minimize the subjectivity of the grading process. Finally, as an assessment tool, essay questions may potentially be unreliable in assessing the entire content of a subject matter.

Mathematical questions
Most mathematics questions, or calculation questions from subjects such as chemistry, physics or economics, employ a style which does not fall into any of the above categories, although some papers, notably the Maths Challenge papers in the United Kingdom, employ multiple choice. Instead, most mathematics questions state a mathematical problem or exercise that requires a student to write a freehand response. Marks are given more for the steps taken than for the correct answer. If the question has multiple parts, later parts may use answers from previous sections, and marks may be granted if an earlier incorrect answer was used but the correct method was followed and an answer which is correct (given the incorrect input) is returned.
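The idea of granting marks for a correct method applied to an earlier incorrect answer can be sketched as follows. The two-part question (an area in part (a), then a volume obtained by multiplying that area by a height of 5 in part (b)), the one-mark-per-part scheme, and the function names are all hypothetical and chosen only for illustration.

```python
def mark_two_part(student_a, student_b, correct_a, method_b):
    """Return (marks_a, marks_b), with one mark available per part."""
    marks_a = 1 if student_a == correct_a else 0
    # Part (b) is judged against the student's own part (a) answer,
    # so a correct method still earns the mark after an earlier slip.
    marks_b = 1 if student_b == method_b(student_a) else 0
    return marks_a, marks_b

def volume_from_area(area):
    """Hypothetical part (b) method: volume = area x height, with height 5."""
    return area * 5

correct_area = 12  # hypothetical correct answer to part (a)

print(mark_two_part(12, 60, correct_area, volume_from_area))  # both parts right
print(mark_two_part(14, 70, correct_area, volume_from_area))  # slip in (a), method right in (b)
print(mark_two_part(14, 65, correct_area, volume_from_area))  # slip in (a), wrong method in (b)
```

Real marking schemes are usually richer than this, for example by awarding separate method and accuracy marks within each part.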

Higher level mathematical papers may include variations on true/false, where the candidate is given a statement and asked to verify its validity by direct proof or stating a counterexample.

Physical fitness tests
A physical fitness test is a test designed to measure physical strength, agility, and endurance. Such tests are commonly employed in educational institutions as part of the physical education curriculum, in medicine as part of diagnostic testing, and as eligibility requirements in fields that focus on physical ability such as the military or police. Throughout the 20th century, scientific evidence emerged demonstrating the usefulness of strength training and aerobic exercise in maintaining overall health, and more agencies began to incorporate standardized fitness testing. In the United States, the President's Council on Youth Fitness was established in 1956 as a way to encourage and monitor fitness in schoolchildren.

Common tests include timed running or the multi-stage fitness test (commonly known as the "beep test"), and numbers of push-ups, sit-ups/abdominal crunches and pull-ups that the individual can perform. More specialised tests may be used to test ability to perform a particular job or role.

Performance tests
A performance test is an assessment that requires an examinee to actually perform a task or activity, rather than simply answering questions referring to specific parts. The purpose is to ensure greater fidelity to what is being tested.

An example is a behind-the-wheel driving test to obtain a driver's license. Rather than only answering simple multiple-choice items regarding the driving of an automobile, a student is required to actually drive one while being evaluated.

Performance tests are commonly used in workplace and professional applications, such as professional certification and licensure. When used for personnel selection, the tests might be referred to as a work sample. A licensure example would be cosmetologists being required to demonstrate a haircut or manicure on a live person. The Group-Bourdon test is one of a number of psychometric tests which trainee train drivers in the UK are required to pass.

Some performance tests are simulations. For instance, the assessment to become certified as an ophthalmic technician includes two components, a multiple-choice examination and a computerized skill simulation. The examinee must demonstrate the ability to complete seven tasks commonly performed on the job, such as retinoscopy, that are simulated on a computer.

Test preparations
From the perspective of a test developer, there is great variability with respect to the time and effort needed to prepare a test. Likewise, from the perspective of a test taker, there is also great variability with respect to the time and effort needed to obtain a desired grade or score on any given test. When a test developer constructs a test, the amount of time and effort depends upon the significance of the test itself, the proficiency of the test taker, the format of the test, class size, the deadline of the test, and the experience of the test developer.

The process of test construction has been greatly aided in several ways. For one, many test developers were themselves students at one time, and therefore are able to modify or outright adopt test questions from their previous tests. In some countries such as the United States, book publishers often provide teaching packages that include test banks to university instructors who adopt their published books for their courses. These test banks may contain up to four thousand sample test questions that have been peer-reviewed and time-tested. An instructor who chooses to use such a test bank would only have to select a fixed number of test questions from it to construct a test.

As with test construction, the time needed for a test taker to prepare for a test depends upon the frequency of the test, the test developer, and the significance of the test. In general, nonstandardized tests that are short, frequent, and do not constitute a major portion of the test taker's overall course grade or score do not require the test taker to spend a great amount of time preparing. Conversely, nonstandardized tests that are long, infrequent, and do constitute a major portion of the test taker's overall course grade or score usually require the test taker to spend a great amount of time preparing. To prepare for a nonstandardized test, test takers may rely upon their reference books, class or lecture notes, the Internet, and past experience. Test takers may also use various learning aids to study for tests, such as flash cards and mnemonics. Test takers may even hire tutors to coach them through the process so that they may increase the probability of obtaining a desired test grade or score. Finally, test takers may rely upon past copies of a test from previous years or semesters to study for a future test. These past tests may be provided by a friend or a group that has copies of previous tests, or by instructors and their institutions.

Unlike with nonstandardized tests, the time needed by test takers to prepare for standardized tests is less variable and usually considerable. This is because standardized tests are usually uniform in scope, format, and difficulty and often have important consequences for a test taker's future, such as a test taker's eligibility to attend a specific university program or to enter a desired profession. It is not unusual for test takers to prepare for standardized tests by relying upon commercially available books that provide in-depth coverage of the standardized test or compilations of previous tests (e.g., 10 year series in Singapore). In many countries, test takers even enroll in test preparation centers or cram schools that provide extensive or supplementary instruction to help them better prepare for a standardized test. Finally, in some countries, instructors and their institutions have also played a significant role in preparing test takers for a standardized test.

Cheating on tests
Cheating on a test is the process of using unauthorized means or methods for the purpose of obtaining a desired test score or grade. This may range from bringing and using notes during a closed book examination, to copying another test taker's answer or choice of answers during an individual test, to sending a paid proxy to take the test.

Several common methods have been employed to combat cheating. They include the use of multiple proctors or invigilators during a testing period to monitor test takers. Test developers may construct multiple variants of the same test to be administered to different test takers at the same time. In some cases, instructors themselves may not administer their own tests but will leave the task to other instructors or invigilators, which may mean that the invigilators do not know the candidates, and thus some form of identification may be required. Finally, instructors or test providers may compare the answers of suspected cheaters on the test themselves to determine if cheating did occur.
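One of the measures mentioned above, constructing multiple variants of the same test, can be sketched in a few lines. This is only an illustration under stated assumptions: variants here differ solely in question order, and the question labels, the function name `make_variant`, and the use of integer seeds are hypothetical choices.

```python
import random

def make_variant(questions, seed):
    """Return a reordered copy of the questions; the same seed gives the same order."""
    rng = random.Random(seed)      # private generator, so results are reproducible
    variant = list(questions)      # copy, leaving the master list untouched
    rng.shuffle(variant)
    return variant

questions = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]

variant_a = make_variant(questions, seed=1)
variant_b = make_variant(questions, seed=2)

print("Variant A:", variant_a)
print("Variant B:", variant_b)
```

Because each variant is reproducible from its seed, a grader can regenerate the question order for any given variant when scoring it.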

Support and criticisms of tests
Despite their widespread use, the validity, quality, or use of tests, particularly standardized tests in education, has continued to be widely supported or criticized. Like the tests themselves, support for and criticism of tests are often varied and may come from a variety of sources such as parents, test takers, instructors, business groups, universities, or governmental watchdogs.

Supporters of standardized tests in education often provide the following reasons for promoting testing in education:


 * Feedback or diagnosis of test taker's performance
 * Fair and efficient
 * Promotes accountability
 * Prediction and selection
 * Improves performance

Critics of standardized tests in education often provide the following reasons for revising or removing standardized tests in education:


 * Narrows curricular format and encourages teaching to the test.
 * Poor predictive quality.
 * Grade inflation of test scores or grades.
 * Culturally or socioeconomically biased.

Other types of tests and other related terms

 * ordinary exam: an exam taken during the corresponding course;
 * sufficiency exam or examination for credit: an exam which should be taken as a way of getting official credits from the academic institution;
 * revalidation exam or equivalence exam: offering value for an exam previously taken in another institution;
 * extraordinary exam: an exam taken after the period of ordinary exams corresponding to the course.

International examinations

 * Abitur — used in Germany.
 * GCSE and A-level — Used in the UK except Scotland.
 * International Baccalaureate Diploma Programme — International examination.
 * International General Certificate of Secondary Education (IGCSE) — international examinations
 * Junior Certificate and Leaving Certificate — Republic of Ireland.
 * Matura/Maturita — used in Austria, Bosnia and Herzegovina, Bulgaria, Croatia, the Czech Republic, Italy, Liechtenstein, Hungary, Macedonia, Montenegro, Poland, Serbia, Slovenia, Switzerland and Ukraine; previously used in Albania.
 * Nationella prov — used in Sweden.
 * Standard Grade, Higher Grade, and Advanced Higher — used in Scotland