Sample Exam Procedures  

The following is a rough outline of generic procedures and topics that might be found in a professional licensure test development, administration and scoring manual.

Table of Contents

Job Analysis        

Rationale for Job Analysis

Construction of Item Banks

Item Bank Storage               

Size of Item Banks               

Item Bank Printing

Test Development

Item Development for Multiple Choice and True False Items

Item-writing rules for professional licensure examinations        

Restricted Response Items

Taxonomies of Questions

Item Construction               

Item Review (Preadministration)       

Grammar and Ambiguity    

Review for Technical Accuracy       

Preliminary Item Tryouts   

Item Review (Post administration)

Maintenance of Item Banks              

STAT Software

Item Longevity

Test Development Performance Examinations

Examiner Standardization (Performance Examinations)      

Test Assembly

Entry Level Competency

Cut Scores

Review of Literature

Criterion Referenced Tests

Angoff  

Angoff Item Bank               

Holistic Method  

Use of Criterion Groups     

Using Multiple Methods

Equating and Related Topics

Definition of Equating

Linear Equating

P-value Substitution

Examination Development Unit Equating Methods

Scheduling

Scheduling the Exam Site

Candidate Information Bulletin

Camera Ready Copy

Printing Exams

Determining the Number of Copies

Monitoring Printing            

Numbering Exams               

Exam Administration          

Proctor Hiring

Examiner Hiring

Shipping Materials

Candidate Examination Instructions

Receiving Test Materials

Selection of Test Site

Arrangement of Test Site

Preparation of Test Administration Site          

Pay and Travel Reimbursement Forms

Analysis of Examination Results

Preparing for Scanning

Scanning Reports

Comment Forms

Item Analysis

Key Changes

Rescoring (Angoff Changes)

Examination Summary Statistical Report

Grade Notification               

Design of Grade Notification

Grade Reviews

Grade Review Sessions

Grade Appeals

Examination Security

Staff  Recruitment

Statement of Confidentiality

Office Security

Disclosure Agreements

Destruction of Examination Material

Compromised Test Materials (state developed exams)

Internal Printing

External Printing  

Printing Supervisor             

Auditing Print Errors

Shipping

Incident Report

Job Analysis

A good job analysis is critical to the success of any examination development project because it lays out the plan and rationale for all that is tested. There are a variety of job analysis techniques and models available that vary in the type of task dimensions used. Most job task analysis models include two or more dimensions. The two most common dimensions are:

(1) frequency of task performance (on a six point scale)

(2) degree of importance

Rationale for Job Analysis

The rationale for completing a job task analysis is simple:

How can we accurately know what to test unless we scientifically sample the population of practicing professionals and determine the most important (risky) and most frequently performed tasks? On a professional level, the job task analysis is performed because it is highly recommended/required by the Standards for Educational and Psychological Tests, issued jointly by the American Psychological Association, the American Educational Research Association and the National Council on Measurement in Education.

 Construction of Item Banks

The first step in the construction of an item bank requires the use of an item construction form (see Appendix  ). This form is used for drafting a new item and ensures that item writers draft an item in the format specified by the examination unit office. The areas that must be completed are:

Board - Name of regulatory board.

Content Area - Cosmetology, Barber, etc.

Reference - The page, paragraph and sentence which supports the correct answer.

Date - The date the item was constructed.

Stem - The main body of the item.

Subtest - Color Processing Hair.

Task - Refers to the task number from the Job Analysis that supports the development of the item. 

Authors - The name of the person who wrote the item.

Key - The correct answer.

Cognitive Level - The level from Bloom’s Taxonomy (see Taxonomy in Table of Contents).

Distracters - The incorrect answers.
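The fields of the item construction form map naturally onto a simple record structure. The following is a minimal sketch only: the class name, field names, field types and the completeness check are assumptions made for illustration and are not taken from any particular item banking package.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ItemRecord:
    """One draft item, mirroring the fields of the item construction form."""
    board: str              # name of regulatory board
    content_area: str       # e.g. Cosmetology, Barber
    reference: str          # page, paragraph and sentence supporting the key
    date_written: date      # date the item was constructed
    stem: str               # main body of the item
    subtest: str            # e.g. Color Processing Hair
    task: int               # task number from the job analysis
    authors: str            # person who wrote the item
    key: str                # the correct answer
    cognitive_level: str    # level from Bloom's taxonomy
    distracters: list[str] = field(default_factory=list)  # incorrect answers

    def missing_fields(self) -> list[str]:
        """Return the names of fields that were left blank."""
        return [name for name, value in vars(self).items()
                if value in ("", None, [])]
```

A draft could be rejected at entry time whenever `missing_fields()` returns a non-empty list, which keeps incomplete items out of the bank.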

Item Bank Storage

There are numerous item banking software packages available. Most vendors of optical scanners offer item banking software for a reasonable price. Item banks should be stored on a secure file server, with access to individual banks granted by the network supervisor. The access codes are changed periodically, and the examination developer must request access from the network supervisor in order to get into a specific item bank.

Size of Item Banks

A survey of several states and national organizations was conducted to determine if there was a general rule of thumb for size of item banks.  The following information was collected.

Examination Jurisdiction: In-State Developed

Exam Banks (Average Current Size): 3 times the size of an examination

Exam Banks (Target Size): 5 times the size of an examination

Comments: Based on a survey of 10 state examination development organizations, the average size was 3 times the size of a single exam.

Examination Jurisdiction: National Examinations

Comments: Based on a survey of 22 national examinations, the average size was 10 times the size of a single exam.

Item Bank Printing

It may be necessary to have a copy of an entire item bank for review or test construction purposes. In such cases the Examination Consultant should request a copy using the Examination Development and Testing Item Bank Printout Request Form shown in Appendix ___.

Test Development

Two primary issues must be addressed by the test developer when a new test is assigned. The two issues deal with what to measure and how to measure it. The issue of what to measure is usually decided by a job task analysis. The job analysis is eventually refined into a test blueprint shown in Appendix. The test blueprint specifies the type of questions, the taxonomic level of questions and the number of questions used to measure each sub domain of information.

The decision of how to measure something falls more in the domain of professional judgment. The following sections deal with simple item formats (e.g., multiple choice) and some guidelines for development.

Item Development for Multiple Choice and True False Items

Item or question development is typically performed by subject matter experts (SME’s). However, item development may occasionally be performed by examination development staff. In either case there are some basic guidelines to be followed.

Item-writing rules for professional licensure examinations

1. The stem should be clear and concise with no extraneous information. The stem should present a self-contained question or problem.

2. For entry level competency examinations in professional licensure the stem should not be so long that it becomes a measure of reading comprehension.

3. The stem should contain as much of the question as possible so that the length of the answer choices can be as short as possible. The examinee should be able to easily scan the answer choices. Again, try to avoid measures that will load on the construct of reading comprehension.

4. Be sure that only one alternative represents the correct or best answer. Item writers will sometimes erroneously write an item that contains two or even more correct answers.

5. If possible, avoid negatively stated stems. Negatively formulated stems can be confusing to examinees. If an examinee who would have answered a positively phrased question correctly misses the negatively phrased version, then the test has measured traits extraneous to professional competency.

6. The stem should be grammatically consistent with the answer choices. The grammatical inconsistency in the example below provides a clue to the correct answer, because only choice d agrees with the plural stem.

Example:  Colors that are considered earth tones are:

a. red

b. blue

c. purple

d. brown and gray

7. The answer choices should be approximately the same length. If the correct answer is longer than the distracters the examinee may guess the correct answer.

8. The distracters should be plausible answers. Sometimes this is difficult because a subject area is simply too narrow in scope. For example, laws and rules examinations that deal with the penalties for various violations of professional practice may provide the item developer with few plausible distracters. The item developer cannot use “felony” as a distracter for a simple failure to post a license. When situations like this occur it is necessary to be creative.

9. For multiple choice and true false questions the position of the correct answers should be approximately equally distributed across all answer choices. In a 100 item four-choice examination there should be approximately 25 items with the answer a., 25 items with the answer b. and so on. Further, the correct answers should be randomly distributed so that there are not long sequences of items with the same correct answer choice.

10. Unless it is unavoidable, avoid alternatives such as “none of the above” or “all of the above.” In many instances multiple-choice writers employ phrases such as “none of the above” because they’ve run out of ideas for plausible distracters and want to create a final option. The item developer must consider whether it is more important to reduce the probability of guessing correctly or to create an item that is less than ideal.
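The rule about distributing correct answers evenly and avoiding long runs of the same key can be checked mechanically once a form is assembled. The sketch below is illustrative only: the function names and the sample key are hypothetical, and what counts as a "long" run is left to the test developer.

```python
from collections import Counter
from itertools import groupby


def key_balance(keys):
    """Count how often each answer position is keyed as correct."""
    return Counter(keys)


def longest_run(keys):
    """Length of the longest run of identical consecutive keys."""
    return max((len(list(group)) for _, group in groupby(keys)), default=0)


# Hypothetical 16-item answer key, one letter per item in form order.
answer_key = list("abcdabdcbbbbacda")

print(key_balance(answer_key))   # counts per answer position
print(longest_run(answer_key))   # longest streak of one repeated key
```

A form whose counts are badly unbalanced, or whose longest run looks conspicuous, can be fixed by reordering the answer choices within items rather than by replacing items.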

Restricted Response Items

It is frequently assumed that writing free response type questions will create difficulty with scoring rubrics and ultimately reliability. However, the restricted response item allows the developer to specify the parameters of the response in terms of form and content. By setting strict parameters (e.g., responses no longer than 5 words) the effort to create scoring rubrics and to achieve reasonable levels of reliability becomes a reasonable task. In most cases it will still be necessary to create simple scoring rubrics such as listing sample responses and the associated score for each response. When compared to a four answer multiple choice question this type of question immediately reduces the error associated with the guessing factor by at least 25 percent.

Taxonomies of Questions

Test questions can be structured to address various levels of cognitive activity. These levels have been categorized into cognitive taxonomies. In general these taxonomies are hierarchically arranged so that higher levels of the taxonomy are related to higher levels of cognitive functioning. The relevance of cognitive taxonomies to item development is that good test specifications and/or test blueprints should specify how many questions should be developed within each taxonomic level.

There are two primary taxonomies used today.  Bloom 1970 describes a six level taxonomy. Gagne 1973 describes a five level taxonomy.  The Bloom taxonomy is perhaps the most popular and will be described below.  The reader is cautioned that the exact demarcation between the various taxonomic levels can be difficult to specify and is the fuel for intellectual debate.

1. Knowledge - refers to recall of information. Example: List the bones of the foot.

2. Comprehension - refers to interpretation or translation of cognitive information given certain rules. For example: give the square root of 10,000.  Obviously, the taxonomic level of a question is dependent to a certain degree on the cognitive status of the individual receiving the question.  Consider the example just given.  If the individual already knew that 100 was the square root then the taxonomic level of the question would be recall.  But if rules for determining the square root had to be applied then the question would be at the comprehension level.

3. Application - involves solving problems through the use of principles or generalizations. For example: find the volume of a barrel that has a circumference of three feet and a height of three feet.

4. Analysis - involves breaking down a problem into its component elements and identifying relationships among these elements. For example: identify a bacterium given its DNA structure, physical characteristics and reaction to certain antibodies.

5. Synthesis - requires the combination of several elements, rules or principles. For example: write a computer program that will amortize a loan and that uses at least 25 different programming principles.

6. Evaluation - uses many different cognitive skills to create or self generate a novel piece of information. For example: write a procedure manual for a test development and test administration unit.

Item Construction

The test developer’s task requires two major types of decisions - what to measure and how to measure it. During item construction the latter type of decision must be addressed. Developing a pool of items to measure a construct entails the following activities.

1. Selecting an appropriate item format

2. Verifying that the proposed format is feasible for the intended examinees

3. Selecting and training the item writers

4. Writing the items

5. Monitoring the progress of the item writers and the quality of the items.

Item Review (Preadministration)

Grammar and Ambiguity

As test items are drafted, it is advisable for the test developer to ask qualified colleagues (e.g. Examination Technicians) to review them informally for wording, grammar, ambiguity, and other technical flaws. “Problem” items can then be revised as necessary. The review should proceed on an item-by-item basis. Important aspects of item construction which should be considered include:

1. Potential Bias (any wording that would be offensive to any racial or ethnic group)

2. Appropriateness or relevance to test specification (when available)

3. Technical item-construction flaws

4. Grammar

5. Level of readability (most word processing software can give readability ratings)  

Review for Technical Accuracy

This review is typically conducted with subject matter experts (SME’s).  The items are reviewed by a technical consultant or a committee of SME’s.  It is helpful to have current text books and technical dictionaries present for use in verifying the accuracy of answers.

Preliminary Item Tryouts

Before the test developer has used new items in a test it is a good idea to try out the items on a sample of examinees. This can be accomplished by two methods. The first is administering items at the end of a regular test administration and identifying the items as trial items that will not be scored. The second method is embedding items in an examination and instructing the computer which items not to score. Whichever method is used would typically be a policy decision.

The minimum number of subjects needed to field test an item will vary with the item complexity and the level of homogeneity of the candidate population. However, 15 to 20 subjects will be sufficient in most cases.

Item Review (Post administration)

The post examination item review process utilizes a set of guidelines that relate to the statistical results of an item analysis. The item analysis form used by the examination unit office presents a difficulty index and a discrimination index. The difficulty index simply tells the examination developer the percent of candidates that got an item correct. For example, if sixty percent of the candidates got an item correct the difficulty index would be shown as .60. The discrimination index indicates the ___________.
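As a rough illustration of the two statistics, the sketch below computes a difficulty index as the proportion correct and a discrimination index using the upper-lower group method. The upper-lower definition and the 27 percent group size are assumptions adopted here for illustration; they are one common convention and are not necessarily what the examination unit's item analysis form reports. The function names and sample data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of candidates answering the item correctly (1 = correct)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination: p(upper group) minus p(lower group).

    Groups are the top and bottom `fraction` of candidates by total score.
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    # Candidate indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return sum(upper) / k - sum(lower) / k


# Hypothetical results for ten candidates on one item.
item_scores = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
total_scores = [95, 90, 88, 85, 80, 75, 70, 65, 60, 55]

p = difficulty_index(item_scores)
d = discrimination_index(item_scores, total_scores)
```

Under this convention a positive value means high scorers answered the item correctly more often than low scorers, and a value near zero or below would flag the item for review.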

Maintenance of Item Banks

Item Banking Software

A variety of software is available for the purpose of storing examination information.

Item Longevity

For security reasons items should be rotated so that they are not used on examination forms consecutively. The reason is very intuitive. If an examinee retakes the examination at the next administration the candidate will be presented with test items they are already familiar with. This could cause a candidate with sub-competent skill levels to be misclassified. The practical longevity of an item will vary from subject area to subject area. For example, the subject domain of taxes on the Certified Public Accountants Examination will change each year. However, in the case of the domain of “tapered hair cuts” on the barbers exam, the domain of knowledge may be fairly stable over time. The decision of when to have items reviewed by subject matter experts is a professional judgment decision. Typically this decision should be made by the subject matter experts who develop the items.

Examiner Standardization (Performance Examinations)

Prior to any performance examination it is necessary to secure examiners.  This process is usually conducted by the Senior Examination Consultants.  Typically a letter is sent out to potential examiners based upon an approved list furnished by the board office.  A sample copy of this letter is contained in Appendix ____.

The standardization process will vary from one examination to the next. However, the basic process involves assisting each examiner to standardize the scoring scheme in their mind with the scoring scheme that other examiners have in their minds. A common strategy for doing this involves presenting a hypothetical candidate performance to be scored. After all examiners score the hypothetical example they discuss their scores and try to eliminate any variability between examiners. This process is repeated several times to allow for the inclusion of various types of candidate performance similar to performance that would be seen on the examination. An analysis of examiner performance can be generated by the Operation Analyst Associate which will allow the Examination Consultants to assist examiners with evaluating their own performance.
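One simple way to summarize examiner performance after a standardization round is to compare each examiner's scores with the panel average for the same hypothetical performances. The sketch below is only one possible analysis and is not a description of the report actually produced by the Operation Analyst Associate; the function name, examiner labels and scores are hypothetical.

```python
from statistics import mean


def examiner_deviations(scores):
    """Mean signed deviation of each examiner from the panel mean.

    `scores` maps examiner name -> list of scores, one per performance,
    with every examiner scoring the same performances in the same order.
    Positive values suggest leniency, negative values suggest severity.
    """
    examiners = list(scores)
    n_perf = len(next(iter(scores.values())))
    panel_means = [mean(scores[e][j] for e in examiners)
                   for j in range(n_perf)]
    return {
        e: mean(scores[e][j] - panel_means[j] for j in range(n_perf))
        for e in examiners
    }


# Hypothetical scores from three examiners on three sample performances.
round_one = {
    "Examiner A": [80, 70, 90],
    "Examiner B": [70, 60, 80],
    "Examiner C": [75, 65, 85],
}
deviations = examiner_deviations(round_one)
```

Repeating the calculation after each discussion round shows whether the deviations are shrinking toward zero as the examiners converge.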

Performance examinations present some of the most complex and intractable psychometric problems that exist in professional licensure examinations.  Many and perhaps most of these problems center around error in measurement caused by rater bias.  Rater bias (i.e., rater error) and methods for dealing with rater error are discussed in Helson, 1947, Guilford, 1954, Hecht, 1979 and Seigle, 1979.

Test Assembly

Entry Level Competency

Typically, professional licensure examinations are created to satisfy the requirements of a practice act or title act created to protect the public's health and safety. The proficiency level of the examination should be set at the level specified through the cut score setting procedure. The score distribution of most competency exams is generally negatively skewed, with the scores grouped toward the higher end of the score scale. This distribution shape is not always the case, especially with certain national examinations.

Cut Scores

Review of Literature

A review of the measurement literature reveals that 30 to 40 different methods of standard setting have been described (Behuniak, Archambault, and Gable, 1982). These 30 to 40 methods can be broken down into three major categories: norm‑referenced, criterion‑referenced, and decision‑theoretic standards. (For a thorough discussion of the first two types, see Livingston and Zieky [1982]; for a discussion of standard setting in decision theory, see Swaminathan, Hambleton, and Algina [1975]).

Criterion Referenced Tests

The decision‑theoretic approach to standard setting is very complex and has received little real life application. The norm-referenced standards apply primarily to aptitude measures such as college entrance tests and, therefore, are not very relevant to the present discussion of competency determination in personnel selection and professional certification and licensure. This discussion will focus on standard setting methods related to criterion‑referenced testing (CRT) devices. This includes all measurement devices where standards can be reported in terms of specific performance outcomes, such as running a 100 yard dash in 13 seconds or learning seven out of ten spelling words.

Angoff

Within the realm of criterion‑referenced testing, there are three major categories of standard setting approaches: 1) judgments based on item content, 2) judgments based on holistic impression, and 3) judgments based on performance of groups of examinees. Judgments based on item content are the most frequently used and studied. An example of this type of standard setting, and the one most widely used, was suggested by Angoff (1971). With this method, judges are instructed to think of a group of minimally competent persons. Then judges review all test items and are asked to estimate the proportion of the minimally competent group who would answer each item correctly. The proportions are then summed for each judge and the average of all judges' ratings becomes the minimum passing score. For example:

Item Number     Judge #1     Judge #2
1               .70          .80
2               .30          .40
3               .80          .90
4               .95          .90
Avg.            .69          .75

Avg. of Both Judges = .72   (Consensus Judgment)
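The arithmetic of the Angoff procedure can be sketched in a few lines. The ratings below are the eight item ratings from the example above, and every average is recomputed directly from them; the function and variable names are hypothetical, and the sketch covers only the basic procedure without any local modifications.

```python
# Item ratings from the example: the estimated proportion of minimally
# competent candidates who would answer each of the four items correctly.
ratings = {
    "Judge #1": [0.70, 0.30, 0.80, 0.95],
    "Judge #2": [0.80, 0.40, 0.90, 0.90],
}


def angoff_cut_score(ratings):
    """Return (mean rating per judge, overall proportion-correct cut score)."""
    judge_means = {judge: sum(r) / len(r) for judge, r in ratings.items()}
    cut = sum(judge_means.values()) / len(judge_means)
    return judge_means, cut


judge_means, cut = angoff_cut_score(ratings)
n_items = len(ratings["Judge #1"])
raw_cut = cut * n_items  # the same standard expressed as items correct
print(judge_means, round(cut, 2), round(raw_cut, 2))
```

Multiplying the proportion by the number of items gives the same standard as summing each judge's ratings and averaging the sums, which is how the procedure is worded in the text.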

It is not unusual for this methodology to be modified slightly. The examination consultants within the examination development unit have indicated that personal preferences have been added to this process.

The Angoff information is gathered on the Standard Setting Procedure form developed for each examination.  A copy of a sample form is shown in Appendix __.

Angoff Item Bank

Senior Examination Consultants may request a copy of the Angoff Item Bank for any examination. This report (shown in Appendix ____) helps the Consultants construct tests by showing Angoff values for each item within a subtest.

Holistic Method

The next method involves holistic impressions of competency, expressed as a percentage of items answered correctly. This percentage‑correct impression is in turn based on a judge's familiarity with the domain covered by the test items and the relative difficulty of the content area. To use this method, a panel of judges would simply render an opinion regarding what percentage of items should be correctly answered by an examinee who has attained the minimum level of competency to perform a given job or provide a given professional service. The average of the group's estimates then becomes the standard. This procedure is not used by the examination development unit.

Use of Criterion Groups

The final category is generally the most reliable and informative of the three methods and involves results of actual examinee performance. The most popular application of this method involves running trial administrations on actual examinees. For example, one might observe how a group of cosmetology certification candidates performs on a newly developed certification test and then, based on the distribution of scores, set the cut‑off. Although this approach is used quite frequently, it is in conflict with the concept of setting a standard based on some specified level of performance such as knowing 70% of a given domain. By waiting to see how a group of individuals actually does, one has in effect reverted to norm-referenced standard setting, because the performance of the group sets the standard rather than a performance criterion set a priori. In fact, the group of candidates that appeared for the first administration of the test may not even be representative of the population of candidates, and then the standard becomes quite ethereal.

Another criterion group method for setting a standard when using criterion referenced tests is to administer the test to a group of individuals identified by subject matter experts (SME’s) to be below a competence criterion and a group of individuals identified by SME's to be above the competence criterion. To illustrate, assume that a test for licensing plumbers has been developed. One might simply administer the test to a representative sample of first‑year plumbing students from state vocational programs and at the same time administer the test to a group of licensed plumbers who are recognized as accomplished practitioners.

The test should be able to discriminate between these two groups; if it does not, then the validity of the test should be reevaluated. If the test does discriminate between the two groups, then the standard is set between the performance of the two groups. Again, professional judgment of SME’s must be used for the final cut score placement. The importance of this method is that it uses two reference points, a known competent group and a known incompetent group, to target where competency lies on the ability scale set by the test.

It is very easy to set a cutoff score on any test that will pass incompetent individuals and it is equally easy to do the reverse. By having two competency levels represented during pretesting, the decision‑makers can usually feel sure that they are not going to be embarrassed when scores are finally reported.
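The two-group approach can be illustrated by searching for the score that separates the groups with the fewest misclassifications. This is only one way of locating a starting point between the two groups, chosen here for illustration; the final placement remains a matter of SME judgment. The function name and the scores are hypothetical.

```python
def contrasting_groups_cut(below_scores, above_scores):
    """Cut score that misclassifies the fewest people across both groups.

    A score >= cut counts as a pass. When several cuts tie, the lowest
    one is returned.
    """
    candidates = sorted(set(below_scores) | set(above_scores))

    def errors(cut):
        false_pass = sum(s >= cut for s in below_scores)  # not competent, passed
        false_fail = sum(s < cut for s in above_scores)   # competent, failed
        return false_pass + false_fail

    return min(candidates, key=errors)


# Hypothetical percent-correct scores for the two groups.
first_year_students = [48, 52, 55, 60, 63, 66, 71]
licensed_plumbers = [62, 70, 74, 78, 81, 85, 90]

cut = contrasting_groups_cut(first_year_students, licensed_plumbers)
```

Treating both kinds of error as equally costly is itself an assumption; a board that is more concerned about passing incompetent candidates could weight false passes more heavily, which would move the cut upward.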

Using Multiple Methods

Finally, it should be pointed out that a standard setting methodology simply provides an estimate of the true standard. In most licensure and personnel selection settings one will rarely know exactly where the true standard is. For this reason one should use many different methods for estimating what the standard should be. The decisions in numerous court cases are very specific on this point. The interested reader is encouraged to refer to the cited references prior to setting any standard that will have an impact on another person's employment or professional life.

Equating and Related Topics

Definition of Equating

Equating is the process of establishing equivalent scores on two instruments (e.g., test A and test B). For scores to be equivalent on test A and test B they must: measure the same trait, have the same reliability, and have equivalent percentile ranks corresponding to scores. This definition is applicable to linear equating and equipercentile equating.

The practical purpose of equating for professional licensure examinations is to allow regulatory officials to place the performance of different groups of candidates on the same scale. For the examinee it means that they will be given an examination that is basically the same difficulty level as previous administrations.

Linear Equating

Perhaps the most commonly used linear equating model in professional licensure testing is one in which candidate group 1 is administered an anchor test and group 2 is administered the same anchor test. Many times the anchor test is simply a group of anchor (common) items administered within different tests.

 To examine the assumptions for this model assume that group 1 and group 2 are drawn from the same population.  The primary assumptions of linear equating are:

1. The slope, intercept and standard error of estimate for the regression of group 1 scores to the anchor test scores are equal to the slope, intercept and standard error for the regression of group 1 scores to the anchor test scores in the total population.

2. Same as above except using scores from group 2.

The degree to which these assumptions hold true is dependent on how representative group 1 and group 2 are of the total population. If either of these groups is not representative of the total population of candidates from which it is drawn, then the assumptions are not supported.

For a technical discussion of this model see Angoff (1971).
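The core idea of linear equating, placing a score from one form at the same standardized distance from the mean on the other form, can be sketched as below. This is a deliberately simplified mean and standard deviation version that assumes the two forms were taken by randomly equivalent groups; it is not the anchor-test regression model described above, which requires the additional estimation steps covered in the cited reference. The function name and score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Map a Form X score onto the Form Y scale by matching mean and SD.

    Solves (y - mean_y) / sd_y = (x - mean_x) / sd_x for y.
    """
    mean_x, mean_y = mean(form_x_scores), mean(form_y_scores)
    sd_x, sd_y = pstdev(form_x_scores), pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Hypothetical raw scores from two small candidate groups.
form_x = [60, 70, 80]
form_y = [50, 70, 90]

equated = linear_equate(80, form_x, form_y)
```

In this made-up data Form Y scores are twice as spread out as Form X scores, so a Form X score one standard deviation above its mean maps to the Form Y score one standard deviation above the Form Y mean.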

P-value Substitution

Perhaps one of the most commonly used methods for controlling the difficulty levels of examinations for locally developed professional licensure examinations is known as p-value substitution.  Although it is used in practice, this method is not referred to frequently in the psychometric literature.  Purer mathematical methods of equating (e.g., Angoff IV) would  always be preferred.  However, this method may be preferable to no control of examination difficulty. 

For p-value substitution, the items selected should be chosen so that they produce an average p value consistent with the average p value of the item bank that was used for setting the cutoff score. Since the item bank was the reference point that the cut score setting committee used in establishing standards, it is important that the examination be a reflection of the item bank. Failure to follow this method or another difficulty adjusting method could result in the difficulty level of each examination varying widely.
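A minimal sketch of p-value substitution is shown below: items on a draft form are swapped with items from the remaining pool until the form's average p value is close to the target, which would be the average p value of the item bank used in cut score setting. The greedy swapping strategy, the tolerance and all names and values are assumptions for illustration. The sketch also ignores the test blueprint; in practice a swap would only be made between items measuring the same subtest and task.

```python
def mean_p(items):
    """Average p value of a list of (item_id, p_value) pairs."""
    return sum(p for _, p in items) / len(items)


def substitute_to_target(form, pool, target, tolerance=0.01):
    """Swap one item at a time until the form mean is within tolerance.

    Each pass tries every (form item, pool item) swap and keeps the one
    that brings the form mean closest to `target`.
    """
    form, pool = list(form), list(pool)
    while abs(mean_p(form) - target) > tolerance:
        current_gap = abs(mean_p(form) - target)
        best = None
        for i in range(len(form)):
            for j, new in enumerate(pool):
                trial = form[:i] + [new] + form[i + 1:]
                gap = abs(mean_p(trial) - target)
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        if best is None or best[0] >= current_gap:
            break  # no available swap improves the match
        _, i, j = best
        form[i], pool[j] = pool[j], form[i]
    return form


# Hypothetical draft form that is easier than the bank average of .75.
draft = [("A", 0.90), ("B", 0.85), ("C", 0.80), ("D", 0.75)]
pool = [("E", 0.55), ("F", 0.60), ("G", 0.70)]

final_form = substitute_to_target(draft, pool, target=0.75)
```

The form keeps its original length, so only the difficulty changes, not the number of items.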

Examination Development Unit Equating Methods

Currently equating methodology for the examination development unit is being developed.  However, if an examination is designated for this research project the examination consultant must designate a common set of anchor questions for each examination.  The second STAT screen field designates whether an item is an anchor item.  The designation is Cor. meaning that it is a core item.

Scheduling

Scheduling the Exam Site

The examination technician should try to estimate the number of candidates that will be taking a given examination so that a site with an appropriate capacity can be selected.  The Board Office may be able to assist with this estimation based on the list of candidates that are declared eligible to sit for the examination.

When the number of candidates for an exam has been estimated the information is given to the Administrative Secretary to the Executive Director. The secretary then ensures that the site is rented, or reserved in cases where no rental fee is required.

Admission Letters

Admission letters containing the date, time, location and other pertinent information must be mailed to the candidates 14 days in advance of the examination. There is one exception: Landscape Architect, for which notice must be sent 6 weeks in advance. A candidate information bulletin is also enclosed with the admission letter.

Candidate Information Bulletin

At least one month prior to the examination the process of reviewing, revising or developing the Candidate Information Bulletin must be started.  Fourteen days prior to the examination (with the exception of Landscape Architects) the Candidate Information Bulletin must be sent to the examinee.  This bulletin is prepared by the Senior Examination Consultant in conjunction with the board office and the Examination Technicians.  This bulletin covers a wide variety of subjects including: getting to the examination, sample questions, and score reporting.  A sample copy of a bulletin is shown in Appendix ___.

Camera Ready Copy

Two weeks prior to the exam date a camera ready copy of the exam must be prepared by the examination technician. This process involves getting a magnetic copy of the exam draft from the examination consultant. Note that the magnetic disk is actually obtained from the Operations Analyst Associate but the Examination Consultant will have to be contacted to make sure it is ready for proofing. When the disk is checked out the Examination Technician must sign the “Diskette Check Out Form” shown in Appendix _____. The technician then proofs, formats and makes any corrections necessary. This step must be done with enough time to permit a camera ready copy two weeks in advance of the administration. The proofed copy is formatted in Word Perfect and then returned to the Examination Consultant.

 Printing Exams

Determining the Number of Copies

The camera ready copy is carried to the copy room and copied based upon the number of candidates scheduled to appear plus 10% extra. The Examination Technician should be present during printing.

Monitoring Printing

A printing counter must be checked prior to beginning copying. The “beginning count” and “ending count” must be entered on the printing control form. Any extra sheets of paper (typically due to misprints) must be accounted for on the printing control form. These pieces of paper must be destroyed by shredding whenever they contain test items.
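The copy count and the counter reconciliation are simple arithmetic, sketched below. Rounding the 10 percent extra up to a whole booklet is an assumption, as are the function names and the sample figures; the printing control form itself may record the counts differently.

```python
def copies_to_print(candidates, extra_percent=10):
    """Scheduled candidates plus extra copies, rounded up to whole booklets."""
    extra = (candidates * extra_percent + 99) // 100  # integer ceiling
    return candidates + extra


def unaccounted_sheets(beginning_count, ending_count, good_sheets, misprints):
    """Sheets the counter recorded that are neither good copies nor logged misprints."""
    return (ending_count - beginning_count) - good_sheets - misprints


# Hypothetical print run: the counter advanced by 550 sheets.
booklets = copies_to_print(45)
missing = unaccounted_sheets(1200, 1750, good_sheets=540, misprints=10)
```

Any non-zero result from the reconciliation means sheets were printed that do not appear on the control form and should be traced before the job is closed.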

Numbering Exams

After the examinations have been printed they are numbered with a numbering stamp.  The numbering process should be done in a secure area.  When the numbering is complete all exams should be locked in the vault.

Exam Administration

Proctor Hiring

At the beginning of each month the Examination Technician must secure the proctors for the entire month.  The proctors are ordered from Crown Services, which has the contract for securing proctors.  The representative at Crown Services must be told how many proctors will be needed for each examination and the dates and locations for each examination.  Requests for specific individual proctors will be accommodated by Crown whenever possible.

Examiner Hiring

Prior to the examination (typically 1 month) examiners must be hired.  Willingness to participate as an examiner is identified through a questionnaire.  A sample questionnaire is shown in Appendix ____.

Shipping Materials

All test booklets should be shipped to the site or back from the test site with the following materials:

a. answer sheets (see sample answer sheet in Appendix ___ )

b. timer

c. exam booklets

d. tape

e. material for making signs

f. black markers

g. candidate rosters

h. proctor lists

i. proctor training materials

Candidate Examination Instructions

Prior to the designated starting time the examination instructions should be read to the candidates.  A sample copy of examination instructions is shown below.

Receiving Test Materials

All test materials must be audited when they arrive from the test site or when they arrive from national test vendors.

When test materials arrive via common carrier (e.g., FedEx) they should be audited according to the examination development unit's packing list.  This audit procedure is not necessary if the materials were audited at the test site and transported from the site by the examination technician.

When test materials arrive from a test vendor they should be audited according to the vendor's packing list.

All audit forms must be signed and dated by the examination technician.

If secure test materials (i.e., test booklets) have to be stored overnight at a test site, they must be kept in a locked room with a padlock on the storage trunk.  The examination technician shall be the only person authorized to hold combinations (no keys) to secured materials and to open the storage trunks holding secure materials.  Nonsecure materials such as pencils may be stored in any manner and can be accessed by all test personnel (e.g., proctors).  For more information see the section of this manual titled examination security.

Any suspected discrepancies in the audit forms should be verified immediately.  If discrepancies cannot be explained they should be reported to the Senior Executive Director of the examination development unit.
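The audit against a packing list amounts to comparing expected and received quantities item by item.  The sketch below shows one way to do that; the function name, the dictionary layout and the sample quantities are hypothetical and are not taken from the unit's actual audit form.

```python
def audit_shipment(packing_list: dict[str, int],
                   received: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Compare a packing list with what was actually received.

    Returns {item: (expected, actual)} for every item whose counts differ,
    including items missing from either side (counted as 0 on that side).
    """
    discrepancies = {}
    for item in sorted(set(packing_list) | set(received)):
        expected = packing_list.get(item, 0)
        actual = received.get(item, 0)
        if expected != actual:
            discrepancies[item] = (expected, actual)
    return discrepancies


if __name__ == "__main__":
    # Illustrative quantities only.
    packing = {"exam booklets": 55, "answer sheets": 60, "timer": 1}
    counted = {"exam booklets": 54, "answer sheets": 60, "timer": 1}
    for item, (expected, actual) in audit_shipment(packing, counted).items():
        print(f"{item}: expected {expected}, received {actual}")
```

An empty result means the shipment matches the packing list; anything else is a discrepancy to verify immediately.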

Selection of Test Site

Test sites should have the following:

a. adequate lighting

b. adequate air-conditioning and heating systems that can be controlled

c. adequate secure storage areas (limited access to building manager and the examination technician)

d. a work area for proctors

e. adequate rest room facilities

f. tables of an appropriate height for writing (for written exams)

g. adequate traffic flow control

h. adequate noise control

i. a confirmed schedule of events so that potential noise problems can be anticipated

j. access to telephones for emergencies

k. adequate parking facilities

l. adequate access for disabled candidates

m. available water fountains

n. proximity to a backup test site whenever possible

Arrangement of Test Site

For written exams the tables shall be arranged so that there is one candidate per six foot table or two candidates per eight foot table.  Candidates at eight foot tables should be placed at opposite corners.  Tables should be spaced with at least 48 inches between tables.
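The seating rule above (one candidate per six foot table, two per eight foot table) can be turned into a quick capacity check when planning a room.  This is a minimal sketch; the function names are hypothetical and the table counts in the example are illustrative.

```python
def seating_capacity(six_foot_tables: int, eight_foot_tables: int) -> int:
    """Seats available: one per six foot table, two per eight foot table."""
    return six_foot_tables * 1 + eight_foot_tables * 2


def seats_short(candidates: int, six_foot_tables: int, eight_foot_tables: int) -> int:
    """Return how many candidates cannot be seated (0 if the room is large enough)."""
    return max(0, candidates - seating_capacity(six_foot_tables, eight_foot_tables))


if __name__ == "__main__":
    print(seating_capacity(10, 5))   # 10 + 2 * 5 seats
    print(seats_short(24, 10, 5))    # candidates beyond the available seats
```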

Seating arrangements should be planned at least one week before the test administration.  Plans should include accommodations for disabled candidates and late arrivals.  Seating arrangements should be posted at the test site for the convenience of proctors.

All seats should be labeled in advance of candidate arrival.

Preparation of Test Administration Site

The test site should be given a final inspection at least 24 hours prior to administration.  Factors to be inspected are included in the Test Site Inspection Check List in Appendix ____.

Proctor  training and orientation should cover the following:

Proctors may talk to candidates quietly and evaluate issues such as an incorrect test booklet so that such issues may be reported.  However, proctors should not assist candidates by reading or interpreting test booklets.

Proctors should provide candidates with Test Remark Forms so they can raise issues about the accuracy of a test question.

Pay and Travel Reimbursement Forms

Prior to leaving the test site the examination technician must secure completed Examiner Reimbursement forms and travel forms for all staff. A sample copy of an examiner reimbursement form is shown in Appendix __.

Analysis of Examination Results

Preparing for Scanning

Most national examinations require that test materials be sent back to the national office within two days.  With some national examinations this time period is three days.  The specific time period should be identified in the examination bulletin prepared by the national examination office.

For examinations developed by the examination development unit, the machine readable answer sheets (i.e., bubble sheets) must be separated and sorted prior to scanning.  Scanning services are provided by the Operations Analyst Associate.  Typically scanning and item analyses must be prepared for the consultants within two weeks.  The answer sheets, answer key and item analysis report are given back to the examination technician for distribution to the examination consultant.  For large examinations (info needed), ten percent of all answer sheets must be hand scored by the examination technician to identify any scanning problems.  For smaller exams or exams with very complicated scoring schemes all answer sheets should be hand scored.
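Drawing the ten percent of answer sheets to hand score could be sketched as follows.  The manual does not say how the sheets are chosen, so random selection is an assumption here, as are the function name, the candidate numbering and the rounding up of the sample size.  The cutoff for a "large" examination is not given in the manual and is deliberately left out.

```python
import random


def hand_score_sample(sheet_ids, percent: int = 10, seed=None) -> list:
    """Pick a random subset of answer sheets for hand scoring.

    Assumptions: selection is random (the manual does not specify a method)
    and the sample size is rounded UP to a whole sheet.
    """
    ids = list(sheet_ids)
    size = (len(ids) * percent + 99) // 100  # ceiling of percent% of the sheets
    rng = random.Random(seed)                # a seed makes the draw repeatable
    return sorted(rng.sample(ids, size))


if __name__ == "__main__":
    # 250 illustrative candidate numbers; 25 sheets are drawn.
    sample = hand_score_sample(range(101, 351), seed=1)
    print(len(sample), "sheets to hand score")
```

Recording the seed alongside the sample would let someone else reproduce the same draw later.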

Scanning Reports

A scanning report is also provided to the examination technician which identifies any irregularities in the answer sheets.  For example, the report may show that for candidate 101 the response associated with item 27 was scored as an error because the bubbles for two answers were shaded.  When this occurs it is important to check the answer form for candidate 101 to verify that two answers were shaded.  It is possible that a stray mark or piece of trash has covered one of the bubbles, causing a scanning error.  This scanning error is recorded as an "m" on a form provided by the Operations Analyst Associate to the examination technician.  A scan report showing a "U" means that the item was not answered by the candidate.
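As an illustration of working through the scan report, the sketch below separates the "m" entries (which need the answer sheet pulled and checked) from the "U" entries (unanswered items).  The record layout of (candidate, item, code) is hypothetical; the actual report format is not described in this manual.

```python
def split_scan_flags(scan_flags):
    """Split scan report entries by code.

    scan_flags: iterable of (candidate_id, item_number, code) tuples
    (a hypothetical layout).  Returns two lists of (candidate_id, item_number):
    entries coded "m" that need a visual check, and entries coded "U".
    """
    to_verify = []
    unanswered = []
    for candidate_id, item_number, code in scan_flags:
        if code.upper() == "M":
            to_verify.append((candidate_id, item_number))
        elif code.upper() == "U":
            unanswered.append((candidate_id, item_number))
    return to_verify, unanswered


if __name__ == "__main__":
    flags = [(101, 27, "m"), (102, 5, "U"), (103, 9, "m")]
    to_verify, unanswered = split_scan_flags(flags)
    print("check these sheets by hand:", to_verify)
    print("left blank by candidate:", unanswered)
```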

When a scan error occurs the candidate's score must be adjusted if the error changes the original score.  The scan error must be reported to the (info needed) for processing.

Comment Forms

While hand scoring is being accomplished by the examination technician the examination consultant will review the comment forms.  If the comment form indicates that an item is technically incorrect the item may be referred to an SME for verification.  If the item is determined to require a key change the (info needed) will rescore the examination.  If examination results have already been distributed to candidates the new results will be computed and subsequently mailed to those examinees affected by the change.  Hand scoring is also performed with the new key.

Item Analysis

The item analysis report for an examination is requested by the examination technician and sent to the examination consultant.   The following is a sample report for a single test item.  The shaded area is not a part of the report but serves to define the information in the rows.  

Sample Item Analysis for ITEM #1

Answer Choice                    A     B     C     D     E     U     M     DiscIdx   Cut 60

Number of Examinees Responding   13*   1     7     9                 1     .19

Mean Score of Examinees Responding

86

82

81

86