Design principles for “final answer” linear algebra assessment
So-called “final answer” questions are a boon to graders because they are fast and easy to grade. This form of assessment, in which the final answer earns either all of the points or none, can be perceived as unfair. We, as teachers, have a responsibility to design only exercises that can be fairly assessed in this way. To help us create a set of design principles for such exercises, we carried out a study comparing two forms of grading.
The linear algebra exams at the University of Twente typically have at least 30% of the points allocated to final answer questions. Because points are awarded only for the final answer and not for the process, students often complain that these questions are unfair. If a question is one where a careless error can produce an incorrect answer even though the process was correct, such accusations may well be valid. As teachers, we therefore have a responsibility to set linear algebra exam questions that can fairly be assessed on their answer alone.
A clear set of design principles for final answer assessment is valuable in three ways. (1) It makes setting the questions easier, as the principles give us starting points. For example, the design principle “the item should be quickly checkable” leads us to consider checkable items such as matrix inverses or eigenvalues. (2) Making the design principles known to the students, along with some examples, helps them know what to expect in the exam and prepare for it sensibly. (3) When challenged on the fairness of the exam questions, the teacher can defend the decisions made by showing how the questions were designed and why final answer assessment was deemed appropriate.
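To make “quickly checkable” concrete: a student who has computed an inverse can multiply it back against the original matrix, and a student who has found an eigenvalue and eigenvector can verify that Av = λv. The Python sketch below performs both checks with exact rational arithmetic (`fractions.Fraction`). The helper names `is_inverse`, `is_eigenpair` and `to_fractions` are hypothetical, chosen for illustration; they are not taken from the study.

```python
from fractions import Fraction


def to_fractions(matrix):
    """Convert a matrix (list of rows) to exact rationals.

    Entries may be ints, Fractions or strings such as '1/2'.
    """
    return [[Fraction(x) for x in row] for row in matrix]


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_inverse(a, b):
    """True if b is the inverse of the square matrix a.

    For square matrices, a·b = I already implies b·a = I,
    so a single product is enough.
    """
    a, b = to_fractions(a), to_fractions(b)
    n = len(a)
    if len(b) != n or any(len(row) != n for row in b):
        return False  # wrong shape, so it cannot be the inverse
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return mat_mul(a, b) == identity


def is_eigenpair(a, v, lam):
    """True if v is a nonzero vector with a·v = lam·v."""
    a = to_fractions(a)
    v = [Fraction(x) for x in v]
    lam = Fraction(lam)
    if all(x == 0 for x in v):
        return False  # the zero vector is never an eigenvector
    av = [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]
    return av == [lam * x for x in v]


A = [[2, 1], [1, 1]]
print(is_inverse(A, [[1, -1], [-1, 2]]))  # True
print(is_inverse(A, [[1, -1], [-1, 1]]))  # False

B = [[2, 1], [1, 2]]
print(is_eigenpair(B, [1, 1], 3))   # True
print(is_eigenpair(B, [1, -1], 3))  # False
```

Each check is a single multiplication, which is what makes such items suitable: a student who has had the opportunity to practise this can catch a careless slip before handing in the answer.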
In 2019, we carried out a study within two programmes whose students wrote the same linear algebra exam. We collected the students’ rough work after the exam and carried out two parallel grading processes. For summative purposes, we graded the exams in the standard way, with each final answer question awarded either all points or zero. Simultaneously, we carried out “hypothetical” grading: as far as was feasible, we graded the rough work as we would have if partial points had been awarded for the process. We then compared the two forms of grading to see whether they produced significantly different results.
We came to three conclusions: (1) certain kinds of questions were susceptible to grader variation, mostly those with notation requirements that some graders were less strict about penalising; (2) some questions resulted in double penalties, such as null space and column space questions, where an error on one fairly reliably went together with a similar error on the other; (3) when the hypothetical grading resulted in a much higher grade than the final answer grading, it was because several questions had started well (setting up a matrix) but ended badly (not knowing how to interpret a result).
With respect to conclusion (3), as teachers we are not keen on giving someone an inflated grade for an uncomprehending start to an exercise with no ability to follow through, so we did not see this as a weakness of final answer grading. Conclusions (1) and (2) led us to a refined set of design principles, which we have used since then in setting these questions. The final set of design principles is as follows:
• Limit the number of points assigned to an item to a maximum of 3.
• An item should be a one-step process not susceptible to careless error OR the item should be quickly checkable, and the students should have had the opportunity to learn how to check it.
• The algebra involved should be minor, so that the item tests the core concept of the question rather than algebraic manipulation. Consider avoiding fractions.
• Avoid items where the same error could be penalised twice.
• Avoid items with notational complexities about which, were the item to be hand graded, grader opinion might differ.
• If infinitely many answers are correct (for example, a basis of a subspace), the testing procedure must be able to recognise any correct answer.
• As the learning goals that can be tested in this way are limited, a maximum of 50% of the grade should be based on final answer items.
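The principle about items with infinitely many correct answers can be illustrated for the basis case. A submitted set of vectors is a basis for the same subspace as a reference basis exactly when it has the right number of vectors, is linearly independent, and adds nothing new to the span when combined with the reference vectors; all three conditions reduce to rank computations. The Python sketch below assumes answers are entered as lists of vectors. The functions `rank` and `same_subspace_basis` are hypothetical illustrations, not the procedure of any particular digital testing system.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, computed by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        # find a row at or below position r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # eliminate column c from every other row
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def same_subspace_basis(candidate, reference):
    """True if `candidate` is a basis of the subspace spanned by `reference`.

    The candidate must (a) contain exactly dim = rank(reference) vectors,
    (b) be linearly independent, and (c) add nothing new to the span
    when combined with the reference vectors.
    """
    dim = rank(reference)
    return (
        len(candidate) == dim
        and rank(candidate) == dim
        and rank(candidate + reference) == dim
    )


# Null space of [1 1 1]: the plane x + y + z = 0.
reference = [[1, -1, 0], [1, 0, -1]]
print(same_subspace_basis([[0, 1, -1], [1, 1, -2]], reference))  # True
print(same_subspace_basis([[1, -1, 0], [2, -2, 0]], reference))  # False: dependent
print(same_subspace_basis([[1, -1, 0], [1, 1, 1]], reference))   # False: (1, 1, 1) not in the plane
```

Comparing ranks, rather than comparing the submitted vectors with a stored answer, is what lets every valid basis through. Exact rational arithmetic avoids rounding issues when a student enters an answer such as (1/2, -1/2, 0).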
You can read more about this study here: Veale, A.J., Craig, T.S. (2022). Design principles for final answer assessment in linear algebra: implications for digital testing. Teaching Mathematics and its Applications 41(4). https://doi.org/10.1093/teamat/hrac002
