Assessment After AI: A Playful Field Guide for the Perplexed
A bit of fun with yet another paper on AI (which is lazy code for LLMs) and assessment: Assessment after Artificial Intelligence: The Research We Should Be Doing. It’s a paper that reads like a UN climate report for grading, only with more footnotes and fewer glaciers.
Imagine, for a moment, a group of eminent assessment scholars locked in a room in Melbourne for three days. Not hostages, just academics at a workshop, which is basically the same thing minus the ransom note.
Their mission?
To work out how assessment is supposed to survive the arrival of artificial intelligence (by which they mean LLMs, those chatty silicon sidekicks that write essays faster than students can locate the assignment rubric).
Predictably, what came out of this locked-room scenario was not a quick fix, nor a silver bullet, nor even a mildly shiny paperclip. Instead, we got a framework: six big questions meant to guide the next decade of research while we pretend everything is still under control.
Let’s walk through these questions politely but with a bit of fun.
1. Why do we assess? (Other than tradition, habit, and institutional anxiety?)
The paper argues we must confront the “why” because AI makes authorship fuzzy. Translation: We’ve finally noticed that the emperor’s assessment rubric is wearing no clothes.
For years we pretended that a submitted document reflected some clean, isolated human mind. Now that a machine can write it, we’re forced to admit the truth: it was never that simple. Students have always borrowed, paraphrased, Googled, begged, borrowed, stolen, and occasionally read. LLMs just made this mess more visible.
2. Who is being assessed? (Hint: it’s no longer obvious.)
In a world where a student, a language model, three drafts, and a panic attack all co-produce the final essay at 2:39 AM, whose capability is the grade describing?
The authors elevate it to a grand philosophical crisis. A sceptic would shrug and say it only looks profound if you pretend assessment wasn’t always a group project, just one done slowly, expensively, and without the help of a polite robot.
Still, they’re right to point out that a grade now reflects what might be called an actor-network: student + AI + institution + vague vibes + the teacher’s cognitive state before morning coffee + any number of interruptions from the student’s unconscious + the ambient temperature in the room where it happened. This should make us rethink “authorship” instead of doubling down on techno-policing.
3. What should we be assessing? (Besides whether students can avoid an LLM detector.)
The authors warn that focusing only on assurance of learning leads to absurdities—like redesigning tasks until they only measure who can be forced to hand-write something under fluorescent lights.
A more playful reframing might look like a choice between a few futures:
- Assessing what humans still do better than machines (judgment, sense-making, moral reasoning, refusing terms and conditions before clicking “accept”), or
- Assessing what humans can now pull off with machines—feats that were pure fantasy before LLMs muscled into the partnership, or
- Clinging to the essay as if it were the last biscuit in the staff room
Their point stands: once you admit hybrid capability is real, old learning outcomes don’t just wobble—they clutch their pearls, faint dramatically, and start twitching like Victorian ghosts confronted with electricity.
4. How should we assess? (Preferably without turning classrooms into biometric exam prisons.)
The paper politely critiques surveillance creep. Let me put it more bluntly: designing tasks to “keep LLMs out” is like trying to keep water out of a colander by yelling at it sternly. The authors call for interpretive credibility rather than technical containment.
The playful version: If you need a forensics lab to understand whether the student learned something, the assessment has already failed.
5. Where should assessment live? (Spoiler: not only in a 13-week unit.)
The authors wisely suggest programmatic assessment—multiple touchpoints, longitudinal signals, ecosystems of evidence.
They carefully avoid saying the obvious: the unit-as-king is a historical artefact from when libraries closed at 5pm and “online” meant a queue.
LLMs will force assessment to slip out of the unit silo and into something that looks more like a learning portfolio crossed with a personal data biography, minus the dystopia (hopefully).
6. What if…? (The fun part, but also the part most institutions are allergic to.)
The authors call this the speculative imagination space:
What if assessment didn’t rely on artefacts?
What if capability were evidenced through rehearsals, simulations, dialogue portfolios, or—radical thought—students actually doing meaningful work?
The paper invites this visioning politely. A sceptical expert might add: Universities only innovate when forced, so “what if” research is less blue-sky dreaming and more preparing sandbags before the LLM tsunami breaches the levee.
The Real Subtext (The Bit the Paper Can’t Say Out Loud)
Across its elegant prose and carefully built principles, the paper is basically whispering: “Friends, the game board has changed. Stop pretending it hasn’t.”
The six questions aren’t a roadmap; they’re a diplomatic way of telling the sector that:
- Detection is dead.
- Purity myths are dead.
- Unit-based assessment is dying.
- Learning outcomes may need reincarnation.
- The essay will survive only if we justify it better than “we’ve always done it.”
- Research needs to stop rediscovering old insights with LLMs sprinkled on top like parmesan.
A Final Analogy
Assessment after LLMs is like discovering your long-trusted telescope has been quietly fitted with an automatic star-drawing attachment. You can still look through it. You can still see something. But unless you understand who, or what, helped produce the image, you have no idea whether you’re engaging in astronomy or imagination.
This paper doesn’t fix the telescope. But it does give you a decent user manual, a philosophical warranty card, and a hint that maybe the real work is learning to observe the sky differently.