Top or slop? AI output can look polished and still be weak. That's the habit worth teaching.

I introduced ChatGPT to my son a few years ago, when he was about 14.

At the time, I was already using generative AI fairly often. Not just for work, but for hobby projects, quick explanations, first drafts, planning, and the small everyday tasks where it can be genuinely useful. I remember showing it to him and expecting a bigger reaction. He was impressed, but only in a distant sort of way. It was interesting, but he had not yet connected it to anything he actually needed.

That has changed. Recently, he has started using AI more for images and text, and I can see the appeal immediately. Type something in, get something back, and move on. The part that does not come naturally yet is the second look. Is the answer accurate? Is it specific? Is it just saying something that sounds right? Is this good enough, or is it only good-looking?

That is the problem this activity is built around: AI output can sound polished, confident, and useful while still being vague, weak, or wrong.

Why it matters

AI output has a surface-quality problem. It can have the structure of a good answer, the tone of a good answer, and the confidence of a good answer, without having the substance underneath. That matters because students are often asked to judge the output at the exact moment when it looks most convincing. It reads cleanly. It uses transitions. It gives balanced points. It may even sound more polished than what a student would write themselves.

That does not mean AI is useless. It means students need a different kind of literacy. UNESCO's guidance on generative AI in education argues for a human-centered approach that protects student agency and helps schools think carefully about how these tools affect teaching, learning, and assessment. The classroom skill is not "trust AI" or "ban AI." It is learning when AI output is useful, when it is weak, and when it needs checking by someone who knows the topic.

This is especially important because young people are not waiting for adults to finish the policy conversation. Common Sense Media's 2024 report on teens and generative AI shows that these tools are already part of many students' lives, including schoolwork, creativity, and everyday problem-solving. If students are going to use AI, they need habits that slow down the "one try and it's fine" approach: check the facts, look for vague filler, notice when a list is replacing an argument, and ask whether the output is genuinely useful or just fluent.

A teacher guides students through a digital task, the same setting where evaluating AI output is increasingly a core skill. Photo: Emily Seymour / Voice of America, public domain.

Addressing it in your class

Building that judgment is the aim of AI: Top or Slop, a group activity for ages 14 to 18 that fits well in media literacy, digital literacy, English, advisory, technology, or future-of-work lessons. Students generate real AI output during the activity, then evaluate it using a set of practical red flags: vague generalities, confident claims without sources, formulaic structure, over-hedging, generic tone, and lists that replace actual reasoning. The goal is not to make students cynical about AI. It is to help them become better judges of when AI output is strong, weak, or somewhere in between.

What the activity covers

Ages	14–18 (grades 9–12)
Group size	3–4 students
Time	60–70 minutes
Works for	Media literacy, digital literacy, English, advisory, technology, future-of-work lessons

The activity is built in three parts. In Part 1, students learn the red flags that often appear in weak AI output. These include vague generalities, confident claims without sources, the classic formulaic AI essay structure, hedging on everything, a generic tone that fits everywhere and nowhere, and lists used as a substitute for real thinking. This gives students shared language before they start evaluating.

In Part 2, students put the red flags into practice by generating real AI output from a set of prompts. They test different output types, including a news summary, a persuasive essay opening, a historical explanation, creative writing, advice, and a factual claim. For each one, they identify red flags and give a verdict: Top, Slop, or Somewhere in between. Comparing verdicts across groups is where the lesson gets especially useful, because students have to explain the criteria behind their judgments.

In Part 3, students choose two weaker outputs and improve the prompts. This matters because it moves the discussion away from "AI is good" or "AI is bad" and toward a more useful question: What did I ask for, and how could I ask better? Students rewrite prompts by adding constraints, asking for specifics, requesting reasoning, or making the task clearer, then test whether the output improves.

The lesson also includes a teacher guide with timing, facilitation notes, differentiation ideas, and an assessment rubric. The rubric focuses on critical evaluation of AI output, understanding red flags, prompt improvement, group discussion, and quality of reflection, so students are assessed on judgment rather than simply whether they liked or disliked the AI response.

How to run it well

The most important thing is to keep the lesson balanced. Some students will be too trusting of AI output because it sounds polished. Others will be dismissive and say it is all useless. Neither position is very helpful. The aim is calibrated judgment: noticing when AI is useful, when it is weak, and what kind of checking the situation requires.

Where groups often stall is in explaining why something is weak. Students may say, "It just sounds like AI," which is a useful instinct but not yet a useful evaluation. Push them to name the pattern. Is it vague? Is it making a confident claim without a source? Is it hiding behind balance instead of taking a position? Is it giving a list when the task needed an argument? The red flags table is there to turn instinct into language.

The best discussion usually comes when two groups disagree on a verdict. One group might call a response "Top" because it is clear and well organized, while another calls it "Slop" because it says nothing specific. That disagreement is the lesson. Ask: "What would someone need to know about this topic to tell whether the answer is actually good?" That question helps students see the gap between surface quality and real quality, which is exactly the habit they need when using AI outside the classroom.

Get the activity

AI: Top or Slop is available from my store Graphene - Digital Life Lessons on Teachers Pay Teachers. It is part of the AI, Technology and the Future bundle, a collection of activities that help students think critically about generative AI, digital tools, future skills, and the choices they will need to make in a world where AI output is everywhere. Use it as a standalone lesson or as part of a wider sequence on AI literacy and evaluating information.
