Friday Harbor
№ 012 · Monday, May 11, 2026 · AI safety · data training · content moderation · psychological exposure

Who Sees the Dark for AI?

Data training, content moderation, and the psychological cost of safe AI.

Safe AI often begins as a redaction table. The figure does not reproduce traumatic content. It shows the surrounding machinery instead: labels, policy calls, quotas, escalation marks, and black bars where harmful material has been intentionally withheld. The clean reply at the top is not weightless; it is held up by repeated human exposure beneath it.
Exposure layer: content categories that workers may have to view first and then judge.
Care layer: consent, rotation, support, and refusal rights that should surround the work.

The cleanest sentence in an AI interface may be the most misleading one: "I can't help with that." It sounds as if safety has been solved by a rule. But a rule is only the final surface of a longer system. Before the refusal appears, someone has gathered examples, labeled them, compared borderline cases, tested policy failures, ranked outputs, escalated exceptions, and looked at material the interface is designed to keep away from the user.

That is the hidden bargain of safe AI. Harm is removed from the foreground by moving it into a training and moderation workflow. Violence, hate speech, sexual abuse, self-harm, extremism, fraud, and graphic imagery become examples from which a model learns refusal, ranking, filtering, and policy boundaries. The user sees the polished endpoint. The worker may see the material that made the endpoint possible.

The workers asked to do this are often structurally vulnerable before the first item appears on screen. Many are contract workers, outsourced workers, migrants, young platform workers, or people in labor markets where refusal can mean losing income. This vulnerability is not incidental to the system; it helps make the system possible. The people least able to refuse exposure are often the people asked to absorb it.

The consequences are not abstract. Reports and mental-health studies describe symptoms associated with PTSD, anxiety, depression, panic, numbness, sleep disruption, intrusive memory, and substance use. These are not unfortunate side effects of an otherwise weightless technology. They are predictable injuries in a system that asks some people to metabolize the darkest material so others can experience safety as a clean interface.

The injury is not only exposure; it is exposure made repetitive, ambiguous, and managed. A single disturbing item can be difficult. Hundreds of ambiguous items per day become a workplace. The worker must decide, quickly and consistently, whether a text is hateful or merely offensive, whether an image belongs in a news context or a prohibited category, whether a model's refusal is too weak, too broad, or too late. The task asks for moral judgment, but the workplace often measures speed, agreement, and throughput.

This is why moderation and safety training should be understood as emotional labor as much as data labor. The worker must suppress disgust, fear, anger, sadness, and shock long enough to apply a policy label. They must remain sensitive enough to recognize harm, yet detached enough to continue through the queue. They convert affect into judgment: trauma becomes a category, a score, a refusal, a model update. Over time this can become a form of moral injury. The pain comes not only from seeing harmful material, but from being made useful to an institution by repeatedly seeing it.

This is also why the answer cannot be reduced to individual resilience. Counseling after the shift matters, but it is far from enough. Therapy may help a worker survive exposure; it cannot by itself change the production system that keeps assigning exposure. The real intervention has to happen before, during, and after the task: informed consent, meaningful refusal rights, exposure limits, rotation, realistic quotas, trauma-trained supervision, living wages, collective bargaining, and care that continues after the contract ends. Without those conditions, aftercare becomes a patch on a structure that keeps producing harm.
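
Some of the conditions listed above could be checked per shift rather than left to aftercare. The following is a minimal sketch, assuming a hypothetical `Shift` record and a hypothetical `shift_problems` helper; the field names and the limit passed in are illustrative placeholders, not recommended values or an existing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shift:
    """One reviewer's shift (hypothetical structure, for illustration only)."""
    harmful_minutes: int    # minutes spent reviewing harmful material
    consented: bool         # informed consent was recorded before the task
    refusal_honored: bool   # any refusal was honored without penalty


def shift_problems(shift: Shift, max_harmful_minutes: int) -> List[str]:
    """List the conditions this shift fails; an empty list means none were found."""
    problems = []
    if not shift.consented:
        problems.append("no informed consent")
    if not shift.refusal_honored:
        problems.append("refusal not honored")
    if shift.harmful_minutes > max_harmful_minutes:
        problems.append("exposure limit exceeded")
    return problems


# The 120-minute cap is a placeholder assumption, not a clinical guideline.
example = Shift(harmful_minutes=300, consented=True, refusal_honored=False)
print(shift_problems(example, max_harmful_minutes=120))
```

A check like this covers only the measurable conditions. Wages, bargaining, supervision, and care after the contract ends still have to be secured outside any script.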

The harms do not stop inside the worker's body. They spill into sleep, family life, friendship, trust, and civic participation. A society that asks precarious workers to digest its violent, abusive, racist, sexual, and suicidal material is not merely solving a technical problem. It is relocating a social problem. The psychological pain of moderation and training work is one of the places where the broader violence of the internet, platform capitalism, outsourcing, and inequality comes to rest.

A more honest AI safety practice would document labor risk as clearly as it documents model risk. Dataset cards and model cards ask where data came from, what a model can do, and how it might fail. They should also ask who reviewed dangerous material, what warnings they received, how long they were exposed, whether they could refuse or rotate out, and what care remained available after the contract ended. The point is not to make AI less safe. It is to stop treating worker harm as the hidden price of user safety. A clean interface is not evidence that danger disappeared. It may only mean that danger was moved to a place, a contract, and a person the interface learned not to name.
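
The questions above could be recorded alongside a dataset card or model card. The following is a minimal sketch, assuming a hypothetical `LaborRiskCard` record; the field names are illustrative and do not belong to any existing dataset-card or model-card standard.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class LaborRiskCard:
    """Hypothetical record of reviewer exposure, kept next to a model card.

    Every field name here is an illustrative assumption, not an existing standard.
    """
    reviewer_groups: List[str] = field(default_factory=list)  # who reviewed dangerous material
    warnings_given: Optional[str] = None                      # what warnings they received
    max_daily_exposure_hours: Optional[float] = None          # how long they were exposed
    could_refuse_or_rotate: Optional[bool] = None             # whether they could refuse or rotate out
    post_contract_care: Optional[str] = None                  # care available after the contract ended

    def unanswered(self) -> List[str]:
        """Return the names of the questions this card leaves undocumented."""
        missing = []
        for name, value in asdict(self).items():
            if value is None or value == []:
                missing.append(name)
        return missing


# Placeholder values, not data about any real project.
card = LaborRiskCard(
    reviewer_groups=["outsourced contract reviewers"],
    could_refuse_or_rotate=False,
)
print(card.unanswered())
# prints ['warnings_given', 'max_daily_exposure_hours', 'post_contract_care']
```

Listing the unanswered fields makes a missing answer visible: a card that says nothing about warnings or post-contract care is flagged as incomplete rather than read as "no risk."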

Endnote

How to cite

Zhao, B. (2026, May 11). Who Sees the Dark for AI? Friday Harbor (HGIS Lab Column), Article 12.
Humanistic GIS Lab, University of Washington. https://hgis.uw.edu/friday-harbor/2026-05-11-who-sees-the-dark-for-ai/