THE CASE THAT A.I. IS THINKING

Published November 3, 2025

Dario Amodei, the C.E.O. of the artificial-intelligence company Anthropic, has been predicting that an A.I. “smarter than a Nobel Prize winner” in such fields as biology, math, engineering, and writing might come online by 2027. He envisions millions of copies of a model whirring away, each conducting its own research: a “country of geniuses in a datacenter.” In June, Sam Altman, of OpenAI, wrote that the industry was on the cusp of building “digital superintelligence.” “The 2030s are likely going to be wildly different from any time that has come before,” he asserted. Meanwhile, the A.I. tools that most people currently interact with on a day-to-day basis are reminiscent of Clippy, the onetime Microsoft Office “assistant” that was actually more of a gadfly. A Zoom A.I. tool suggests that you ask it “What are some meeting icebreakers?” or instruct it to “Write a short message to share gratitude.” Siri is good at setting reminders but not much else. A friend of mine saw a button in Gmail that said “Thank and tell anecdote.” When he clicked it, Google’s A.I. invented a funny story about a trip to Turkey that he never took.

The rushed and uneven rollout of A.I. has created a fog in which it is tempting to conclude that there is nothing to see here—that it’s all hype. There is, to be sure, plenty of hype: Amodei’s timeline is science-fictional. (A.I. models aren’t improving that fast.) But it is another kind of wishful thinking to suppose that large language models are just shuffling words around. I used to be sympathetic to that view. I sought comfort in the idea that A.I. had little to do with real intelligence or understanding. I even celebrated its shortcomings—rooting for the home team. Then I began using A.I. in my work as a programmer, fearing that if I didn’t I would fall behind. (My employer, a trading firm, has several investments in and partnerships with A.I. companies, including Anthropic.) Writing code is, by many accounts, the thing that A.I. is best at; code has more structure than prose does, and it’s often possible to automatically validate that a given program works. My conversion was swift. At first, I consulted A.I. models in lieu of looking something up. Then I gave them small, self-contained problems. Eventually, I gave them real work—the kind I’d trained my whole career to do. I saw these models digest, in seconds, the intricate details of thousands of lines of code. They could spot subtle bugs and orchestrate complex new features. Finally, I was transferred to a fast-growing team that aims to make better use of A.I. tools, and to create our own.

The science-fiction author William Gibson is said to have observed that the future is already here, just not evenly distributed—which might explain why A.I. seems to have minted two cultures, one dismissive and the other enthralled. In our daily lives, A.I. “agents” that can book vacations or file taxes are a flop, but I have colleagues who compose much of their code using A.I. and sometimes run multiple coding agents at a time. Models sometimes make amateur mistakes or get caught in inane loops, but, as I’ve learned to use them effectively, they have allowed me to accomplish in an evening what used to take a month. Not too long ago, I made two iOS apps without knowing how to make an iOS app.

I once had a boss who said that a job interview should probe for strengths, not for the absence of weaknesses. Large language models have many weaknesses: they famously hallucinate reasonable-sounding falsehoods; they can be servile even when you’re wrong; they are fooled by simple puzzles. But I remember a time when the obvious strengths of today’s A.I. models—fluency, fluidity, an ability to “get” what someone is talking about—were considered holy grails. When you experience these strengths firsthand, you wonder: How convincing does the illusion of understanding have to be before you stop calling it an illusion?

On a brutally hot day this summer, my friend Max met up with his family at a playground. For some reason, a sprinkler for kids was switched off, and Max’s wife had promised everyone that her husband would fix it. Confronted by red-faced six- and seven-year-olds, Max entered a utility shed hoping to find a big, fat “On” switch. Instead, he found a maze of ancient pipes and valves. He was about to give up when, on a whim, he pulled out his phone and fed a photo into ChatGPT-4o, along with a description of his problem. The A.I. thought for a second, or maybe didn’t think, but all the same it said that he was looking at a backflow-preventer system typical of irrigation setups. Did he see that yellow ball valve toward the bottom? That probably controlled the flow. Max went for it, and cheers rang out across the playground as the water turned on.

Was ChatGPT mindlessly stringing words together, or did it understand the problem? The answer could teach us something important about understanding itself. “Neuroscientists have to confront this humbling truth,” Doris Tsao, a neuroscience professor at the University of California, Berkeley, told me. “The advances in machine learning have taught us more about the essence of intelligence than anything that neuroscience has discovered in the past hundred years.” Tsao is best known for decoding how macaque monkeys perceive faces. Her team learned to predict which neurons would fire when a monkey saw a specific face; even more strikingly, given a pattern of neurons firing, Tsao’s team could render the face. Their work built on research into how faces are represented inside A.I. models. These days, her favorite question to ask people is “What is the deepest insight you have gained from ChatGPT?” “My own answer,” she said, “is that I think it radically demystifies thinking.”
