14 Comments
A student's avatar

I'm very curious to see where you land on your creative piece. I just spent roughly a year trying to help a marketing team 'automate' themselves. It was a fascinating experiment, but it ultimately foundered on the creative 'verifier' from your article.

In marketing we use statistics to verify whether the creative is resonating, but those statistics don't come cheap, so I put an aggressive Bayesian in the loop, assessing the posteriors with a minimal cost element. On the next loop, though, the cost of regeneration (especially with video) to hit the new prior is higher than anyone wants to admit, especially when your verifier has to incorporate legal constraints and policy guidelines. And then if you layer in branding and voice guidance... forget it. It's just crazy expensive and complicated when I can have a human take the guidance and produce something doing their own mini loop. So we landed on using the LLM to provide the analysis while still letting the human run their own creative loops to meet the guidance, which is honestly no different than it is today. Just more robust analysis.
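For readers unfamiliar with the "Bayesian in the loop" idea, here is a minimal sketch of what such a check might look like. It is not the commenter's actual setup, which isn't described in detail. It assumes two creatives, A and B, measured by a simple success/failure outcome (say, clicks), with uniform Beta(1, 1) priors; the function names and the 0.95 threshold are hypothetical.

```python
import random


def prob_b_beats_a(a_succ, a_fail, b_succ, b_fail, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent Beta(1, 1) priors on each creative's success rate,
    so each posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + a_succ, 1 + a_fail)
        rate_b = rng.betavariate(1 + b_succ, 1 + b_fail)
        wins += rate_b > rate_a
    return wins / draws


def decide(a_succ, a_fail, b_succ, b_fail, threshold=0.95):
    """Stop paying for data once the posterior is decisive (hypothetical rule)."""
    p = prob_b_beats_a(a_succ, a_fail, b_succ, b_fail)
    if p >= threshold:
        return "ship B"
    if p <= 1 - threshold:
        return "keep A"
    return "keep testing"


if __name__ == "__main__":
    # Creative B converts at roughly 4% versus 1% for A on 1,000 impressions each.
    print(decide(10, 990, 40, 960))
    # Nearly identical results on 100 impressions each: not enough evidence yet.
    print(decide(10, 90, 11, 89))
```

The point of the sketch is only that the statistical side of the verifier is cheap to express; the expense the comment describes sits in regenerating creative and in the legal, policy, and brand checks, none of which is modeled here.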

Sandeep Mehta's avatar

I am reminded of Isidor Rabi's mother (Physics Nobel 1944) who trained him as a child by asking him "Izzy did you ask a good question today?" on returning from school.

Suzi Travis's avatar

I really enjoyed this one. Your framing is really useful. The "what was the verifier, and who built it?" heuristic is the most portable thing I've read on AI-in-science in a while. I'll be borrowing it!

One thing I was wondering... the loop view (at least as you describe it here) seems per-discovery. That's probably where its strength is (you can look at Claude's Cycles or GNoME and say cleanly, here's the proposer, here's the verifier, here's the human work). But I do wonder what happens when we zoom out from one discovery to the whole field of science.

If the proposer slot used to be hand-engineered per domain, it was at least diverse by construction. The thing you flag as the key 2022 shift (general-purpose LLMs filling the proposer slot across domains) might also be the thing that homogenises what gets proposed. Are we not using the same handful of models, same training data, same predictability bias, that now sits in the proposer slot?

The verifier still catches the wrong answers. Sure. But the verifier might not tell you what wasn't proposed. A strong verifier with a narrow proposer will likely just give you reliable results inside a narrowed candidate space, and you'd never know what the wider space looked like.

I wrote about this back in 2024 through Messeri and Crockett's "illusion of exploratory breadth" — the idea that the belief AI can test all possible hypotheses is false, because the testable-with-AI space is narrower than it looks. I think your loop frame and that worry sit together. The loop does the discovery. But which discoveries are even candidates?

Which makes me curious: does a general-purpose LLM in the proposer slot narrow what gets proposed, or does cross-domain transfer broaden it?

I suspect you'd say cross-domain transfer broadens it — more candidates = more reach. But I keep wondering: is that breadth at the researcher level rather than the field level? Could the same shift that gives any individual researcher access to a wider candidate space also narrow the distribution of kinds of candidates across everyone running similar loops? Or am I drawing a distinction that doesn't hold up once you look at how these loops compose across a field?

Looking forward to the creative-domains piece next week. The verifier question gets genuinely interesting there.

Alejandro Piad Morffis's avatar

Thanks Suzi for the kind words and the thoughtful pushback! You're absolutely right, I think, to worry about the homogenizing effect of LLMs; we're seeing that in writing, in coding, and basically everywhere these things get massively deployed.

The silver lining perhaps is that the loop as I described it starts with a human posing a worthwhile question, and every iteration of the loop has a human inside nudging where the next iteration should go. Without that, I do believe we collapse into whatever next-token prediction from existing knowledge can get us, not necessarily bad science but probably no novel, once-in-a-century, Einstein's relativity kind of science.

My guess is that for a while both will be true: we'll have people doing frontier but run-of-the-mill, immediate-next-step science, and that is super important because that is where the next vaccine is; and we should have people just thinking about what hasn't been thought about yet. We have been getting less and less of the latter since the institutionalization of science in the late 20th century and the paper mills.

Perhaps this will also bring a complete collapse of the paper-as-proxy-for-research metric. That would be for the better I think.

Sandeep Mehta's avatar

The paper-as-proxy-for-research metric has already collapsed. We just haven't woken up to it, or are wilfully choosing to ignore it, as the table stakes are still aligned to Google Scholar metrics etc. Funk et al. talk about it regularly. Another dimension I'm curious to explore is whether the "AI lab member" can be one who has the curious, imaginative leaps that lead to "out of field" beautiful theories (a la Roger Penrose's CCC), or whether we are condemned to grind away at the myriad theories the proposer outputs in the search for, for example, a provable universal cosmological model.

Dan Kinsky's avatar

The AlphaFold example is the most fascinating one to me. "Humans curated" meant choosing the problem - not reviewing any individual result. And the verifier was also a program we built, not a human judgment call. The human wasn't really in the loop for individual results at all.

Which suggests the human role is already shifting upstream: build the right machine, step back, let it run. And if one day AI can design the loop too, which doesn't seem that far off, then "upstream" is pushed back even more.

Brian Marick's avatar

Of some interest might be r/K selection theory (https://en.wikipedia.org/wiki/R/K_selection_theory). There are situations where it makes a lot of sense to make a zillion offspring/predictions, most of which will die. In others, lavishing resources on a few offspring/predictions is the right strategy.

Does inserting LLMs into the scientific discovery loop shift progress from the latter strategy to the former? What are the implications?

Sandeep Mehta's avatar

Reading the post below on the recent Erdős disproof development reminded me of Alejandro's model of a deterministic harness on a probabilistic token engine. One can mimic some procedural rigor of elimination (part of the proof), but the verification is inevitably human. The output was simply a different walk down the n-dimensional matrix space. Ultimately neither maximalist nor dismissive, but just a reliable AI lab member who knew all the past literature and how to use it.

https://medium.com/@vishalmisra/a-proof-is-a-long-multiplication-6e208c9582a9

Brian Marick's avatar

I had trouble understanding the role of the curator. My first thought is that the curator comes after the verifier (“a human curates what survives”) and decides what to do with the solution (publish?, ask the next question to restart the loop). But some text seems to suggest the curator also intervenes in other kinds of filtering: which problems to move forward with, which solutions to verify. That would give me this picture: https://blog.oddly-influenced.dev/uploads/2026/cleanshot-2026-05-28-at-12.30.242x.png. Does it capture your intention? If so, I will have more comments, I think.

Alejandro Piad Morffis's avatar

Yeah, I think so, the curator holds the vision. It is the only thing in the system that has motivations, desires, even taste I would say. Some workflows don't need (or don't benefit from) a curator inside the tight loop from generation to validation because, e.g., validation is cheap (think formal theorem proving), but when the validation is expensive, e.g. actually doing some real chemistry in a lab, I think the curator role extends to the inner loop as well. I'm sure my wording is imperfect and the analogies can be improved, really appreciate any further comments :)
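One way to read the distinction above is as a question of where the curator's veto sits relative to the verifier. The toy sketch below is an illustration under stated assumptions, not the article's model: `run_loop`, `curate_inside_loop`, and the integer "candidates" are all hypothetical stand-ins, with the verifier treated as the expensive step whose calls we want to count.

```python
from typing import Callable, Iterable, List, Tuple


def run_loop(
    candidates: Iterable[int],
    verify: Callable[[int], bool],
    curate: Callable[[int], bool],
    curate_inside_loop: bool,
) -> Tuple[List[int], int]:
    """Return (surviving candidates, number of verifier calls).

    The curator always decides what survives after verification. When
    curate_inside_loop is True, the curator also vetoes candidates before
    they reach the (assumed expensive) verifier.
    """
    verifier_calls = 0
    kept = []
    for candidate in candidates:
        if curate_inside_loop and not curate(candidate):
            continue  # vetoed before any verification cost is paid
        verifier_calls += 1
        if verify(candidate) and curate(candidate):
            kept.append(candidate)
    return kept, verifier_calls


if __name__ == "__main__":
    proposals = range(1, 21)          # stand-in for proposer output
    is_valid = lambda n: n % 2 == 0   # stand-in for the verifier
    has_taste = lambda n: n > 15      # stand-in for the curator's judgment

    print(run_loop(proposals, is_valid, has_taste, curate_inside_loop=False))
    print(run_loop(proposals, is_valid, has_taste, curate_inside_loop=True))
```

In this toy setup both placements keep the same candidates; what changes is how many times the verifier runs, which only matters when verification is the costly step.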

Brian Marick's avatar

Having wrenched your words into a diagram, let me complicate it in two ways.

First (and less importantly): in science, there are other feedback loops that I think are generally under-discussed. [Here's a picture](https://blog.oddly-influenced.dev/uploads/2026/inner-loops.png) (Let's see if substack does Markdown.)

The green dashed lines show feedback.

1. Creating a proposal quite frequently changes understanding of the problem. This often flows from the need to formalize or make rigorous the problem.

2. There might be difficulties or oddities in the experiment/verification that adjust the proposal and/or the understanding of the problem.

In a way, you could think of these as adding routes not contemplated by the "happy path" of the methodology.

The way I'd put it is that "Aha!" moments can arise from any role as it does its job. Your model assigns the LLMs to one (or more) of the roles. But if the creativity is smeared throughout all the roles, what are the implications?

Brian Marick's avatar

(Hey substack, get with the times.)

Picture: https://blog.oddly-influenced.dev/uploads/2026/interactions.png

I've shown one set of recurrent activities connected to another of the same type. I've drawn lines between the same roles in both because I have examples/citations for them. My main example is from https://en.wikipedia.org/wiki/Bohr%E2%80%93Einstein_debates because that's what I've been reading up on for the past week or so.

(I'm leaving out communications between, say, the Proposer of one loop to the Verifier of another.)

In Randall Collins's [*The Sociology of Philosophies: A Global Theory of Intellectual Change*](https://www.hup.harvard.edu/books/9780674001879) (2009), Collins describes how Problem Posers communicate in order to "find out where the action is" in their field, "to appropriate the puzzles which have the greatest significance for the future activities of their colleagues."

Example: Heisenberg went to Copenhagen to work with Bohr. That exposed him to leading issues of the day, from which he developed one of the two big 1926 frameworks for quantum mechanics. https://en.wikipedia.org/wiki/Werner_Heisenberg#Academic_career

Brian Marick's avatar

Interactions between Proposers are also important. Schrödinger had a rival theory. The interplay (rivalry) between those theories helped clarify fundamental issues like: should we be working assuming particles (Heisenberg) or waves (Schrödinger) or what?

Bohr and Einstein had a friendly but vigorous dispute about what quantum mechanics *meant*. https://en.wikipedia.org/wiki/Bohr%E2%80%93Einstein_debates That involved metaphysical concepts about causality (which I'd lump under the Poser role) and what entities theories were actually theorizing about (which looks like a Proposer-Proposer debate). A lot of the debate revolved around thought experiments – "what would happen if you constructed this apparatus..." – which seem to fit into the domain of the Verifier.

Brian Marick's avatar

Finally, Peter Galison's [*Image and Logic: A Material Culture of Microphysics*](https://press.uchicago.edu/ucp/books/book/chicago/H/bo5969426.html) (1997) talks about how experimentation has traditions that are partly driven by the need to check predictions (solution ideas?) but also operate according to their own "internal logic". In his story, while theorists were having their own Kuhnian scientific revolutions, experimentalists had a not-temporally-connected revolution around whether you wanted pictures of particle tracks ("image") or whether counting was a better approach ("logic").

Possibly relevant: the Posers and Proposers were not ignorant of what the Verifiers could actually do. It was uncool for them to ignore that. Einstein said "There is no logical path leading to [the highly universal laws of science]. They can only be reached by intuition, based upon something like an intellectual love of the objects of experience." Can that intellectual love be removed from either the Proposer or the Verifier? How well, in general, can we do without it?

More research is needed. Also, Substack should spend less time catering to Nazis and more time facilitating discussion. This is the worst text box I've seen in a long time. When I type a character, the screen immediately jumps so I can't see what I typed. But it's well known that authoritarians and those who ride in their wake are generally incompetent at execution.