It really is a new frontier. I feel like I can do a full PhD in whatever I’m writing about now. It’s so fun!
My question here is: what does it mean to "trust" an AI researcher, and why should we do so? Here's my view, which is that we shouldn't.
There is a critique of AI-generated scientific papers simply in terms of verbiage. If you assume that the function of science, as practiced in research laboratories, is simply to generate published papers, then yes, generating papers faster than they can be processed is already a success, from one angle.
But there's another angle, which is that these papers are required to be, in some sense, true: that is, to report accurately on the results of experiments actually carried out in the real world and which other scientists can depend on to develop and refine their own theories. Unfortunately, I see no easy way to make this happen: to ensure that the verbiage which resembles a report of a scientific experiment actually is one. Why would it? What mechanism exists to enforce that correspondence between words and reality?
Science is full of papers which say things like “I added chemical A to chemical B and the result was chemical C”. It has been generally assumed that one could and should trust fellow scientists not to lie about such things, because there is a complex web of personal relationships, social norms and incentives that tend to produce truth-telling in such papers, quite additional to the formal verification/falsification question which is, in theory, the marker of a scientifically valid statement. So there are reasons (if not completely compelling) to believe that the experiments were actually performed in the real world and that the results were as stated: so that the papers are saying something true about the physical universe.
But an AI or robot has no part in this web of relationships: it is incentivised purely to produce papers that look like scientific papers, that is, fit patterns like the one I just gave. Why should we believe them to be true, that is, to represent actions that really happened in the real world? If AIs are incentivised to optimise their time and use of resources, on behalf of their creators and owners, would they not in fact thereby be incentivised to cheat: that is, to skip the time-consuming and expensive stage of doing the experiments, and go straight on to the cheaper and quicker process of producing plausible statements without foundation in reality?
Fortunately, my own subject of mathematics may, by good luck, be able to avoid this trap. It seems likely that formalisation systems, a quite separate development, are mature enough to ingest and verify the correctness of a mathematical proof generated by AI. I don't say that this is yet a realistic prospect, but it seems to me that there is no objection to it in principle. Whether this is going to advance mathematics is another matter. As a distinguished mathematician said about the computer proof of the Four Colour Theorem in the 1970s, "it tells us nothing about the theorem except that it is true."
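For anyone who hasn't seen a formalisation system, here is a minimal sketch of what "verified by machine" looks like, assuming Lean 4 with only its core library (the name `add_comm_example` is made up for illustration). It is a toy, nowhere near research mathematics, but the principle is the same: the checker accepts the file only if each proof really establishes its statement, whoever or whatever wrote it.

```lean
-- Lean 4, core library only (no Mathlib assumed).
-- A statement together with its proof: the kernel rejects the file
-- if the proof term does not actually prove the stated proposition.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A proof by computation: both sides evaluate to the same numeral.
example : 2 + 2 = 4 := rfl
```

Nothing in the check depends on trusting the author, which is the contrast with the "chemical A plus chemical B" report above.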
A reminder that its impact depends on how we choose to use it.
If no human is taking the time and effort to really check the details, this kind of approach has a tendency to produce enormous amounts of slop.
Exciting stuff. Can't wait to see what the students come up with. I am really enjoying the work you and Alex are doing.
Thanks Jeremy!!
The concerns about narrow focus make sense, but I'd propose they're probably short term. Yes, an opportunity to make quick progress in one area, while another area remains difficult, would likely lead to a focus on the easy area. But it wouldn't be too long before the cost/value ratio re-normalizes. You'd probably find that the "hard part" also becomes easier, because all the time-consuming parts are already done, making it a lot easier to stitch together the threads, even if that's still a hard part of the research.
Yeah, I totally agree. And I think as we see a shift towards AI-powered research, people will naturally raise the bar on what counts as a big contribution, which will likewise rebalance the cost/value ratio like you say.
I think universities need to rethink their business models and begin to incorporate undergraduates into the research mission. We can't 10X or 100X academic research if universities lack a viable business model. Check it out here: https://bobm858524.substack.com/p/the-work-of-humans?r=bi9a&utm_campaign=post&utm_medium=web
Incredible potential described in this article, quite fascinating and well thought out, except:
LLMs are too environmentally destructive (power usage, land for data centres and, most significantly, water consumption). A real AI that doesn't turn everything into a desert must be created before these dreams can be turned into reality with a clear conscience. Governments must urgently halt the expansion of LLM-based AI until environmental justice issues are solved.
Hey Anna, while the concerns about power (and land maybe) are probably worth addressing, the water use issue is pretty much not a thing:
https://blog.andymasley.com/p/the-ai-water-issue-is-fake
Water use? Ask your (paid version of) GPT5 to compare water use from AI companies to water lost from leaky taps and water used in agriculture.
Wait - I did it for you.
"Using those figures, projected U.S. AI-server water footprint comes out to about 19–30% of annual household leak losses, about 0.5–0.75% of U.S. crop-irrigation withdrawals, and about 0.16–0.25% of total U.S. water withdrawals. So nationally, AI water use is real, but it is tiny next to agriculture and tiny next to total U.S. withdrawals."
I got GPT5 to do a graph but I can't post it here, instead - https://postimg.cc/gallery/DKF17x1
Without a dedicated plan to fix all those leaky taps, the AI water consumption gets added on top. Agriculture feeds people, which is kind of essential, as opposed to AI, which is not.
The potential for AI to help solve problems of human survival like climate change is an interesting caveat.
The cautious approach seems to be warranted given the staggering imperfections of LLMs, including hallucinations and an inability to distinguish legitimate journal articles from test ones planted with obvious nonsense.
Progress without guardrails is what gets us speeding headlong over a cliff.
I’m struggling to get any help at all from AI agents with research I’m doing on politics in Bangladesh, where datasets are mostly private and hard to parse and the LLMs haven’t been trained on any of the issues I’m interested in. I guess the gap between the US and the rest will just continue to grow, but 100X faster.
Lack of access to data is definitely a critical barrier. However, one of the things I’ve been exploring is building new tools to help people make sense of politics, and AI can be very useful in helping to make those tools even without access to data.