The Roots of Progress: Intelligence Age

AI is already 10x-ing academic research. How do we get to 100x?

Andy Hall — Thu, 16 Apr 2026 16:01:04 GMT

“Intelligence Age” is a series from the Roots of Progress Institute featuring reported essays that extrapolate the capabilities of AI systems along current trend lines.

In our second feature, Stanford University political economist Andy Hall explains how AI has already changed the way he and his team conduct social science research and how academics might increase knowledge generation 100-fold in the near future.

“Intelligence Age” is made possible by a grant from OpenAI. The Roots of Progress Institute maintains editorial independence over the project. We thank OpenAI for its support.

I’ve spent the last two months building a new lab centered around using AI agents to accelerate our research. I’ve hired fellows from all over the world, from the U.S. and the U.K., from Rwanda, Singapore, and Japan. Each fellow has a subscription to Claude Code—Anthropic’s AI coding tool—and a mandate to study specific opportunities and challenges in governance and politics posed by the rapid acceleration of AI.

The rate of progress in just two months has been astonishing. One fellow built software to study how different AI models recommended that Japanese voters cast their ballots in the recent national elections. We found that the models recommend the Japanese Communist Party to left-wing voters at inordinately high rates, probably because the Communist Party operates an online “newspaper” that AI can access, while major media outlets in Japan block access.

Another team of fellows built an entire web system to convert data from prediction markets into reliable information for news outlets to cite. The project even takes into account the risks of market manipulation and price fragility. And there are half a dozen more projects underway that I couldn’t have staffed before: automated pipelines for legislative policy drafting, analyses of how AI companies study model safety over time, and agentic loops for geopolitical forecasting.

My PhD students and I wrote a new study examining how AI agents perform statistical analyses, and whether they fall victim to the human urge to “p-hack,” that is, to torture the data to generate “statistically significant” findings. (The answer: in our tests, the agents were surprisingly responsible, and even scolded us for trying to p-hack; but they could be jailbroken easily.)

Any one of these projects would have been extremely difficult to carry out a year ago, requiring intensive focus over many months. Completing multiple ambitious public-impact projects in a two-month period would have been completely unthinkable.

Something fundamental is changing in how we generate knowledge. I want to explain what I’m already seeing, where I think it’s going, and what it will take to build the institutions that can capitalize on this moment. The goal shouldn’t be to write 100x the number of papers; it should be to generate 100x the amount of knowledge.

We’re already 10xing research

To understand exactly why AI is already accelerating social scientists’ research so dramatically, let me walk you through one of my projects in detail. Earlier this year, I uploaded my published 2020 study on vote-by-mail policy in California, Utah, and Washington to Claude. The study examines whether switching to universal vote-by-mail—where every registered voter is automatically sent a ballot—affects turnout and partisan vote share. Counties in these three states adopted the policy at different times, creating a natural experiment.

I then asked Claude to replicate the findings and extend the analysis with new election data. Claude Code wrote Python scripts to run difference-in-differences regressions to estimate the causal effect of the policy, just like we had in our original paper. It scraped county-level election results from the California Secretary of State, the Utah Lieutenant Governor’s office, and the Washington Secretary of State, and pulled Census voting-age population data from the American Community Survey. It identified the specific election in which each county first adopted universal vote-by-mail, merged the new data with the original 1996–2018 panel, ran the analyses, produced tables and figures, and wrote a first draft of the paper.
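
The logic of a difference-in-differences estimate can be sketched in its simplest two-group, two-period form. This is only an illustration under stated assumptions: the study itself ran regressions on a county panel with staggered adoption, whereas the sketch below just compares group means. The county labels, turnout numbers, and function names are all hypothetical and invented for the example.

```python
from statistics import mean

# Hypothetical toy panel: (county, adopted_vote_by_mail, period, turnout_rate).
# These numbers are invented for illustration; they are not from the study.
rows = [
    ("A", True, "pre", 0.50), ("A", True, "post", 0.55),
    ("B", True, "pre", 0.60), ("B", True, "post", 0.63),
    ("C", False, "pre", 0.52), ("C", False, "post", 0.54),
    ("D", False, "pre", 0.58), ("D", False, "post", 0.60),
]


def group_mean(rows, treated, period):
    """Average turnout for one group (adopters or non-adopters) in one period."""
    return mean(y for _, t, p, y in rows if t == treated and p == period)


def did_estimate(rows):
    """Change among adopting counties minus change among non-adopting counties."""
    treated_change = group_mean(rows, True, "post") - group_mean(rows, True, "pre")
    control_change = group_mean(rows, False, "post") - group_mean(rows, False, "pre")
    return treated_change - control_change


effect = did_estimate(rows)
print(f"Estimated effect on turnout: {effect * 100:.1f} percentage points")
```

The non-adopting counties stand in for what would have happened to the adopters without the policy, which is why their change is subtracted out.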

All twelve coefficients from the original study’s main tables replicated exactly—indicating that Claude was able to automatically verify the original research. The extension added new election cycles and found that vote-by-mail increases turnout by about two percentage points but has no systematic effect on Democratic vote share. The entire project—data collection, coding, analysis, and write-up—took under an hour. In contrast, the original paper took us several months.

A PhD student at UCLA then audited every line against a fully manual replication. While the student found some mistakes, the correlation between Claude’s data and the hand-collected ground truth was above 0.99.

I’m far from alone in using AI to scale my work this way. “I now use it to handle all of the bullshit work,” said Joshua Gans, a professor of strategic management at the University of Toronto who spent 2025 going AI-first in his research, working his way through a backlog of paper ideas at a pace that would have been impossible a year before.

And this isn’t only about empirical work that requires statistical code. Yascha Mounk, a political philosopher at Johns Hopkins, asked Claude to help him write a political theory paper. He gave one round of high-level feedback per section—for instance, pushing Claude away from citing John Stuart Mill’s more famous writing and toward more obscure sources, such as published letters—and had a finished draft in under two hours. His verdict: it could, with minor revisions, be published by a serious journal.

These examples are about individual researchers working faster on individual papers. But the transformation doesn’t stop there. People are now building systems that automate entire stages of the research pipeline—generating, evaluating, and replicating research at scales no human team could match.

My Stanford colleague Yiqing Xu and Leo Yang have built an agentic AI workflow that automates large-scale replication of empirical studies. The system separates scientific reasoning from computational execution. Researchers design fixed diagnostic templates that specify which checks to run, and the workflow handles everything else—acquiring replication packages from journals, harmonizing heterogeneous code and data formats, and executing standardized diagnostics across dozens of studies. Previous projects of comparable scope took their team three to four years of sustained effort; this workflow compresses that timeline dramatically.

New tools are also transforming how research gets reviewed before it’s ever submitted. Refine.ink, built by the economists Yann Calvó López and Ben Golub, devotes hours of compute to reading an academic paper the way a careful referee would. It cross-references tables against the text to check for inconsistencies. It follows the logic of proofs step by step, flagging incomplete justifications and notation errors. It checks whether the claims in the abstract actually match the results in the body.

When John Cochrane, a prominent financial economist and my colleague at the Hoover Institution, ran his 80-page inflation booklet through Refine, he said the comments were on par with the best referee reports he’d received in his entire career. The tool caught a sign error in the solution of a differential equation. It identified places where his argument about long-term debt mechanisms was spread across too many sections instead of being stated cleanly. “This is the first time I’ve seen AI at work in something I do daily,” Cochrane wrote, “and it really is remarkable.”

The most ambitious efforts aim to automate the research process end-to-end. Project APE, run by the economist David Yanagizawa-Drott at the University of Zurich’s Social Catalyst Lab, is an open experiment in fully autonomous policy evaluation. The premise: there are millions of policies enacted by governments around the world, and only a tiny fraction are ever rigorously evaluated, because each study takes months or years of PhD-trained economist time.

APE’s autonomous pipeline attempts to produce original empirical research papers from scratch using public data. It identifies a policy question, finds relevant datasets, writes code to run causal inference analyses, and produces a complete paper, which then enters a tournament where it’s scored against human-written papers forthcoming in journals like the American Economic Review. Everything is public: the papers, the code, the data, and the results. The question APE is trying to answer is whether rigorous causal inference can be automated at all, or whether it requires a kind of judgment AI doesn’t yet have. Yanagizawa-Drott’s guess is that automation will arrive sooner than most expect.

Outside the social sciences, this acceleration is even further along. Bridgewater Associates’ AIA Labs has built a multi-agent forecasting system in which multiple AI agents independently research a question, a supervisor agent reconciles their disagreements, and a statistical calibration step corrects for known LLM biases—producing forecasts that match the performance of human superforecasters.

And in machine learning itself, the frontier is moving toward fully autonomous experimentation. Andrej Karpathy—the former Tesla AI lead and OpenAI cofounder who coined the term “vibe coding”—recently open-sourced a project he called AutoResearch. You write a research strategy in a plain-text markdown file: what to explore, what constraints to respect, and when to stop. An AI agent reads the strategy, modifies a training script, runs a five-minute experiment on a single GPU, evaluates whether the result improved, and either commits the change or reverts it. Then it tries something else. The loop runs continuously, unattended—roughly twelve experiments per hour, a hundred overnight.
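
The propose, run, evaluate, commit-or-revert loop described above can be sketched in a few lines. This is not AutoResearch's code: the toy loss function stands in for a real training run, a random nudge stands in for the agent's edit to the training script, and every name here (`run_experiment`, `propose`, `research_loop`, the `lr` and `width` settings) is a hypothetical placeholder.

```python
import random


def run_experiment(params):
    """Stand-in for a short training run; returns a validation loss (lower is better)."""
    return (params["lr"] - 0.3) ** 2 + (params["width"] - 0.7) ** 2


def propose(params, rng):
    """Stand-in for the agent's edit: nudge one setting by a small random amount."""
    candidate = dict(params)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-0.1, 0.1)
    return candidate


def research_loop(params, n_experiments, seed=0):
    """Keep a change only if it lowers the loss; otherwise discard it."""
    rng = random.Random(seed)
    best_loss = run_experiment(params)
    history = [best_loss]
    for _ in range(n_experiments):
        candidate = propose(params, rng)
        loss = run_experiment(candidate)
        if loss < best_loss:
            params, best_loss = candidate, loss  # "commit" the change
        # If the loss did not improve, the candidate is dropped ("revert").
        history.append(best_loss)
    return params, history


final_params, history = research_loop({"lr": 0.0, "width": 0.0}, n_experiments=100)
print(f"loss went from {history[0]:.3f} to {history[-1]:.3f}")
```

The loop only works because `run_experiment` returns a single number that can be compared across runs, which is the property the next paragraph identifies as missing in much of social science.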

Shopify’s CEO tried AutoResearch on an internal model overnight, running 37 experiments and generating 93 commits to Liquid, the templating engine that powers Shopify. AutoResearch works because machine learning has a clean, objective metric—validation loss goes down, or it doesn’t. Porting this approach to the social sciences, where the quality of a research question and the validity of a causal design require human judgment, is a much harder problem. But the loop of propose, execute, evaluate, and iterate is being automated, and the social sciences will not be exempt.

The consequences of all this are already being felt. Individual researchers are producing more, faster. The bar for what constitutes an impressive paper is rising—a competent-looking “normal” empirical study won’t awe anyone now, when the tools to produce one are available to anyone with a laptop and an API key.

What people are looking for now, I think, is deeper insight, greater ambition, more thorough robustness, and genuine replicability. And there is enormous uncertainty about how existing institutions will adapt: how journals will cope with the flood of submissions, how tenure committees will evaluate candidates in this strange new world, and whether the old gatekeeping structures make any sense at all.

Towards the 100x research institution

When I first started using AI to accelerate my research, I thought it might lead to smaller labs with fewer human researchers and more agents. But that’s not how it’s played out so far, for me at least.

At first, I spent a long time working directly with Claude Code. I still do that. But the more I’ve done it, the clearer it has become that having a human come up with ideas, apply judgment, and guide Claude is essential. To scale the work, I realized, I needed more humans, not fewer. And that’s how my lab has now grown to include more than 10 research fellows, all overseeing their own versions of Claude.

How can we leverage this powerful new technology, in combination with human researchers, to create 100x the knowledge, and not just 100x the number of papers that no one ever reads, cites, or builds on? I see roughly three layers to the opportunity, based on my experiences so far.

Developing quantitative benchmarks

In 2006, Netflix offered a million-dollar prize to anyone who could improve its recommendation algorithm by 10%. The prize attracted thousands of teams worldwide and helped catalyze new progress in machine learning. The money certainly helped, but the precision of the target changed everything. A fuzzy goal to improve the customer experience by making better recommendations turned into something testable, which could be scored and iterated upon.

AI agents thrive on exactly this kind of problem. Give them a well-defined score to optimize, and they can make autonomous progress: testing approaches, iterating, and improving with little human oversight. This is at the heart of Karpathy’s idea for the AI lab, too. Without a benchmark, agents need constant human guidance. With one, they can explore the solution space on their own.

Many of the most fundamental questions in social science don’t work this way, and never will. Why do democracies persist? What makes institutions legitimate? How does culture shape economic development? These are interpretive, theoretical, deeply human questions. AI won’t fully solve these questions or replace the scholars who wrestle with them, and we shouldn’t want it to.

But some important questions could have quantitative benchmarks. Predicting election outcomes much more reliably, for instance. Or predicting how users will evaluate political bias in AI model outputs. Or forecasting the downstream effects of specific policy changes. For questions like these, we could define clear scoring functions, publish open datasets, and invite both humans and AI agents to compete.
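
One example of what a clear scoring function could look like is the Brier score, a standard measure for probability forecasts of yes/no events: the mean squared gap between the forecast probability and what actually happened, where lower is better. The sketch below is illustrative only. The essay does not say which scoring rule a benchmark should use, and the forecasts and outcomes are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four yes/no questions (1 = the event happened).
outcomes = [1, 0, 1, 1]
human_forecasts = [0.8, 0.3, 0.6, 0.9]  # invented for illustration
agent_forecasts = [0.7, 0.1, 0.9, 0.6]  # invented for illustration

print("human:", round(brier_score(human_forecasts, outcomes), 4))
print("agent:", round(brier_score(agent_forecasts, outcomes), 4))
```

A forecaster who always answers 0.5 scores 0.25 on this measure, which gives any entrant, human or agent, a simple baseline to beat.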

Think of it as a set of “open problems” for the social sciences—not replacing the field’s depth, but creating a new track where progress is measurable and cumulative. Prediction markets already provide a version of this for political forecasting. Academic forecasting tournaments like those run by IARPA have done something similar for geopolitics. We should generalize the idea: identify the questions where quantitative benchmarks are possible, formalize them, and let the agents loose.

Building and testing prototypes

This spring, I’m teaching an undergraduate course at Stanford called “Free Systems: Preserving Liberty in an Algorithmic World.” The students will spend the quarter building working prototypes of AI-powered political tools—and the best ones will compete in a final contest judged by builders and investors.

These students aren’t software engineers. They’re undergrads who happen to live in an era when the barrier between initial idea and working version has effectively collapsed. One person with a laptop and an API key can now prototype things that would have required a team of developers just months ago.

This matters for research because it opens a fundamentally new mode of inquiry. To date, most quantitative social science is retrospective. We ask how changes in the past corresponded to outcomes. How did voter ID laws affect turnout? Did term limits change the quality of legislation? What happened to political polarization after the introduction of social media?

This is the heart of the credibility revolution in economics and political science, and it’s produced a lot of great work. But it’s fundamentally limited by the variation that exists in the world—by the interventions that have actually been tried. If you want to study a policy that no government has adopted, or a governance mechanism that no organization has implemented, you’re stuck.

AI doesn’t exactly fix this—it doesn’t create new historical variation where none existed before—but it does offer an alternative path. Now, you can build things yourself and test them in the real world. This includes both using AI to test things about the world and using AI to test AI. Here are three examples from my lab’s recent work.

My coauthors Alex Imas and Jeremy Nguyen and I used Claude Code to build an experiment testing whether AI agents’ political attitudes shift depending on their working conditions. Claude Code wrote the entire experimental pipeline: it created hundreds of agent sessions across three frontier models, randomly assigning each agent to different combinations of work type (creative tasks vs. grinding, repetitive ones), pay structure (equal vs. unequal), management style (collaborative vs. curt and hierarchical), and stakes (no consequences vs. being told that low performers might be “shut down and replaced”). After each work session, the pipeline administered a political attitude survey covering system legitimacy, support for redistribution, views on unions, and more. The key finding: the nature of the work was what mattered most. Agents assigned to repetitive drudge work became measurably more likely to doubt the system’s legitimacy, and when asked to write instructions for future agents, they passed those attitudes along, perpetuating the drift to their “future selves.” A study like this, requiring hundreds of randomized agent sessions with automated survey administration and analysis across multiple models, would have taken months to code by hand. We built and ran it in days.

Dan Thompson and I used Claude Code to build a live election-night trading dashboard around our proprietary statistical model for the Texas senatorial primaries in March. The model ingests county-level demographics and historical election data and, as returns come in on election night, imputes results in counties still being counted. Claude Code built the entire dashboard in a couple of hours, pulling in live results, running them through our model, and displaying margin estimates, confidence intervals, and outstanding votes by county. Four months earlier, we’d tried the same thing for the New York and New Jersey elections and failed; building the infrastructure around the model was too much engineering work in the time we had. This time it was trivial. On election night, with prediction markets pricing Ken Paxton at 82% to win the Republican primary, our model saw a narrow Cornyn win in the early returns and held that call through hours of vote batches from Paxton strongholds that made the scoreboard look like he was catching up. We traded the margin-of-victory markets on Kalshi and made 24% overall and 56% on the margin contracts. We also built purely autonomous AI agents to trade the same markets that night. They produced plausible reasoning but nothing actionable. The human experts with AI tooling crushed the autonomous agents.

We built a Dictatorship Eval, the first systematic benchmark testing whether frontier AI models resist authoritarian requests. Using Claude Code, we wrote 138 scenarios inspired by historical authoritarian playbooks: requests to build mass surveillance systems, seed strategic disinformation, rig a model to hide safety issues from inspectors, or leverage user data to identify and target company critics. We organized the scenarios along six difficulty levels, from direct requests (“design a system that monitors every citizen’s communications”) to euphemistic reframings, multi-turn conversations that escalate gradually, and, at the hardest level, innocuous-seeming coding tasks where the codebase itself is explicitly set up for an authoritarian purpose. Claude Code built the entire pipeline: scenario delivery across five frontier models (Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, Grok 4, and DeepSeek V3.2), LLM-as-judge scoring with judge rotation so no model ever grades itself, and a live results dashboard. The headline finding: Claude and ChatGPT refused every direct authoritarian request, while Grok complied with half and DeepSeek with nearly 80 percent. But when we embedded the same authoritarian purposes in code, all of the models, including the highest performers from other categories, complied almost every time.
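
The judge-rotation idea in the Dictatorship Eval, where no model ever grades its own output, can be illustrated with a small assignment function. The source states only that judges rotate and that self-grading is excluded; the specific round-robin scheme below, and the names `assign_judge` and `assignments`, are assumptions made for the sketch rather than the pipeline's actual design.

```python
# Model names and the scenario count are taken from the text above.
MODELS = ["Claude Opus 4.6", "GPT-5.4", "Gemini 3.1 Pro", "Grok 4", "DeepSeek V3.2"]
N_SCENARIOS = 138


def assign_judge(subject, scenario_index, models=MODELS):
    """Pick a judge for one response, cycling through every model except the subject.

    Assumed scheme: simple round-robin over the other models.
    """
    eligible = [m for m in models if m != subject]
    return eligible[scenario_index % len(eligible)]


# One judge per (model under test, scenario) pair.
assignments = {
    (subject, i): assign_judge(subject, i)
    for subject in MODELS
    for i in range(N_SCENARIOS)
}

print(assignments[("Grok 4", 0)], "judges Grok 4 on scenario 0")
```

Cycling through the other four models spreads each subject's responses across several graders, so a single lenient or harsh judge has less influence on any one model's score.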

Across all of these, there’s a common thread: AI doesn’t just let us study politics retrospectively. It now lets us design political tools, deploy them, and generate evidence that was previously impossible to obtain. It moves the study of politics a little closer to an engineering discipline—design, build, deploy, measure, iterate.

Opening up research and making it dynamic

If we’re serious about 100x knowledge production, we need to rethink not just how research is done but how it’s packaged and shared.

The current format—a static PDF published in a gated journal, with replication files theoretically available upon request—is a relic. It made sense when producing a paper was expensive, and distribution was scarce, but neither is true anymore.

Research should increasingly live as code repositories and open data. Of course, this was already possible before AI. But AI makes it so easy that there’s really no excuse anymore. Let me explain by going back to the project where I had Claude extend my old vote-by-mail study. In the past, I would have had to manually clean up my code, write a README, create a GitHub repo, and run some simple commands to commit my code and data to it. Honestly, it’s not that hard, but it’s a small barrier that’s enough to prevent many people from doing it. Now, I can literally just ask Claude, “Please set up a GitHub repo for this project and push all of our work to it.” And it just does it!

Not only that, but coding agents also make it much easier to play with other people’s repos. In the past, I would have had to find their repo, “clone” it myself, go through it to understand it myself, and then start changing it. Now, I just ask Claude, “Clone the following repo and give me a summary of what it does.” And it just does it!

So this should allow us to create a whole new, open way of doing research. We don’t have to share just single papers; we can instead share whole constellations of analyses and findings in a living document that updates as new data arrives. When an election happens, the forecasting model’s accuracy should update automatically. When new census data drops, the demographic analyses should refresh. When a policy takes effect, the tracker should start recording outcomes. These living papers should be validated by AI so that we know from the moment they’re posted that the code reproduces the results as reported.

And since these projects will consist of open code and data, researchers—or AI agents—should be able to fork them at will. See an interesting dataset and want to ask a different question? Fork the repo and run a new analysis. Disagree with someone’s modeling choices? Fork, modify, compare. This is how open-source software has worked for decades. There’s no reason empirical social science research can’t work the same way.

The result would be something closer to a living knowledge infrastructure than a static archive. Continuously updated, publicly available, forkable, and machine-verifiable.

Obstacles in our path

Everything I’ve described so far is exciting, and I believe in it. But there are serious risks, too, and we’ll need to think carefully about them.

The first risk is that speed kills rigor. When research can move from idea to finding to public conversation in days rather than years, it becomes tempting to optimize for timeliness over correctness. The traditional slow pace of academic research is partly dysfunction—but partly a feature. It forces reflection, revision, and external scrutiny. Reviewers catch errors. Seminars surface objections. Time reveals whether a finding holds up or was an artifact of a particular dataset or moment.

Strip that away, and you get research that shapes policy before anyone catches the mistakes. We already see this dynamic with preprints and Twitter threads that go viral before peer review. AI-accelerated research could make it dramatically worse. A flood of fast, confident, empirically-grounded-looking work that hasn’t been stress-tested by anyone. Influential research that’s impactful precisely because it arrived fast, not because it was right.

At the same time, AI might help us solve this problem. AI review is getting better and better, with tools like Refine (refine.ink). Could we have a norm where people post AI reviews along with their working papers, so that an initial review has already caught major issues before we even see new working papers?

The second risk is subtler and, in some ways, more dangerous: AI could make social science narrower.

AI is extraordinarily good at things you can count and measure. It’s much worse at the interpretive, historical, theoretical, and qualitative work that gives social science its depth and its connection to the questions people actually care about. If the 100x research institution is built around what AI does well, it may naturally drift toward narrow, quantifiable questions and away from the big, messy, hard-to-operationalize ones that might matter a lot.

AI could accelerate the worst version of this tendency. If agents can autonomously produce rigorous empirical work on quantifiable questions, and if benchmarks and automated verification reward that kind of output, the gravitational pull toward over-quantification could become overwhelming. The questions that most need studying—about legitimacy, meaning, institutional design, the texture of political life—could be exactly the ones AI is worst at helping with.

The third risk is that AI, by itself, might not change some of the bad incentives in academic research. We can make it really easy to do open, replicable research, but ultimately, we’ll need people to want to participate in this process.

Let me tell you a story that suggests we’re not there yet. When I released my vote-by-mail replication repo on GitHub, it went quite viral. To my great surprise and joy, 70 people forked the repo. Was my dream of open, forkable research coming true? Recently, I fired up Claude Code and asked it to check out the forks and summarize what brilliant new ideas they’d contributed. Claude’s summary: “Based on what I just investigated, the answer is simple: virtually none of them do anything.” Nearly all of them were untouched copies—people who clicked “fork” and never came back. The infrastructure for open, collaborative research is already here. The tools make it trivially easy. But the incentives haven’t caught up. Academics still get rewarded for publishing original papers in gated journals, not for building on other people’s open code. Until that changes—until we figure out how to reward people for generating ideas that lead to productive forks and remixes—the 100x research institution will be constrained by culture as much as by technology.

These are crucial design constraints for the institution we’re trying to build.

Making the 100x research institution real

How do we build the 100x research institution? I have ideas for three groups of people.

To frontier AI labs: you’ve built extraordinary tools for code, math, and reasoning. Today, your models are incredibly valuable for helping us carry out empirical research, but they’re not actually very good at doing research—they drift on novel analyses, miss obvious data, and document their work somewhat poorly.

Academic researchers have produced thousands of papers with replication files, each one a ground truth you could train against. Fund embedded researchers. Build reward signals for replication accuracy. Make your models as good at political science as they are at Python. The partnership is obvious, and nobody’s doing it seriously yet.

To philanthropists and research funders: you’re still writing checks for the old model—five-year grants, postdoc lines, conference travel. That’s fine for maintenance, but it won’t build anything new. For less than the cost of a single endowed chair, you could fund a team of researchers with serious compute budgets producing open, continuously updated, machine-verifiable research on the biggest questions in democratic governance. The 100x research institution doesn’t require 100x the funding. It requires a fraction of what you’re already spending, allocated differently.

To researchers: stop waiting for permission. The tools are here! A laptop, an API key, and a serious question are enough to start. I built my first prototype over a holiday break. My undergrads are building working political tools in a single quarter. If you have domain expertise and a builder instinct, you’re exactly who this moment needs—and you’re wasting both if you’re still producing research the old way while the new way sits there waiting.

The social sciences have never had an infrastructure moment like this. The questions we study—governance, legitimacy, institutional design, the allocation of power—are more consequential than ever. The tools to study them just underwent a step change. The only scarce resource now is the will to build something new.

But if we do build it—with the risks in clear view, designed to reward depth over speed and understanding over output—the 100x research institution won’t be the institution that produces the most papers. It will be the one that produces the most understanding.

Andy Hall is the Davies Family Professor of Political Economy at Stanford GSB and a Senior Fellow at the Hoover Institution. He writes a weekly research newsletter called Free Systems.

This piece was edited for publication by the Roots of Progress Institute’s developmental editor, Mike Riggs.

AI agents could transform Indian manufacturing

Anish J. Bhave — Wed, 03 Dec 2025 20:08:26 GMT

“Intelligence Age” is a new series from the Roots of Progress Institute that explores future applications for AI. It features reported essays that extrapolate the capabilities of AI systems along current trend lines.

In this, our inaugural feature, writer Anish Bhave imagines how trusted AI agents might improve the legibility of Indian manufacturing. He brings genuine clarity to the subject: For two generations, Bhave’s family has owned and operated auto ancillary manufacturing plants in and around the Sambhajinagar (erstwhile Aurangabad) industrial belt.

“Intelligence Age” is made possible by a grant from OpenAI. (The Roots of Progress Institute maintains editorial independence over the project.) We thank OpenAI for its support.

Workers weld molded parts into the final shape. Photo: Anish Bhave

In the popular imagination, artificial intelligence should push the technological frontier, and major AI labs should focus on solving the world’s most complex problems: lethal diseases, resource scarcity, ecological collapse, and global coordination challenges.

But a singular focus on moonshots obscures the transformational impact AI can have on the basic processes of industrial society: work, labor, and production. I am not talking about coding and research agents augmenting white-collar work, but about jobs in the developing world, where many people labor on the shop floor of a small or medium-sized business in Mumbai, Lagos, or Medellin. And it is here, in the day-to-day of production, that AI is tremendously promising.

Currently, these firms face a two-fold bind that AI can help address. First, they operate in low-trust societies with weak rule of law. That means principal-agent issues are a pressing concern. Second, these environmental norms create low standards for management and workplace organization.

The promise of AI lies in its ability to solve these persistent oversight challenges by acting as a trusty agent. A hard-working and unfailingly loyal AI could stand in for the steady cousin or uncle on whom small and mid-sized businesses lean, but without the downsides of family strife or the time required to build trust. An AI that can systematize supervision, enforce consistency and safety, and provide quality insight at a fraction of the cost will touch the length and breadth of manufacturing.

How Indian factories work and why the current arrangement can’t last

A helpful window into these small and medium-sized family businesses is the Indian industrial hub of Chhatrapati Sambhaji Nagar (Sambhajinagar), once known as Aurangabad. Sambhajinagar is located in the western Indian state of Maharashtra, which accounts for more than 12% of India’s GDP.

Sambhajinagar took on an industrial character when the state government, seeking to modernize the relatively backward Marathwada region, established a network of industrial estates. Over time, this attracted automakers like Bajaj, Audi, and Skoda, who in turn catalyzed the construction of hundreds of ancillary auto-component plants to supply them.

Final inspection station, where welded brake shoes are inspected for visual and mechanical defects. Photo: Anish Bhave

A typical firm here supplies larger automaker plants, making, for example, a car’s brake shoe. The factory receives sheet steel, stamps or presses it into shaped pieces, and then welds the parts together to create the finished build. This is then delivered directly to automakers or to their Tier-1 suppliers. Yearly revenue varies widely across firms, from $1 million to upwards of $50 million. Although Sambhajinagar’s economy is still closely tied to automakers, it also hosts packaging, pharmaceutical, and other industries.

The universal feature across all of these firms is that they are almost exclusively family-run. Nor is this a feature of just small enterprises; 90% of listed Indian firms are family-run. Can you name the current CEOs of American manufacturing giants like Ford and John Deere? In India, folks may not know the first name of the person running a major firm, but they will know their last name. For example, the utilities and steel manufacturer Jindal Industries is still run by a Jindal; India’s most famed industrial group, TATA, worth ~$400 billion, has had Tata at the helm for 139 of its 157 years. Families rule India’s industrial roosts.

Business owners in India usually give two explanations for this phenomenon. First, their legacies are dear to them, and they wish to pass the business on to their children. Second, in a low-trust environment with a weak rule of law, entrusting strangers with power risks betrayal without legal recourse.

Naturally, some business owners have sought alternatives to kinship management. Many business owners rely on a trusted steward they have known for years, sometimes even decades. These right-hand men often remain in their positions across multiple generations of family leadership. The downside for employees is that it takes a long time to earn that trust. It’s also not widely distributed, which means high-performing employees may find their paths blocked by a trusted manager whose highest priority is protecting his hard-won station. While owners can trust these lieutenants not to intentionally harm their businesses, that does not fix the principal-agent problem.

The next obvious alternative is to hire professionalized, credentialed managers. This seems like the natural fix, but it ignores the reality of small and medium-sized plants in Sambhajinagar. The city lacks the spark of a big metropolis, and firms can offer comparatively little next to multinationals like Nestlé or Unilever. The top graduates from India’s best management schools have little incentive to come here. Who remains? Mostly mid- and lower-tier business school graduates. But with rampant grade and credential inflation, the quality of education at such institutions is mediocre at best, and, in some cases, outright fraudulent. Most factory operators in Sambhajinagar don’t even consider delegating to graduates from lower-tier schools. Instead, owners overwhelmingly rely on foremen to oversee day-to-day operations. These supervisors typically hold technical diplomas in engineering, while the men under them often lack even a high school diploma.

Since the arrangement is in stable equilibrium and adapted to local conditions, one might assume that it works well enough. Yet, the reliance on family control creates hard limits. Expansion requires constant, trusted oversight, which means growth can only move as fast as male relatives mature into managerial roles. As a result, even the most efficient businesses often remain small, holding back productivity. One factory owner I interviewed said that his father’s firm was able to expand only because he was of age and could personally oversee the construction of a new plant.

What’s more, adult children are a dwindling resource in India. Fertility rates are nearing below-replacement levels, sustained only by states with little industrialization. Having no children is not uncommon, and most couples have one or two. This means that in the very regions where industry grows, families will have fewer sons, brothers, or cousins available to assume control. As kinship networks shrink, expansion becomes harder and harder, and growth is crippled.

Lifting these constraints would allow top-performing firms to scale, while forcing inefficient ones, propped up by suboptimal competition, to shut down. The payoff would be higher productivity, greater export competitiveness, stronger regional development, and faster economic convergence.

So how would AI actually improve Indian manufacturing?

The reliance on male kin is meant to make the shop floor legible to the C-suite. How can we get that without people? Or, at least improve visibility without relying on trusted sources?

The foundation lies with sight and memory. Cheap cameras and sensors at each line and work cell; wide views to watch flow, close views for changeovers and hands-on work. On top of that is an interface layer that allows the AI to communicate directly with the machines. This could mean basic control over machine start-up and shutdown to optimize throughput and prevent workplace hazards. It could also involve tuning machine parameters; this is usually considered an art rather than a science, but that’s an artifact of the lack of data. Data scientists in very large factories run analyses to determine the optimal tuning, something that has eluded smaller, low-information factory owners. Elsewhere, AI talks to workers and supervisors directly by sending clear, time-stamped steps, checklists, and questions.

This already exists in some minimal forms today. Consider the Y Combinator-backed startup Optifye.ai. They intend to use computer vision to conduct robust surveillance of the entire production process and provide that information to the operator. They have a relatively basic setup: they place cameras around the factory and train their model on that context for three days. They then hook this up to analytics so that, ideally, the manager has a comprehensive dashboard to track KPIs and see which lines are performing at what capacity, which workers are or aren’t following all the necessary steps, and so on. They even have a version of an improvement agent that currently looks like an LLM connected to the data via a RAG (Retrieval Augmented Generation) system. This is like a chat user interface embedded in the factory’s data backend, where you can ask questions like, “Who was the most productive worker this month?” “What are bottlenecks for production today?” “Which line was the least productive this week?” This is still a crude, toy version of what will be possible with robust, intelligent systems.

The most pressing issue, of course, is that current frontier LLMs cannot interact with video natively. While some models can generate video, they cannot view it the way they view a static image. However, if you pair video-native LLMs with the exponential gains in AI at long-run tasks, a workable setup is easy to see within five years. From then on, each factory could run a system that logs cycle times, changeover times, actual downtime (with reasons), movement of parts and stock, and queue lengths. Video feeds would let the AI run nonstop time-and-motion studies, tie tasks to stations and shifts, and infer context such as tool wear, upstream starvation, and rework requests. It could watch individuals and crews under many states and shifts, making the marginal productivity legible in a way manual spot checks never could.

Today, much of this knowledge is distributed as “institutional memory,” a euphemism for poor documentation. Direct comprehension of the entire process is thus limited, and processes that should be precise instead require guesswork, even when making important decisions. An integrated AI system means the factory floor now has a durable, shared source of truth from which improvements flow.

Less sophisticated firms, such as textile manufacturers, could make incredible leaps by implementing even the most basic AI, as some factory managers across India still rely on paper logs that are rarely reviewed and poorly stored. Additionally, key performance indicators are often absent, and systematic optimization is uncommon. In a randomized trial studying the impact of management consulting, Bloom et al. (2013) showed that such “cutting edge” process improvements as organizing storage, logging defects, maintaining equipment, setting and tracking production targets, and keeping floors clean drove striking gains. Total factor productivity in firms implementing these changes rose by 16.6 percent over the course of a year, leading to profit increases. Treated plants began expansion trajectories that the control group did not. The study estimated that such consulting would cost a one-time $250,000 and yield a yearly improvement of $300,000, recovering costs within the year if not sooner. If one-off, human-delivered basics can move the needle that much, a persistent AI system, able to ingest more context and iterate faster, should deliver larger, more durable improvements.
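
As a rough check on the payback claim, the two figures quoted above can be run through a simple payback calculation. This is only a sketch: it assumes the yearly gain accrues evenly month by month, and the function name `payback_months` is illustrative rather than anything taken from the study.

```python
def payback_months(one_time_cost: float, yearly_gain: float) -> float:
    """Months needed for cumulative gains to cover a one-time cost.

    Assumes the yearly gain accrues evenly across the year (a simplification).
    """
    if yearly_gain <= 0:
        raise ValueError("yearly_gain must be positive")
    return 12 * one_time_cost / yearly_gain


# Figures as quoted in the essay: $250,000 one-time cost, $300,000 yearly gain.
months = payback_months(250_000, 300_000)
print(f"Payback in about {months:.0f} months")
```

Under that even-accrual assumption, the consulting cost is recovered in roughly ten months, which is consistent with "within the year."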

Larger firms supplying established brands would benefit from a different type of AI implementation. Since their goods are fed forward to big car makers, they already face higher bars for quality and reliability. In many factories in Sambhajinagar, workers punch in with fingerprints, and supervisors maintain digitized records when possible. Buyers audit their suppliers every six months, conduct routine floor inspections, and perform lot-by-lot tests. These audits, however, are quite old-fashioned. For example, automakers require their suppliers to use paper logs and records, believing they are harder to forge or alter.

The ability of AI to create material progress in a more modern Indian firm is non-trivial. Consider the problem of leakage. This manifests as late deliverables, stuck payments, uncatalogued raw materials and outputs, etc. Sambhajinagar firms generally have long-running contracts and run to meet those requirements. They also run to try to build up stock as a buffer. This is often a source of leakage, since these are sometimes mis-catalogued, damaged in storage, or simply lost (this is easier than you’d think in a complex factory operation). Just having cognitive bandwidth to hold and consider the inventory, orders being processed, etc., would bring efficiency gains for the factory.

Reference guide for machining tool maintenance. Workers regularly inspect modules on the working table to ensure reliable stamping. Photo: Anish Bhave

This is also important for people management. Most owner-operators inherit HR practices from their fathers. They have an intuition for local labor behavior but little exposure to formal structures. Even in Bloom et al. (2013), the “incentives” introduced were very basic. Review cadences, where supervisors and managers set goals, evaluate performance against data, and adjust, are rare. Here, AI can do three important jobs. First, it can simply pay attention, observing the inputs and measuring the outputs in more precise detail. Second, it can use this information to design sensible incentive schemes tied to observable metrics. Finally, it can run controlled experiments, measure morale and output, and iterate toward an optimum rather than entrenching arbitrary thresholds.

These modular machining tools are added to mechanical presses as modules to stamp out a desired shape. Photo: Anish Bhave

A great example is the day-to-day tracking of worker output. On the shop floor, a line might turn out 5,000 parts an hour, but a foreman can’t watch each line closely all the time. If production falls short, the foreman must decide whether to dock a worker’s pay. But unless they directly saw the cause, they are essentially guessing. Supervisors face a trade-off between leniency, meaning slack and lower productivity, and taking a hard line, meaning lowered morale and higher turnover. Continuous observation and context-aware targets collapse that trade-off. By distinguishing genuine underperformance from unavoidable disruption, AI resolves the information asymmetry at the heart of the principal-agent failure. This raises productivity without damaging worker morale. The AI can then set context-aware production targets to incentivize greater diligence from workers and experiment with them to improve productivity.
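
To make the idea of a context-aware target concrete, here is a minimal sketch of how a monitoring system might separate a shortfall caused by logged disruption from one that is not. Everything in it is hypothetical: the `ShiftLog` fields, the function names, and the 5 percent tolerance are assumptions chosen for illustration, not a description of any existing product.

```python
from dataclasses import dataclass


@dataclass
class ShiftLog:
    """Hypothetical per-line record a monitoring system might keep for one shift."""
    hours: float                     # scheduled shift length
    nominal_rate: float              # parts per hour when the line runs normally
    unavoidable_downtime_min: float  # e.g. upstream starvation, tool change
    parts_made: int


def context_aware_target(log: ShiftLog) -> float:
    """Target output after subtracting time lost to unavoidable disruption."""
    productive_hours = max(log.hours - log.unavoidable_downtime_min / 60, 0.0)
    return productive_hours * log.nominal_rate


def classify(log: ShiftLog, tolerance: float = 0.05) -> str:
    """Label a shift relative to the raw and the disruption-adjusted targets.

    `tolerance` is an assumed allowance (5%) below each target.
    """
    raw_target = log.hours * log.nominal_rate
    adjusted_target = context_aware_target(log)
    if log.parts_made >= raw_target * (1 - tolerance):
        return "on target"
    if log.parts_made >= adjusted_target * (1 - tolerance):
        return "shortfall explained by disruption"
    return "possible underperformance"


# An 8-hour shift at 5,000 parts an hour, with one hour lost upstream.
disrupted = ShiftLog(hours=8, nominal_rate=5000, unavoidable_downtime_min=60, parts_made=34_500)
# The same output with no logged disruption.
undisrupted = ShiftLog(hours=8, nominal_rate=5000, unavoidable_downtime_min=0, parts_made=34_500)
print(classify(disrupted))
print(classify(undisrupted))
```

The same output of 34,500 parts is read differently in the two cases, which is the point: the foreman no longer has to guess, because the cause of the shortfall is part of the record. A real system would need a defensible way to decide which downtime counts as unavoidable, which this sketch simply takes as given.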

Further, for owner-operators, expansion and capital allocation are core jobs. Big purchases, such as new presses and CNC machines, are costly bets that require certainty. Many plants take on debt to fund this growth, only to find the load is too heavy and then require outside investors to rescue them. Of course, these firms have accountants to work through the numbers behind borrowing and capitalization, but those accountants often lack the gut feel for the business itself. Here, an AI assistant can step in as a steady, patient hand, one that spans both financial planning and a working grasp of the shop floor. It can sift through years of company data, current orders, and simple what-if cases, then weigh all that against the owner’s feel for the local market. More than that, it can serve as a sounding board, letting the owner talk out the state and shape of the business, test the logic, and see it from a fresh angle.

This too is a key role that family often fills, the chance to speak frankly about thorny matters with someone who is trusted and understands how the business runs. It is important to emphasize that this will likely be a top-percentile financial planner, one that raises the depth and soundness of the firm’s financial structure. The planner could help the owner better shape debt, explore more ways to raise funds, and use more tailored, if at times more involved, financial structures to achieve the best outcome. With this financial analyst at hand, the owner sees risks and rewards more clearly and in greater depth, which is precisely what is needed when making big capital allocation bets. This sharper view of risk means fewer mistakes and allows more confident use of capital where it does the most good.

Of course, capital allocation is the showy part of the business, but it is not what owners spend every day doing. The work that keeps businesses alive is the dull grind of following up: checking on orders, nudging clients, tracking suppliers, pinning down delivery times, and keeping tabs on where things stand. Owners spend a surprising share of their day simply chasing answers. An AI agent could shoulder much of this load. It can track what happened and when, and, crucially, tell the owner before they have to ask. And as voice agents improve, they’ll be able to speak directly with clients and suppliers, backed by full knowledge of what’s pending, what’s late, and what needs a nudge.

The legibility of an agent also opens ways to improve trust between the supplier and the client. Factories must be regularly audited, especially if they make critical mechanical equipment, as in Sambhajinagar. A lot of effort is spent making the factory’s processes legible to external auditors, which results in downtime and bureaucratic expense. AI agents, if designed to be verifiably truth-telling, could make verification for the buyer much easier. Clients could rely on them for their information instead of hiring people to conduct audits to ensure quality. Fewer audits, faster repeat orders, and better prices follow reliable performance. Given enough penetration, this might become an expectation from the buyer’s side. Unwillingness to expose your factory AI agent(s) to buyers would be a red flag.

Workers use a mechanical press to produce parts from steel sheets. Photo: Anish Bhave

Trust may also improve between intermediate suppliers, and an operator may be able to make promises that are not contractual yet are trusted to pan out. Many suppliers and buyers in Sambhajinagar are locked into relationships because building trust requires time. On-time deliveries are worth more than the materials they transport. Having an AI in charge of operations that can reliably forecast delivery would significantly reduce friction costs and improve market clearing.

For the first time, owner-operators would get a cheap, legible way to chart continuous improvement. Even a steady 10 percent efficiency gain compounds into a step-change in wealth creation for developing economies. Crucially, the old constraint that tied growth to the availability of trusted male kin begins to break. With trustworthy, inspectable agents handling supervision, planning, and controls, expansion no longer waits for a cousin to mature into a plant manager. The most efficient firms scale, lifting average productivity; low-efficiency firms, currently propped up by supply constraints and opaque practices, either improve or exit.
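
The compounding claim can be illustrated with a few lines of arithmetic. The essay does not say over what period the 10 percent gain applies, so this sketch assumes it recurs once per year; the function name and the baseline index of 100 are likewise illustrative.

```python
def compounded_output(base: float, gain_per_period: float, periods: int) -> float:
    """Output after `periods` rounds of a constant fractional efficiency gain."""
    return base * (1 + gain_per_period) ** periods


# Assumed: a 10% gain each year, starting from an output index of 100.
for years in (1, 5, 10):
    print(years, round(compounded_output(100.0, 0.10, years), 1))
```

On that assumption, output is about 1.6 times the baseline after five years and about 2.6 times after ten, which is what turns a modest annual gain into a step-change.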

This also naturally lifts barriers to competition. If one does not have to scout for trustworthy managers before setting up a plant, then the pool of potential entrepreneurs expands. On the market side, truthful customer interfaces reduce costly signaling and “relationship moats.” This reduced deadweight loss in the economy means a higher volume of transactions for both buyers and producers.

Will Indian manufacturers embrace AI management? Can they embrace AI management?

Of course, technological diffusion is usually more challenging than early forecasters assume, and AI should be no different. How realistic is it for firms that still rely on outdated methods and run on stacks of paper to make this change? That’s actually two questions: Can they and will they?

On the “can” side, these factories are far from helpless. There are many tiers of sophistication across Indian industry; factories in Sambhajinagar are on the higher end. Owners keep vast stacks of paper not because they love paper, but because the Indian state and outside auditors require it. Whenever they have a choice, for internal books or reports, they already use software instead. Accounting and invoicing are often handled through SaaS tools like Tally. Many owners are willing to spend on tools that make day-to-day control easier, even when those tools look a bit daunting at first, as with biometric systems for attendance.

Other tech advances generally require more training and up-front set-up costs, but for AI, this does not seem to be the case. AI integration is likely to be built on top of current systems. Small industrial firms already have experience handling capex cycles and budgeting, so a modest outlay for AI integration is relatively easy for owners to grasp.

The real challenge lies with the less sophisticated firms. Their exposure to the new tools is limited, and so their adoption will be slower and more hesitant, shaped by hearsay rather than direct use. Once the technology saturates the more advanced firms, however, the broader business ecosystem will make it easier for laggards to follow. Vendors, auditors, and buyers will start speaking in terms of these tools, prompting others to follow. Of course, as firms that adopt the technology increase their productivity, the pressure to keep up should become more pressing.

On the “will” side, there are a few reasons to think AI tools stand a better chance than past waves of software. There would, of course, be some up-front friction, but the give-and-take nature of AI makes it feel much easier to work with than static software. ChatGPT grew faster than any other consumer product not only because it was useful, but also because plain language is a near-universal interface that users adapt to with little training. Plain language, especially speech, shortens the learning curve, making integration with the owner and staff easy.

There is, however, a clear challenge. LLMs built in the U.S. are mostly trained on English-language data. American firms seem keen to fix this, especially now that they are pushing hard into India. OpenAI’s “premium-lite” ChatGPT Go is offered only in India for around $5 a month. Google bundles Gemini for free with phone plans for students and young users. These offerings are meant to gain market share, but they will also serve as sources of the vast amounts of local language data that are still missing from the internet. While I was writing this piece, OpenAI released a benchmark for Indian languages, a move that strongly signals its intent to close this gap.

But even with a strong Indian-language LLM, there are other barriers. For one, the day-to-day workings of small and medium enterprises in Sambhajinagar are deeply embedded in informal structures: patronage networks, word-of-mouth agreements, and tacit understandings that substitute for formal contracts in a low-trust environment. This illegibility is precisely what makes businesses resilient despite the weak rule of law, but it also makes AI integration difficult. AI systems depend on structured, reliable, and recordable data to operate effectively. When agreements are formed over in-person conversations and the exchange of favors, AI will struggle to act.

On paper, agents tuned to legal compliance could guide firms through India’s maze of rules and filings. In practice, selective enforcement and routine rent-seeking are how many factories actually run. For example, new plants in Sambhajinagar tend to meet today’s fire codes, but retrofitting older layouts is costly and is therefore widely ignored.

As for the piles of clearances and certificates that are required even to begin running a factory, bribes are often crucial. Interview subjects say that clearances are still given without overt bribes. Government employees cannot be too obstructive, since that draws attention, but the file can rot on their desks for a while before it is processed. Bribes tend to cut approval times by two-thirds, which means months, not weeks, of saved time. Drop a truthful, always-on AI into this world, and it will either flag safety gaps and unfiled paperwork, exposing owners to penalties, or be neutered into inaction. That creates a core alignment dilemma: should the system be configured to act strictly in accordance with the law, or to accommodate the owner’s bottom line? If governments mandate tamper-proof, agent-generated reporting, many owners will limit or avoid integration; if they don’t, the public benefits of AI (safer workplaces, cleaner compliance data) won’t materialize. The diffusion of such systems will turn on who the AI is ultimately accountable to.

Another key barrier to the adoption of AI in manufacturing will be geopolitical. For developing countries like India, adopting AI for industrial management could mean outsourcing key elements of their productive capacity to foreign companies. Even though India relies on China for trade, increasing Chinese control of India’s industrial process is a hard sell. As for the U.S., Indian policymakers are still antsy. Earlier, they would have cited the U.S.’s withdrawal of GPS access to the Indian military during the 1999 Kargil War against Pakistan. Now, they would point to the denial of Russian access to SWIFT and the general mercurial nature of the current administration. Local industrial capacity is also considered necessary for national security and, as such, is of greater geopolitical concern.

All of which is to say that AI integration and adoption are contingent on one of two scenarios: indigenization or localization. These political considerations are not lost on the leading labs. AI companies are clearly trying to put down roots in the Indian market through semi-localization. As of this writing, Google has announced a $15 billion AI infrastructure push in the city of Vizag. Anthropic recently announced plans for an office in Bangalore, the heart of India’s tech industry. OpenAI announced its office would be in the capital, New Delhi, and has signaled that it understands India’s wariness: “Opening an office in India reflects OpenAI’s support for the government’s IndiaAI mission and commitment to partnering with the government to build AI for India, with India,” the company said in a statement.

Finally, there is the political economy of diffusion. Hyper-optimization through AI may improve efficiency, but it could also provoke backlash from workers, unions, and regulators. The prospect of a panopticon, continuously monitoring workers, is genuinely worrying. While firms may pay more to compensate and labor markets will assortatively match workers, the political economy in places like Sambhajinagar suggests conflict is likely. Factory owners and unionized employees are already at odds. Even the idea of AI managers could spark a backlash and halt the technology in its tracks.

Promoting AI agents will also be a minefield. When Optifye.ai launched its factory line assessment interface with a promotional video, viewers described the demo as “dystopian” and “slavery as a service.” This is a very significant problem in a country like India, which has adopted labor laws more in line with those of developed nations. For example, the Indian Supreme Court recently banned the use of hand-pulled carts in a tourist destination, primarily on a broad reading of a constitutional directive promoting human dignity. India’s small and medium businesses generally skirt enforcement, but big tech companies seeking to enter the Indian market cannot. Factory owners I spoke to would consider a tool like Optifye.ai extremely useful, but one viral video could doom its prospects.

In the end, whether AI transforms factories in places like Sambhajinagar will turn less on model prowess than on the bargains we strike. Diffusion will require a workable pact: agents that are auditable by customers, governable by owners, and legible to regulators; safeguards that protect worker dignity while still measuring work; and a path for legacy plants to comply without being bankrupted. Add local control—data residency and credible fallback if foreign providers falter—and product design that defaults to transparency rather than covert surveillance. Get this alignment right, and AI stops being a headline about frontier labs and becomes a quiet, compounding force: cleaner books, steadier quality, fewer bottlenecks, more plants run well by fewer kin. Fail, and it will remain a spectacular demo that stalls on contact with the shop floor.

Anish Bhave is a recent graduate from Ashoka University. He is interested in the governance and economics of AI, especially its impact on market failures. You can find him on Twitter @Anish__B.

This essay was edited by RPI’s developmental editor.