Free Will in the Age of AI
Crime and Punishment for Both Humans and AI
In 1848, Phineas Gage was a foreman on a railroad construction crew in Vermont. He was described as well-balanced, responsible, and a shrewd businessman. One afternoon he suffered a horrific accident: an explosion sent an iron rod more than three feet long through his skull, destroying a large part of his brain’s frontal lobe. Miraculously, Gage survived, but the man everyone knew seemed to be gone. He could no longer control his impulses and was often described as rude and profane. His case raises deep philosophical questions about free will and identity.
This line of inquiry is especially relevant today as we think about how to handle AI, given the ambiguity of where responsibility lies when a “black box” AI makes a poor choice. It gives us tools to explore when we might punish machines themselves for their actions rather than their creators. Instead of starting with the difficult questions about free will, it is worth first asking whether it makes sense to punish a person or a machine at all, regardless of what we think about free will.
Justifications for Punishment
The important distinction to make is between holding someone legally responsible and holding someone morally responsible. Instead of focusing on Gage, let’s use a thought experiment to make the point more potent.
Imagine a person who has never wronged anyone and believes in the utmost ethical good. One day, this person is struck in the head, and the injury damages the part of their brain responsible for impulse control. Later, this once-kind person gets angry while driving and runs someone off the road, killing them. You might say that what happened isn’t their fault and that they should not be held morally responsible; yet it may still make sense to put them in prison.
Exploring this more closely means looking at the different ways the legal system justifies punishment. One is to claim that a person is morally responsible for what they did and that the punishment itself is justice; this is retribution. But retribution is not the only reason to punish someone; there are at least three others:
Deterrence: The threat of punishment discourages people from committing the crime in the first place.
Public Safety: Imprisoning someone prevents them from endangering the public.
Rehabilitation: Punishment can be part of the path toward making a person no longer want to commit crimes.
Using these three justifications, we can shelve the questions of free will and moral responsibility and still find coherent reasons for punishment.
AI, Punishment, and Free Will
We can imagine a future where these three ideas are applied to machines in much the way they are applied to humans. In subtle ways, this already happens in AI development. You can essentially punish an AI today, and we constantly do so by altering its optimization function. An LLM can be thought of as being “punished” when it produces a wrong output or hallucinates something nonsensical: the company building the AI has an incentive to prevent that, so in the next training run, the model is not rewarded for responses like that.
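To make this concrete, here is a minimal sketch of what punishment-by-optimization might look like in code. Everything in it is an illustrative assumption (the reward values, the penalty weight, the factuality flag), not a description of any real training pipeline:

```python
# Minimal sketch of "punishing" a model through its optimization target.
# The reward values, the penalty weight, and the factuality flag are
# illustrative assumptions, not any real lab's training setup.

def score_response(is_factual: bool, penalty_weight: float = 1.0) -> float:
    """Assign a reward to a model response; hallucinations earn negative reward."""
    if is_factual:
        return 1.0              # reinforce: produce more responses like this
    return -penalty_weight      # "punish": produce fewer responses like this

# In the next training run, the model is updated to favor high-scoring
# responses, so the penalized behavior becomes less likely.
print(score_response(is_factual=True))   # 1.0
print(score_response(is_factual=False))  # -1.0
```

Deciding what counts as a “wrong” output, and how heavily to penalize it, is where the real engineering, and the real value judgments, live.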
This acts as a kind of deterrence, because fewer people will use an LLM that is frequently wrong. It serves public safety by keeping flawed AI from being widely released. And it is a form of rehabilitation, because the feedback is used to build a better model in the future.
This could be developed further and formalized within our existing legal system. The common factor among these three justifications is that the goal is to change the outcome, not to punish for its own sake. This also helps answer our question about when to punish the creator versus the machine: it depends on which would be most effective at serving deterrence and public safety, and on who actually needs to be rehabilitated.
Now we need to talk about punishment as retribution. For retribution to be possible, it seems we must be dealing with a conscious being. First, if it is not conscious, how could it be “punished” at all, since punishment inherently involves a kind of negative experience? So before ever considering retribution for AI, we would need to be convinced it is conscious. Second, when we punish someone for retribution, we are claiming they bear moral responsibility for their actions, which implies they have free will, and free will is typically thought of as a conscious choice. Exploring this further requires an understanding of free will, of which there are three main interpretations.
Hard Determinism: Free will does not exist because all of your actions are determined by prior events. There is no room outside the laws of physics to insert a kind of free will that would allow you to have done something else.
Libertarianism: You do have free will; therefore, you have the ability to decide your actions, and they are not entirely determined by previous events.
Compatibilism: This view changes what we mean by “free will.” Instead of asking whether our actions were determined by previous events, it posits that all that matters is that you are doing what you want and are not being coerced by an external force. This seems closer to the legal idea of a “sound mind.”
When we say someone is morally responsible, we mean that their conscious experience somehow caused their actions; thus, it is reasonable to punish that conscious experience. This seems to require a belief in libertarian free will to justify retribution. Punishment for the other three reasons (deterrence, public safety, and rehabilitation), however, only requires a compatibilist view of free will.
The difficulty with Libertarianism is explaining how it works within the laws of physics; in some sense, it requires claiming that the known laws of physics are incomplete. That difficulty makes it, in my opinion, the hardest of the three views to justify. Biology also suggests that the structure of your brain is shaped by your genetics and your environment, both of which are outside your control.
This is not meant to free you from responsibility. You are the one who made the choice based on your character and desires, and you can be held accountable in all the ways that matter for a functioning society. It just means that you are not existentially or metaphysically responsible. To understand this better, let’s return to Gage.
Identity, Change, and Justice
Let’s use a popular thought experiment called the Ship of Theseus. It goes like this: if a ship has every one of its wooden planks replaced over time, is it still the same ship at the end? Compare that with a ship being entirely destroyed and rebuilt at once. The key is that the end result is the same for both—all new parts—and the only difference is the process, which doesn’t seem like it should matter in our evaluation of what the final product is.
In the case of Phineas Gage, he is the ship replaced all at once: one day he is a kind person; the next, he is profane and rude. Someone who, over the course of a lifetime, becomes profane and rude due to genetic and environmental factors is like the ship replacing one wooden plank at a time. The end result is the same: both have a brain structure that makes them rude and profane. The only difference is the process. So if you have sympathy for someone like Gage, you should also have sympathy for the person whose character was shaped one plank at a time.
Consider this: What is the difference between someone who suffers a traumatic brain injury and becomes a criminal and someone who was born with a brain structure that makes them a criminal? The same goes for AI. If we build an AI that harms people, is it fair to blame the AI solely and entirely?
Let’s end with a thought experiment that makes this clearer: the Veil of Ignorance, proposed by philosopher John Rawls. He asks us to imagine designing the principles of a just society from behind a “veil” where we don’t know our own place in it. Your conscious experience could be attached to any brain. From behind that veil, would any of us agree to a system based on pure retribution, knowing we could be the unlucky one whose brain chemistry leads them to crime?
Gage’s story suggests that, given our understanding of free will and identity, we should not punish people or machines simply for retribution’s sake, but rather to serve the goal of building a better society.
Crime as an Outcome, Not a Moral Stain
Setting aside the human factor of the court for a moment points to a more clinical model: treat criminality as an outcome, not a moral stain. From that standpoint, the questions become:
1. Can the person be made whole again?
This is the rehabilitation lens. If the brain injury caused the behavior, and modern neuroscience or therapy could restore impulse control, then retribution makes little sense. Instead, treatment becomes the moral response, like fixing a broken brake line rather than punishing the car.
2. How can the public be kept safe?
This is the public safety or incapacitation justification. Even if the driver’s loss of control wasn’t “their fault,” society may still need to separate them temporarily, just as we would restrain a person in a medical crisis who’s flailing and dangerous. The focus shifts from guilt to risk management.
3. What signal does this send?
This is deterrence. Even without free will, laws shape social behavior. They act as environmental pressures: if consequences are predictable, behavior across the population changes. In this sense, punishment works statistically, not morally; the toy simulation after this list makes the point concrete.
4. The human factor.
Courts still operate with moral language because humans need a shared emotional grammar to justify social order. Even if philosophy says free will might be illusory, people intuitively feel agency is real — so the system speaks in those terms.
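Here is the toy simulation promised above: a minimal sketch, assuming a made-up distribution of how much each agent gains from offending and an assumed probability of being caught, of how a predictable penalty shifts behavior across a population without any claim about individual moral responsibility.

```python
import random

# Toy model of deterrence as a purely statistical pressure. The gains,
# catch probability, and penalty values below are arbitrary assumptions
# chosen only to illustrate the population-level effect.

def offense_rate(penalty: float, p_catch: float = 0.3,
                 n_agents: int = 100_000, seed: int = 0) -> float:
    """Fraction of agents who offend when the expected gain beats the expected cost."""
    rng = random.Random(seed)
    offenders = 0
    for _ in range(n_agents):
        gain = rng.uniform(0.0, 10.0)    # each agent values the act differently
        if gain > p_catch * penalty:     # offend only if it "pays" in expectation
            offenders += 1
    return offenders / n_agents

for penalty in (0, 10, 20, 30):
    print(f"penalty={penalty:>2}: offense rate = {offense_rate(penalty):.2f}")
```

In this toy world, raising the penalty from 0 to 30 drops the offense rate from roughly 100% to about 10%: the “punishment” does its work purely through expected consequences, with no verdict on any individual’s agency.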
The conclusion, that punishment becomes a matter of treatment, evaluation, and societal safety, lands squarely in the modern view of therapeutic jurisprudence. It doesn’t excuse the act, but it reframes justice as repair and prevention, not revenge.