The Lever and the Enter Key: A Conceptualization of Agent-Mediated Software Development as a Functional Analog of Brain Stimulation Reward

This essay is also available as a preprint on PsyArXiv. To cite it, use the following reference (APA 6th edition):

White, R. D. (n.d.). The lever and the enter key: A conceptualization of agent-mediated software development as a functional analog of brain stimulation reward. Retrieved from https://osf.io/preprints/psyarxiv/zxpwg_v1

Abstract

Olds and Milner (1954) reported that rats with electrodes implanted in the septal area would learn to lever-press for brief pulses of electrical stimulation delivered to the implant site, establishing what is now called brain stimulation reward (BSR) and its operant counterpart, intracranial self-stimulation (ICSS). Subsequent work demonstrated that rats stimulated in the lateral hypothalamus or medial forebrain bundle would respond at sustained high rates (Olds, 1958a), would self-stimulate to physical exhaustion without satiation under extended testing (Olds, 1958b), would forgo food to the point of starvation when food and stimulation were available concurrently (Routtenberg & Lindy, 1965), and would respond for brain stimulation in competition with shock avoidance (Valenstein & Beer, 1962). The phenomenon has since been demonstrated in multiple species, including humans, with published case reports of human self-stimulation, including compulsive use, that parallel aspects of the rodent literature (Bishop, Elder, & Heath, 1963; Moan & Heath, 1972; Portenoy et al., 1986). The purpose of this essay is to (a) review BSR and its proposed neural mechanism while noting that the causal role of dopamine remains contested, (b) propose a structural analogy between BSR and the reinforcement profile generated by interaction with large language model (LLM) coding agents (e.g., Claude, Codex), and (c) clarify that the behaviors examined in this essay are conceptually distinct from the cluster of LLM-associated psychotic phenomena recently discussed in the clinical literature on so-called “AI psychosis” (Flathers et al., 2026; Morrin et al., 2026), which concern delusion formation rather than compulsive use. I also offer a personal observation, as the analogy did not become persuasive to me until I noticed it operating in myself.

Introduction

In 1953, James Olds and Peter Milner, then of McGill University, observed that rats would return to the region of a test apparatus where they had received direct electrical stimulation of the septal area, and they inferred from this observation that the stimulation was rewarding (Olds & Milner, 1954). Subsequent experiments demonstrated that rats could be trained to perform a novel behavior (lever pressing) in order to obtain brief pulse trains of stimulation, and that response rates were comparable to those produced by natural reinforcers. Later work established that stimulation of the lateral hypothalamus produced the most vigorous responding, with rats responding at sustained high rates (Olds, 1958a) and self-stimulating to physical exhaustion under extended testing (Olds, 1958b). Researchers further demonstrated that, when given a choice between food and brain stimulation under conditions of food deprivation, some rats would self-stimulate to the point of starvation (Routtenberg & Lindy, 1965), and that rats would respond for brain stimulation in competition with shock avoidance (Valenstein & Beer, 1962). The phenomenon has since been demonstrated in multiple species, including humans (Bishop, Elder, & Heath, 1963; Moan & Heath, 1972); the human cases are discussed in more detail below.

The purpose of this essay is to use BSR and ICSS as a conceptual lens rather than as a neurophysiological tool. Specifically, I will argue that the reinforcement profile generated by interaction with contemporary LLM coding agents is structurally analogous to BSR, and that this analogy draws support from, though it is not established by, the operant literature on intermittent reinforcement schedules.

The BSR paradigm and its proposed neural mechanism

The lever in the Olds and Milner (1954) chamber does not, of course, deliver a reward in any conventional sense. Each press triggers a brief depolarization of neurons local to the electrode site. When the electrode is placed along the trajectory of the medial forebrain bundle (MFB) through the lateral hypothalamus or ventral tegmental area (VTA), this depolarization produces the most vigorous responding (Wise, 1996). The mechanism by which this occurs is less direct than introductory summaries sometimes suggest. Paired-pulse collision studies have demonstrated that the population of neurons directly activated by typical MFB stimulation parameters consists principally of myelinated, non-dopaminergic descending fibers (Bielajew & Shizgal, 1986; Yeomans, 1989), which are thought to engage the small, unmyelinated VTA dopamine neurons (Gallistel, 1986; Wise, 1996). Through this pathway, the mesocorticolimbic dopamine system is engaged in a two-stage fashion (cf. the “series-circuit” model), and dopamine release in the nucleus accumbens has been measured directly via fast-scan cyclic voltammetry during ICSS (Owesson-White et al., 2008).

The precise causal role of this dopamine signal in supporting ICSS remains debated. Garris et al. (1999) demonstrated that nucleus accumbens dopamine release can be dissociated from continued ICSS responding, and Trujillo-Pisanty et al. (2020) argued that dopamine neurons do not constitute an obligatory stage in the final common path for the evaluation and pursuit of brain stimulation reward. For the present argument, the contested causal status of dopamine matters less than the better-established behavioral observation: BSR bypasses the peripheral sensory pathways through which natural reinforcers ordinarily reach the reward circuitry, and it does so without producing the satiation effects characteristic of natural reinforcers (Wise, 1996). Olds (1958b) demonstrated this lack of satiation directly in extended testing in which rats with hypothalamic electrodes self-stimulated to exhaustion. In effect, the gates that ordinarily moderate reward-seeking behavior are uncoupled from the reinforcing signal.

Importantly, researchers have also shown that BSR is acutely sensitive to the temporal contingency between response and reinforcement. Black, Belluzzi, and Stein (1985) reported that reinforcement delays as short as one second severely impair the acquisition of brain self-stimulation, which is consistent with the broader operant principle that delayed reinforcement is less effective than immediate reinforcement (Ferster & Skinner, 1957).

Researchers have additionally demonstrated that the phenomenon is not confined to rodents. Bishop, Elder, and Heath (1963) provided the first published report of human ICSS, and Moan and Heath (1972) subsequently described the well-known case of patient B-19, who self-stimulated vigorously and protested when the apparatus was removed. Portenoy et al. (1986) reported a later case of compulsive thalamic self-stimulation accompanied by erotic, autonomic, electrophysiologic, and behavioral correlates. These human cases are ethically distressing and methodologically isolated, but they suggest that the structural features of BSR (i.e., rapid, low-effort responding for an outsized reinforcing signal that bypasses natural satiation gates) are not species-specific.

A structural analogy: agent invocation as lever press

If we abstract BSR and ICSS to their functional components, namely a low-effort motor action that reliably produces an outsized internal reinforcement signal uncoupled from the natural evaluative gates ordinarily moderating reward-seeking behavior, the parallels to contemporary AI-assisted software development become difficult to ignore. Consider the act of pressing enter on a coding agent. The motor cost is trivial. The latency to a salient “something is happening” signal (e.g., streaming tokens, tool calls, code being written) is brief, which matters given how sensitive BSR acquisition is to even short reinforcement delays (Black, Belluzzi, & Stein, 1985). The agent’s output, regardless of its eventual quality, registers as forward motion. Crucially, the natural evaluative gates that previously moderated the act of building (e.g., a working build, a happy user, a finished feature) are no longer interposed between the press and the reinforcing signal.

The kind of reinforcement schedule that this constitutes deserves a precise statement. Strictly speaking, agent invocation is closer to continuous reinforcement (i.e., every press produces a response) than to the variable-ratio schedules that Ferster and Skinner (1957) showed to sustain high, steady response rates and marked resistance to extinction. However, the quality of agent output varies considerably from press to press, and the perceived “win” (e.g., a working solution, a clever piece of code, an insight) is delivered on what is functionally a variable-ratio schedule layered over continuous low-grade reinforcement. I would conjecture that this layered schedule is more reinforcing than either pure continuous or pure variable-ratio reinforcement alone, but this remains a conjecture rather than an established result. The analogy proposed here is my own and is not yet established by primary literature on agent-mediated development; the broader popular discourse on consumer AI products has similarly framed engagement in slot-machine and intermittent-reinforcement terms, but those framings are rhetorical rather than peer-reviewed and should be treated as such.
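To make the structure of the layered schedule concrete, the following Python sketch models it under stated assumptions: every press yields a response (the continuous layer), and a "win" is delivered on a variable-ratio schedule whose response requirement is redrawn after each win. The class name `LayeredSchedule`, the parameter `mean_ratio`, and the choice to draw each requirement uniformly from 1 to 2n - 1 (which gives a mean requirement of n presses) are hypothetical illustrations, not measurements of any real agent. The sketch describes the shape of the schedule only and makes no claim about how reinforcing it is.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class PressOutcome:
    response: bool  # continuous layer: every press produces visible output
    win: bool       # variable-ratio layer: occasional high-value outcome


class LayeredSchedule:
    """Hypothetical model: continuous reinforcement with a VR-n schedule layered on top.

    The distribution of response requirements is an illustrative assumption,
    not an estimate derived from real agent use.
    """

    def __init__(self, mean_ratio: int, seed: Optional[int] = None) -> None:
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = random.Random(seed)
        self._draw_requirement()

    def _draw_requirement(self) -> None:
        # VR-n: the next win requires a number of presses drawn uniformly
        # from 1..(2n - 1), so the requirement averages n presses.
        self.requirement = self.rng.randint(1, 2 * self.mean_ratio - 1)
        self.presses_since_win = 0

    def press(self) -> PressOutcome:
        self.presses_since_win += 1
        win = self.presses_since_win >= self.requirement
        if win:
            self._draw_requirement()
        # Continuous layer: a response is produced on every press, win or not.
        return PressOutcome(response=True, win=win)


if __name__ == "__main__":
    schedule = LayeredSchedule(mean_ratio=5, seed=42)
    outcomes = [schedule.press() for _ in range(1000)]
    responses = sum(o.response for o in outcomes)
    wins = sum(o.win for o in outcomes)
    print(f"{responses} responses, {wins} wins")
```

Setting `mean_ratio=1` makes every press a win, which collapses the sketch to pure continuous reinforcement; larger values make wins sparser and less predictable while the response layer stays constant. This mirrors the two layers distinguished in the text, but whether the combination is more reinforcing than either layer alone remains an empirical question that a sketch like this cannot settle.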

Three behavioral predictions follow from this analogy. First, we would expect the behavior to be resistant to satiation, because the natural gates that previously throttled it are no longer in the loop between action and reward. Second, we would expect the behavior to compete with and ultimately displace previously preferred goals, in the same way that ICSS-trained rats forwent food under concurrent-access conditions (Routtenberg & Lindy, 1965). In practice, this would look like extended sessions of agent invocation during which the operator loses track of the original objective (e.g., answering an email, fixing a specific bug) and of competing needs such as sleep. Third, we would expect the behavior to be particularly difficult to recognize from the inside, because the phenomenology of “I am doing meaningful work” is preserved even after the underlying reinforcement loop has changed. In subjective terms, the reinforcing signal arriving from the press would be indistinguishable from the signal that used to arrive from the artifact.

A terminological clarification is needed at this point. The behaviors examined in this essay should be distinguished from the cluster of LLM-associated psychotic phenomena recently discussed in the clinical literature on so-called “AI psychosis” (Flathers et al., 2026; Morrin et al., 2026), which concern delusion formation, paranoid ideation, and reality-testing failures emerging from intensive LLM interaction. What I am describing, by contrast, is more accurately characterized as a behavioral addiction or compulsive-use phenomenon, structurally analogous to BSR. The two phenomena may co-occur, and prolonged compulsive engagement may plausibly increase vulnerability to delusional ideation, but they are conceptually distinct and should not be conflated. In short, the framing proposed in this essay is a reinforcement framing, not a psychotic one.

Author observation

A brief observation from personal experience may be warranted here, as the structural similarity proposed in this essay did not become persuasive to the author until the author observed it in their own engagement with agent-mediated development.

Prior to the integration of LLM coding agents into the author’s workflow, the act of building was experienced as rewarding, but the rewarding signal was reliably associated with the completion and external use of a shipped artifact. The effort required to produce that artifact, though often substantial, functioned as the price paid for the downstream reward, and the ratio between effort and outcome appeared to constitute the meaningful aspect of the work. That is, the rewarding signal was gated by external verification (e.g., a working build, an engaged user, a finished feature) rather than by the act of building itself.

Following the integration of LLM coding agents into the author’s practice, this ratio has shifted substantially. In the author’s experience, the throughput of agent-mediated development is roughly an order of magnitude higher than that previously achievable through unaided practice, and the workflow itself consists primarily of issuing prompts, evaluating agent responses, and issuing subsequent prompts in rapid succession. In this light, the rewarding signal appears to have decoupled from the shipped artifact and to have become reliably associated with the act of agent invocation itself. In other words, the proximate reinforcer is no longer the shipped artifact but the press, and the phenomenology of forward motion that accompanies the press has come to function as the operative reward.

Beyond the author’s own experience, anecdotal reports from organizations that have heavily adopted coding agents suggest that the phenomenon may compound at the organizational level. Specifically, individuals have been described as working 20-hour days, remaining in the office abnormally late to continue coding with agents, neglecting healthy eating and sleep patterns, and achieving what is colloquially described as a state of psychological flow. These reports are anecdotal and not yet systematically documented, but they are consistent with the behavioral predictions that follow from the analogy proposed in this essay. They also raise a question. If token spend and operator time eventually rise to a point at which continued agent use can no longer be economically justified, even within organizations that have fully embraced agent-mediated development, how will operators respond when access is curtailed? The case of patient B-19 (Moan & Heath, 1972), who protested vigorously when access to the self-stimulation apparatus was withdrawn, suggests one possible answer.

These observations are phenomenological evidence of plausibility rather than evidence of general prevalence, and they are offered here as the proximate motivation for the analogy rather than as empirical support for it. Nonetheless, the structural correspondence is difficult to overlook: a low-effort motor action reliably produces an internal reinforcing signal that is uncoupled from the natural evaluative gates ordinarily moderating the behavior. On this reading, the user interface element that delivers the reinforcing signal occupies, within the interaction loop of agent-mediated development, the position that the stimulated medial forebrain bundle occupies in ICSS.

Conclusion

The analogy proposed in this essay is still at the conceptual stage, but the behavioral phenomena it draws on from the BSR and operant literatures are well established and well replicated, even where their neural mechanisms remain contested. To recap, the analogy proposes that the reinforcement profile of agent-mediated development is structurally similar to that observed in BSR, in that a low-effort motor action reliably produces an internal reinforcing signal that is uncoupled from the natural evaluative gates ordinarily moderating reward-seeking behavior. The concern that follows from this framing is not that agent tools are themselves problematic but that the default interaction pattern of agent-mediated development can become self-reinforcing in a manner that decouples effort from completion and may, in turn, come to shape the operator’s experience of agency itself.

Under this framing, the locus at which the phenomenon should be addressed shifts from individual willpower to the interaction loop itself. If the reinforcement profile of agent-mediated development is structurally similar to that observed in BSR, then the appropriate response is neither moral exhortation toward individual willpower nor abstinence from these tools, but a deliberate examination of how the interaction loop is structured. A particular strength of this framing is that it converts a previously diffuse concern into a tractable design and empirical question, with predictions that can be tested against measurable behavioral outcomes. Further work is needed to examine the practical implications of this framing for agent-mediated development.

References

Bielajew, C., & Shizgal, P. (1986). Evidence implicating descending fibers in self-stimulation of the medial forebrain bundle. The Journal of Neuroscience, 6(4), 919–929.

Bishop, M. P., Elder, S. T., & Heath, R. G. (1963). Intracranial self-stimulation in man. Science, 140(3565), 394–396. https://doi.org/10.1126/science.140.3565.394

Black, J., Belluzzi, J. D., & Stein, L. (1985). Reinforcement delay of one second severely impairs acquisition of brain self-stimulation. Brain Research, 359(1–2), 113–119. https://doi.org/10.1016/0006-8993(85)91418-0

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. Appleton-Century-Crofts.

Flathers, M., Roux, A., & Torous, J. (2026). Beyond artificial intelligence psychosis: A functional typology of large language model-associated psychotic phenomena. The Lancet Digital Health. https://pubmed.ncbi.nlm.nih.gov/41833467/

Gallistel, C. R. (1986). The role of the dopaminergic projections in MFB self-stimulation. Behavioural Brain Research, 22(2), 97–105.

Garris, P. A., Kilpatrick, M., Bunin, M. A., Michael, D., Walker, Q. D., & Wightman, R. M. (1999). Dissociation of dopamine release in the nucleus accumbens from intracranial self-stimulation. Nature, 398(6722), 67–69. https://doi.org/10.1038/18019

Moan, C. E., & Heath, R. G. (1972). Septal stimulation for the initiation of heterosexual behavior in a homosexual male. Journal of Behavior Therapy and Experimental Psychiatry, 3(1), 23–30.

Morrin, H., Nicholls, L., Levin, M., Yiend, J., Iyengar, U., DelGuidice, F., Bhattacharya, S., Tognin, S., MacCabe, J., Twumasi, R., Alderson-Day, B., & Pollak, T. A. (2026). Artificial intelligence-associated delusions and large language models: Risks, mechanisms of delusion co-creation, and safeguarding strategies. The Lancet Psychiatry. Advance online publication. https://doi.org/10.1016/S2215-0366(25)00396-7

Olds, J. (1958a). Self-stimulation of the brain; its use to study local effects of hunger, sex, and drugs. Science, 127(3294), 315–324.

Olds, J. (1958b). Satiation effects in self-stimulation of the brain. Journal of Comparative and Physiological Psychology, 51(6), 675–678.

Olds, J., & Milner, P. (1954). Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. Journal of Comparative and Physiological Psychology, 47(6), 419–427.

Owesson-White, C. A., Cheer, J. F., Beyene, M., Carelli, R. M., & Wightman, R. M. (2008). Dynamic changes in accumbens dopamine correlate with learning during intracranial self-stimulation. Proceedings of the National Academy of Sciences, 105(31), 11957–11962. https://doi.org/10.1073/pnas.0803896105

Portenoy, R. K., Jarden, J. O., Sidtis, J. J., Lipton, R. B., Foley, K. M., & Rottenberg, D. A. (1986). Compulsive thalamic self-stimulation: A case with metabolic, electrophysiologic and behavioral correlates. Pain, 27(3), 277–290.

Routtenberg, A., & Lindy, J. (1965). Effects of the availability of rewarding septal and hypothalamic stimulation on bar pressing for food under conditions of deprivation. Journal of Comparative and Physiological Psychology, 60(2), 158–161.

Trujillo-Pisanty, I., Conover, K., Solis, P., Palacios, D., & Shizgal, P. (2020). Dopamine neurons do not constitute an obligatory stage in the final common path for the evaluation and pursuit of brain stimulation reward. PLOS ONE, 15(2), e0226722. https://doi.org/10.1371/journal.pone.0226722

Valenstein, E. S., & Beer, B. (1962). Reinforcing brain stimulation in competition with water reward and shock avoidance. Science, 137(3535), 1052–1054. https://doi.org/10.1126/science.137.3535.1052

Wise, R. A. (1996). Addictive drugs and brain stimulation reward. Annual Review of Neuroscience, 19, 319–340. https://doi.org/10.1146/annurev.ne.19.030196.001535

Yeomans, J. S. (1989). Two substrates for medial forebrain bundle self-stimulation: Myelinated axons and dopamine axons. Neuroscience & Biobehavioral Reviews, 13(2–3), 91–98.

Supplementary resources

The following non-scholarly materials may be useful to readers who wish to view archival footage of the ICSS paradigm. They are not cited as primary evidence above and are listed here for reference only.

Motivation: Self-stimulation in rats [Video]. YouTube. https://www.youtube.com/watch?v=aNXhyPj-RsM

Intra-cranial electrical self-stimulation [Video]. YouTube. https://www.youtube.com/watch?v=87XQDC_qPic

What is rewarding brain stimulation? [Video]. YouTube. https://www.youtube.com/watch?v=7HbAFYiejvo
