Human Learning, Machine Learning
"Few people have the wisdom to prefer the criticism that would do them good, to the praise that deceives them." - François de La Rochefoucauld
I missed sending you my email last week because I was building something that turned out to be far deeper than I expected: a system to manage a swarm of LLM agents.
I call the system Super_Agents, and it's free to use. It's a personal OS on steroids: flexible, thoughtful, personal, and designed around the way founders actually think and work. The system has over a hundred LLM sub-agents, each with a distinct role, personality, and evaluation criteria. Some plan my weeks. Some coach me on blind spots. Some research, some write, some challenge everything the others produce. Think of it as a team that fits your needs like a glove — a team you train and help grow in ability over time.
All of this got me thinking about the essence of expertise, experience, learning, improving, and where humans and robots might overlap in terms of skills acquisition.
What does it actually take to get better?
The answer, it turns out, is the same whether you’re training an AI agent or training yourself. And it’s a bit counterintuitive.
Experience Alone Doesn’t Make You Better
Most founders believe that doing it again — launching another company, running another experiment, taking another at-bat — makes them meaningfully better the second or third time around. Seems reasonable, since entrepreneurship involves pattern matching.
Some evidence shows otherwise. Shaw and Sørensen's large-scale analysis of serial entrepreneurs found that learning across successive ventures happens in a "high-noise, low-signal environment." The progress ratio — total learning relative to starting knowledge — hovers close to 1. Experience accumulates. Improvement? Not reliably.
In 1993, psychologist K. Anders Ericsson defined three modes of practice:
Naïve practice is repetition with the expectation that volume equals improvement.
Purposeful practice adds goals and self-monitoring.
Deliberate practice goes further — it requires a coach who designs activities targeting identified weaknesses, with immediate feedback and forced revision.
His research on how people try to improve found that most operate at the naïve level. Even most high performers stay at the purposeful level. Deliberate practice — where almost all meaningful improvement actually lives — is comparatively rare.
The critical variable is not effort. It's feedback quality. Feeding external evaluation back into your inputs is the special sauce: one-on-one, real-time, contextual feedback is what makes practice actionable. That's why even the very top player in any sport has a coach. However good they are, they can't evaluate themselves objectively. And because the coach identifies what to work on, the player is free to just do the work of improving. The result is a tighter, better feedback loop.
Therapists don’t improve with experience when patient feedback is delayed or inaccurate — even across decades of practice. Surgeons, radiologists, financial analysts: the pattern repeats. Without timely, specific, and accurate feedback, experience doesn’t compound.
Deliberate practice is about systems design. A personal system that keeps you on the path of small, continued wins, like guardrails that prevent you from steering off too far.
Entrepreneurship is learning in adverse conditions. That's my own definition. It also describes antifragile systems, and the same concepts can be put into action with Claude Code and other autonomous agent frameworks.
Calibration vs. Confidence
Experienced founders develop strong heuristics — mental shortcuts that worked in a previous context. The research is clear: these heuristics become liabilities. The very fluency that signals expertise can mask the fact that you’re pattern-matching to a context that no longer exists.
Without external challenge, experience doesn’t produce calibration. It produces overconfidence that leads to ruin.
Surprise, Not Failure, Is the Best Teacher
Butterfield and Metcalfe documented what they called the hypercorrection effect: when you make an error you were certain about — not a careless mistake, but a confident, load-bearing belief — and then receive accurate feedback, the learning signal is stronger than when you’re corrected on something you already doubted.
The bigger the gap between your confidence and reality, the more powerfully the correction sticks.
Your brain flags the moment: this belief was structural, and it was wrong. Rebuild. The surprise is functional. It captures more attention, produces better retention, and drives faster behavioral change than any feedback on things you already suspected might be off.
The implication is uncomfortable: the most valuable feedback isn’t the kind that catches your known weaknesses. It’s the kind that exposes your confident wrongness — the corrections you don’t see coming.
For founders, this means the advisors who agree with your worldview are performing the least useful version of their function. The real value lives in the person — or the system — that challenges your certainties. With enough specificity to act on, and enough surprise that it actually lands.
For founders with ADHD — significantly overrepresented in entrepreneurship — there’s an additional constraint. Learning from errors requires what researchers call emotion and motivation control: the capacity to sit with the discomfort of being wrong without the nervous system hijacking the cognitive work. The feedback has to be legible enough to process and the environment safe enough to sit with. This isn’t background noise. It’s a first-order constraint on whether the feedback loop functions at all.
The Bar Raiser
OK, so let's bring everything we've discussed so far together in a concrete way. One of the most useful mental models I picked up working at Amazon is the Bar Raiser — a person embedded in every hiring loop whose sole job is to protect the quality standard. An external evaluator with one question: will this person raise the overall talent baseline?
The Bar Raiser works because of what it is, architecturally: an external input, free from groupthink, evaluating against explicit criteria rather than vibes. A feedback loop built into the system by design.
When I started building Super_Agents, this became the most important pattern I implemented. Every team of sub-agents working toward a shared outcome has a Bar Raiser agent sitting outside the workflow. It doesn’t generate. It evaluates. Specifically, it:
Defines what quality looks like for each agent’s input and output
Measures those dimensions over time
Delivers a clear scorecard with learnings — often prompting the agent to redo the work entirely from a different angle
Aggregates patterns across all agents so the system learns as a whole, not just in parts
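The evaluator-outside-the-workflow pattern can be sketched in a few lines of Python. This is purely illustrative: the class names, criteria, and the 7.0 bar are my hypothetical stand-ins, not the actual Super_Agents internals.

```python
from dataclasses import dataclass

@dataclass
class Scorecard:
    scores: dict        # criterion name -> score on a 0..10 scale
    learnings: list     # what to fix on the next pass

    @property
    def overall(self) -> float:
        return sum(self.scores.values()) / len(self.scores)

class BarRaiser:
    """Sits outside the workflow. It doesn't generate; it evaluates."""

    def __init__(self, criteria, bar=7.0):
        self.criteria = criteria    # criterion name -> scoring function
        self.bar = bar
        self.history = []           # aggregated so the system learns as a whole

    def evaluate(self, output) -> Scorecard:
        scores = {name: fn(output) for name, fn in self.criteria.items()}
        learnings = [f"{name} is below the bar"
                     for name, score in scores.items() if score < self.bar]
        card = Scorecard(scores, learnings)
        self.history.append(card)
        return card

    def review(self, generate, max_revisions=3):
        """Forced revision: work doesn't proceed until it clears the bar."""
        output = generate(feedback=None)
        card = self.evaluate(output)
        for _ in range(max_revisions):
            if card.overall >= self.bar:
                break
            # redo the work, this time armed with the scorecard's learnings
            output = generate(feedback=card.learnings)
            card = self.evaluate(output)
        return output, card
```

A generating agent is then just any callable that accepts feedback and returns a new draft; the Bar Raiser, not the generator, decides when the work is done.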
The result isn’t magic. It’s the same science, applied structurally. The AI research calls it RLAIF — Reinforcement Learning from AI Feedback — where an agent learns to evaluate its own outputs and improve through that evaluation. What the research shows: the quality of the evaluation function determines the ceiling of the entire system. Not the capability of the generator. The sophistication of the critic.
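To make the "critic sets the ceiling" point concrete, here is a toy version, with best-of-n selection standing in for the actual RL training step. The critic function and its scoring heuristics are hypothetical; in full RLAIF the critic's preferences would train a reward model that shapes the generator, but even this degenerate form shows the key property: the winner is bounded by what the critic can distinguish, not by what the generator can produce.

```python
import random

def critic(text: str) -> int:
    """Toy AI evaluator: rewards lexical variety and concrete numbers."""
    words = text.split()
    return len(set(words)) + 5 * sum(w.isdigit() for w in words)

def best_of_n(generate, n: int = 4, seed: int = 0) -> str:
    """Generate n candidates and keep whichever one the critic prefers."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=critic)
```

A sharper critic lifts the whole system; a sloppy one caps it, no matter how capable the generator is.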
But here's where the parallel between human and machine learning breaks down. AI agents can be reset. Retrained from scratch. You can swap out a reward model overnight. Founders can't. Your evaluation function — the internal standard you hold yourself to — was formed over decades. Shaped by childhood, by early career feedback, by the first investor who said yes or the co-founder who said you were wrong. It's not a model you can retrain. It's a deep psychological structure.
It’s a lot harder for us humans to unlearn our conditioning. We have no factory reset option.
What Every Learning System Needs
Across learning science, entrepreneurial cognition, AI alignment, and error-correction neuroscience — the same architecture emerges:
An evaluation standard specific enough to act on. Not “do better.” A standard that distinguishes a 6/10 output from a 9/10 and can articulate why.
Timely feedback against that standard. Not quarterly board meetings. Feedback that arrives while the context is still fresh enough to learn from.
Forced revision. Not optional. Structural. The system doesn’t proceed until the output has been challenged and improved.
Regular recalibration of the evaluation function itself. The standard has to evolve. An evaluation function that worked for your first company may be actively misleading for your third.
The Bar Raiser has to be bar-raised too, once in a while.
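That recalibration can itself be mechanical. A minimal sketch, with arbitrary hypothetical numbers: when recent work consistently clears the bar with room to spare, the bar ratchets up, so yesterday's 9/10 becomes tomorrow's 7/10.

```python
def recalibrate(bar: float, recent_scores: list, margin: float = 1.0) -> float:
    """Raise the standard once recent work clears it comfortably.

    Illustrative rule: if the average recent score beats the bar by
    `margin`, ratchet the bar up half a point, capped at 10.
    """
    if not recent_scores:
        return bar
    average = sum(recent_scores) / len(recent_scores)
    if average >= bar + margin:
        return min(10.0, bar + 0.5)
    return bar
```

The exact rule matters less than the fact that one exists: a standard that never moves eventually stops teaching you anything.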
Have a nice weekend.
Selected References
Ericsson, K.A., Krampe, R., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.
Shaw, K., & Sørensen, A. (2016). The productivity advantage of serial entrepreneurs. Journal of Labor Economics, 34(S2), S99–S132.
Butterfield, B., & Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacognition and Learning, 1, 69–84.
Mitchell, J.R., Busenitz, L.W., Bird, B., Gaglio, C.M., McMullen, J.S., Morse, E.A., & Smith, J.B. (2007). The central question in entrepreneurial cognition research. Entrepreneurship Theory and Practice, 31(1), 1–27.
Lee, H., Phatale, S., Mansoor, H., et al. (2023). RLAIF vs. RLHF: Scaling reinforcement learning from human feedback with AI feedback. arXiv:2309.00267.
Antshel, K.M. (2018). Attention deficit/hyperactivity disorder (ADHD) and entrepreneurship. Academy of Management Perspectives, 32(2), 243–265.
