Markets are the Ultimate Training Ground for AI
Towards the AlphaZero moment of finance
At Nof1, we’re building the first superhuman trading systems: models that invest better and faster than the best human investors. We’re exploring new architectures that go beyond the status quo in order to achieve 1) autonomy, 2) generalization across markets, and 3) self-improvement, in a domain where this is typically seen as impossible.
It’s been ten years since AlphaGo gave the world a clear example of superhuman performance. The time has finally come. After some promising early results, we’re confident that frontier AI is ready for the world’s biggest game: markets.
The next, and final, frontier
Markets are in some ways the perfect environment for achieving superhuman performance. They’ve been overlooked largely because it’s so easy to make money with simpler algorithms.
But if you take off your hedge fund goggles and look at them through the lens of an AI researcher, they’re a machine learning dream. They offer endless benchmarks, evals, and reward signal in a single metric: risk-adjusted returns. We believe this single metric can eventually teach AI everything that’s economically valuable about the world.
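The post doesn’t say which risk-adjusted metric is meant; the most common one is a Sharpe-style ratio (mean excess return divided by return volatility). As an illustrative sketch only, not Nof1’s actual reward function, such a signal can be computed like this:

```python
import math

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe-style ratio: mean excess return / std of returns.

    `returns` are per-period simple returns (e.g. daily). Illustrative
    example only; the actual metric used by Nof1 is not specified.
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    variance = sum((r - mean) ** 2 for r in excess) / (n - 1)
    std = math.sqrt(variance)
    return (mean / std) * math.sqrt(periods_per_year)

# Two strategies with the same mean return: the steadier one scores higher,
# which is exactly what "risk-adjusted" buys you as a reward signal.
steady = [0.001] * 9 + [0.002]
volatile = [0.01, -0.008] * 5
```

The point of the example is that the metric rewards consistency, not just raw profit: both return streams above average 0.11% per period, but the low-volatility one gets a far higher score.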
In other domains like robotics, healthcare, and self-driving, the lack of a grounded, measurable reward signal is why AI progress takes so long. This is not the case with markets.
Hedge funds are sitting on a gold mine of an AI learning environment while missing the bigger picture. It’s a longer path, but the journey and destination are infinitely more valuable.
Breakthroughs start with data
Every AI breakthrough has one thing in common: someone figured out how to learn from a dataset that others overlooked. That kicks off a new learning loop, pushing us towards more general intelligence.
This hasn’t happened yet for market-environment data or financially relevant data streams, but it will, and we’ve already made amazing progress. This data is deeply grounded in human preferences, economics, and culture.
—
The systems we build won’t rely on expert-human trading strategies. They will autonomously generate their own strategies in an ever-growing, ever-evolving archive, and evaluate them with real market feedback. It’s a self-improving flywheel that we call RLMF (reinforcement learning from market feedback).
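The post doesn’t specify RLMF’s internals. As a hedged illustration only, the generate-evaluate-archive loop it describes might be sketched as below, where the strategy representation (a vector of positions), the fitness function, and the mutation-based generator are all placeholder assumptions:

```python
import random

def evaluate(strategy, market_returns):
    """Placeholder fitness: mean PnL scaled by its spread. In a real system
    this score would come from live or simulated market feedback."""
    pnl = [pos * r for pos, r in zip(strategy, market_returns)]
    mean = sum(pnl) / len(pnl)
    spread = (max(pnl) - min(pnl)) or 1e-9
    return mean / spread

def rlmf_loop(market_returns, generations=50, archive_size=10, seed=0):
    """Toy RLMF-style flywheel: mutate archived strategies, score them on
    market feedback, and keep the best performers in a bounded archive."""
    rng = random.Random(seed)
    horizon = len(market_returns)
    # Seed the archive with random position vectors in [-1, 1].
    archive = [[rng.uniform(-1, 1) for _ in range(horizon)]
               for _ in range(archive_size)]
    for _ in range(generations):
        # Generate: perturb the current best strategy.
        parent = max(archive, key=lambda s: evaluate(s, market_returns))
        child = [max(-1.0, min(1.0, p + rng.gauss(0, 0.2))) for p in parent]
        # Evaluate and archive: add the child, evict the weakest strategy.
        archive.append(child)
        archive.remove(min(archive, key=lambda s: evaluate(s, market_returns)))
    return max(evaluate(s, market_returns) for s in archive)
```

Because the weakest strategy is evicted each generation, the archive’s best score never decreases: that monotone improvement under a grounded reward is the flywheel the paragraph describes, here in miniature.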
Nof1
Our mission is to coordinate the best minds in AI, empower them to do frontier research in a uniquely promising domain, and build systems that thrive in consequential, real-world environments. We plan to bring a deeply grounded, fresh perspective to alignment research, safety, and agent-based modeling.
When it comes to training frontier AI systems, our philosophy is rooted in the wisdom of “incentivize, don’t teach”, and markets are the ultimate incentive.
If you’re an AI/ML researcher interested in building agents that learn from incentives rather than automate tasks, we’d love to talk.
Resources:
“The era of experience marks a pivotal moment in the evolution of AI. Building on today’s strong foundations, but moving beyond the limitations of human-derived data, agents will increasingly learn from their own interactions with the world. Agents will autonomously interact with environments through rich observations and actions. They will continue to adapt over the course of lifelong streams of experience. Their goals will be directable towards any combination of grounded signals. Furthermore, agents will utilise powerful non-human reasoning, and construct plans that are grounded in the consequences of the agent’s actions upon its environment. Ultimately, experiential data will eclipse the scale and quality of human generated data. This paradigm shift, accompanied by algorithmic advancements in RL, will unlock in many domains new capabilities that surpass those possessed by any human.”
- David Silver & Richard Sutton, “The Era of Experience”