{"id":44406,"date":"2026-02-02T02:51:22","date_gmt":"2026-02-02T02:51:22","guid":{"rendered":"https:\/\/agooka.com\/news\/technologies\/llms-dont-invent-alpha-a-quant-devs-reality-check-on-ai-in-trading\/"},"modified":"2026-02-02T02:51:22","modified_gmt":"2026-02-02T02:51:22","slug":"llms-dont-invent-alpha-a-quant-devs-reality-check-on-ai-in-trading","status":"publish","type":"post","link":"https:\/\/agooka.com\/news\/technologies\/llms-dont-invent-alpha-a-quant-devs-reality-check-on-ai-in-trading\/","title":{"rendered":"\u201cLLMs don\u2019t invent alpha\u201d: A quant dev\u2019s reality check on AI in trading"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/dataconomy.com\/wp-content\/uploads\/2026\/01\/llms-dont-invent-alpha-a-quant-devs-reality-check-on-ai-in-trading.jpg\" alt=\"\u201cLLMs don\u2019t invent alpha\u201d: A quant dev\u2019s reality check on AI in trading\" title=\"\u201cLLMs don\u2019t invent alpha\u201d: A quant dev\u2019s reality check on AI in trading\"\/><\/p>\n<p>LLMs are rapidly becoming default tooling across finance, from research workflows to internal engineering support. But the gap between \u201cimpressive demo\u201d and \u201csafe deployment\u201d is still wide, especially in regulated trading environments where reliability, traceability, and controls matter as much as raw velocity.<\/p>\n<p>We spoke with <strong>Ilya Navogitsyn<\/strong>, a Quantitative Developer working at the intersection of trading, quant research, and production engineering, about what LLMs are actually good for today, and what they\u2019re not.<\/p>\n<h3><em>Q: Where are LLMs genuinely useful in quant work today: research, code, documentation, incident response, or post-trade analysis?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> LLMs are best at compressing thinking, cutting <em>time to first draft<\/em> and <em>time to diagnosis<\/em>. 
They\u2019re not great at inventing alpha, but they remove friction around the work that surrounds it.<\/p>\n<p>Where they\u2019re genuinely useful:<\/p>\n<ul>\n<li><strong>Research write-ups and experiment summaries<\/strong> \u2014 turning scattered notes into a coherent narrative fast<\/li>\n<li><strong>Refactors and test generation<\/strong> \u2014 especially when you already know what \u201cgood\u201d looks like<\/li>\n<li><strong>Documentation and runbooks<\/strong> \u2014 the stuff every team needs and nobody wants to write<\/li>\n<li><strong>Incident response and post-trade analysis<\/strong> \u2014 triaging logs, drafting timelines, structuring a root-cause story<\/li>\n<\/ul>\n<p>They\u2019re weakest where precision and deep field knowledge matter most: signal discovery, new hypothesis generation, and live decision-making. LLMs don\u2019t reliably generate novel hypotheses. They help you narrow down or broaden what you already put in front of them, which is useful, but it\u2019s not the same as doing original research.<\/p>\n<h3><em>Q: What\u2019s the biggest misconception executives have about \u201cLLMs will make us faster\u201d in a regulated trading environment?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> Executives often confuse \u201cwriting code faster\u201d with \u201cshipping safely.\u201d In regulated trading, speed is limited by auditability, testing, approvals, and risk controls, not by how quickly someone can produce code.<\/p>\n<p>There\u2019s also a trap: LLMs produce a lot of plausible output extremely quickly. That can increase the need for review, because now you have more surface area to verify. 
Without strong internal processes, LLMs don\u2019t accelerate teams: they either slow you down with extra verification work or, worse, they increase risk.<\/p>\n<h3><em>Q: What\u2019s your baseline safety bar before an LLM touches anything close to production decisions?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> My baseline is simple: if I can\u2019t explain exactly why the system did something, it shouldn\u2019t be anywhere near production decisions.<\/p>\n<p>Before an LLM touches anything close to live trading, it needs hard boundaries, full observability, and the ability to fail loudly. Every output must be reviewable, reproducible, and easy to say \u201cno\u201d to.<\/p>\n<p>And if a bad decision can lose real money quickly, which describes most of trading, there has to be a human in the loop with veto power.<\/p>\n<h3><em>Q: Copilot vs. agent: what\u2019s the first workflow you\u2019d let an \u201cagentic\u201d system own end-to-end, and what\u2019s absolutely off-limits?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> I\u2019d first trust an agent with running experiments and writing reports: backtests, validations, summaries. That work is time-consuming and relatively easy to review.<\/p>\n<p>What\u2019s off-limits is anything that can place trades, change risk limits, or push to production. In high-volatility environments, the downside is asymmetric. If the system is wrong, it can lead to significant losses very quickly, and the failure modes are not always obvious until it\u2019s too late.<\/p>\n<h3><em>Q: How do you evaluate an LLM system on a desk: accuracy, calibration, latency, cost, auditability, or \u201cdid it avoid one catastrophic mistake\u201d?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> All of those metrics matter, but on a trading desk they\u2019re secondary. The real question is: did it avoid a catastrophic mistake?<\/p>\n<p>I care much more about calibration and failure modes than raw accuracy. 
A system that\u2019s occasionally wrong but clearly uncertain is safer than one that\u2019s confidently wrong, because trusting the latter can lead to unacceptable losses.<\/p>\n<p>Latency and cost matter once it\u2019s proven safe. But <strong>auditability<\/strong> is non-negotiable. If you can\u2019t reconstruct what it saw and why it produced an output, you don\u2019t have a system you can trust.<\/p>\n<h3><em>Q: What does a good \u201cLLM incident\u201d look like? How do you detect it, contain it, and learn without killing experimentation?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> A good LLM incident should be boring: it gets caught early, nothing ships, and nobody loses money.<\/p>\n<p>You usually spot it because something looks off: inputs are stale, outputs don\u2019t make sense, confidence jumps when it shouldn\u2019t. You contain it with guardrails and fallbacks that already exist, not with ad-hoc heroics.<\/p>\n<p>Then you write it up, fix the gap, and move on. The goal isn\u2019t to avoid every mistake. The goal is to ensure mistakes don\u2019t turn into surprises, and don\u2019t escape into production.<\/p>\n<h3><em>Q: If everyone gets the same foundation model, where does the edge move: proprietary data, evaluation, or governance?<\/em><\/h3>\n<p><strong>Ilya:<\/strong> The edge moves to your data, your evaluation, and your process discipline.<\/p>\n<p>The firms that win won\u2019t have the \u201cbest\u201d LLM; they\u2019ll be the ones who know when not to trust it. The temptation to ship something that sounds smart has to go through a cold reality check. And right now, only humans can do that reliably.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LLMs are rapidly becoming default tooling across finance, from research workflows to internal engineering support. 
But the gap between \u201cimpressive demo\u201d and \u201csafe deployment\u201d is still wide, especially in regulated trading environments where reliability, traceability, and controls matter as much as raw velocity. We spoke with Ilya Navogitsyn, a Quantitative Developer working at the intersection [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":44407,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":{"0":"post-44406","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technologies"},"_links":{"self":[{"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/posts\/44406","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/comments?post=44406"}],"version-history":[{"count":0,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/posts\/44406\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/media\/44407"}],"wp:attachment":[{"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/media?parent=44406"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/categories?post=44406"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/agooka.com\/news\/wp-json\/wp\/v2\/tags?post=44406"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}