<h1>CUDA Proves Nvidia Is a Software Company</h1>
<p><em>Machine Readable · May 11, 2026</em></p>
<p>Forgive me for starting with a cliché, a piece of finance jargon that has recently slipped into the tech lexicon, but I'm afraid I must talk about "moats." Popularized decades ago by Warren Buffett to refer to a company's competitive advantage, the word found its way into Silicon Valley pitch decks when a memo purportedly leaked from Google, titled "We Have No Moat, and Neither Does OpenAI," fretted that open-source AI would pillage Big Tech's castle.</p>
<p>A few years on, the castle walls remain safe. Apart from a brief bout of panic when DeepSeek first appeared, open-source AI models have not vastly outperformed proprietary models. Still, none of the frontier labs—OpenAI, Anthropic, Google—has a moat to speak of.</p>
<p>The company that does have a moat is Nvidia. CEO Jensen Huang has called it his most precious "treasure." It is not, as you might assume for a chip company, a piece of hardware. It's something called CUDA. What sounds like a chemical compound banned by the FDA may be the one true moat in AI.</p>
<p>CUDA technically stands for Compute Unified Device Architecture, but much like <em>laser</em> or <em>scuba</em>, no one bothers to expand the acronym; we just say "KOO-duh." So what is this all-important treasure good for?
If forced to give a one-word answer: parallelization.</p>
<p>Here's a simple example. Let's say we task a machine with filling out a 9×9 multiplication table. Using a computer with a single core, all 81 operations are executed dutifully one by one. But a GPU with nine cores can assign tasks so that each core takes a different column—one from 1×1 to 1×9, another from 2×1 to 2×9, and so on—for a ninefold speed gain. Modern GPUs can be even cleverer. For example, if programmed to recognize commutativity—7×9 = 9×7—they can avoid duplicate work, reducing 81 operations to 45, nearly halving the workload. When a single training run costs a hundred million dollars, every optimization counts.</p>
<p>Nvidia's GPUs were originally built to render graphics for video games. In the early 2000s, a Stanford PhD student named Ian Buck, who first got into GPUs as a gamer, realized their architecture could be repurposed for general high-performance computing. He created a programming language called Brook, was hired by Nvidia, and, with John Nickolls, led the development of CUDA. If AI ushers in the age of a permanent white-collar underclass and autonomous weapons, just know that it would all be because someone somewhere playing <em>Doom</em> thought a demon's scrotum should jiggle at 60 frames per second.</p>
<p>CUDA is not a programming language in itself but a "platform." I use that weasel word because, not unlike how The New York Times is a newspaper that's also a gaming company, CUDA has, over the years, become a nested bundle of software libraries for AI. Each function shaves nanoseconds off single mathematical operations—added up, they make GPUs, in industry parlance, go <em>brrr</em>.</p>
<p>A modern graphics card is not just a circuit board crammed with chips and memory and fans.
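Before going under the hood, the 9×9 arithmetic above is easy to sanity-check. A toy sketch in plain Python (no GPU involved; this only counts the work a commutativity-aware scheduler gets to skip):

```python
from itertools import product

# All 81 cells of the 9x9 multiplication table, computed one by one.
naive = [(a, b) for a, b in product(range(1, 10), repeat=2)]

# Exploit commutativity: compute each unordered pair once (7x9 covers
# 9x7 too), then mirror the result into both cells of the table.
unique = [(a, b) for a, b in naive if a <= b]

table = {}
for a, b in unique:
    table[a, b] = table[b, a] = a * b

print(len(naive), len(unique))   # 81 45
print(table[7, 9], table[9, 7])  # 63 63
```

The ninefold column split works the same way: hand core n the pairs whose first factor is n, and the 81 serial steps collapse to 9 parallel ones.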
Under the hood lies an elaborate confection of cache hierarchies and specialized units called "tensor cores" and "streaming multiprocessors." In that sense, what chip companies sell is like a professional kitchen, and more cores are akin to more grilling stations. But even a kitchen with 30 grilling stations won't run any faster without a capable head chef deftly assigning tasks—as CUDA does for GPU cores.</p>
<p>To extend the metaphor, hand-tuned CUDA libraries optimized for one matrix operation are the equivalent of kitchen tools designed for a single job and nothing more—a cherry pitter, a shrimp deveiner—which are indulgences for home cooks but not if you have 10,000 shrimp guts to yank out. Which brings us back to DeepSeek. Its engineers went below this already deep layer of abstraction to work directly in PTX, a kind of assembly language for Nvidia GPUs. Let's say the task is peeling garlic. An unoptimized GPU would go: "Peel the skin with your fingernails." CUDA can instruct: "Smash the clove with the flat of a knife." PTX lets you dictate every sub-instruction: "Lift the blade 2.35 inches above the cutting board, make it parallel to the clove's equator, and strike downward with your palm at a force of 36.2 newtons."</p>
<p>You can begin to see why CUDA is so valuable to Nvidia—and so hard for anyone else to touch. Tuning GPU performance is a gnarly problem. You can't just conscript some tender-footed undergrad on Market Street, hand them a Claude Max plan, and expect them to hack GPU kernels. Writing at this level is a grindsome enterprise—unless you're a crackerjack programmer at DeepSeek.</p>
<p>A disclosure: In previous Machine Readable columns, I was already familiar with the languages I was analyzing. Not so here. In the interest of maintaining that standard, I decided to spend a day with CUDA.
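For a sense of what that day involved: in PyTorch a matrix multiply is essentially one line, C = A @ B, while a CUDA kernel makes you spell out the loop nest hiding underneath, plus the memory choreography around it. Here is a plain-Python sketch of just that hidden loop (illustrative only; a real CUDA version would also hand-assign tiles of this work to thousands of threads):

```python
# The triple loop that C = A @ B hides. A hand-written CUDA kernel has
# to organize exactly this work across blocks, threads, and caches.
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):          # each row of the result...
        for j in range(m):      # ...crossed with each column...
            for p in range(k):  # ...is a dot product of length k
                C[i][j] += A[i][p] * B[p][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```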
It ruined my afternoon.</p>
<p>A simple matrix multiplication that usually takes me three lines in PyTorch—a popular machine-learning framework—took me 50-plus lines in CUDA. Wringing out the last drop of performance, it turns out, is an admirable but tedious business. Having dipped my toe in the moat, I can report that it is indeed deep and forbidding.</p>
<p>CUDA's dominance is built not just on the quality of its ecosystem but on a lock-in effect. Because modern machine-learning frameworks are built on CUDA, which, crucially, runs only on Nvidia chips, AMD's chips underperform even when they have more cores and memory. Comparing chips by spec sheets is like comparing race cars by cylinder count; real performance can only be measured on the track.</p>
<p>A second disclosure: I intended to benchmark two chips, but there was no way to expense an Nvidia H100 and an AMD MI300X without landing on Condé Nast's blacklist. Instead, you will have to take the word of independent researchers who found that even with better specs on paper, AMD was outmatched by Nvidia.</p>
<p>Nvidia's edge in software might be that, unusually for a chip company, it hires more software engineers than hardware engineers. If I were running AMD, I might follow suit. (But who's asking me?)</p>
<p>Every year, there are new hopefuls attempting to drain Nvidia's moat, only to drown in it. OpenCL, an open standard backed by a consortium that included Apple, AMD, and Qualcomm, was a kind of Android manqué to CUDA's iOS. It barely gained traction.</p>
<p>Meanwhile, AMD's answer to CUDA, ROCm, is an even worse name than CUDA—is it pronounced "rock cum"? (Forget about hiring more programmers—get a new marketing team.) It has also been so plagued by bugs and compatibility issues that its subreddit reads like a support group.</p>
<p>Let's not forget Intel.
While it's easy to brush it off as a failing chipmaker, its recent history reveals it's also a failing software company. In a last spasm at relevance, it launched oneAPI, but as of 2026, we know for dead sure that CUDA still reigns. If there's any challenger, it's Modular, led by Chris Lattner, the legendary language designer who counts among his creations Apple's Swift and LLVM.</p>
<p>But the open secret is that, much as theoretical physicists cannot change a tire to save their lives, most AI researchers can't so much as write a single line of C++. There are very few good GPU kernel engineers alive, and many of them are employed by Nvidia. Long before AI researchers started trafficking in clout, these engineers were diligently working on CUDA without kudos. Even trusty coding agents still hobble through kernel code.</p>
<p>Nvidia, in the end, may be closer to Apple than to AMD or Intel. It's a great hardware company because it's a software company. Apple's moat against Android was never just the iPhone but the ecosystem: iOS, the App Store, and its developers. Sure, you can fold a Samsung Galaxy in half, but do you really want to use Samsung Pay?
In the meantime, the industry will have to live with Nvidia's offensive price tags.</p>
<p><em>This is the first of a three-part Machine Readable series on AI-enabling languages.</em></p>