
X’s engineering team published the code powering its “For You” recommendation algorithm last month, with Elon Musk calling it a transparency win that no other social media company has matched. Researchers, however, describe the release as a redacted version that offers little meaningful insight into how the system actually works.
X is the only major social network to open-source elements of its recommendation algorithm. In a January 20 post announcing the release, Elon Musk stated, “We know the algorithm is dumb and needs massive improvements, but at least you can see us struggle to make it better in real-time and with transparency.” He added, “No other social media companies do this.” The move follows a similar release in 2023, yet the current version draws criticism for its limitations.
John Thickstun, an assistant professor of computer science at Cornell University, characterized the code as a “redacted” version of X’s algorithm. He told Engadget: “What troubles me about these releases is that they give you a pretense that they’re being transparent for releasing code and the sense that someone might be able to use this release to do some kind of auditing work or oversight work. And the fact is that that’s not really possible at all.” His assessment underscores the gap between the appearance of transparency and the release’s actual usefulness for external analysis.
User reactions emerged swiftly on X after the code’s publication. Creators shared lengthy threads interpreting the code as a guide to boosting visibility. One post, viewed more than 350,000 times, advised that X “will reward people who conversate” and urged users to “raise the vibrations of X.” A second post, with more than 20,000 views, asserted that “posting video is the answer.” Another recommended sticking to a “niche” because “topic switching hurts your reach.” These interpretations proliferated despite how little the code actually reveals.
Thickstun cautioned against deriving strategies from the release. He stated, “They can’t possibly draw those conclusions from what was released.” The code offers limited operational details, such as filtering out content older than one day, which provides a glimpse into post eligibility but leaves most mechanisms inaccessible.
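The recency cutoff is one of the few concrete rules visible in the release. As a rough illustration of what such an eligibility filter does, here is a minimal Python sketch; the class, field, and function names are hypothetical and are not taken from X’s code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical post record; X's actual data structures are not part of this sketch.
@dataclass
class Post:
    post_id: str
    created_at: datetime  # assumed to be timezone-aware (UTC)

MAX_AGE = timedelta(days=1)  # the one-day recency cutoff described in the release

def filter_recent(posts: list[Post], now: datetime | None = None) -> list[Post]:
    """Drop posts older than one day before any scoring or ranking happens."""
    now = now or datetime.now(timezone.utc)
    return [p for p in posts if now - p.created_at <= MAX_AGE]
```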
Thickstun noted that much of the disclosed information remains “not actionable” for content creators seeking to influence recommendations. That scarcity of practical insight follows directly from the redactions, which leave little to act on beyond basic filtering rules.
A structural shift distinguishes the current algorithm from the 2023 version. The new system employs a Grok-like large language model to rank posts. Ruggero Lazzaroni, a Ph.D. researcher at the University of Graz, explained the prior approach: “In the previous version, this was hard coded: you took how many times something was liked, how many times something was shared, how many times something was replied … and then based on that you calculate a score, and then you rank the post based on the score.”
Lazzaroni detailed the change: “Now the score is derived not by the real amounts of likes and shares, but by how likely Grok thinks that you would like and share a post.” This transition replaces explicit metrics with model-generated predictions, altering the ranking foundation entirely.
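To make the distinction concrete, the sketch below contrasts the two approaches in simplified form. The weights, signal names, and the predict callback are illustrative assumptions, not values or interfaces taken from X’s code.

```python
from typing import Callable, Dict

# 2023-style ranking: a hard-coded weighted sum of observed engagement counts.
# These weights are placeholders, not X's actual (now redacted) values.
LEGACY_WEIGHTS = {"likes": 1.0, "reposts": 2.0, "replies": 5.0}

def legacy_score(engagement_counts: Dict[str, int]) -> float:
    """Score a post from the real number of likes, reposts, and replies it received."""
    return sum(LEGACY_WEIGHTS[k] * engagement_counts.get(k, 0) for k in LEGACY_WEIGHTS)

# New-style ranking: the score comes from a model's prediction of how likely a
# particular user is to like, repost, or reply, not from the observed totals.
def model_score(user_id: str, post_id: str,
                predict: Callable[[str, str], Dict[str, float]]) -> float:
    """Score a post from predicted engagement probabilities for this user."""
    probs = predict(user_id, post_id)  # e.g. {"likes": 0.31, "reposts": 0.04, "replies": 0.02}
    return sum(LEGACY_WEIGHTS[k] * probs.get(k, 0.0) for k in LEGACY_WEIGHTS)
```

In this framing, the entire ranking decision moves inside the predict call, which is precisely the part that neither the release nor outside researchers can inspect.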
The reliance on a large language model increases opacity, according to Thickstun. He observed, “So much more of the decision-making … is happening within black-box neural networks that they’re training on their data.” He continued, “More and more of the decision-making power of these algorithms is shifting not just out of public view, but actually really out of view or understanding of even the internal engineers that are working on these systems, because they’re being shifted into these neural networks.”
The latest release omits details previously available in 2023 regarding interaction weightings. That earlier version specified, for example, that a reply equaled 27 retweets and a reply generating a response from the original author equaled 75 retweets. X redacted these weightings, citing “security reasons.” This removal eliminates a key quantitative element from public view.
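Those published figures allowed simple back-of-the-envelope comparisons that the new release no longer supports. The toy calculation below uses the reported 2023 ratios; the scoring structure itself is an illustrative simplification, not X’s actual formula.

```python
# Reported 2023 weightings, expressed relative to a single retweet.
# Only the 27x and 75x ratios come from the 2023 release; everything else
# here is an illustrative simplification.
RETWEET = 1
REPLY = 27 * RETWEET
REPLY_WITH_AUTHOR_RESPONSE = 75 * RETWEET

def toy_score(retweets: int, replies: int, replies_answered_by_author: int) -> int:
    return (retweets * RETWEET
            + replies * REPLY
            + replies_answered_by_author * REPLY_WITH_AUTHOR_RESPONSE)

# Under these weights, two replies that drew a response from the author (150)
# outrank a post with 100 retweets (100).
print(toy_score(100, 0, 0))  # 100
print(toy_score(0, 0, 2))    # 150
```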
Absence of training data details further limits understanding. The code provides no information on the dataset used to train the model. Mohsen Foroughifar, an assistant professor of business technologies at Carnegie Mellon University, emphasized this gap: “One of the things I would really want to see is, what is the training data that they’re using for this model. If the data that is used for training this model is inherently biased, then the model might actually end up still being biased, regardless of what kind of things that you consider within the model.”
Foroughifar’s comment highlights how biases in the training data can persist in recommendations regardless of adjustments to the model itself. Disclosing that data would let outside researchers assess the foundations the system is built on.
Lazzaroni, involved in an EU-funded project simulating real-world social media platforms to test alternative recommendation approaches, views research access as highly valuable. His work replicates platform dynamics to evaluate methods, yet he finds the released code inadequate. He stated, “We have the code to run the algorithm, but we don’t have the model that you need to run the algorithm.” Without the underlying model, reproduction proves impossible.
Studying X’s algorithm holds broader relevance. Thickstun noted parallels with emerging technologies: “A lot of these challenges that we’re seeing on social media platforms and the recommendation [systems] appear in a very similar way with these generative systems as well. So you can kind of extrapolate forward the kinds of challenges that we’ve seen with social media platforms to the kind of challenges that we’ll see with interaction with GenAI platforms.”
Thickstun’s observation connects the transparency problems of social media recommendation systems to those emerging around AI chatbots and other generative systems. Researchers expect the patterns seen on social platforms to recur in these domains.
Lazzaroni, experienced in simulating toxic social media behaviors, critiqued optimization priorities: “AI companies, to maximize profit, optimize the large language models for user engagement and not for telling the truth or caring about the mental health of the users. And this is the same exact problem: they make more profit, but the users get a worse society, or they get worse mental health out of it.”
This perspective frames engagement-driven design as a shared concern across platforms, linking profitability to societal and individual costs. Lazzaroni’s simulations examine exactly the kinds of behavior such systems amplify.
The 2023 release included explicit scoring based on quantifiable interactions like likes, shares, and replies, enabling clearer comprehension of ranking logic. The shift to neural network predictions obscures these processes, as internal model computations evade direct inspection.
Filtering mechanisms, such as excluding posts more than one day old, are among the few specifics retained. This rule keeps feeds recent, prioritizing fresh content in user timelines.
User threads post-release exemplify rapid community analysis, though researchers deem them unsubstantiated. Claims about conversation rewards, video prioritization, or niche adherence stem from partial code views, not comprehensive evidence.
Musk’s transparency claim sets X apart, yet the redactions of interaction weights and training data keep the algorithm’s core proprietary elements out of view. The security justification for those exclusions also preserves a competitive edge.
Foroughifar’s focus on training data points to a critical barrier to auditing: biased datasets propagate through model outputs, and without visibility into the source data, debiasing is difficult to verify. Lazzaroni’s EU project illustrates the practical stakes for researchers, since faithful simulation requires the full algorithm, and code released without its model cannot provide that.