Perplexity upgrades Deep Research tool with Claude Opus 4.5 integration

Perplexity announced on Wednesday an upgrade to its Deep Research tool, which now runs on Anthropic’s Claude Opus 4.5 model integrated with the company’s proprietary search engine and sandbox infrastructure. The upgrade became available immediately to Max subscribers and will extend to Pro users in the coming days.

The company also released DRACO, the Deep Research Accuracy, Completeness, and Objectivity benchmark, an open-source benchmark for evaluating deep research agents. It assesses performance based on real-world usage patterns rather than isolated skills and contains 100 tasks spread across 10 domains.

The domains are Academic, Finance, Law, Medicine, Technology, General Knowledge, UX Design, Personal Assistant, Shopping, and Needle in a Haystack. Each of the 100 tasks is evaluated against approximately 40 expert-defined criteria spanning four dimensions: factual accuracy, breadth and depth of analysis, presentation quality, and citation quality.
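Perplexity has not published the benchmark's internal schema in this article, but the structure described above, per-task rubrics of roughly 40 criteria grouped into four dimensions, can be pictured with a short Python sketch. All class and field names here are illustrative assumptions, as is the simple averaging used to turn criterion verdicts into a percentage.

```python
# Hypothetical sketch of the task/rubric structure DRACO describes:
# 100 tasks across 10 domains, each scored against ~40 expert-defined
# criteria spanning four dimensions. Names and the averaging scheme
# are assumptions for illustration, not Perplexity's published schema.
from dataclasses import dataclass

DIMENSIONS = (
    "factual_accuracy",
    "breadth_and_depth",
    "presentation_quality",
    "citation_quality",
)

@dataclass
class Criterion:
    dimension: str           # one of DIMENSIONS
    description: str         # expert-written check, e.g. "cites the controlling statute"
    satisfied: bool = False  # judge model's verdict for a given research report

@dataclass
class Task:
    domain: str                # e.g. "Law", "Medicine", "Needle in a Haystack"
    prompt: str                # open-ended research request
    criteria: list[Criterion]  # roughly 40 per task

    def normalized_score(self) -> float:
        """Percentage of criteria the judged report satisfied."""
        hits = sum(c.satisfied for c in self.criteria)
        return 100.0 * hits / len(self.criteria)
```

Averaging such per-task scores over the 100 tasks would yield a single benchmark-level percentage of the kind reported below; whether DRACO weights criteria or dimensions differently is not stated in the article.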

We're also releasing a new open-source benchmark for evaluating deep research agents.

The Deep Research Accuracy, Completeness, and Objectivity (DRACO) Benchmark is grounded in how people actually use deep research.

Read more about how the benchmark was built:… pic.twitter.com/QjcOBhGUJk

— Perplexity (@perplexity_ai) February 4, 2026

Perplexity’s Deep Research tool recorded a normalized score of 67.15 percent on DRACO, outperforming Google Gemini Deep Research at 58.97 percent and OpenAI Deep Research, running on the o3 model, at 52.06 percent. These comparisons come from the company’s accompanying paper.

Performance rankings stayed consistent when evaluated by different judge models, such as GPT-5.2 and Sonnet-4.5. Perplexity showed its largest gaps over the second-best system in the Medicine, General Knowledge, and Technology domains, with margins of 9 to 12 percentage points.

Perplexity Deep Research’s highest absolute scores came in Law at 86.0 percent and Academic at 80.2 percent. DRACO was built from anonymized requests made to Perplexity Deep Research, which developers augmented into complex, open-ended tasks that replicate real research requirements.

Unlike traditional benchmarks focused on fact retrieval or trivia, DRACO tests the full research process and measures efficiency alongside accuracy. Perplexity Deep Research posted the lowest average latency, 459.6 seconds, while securing the top accuracy scores.
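The article does not specify how latency and rubric scores are aggregated. As a rough illustration only, a per-system summary of the kind quoted above could be computed as follows, with the tuple layout and dummy values assumed:

```python
# Illustrative aggregation of per-task results into a system-level summary:
# a mean normalized score plus an average latency. The values are dummies
# and the aggregation is an assumption, not DRACO's published method.
from statistics import mean

results = [
    # (normalized_score_percent, latency_seconds) per task
    (82.0, 410.5),
    (67.3, 505.2),
    (74.1, 463.1),
]

avg_score = mean(score for score, _ in results)
avg_latency = mean(latency for _, latency in results)
print(f"mean normalized score: {avg_score:.2f}%  |  mean latency: {avg_latency:.1f}s")
```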

This upgrade builds on the initial Deep Research launch in February 2025. That version introduced multi-pass querying and cross-source verification features. Perplexity signed a reported $750 million cloud deal with Microsoft in January 2025.

CEO Aravind Srinivas stated that “for finance specifically, data accuracy is a must and high stakes.” The company views Deep Research as a core element of its strategy to deliver research-grade analysis, positioning it against rival products from Google and OpenAI.
