Quantifying Search Retrieval Performance in Financial Research
Fusion Search for AI: Comparison with Full Text (Sparse), Hybrid RRF (Sparse & Vector), and Knowledge-Graph Boosted Configurations
As financial institutions turn to AI for insight extraction and research engagement, the limits of both conventional and AI search become increasingly clear.
This article presents a comparative analysis of search approaches, revealing measurable gaps in precision, recall, and contextual relevance, all of which are fundamental to search and GenAI / chatbot projects.
Benchmark Summary
- Fusion Search achieved up to 3x greater recall than conventional methods, recovering semantically tagged content natively, without external filtering or interpretation layers.
- Vector embeddings and hybrid search methods improved baseline performance but still missed up to half of relevant content, especially on thematic queries.
- Fusion Search operates as a semantic enhancement layer, integrating with full text, vector embeddings, and hybrid search engines to orchestrate relevance with architectural intent.
- Temporal fidelity is embedded, enabling recency-aware retrieval without heuristic decay or bolted-on filters.
- These results reflect a shift in retrieval philosophy, from keyword matching to structured semantic interpretation, engineered for financial nuance.
Why Quantifying Semantic Precision in Financial Research Matters
Generic search tooling treats financial research as flat text. Without semantic segmentation, even advanced models struggle to surface meaningful results. The consequence: missed nuance, diluted signal, and operational inefficiency.
Methodology
We tested synthetic and thematic queries across real-world financial research documents, evaluating three search approaches:
- Full Text Search (baseline keyword retrieval, also known as sparse vector search)
- Hybrid Search (dense vector embeddings + keyword)
- Fusion Search (Limeglass semantic engine)
Each method was assessed for:
- Recall: proportion of relevant paragraphs retrieved
- Contextual fidelity: ability to segment and interpret thematic relevance
Relevance was determined using the ‘Limeglass Ensemble Retrieval & Re-Ranker Test’ (LERRT), which uses a diverse ensemble of search profile requests. Relevance was scored by a large language model (LLM) acting as a semantic judge, ensuring consistent ground truth across methods.
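The article does not disclose LERRT's internals, but the LLM-as-judge idea it describes can be sketched minimally. In this hypothetical sketch, `call_llm` stands in for any chat-completion client, and the prompt wording is an illustrative assumption; the key point is that every (query, paragraph) pair is labelled once, so all retrieval methods are scored against the same ground truth.

```python
# Hypothetical sketch of LLM-as-judge relevance labelling (LERRT internals
# are not disclosed). `call_llm` is a stand-in for any chat-completion
# client that returns "RELEVANT" or "NOT_RELEVANT".

def judge_relevance(query: str, paragraph: str, call_llm) -> bool:
    prompt = (
        "You are a semantic judge for financial research retrieval.\n"
        f"Query: {query}\n"
        f"Paragraph: {paragraph}\n"
        "Answer RELEVANT or NOT_RELEVANT only."
    )
    return call_llm(prompt).strip().upper() == "RELEVANT"

def build_ground_truth(queries, corpus, call_llm):
    """Label every (query, paragraph) pair once, so every retrieval
    method is evaluated against the same relevance sets."""
    return {
        q: {i for i, p in enumerate(corpus) if judge_relevance(q, p, call_llm)}
        for q in queries
    }
```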
Synthetic vs. Thematic Question Benchmarking
What Are Synthetic Questions?
Synthetic questions are artificially constructed queries designed to closely mirror the phrasing and structure of source documents. They test retrieval systems under ideal conditions, where lexical overlap is high and ambiguity is minimal. While useful for benchmarking, synthetic queries don’t reflect the complexity of real-world financial research, where thematic nuance, abstraction, and semantic drift are common.
Why This Matters
Retrieval systems often perform well on synthetic queries, where the language closely mirrors source text. But real-world financial research demands more: thematic interpretation, semantic segmentation, and contextual nuance.
This section compares performance across two distinct query types:
- Synthetic Questions: Lexically aligned, low ambiguity
- Thematic Questions: Human-generated, abstract, and semantically rich
By benchmarking both, we expose the limits of conventional search and the native uplift of Fusion Search across complexity tiers.
Key Findings for Synthetic Questions
Synthetic Query Performance
Even when queries were synthetically precise (closely mirroring the source text), Full Text Search still missed over a third of relevant content.
Signal Loss in Synthetic Queries
Search Method | Recall Rate | Signal Loss |
---|---|---|
Full Text Search | 63.5% | −36.5% |
Signal loss = 100% − recall. Based on paragraph-level tagging.
These results highlight the limitations of keyword-based retrieval, even in ideal conditions.
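The metrics in the table above reduce to simple set arithmetic over paragraph ids: recall is the fraction of ground-truth paragraphs a method retrieved, and signal loss is its complement. A minimal sketch:

```python
# Paragraph-level recall and signal loss, as used in the tables.
# `relevant` is the ground-truth set of paragraph ids for a query;
# `retrieved` is the set a given search method returned.

def recall(relevant: set, retrieved: set) -> float:
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)

def signal_loss(recall_rate: float) -> float:
    # Signal loss = 100% - recall (expressed here as a fraction).
    return 1.0 - recall_rate

# Illustrative numbers: 200 relevant paragraphs, 127 retrieved
# gives 0.635, matching the 63.5% Full Text Search row.
r = recall(set(range(200)), set(range(127)))
```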
Fusion Search – Synthetic Recall Uplift
Fusion Search improves performance by reducing both false positives and false negatives, recovering semantically tagged content natively.
Comparison | Baseline Recall | Fusion Recall | Absolute Uplift | Relative Uplift |
---|---|---|---|---|
vs. Full Text Search | 63.5% | 79.9% | +16.3 pts | +25.7% |
All values based on paragraph-level recall across synthetic queries.
Full Text Search, when enhanced with Fusion Search, achieved a +25.7% uplift over baseline performance, even when queries were synthetically aligned with the source text.
Chart 1: Synthetic Recall Comparison (Paragraph-Level)
Why This Matters
Synthetic queries test lexical proximity. They represent the best-case scenario for Full Text Search, yet still expose significant signal loss.
Fusion Search’s uplift here isn’t just about matching; it’s about semantic precision, contextual fidelity, and native tagging.
This sets the foundation for the harder, real-world challenge of thematic questions, where language is abstract, multi-layered, and context-dependent.
Key Findings for Thematic Questions
As expected for thematic, human-generated queries, Full Text Search (‘Sparse’) alone struggled to surface relevant content.
Modern approaches attempt to mitigate this by integrating sparse keyword signals with Dense Vector Embeddings, forming hybrid solutions that blend results using techniques like Reciprocal Rank Fusion (RRF).
While RRF hybrids improved baseline performance, they still suffered from signal loss, missing relevant content due to lack of semantic segmentation and contextual alignment.
Signal Loss in Existing Methods
Search Method | Recall Rate | Signal Loss |
---|---|---|
Full Text Search | 15.6% | −84.4% |
RRF Hybrid | 35.3% | −64.7% |
Advanced RRF Hybrid | 52.4% | −47.6% |
Signal loss = 100% − recall. Based on paragraph-level tagging.
These results highlight the limitations of both traditional keyword search and modern hybrid retrieval when faced with real-world, thematic questions. Without native semantic tagging and graph-based context, even advanced configurations miss nearly half of the relevant content.
Fusion Search – Recall Uplift
Applying Fusion Search to each of these methods showed significant improvements across the board:
Comparison | Baseline Recall | Fusion Recall | Absolute Uplift | Relative Uplift |
---|---|---|---|---|
vs. Full Text Search | 15.6% | 26.9% | +11.3 pts | +72% |
vs. RRF Hybrid | 35.3% | 43.8% | +8.5 pts | +24% |
vs. Advanced RRF Hybrid | 52.4% | 60.5% | +8.1 pts | +15% |
All values based on paragraph-level recall across thematic queries.
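The two uplift columns are computed differently, which is worth making explicit: absolute uplift is a difference in percentage points, while relative uplift is that difference as a fraction of the baseline.

```python
# Absolute uplift (percentage points) vs. relative uplift (fraction of
# baseline), as reported in the comparison tables.

def uplift(baseline: float, fused: float):
    absolute = fused - baseline               # percentage points
    relative = (fused - baseline) / baseline  # fraction of baseline
    return absolute, relative

# Full Text Search row: 15.6% -> 26.9% gives +11.3 pts, +72% relative.
abs_pts, rel = uplift(15.6, 26.9)
```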
The Fusion Search Hybrid approach achieved up to 3x greater recall than conventional baseline full text search, retrieving semantically tagged content natively. This was achieved without any external filtering or interpretation layers.
Chart 2: Thematic Recall Comparison (Paragraph-Level)
Temporal Sensitivity
Fusion Search also demonstrated superior handling of recency-aware queries, surfacing relevant content from recent publications without sacrificing thematic precision, unlike time filters bolted onto sparse methods or heuristics that rely on arbitrary exponential time decay.
Fusion Search’s temporal fidelity is embedded in its semantic graph logic, enabling native recency modulation without heuristic compromise.
Why Thematic Precision Matters
Thematic questions reflect how analysts and clients actually engage with financial research – abstract, multi-layered, and context-rich. Conventional search methods struggle here, often retrieving tangential or irrelevant content.
Fusion Search’s uplift isn’t just quantitative; it’s operational, enabling faster insight access, cleaner workflows, and reduced interpretive risk.
Fusion Search as a Semantic Enhancement Layer
Fusion Search is not a standalone search engine; it’s a semantic augmentation layer designed to enhance existing retrieval systems using knowledge-graph concept tagging.
It integrates with:
- Full Text Search: enriching keyword-based retrieval with semantic tagging and rank features
- Dense Vector Search: adding domain-specific segmentation and thematic alignment
- Hybrid Search: orchestrating sparse and dense signals with graph-based context and native tagging
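One way to picture an enhancement layer of this kind (a hypothetical sketch; Fusion Search's actual scoring is not disclosed here) is a re-scoring pass over any base engine's results, boosting documents whose knowledge-graph tags overlap the concepts tagged on the query. The tag names and boost weight below are illustrative assumptions.

```python
# Hypothetical enhancement-layer sketch: re-score results from any base
# engine (sparse, dense, or hybrid) by boosting documents whose
# knowledge-graph tags overlap the query's tagged concepts.
# Tag names and tag_boost are illustrative, not Limeglass's actual values.

def enhance(base_results, query_tags, doc_tags, tag_boost=0.5):
    """base_results: list of (doc_id, base_score); doc_tags: doc_id -> set of tags."""
    rescored = []
    for doc_id, score in base_results:
        overlap = len(query_tags & doc_tags.get(doc_id, set()))
        rescored.append((doc_id, score + tag_boost * overlap))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

results = enhance(
    [("a", 1.0), ("b", 0.9)],
    query_tags={"RatesOutlook", "Inflation"},
    doc_tags={"a": set(), "b": {"Inflation"}},
)
# "b" (0.9 + 0.5 boost) now outranks "a" (1.0): the tag match
# outweighs the small lexical-score deficit
```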
By layering semantic precision on top of conventional methods, Fusion Search transforms retrieval from keyword matching to structured interpretation, without disrupting existing infrastructure.
This architectural flexibility allows Fusion Search to operate within legacy environments while delivering native uplift in recall, ranking, and relevance.
Temporal Fidelity: Beyond Time Filters
Fusion Search introduces a new dimension to retrieval: native temporal sensitivity. Unlike conventional systems that rely on bolted-on time filters or arbitrary exponential decay functions, Fusion Search encodes recency directly into its semantic graph logic. This allows it to:
- Surface recent insights without sacrificing thematic precision
- Modulate relevance based on contextual recency, not just timestamps
- Avoid false positives from outdated but lexically similar content
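The contrast the bullets draw can be made concrete. The sketch below is hypothetical (the article does not disclose Fusion Search's mechanism): it contrasts a blanket exponential time decay, the "bolted-on" heuristic criticised above, with modulation that down-weights age only when the query itself is recency-sensitive, so evergreen thematic queries keep their precision.

```python
# Hypothetical contrast: blanket exponential decay vs. context-aware
# recency modulation. half_life (days) is an illustrative parameter.

def exp_decay_score(score: float, age_days: float, half_life: float = 30.0) -> float:
    # The blanket heuristic: every document decays, regardless of query intent.
    return score * 0.5 ** (age_days / half_life)

def modulated_score(score: float, age_days: float,
                    query_is_recency_sensitive: bool,
                    half_life: float = 30.0) -> float:
    # Only down-weight age when the query actually calls for recency,
    # preserving thematic precision on evergreen queries.
    if not query_is_recency_sensitive:
        return score
    return exp_decay_score(score, age_days, half_life)
```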
This advancement is especially critical in financial workflows, where timing and context are inseparable. Analysts, sales teams, and compliance officers benefit from faster access to current, relevant insights – without signal dilution.
Strategic Implications
For analysts, sales teams, and compliance officers, the difference is operationally significant:
- Faster access to relevant insights
- Topic-aware filtering across asset classes and macro themes
- Recency-aware retrieval without signal dilution
- Defensible workflows grounded in contextual precision
Fusion Search enables not just retrieval but, more importantly, interpretation.
Conclusion
Conventional search and retrieval methods weren’t built for financial nuance, thematic abstraction, or recency. Even advanced hybrid approaches involving dense vector embeddings miss up to half of the relevant content.
Fusion Search bridges that gap, recovering more semantically tagged insights natively across every baseline.
These results reflect not just recall uplift, but a shift in retrieval philosophy: from keyword matching to semantic interpretation.
While dense vector embeddings represent a step toward semantic retrieval, they often operate without domain-specific boundaries or contextual scaffolding, treating documents as undifferentiated text blocks. Fusion Search goes further, combining semantic knowledge-graph tagging, graph-based context, and thematic alignment to recover meaning natively. It’s not just semantic; it’s structured semantic precision, engineered for financial nuance and contextual fidelity.
This distinction matters: dense embeddings may generate plausible responses, but without structured segmentation, they risk surfacing noise or missing thematic depth.
Fusion Search retrieves relevance with architectural intent and remains LLM-agnostic by design.
Already in production via the Limeglass ResearchGenie AI Chat Bot, Fusion Search is actively increasing research engagement across client workflows.
Limeglass Fusion Search is redefining what AI-powered search can achieve – when engineered for semantic precision.
Request a meeting to see the benefits of Fusion Search for AI in action or explore a free benchmark test with your own in-house search or chat bot.
Glossary
Term | Definition |
---|---|
Sparse Vector Search (Full Text Search) | Keyword-based retrieval using inverted indexes |
Dense Vector Embeddings | Semantic representations using neural models |
RRF (Reciprocal Rank Fusion) | Technique to merge ranked lists from multiple retrieval models |
Semantic Tagging | Annotating content with domain-specific knowledge-graph concepts |
Temporal Fidelity | Native recency-aware relevance scoring |