Quantifying Search Retrieval Performance in Financial Research
Fusion Search for AI: Comparison with Full Text (Sparse), Hybrid RRF (Sparse & Vector), and Knowledge-Graph Boosted Configurations
As financial institutions turn to AI for insight extraction and research engagement, the limits of both conventional and AI search become increasingly clear.
This article presents a comparative analysis of search approaches, revealing measurable gaps in precision, recall, and contextual relevance, all of which are fundamental to search and GenAI / chatbot projects.
Benchmark Summary
- Fusion Search achieved up to 3x greater recall than conventional methods, recovering semantically tagged content natively, without external filtering or interpretation layers.
- Vector embeddings and hybrid search methods improved baseline performance but still missed up to half of relevant content, especially on thematic queries.
- Fusion Search operates as a semantic enhancement layer, integrating with full text, vector embeddings, and hybrid search engines to orchestrate relevance with architectural intent.
- Temporal fidelity is embedded, enabling recency-aware retrieval without heuristic decay or bolted-on filters.
- These results reflect a shift in retrieval philosophy, from keyword matching to structured semantic interpretation, engineered for financial nuance.
Why Quantifying Semantic Precision in Financial Research Matters
Generic search tooling treats financial research as flat text. Without semantic segmentation, even advanced models struggle to surface meaningful results. The consequence: missed nuance, diluted signal, and operational inefficiency.
Methodology
We tested synthetic and thematic queries across real-world financial research documents, evaluating three search approaches:
- Full Text Search (baseline keyword retrieval, also known as sparse vector search)
- Hybrid Search (dense vector embeddings + keyword)
- Fusion Search (Limeglass semantic engine)
Each method was assessed for:
- Recall: proportion of relevant paragraphs retrieved
- Contextual fidelity: ability to segment and interpret thematic relevance
Relevance was determined using the ‘Limeglass Ensemble Retrieval & Re-Ranker Test’ (LERRT), which uses a diverse ensemble of search profile requests. Relevance was scored by a large language model (LLM) acting as a semantic judge, ensuring consistent ground truth across methods.
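The article does not disclose LERRT's internals, but the LLM-as-judge idea it describes can be sketched minimally. In this hypothetical sketch, `call_llm` stands in for any chat-completion client, and the prompt wording is an illustrative assumption; the key point is that every (query, paragraph) pair is labelled once, so all retrieval methods are scored against the same ground truth.

```python
# Hypothetical sketch of LLM-as-judge relevance labelling (LERRT internals
# are not disclosed). `call_llm` is a stand-in for any chat-completion
# client that returns "RELEVANT" or "NOT_RELEVANT".

def judge_relevance(query: str, paragraph: str, call_llm) -> bool:
    prompt = (
        "You are a semantic judge for financial research retrieval.\n"
        f"Query: {query}\n"
        f"Paragraph: {paragraph}\n"
        "Answer RELEVANT or NOT_RELEVANT only."
    )
    return call_llm(prompt).strip().upper() == "RELEVANT"

def build_ground_truth(queries, corpus, call_llm):
    """Label every (query, paragraph) pair once, so every retrieval
    method is evaluated against the same relevance sets."""
    return {
        q: {i for i, p in enumerate(corpus) if judge_relevance(q, p, call_llm)}
        for q in queries
    }
```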
Synthetic vs. Thematic Question Benchmarking
What Are Synthetic Questions?
Synthetic questions are artificially constructed queries designed to closely mirror the phrasing and structure of source documents. They test retrieval systems under ideal conditions, where lexical overlap is high and ambiguity is minimal. While useful for benchmarking, synthetic queries don’t reflect the complexity of real-world financial research, where thematic nuance, abstraction, and semantic drift are common.
Why This Matters
Retrieval systems often perform well on synthetic queries, where the language closely mirrors source text. But real-world financial research demands more: thematic interpretation, semantic segmentation, and contextual nuance.
This section compares performance across two distinct query types:
- Synthetic Questions: Lexically aligned, low ambiguity
- Thematic Questions: Human-generated, abstract, and semantically rich
By benchmarking both, we expose the limits of conventional search and the native uplift of Fusion Search across complexity tiers.
Key Findings for Synthetic Questions
Synthetic Query Performance
Even when queries were synthetically precise (closely mirroring the source text), Full Text Search still missed over a third of relevant content.
Signal Loss in Synthetic Queries
Search Method | Recall Rate | Signal Loss |
---|---|---|
Full Text Search | 63.5% | −36.5% |
Signal loss = 100% − recall. Based on paragraph-level tagging.
These results highlight the limitations of keyword-based retrieval, even in ideal conditions.
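The metrics in the table above reduce to simple set arithmetic over paragraph ids: recall is the fraction of ground-truth paragraphs a method retrieved, and signal loss is its complement. A minimal sketch:

```python
# Paragraph-level recall and signal loss, as used in the tables.
# `relevant` is the ground-truth set of paragraph ids for a query;
# `retrieved` is the set a given search method returned.

def recall(relevant: set, retrieved: set) -> float:
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)

def signal_loss(recall_rate: float) -> float:
    # Signal loss = 100% - recall (expressed here as a fraction).
    return 1.0 - recall_rate

# Illustrative numbers: 200 relevant paragraphs, 127 retrieved
# gives 0.635, matching the 63.5% Full Text Search row.
r = recall(set(range(200)), set(range(127)))
```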
Fusion Search – Synthetic Recall Uplift
Fusion Search improves performance by reducing both false positives and false negatives, recovering semantically tagged content natively.
Comparison | Baseline Recall | Fusion Recall | Absolute Uplift | Relative Uplift |
---|---|---|---|---|
vs. Full Text Search | 63.5% | 79.9% | +16.3 pts | +25.7% |
All values based on paragraph-level recall across synthetic queries.
Full Text Search, when enhanced with Fusion Search, achieved a +25.7% uplift over baseline performance, even when queries were synthetically aligned with the source text.
Chart 1: Synthetic Recall Comparison (Paragraph-Level)
Why This Matters
Synthetic queries test lexical proximity. They represent the best-case scenario for Full Text Search, yet still expose significant signal loss.
Fusion Search’s uplift here isn’t just about matching; it’s about semantic precision, contextual fidelity, and native tagging.
This sets the foundation for the harder, real-world challenge of thematic questions, where language is abstract, multi-layered, and context-dependent.
Key Findings for Thematic Questions
As expected for thematic, human-generated queries, Full Text Search (‘Sparse’) alone struggled to surface relevant content.
Modern approaches attempt to mitigate this by integrating sparse keyword signals with Dense Vector Embeddings, forming hybrid solutions that blend results using techniques like Reciprocal Rank Fusion (RRF).
While RRF hybrids improved baseline performance, they still suffered from signal loss, missing relevant content due to lack of semantic segmentation and contextual alignment.
Signal Loss in Existing Methods
Search Method | Recall Rate | Signal Loss |
---|---|---|
Full Text Search | 15.6% | −84.4% |
RRF Hybrid | 35.3% | −64.7% |
Advanced RRF Hybrid | 52.4% | −47.6% |
Signal loss = 100% − recall. Based on paragraph-level tagging.
These results highlight the limitations of both traditional keyword search and modern hybrid retrieval when faced with real-world, thematic questions. Without native semantic tagging and graph-based context, even advanced configurations miss nearly half of the relevant content.
Fusion Search – Recall Uplift
Applying Fusion Search to each of these methods showed significant improvements across the board:
Comparison | Baseline Recall | Fusion Recall | Absolute Uplift | Relative Uplift |
---|---|---|---|---|
vs. Full Text Search | 15.6% | 26.9% | +11.3 pts | +72% |
vs. RRF Hybrid | 35.3% | 43.8% | +8.5 pts | +24% |
vs. Advanced RRF Hybrid | 52.4% | 60.5% | +8.1 pts | +15% |
All values based on paragraph-level recall across thematic queries.
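The two uplift columns are computed differently, which is worth making explicit: absolute uplift is a difference in percentage points, while relative uplift is that difference as a fraction of the baseline.

```python
# Absolute uplift (percentage points) vs. relative uplift (fraction of
# baseline), as reported in the comparison tables.

def uplift(baseline: float, fused: float):
    absolute = fused - baseline               # percentage points
    relative = (fused - baseline) / baseline  # fraction of baseline
    return absolute, relative

# Full Text Search row: 15.6% -> 26.9% gives +11.3 pts, +72% relative.
abs_pts, rel = uplift(15.6, 26.9)
```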
The Fusion Search Hybrid approach achieved up to 3x greater recall than conventional baseline full text search, retrieving semantically tagged content natively. This was achieved without any external filtering or interpretation layers.
Chart 2: Thematic Recall Comparison (Paragraph-Level)
Temporal Sensitivity
Fusion Search also demonstrated superior handling of recency-aware queries, surfacing relevant content from recent publications without sacrificing thematic precision, unlike time filters bolted onto sparse methods or heuristics that rely on arbitrary exponential time decay.
Fusion Search’s temporal fidelity is embedded in its semantic graph logic, enabling native recency modulation without heuristic compromise.
Why Thematic Precision Matters
Thematic questions reflect how analysts and clients actually engage with financial research – abstract, multi-layered, and context-rich. Conventional search methods struggle here, often retrieving tangential or irrelevant content.
Fusion Search’s uplift isn’t just quantitative; it’s operational, enabling faster insight access, cleaner workflows, and reduced interpretive risk.
Fusion Search as a Semantic Enhancement Layer
Fusion Search is not a standalone search engine; it’s a semantic augmentation layer designed to enhance existing retrieval systems using knowledge-graph concept tagging.
It integrates with:
- Full Text Search: enriching keyword-based retrieval with semantic tagging and rank features
- Dense Vector Search: adding domain-specific segmentation and thematic alignment
- Hybrid Search: orchestrating sparse and dense signals with graph-based context and native tagging
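One way to picture an enhancement layer of this kind (a hypothetical sketch; Fusion Search's actual scoring is not disclosed here) is a re-scoring pass over any base engine's results, boosting documents whose knowledge-graph tags overlap the concepts tagged on the query. The tag names and boost weight below are illustrative assumptions.

```python
# Hypothetical enhancement-layer sketch: re-score results from any base
# engine (sparse, dense, or hybrid) by boosting documents whose
# knowledge-graph tags overlap the query's tagged concepts.
# Tag names and tag_boost are illustrative, not Limeglass's actual values.

def enhance(base_results, query_tags, doc_tags, tag_boost=0.5):
    """base_results: list of (doc_id, base_score); doc_tags: doc_id -> set of tags."""
    rescored = []
    for doc_id, score in base_results:
        overlap = len(query_tags & doc_tags.get(doc_id, set()))
        rescored.append((doc_id, score + tag_boost * overlap))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

results = enhance(
    [("a", 1.0), ("b", 0.9)],
    query_tags={"RatesOutlook", "Inflation"},
    doc_tags={"a": set(), "b": {"Inflation"}},
)
# "b" (0.9 + 0.5 boost) now outranks "a" (1.0): the tag match
# outweighs the small lexical-score deficit
```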
By layering semantic precision on top of conventional methods, Fusion Search transforms retrieval from keyword matching to structured interpretation, without disrupting existing infrastructure.
This architectural flexibility allows Fusion Search to operate within legacy environments while delivering native uplift in recall, ranking, and relevance.
Temporal Fidelity: Beyond Time Filters
Fusion Search introduces a new dimension to retrieval: native temporal sensitivity. Unlike conventional systems that rely on bolted-on time filters or arbitrary exponential decay functions, Fusion Search encodes recency directly into its semantic graph logic. This allows it to:
- Surface recent insights without sacrificing thematic precision
- Modulate relevance based on contextual recency, not just timestamps
- Avoid false positives from outdated but lexically similar content
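The contrast the bullets draw can be made concrete. The sketch below is hypothetical (the article does not disclose Fusion Search's mechanism): it contrasts a blanket exponential time decay, the "bolted-on" heuristic criticised above, with modulation that down-weights age only when the query itself is recency-sensitive, so evergreen thematic queries keep their precision.

```python
# Hypothetical contrast: blanket exponential decay vs. context-aware
# recency modulation. half_life (days) is an illustrative parameter.

def exp_decay_score(score: float, age_days: float, half_life: float = 30.0) -> float:
    # The blanket heuristic: every document decays, regardless of query intent.
    return score * 0.5 ** (age_days / half_life)

def modulated_score(score: float, age_days: float,
                    query_is_recency_sensitive: bool,
                    half_life: float = 30.0) -> float:
    # Only down-weight age when the query actually calls for recency,
    # preserving thematic precision on evergreen queries.
    if not query_is_recency_sensitive:
        return score
    return exp_decay_score(score, age_days, half_life)
```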
This advancement is especially critical in financial workflows, where timing and context are inseparable. Analysts, sales teams, and compliance officers benefit from faster access to current, relevant insights – without signal dilution.
Strategic Implications
For analysts, sales teams, and compliance officers, the difference is operationally significant:
- Faster access to relevant insights
- Topic-aware filtering across asset classes and macro themes
- Recency-aware retrieval without signal dilution
- Defensible workflows grounded in contextual precision
Fusion Search enables not just retrieval but, more importantly, interpretation.
Conclusion
Conventional search and retrieval methods weren’t built for financial nuance, thematic abstraction, or recency. Even advanced hybrid approaches involving dense vector embeddings miss up to half of the relevant content.
Fusion Search bridges that gap, recovering more semantically tagged insights natively across every baseline.
These results reflect not just recall uplift, but a shift in retrieval philosophy: from keyword matching to semantic interpretation.
While dense vector embeddings represent a step toward semantic retrieval, they often operate without domain-specific boundaries or contextual scaffolding, treating documents as undifferentiated text blocks. Fusion Search goes further, combining semantic knowledge-graph tagging, graph-based context, and thematic alignment to recover meaning natively. It’s not just semantic; it’s structured semantic precision, engineered for financial nuance and contextual fidelity.
This distinction matters: dense embeddings may generate plausible responses, but without structured segmentation, they risk surfacing noise or missing thematic depth.
Fusion Search retrieves relevance with architectural intent and remains LLM-agnostic by design.
Already in production via the Limeglass ResearchGenie AI Chat Bot, Fusion Search is actively increasing research engagement across client workflows.
Limeglass Fusion Search is redefining what AI-powered search can achieve – when engineered for semantic precision.
Request a meeting to see the benefits of Fusion Search for AI in action or explore a free benchmark test with your own in-house search or chat bot.
Glossary
Term | Definition |
---|---|
Sparse Vector Search (Full Text Search) | Keyword-based retrieval using inverted indexes |
Dense Vector Embeddings | Semantic representations using neural models |
RRF (Reciprocal Rank Fusion) | Technique to merge ranked lists from multiple retrieval models |
Semantic Tagging | Annotating content with domain-specific knowledge-graph concepts |
Temporal Fidelity | Native recency-aware relevance scoring |