Identification of Topic Correlations at Paragraph Level

Problem Solved: Difficulty finding tradeable instruments by theme.
Client Benefits: Automated detection of co-location of specific topics in individual paragraphs allows for the discovery of cross-asset thematic links.
Post-Publication: xx
Paragraph Level Tags: Limeglass atomises your documents, breaking them down into individual paragraphs and tagging each one in great detail. This means the co-location of tags can be tracked while preserving the context of the original text.

Ingestion & Atomisation
Post-Publication

Automatic Tagging
Paragraph Level Tags

Output
API

Client Type
Sales

Where does the Client use it?
Data Science

Specific Client Use Cases:

Sell Side Analyst collaboration using Key Research Indicator tags
- Set up alerts for certain key tags so that analysts can know when their colleagues are writing about things that might have a material impact on their own coverage
Buy Side Tradeable Instrument ideas
- Track the co-location of certain topic tags with tradeable instrument tags, for example to find any stocks mentioned in the context of an important theme.

Helping you connect the dots.

Structure your Unstructured Data!

Investment Research is, by its nature, a curious combination of highly structured information (Macro or Micro financial forecasts, Equity Ratings, financial ratios) with completely unstructured information (investment theses, background information, company descriptions).

Some of the structured information can be used to ensure that people are reading the right content and that certain important connections have been picked up. When an Oil analyst publishes a piece updating their Brent Crude forecast, for example, if that forecast is stored in a database, it can be flagged to colleagues (perhaps Airline analysts and Economists) who rely on it for inputs to their own models.

However, there are plenty of situations where that sort of useful automated cascading of information is impossible because the information is not structured. This is where paragraph-level topic correlations come in.

When an Economist writes about the relationship between consumer spending and jobless claims, you would hope that the Economist’s Consumer Stock analyst colleagues would read the research to understand hard or soft impacts on their forecasts. Do they need to update the growth rates plugged into their DCF valuation models?

Sell Side Analysts are Resource-Rich but Time-Poor

But even though analysts theoretically have this research available to them, they are unlikely to read it. Analysts are resource-rich but time-poor. And, given that the information they need is unstructured text, there is no straightforward way to set up a system to flag it to the right people.

However, if that document has been processed and atomised by Limeglass, the content tags for each paragraph will be readily available and an automated system could be built off that information. ‘Consumer Spending’ and ‘Jobless Claims’ would be separate tags that would both occur in this paragraph.

If the Consumer Stock analysts were set up to receive alerts for the ‘Consumer Spending’ tag, they would be made aware that there is a potentially interesting co-occurrence with the ‘Jobless Claims’ tag. This may pique their interest enough to read the relevant paragraph, thus improving the information flow between different subject matter experts on the Sell Side.
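As a rough illustration of the alerting idea above, here is a minimal Python sketch. It assumes each atomised paragraph is available as a record carrying a set of tags. The `Paragraph` class, the `co_occurrence_alerts` function and the subscription dictionary are hypothetical names made up for this example and do not describe the actual Limeglass API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    """Hypothetical record for one atomised paragraph and its tags."""
    doc_id: str
    index: int
    tags: frozenset


def co_occurrence_alerts(paragraphs, subscriptions):
    """Build one alert per (analyst, paragraph) where a watched tag appears.

    subscriptions maps an analyst (or team) name to the set of tags they watch.
    Each alert lists the watched tags that matched and the other tags that
    co-occur in the same paragraph.
    """
    alerts = []
    for para in paragraphs:
        for analyst, watched in subscriptions.items():
            matched = para.tags & watched
            if not matched:
                continue  # nothing this analyst watches appears here
            alerts.append({
                "analyst": analyst,
                "doc_id": para.doc_id,
                "paragraph": para.index,
                "matched": sorted(matched),
                "co_occurring": sorted(para.tags - watched),
            })
    return alerts


# Illustrative data only: the Economist's paragraph from the example above.
paragraphs = [
    Paragraph("economics-note", 3, frozenset({"Consumer Spending", "Jobless Claims"})),
    Paragraph("economics-note", 4, frozenset({"Jobless Claims"})),
]
subscriptions = {"consumer_stock_analysts": {"Consumer Spending"}}

alerts = co_occurrence_alerts(paragraphs, subscriptions)
```

Only the first paragraph produces an alert here, because it is the only one carrying the watched 'Consumer Spending' tag. The alert reports 'Jobless Claims' as the co-occurring tag, which is the prompt the Consumer Stock analysts would need to go and read that paragraph.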

Buy Side clients want to see “joined-up thinking” from their Research providers

The same logic works for the actual consumers of the research as well: the Buy Side. A fund manager will be much more likely to take research seriously if they can see that their Sell Side counterparts are collaborating on investment recommendations with joined-up thinking. If they are reading the Economist’s report and can be alerted to ‘related research’, they could click through to the Consumer analyst’s report, which references the Economist’s work.

A system that can achieve this is no ordinary NLP tagger. Limeglass can do this because it uses two key proprietary technologies: Atomisation and Rich NLP.

Atomisation breaks documents into individual paragraphs and preserves all the relevant metadata and context wherever these paragraphs are subsequently used. Rich NLP, unlike most NLP tools, uses knowledge graphs based on both the plain text extracted from the document and important context such as positioning on the page, formatting, and the use of headings and sub-headings. It is only by bringing these two systems together that you can analyse tag co-location in a meaningful way.
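To make the idea of analysing tag co-location more concrete, the Buy Side use case of pairing topic tags with tradeable instrument tags could be sketched roughly as follows. This assumes the tags for each paragraph have already been retrieved as plain Python sets. The function name, the tag names and the split into "topic" and "instrument" tags are assumptions made for illustration, not part of any documented Limeglass output format.

```python
from collections import Counter
from itertools import product


def count_colocations(paragraph_tags, topic_tags, instrument_tags):
    """Count how often each (topic, instrument) pair shares a paragraph.

    paragraph_tags is an iterable of tag sets, one set per paragraph.
    """
    counts = Counter()
    for tags in paragraph_tags:
        topics = tags & topic_tags
        instruments = tags & instrument_tags
        # Every topic/instrument combination in this paragraph is one co-location.
        for pair in product(sorted(topics), sorted(instruments)):
            counts[pair] += 1
    return counts


# Illustrative tag sets only; 'Stock A' and 'Stock B' are placeholders.
paragraph_tags = [
    {"Energy Transition", "Stock A", "Stock B"},
    {"Energy Transition", "Stock A"},
    {"Inflation"},
]
counts = count_colocations(
    paragraph_tags,
    topic_tags={"Energy Transition", "Inflation"},
    instrument_tags={"Stock A", "Stock B"},
)
top_pairs = counts.most_common()
```

Because counting is done per paragraph rather than per document, a stock only scores against a theme when the two are discussed in the same paragraph, which is the context that document-level tagging would lose.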
