How Does Natural Language Processing Work?
The Key Questions to Ask
Over 75% of the world’s data is textual,¹ spanning firm filings, financial news, social media, earnings calls and press releases. Natural language processing (NLP), the analysis of large volumes of language using artificial intelligence, can capture this vast body of unstructured information that numbers alone often miss, opening compelling opportunities for investors. But the barriers to generating durable alpha from text are steeper than they appear, and investors should perform careful due diligence on investment managers who say they effectively incorporate NLP into their process.
Why NLP Is Hard to Do Well
With NLP, the raw inputs — news feeds, transcripts and filings — are widely available. The models to process them are increasingly commoditized. Anyone with basic coding skills can set up an AI agent that runs basic NLP algorithms on publicly available sources of company information. The advantage does not come from access to textual data, but from the quality, sensibility and intuition behind the economic questions you ask and the rigor with which you test the answers.
Several structural challenges make NLP particularly treacherous for investors:
- Overfitting is endemic. Text datasets have many dimensions and a relatively short history. Models can appear to work brilliantly in-sample but capture noise rather than signal.
- Publication bias compounds the problem. Academic NLP findings tend to be reported only when they work. The strategies that do not survive publication scrutiny are invisible to most market participants.
- Crowding happens fast. When a vendor offers a pre-packaged sentiment score to the market, its informational edge begins decaying from the moment of first sale.
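The overfitting point can be made concrete with a small simulation. The sketch below is illustrative only: the feature count (200), history length (60 periods) and function names are assumptions, and every series is pure random noise. It selects the noise feature that looks best in-sample and then scores the same feature on fresh data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_noise_feature(rng, n_features=200, n_periods=60):
    """Pick the noise feature that best 'predicts' returns in-sample,
    then score that same feature on a fresh out-of-sample window."""
    returns_in = [rng.gauss(0, 1) for _ in range(n_periods)]
    returns_out = [rng.gauss(0, 1) for _ in range(n_periods)]
    best_in, best_out = 0.0, 0.0
    for _ in range(n_features):
        feat_in = [rng.gauss(0, 1) for _ in range(n_periods)]
        feat_out = [rng.gauss(0, 1) for _ in range(n_periods)]
        corr_in = abs(pearson(feat_in, returns_in))
        if corr_in > best_in:
            best_in = corr_in
            best_out = abs(pearson(feat_out, returns_out))
    return best_in, best_out


def average_over_trials(n_trials=20, seed=0):
    """Repeat the selection exercise and average both correlations."""
    rng = random.Random(seed)
    results = [best_noise_feature(rng) for _ in range(n_trials)]
    avg_in = sum(r[0] for r in results) / n_trials
    avg_out = sum(r[1] for r in results) / n_trials
    return avg_in, avg_out


if __name__ == "__main__":
    avg_in, avg_out = average_over_trials()
    print(f"avg in-sample |corr| of selected feature: {avg_in:.2f}")
    print(f"avg out-of-sample |corr| of same feature: {avg_out:.2f}")
```

Because no feature contains any real information, the gap between the two averages comes entirely from selecting the best of many candidates on a short history. This is the same mechanism that makes a wide text dataset look predictive in a backtest.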
Our recent paper in The Journal of Portfolio Management, “Natural Language Processing for Asset Managers: Turning Text into Alpha,”² analyzes these structural challenges and offers a framework of best practices that separates the disciplined application of NLP from exploratory data mining that tends to disappoint out of sample.
What the Research Tells Us
Most durable NLP signals are built on economic intuition first and data second. When practitioners invert that sequence, mining text for patterns and then retrofitting a story, the results rarely survive live implementation.
To ensure that NLP applications provide a sustainable edge in investment decisions, best practice calls for a robust and repeatable framework that assesses sensibility, predictability, consistency and additivity. All four criteria should be met and understood prior to deployment (see the NLP Investment Checklist below).
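One way to test the last criterion is sketched below. It assumes that additivity means a text signal still relates to forward returns after removing what an existing factor already explains. That reading, the helper names and the toy numbers are all assumptions for illustration, not the paper's procedure.

```python
def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def residualize(signal, factor):
    """Remove the part of `signal` explained by `factor` (one-variable OLS)."""
    ms, mf = mean(signal), mean(factor)
    beta = (sum((f - mf) * (s - ms) for f, s in zip(factor, signal))
            / sum((f - mf) ** 2 for f in factor))
    alpha = ms - beta * mf
    return [s - (alpha + beta * f) for s, f in zip(signal, factor)]


def additivity_check(signal, factor, forward_returns):
    """Return (raw, incremental) correlations with forward returns.
    `incremental` uses only the part of the signal unrelated to `factor`."""
    raw = pearson(signal, forward_returns)
    incremental = pearson(residualize(signal, factor), forward_returns)
    return raw, incremental


# Made-up cross-section of eight stocks, for illustration only.
existing_factor = [0.5, -1.2, 0.3, 1.1, -0.7, 0.0, 0.9, -0.9]
text_signal = [0.7, -0.9, 0.1, 1.4, -0.2, -0.3, 0.6, -1.1]
forward_returns = [0.02, -0.01, 0.00, 0.03, 0.01, -0.02, 0.01, -0.03]

if __name__ == "__main__":
    raw, incremental = additivity_check(text_signal, existing_factor, forward_returns)
    print(f"raw correlation: {raw:.2f}, incremental correlation: {incremental:.2f}")
```

A signal whose incremental correlation collapses toward zero is mostly repackaging information the portfolio already holds. A production test would use multiple factors, many periods and proper significance testing.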
The Signal Landscape: What Has Held Up
Our research² surveys a broad range of text-based signals, from classic bag-of-words sentiment derived from earnings calls to more sophisticated peer similarity measures using neural embeddings. A few categories stand out for their consistency:
- Graph theory. Modeling text as a network uncovers dependencies and information flow beyond simple word sequences and sentiment.
- Peer and industry structure. Text-based firm similarity measures derived from business descriptions identify competitive dynamics and revenue exposures missed by the Global Industry Classification Standard (GICS).
- Forward-looking language in filings. The specificity and confidence of guidance-related language in 10-K and 10-Q filings correlate with subsequent earnings quality.
What these approaches share is a clear economic rationale established in advance. Management teams that hedge more are signaling something. Further, investing decisions are always peer relative: firms’ descriptions of their own business provide a richer view than discrete industry classifications, which can aid in identifying mispriced securities. These are testable propositions, not data artifacts.
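The peer-similarity idea can be illustrated with a deliberately simple stand-in. The sketch below uses word counts and cosine similarity rather than the neural embeddings mentioned above, and the firm names and business descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_peers(firm, descriptions):
    """Rank all other firms by textual similarity to `firm`, most similar first."""
    scores = [
        (other, cosine_similarity(descriptions[firm], text))
        for other, text in descriptions.items()
        if other != firm
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical business descriptions, for illustration only.
descriptions = {
    "FirmA": "cloud software for payments and merchant payments processing",
    "FirmB": "payments processing software for online merchants",
    "FirmC": "regional railroad freight transportation services",
}

if __name__ == "__main__":
    for peer, score in rank_peers("FirmA", descriptions):
        print(f"{peer}: {score:.2f}")
```

The output is a continuous similarity score for every pair of firms rather than a single industry label. That is what lets a text-based measure pick up overlapping revenue exposures that a discrete classification cannot express.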
Where the Evidence Is Weaker
Our research² is equally clear about where NLP signals tend to disappoint. High-frequency sentiment derived from news wires suffers from rapid crowding and implementation friction. Social media signals, despite their popularity, have shown limited robustness in institutional equity settings after accounting for transaction costs. Generic vendor sentiment scores, applied without adjustment for sector or firm characteristics, tend to degrade quickly as the data becomes widely distributed.
Using NLP in this manner may add value at the margin, but that value is meaningfully smaller than research abstracts often suggest, especially after considering implementation costs. That is not a reason to ignore the space; it is a reason to approach it with discipline.
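The effect of implementation costs is simple arithmetic, sketched below. The cost model (a flat cost per unit of turnover) and all input values are assumptions chosen for illustration, not estimates from the research.

```python
def net_alpha(gross_alpha, annual_turnover, cost_per_unit_turnover):
    """Annual alpha left after trading costs.
    Inputs are decimals (0.02 = 2%); a turnover of 4.0 means 400% per year."""
    return gross_alpha - annual_turnover * cost_per_unit_turnover


def breakeven_turnover(gross_alpha, cost_per_unit_turnover):
    """Turnover at which trading costs consume the entire gross alpha."""
    return gross_alpha / cost_per_unit_turnover


# Hypothetical inputs: a slow filing-based signal versus a fast news-based one,
# both trading at an assumed cost of 20 basis points per unit of turnover.
slow_signal = net_alpha(gross_alpha=0.02, annual_turnover=1.0,
                        cost_per_unit_turnover=0.002)
fast_signal = net_alpha(gross_alpha=0.03, annual_turnover=20.0,
                        cost_per_unit_turnover=0.002)

if __name__ == "__main__":
    print(f"slow signal net alpha: {slow_signal:.2%}")
    print(f"fast signal net alpha: {fast_signal:.2%}")
    print(f"fast signal breakeven turnover: {breakeven_turnover(0.03, 0.002):.1f}x")
```

Under these assumed numbers, the faster signal has the higher gross alpha but a negative net alpha, because its turnover exceeds the breakeven level.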
What This Means in Practice: An Investor’s Checklist
For institutional investors allocating to a strategy that incorporates textual (NLP) signals, thorough due diligence is especially important: this is a domain where the temptation to over-engineer is unusually high and the track record of live implementation is unusually short.

Northern Trust's Perspective
At Northern Trust Asset Management, our approach to quantitative equity investing has always prioritized economic logic over pattern recognition. NLP is no exception. We see real potential in text-based signals, particularly in areas like peer structure and management language analysis. However, we apply the same standards of out-of-sample validation, risk management and cost discipline that we bring to any quantitative investment strategy.
1. Tam Harbert, “Tapping the power of unstructured data,” MIT Sloan School of Management (Ideas Made to Matter), February 1, 2021, https://mitsloan.mit.edu/ideas-made-to-matter/tapping-power-unstructured-data
2. Guido Baltussen, Gijsbert de Lange, Ashraf Mansur, Olivera Rakic, and Machiel Westerdijk, “Natural Language Processing for Asset Managers: Turning Text into Alpha,” The Journal of Portfolio Management 52, no. 2 (Quantitative Tools 2025): 184–211, https://www.pm-research.com/content/iijpormgmt/52/2/184
IMPORTANT INFORMATION
Northern Trust Asset Management (NTAM) is composed of Northern Trust Investments, Inc., Northern Trust Global Investments Limited, Northern Trust Fund Managers (Ireland) Limited, Northern Trust Global Investments Japan, K.K, NT Global Advisors, Inc., 50 South Capital Advisors, LLC, Northern Trust Asset Management Australia Pty Ltd, and investment personnel of The Northern Trust Company of Hong Kong Limited and The Northern Trust Company.
Issued in the United Kingdom by Northern Trust Global Investments Limited, issued in the European Economic Area (“EEA”) by Northern Trust Fund Managers (Ireland) Limited, issued in Australia by Northern Trust Asset Management (Australia) Limited (ACN 648 476 019) which holds an Australian Financial Services Licence (Licence Number: 529895) and is regulated by the Australian Securities and Investments Commission (ASIC), and issued in Hong Kong by The Northern Trust Company of Hong Kong Limited which is regulated by the Hong Kong Securities and Futures Commission.
For Canada, Asia-Pacific (APAC) and Europe, Middle East and Africa (EMEA) markets, this information is directed to institutional, professional and wholesale clients or investors only and should not be relied upon by retail clients or investors. This document may not be edited, altered, revised, paraphrased, or otherwise modified without the prior written permission of NTAM. The information is not intended for distribution or use by any person in any jurisdiction where such distribution would be contrary to local law or regulation. NTAM may have positions in and may effect transactions in the markets, contracts and related investments different than described in this information. This information is obtained from sources believed to be reliable, but its accuracy and completeness are not guaranteed, and it is subject to change. Information does not constitute a recommendation of any investment strategy, is not intended as investment advice and does not take into account all the circumstances of each investor. Unless otherwise noted, the statements expressed herein are solely opinions of Northern Trust. Northern Trust does not make any representation, assurance, or other promise as to the accuracy, impact, or potential occurrence of any events or outcomes expressed in such opinions.
This report is provided for informational purposes only and is not intended to be, and should not be construed as, an offer, solicitation or recommendation with respect to any transaction and should not be treated as legal advice, investment advice or tax advice. Recipients should not rely upon this information as a substitute for obtaining specific legal or tax advice from their own professional legal or tax advisors. References to specific securities and their issuers are for illustrative purposes only and are not intended and should not be interpreted as recommendations to purchase or sell such securities. Indices and trademarks are the property of their respective owners. Information is subject to change based on market or other conditions.
Forward-looking statements and assumptions are NTAM’s current estimates or expectations of future events or future results based upon proprietary research and should not be construed as an estimate or promise of results that a portfolio may achieve. Actual results could differ materially from the results indicated by this information.
© 2026 Northern Trust Corporation. Head Office: 50 South La Salle Street, Chicago, Illinois 60603 U.S.A.