Databases designed for one purpose often fail when asked to solve a different problem. For example, the securities finance databases of leading data providers such as FIS Astec, Datalend, and IHS Markit, designed more than 20 years ago for performance benchmarking, are inadequate when queried about the purpose of the loans themselves. Even regulatory databases enriched with new Securities Financing Transactions Regulation (SFTR) filings can only help supervisors monitor leverage based on end-of-day positions; without mapped flow data, they cannot determine the propriety of the loans.
None of the existing databases was intended or designed to map loans edge-to-edge, that is, from the principal lender to the principal borrower. Usually, a pension or mutual fund lends the securities through a series of financial intermediaries to the ultimate borrower, generally the trading desk at a hedge fund or broker-dealer. Because securities are fungible, the intermediaries' systems can pool the loans and distribute the borrowed securities through a highly efficient netting system, which breaks the chain linking each loan to each borrow.
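A minimal Python sketch shows why pooling breaks the chain. The account names, quantities, and the `net_pool` helper are hypothetical illustrations, not a description of any intermediary's actual system.

```python
from collections import defaultdict

def net_pool(loans, borrows):
    """Pool loans and borrows per security, as a simplified intermediary might.

    loans:   list of (lender, security, quantity)
    borrows: list of (borrower, security, quantity)
    Returns per-security totals only; lender-to-borrower links are not kept.
    """
    pool = defaultdict(lambda: {"lent": 0, "borrowed": 0})
    for _lender, security, qty in loans:
        pool[security]["lent"] += qty
    for _borrower, security, qty in borrows:
        pool[security]["borrowed"] += qty
    return dict(pool)

# Hypothetical accounts and quantities.
loans = [("PensionFundA", "XYZ", 600), ("MutualFundB", "XYZ", 400)]
borrows = [("HedgeFundC", "XYZ", 700), ("BrokerDealerD", "XYZ", 300)]

pooled = net_pool(loans, borrows)
print(pooled)  # {'XYZ': {'lent': 1000, 'borrowed': 1000}}
```

From the pooled totals alone, many allocations are equally consistent: HedgeFundC's 700 shares could be 600 from PensionFundA plus 100 from MutualFundB, or 300 plus 400. The record that survives netting cannot say which.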
Data vendors obtain their records from the intermediaries, and client account identities are always screened. Even in regulatory reporting, such as the SFTR system, the prime brokers and agent banks are permitted to use encryption shields for account identities, so long as they can uncover the accounts for bank examiners and tax auditors.
Data aggregation vendors cannot trace the loan flows of the underlying accounts because they are not provided with data that could identify the clients. For example, in a lawsuit involving a global mutual fund family, the performance measurement service, which had been outsourced to a leading vendor, produced ROA charts for each of the family's fund accounts. Each chart was encoded with a 4-character alphanumeric "Account Name" code and an 8- to 12-digit "Account Path" code.
The lending agent's system decrypts the vendor codes and distributes the charts based on the actual account identifiers. The peer groups chosen for benchmarks are very general: country and fund category. As a result, the vendor cannot identify the fund or associate it with its risk profile, much less link the source of the borrowed securities to their use. The Legal Entity Identifier (LEI) used in the SFTR reports improves the possibility of linking, but SFTR data aggregators are prohibited from using the account identities for any purpose other than regulatory reporting unless authorized by the underlying account. Non-disclosure agreements (NDAs) require the lending agents to firewall the vendor codes and LEIs.
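The split between what the vendor holds and what the lending agent holds can be sketched as a simple lookup. Everything here is hypothetical: the codes, the fund names, the field names, and the `route_charts` helper are invented for illustration, and real systems would use encryption rather than a plain dictionary.

```python
# Hypothetical lookup table held only by the lending agent; the vendor never sees it.
AGENT_CODE_MAP = {
    ("A1B2", "123456789"): "Example Global Equity Fund",
    ("C3D4", "987654321012"): "Example Bond Fund",
}

# What the vendor holds: coded charts with only general peer-group labels.
vendor_charts = [
    {"account_name": "A1B2", "account_path": "123456789",
     "peer_group": ("US", "Equity")},
    {"account_name": "C3D4", "account_path": "987654321012",
     "peer_group": ("UK", "Fixed Income")},
]

def route_charts(charts, code_map):
    """Agent-side step: resolve each coded chart to its actual account."""
    return {code_map[(c["account_name"], c["account_path"])]: c for c in charts}

routed = route_charts(vendor_charts, AGENT_CODE_MAP)
print(sorted(routed))  # ['Example Bond Fund', 'Example Global Equity Fund']
```

Nothing in `vendor_charts` identifies a fund; only the party holding the code map can make the link.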
A full mapping is needed to determine the purpose of the borrow, but that cannot be done with existing databases. The private and regulatory databases rely on loan reports that terminate at the prime broker, not the hedge fund. Without a connection to the true demand source, i.e., the trading desk, it is impossible to determine the purpose, and therefore the propriety, of the loan. The fluidity of the current market infrastructure also makes the degree to which securities lending activity relates to short selling fluctuate unpredictably, which matters especially when attempting to forecast published short interest.
IHS Markit has published a paper on the January 2021 short squeeze that candidly explains the analytic problem as twofold. First, because securities purchasers can on-lend their newly acquired positions, more than 100% of the share float can be on loan at any one time. Second, because prime brokers can use internal resources, such as authorized hedge fund or even retail long positions, loans to hedge fund short sellers do not always correspond closely to the prime brokers' borrows from the lenders' agents.
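The first point is simple arithmetic, sketched below in Python. The float size, the re-lending fraction, the number of rounds, and the assumption that the entire float is lent in the first round are all hypothetical and chosen only to make the effect visible.

```python
def shares_on_loan(float_shares, relend_fraction, rounds):
    """Total shares on loan when purchasers on-lend part of what they buy.

    Round 1: holders lend the whole float and the borrowers sell it short.
    Each later round: the buyers lend out `relend_fraction` of what they
    bought, and those shares are sold short again.
    """
    on_loan = 0.0
    lendable = float(float_shares)
    for _ in range(rounds):
        on_loan += lendable          # these shares go out on loan
        lendable *= relend_fraction  # the buyers on-lend only a fraction
    return on_loan

float_shares = 1_000_000
total = shares_on_loan(float_shares, relend_fraction=0.5, rounds=3)
print(f"{total / float_shares:.0%} of float on loan")  # 175% of float on loan
```

No share has been created: the same shares are simply counted once for each open loan they pass through.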
The paper itself cautions that any suggestion of change from the prior short interest has the potential to introduce error, so a substantial recognition of changes in shares on loan should only be done when the two series are highly correlated, grading slowly toward a very limited reliance on equity finance data where there is a low expectation for forecasting success. In this view, the forecast performed as expected with the inputs available. It would have been possible for the Jan 29 short interest to print at 50m shares, which would have been interpreted as a substantial uptick in dealer inventory, likely the result of an increase in hedge fund longs (possibly also some index related Delta-1 longs). Given the events which unfolded over the last week of January, along with the decline in shares on loan, that may have been deemed unlikely, but it is important not to discount as a possibility when considering the model output.
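One way to read this guidance is as a correlation-weighted adjustment to the prior short interest. The sketch below is an assumption for illustration only, not the model described in the paper: the weighting rule (the Pearson correlation clipped to the range 0 to 1), the function names, and all figures are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def forecast_short_interest(prior_si, loan_change, si_hist, loan_hist):
    """Adjust the prior short interest by the change in shares on loan,
    scaled by how closely the two series have tracked each other."""
    rho = pearson(si_hist, loan_hist)
    weight = max(0.0, min(1.0, rho))  # assumed rule: no reliance if rho <= 0
    return prior_si + weight * loan_change, weight

# Hypothetical histories (millions of shares).
si_hist = [10, 12, 14, 16]
tracking_loans = [20, 24, 28, 32]   # moves in step with short interest
diverging_loans = [32, 28, 24, 20]  # moves against it

print(forecast_short_interest(16, -4, si_hist, tracking_loans))
print(forecast_short_interest(16, -4, si_hist, diverging_loans))
```

When the series track each other, the change in shares on loan passes through to the forecast; when they diverge, the forecast stays at the prior short interest.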
In the Markit paper, the tracking error between the short interest published by the exchanges and the loan interest reported by agents is shown to reach 20% or more. Using existing databases for anything other than relative agent performance therefore introduces a significant error factor.
The supply of shares from beneficial owners in securities lending programs can be tracked as a real-time indication of availability from institutional owners of shares, while the gap between the exchange short interest and borrowed shares provides an indication of shares sourced by broker dealers away from the traditional securities finance channel.
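The gap described here is a subtraction, shown below as a short Python sketch. The function name and the share counts are hypothetical, and expressing the gap as a fraction of exchange short interest is one possible convention rather than the paper's definition.

```python
def off_channel_gap(short_interest, shares_on_loan):
    """Gap between exchange short interest and agent-reported borrowed shares.

    A positive gap suggests shares sourced by broker-dealers away from the
    traditional securities finance channel; a negative gap means more shares
    are on loan than are reported short.
    """
    gap = short_interest - shares_on_loan
    relative = gap / short_interest
    return gap, relative

# Hypothetical figures.
gap, relative = off_channel_gap(short_interest=10_000_000,
                                shares_on_loan=8_000_000)
print(gap, f"{relative:.0%}")  # 2000000 20%
```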
1. Sam Pierson, "Short squeeze by the numbers," IHS Markit, 12 February 2021, at https://ihsmarkit.com/research-analysis/short-squeeze-by-the-numbers.html