Leading Edge R&D: LEVERAGING UNSTRUCTURED DATA IN INVESTMENT MANAGEMENT
Advances in text mining technology and emerging XML standards in finance are revolutionizing the ability to extract value from unstructured and semi-structured data and will allow financial firms to structure their business processes to an increasing degree. This is the conclusion of the new management report "Leveraging Unstructured Data in Investment Management" from The Intertek Group. The report looks at text mining technology and applications at buy- and sell-side financial institutions.
Two unfolding developments are behind the report's conclusions. First, basic text mining functionality is now mature, with advanced functionality such as summarization and visualization adding powerful analytical capability. Second, XML standards in finance are well on the way to being defined. These standards will stipulate how the entire universe of financial information, from time series to analyst and corporate reports and news, will be described. This will take us from free text search to the ability to perform powerful queries on semi-structured data.
"In today's difficult business environment, with the number of analysts on the investment side dropping and the number of firms being researched likewise dropping, these developments hold the promise of allowing market participants to use more sources of information efficiently," said Sergio Focardi, co-author of the report. "We also expect to see much more in the way of combining text and data mining in applications such as fundamental research, event analysis, compliance and CRM," Mr. Focardi added. "There are already implementations at financial firms and recent moves by data mining software suppliers indicate that the major players are or will shortly be present."
The report discusses the functionality that allows users to access and analyze huge amounts of textual data and looks at how the technology is being employed at buy- and sell-side institutions.
It reviews the current state of XML standards in finance and examines how these standards will make searches more powerful and facilitate combining data and text mining operations in applications such as the analysis of the effect of events on stock prices and trading volumes. Implications for the IT infrastructure (the database, data model and query languages) are discussed. The last section of the report looks at factors affecting the take-up of the technology and the structuring of the supply side. According to the authors of the report, the challenge to suppliers is threefold:
The report "Leveraging Unstructured Data in Investment Management" is based on conversations with more than 50 people from buy- and sell-side financial institutions, technology vendors, content and ASP providers, representatives from industry consortia working on establishing XML standards, and researchers from academia and industry.
About The Intertek Group
The Intertek Group is a Paris-based firm that provides research, consulting and training on advanced IT and modeling techniques in the financial services sector and industry at large. It counts among its clients major financial institutions, industry associations and technology vendors.
Title of the report: "Leveraging Unstructured Data in Investment Management"
Authors: Sergio Focardi and Caroline Jonas
Number of pages: 65
Available: May 2002
Price: Euro 195.-
The management summary and table of contents as well as the order form are available on The Intertek Group Web site: www.theintertekgroup.com.
Management Summary: Information Overload?
Information is the raw material and analytics the machinery for the "manufacture" and sale of financial products. Time series analysis has benefited from data mining techniques, now extensively used throughout the industry to engineer products, manage risk and profile clients. Textual information has remained largely outside the domain of automatic handling, but this is now changing. It is a commonplace, but financial firms are confronted with the problem of information overload:
Conversely, there is also a dearth of (processed) information. It has been estimated that only one third of the roughly 10,000 US public companies are covered by meaningful Wall Street research; there are thousands of companies quoted on the US exchanges with no Wall Street research. It is unlikely that the situation is any better for the tens of thousands of firms quoted on other exchanges throughout the world. Yet companies are increasingly providing information, including financial results, on their Web sites, adding to the more than 2 billion pages now on the World Wide Web. Any manual solution to the problem would be costly and ineffective. One widely adopted solution is to simply ignore much of this information, relying on a small number of trusted sources. In the globalized world of finance, this non-solution can prove to be expensive; the technology is now there to help.
A quiet revolution is taking place in the way unstructured information, in particular textual information, is handled. A key aspect of this change: unstructured information is progressively being transformed into self-describing, semi-structured information that can be managed by computers. Technologies that allow computers to "understand" the content of documents are now widely used in basic functions such as free text search and navigation. This basic functionality is already on the desktop -- thanks to Web search engines and industry content providers -- without any need to know the technology or plan for its implementation. But leveraging the technology throughout the firm to gain an information advantage or to further structure and automate processes such as the investment management process will require much more. The components in the revolution in handling unstructured data are:
The emergence of standards for the handling of "meaning" is a major development. It implies that unstructured textual information, which some estimates put at 80% of all content stored in computers, will be largely replaced by semi-structured information ready for machine handling at a semantic level. The eXtensible Markup Language (XML) and the related Resource Description Framework (RDF) are already a reality. Industry- and application-specific standards are being developed around the general-purpose XML. In finance, standards are being defined to stipulate how the entire universe of financial information, from time series to analyst and corporate reports and news, will be described. This will greatly facilitate text mining. The diffusion of XML-based standards and text mining functionality will have consequences for the way the business is managed and, ultimately, for performance:
The challenge to buy- and sell-side firms alike is manifold:
Text-mining technology will have major implications for individual firms and will likely affect relationships within the industry. But it is unlikely that the technology will have a big effect on financial markets themselves. As occurred with the diffusion of data mining techniques, markets might become slightly more efficient; the shape of price processes might change somewhat, but the global behavior of markets will be largely unaffected.
Contact: Caroline Jonas, Partner, The Intertek Group, 94, rue de Javel, F-75015 Paris, +33 1/45 75 51 74, intertekcj@aol.com.