If you are a shareholder of various assets, chances are that you have access to more information than you can possibly process. There is no shortage of high-quality, relevant news pieces offering different perspectives on assets. While this text data is useful for some active traders, its use is limited for the bulk of smaller investors. The sheer volume of articles published every day can seem exhausting for a non-professional investor owning a few stocks and funds. However, the information in the news and analyses can still be valuable. Perhaps this data could be summarized in a way that gives higher value to the end-user?

This post suggests a way of delivering news in a concise and intuitive manner. By applying natural language processing to news related to the assets in a portfolio, an average sentiment score over all news pieces can be calculated. Rather than presenting a long list of news headlines alongside your portfolio summary, the information can be neatly compressed into an indicator conveying the general mood of the current news landscape around your assets.
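As a toy sketch of the aggregation step (not the actual implementation, and assuming each article has already been assigned a sentiment label by some NLP model), the per-article labels could be mapped to numbers and averaged into a single portfolio-level score:

```python
# Hypothetical mapping from classifier labels to numeric scores.
SENTIMENT_VALUES = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}

def average_sentiment(labels):
    """Average the numeric sentiment of all news pieces into one score in [-1, 1]."""
    if not labels:
        return 0.0  # no news: treat the overall mood as neutral
    return sum(SENTIMENT_VALUES[label] for label in labels) / len(labels)

# Example labels, as they might come from an NLP sentiment classifier.
labels = ["positive", "neutral", "negative", "positive", "positive"]
score = average_sentiment(labels)  # (1 + 0 - 1 + 1 + 1) / 5 = 0.4
```

A score near 1 would indicate a broadly positive news landscape for the portfolio, a score near -1 a broadly negative one.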

Natural language processing (NLP) is concerned with modelling human language in a variety of ways. It is used to transform spoken words into written text, translate between languages, answer questions and even generate synthetic text. The last few years have seen a leap in the performance of NLP models on relevant benchmark tasks. This progress is often compared to the improvement of image recognition technology between 2010 and 2015, when performance on benchmark tasks surpassed human ability. We are now seeing the commercial impact of that leap in, for instance, facial recognition, self-driving cars and improved medical diagnosis.

One of the keys to this shift in both computer vision and NLP is transfer learning. Transfer learning is the concept of training a model on a large, general dataset and then fine-tuning it for a specific task where data is more scarce. For NLP models, this is quite intuitive. Current state-of-the-art models such as GPT-3 and variants of BERT are trained on huge datasets of unlabelled text. The model extracts a generic view of how language is structured and which pieces are important. In 2018, researchers trained a BERT model on 3.3 billion words from 11,000 books and a dump of the English Wikipedia. After training, the model set new state-of-the-art scores on a variety of NLP tasks, despite not being specifically tweaked for those tasks. A similar process is used for GPT-3, which after training can synthesize quite astonishing pieces of text. So how does this field translate into finance? And how can it be used to serve the customers of financial institutions with relevant solutions?
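The transfer-learning pattern itself can be illustrated with a deliberately tiny, made-up example (nothing like a real BERT or GPT-3 pipeline): a generic text representation is "pretrained" on unlabelled text and then reused, frozen, by a small classifier fitted on a handful of labelled examples.

```python
from collections import Counter

# Step 1: "pretraining" -- learn a vocabulary from unlabelled text. This is
# a stand-in for the generic language knowledge a real model extracts from
# billions of words; the corpus here is invented for illustration.
unlabelled_corpus = [
    "stocks rally as earnings beat expectations",
    "bond yields fall on weak economic data",
    "markets slide after disappointing earnings",
]
vocab = sorted({word for text in unlabelled_corpus for word in text.split()})

def featurize(text):
    """Represent a text with the pretrained vocabulary (bag of words)."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

# Step 2: "fine-tuning" -- fit a minimal head (one centroid per class over
# the frozen features) on a tiny labelled dataset.
labelled = [("earnings beat expectations", "positive"),
            ("disappointing earnings", "negative")]
centroids = {}
for text, label in labelled:
    acc = centroids.setdefault(label, [0.0] * len(vocab))
    for i, value in enumerate(featurize(text)):
        acc[i] += value

def classify(text):
    """Assign the class whose centroid is nearest in feature space."""
    feats = featurize(text)
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

The point of the sketch is only the division of labour: the expensive, general step happens once on unlabelled data, and the task-specific step needs very little labelled data.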

A study conducted by Kidbrooke and Lund University explores the validity of using NLP models on financial text. The input data consists of 114,000 daily financial news pieces regarding the US market, spanning seven years. This data is used to predict whether the S&P 500 and US treasury rates will rise or fall the following day, given that day's news. Furthermore, the study examined whether the NLP model could predict if a traditional time-series estimate of the future price would be too high or too low.

The project concludes that using financial news alone to predict the movement of the S&P 500 scores well above a random baseline, i.e. better than flipping a coin. This suggests that methods for natural language processing are able to extract relevant information from a financial context. The study followed a standard machine learning process to ensure that the results generalize. The data available for fitting the model is split into two parts: training and test data. The model is trained on the training set and then evaluated on the test data. This ensures that the test score reflects performance on out-of-sample data, i.e. data the model has not encountered during fitting.
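The evaluation protocol described above can be sketched in a few lines; the dataset and "model" below are placeholders, not the study's actual data or classifier:

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out the last test_fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(features) == label for features, label in data) / len(data)

# Placeholder dataset of (features, up/down label) pairs.
dataset = [((i,), "up" if i % 2 == 0 else "down") for i in range(100)]
train, test = train_test_split(dataset)

# A trivial baseline "model", fitted on the training set only: always
# predict the majority class seen during training.
majority = max(("up", "down"), key=lambda c: sum(label == c for _, label in train))
baseline = lambda features: majority

# The score on held-out data is the out-of-sample performance a real NLP
# model would need to beat to count as better than chance.
out_of_sample = accuracy(baseline, test)
```

Because the test set is never touched during fitting, its score is an honest estimate of how the model would fare on genuinely new news.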

Knowing that these NLP models are capable of retrieving relevant financial information from text, we can explore a use case for a customer owning a number of assets. News feeds customized for a customer's portfolio already exist, but it is probably more useful to get a condensed view of the sentiment of the news through an intuitive metric than to skim through 30 news pieces. Imagine a range of moods describing the essence of the articles: an NLP model can classify each news piece on a scale from negative through neutral to positive. The fraction of news pieces in each category then becomes a useful, easy-to-digest indicator of how the news landscape currently portrays your assets.
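Computing that indicator is straightforward once each article carries a label. A minimal sketch, assuming the labels have already been produced by a sentiment classifier:

```python
from collections import Counter

def mood_indicator(labels):
    """Return each sentiment category's share of the classified news pieces."""
    counts = Counter(labels)
    total = len(labels)
    return {category: counts.get(category, 0) / total
            for category in ("negative", "neutral", "positive")}

# Example: 30 classified articles about a portfolio's assets.
labels = ["positive"] * 18 + ["neutral"] * 9 + ["negative"] * 3
indicator = mood_indicator(labels)
# e.g. 60% positive, 30% neutral, 10% negative
```

A breakdown like "60% positive, 30% neutral, 10% negative" is digested in a glance, whereas the 30 underlying headlines are not.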

While these automated summaries make no claim of deep, independent analysis, they do serve the purpose of packaging information in a way that engages the end-user. This functionality expands the measures available to the customer for swift and succinct analysis. If the overall tone of the news about your assets is negative, it might prompt you to investigate further. Using technological progress to increase value for the customer isn’t always trivial, and the complexity it may imply for the customer must be respected. This solution lowers the threshold for users to engage with their finances and adds another dimension to the analytical toolbox, without added complexity.