Knowledge-Driven Stock Trend Prediction and Explanation via Temporal Convolutional Network

Deep neural networks have achieved promising results in stock trend prediction. However, most of these models have two common drawbacks: (i) current methods are not sensitive enough to abrupt changes of stock trend, and (ii) forecasting results are not interpretable for humans. To address these two problems, we propose a novel Knowledge-Driven Temporal Convolutional Network (KDTCN) for stock trend prediction and explanation. Firstly, we extract structured events from financial news, and utilize external knowledge from a knowledge graph to obtain event embeddings. Then, we combine event embeddings and price values together to forecast the stock trend. We evaluate prediction accuracy to show how knowledge-driven events work on abrupt changes. We also visualize the effect of events and the linkage among events based on the knowledge graph, to explain why knowledge-driven events are common sources of abrupt changes. Experiments demonstrate that KDTCN can (i) react to abrupt changes much faster and outperform state-of-the-art methods on stock datasets, and (ii) facilitate the explanation of predictions, particularly with abrupt changes.


INTRODUCTION
Stock trend prediction has been widely studied due to its scientific and economic merits, and recent efforts mostly focus on exploring the potential of deep neural network models. Although such methods [22,26,44] perform well in many tasks, they show weakness in capturing unexpected abrupt changes, and fall short of giving explanations for prediction results.

Stock trend prediction with abrupt changes. In stock trend prediction, abrupt changes mean that stock prices fluctuate sharply in an extremely short time interval [7,16,19,28,43]. For example, as shown in Figure 1, the DJIA (Dow Jones Industrial Average) index increased by 1.29% on 23rd June, 2016, but fell sharply by 3.39% on the next day. For the sake of adapting to abrupt trend changes, [26] have proposed TreNet to learn both local and global numerical features from stock prices. However, it can be insufficient to merely utilize price data. [17] has shown that stock prices only reflect all known information, and price movements are in response to news or events. For instance, in Figure 1, between 23rd June and 24th June, the unexpected event of the British referendum happened. In order to encode events for stock trend prediction, [11] have demonstrated the validity of deep learning methods for event-driven stock market prediction, through event-embedding-based news representations. Although news events help people capture abrupt changes of stock trend swiftly, they are often disordered and sparse. To address this problem, we import exogenous knowledge to represent events. Knowledge, coming from knowledge graphs (KGs), has two major advantages: (i) the enriched semantic information in knowledge helps to establish associations among discrete events, and (ii) knowledge in a KG is structured and easy to parameterize.
Deep prediction models lack explanations. Machine learning (ML) explanation, such as interpreting prediction models or justifying prediction results, can significantly increase decision makers' confidence in predictions and boost their application [4,33]. Even if deep prediction models successfully detect abrupt changes in the stock market, it is hard for them to make people without ML expertise understand why these changes happen. For example, in event-driven stock trend prediction, people may be concerned with which events have greatly influenced stock fluctuations, and how these events take effect. To tackle this problem, we encode interpretable knowledge in deep prediction models, making predictions explainable.
Above all, in this paper, we propose a novel Knowledge-Driven Temporal Convolutional Network (KDTCN), incorporating background knowledge, news events, and price data into a deep prediction model, to tackle the problem of stock trend prediction and explanation with abrupt changes. We choose the Temporal Convolutional Network (TCN) [3] because it outperforms canonical RNNs such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. The intuition is that many events are responsible for abrupt changes in the stock market, and the correlation analysis between events and changes offers explanations.
Specifically, to address the problem of prediction with abrupt changes, we extract events from financial news and structurize them into event tuples, e.g., "Britain exiting from EU" is represented as (Britain, exiting from, EU). Then entities and relations in event tuples are linked to KGs, such as Freebase [5] and Wikidata [39]. Secondly, we vectorize structured knowledge, textual news, and price values respectively, and then concatenate them together. Finally, we feed these embeddings into a TCN-based model. Experiments demonstrate that KDTCN can react to abrupt changes in the stock market more swiftly than state-of-the-art methods. Furthermore, based on prediction results with abrupt changes, we address the problem of making explanations. We visualize the effect of events, and also present the linkage among events with the use of the KG. By doing so, we explain (i) how knowledge-driven events influence stock market fluctuations at different levels, and (ii) how knowledge helps to associate events with abrupt changes in stock trend prediction.
To the best of our knowledge, KDTCN is the first model to utilize a Temporal Convolutional Network for stock trend prediction, integrating a structured knowledge graph, textual news, and time-series price values. Additionally, KDTCN is capable of explaining prediction results, particularly with abrupt changes. The next section reviews related work on stock prediction and ML explanation. Section 3 presents the architecture of KDTCN and introduces the model in detail. Section 4 presents the experiments and evaluation. Section 5 concludes the paper and discusses future work.

RELATED WORK

Deep Models for Stock Prediction
Traditional models of stock prediction are mostly based on sequence modeling with sequential data input. Traditional Recurrent Neural Networks (RNNs) [14,35,42] are powerful in discovering the dependencies of sequence data, but suffer from vanishing gradients and thus have difficulty capturing long-term dependencies. Long Short-Term Memory (LSTM) [21] overcomes this limitation. There already exist some RNN-based stock prediction models. [26] have proposed TreNet, a novel end-to-end hybrid neural network, to learn local and global contextual features for predicting stock trend. A dual-stage attention-based RNN (DA-RNN) [31] is able to capture long-term temporal dependencies appropriately with an attention mechanism. These ML methods with time-series value inputs are adapted to a relatively stable stock market, but they have difficulty reacting swiftly to abrupt changes in the stock market.
Besides value-based models [13,46], there are also methods using texts. [22] have proposed hybrid attention networks to predict stock trend based on sequences of recent news. [10,11] have extracted events from news and demonstrated that deep learning is useful for event-driven stock movement prediction. Although these models convert unstructured text into structured events, they only utilize texts, which limits performance. [1,44] use both numerical and textual data. [1] have modeled temporal effects of past events on opening prices with LSTM. [44] have presented a deep generative model to predict stock movement from tweets and historical stock prices. However, these two models are unable to represent chaotic social text effectively, as they only utilize bag-of-words or word embeddings without capturing structured relations.

Knowledge-driven Models and Explanation
Although substantial efforts have been made for stock prediction, most of them only learn features from numerical and textual data, while ignoring background knowledge. [9] have demonstrated that incorporating knowledge can help capture the inconsistent evolution of streaming data, thereby making predictions more accurate. [12] have proposed to incorporate KGs into the learning process of event embeddings, which can encode valuable background knowledge. Besides, knowledge-driven ML models also show their strengths in other broad domains, such as recommender systems. [23] have proposed a novel knowledge-enhanced sequential recommender, integrating RNN-based networks with a Key-Value Memory Network (KV-MN). [41] have proposed a deep knowledge-aware network (DKN), incorporating a KG into news recommendation.
In these knowledge-driven models, eXplainable AI (XAI) is very important. In the ML literature, work on explanation often focuses on visualizations of predictions. Beyond that, research focuses on two broad approaches to explanation [4]. The first is prediction justification, where a (usually non-interpretable) model [8,18,38] and a prediction are given, and a justification for the prediction must be produced. The second is interpretable models, aiming to devise models that are intrinsically interpretable and can be explained by reasoning. In this paper, we focus on the first one. There are many methods producing justifications, and they focus on interpreting predictions of specific complex models, often by isolating the contributions of individual features. [33] have proposed to explain models by presenting representative individual predictions and their explanations in a non-redundant way, explaining the predictions of a classifier by learning an interpretable model locally.

METHODOLOGY
In this section, we first present the overview of the proposed knowledge-driven temporal convolutional network (KDTCN) for stock trend forecasting and explanation. Then we introduce each model component in detail.

Model Overview
The overview of the KDTCN architecture is shown in Figure 2. The original model inputs are price values $X$, a news corpus $N$, and a knowledge graph $\mathcal{G}$. The price values are normalized and mapped into the price vector sequence, denoted by $P = [p_1, p_2, \dots, p_T]$, where each vector $p_t$ represents the real-time price vector on stock trading day $t$, and $T$ is the time span.
In the news corpus, pieces of news are represented as event sets $E$, and are structurized by open-domain information extraction (Open IE [15]) leveraging linguistic structure. Each event is structurized into an event tuple $e = (s, p, o)$, where $p$ is the action or predicate, $s$ is the actor or subject, and $o$ is the object on which the action is performed. Then, each item in an event tuple is linked to the KG. Note that event items in this paper refer to the $s$, $p$, and $o$ in the event tuple $(s, p, o)$, and they also correspond to entities and relations in the KG. We obtain event embeddings $V$ by training on both event tuples and KG triples. Finally, the event embeddings, combined with the price vectors, are input into a TCN-based model [3] for stock trend prediction and explanation, particularly with abrupt changes.
Our purpose is to forecast the movement of the target stock index trend $y$ with abrupt changes. We predict the binary movement, in which 1 denotes rise and 0 denotes drop, defined by

$$y_t = \begin{cases} 1, & x_{t+1} > x_t, \\ 0, & \text{otherwise}, \end{cases}$$

where $x_t$ denotes the stock price value on stock trading day $t$.
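To make the label construction concrete, the following is a minimal Python sketch of the binary movement labels defined above; function and variable names are ours, not from the authors' code.

```python
import numpy as np

def trend_labels(prices):
    """Binary trend labels: y_t = 1 if x_{t+1} > x_t, else 0.

    `prices` is a 1-D array of daily price values x_1..x_T; the last
    day has no successor, so T-1 labels are produced.
    """
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] > prices[:-1]).astype(int)

# Toy example: rise, drop, rise -> [1, 0, 1]
print(trend_labels([17800.0, 18011.1, 17400.8, 17409.7]))
```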

Event Embedding
The goal of event embedding is to learn low-dimensional dense vector representations for event tuples $e = (s, p, o)$. We first extract structured event tuples from financial news, and then link them to the KG. An event tuple embedding is calculated by multi-channel concatenation of the KG embeddings and word vectors of each item.

Event Extraction and Structuralization.
We convert unstructured news texts into structured event tuples with Open IE [15], the goal of which is to read a sentence and extract tuples consisting of a relation phrase and the arguments related by that phrase. Originally, Open IE extracts binary tuples [29], i.e., two arguments connected by one relation phrase. E.g., "Britain exiting from the EU", with the subject-predicate-object structure, is structured into an event tuple $(s = Britain,\ p = exiting\ from,\ o = EU)$.
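For illustration, event tuples can be represented with a simple (s, p, o) record; this toy structure is ours, while a real pipeline would obtain the tuples from an Open IE system.

```python
from collections import namedtuple

# An event tuple follows the subject-predicate-object structure
# produced by Open IE.
Event = namedtuple("Event", ["s", "p", "o"])

events = [
    Event("Britain", "exiting from", "EU"),
    Event("British Pound", "drops", "nearly 5%"),
]
for e in events:
    print(f"({e.s}, {e.p}, {e.o})")
```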
After event extraction based on Open IE, there can be much redundancy in the generated event tuples. Thus we also remove useless words (e.g., adjectives and adverbs) from fundamental sentence structures, to ensure that the event tuples $E$ are concise enough.

Entity Linking and Extension.
After getting concise event tuples, we construct a sub-graph from the KG by utilizing entity linking [36], in order to disambiguate named entities in texts by associating them with predefined entities in the KG. Note that the subject $s$, predicate $p$, and object $o$ in an event tuple may not always have linkage in the KG. Besides, the information in a single event tuple may be sparse and lack diversity. Thus, we enrich the sub-graph by importing the immediate neighbors of linked entities within one hop in the KG. To formalize these concepts, we use $linking(e)$ and $linking(r)$ to denote the entities and relations of an event tuple linked to the KG, and $context(e)$ to denote the immediate neighbors of linked entities in the KG, i.e., $context(e) = \{e' \mid (e_l, r, e') \in \mathcal{G} \text{ or } (e', r, e_l) \in \mathcal{G},\ e_l \in linking(e)\}$.
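A minimal sketch of this sub-graph enrichment is given below, assuming the entity-linking step is already done and exposed as a surface-form-to-entity map; the helper names are hypothetical, not from the authors' code.

```python
def link_and_extend(event, kg_triples, entity_index):
    """Link an event tuple's items to KG entities and collect the
    one-hop neighbours of the linked entities.

    `kg_triples` is an iterable of (head, relation, tail) strings and
    `entity_index` maps surface forms to canonical KG entities (the
    output of entity linking). Returns (linking(e), context(e)).
    """
    linked = {entity_index[x] for x in (event.s, event.o) if x in entity_index}
    context = set()
    for h, r, t in kg_triples:
        if h in linked:
            context.add(t)   # outgoing one-hop neighbour
        if t in linked:
            context.add(h)   # incoming one-hop neighbour
    return linked, context - linked
```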

Knowledge-driven Multi-channel Concatenation.
We choose TransE [6] as the KG embedding method in this paper, as it generally preserves the structural information in a KG with great robustness. As mentioned before, not all entities and relations in event tuples can be linked to the KG; in these situations, zero padding [40] is applied. Besides, the context of each linked entity tends to contain more than one entity and relation, so the context embedding is calculated by averaging. We then parameterize the event representations in different channels, denoted by $V_l$ in the channel of KG linking, $V_c$ in the channel of KG context, and $V_w$ in the channel of words:

$$V_l = [V_{e^s_l}; V_{r^p_l}; V_{e^o_l}], \qquad V_c = [V_{e^s_c}; V_{r^p_c}; V_{e^o_c}], \qquad V_w = [V_{e^s_w}; V_{r^p_w}; V_{e^o_w}],$$
where $e^s_l, e^o_l \in linking(e)$ and $r^p_l \in linking(r)$; $e^s_c, e^o_c \in context(e)$ and $r^p_c \in \mathcal{G}$; $V_{e^s_w}$, $V_{r^p_w}$, and $V_{e^o_w}$ are the word vectors of $s$, $p$, and $o$ respectively; $V_*$ represents the embedding of $*$.
Then we concatenate $V_l$, $V_c$, and $V_w$ across channels to get the final event embedding, denoted by $V = [V_l; V_c; V_w]$.
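As a concrete illustration, here is a minimal numpy sketch of the multi-channel concatenation, assuming plain vector concatenation and the 50-dimensional item embeddings used later in the experiments; the variable names are ours.

```python
import numpy as np

def event_embedding(V_l, V_c, V_w):
    """Concatenate the KG-linking, KG-context, and word channels of
    one event tuple into the final event embedding V."""
    return np.concatenate([V_l, V_c, V_w])

dim = 3 * 50                               # (s, p, o) items x 50-d embeddings
V = event_embedding(np.zeros(dim),         # zero padding: no KG linkage found
                    np.random.randn(dim),  # averaged one-hop context
                    np.random.randn(dim))  # word vectors of s, p, o
print(V.shape)                             # (450,)
```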

Temporal Convolutional Network
We follow the TCN architecture presented by [25,32], and note that the basic TCN model we adopt in this paper is based on the generic architecture described by [3]. TCN [3,25,32] uses a 1-D fully-convolutional network (FCN) architecture [27], where each hidden layer has the same length as the input layer, and zero padding [40] is added to keep subsequent layers the same length as previous ones. In this way, the network produces an output of the same length as the input. Besides, TCN uses causal convolutions, in which an output at time $t$ is convolved only with elements at time $t$ and earlier in the previous layer, so no future information leaks into the prediction. In the following, we describe how techniques from modern convolutional architectures are integrated into a TCN, considering both deep networks and long-range dependencies.

Dilated Convolutions.
Formally, for a 1-D sequence input $X \in \mathbb{R}^n$ and a filter $F$, the dilated convolution operation on the $j$-th element of the sequence $X$ is defined as

$$(X *_d F)(j) = \sum_{i=0}^{k-1} F_i \cdot X_{j - i \cdot d}, \qquad (10)$$

where $d$ is the dilation factor, $k$ is the filter size, and the subscript $j - i \cdot d$ accounts for the direction of the past. In fact, dilation can be regarded as introducing a fixed step between every two adjacent filter taps. Each layer consists of a set of dilated convolutions with dilation factor $d$, a non-linear activation $f(\cdot)$, and a residual connection that combines the layer's input with the convolution signal. $d$ increases exponentially over consecutive layers within a block, i.e., $d_l = 2^l$. Convolutions are only applied over two timestamps, $t$ and $t - d$. Specifically, filters can be parameterized by weight matrices $W = [W_0, W_1]$ and a bias vector $b$, where $W_i \in \mathbb{R}^{F_w \times F_w}$, $b \in \mathbb{R}^{F_w}$, and $F_w$ denotes the number of filters. $\tilde{Z}^{(t)}$ and $Z^{(t)}$ denote the results after the dilated convolution and after adding the residual connection at timestamp $t$, respectively:

$$\tilde{Z}^{(t)} = f\big(W_0 Z^{(t-d)} + W_1 Z^{(t)} + b\big), \qquad (11)$$

$$Z^{(t)} \leftarrow Z^{(t)} + V \tilde{Z}^{(t)} + e, \qquad (12)$$

where $V \in \mathbb{R}^{F_w \times F_w}$ denotes the weight matrix and $e \in \mathbb{R}^{F_w}$ denotes the bias vector of the residual block.
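To make Equation (10) concrete, the following is a minimal numpy sketch of a 1-D dilated causal convolution; the filter values and toy sequence are illustrative only.

```python
import numpy as np

def dilated_causal_conv(X, f, d):
    """1-D dilated causal convolution: out_j = sum_i f_i * X_{j - i*d}.

    Positions before the start of the sequence are treated as zeros,
    so the output has the same length as the input.
    """
    k, n = len(f), len(X)
    out = np.zeros(n)
    for j in range(n):
        out[j] = sum(f[i] * X[j - i * d] for i in range(k) if j - i * d >= 0)
    return out

X = np.arange(8, dtype=float)
print(dilated_causal_conv(X, f=[0.5, 0.5], d=2))  # averages X_j and X_{j-2}
```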

Residual Connections.
[20] has presented that a residual learning framework can ease network training, indicating that residual blocks benefit very deep networks. Following [3,32], we define a series of residual blocks, each of which contains a series of $L$ convolutional layers. The activations in the $l$-th layer and $j$-th block are $Z^{(l,j)} \in \mathbb{R}^{F_w \times T}$, where $T$ is the time span and $F_w$ is the same at each layer. The calculation of $Z^{(l,j)}$ follows Equation (12). Within a residual block, the TCN has two layers of dilated causal convolution and non-linearity, for which we use ReLU [30]. For normalization, we apply weight normalization to the convolutional filters. In addition, spatial dropout [37] is added after each dilated convolution for regularization: at each training step, a whole channel is zeroed out.
However, whereas in standard ResNet [20] the input is added directly to the output of the residual function, in a TCN (and ConvNets in general) the input and output can have different widths. To account for discrepant input-output widths, we use an additional 1 × 1 convolution to ensure that the element-wise addition ⊕ receives tensors of the same shape.
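Below is a sketch of such a residual block in PyTorch, following the generic TCN design of [3]: two weight-normalized dilated causal convolutions with ReLU and channel-wise dropout, plus a 1 × 1 convolution on the skip path when widths differ. The class name and defaults are ours, not the authors' released code, and nn.Dropout1d assumes a recent PyTorch version.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """TCN residual block (assumes kernel_size >= 2)."""

    def __init__(self, c_in, c_out, kernel_size=2, dilation=1, dropout=0.5):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # causal left padding
        def conv(ci):
            return nn.utils.weight_norm(
                nn.Conv1d(ci, c_out, kernel_size,
                          padding=self.pad, dilation=dilation))
        self.conv1, self.conv2 = conv(c_in), conv(c_out)
        self.drop = nn.Dropout1d(dropout)         # zeroes whole channels
        self.downsample = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else None

    def forward(self, x):                         # x: (batch, channels, time)
        out = x
        for c in (self.conv1, self.conv2):
            out = c(out)[:, :, :-self.pad]        # chop off future positions
            out = self.drop(torch.relu(out))
        skip = x if self.downsample is None else self.downsample(x)
        return torch.relu(out + skip)             # element-wise addition ⊕

block = ResidualBlock(c_in=1, c_out=8, kernel_size=2, dilation=2)
print(block(torch.randn(4, 1, 30)).shape)         # torch.Size([4, 8, 30])
```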

Interpretation of Event Effects
We adopt a well-explored concept, effect, in the prediction, meaning a feature's contribution towards or against the predicted class [34,45]. In this paper, stock trend prediction can be regarded as a binary classification problem. For simplicity, the discriminant function for a data instance (event tuple $e$) in the binary classifier is

$$g_y(e) = \sum_i \theta^{(y)}_i e_i, \qquad (13)$$

where $e_i$ denotes the instance value of the $i$-th event, and the weight coefficients $\theta^{(y)}_i$ are learned from training data for each class $y$. The classifier then predicts the class of the event instance as the one that maximizes the predictor function, through a monotonic non-linear distortion function $\varphi$:

$$\hat{y} = \arg\max_y \varphi\big(g_y(e)\big). \qquad (14)$$

So the effect of the $i$-th event towards or against predicting class $y$ for a data instance can be denoted as

$$\mathit{effect}_i(y) = \theta^{(y)}_i e_i. \qquad (15)$$
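A minimal sketch of the effect computation, assuming the linear discriminant of Equation (13); the toy weights and instance values are illustrative.

```python
import numpy as np

def event_effects(e, theta_y):
    """Per-event effects for class y: effect_i = theta_i * e_i.

    Positive entries push the prediction towards class y, negative
    entries push against it (Equation (15)).
    """
    return np.asarray(theta_y) * np.asarray(e)

# Three events with class-y weights learned from training data
print(event_effects(e=[1.0, 0.4, 0.7], theta_y=[-0.8, 0.3, -0.2]))
```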

EXPERIMENTS
The experiments mainly consist of two parts: (i) prediction evaluation and (ii) case-based explanation for prediction.

Datasets & Baselines & Settings
Datasets. Datasets in this paper are listed below.
• Price values: daily stock price data of the DJIA index.
• News corpus: financial news texts, from which event tuples are extracted.
• Knowledge graph: Freebase [5] and Wikidata [39]. We construct a sub-graph based on them, which contains 64,958 entities and 716 relations in total.

Baselines. We consider commonly-used baseline model variations, as shown in Table 1.

[Table 1: commonly-used baseline model variations, with the raw data and processed training data each model takes; baselines include ARIMA [2].]

Settings. With regard to the training process, a stochastic gradient descent (SGD) [24] optimizer is used, with $k$ kernels and $L$ levels of residual blocks. The best performance is obtained when $k = 2$ and $L = 10$. The dimension of the hidden units is 100, and the dimensions of the word, entity, and relation embeddings are all set to 50. In KDTCN, a dropout rate of 0.5 is used to avoid over-fitting, and the learning rate is $1 \times 10^{-5}$. We split the raw value dataset into a training set and a testing set with a ratio of 0.8 to 0.2. We evaluate prediction performance with two evaluation metrics: (i) Accuracy and (ii) F1 score.
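For reference, the reported hyperparameters can be summarized in one place; this config dict is a convenience of ours, with names that do not come from the authors' code.

```python
config = {
    "optimizer": "SGD",            # stochastic gradient descent [24]
    "kernel_size": 2,              # k = 2
    "levels": 10,                  # L = 10 residual blocks
    "hidden_dim": 100,
    "embedding_dim": 50,           # word, entity, and relation embeddings
    "dropout": 0.5,
    "learning_rate": 1e-5,
    "train_test_split": (0.8, 0.2),
    "metrics": ["accuracy", "f1"],
}
```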

Prediction Evaluation
Performance of KDTCN is shown in three progressive aspects: (i) evaluation of basic TCN architecture, (ii) influence of different model inputs with TCN, and (iii) TCN-based model performance for abrupt changes.

Basic Evaluation for TCN.
In order to demonstrate that the generic TCN architecture can outperform traditional prediction models, we make comparisons with them, as shown in Table 2. Note that all experiments reported in this part take only price values as input.
In Table 2, we observe that TCN greatly outperforms the baseline models on the stock trend prediction task. TCN achieves much better performance than either traditional ML models (such as ARIMA) or deep neural networks (such as LSTM and CNN), indicating that TCN has obvious advantages in sequence modeling and classification problems. Therefore, we choose TCN as our basic prediction model in this paper.

Different Model Inputs with TCN.
For the sake of validating the effectiveness of integrating the knowledge graph, financial news corpus, and price values in stock trend prediction, we compare the prediction performance of models with different inputs, as shown in Table 3.
As seen, WB-TCN and EB-TCN both achieve better performance than TCN, indicating that textual information helps to improve forecasting. Analogously, comparing PVWB-TCN and PVEB-TCN with WB-TCN and EB-TCN shows that price values are also useful in stock trend prediction. KDEB-TCN outperforms these baselines, indicating that structured knowledge contributes greatly to stock trend prediction. Moreover, KDTCN achieves both the highest accuracy and F1 scores, and such a result demonstrates the validity of integrating structured knowledge, financial news, and price values as model input.

Model Performance for Abrupt Changes.
We intend to verify whether knowledge-driven models can effectively capture abrupt changes in stock trend prediction. We first obtain the time intervals of abrupt changes by computing the stock fluctuation degree $D^{(t)}_{fluctuation}$ between two adjacent stock trading days [22], calculated by

$$D^{(t)}_{fluctuation} = \frac{x_t - x_{t-1}}{x_{t-1}},$$

where $x_t$ and $x_{t-1}$ denote the stock price values on stock trading days $t$ and $t-1$ respectively. Then the difference of fluctuation degree $C$ is defined by

$$C_i = D^{(i)}_{fluctuation} - D^{(i-1)}_{fluctuation}.$$

Intuitively, the larger $|C_i|$ is, the more likely the $i$-th day locates in a time interval of abrupt changes. If $|C_i|$ exceeds a certain threshold, it can be considered that the stock price abruptly changes on the $i$-th day. In order to identify a proper range of thresholds, we show the performance based on the data distribution of $|C|$ in Figure 3.
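The following is a sketch of this selection procedure in Python, assuming the relative-change form of the fluctuation degree given above; the threshold and toy prices are illustrative.

```python
import numpy as np

def abrupt_change_days(prices, threshold=0.015):
    """Return indices of trading days whose fluctuation-degree
    difference |C| exceeds a threshold.

    D_t = (x_t - x_{t-1}) / x_{t-1} is the fluctuation degree and
    C = diff(D) its difference between adjacent trading days.
    """
    x = np.asarray(prices, dtype=float)
    D = (x[1:] - x[:-1]) / x[:-1]
    C = np.diff(D)
    return np.where(np.abs(C) > threshold)[0] + 2  # index of day t

# Around the Brexit vote: roughly +1.29% then -3.39% -> |C| ~ 0.047
print(abrupt_change_days([17500.0, 17725.8, 17124.9]))
```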

Seen from Figure 3, overall, higher accuracy is achieved when $|C|$ lies in the interval of 0.015 to 0.036, except for TCN (marked in olive). TCN achieves much worse performance, demonstrating that numerical-data-based stock prediction models may perform poorly on prices with abrupt changes. Besides, KDTCN achieves more stable and better performance than the other baselines, showing the advantages of knowledge-driven models that integrate knowledge, texts, and values. We then calculate the average Accuracy and F1 score of KDTCN and the baselines on the subset with abrupt changes, where $|C|$ lies in the interval of 0.015 to 0.036, as presented in Table 4 (stock trend prediction results over the local DJIA index dataset of abrupt changes, with different model inputs).
We observe that models with knowledge-driven event embedding input, such as KDEB-TCN and KDTCN, greatly outperform numerical-data-based and textual-data-based models. These comparisons indicate that knowledge-driven events have significant impacts on stock trend prediction with abrupt changes, and that knowledge-driven models have advantages in reacting swiftly to abrupt changes in the stock market. Furthermore, KDTCN achieves better performance than KDEB-TCN, demonstrating that integrating price data into knowledge-driven models also benefits stock trend prediction with abrupt changes.

Explanation for Prediction
Explanation in this paper belongs to human-centric justification of ML predictions [4]. We explain to humans without ML expertise why knowledge-driven events are common sources of abrupt changes. The explanations are accomplished in two aspects: (i) visualizing the effects of knowledge-driven events on prediction results with abrupt changes, and (ii) retrieving background facts of knowledge-driven events by linking the events to an external KG.

Effect Visualization of Events.
We calculate the different event effects on stock trend prediction with Equations (14) and (15), and then visualize the results in Figure 4.
The prediction result in Figure 4 is that the trend of the DJIA index will drop. Note that bars of the same colour have the same event effect, the height of the bars reflects the degree of effect, and the event popularity declines from left to right. Intuitively, events with higher popularity should have greater effects on stock trend prediction with abrupt changes, but this is not always the case. As seen, the events of Brexit and EU Referendum both play an important role in forecasting downward trends. Nearly all other events with negative effects are related to these two events, e.g., (British Pound, drops, nearly 5%) and (Northern Ireland, calls for poll on United Ireland). Although some events with high popularity have positive effects, predicting the stock trend to rise, e.g., (Rich, Getting, Richer), the total effect is negative. Therefore, abrupt changes of stock index fluctuation can be viewed as the combined result of the effects and popularity of events.

Visualization of Event Tuples Linked to KG.
We present sample KG triples linked to event tuples in Figure 5. First, we search for the event tuples with great effects or high popularity in stock trend movements. Then, we backtrack to the news texts containing these events. Finally, we retrieve the associated KG triples linked to the event tuples by entity linking. In Figure 5, each event tuple is marked in blue, and the entities in it are linked to the KG. We also mark DJIA in red. As seen, knowledge helps to associate events with abrupt changes in stock trend prediction. The listed event tuples, such as (Britain, exiting from, EU), (United Kingdom, votes to leave, European Union), (British Pound, drops, nearly 5%), (J. K. Rowling, leads the charge for, Scottish independence), and (Northern Ireland, calls for poll on United Ireland), are not strongly related literally. However, with the linkage to the KG, they establish associations with each other, and are strongly related to the events of Brexit and EU Referendum. Besides, these knowledge-enhanced events also have connections with DJIA; thus originally sparse events can be closely linked and have combined effects on the DJIA index. Hence, the examples in Figure 5 explain how knowledge-driven events work on stock movements, and why knowledge-driven models are valid. Moreover, by incorporating the explanations of event effects, we justify that knowledge-driven events are common sources of abrupt changes.

CONCLUSIONS AND FUTURE WORK
In this paper, we propose a novel knowledge-driven temporal convolutional network (KDTCN) to tackle the problem of stock trend prediction and explanation with abrupt changes. We extract structured event tuples from financial news, and utilize background knowledge from a KG to associate discrete event tuples with each other. By training on both event tuples and KG triples, we obtain knowledge-driven event embeddings. Furthermore, we integrate price vectors and event embeddings as prediction model inputs by multi-channel concatenation. We utilize TCN to predict the stock trend, and also explain prediction results based on knowledge. Experiments on stock datasets demonstrate that integrating structured knowledge into TCN can (i) greatly outperform existing deep models when forecasting stock trend with abrupt changes, and (ii) support explanations of prediction results with abrupt changes. Through event effect visualization and knowledge-enhanced event tuple visualization, we explain how knowledge greatly influences stock trend with abrupt changes.
Based on the research in this paper, we have identified several potential directions for this work, mainly including a more general evaluation of different event effects on stock trend, a study of long-range dependencies of events, and more specific experiments on prediction explanations. On a given stock trading day, there are various events influencing stock movements. We will figure out their different effects and categorize events based on effects, for example, which type of events affects the stock trend to a great extent. Besides, the effect of an event may change as time goes by, thus capturing the dynamic effects of events is valuable. Furthermore, in this paper, we only give a case-based experiment on how knowledge helps to establish associations among sparse events. In the future, we will propose quantitative indices to evaluate the effectiveness of knowledge, and give more specific explanations.