Multidimensional Time Series Analysis of Financial Markets Based on the Complex Network Approach
1. Introduction
Most scholars apply econometric or variance models to analyze time series of financial markets. However, the relationships between various factors are not easy to determine because of the complexity of the financial markets. Therefore, it is difficult to develop models that accurately describe relationships in complex financial systems using traditional time series analysis.
Recently, many scholars have applied methods from complexity science to the analysis of time series and have examined the relationship between the dynamic characteristics of a time series and the topology of the complex network mapped from it. This approach is especially suitable for studying complex systems for which a precise mathematical model cannot be established. By analyzing the time series of each system, these studies have uncovered the internal laws of variation and the evolution mechanisms of various financial markets. However, they are limited to one-dimensional time series and rarely examine the structural features and evolution mechanism of the entire financial market from the viewpoint of multidimensional time series.
In this study, we use price time series and trading volume time series as the objects of our multidimensional analysis. Numerous studies of the relationship between volume and price have found a positive correlation between the rate of return and trading volume volatility.
Here, we study the quantity-price (volume-price) relationship in the stock market using the complex network methodology and develop a method for analyzing multidimensional time series. We use daily closing price and trading volume data for the Shanghai Composite Index, the Shenzhen Component Index, the S & P 500 index, and the Dow Jones Industrial Average to study the relationship between volume and price in different markets, as well as the differences between the securities markets of China and the United States.
2. Related Studies
The relationship between stock price and trading volume has long been a focus of research in this field.
Clark (1973) [1] used the mixture of distributions hypothesis and correlation analysis to conclude that stock returns were positively correlated with trading volume. Starks and Smirlock (1985) [2] used the Granger method and found linear causality between price and trading volume. Bessembinder (1993) [3] studied the conditional volatility of stock prices and trading volume. He divided trading volume into two components, expected trading volume and unexpected trading volume, and found that unexpected trading volume had a larger effect on stock price volatility. Podobnik and Horvatic (2009) [4] studied the relationship between price returns and trading volume volatility and found a power-law relationship between the absolute value of the price return and the absolute value of the trading volume volatility. However, they found no correlation between the signed (non-absolute) values.
Zhang and Run (1998) [5] studied the relationship between price return and trading volume on the Shanghai stock market using the Granger causality method. Their results showed that price has a significant positive correlation with trading volume. Chen and Song (2000) [6] analyzed 31 randomly selected stocks and found that the absolute price change was positively related to trading volume, that daily price fluctuations were positively correlated with trading volume, and that the quantity-price relationship was asymmetric. He and Liu (2005) [7] added trading volume to GARCH and EGARCH models as a proxy for the information flow. Wang et al. (2012) [8] tested the quantity-price relationship in China's stock market using a stochastic volatility model with Bayesian estimation based on Markov chain Monte Carlo (MCMC), and found a positive correlation between trading volume and price fluctuations.
In recent years, various scholars have proposed methods for mapping time series into complex networks. Zhang and Small (2006) [9] proposed a cycle-based (ring-cut) method for mapping pseudo-periodic time series into network models. The advantage of this method is that it does not require phase space reconstruction of the sequence, and therefore avoids the loss of dynamical information caused by an inaccurate choice of the reconstruction dimension. However, the network construction depends on the choice of threshold parameters: thresholds that are too large or too small distort the structure of the resulting network. Yang et al. (2008) [10] proposed a fixed-window-length method for mapping time series into complex networks. This method also depends on threshold parameters; its advantage is that, as the phase space dimension and the correlation coefficient threshold are adjusted, the degree distribution of the network remains unchanged over a wide range of parameter values. Lacasa et al. (2008) [11] proposed another method of constructing complex networks from time series, the visibility graph. The network it constructs is an undirected graph, and the method is limited to univariate time series. Li et al. (2011, 2014) [12] [13] [14] proposed a space-distance method based on phase space reconstruction. By varying the reconstructed phase space dimension m, this method can distinguish chaotic sequences from random ones and thus reveal the dynamic laws contained in time series data. Although it relies on phase space reconstruction, it characterizes a sequence by observing how the topological structure of the mapped network changes as m increases.
3. Method
The method proposed in this paper for mapping time series into complex networks extends the space-distance method of [12] [13] [14].
The mapping algorithm [12] includes a definition of a node, a definition of distance, and a connecting rule.
Algorithm Definition
1) Definition of a node
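A node is defined as a point in an m-dimensional reconstructed phase space. For a time series $\{x_t\}_{t=1}^{N}$, the i-th node in the reconstructed phase space is the vector
$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \quad i = 1, 2, \ldots, k.$$
Here, m denotes the dimension of the embedding space and τ the time delay. The total number of nodes k is
$$k = N - (m-1)\tau.$$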
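The node definition can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the series is a plain list, `tau` is the time delay (τ = 1 uses consecutive samples), and the function name `embed` is hypothetical.

```python
from typing import List, Sequence


def embed(x: Sequence[float], m: int, tau: int = 1) -> List[List[float]]:
    """Hypothetical helper: node i is (x[i], x[i + tau], ..., x[i + (m - 1) * tau])."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    k = len(x) - (m - 1) * tau  # total number of nodes, k = N - (m - 1) * tau
    if k < 1:
        raise ValueError("series is too short for this m and tau")
    return [[x[i + j * tau] for j in range(m)] for i in range(k)]


# N = 6, m = 3, tau = 1 gives k = 4 nodes.
nodes = embed([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], m=3)
print(len(nodes), nodes[0], nodes[-1])  # 4 [1.0, 2.0, 3.0] [4.0, 5.0, 6.0]
```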
2) Definition of distance
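The Euclidean distance between two nodes i and j is given by
$$d_{ij} = \lVert X_i - X_j \rVert = \sqrt{\sum_{l=0}^{m-1} \left(x_{i+l\tau} - x_{j+l\tau}\right)^2}. \qquad (1)$$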
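Applying Eq. (1) to every pair of nodes gives a symmetric distance matrix. A minimal sketch, assuming the nodes are equal-length lists such as those produced by delay embedding; `distance_matrix` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def distance_matrix(nodes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: k x k matrix of Euclidean distances d_ij (Eq. (1))."""
    k = len(nodes)
    d = [[0.0] * k for _ in range(k)]  # the diagonal stays 0 because d_ii = 0
    for i in range(k):
        for j in range(i + 1, k):
            dij = math.sqrt(sum((a - b) ** 2 for a, b in zip(nodes[i], nodes[j])))
            d[i][j] = d[j][i] = dij  # the distance is symmetric
    return d


print(distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]))
# [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
```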
3) Connecting rule
We define the connecting rule as follows. Let
$$D_{\max} = \max_{i,j} d_{ij}$$
denote the maximum phase space distance. A threshold $r$, obtained by dividing $D_{\max}$ into equal parts, is called the judgment distance (or the equipartition of the maximum distance in the phase space). Two nodes i and j are connected only if
$$d_{ij} \le r.$$
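The connecting rule turns the distance matrix into an edge list. The number of equal parts into which the maximum distance is divided is not fixed here, so the sketch exposes it as a hypothetical parameter `n_div` (r = D_max / n_div) and assumes that nodes at distance exactly r are connected; both are assumptions, as is the helper name `connect`.

```python
from typing import List, Sequence, Tuple


def connect(d: Sequence[Sequence[float]], n_div: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: edges (i, j), i < j, with d_ij <= r and r = D_max / n_div.

    n_div (number of equal parts of D_max) is an assumed parameter.
    """
    k = len(d)
    d_max = max(max(row) for row in d)  # maximum phase-space distance D_max
    r = d_max / n_div  # judgment distance
    return [(i, j) for i in range(k) for j in range(i + 1, k) if d[i][j] <= r]


d = [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
print(connect(d, n_div=2))  # r = 5.0 -> [(0, 1), (1, 2)]
```

Raising `n_div` lowers r and thins the network; this is the kind of threshold sensitivity noted for other mapping methods in Section 2.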
However, this method is designed for one-dimensional time series. For multidimensional time series, we therefore propose the following method.
1) First, we define the price change rate, the trading volume change rate, and an index that represents the quantity-price relationship.
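$R$ is the rate of change of prices, namely the rate of return:
$$R(t) = \frac{P(t) - P(t-1)}{P(t-1)},$$
where $P(t)$ is the closing price index on day t.
$V$ is the fluctuation (rate of change) of the trading volume:
$$V(t) = \frac{Q(t) - Q(t-1)}{Q(t-1)},$$
where $Q(t)$ is the trading volume on day t.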
Let $R \cdot V$ be a measure of the relationship between the rate of change of prices and the volatility of the trading volume. Previous studies have found a regular relationship between stock market returns and the fluctuation in trading volume. The product of the rate of return and trading volume volatility can amplify (or dampen) stock market fluctuations; if there is no inherent relationship between the return and the corresponding trading volume volatility, the product series over the observation period is likely to be random and irregular. Thus, we study the randomness of this product series over the whole period to determine the relationship between the rate of return and trading volume volatility. At the same time, multiplying R and V reduces the two-dimensional data to one-dimensional data.
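The two rates of change and their product can be computed as sketched below. This assumes simple (not logarithmic) rates of change and uses hypothetical price and volume values purely for illustration; `rate_of_change` is a hypothetical helper name.

```python
from typing import List, Sequence


def rate_of_change(x: Sequence[float]) -> List[float]:
    """Hypothetical helper: simple rate of change (x[t] - x[t-1]) / x[t-1]."""
    return [(x[t] - x[t - 1]) / x[t - 1] for t in range(1, len(x))]


# Hypothetical daily closing prices P(t) and trading volumes Q(t).
prices = [100.0, 110.0, 99.0, 99.0]
volumes = [1000.0, 1200.0, 900.0, 900.0]

R = rate_of_change(prices)   # rate of return R(t)
V = rate_of_change(volumes)  # trading volume fluctuation V(t)
RV = [r * v for r, v in zip(R, V)]  # element-wise product R(t) * V(t)
print([round(x, 4) for x in RV])  # [0.02, 0.025, 0.0]
```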
Because the product of R and V contains information about the mutual influence between price and trading volume, a regular (non-random) structure in its distribution indirectly indicates that price and trading volume affect each other. Formula (2) represents the influence of the rate of return on trading volume volatility, while Formula (3) represents the impact of trading volume volatility on the rate of return.
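$$RV_i(s) = D_{s+i-1} \cdot D^{V}_{s+T-1}, \quad i = 1, \ldots, T, \qquad (2)$$
$$VR_i(s) = D^{V}_{s+i-1} \cdot D_{s+T-1}, \quad i = 1, \ldots, T, \qquad (3)$$
where $D_t$ and $D^V_t$ are the t-th rows of the distance matrices Dis_R and Dis_V constructed below, T is the length of the sliding window, and s indexes the window.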
2) We now present the method for analyzing two-dimensional time series.
The first step is to obtain the time series data for the stock or index, namely the daily closing price series and the corresponding daily trading volume series. From these, two time series are calculated: the rate of change of the price index, $R$, and the rate of change of the trading volume, $V$. Both time series are then normalized.
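The second step is to reduce the dimension of the price return series and the trading volume fluctuation series. First, each sequence is transformed into a distance relationship matrix, with each element of the price change rate sequence and of the trading volume change rate sequence regarded as a node. The distance between two nodes is the absolute difference, $d_{ij} = |R(t_i) - R(t_j)|$ for the returns and $d_{ij} = |V(t_i) - V(t_j)|$ for the volume changes, which yields two distance relationship matrices, Dis_R and Dis_V. Dis_R is as follows:
$$\mathrm{Dis\_R} = \begin{pmatrix} |R(t_1)-R(t_1)| & |R(t_1)-R(t_2)| & \cdots & |R(t_1)-R(t_n)| \\ |R(t_2)-R(t_1)| & |R(t_2)-R(t_2)| & \cdots & |R(t_2)-R(t_n)| \\ \vdots & \vdots & \ddots & \vdots \\ |R(t_n)-R(t_1)| & |R(t_n)-R(t_2)| & \cdots & |R(t_n)-R(t_n)| \end{pmatrix}$$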
The first row of Dis_R gives the distances between $R(t_1)$ and all the elements $R(t_1), R(t_2), \ldots, R(t_n)$. That is, the first row of Dis_R contains the spatial location information for $R(t_1)$ and can be represented as $D_1$.
The first row of matrix Dis_V gives the distances between $V(t_1)$ and all the elements $V(t_1), V(t_2), \ldots, V(t_n)$. That is, the first row of Dis_V contains the spatial location information for $V(t_1)$ and can be represented as $D^V_1$.
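Steps 1 and 2 can be sketched as follows. The text states only that both series are normalized, so the z-score used here is an assumption, as are the hypothetical helper names `zscore` and `scalar_distance_matrix` and the example values. Row t of each matrix (0-based) is the spatial location vector of the (t + 1)-th element.

```python
import statistics
from typing import List, Sequence


def zscore(x: Sequence[float]) -> List[float]:
    """Hypothetical helper: zero mean, unit population std (assumed normalization)."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    if sd == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - mu) / sd for v in x]


def scalar_distance_matrix(x: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: Dis[i][j] = |x_i - x_j|, each element being a node."""
    return [[abs(a - b) for b in x] for a in x]


R = [0.1, -0.1, 0.0, 0.2]   # hypothetical returns
V = [0.2, -0.25, 0.0, 0.1]  # hypothetical volume changes
dis_R = scalar_distance_matrix(zscore(R))
dis_V = scalar_distance_matrix(zscore(V))
D1 = dis_R[0]  # spatial location information of R(t_1)
```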
3) In the third step, we calculate the influence of the rate of return on the change in trading volume within a sliding window of length T.
The method is as follows. Suppose the window length is T = 5. In the first window, the spatial location information of $R(t_1), R(t_2), R(t_3), R(t_4), R(t_5)$, i.e., $D_1, \ldots, D_5$, is multiplied by the spatial location information of $V(t_5)$, $D^V_5$. That is, rows one to five of matrix Dis_R are each multiplied by the fifth column of matrix Dis_V (which equals $D^V_5$, because Dis_V is symmetric) to obtain $RV_1(1)$, $RV_2(1)$, $RV_3(1)$, $RV_4(1)$, and $RV_5(1)$, respectively. Then, in the second window, rows two to six of Dis_R are multiplied by the sixth column of Dis_V to obtain $RV_1(2)$, $RV_2(2)$, $RV_3(2)$, $RV_4(2)$, and $RV_5(2)$, respectively. This continues until the last window, which gives $RV_1(n-4)$, $RV_2(n-4)$, $RV_3(n-4)$, $RV_4(n-4)$, and $RV_5(n-4)$. Thus, five time series $RV_1, \ldots, RV_5$ are obtained, each of length $n - T + 1 = n - 4$.
$RV_1$ represents the influence of the rate of return at the earliest point of each time window on the rate of change of volume at the last point of that window.
$RV_2$ represents the influence of the rate of return at the second point of each time window on the rate of change of volume at the last point; $RV_3$ and $RV_4$ are defined analogously.
$RV_5$ represents the influence of the rate of return at the fifth point (the current time) of each time window on the rate of change of volume at the last point of that window.
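The sliding-window products of step 3 can be sketched as follows (0-based indices). `window_products` is a hypothetical name; calling it with (Dis_R, Dis_V) gives the $RV_i$ series, and swapping the two arguments gives the reverse-direction series for the influence of volume volatility on returns. The small integer matrix is a toy example only.

```python
from typing import List, Sequence


def window_products(dis_a: Sequence[Sequence[float]],
                    dis_b: Sequence[Sequence[float]],
                    T: int) -> List[List[float]]:
    """Hypothetical helper: series i holds row (s + i) of dis_a times column (s + T - 1) of dis_b."""
    n = len(dis_a)
    series: List[List[float]] = [[] for _ in range(T)]
    for s in range(n - T + 1):  # s = window start (0-based)
        last = s + T - 1  # last point of the window
        col_b = [dis_b[r][last] for r in range(n)]  # column `last` of dis_b
        for i in range(T):
            row_a = dis_a[s + i]  # spatial location info of point i in the window
            series[i].append(sum(a * b for a, b in zip(row_a, col_b)))
    return series


dis = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # distances of the toy series 0, 1, 3
print(window_products(dis, dis, T=2))  # [[6, 3], [5, 13]]
```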
4) The fourth step is to reconstruct the phase space of each time series $RV_i$ and map it into a network model for analysis. With embedding dimension m and time delay τ, the reconstructed vectors are
$$X_j = \left(RV_i(j), RV_i(j+\tau), \ldots, RV_i(j+(m-1)\tau)\right), \quad j = 1, 2, \ldots, L - (m-1)\tau,$$
where $L = n - T + 1$ is the length of $RV_i$.
Each vector $X_j$ is treated as a node; the distances between nodes are then calculated with Eq. (1), and edges between nodes are determined by the connecting rule. The result is a network model from which the complex network characteristics of the sequence can be derived.
The two-dimensional time series analysis method can also be applied to multidimensional time series.
If a complex system has sequence data with multiple attributes, the first step is to obtain the time series of its n attributes, such as $C_1(t), C_2(t), \ldots, C_n(t)$.
The second step is to obtain the distance relationship matrix of each of the n sequences. Each element of each sequence is treated as a node, and the distance between nodes is calculated as $d_{ij} = |C_k(t_i) - C_k(t_j)|$ for attribute $C_k$. This yields n distance relationship matrices, Dis_C1, Dis_C2, …, Dis_Cn.
The third step is, within each sliding window of length T, to multiply the rows of the distance relationship matrix of one attribute by the column of another attribute's matrix at the last point of the window, as in the two-dimensional case, for attributes C1, C2, C3, …, Cn. This gives the correlation matrix between all attributes and, for each pair of attributes, T time series of correlations.
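For n attributes, the two-dimensional procedure can be repeated for every ordered pair of attributes. A minimal sketch under the same assumptions as the two-dimensional case (absolute-difference distances, 0-based windows, normalization omitted); the name `pairwise_window_series`, the use of ordered pairs (a, b) with a ≠ b, and the example values are illustrative choices, not taken from the method description.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def scalar_distance_matrix(x: Sequence[float]) -> List[List[float]]:
    """Dis[i][j] = |x_i - x_j| for one attribute series."""
    return [[abs(a - b) for b in x] for a in x]


def pairwise_window_series(attrs: Sequence[Sequence[float]],
                           T: int) -> Dict[Tuple[int, int], List[List[float]]]:
    """Hypothetical helper: for each ordered pair (a, b), T series measuring the influence
    of attribute a at each window point on attribute b at the window's last point."""
    dis = [scalar_distance_matrix(x) for x in attrs]  # Dis_C1, ..., Dis_Cn
    n_obs = len(attrs[0])
    result: Dict[Tuple[int, int], List[List[float]]] = {}
    for a, b in permutations(range(len(attrs)), 2):
        series: List[List[float]] = [[] for _ in range(T)]
        for s in range(n_obs - T + 1):  # window start (0-based)
            last = s + T - 1
            col_b = [dis[b][r][last] for r in range(n_obs)]
            for i in range(T):
                series[i].append(sum(p * q for p, q in zip(dis[a][s + i], col_b)))
        result[(a, b)] = series
    return result


# Three hypothetical attribute series of equal length.
attrs = [[0.0, 1.0, 3.0, 2.0], [1.0, 0.0, 2.0, 2.0], [0.5, 0.5, 1.0, 0.0]]
out = pairwise_window_series(attrs, T=2)
print(sorted(out))  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```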
The fourth step is to reconstruct the phase space of these T time series and map them into network models for analysis. From the complex network characteristics of each sequence, we obtain its stochastic characteristics and thus the distribution law of the relationship between attributes, which for price and volume is the quantity-price relationship.
4. Empirical Study
The data used in this study are daily closing prices and trading volumes of the Shanghai Composite Index (SHCI), the Shenzhen Component Index (SZCI), and the Standard & Poor's 500 Index (S & P 500) for the period 2 January 1993 to 31 December 2012, and of the Dow Jones Industrial Average (DJIA) for the period 2 January 1992 to 31 December 2011.
We first analyze the impact of the rate of return on trading volume volatility. Table 1 shows the number of nodes and edges of the network mapped from each sequence for the Shanghai Composite Index and the Shenzhen Component Index at different values of the reconstruction dimension m.
As can be seen from Figure 1 and Figure 2, in each time window the Shanghai and Shenzhen price-volume relationship sequence at the current point, $RV_5$, shows certain dynamic characteristics. The connectivity of its mapped network decreases as m increases, but the decay is significantly slower than for the sequences at other points in the window. This shows that the rate of return at the end of each window, that is, in the current period, has a relatively stable relationship with the current trading volume volatility. At the other points in the window, the sequences $RV_1$, $RV_2$, $RV_3$, and $RV_4$ are more random, and the number of edges in their mapped networks decays rapidly as m increases. When m = 4, these networks have very few or no edges, which shows that the influence of the rate of return on trading volume volatility at these points is not stable, and thus there is no stable relationship between them.
Figure 1. Degree distribution of the networks mapped from the $RV_i$ sequences ($i = 1, \ldots, 5$) of the Shanghai Composite Index and the Shenzhen Component Index when m = 2.
Table 1. Number of nodes and edges of the network for each $RV_i$ sequence at different m values for the Shanghai Composite Index and the Shenzhen Component Index.
Figure 2. Degree distribution of the networks mapped from the $RV_i$ sequences ($i = 1, \ldots, 5$) of the Shanghai Composite Index and the Shenzhen Component Index when m = 5.
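The degree distributions in the figures summarize how many nodes of a mapped network have each degree. The sketch below illustrates the statistic only: `degree_distribution` is a hypothetical helper, and the example edges form a toy network, not one of the networks from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_distribution(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Hypothetical helper: P(k), the fraction of nodes with degree k (undirected network)."""
    degree = [0] * num_nodes  # isolated nodes keep degree 0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree)
    return {k: counts[k] / num_nodes for k in sorted(counts)}


# Path graph 0-1-2-3: two endpoints of degree 1, two inner nodes of degree 2.
print(degree_distribution(4, [(0, 1), (1, 2), (2, 3)]))  # {1: 0.5, 2: 0.5}
```

A network with very few or no edges, as described above for m = 4, concentrates almost all of its mass at k = 0.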
Similarly, we analyze the sequences $RV_1$, $RV_2$, $RV_3$, $RV_4$, and $RV_5$ for the S & P 500 index and the Dow Jones Industrial Average at different reconstruction dimensions. Table 2 shows the number of nodes and the number of edges of each network for the S & P 500 index and the Dow Jones Industrial Average at various reconstruction dimensions m.
As seen from Figure 3 and Figure 4, within each time window, the number of edges in the networks mapped from the S & P 500 index and the Dow Jones Industrial Average declines rapidly as m increases. Each $RV_i$ sequence is highly random, and the $RV_5$ sequence does not show the characteristics seen in the corresponding Shanghai and Shenzhen networks. This indicates that, for these two indices, the rate of return has no influence on trading volume volatility at any point in the time window.
We next analyze the impact of trading volume volatility on the rate of return, using the sequences $VR_1, \ldots, VR_5$ obtained by swapping the roles of Dis_R and Dis_V in the third step. Table 3 shows the number of nodes and edges of the network mapped from each sequence at different m values for the Shanghai Composite Index and the Shenzhen Component Index. Table 4 shows the same quantities for the S & P 500 index and the Dow Jones Industrial Average.
Table 2. Number of nodes and edges of the network for each $RV_i$ sequence at different m values for the S & P 500 index and the Dow Jones Industrial Average.
Figure 3. Degree distribution of the networks mapped from the $RV_i$ sequences ($i = 1, \ldots, 5$) of the S & P 500 index and the Dow Jones Industrial Average when m = 2.
Table 3. Number of nodes and edges of the network for each $VR_i$ sequence at different m values for the Shanghai Composite Index and the Shenzhen Component Index.
Table 4. Number of nodes and edges of the network for each $VR_i$ sequence at different m values for the S & P 500 index and the Dow Jones Industrial Average.
Figure 4. Degree distribution of the networks mapped from the $RV_i$ sequences ($i = 1, \ldots, 5$) of the S & P 500 index and the Dow Jones Industrial Average when m = 5.
As can be seen from Figure 5 and Figure 6, within each time window, the current-point sequence $VR_5$ and the second-point sequence $VR_2$ of the Shanghai Composite Index, and the $VR_5$ sequence of the Shenzhen Component Index, show certain dynamic characteristics. The connectivity of their mapped networks decreases as m increases, but the decay is significantly slower than at other points in the window. This indicates that at the last point in each window (the fifth point, since T = 5), the current fluctuation of the trading volume has a relatively stable relationship with the current rate of return. For the Shanghai Composite Index, the three-period lagged volume fluctuation also has a significant impact on the current rate of return.
As can be seen from Figure 7 and Figure 8, the S & P 500 index and the Dow Jones Industrial Average exhibit different features from the Chinese indices. The number of edges in the network mapped from each $VR_i$ sequence decreases rapidly as m increases, which indicates strong randomness. The $VR_5$ sequence does not show the characteristics of the corresponding Shanghai and Shenzhen networks, which indicates that, for these two indices, trading volume volatility has no influence on the rate of return at any point in the time window.
5. Conclusions
In this paper, we present a complex-network-based method for analyzing two-dimensional and, more generally, multidimensional time series. We apply it to the volume-price relationship in the stock market by studying the relationship between prices and trading volume fluctuations for the Shanghai Composite Index, the Shenzhen Component Index, the S & P 500 index, and the Dow Jones Industrial Average.
The $RV_i$ sequences, which capture the influence of the rate of return on trading volume volatility, and the $VR_i$ sequences, which capture the reverse influence, are obtained at different points in the time window, and the stochastic characteristics of each sequence are studied. The following conclusions are drawn.
1) Regarding the influence of the rate of return on trading volume volatility, for the Shanghai Composite Index and the Shenzhen Component Index the rate of return has a relatively stable impact on trading volume volatility at the end of each window, but at other points in the window its impact is unstable and the correlation is very weak. For the S & P 500 index and the Dow Jones Industrial Average, the rate of return has no impact on trading volume volatility at any point in the window.
Figure 5. Degree distribution of the networks mapped from the $VR_i$ sequences ($i = 1, \ldots, 5$) of the Shanghai Composite Index and the Shenzhen Component Index when m = 2.
Figure 6. Network distribution maps for the $VR_i$ sequences of the Shanghai Composite Index and the Shenzhen Component Index when m = 5.
2) Regarding the impact of trading volume volatility on the rate of return, in each time window the current fluctuation in the trading volume of the Shanghai Composite Index has a relatively stable impact on the rate of return. In addition, for the Shanghai Composite Index, the three-period lagged volume fluctuation also has a significant impact on the current rate of return. For the Shenzhen Component Index, only the current fluctuation in trading volume has a relatively stable influence on the rate of return. For the S & P 500 index and the Dow Jones Industrial Average, trading volume volatility has no impact on the rate of return at any point in the window.
In summary, comparing the influence of the rate of return on trading volume volatility with the influence of trading volume volatility on the rate of return, we find that the networks mapped from the $VR_i$ sequences have richer node connectivity than those mapped from the $RV_i$ sequences. This indicates that fluctuations in trading volume have a stronger influence on the rate of return than the reverse.
By analyzing data from the Shanghai Composite Index, the Shenzhen Component Index, the S & P 500 index, and the Dow Jones Industrial Average, we find that the mutual influence between returns and trading volume volatility is more pronounced in China's stock market. This may be because China's stock market is still at a relatively immature stage of development. Moreover, Chinese markets differ from more mature foreign markets in investors' trading preferences and in the level of market information.
Figure 7. Degree distribution of the networks mapped from the $VR_i$ sequences ($i = 1, \ldots, 5$) of the S & P 500 index and the Dow Jones Industrial Average when m = 2.
Figure 8. Degree distribution of the networks mapped from the $VR_i$ sequences ($i = 1, \ldots, 5$) of the S & P 500 index and the Dow Jones Industrial Average when m = 5.
Acknowledgements
This work was supported, in part, by the National Natural Science Foundation of China (Grant Nos. 70801066, 71071167, 71071168, 71371200), and by a grant from Sun Yat-sen University Basic Research Funding (Grant Nos. 1009028, 1109115, 16wkjc13).