Topological Structure of Stock Market Networks during Financial Turbulence: Non-Linear Approach

In this paper, we use mutual information and distance covariance to construct minimum spanning trees of the financial network of log-returns and trading volumes of the top 96 companies of the United States stock market listed in the S&P 100 index. We analyze the turbulence of the United States stock market during 2015-2016, using data from January 2012 to July 2018. To investigate the turbulence, we construct three minimum spanning trees: pre-turbulence, turbulence and post-turbulence. The findings show that the degree distribution follows a power law and that the pre-turbulence minimum spanning tree differs notably from the turbulence and post-turbulence trees in its topological characteristics and in network measures such as degree ratio, betweenness, closeness, eigenvector centrality, node eccentricity, node strength and node domination. Moreover, the minimum spanning trees constructed with mutual information and with distance covariance differ in their topological characteristics and network behavior. Finally, the pre-turbulence and post-turbulence networks are robust against node attacks, whereas the turbulence network is fragile against them.


Brief Literature Review
Many complex systems in the real world have been represented as complex networks [19]. The stock market is intrinsically a complex system: stocks are linked by intricate relationships that drive price fluctuations. Over the last two decades, researchers have studied stock markets by building stock correlation networks, in which nodes represent stocks and edges represent relationships between their price fluctuations [18].
Studying financial systems, particularly stock markets, from a complex-network perspective has become one of the most active fields in econophysics, and a similar trend is now emerging among researchers in econometrics and finance.
Methodologically, a variety of methods have been applied to numerous stock markets. Heimo et al. and Gan et al. use Pearson's correlation coefficient to analyze the New York Stock Exchange [12,10]. Huang et al. use a threshold method to construct China's stock correlation network and analyze its structural properties and topological stability [15]. Tabak et al. investigate the topological properties of Brazilian stock market networks by building the minimum spanning tree; their results suggest that stocks tend to cluster by sector [23]. Galazka studies the network structure of the Polish stock market using the Minimum Spanning Tree (MST) and the Weighted Random Graph (WRG) and compares them [9]. Coronnello et al. apply random matrix theory and hierarchical clustering techniques to a portfolio of stocks traded on the London Stock Exchange; their results show that a single method is not enough to extract all the economic information in the correlation matrix of a stock portfolio [3]. Zhuang et al. build a minimum spanning tree of the CSI 300 index of the Shanghai and Shenzhen stock markets and analyze its structure [25].
A few articles study the behavior of networks during financial crises. Majapa and Gossel examine the topological characteristics of South African stock market networks before and after the 2008 financial crisis. They use correlation networks of the daily closing prices of the South African Top 100 companies from June 2003 to June 2013 to build minimum spanning trees for the periods before, during and after the crisis. Their findings reveal that the network shrinks during the crisis and expands afterwards [17]. Nobi et al. analyze the effects of the 2008 global financial crisis on networks of the Korean financial market. They consider the prices of stocks in the KOSPI 200 (Korea Composite Stock Price Index 200) for three periods: before, during and after the crisis. Threshold networks are obtained from fully connected cross-correlation networks by choosing thresholds for the cross-correlation coefficients. The findings show that the threshold networks are denser during the crisis than before or after it [20]. Heiberger builds a longitudinal network of S&P 500 companies based on their correlations between 2000 and 2012; the results show that the stock network changes its composition and evolves into a more centralized topology [11].

Motivation
Computing correlations between stocks is of interest not only for understanding the statistical properties of this type of complex system but also for practical purposes such as asset management and portfolio risk estimation. Examining the characteristics of financial networks and their behavior during crises or market turbulence can help in designing more effective strategies for critical conditions and for the risk management of stock investments.
Many researchers use Pearson's correlation coefficient, which is inherently linear and therefore cannot capture the nonlinear behavior of financial networks [14]. For this reason, Fiedor applies mutual information-based approaches to the Warsaw Stock Exchange and the New York Stock Exchange [6,7], and Tao et al. use information-theoretic distances to analyze the structure of the Shanghai Stock Exchange [24]. Moreover, it is widely accepted in finance that traded volume plays a significant role in stock price movements; therefore, we use traded volume alongside log-returns.
In this paper, we present a network analysis of the turbulence in the United States stock market from 2015 to 2016. The turbulence began on 18 August 2015 and ended in July 2016. To handle the nonlinearity of log-returns, we construct networks with an information-theoretic approach and with distance covariance. Trading volumes are combined with log-returns through symbolic time series. This study is the first network analysis of the United States stock market turbulence that uses trading volume together with log-returns and exploits information theory and distance covariance. To avoid an excessive number of links, we use the minimum spanning tree to represent the structure of the networks.
The paper is organized as follows. Section one presents the introduction and the data. Section two introduces the methodology and the construction of networks based on mutual information distances and distance covariance. Section three presents the results of these methods in terms of minimum spanning trees. The conclusion summarizes the paper's contributions and proposes ideas for further research.

Data
The data used in this paper are extracted from Yahoo! Finance and cross-checked against the Bloomberg database, which makes them more complete than other available sources. The data include 96 companies of the S&P 100 Index maintained by Standard & Poor's. This index is a subset of the S&P 500 and contains 102 leading U.S. stocks; it represents approximately 51% of the market capitalization of the United States equity market. Six companies are excluded from the data set because of missing data.
We extracted closing prices and trading volumes from January 2012 to the end of July 2018, a total of 3309 working days of the stock exchange. A long sample period is chosen to limit the effect of short-term volatility during the financial turbulence.
We use the industry sectors from the macro-classification of [5]: consumer discretionary, consumer staples, energy, financial, healthcare, industrial, IT and materials. The data set is divided into three non-overlapping windows, pre-turbulence, turbulence and post-turbulence, according to the volatility of the log-returns of the stock prices in the S&P 100 Index over these years. The three windows are shown in Table 1.

Methodology
In this section, we first introduce a symbolization of time series that reduces the dimension of our data; we then use distances based on mutual information and distance covariance to build the financial networks. Finally, we define the tools used to analyze the networks constructed in the following section.

Symbolization of Time Series
We transform the time series of closing prices of the 96 S&P 100 stocks into log ratios between consecutive daily closing prices. The log-returns are

$$ r_{t,i} = \ln c_{t,i} - \ln c_{t-1,i}, \tag{1} $$

where $c_{t,i}$ is the closing price of company i at day t. The traded money of company i at day t is computed as

$$ \rho_{t,i} = c_{t,i}\, v_{t,i}, \tag{2} $$

where $v_{t,i}$ is the trading volume of company i at day t.
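A minimal Python sketch of Eqs. (1) and (2) for a single company is given below. It assumes plain lists of daily closing prices and volumes and assumes that traded money is the closing price times the volume; the function names are hypothetical.

```python
import math


def log_returns(closes):
    """Eq. (1): r_t = ln(c_t) - ln(c_{t-1}) for consecutive closing prices."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(closes, closes[1:])]


def traded_money(closes, volumes):
    """Eq. (2) as reconstructed here: rho_t = c_t * v_t (price times volume)."""
    if len(closes) != len(volumes):
        raise ValueError("closes and volumes must be aligned day by day")
    return [c * v for c, v in zip(closes, volumes)]


closes = [100.0, 102.0, 101.0]
volumes = [1000, 1500, 1200]
r = log_returns(closes)                   # two returns for three prices
rho = traded_money(closes, volumes)[1:]   # drop day 0 so rho lines up with r
print(rho)  # [153000.0, 121200.0]
```

Dropping the first day of traded money is one way to align the two series, since log-returns are only defined from the second day onwards.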
In this paper, we use symbolization of time series to reduce the dimension of our data. The time series of each company is transformed into a sequence of natural numbers (symbols), giving a new sequence for each company; this process is called symbolization. Gabriel et al. apply symbolization to analyze currency markets [8]. For more discussion of symbolization see [4], and for selecting an appropriate partition see [21,13].
To apply symbolization, we first split the ordered sequence of log-returns of each company i into three parts, each containing a third of the observations; this defines the 3-quantiles of the company, where $r_i^{T_1}$ is the first 3-quantile and $r_i^{T_2}$ is the second 3-quantile. Each day's log-return $r_{t,i}$ and traded money $\rho_{t,i}$ are then mapped to a symbol $s_{t,i}$ according to the position of $r_{t,i}$ relative to $r_i^{T_1}$ and $r_i^{T_2}$, and of $\rho_{t,i}$ relative to $\bar{\rho}_i = \frac{1}{T}\sum_{t=1}^{T} \rho_{t,i}$, the average traded money of company i over the T trading days. We symbolize the log-returns and traded money of each company in this way, arrange the symbols according to the windows defined in Table 1, and obtain three new time series: the pre-turbulence, turbulence and post-turbulence sequences.
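The Python sketch below shows one possible symbolization. The tercile thresholds and the time average of traded money follow the description above, but the way the two are combined into a single symbol (here 1 + 2 × tercile + high-money flag, giving six symbols) is a hypothetical encoding, not necessarily the rule used in this paper. `statistics.quantiles(..., n=3)` returns the two cut points; its default "exclusive" interpolation is only one of several quantile conventions.

```python
from statistics import fmean, quantiles


def symbolize(returns, money):
    """Map each day's (log-return, traded money) pair to a symbol in {1, ..., 6}.

    HYPOTHETICAL encoding: the return tercile (0, 1, 2) is combined with a flag
    telling whether traded money exceeds its time average (0 or 1).
    """
    if len(returns) != len(money):
        raise ValueError("series must be aligned day by day")
    t1, t2 = quantiles(returns, n=3)  # first and second 3-quantiles
    rho_bar = fmean(money)            # time average of traded money
    symbols = []
    for r, rho in zip(returns, money):
        tercile = 0 if r <= t1 else (1 if r <= t2 else 2)
        high_money = 1 if rho > rho_bar else 0
        symbols.append(1 + 2 * tercile + high_money)
    return symbols


returns = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.03]
money = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(symbolize(returns, money))  # [1, 1, 3, 4, 6, 6]
```

Here the returns fall into terciles 0, 0, 1, 1, 2, 2 and only the last three days have above-average traded money, which gives the six symbols shown.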

Mutual Information Based Network
Pearson's correlation coefficient captures only linear interdependencies. To address this limitation, we use mutual information networks to detect nonlinear dependencies in financial markets. Mutual information is based on Shannon's entropy. For two discrete random variables X and Y, here the symbolized series created in the previous section, the mutual information is defined as

$$ I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}, \tag{4} $$

where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions. Mutual information can equivalently be written as

$$ I(X;Y) = H(X) + H(Y) - H(X,Y), \tag{5} $$

where H(X) is Shannon's entropy,

$$ H(X) = -\sum_{x} p(x)\,\log p(x), \tag{6} $$

and H(X, Y) is the joint entropy of the two variables.
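As a quick numerical check of Eqs. (4) to (6), the Python sketch below computes the plain plug-in (frequency-count) mutual information in both forms and shows that they agree. This is the naive estimator, not the Schürmann-Grassberger estimator used in this paper; natural logarithms are assumed, and the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(counts):
    """Eq. (6) with p(x) replaced by observed frequencies (natural log)."""
    m = sum(counts)
    return -sum((c / m) * math.log(c / m) for c in counts if c > 0)


def mi_eq4(xs, ys):
    """Eq. (4): sum of p(x,y) log[p(x,y) / (p(x) p(y))] over observed pairs."""
    m = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / m) * math.log((c / m) / ((px[x] / m) * (py[y] / m)))
               for (x, y), c in pxy.items())


def mi_eq5(xs, ys):
    """Eq. (5): H(X) + H(Y) - H(X, Y)."""
    return (plugin_entropy(list(Counter(xs).values()))
            + plugin_entropy(list(Counter(ys).values()))
            - plugin_entropy(list(Counter(zip(xs, ys)).values())))


same = [1, 1, 2, 2]
print(mi_eq4(same, same), mi_eq5(same, same))  # both equal ln 2 (about 0.693)
print(mi_eq4(same, [1, 2, 1, 2]))                # 0.0: the two series are independent
```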
To estimate Shannon's entropy, we use the Schürmann-Grassberger estimator, a Bayesian parametric estimator derived from a Dirichlet prior distribution whose density is

$$ p(\theta_1, \dots, \theta_{|\chi|}) = \frac{\Gamma(|\chi| N)}{\Gamma(N)^{|\chi|}} \prod_{i=1}^{|\chi|} \theta_i^{\,N-1}, \tag{7} $$

where $\theta_i$ is the prior probability of the event $x_i$, the i-th element of the set $\chi$, N is the prior parameter, and $\Gamma(\cdot)$ is the gamma function. The entropy under a Dirichlet prior is estimated with

$$ \hat{H}(X) = \frac{1}{m + |\chi| N} \sum_{x \in \chi} (D_x + N)\Bigl(\psi(m + |\chi| N + 1) - \psi(D_x + N + 1)\Bigr), \tag{8} $$

where $D_x$ is the number of data points with value x, $|\chi|$ is the number of bins from the discretization step, m is the sample size, and $\psi(z) = \frac{d}{dz}\ln\Gamma(z)$ is the digamma function. The Schürmann-Grassberger estimator sets the prior parameter to $N = 1/|\chi|$.
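Eq. (8) can be sketched in Python as follows. The standard library has no digamma function, so `digamma` here is a small hand-written approximation (upward recurrence plus the asymptotic series); `scipy.special.digamma` could be used instead where SciPy is available. Treating the joint alphabet of (X, Y) as the Cartesian product of the marginal alphabets is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def digamma(x):
    """Approximate psi(x) = d/dx ln Gamma(x) for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 10.0:  # recurrence psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def sg_entropy(counts):
    """Eq. (8) with N = 1/|chi|; `counts` holds D_x for every bin, zeros included."""
    k = len(counts)            # |chi|
    m = sum(counts)            # sample size
    prior = 1.0 / k            # Schurmann-Grassberger choice of N
    denom = m + k * prior      # m + |chi| N
    psi_total = digamma(m + k * prior + 1)
    return sum((d + prior) * (psi_total - digamma(d + prior + 1)) for d in counts) / denom


def sg_mutual_information(xs, ys, x_alphabet, y_alphabet):
    """Eq. (5), I = H(X) + H(Y) - H(X,Y), with every entropy from sg_entropy."""
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    hx = sg_entropy([cx[a] for a in x_alphabet])
    hy = sg_entropy([cy[b] for b in y_alphabet])
    # Assumption: the joint alphabet is the Cartesian product of the marginals.
    hxy = sg_entropy([cxy[pair] for pair in product(x_alphabet, y_alphabet)])
    return hx + hy - hxy


print(round(sg_entropy([0, 0]), 6))  # 0.386294, i.e. 2 ln 2 - 1
```

For an empty sample over two bins, Eq. (8) reduces to the prior mean entropy ψ(2) − ψ(3/2) = 2 ln 2 − 1; with many observations spread evenly over the bins, the estimate approaches ln |χ|.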
Subsequently, we transform the mutual information into a Euclidean distance that satisfies the axioms of a metric, as follows:

Distance Covariance Based Network
As mentioned in the previous section, Pearson's correlation coefficient is designed to detect linear relationships between two variables and cannot capture nonlinear dependencies. We therefore use distance covariance, introduced by Székely and Rizzo, to measure nonlinear statistical dependence between two random vectors [22].
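The distance covariance between a column vector X of p random variables and a column vector Y of q random variables, here the symbolized data created in Section 2.1, is defined as

$$ \mathcal{V}^2(X,Y) = \frac{1}{c_p\, c_q} \int_{\mathbb{R}^{p+q}} \frac{\bigl|F_{XY}(t,s) - F_X(t)\,F_Y(s)\bigr|^2}{|t|_p^{1+p}\, |s|_q^{1+q}}\, dt\, ds, \tag{10} $$

where $F_X$ and $F_Y$ are the characteristic functions of X and Y, $F_{XY}$ is their joint characteristic function, $t \in \mathbb{R}^p$, $s \in \mathbb{R}^q$, and $|\cdot|_p$, $|\cdot|_q$ are Euclidean norms. The constants $c_p$ and $c_q$ are

$$ c_d = \frac{\pi^{(1+d)/2}}{\Gamma\bigl((1+d)/2\bigr)}, \qquad d \in \{p, q\}, \tag{11} $$

where $\Gamma(\cdot)$ is the gamma function. We use a standardized version of the distance covariance, the distance correlation, defined as

$$ \mathcal{R}^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}^2(X,Y)}{\sqrt{\mathcal{V}^2(X,X)\,\mathcal{V}^2(Y,Y)}}, & \mathcal{V}^2(X,X)\,\mathcal{V}^2(Y,Y) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{12} $$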
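Eq. (10) is a population quantity. In practice it is estimated from a sample with the standard double-centering formula: the squared sample distance covariance is the average of the elementwise product of the double-centered pairwise distance matrices. The Python sketch below handles two univariate series and treats the integer symbols as real numbers, which is an assumption, since the text does not say how the symbols are embedded. It returns the sample distance correlation, i.e. the square root of the ratio in (12); the function names are hypothetical.

```python
import math


def _double_centered_distances(z):
    """A_jk = a_jk - row mean_j - column mean_k + grand mean, with a_jk = |z_j - z_k|."""
    n = len(z)
    a = [[abs(zj - zk) for zk in z] for zj in z]
    means = [sum(row) / n for row in a]  # row means equal column means (symmetry)
    grand = sum(means) / n
    return [[a[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]


def dcov2(x, y):
    """Sample squared distance covariance V_n^2(X, Y) = mean of A_jk * B_jk."""
    A, B = _double_centered_distances(x), _double_centered_distances(y)
    n = len(x)
    return sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / (n * n)


def dcor(x, y):
    """Sample distance correlation R_n (square root of the ratio in Eq. (12))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    vx, vy = dcov2(x, x), dcov2(y, y)
    if vx * vy <= 0:
        return 0.0
    return math.sqrt(max(dcov2(x, y), 0.0) / math.sqrt(vx * vy))


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # Pearson correlation with x is exactly 0
print(round(dcor(x, y), 3))  # 0.516
```

On this symmetric grid, x and x² have zero Pearson correlation but a clearly positive distance correlation, which is the kind of nonlinear dependence the text targets. The sketch stores full n × n matrices, which is fine for a few thousand trading days.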

Tools of the Network Analysis
To analyze the financial networks constructed in the previous sections, we need convenient tools that reveal their topological characteristics. These measures are borrowed from graph theory and give a more thorough understanding of the structure of financial complex systems.
1. We construct two correlation matrices from the measures defined in (9) and (12): each measure is computed for every pair of stocks, and the diagonal entries are set to zero.
2. The minimum spanning tree (MST) is a suitable tool for filtering out negligible information so that the network can be examined clearly. Using the distance matrices constructed from (9) and (12), we build an MST with Kruskal's algorithm [16].
3. Suppose $A = [a_{ij}]_{N \times N}$ is the adjacency matrix of the correlation network of the stocks. The degree of company i, i.e. the number of edges incident to it in the financial network, is $d_i = \sum_{j=1}^{N} a_{ij}$. The degree ratio of the financial network is defined as

$$ P(k) = \frac{1}{N} \sum_{i=1}^{N} I_k(d_i), $$

where $k = 0, \dots, N-1$ and $I_k(\cdot)$ is the indicator function with $I_k(d) = 1$ if $d = k$ and $I_k(d) = 0$ otherwise.
4. Eigenvector centrality measures the importance of a company together with that of its neighboring companies, using the eigenvector associated with the principal eigenvalue of the adjacency matrix [1,2].
5. Node betweenness measures the role of a company as a bridge between other pairs of companies in the network. The betweenness centrality of node i is defined as

$$ \beta(i) = \sum_{j \neq i \neq k} \frac{l_{jk}(i)}{l_{jk}}, $$

where $l_{jk}(i)$ is the number of shortest paths from j to k that pass through i and $l_{jk}$ is the number of shortest paths between nodes j and k.
6. Node eccentricity measures the maximum number of edges in the shortest paths between a company and any other company in a financial network.
7. Node strength is the sum of the weights of the edges connected to a company; for company i it is $s_i = \sum_{j \neq i} a_{ij}\,\delta_{ij}$, where the $\delta_{ij}$ are the distances defined in (9) and (12).
8. Closeness centrality is the inverse of the sum of the distances from a company to all other companies in the financial network, i.e. $C(i) = 1 / \sum_{j \neq i} \mathrm{dist}(i,j)$, where $\mathrm{dist}(i,j)$ is the length of the shortest path between i and j.
9. Domination strength measures which companies or sectors most characteristically affect the network. It is computed as follows: where λ(i, j) is the weight of the edge linking node i and node j, and τ(j) is the power density of node j, defined as follows:
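The Python sketch below illustrates items 2, 3 and 5 to 8 on a small example: Kruskal's algorithm with a union-find structure, followed by the degree ratio, node strength, eccentricity, closeness and betweenness of the resulting tree. Eigenvector centrality and domination strength are omitted. It assumes a symmetric distance matrix indexed 0..N-1, counts unordered pairs in the betweenness, and uses weighted path lengths for closeness; these conventions are choices of the sketch, not necessarily those used in this paper.

```python
from collections import deque


def kruskal_mst(dist):
    """Kruskal's algorithm on a symmetric n x n distance matrix; returns (i, j, w) edges."""
    n = len(dist)
    parent = list(range(n))  # union-find forest over the n companies

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Consider candidate edges from the shortest to the longest distance.
    candidates = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two separate components
            parent[ri] = rj
            tree.append((i, j, w))
            if len(tree) == n - 1:
                break
    return tree


def tree_measures(n, tree):
    """Degree ratio, strength, eccentricity, closeness and betweenness of an MST."""
    adj = [[] for _ in range(n)]
    for i, j, w in tree:
        adj[i].append((j, w))
        adj[j].append((i, w))

    def bfs(src):
        # Paths in a tree are unique, so BFS yields hop counts and weighted lengths.
        hops, length = {src: 0}, {src: 0.0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if v not in hops:
                    hops[v], length[v] = hops[u] + 1, length[u] + w
                    queue.append(v)
        return hops, length

    runs = [bfs(v) for v in range(n)]
    hops = [h for h, _ in runs]
    degree = [len(adj[v]) for v in range(n)]
    return {
        "degree_ratio": [degree.count(k) / n for k in range(n)],
        "strength": [sum(w for _, w in adj[v]) for v in range(n)],
        "eccentricity": [max(h.values()) for h in hops],
        # Assumption: closeness uses weighted path lengths along the tree.
        "closeness": [1.0 / sum(length.values()) for _, length in runs],
        # v lies on the path j-k exactly when hops(j, v) + hops(v, k) == hops(j, k).
        "betweenness": [
            sum(1 for j in range(n) for k in range(j + 1, n)
                if v not in (j, k) and hops[j][v] + hops[v][k] == hops[j][k])
            for v in range(n)
        ],
    }


dist = [[0, 1, 4, 5], [1, 0, 2, 6], [4, 2, 0, 3], [5, 6, 3, 0]]
mst = kruskal_mst(dist)
print(mst)                                   # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
print(tree_measures(4, mst)["betweenness"])  # [0, 2, 2, 0]
```

In a tree, a company has zero betweenness exactly when it is a leaf (degree one), because any company with two or more neighbors lies on the unique path between two of them.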

Results
This section presents the minimum spanning trees of the top 96 companies by market capitalization in the S&P 100, built from the logarithmic returns of daily closing prices and from traded volumes between January 2012 and July 2018. The analysis in this section is based on the correlation and distance matrices built from (9) and (12) and on the network analysis tools defined in Section 2.4.
To study behavior before, during and after the financial turbulence, the data set is split into three subperiods: a pre-turbulence period from the beginning of January 2012 to 31 July 2015; a turbulence period from 1 August 2015 to 31 July 2016; and a post-turbulence period from early August 2016 to 31 July 2018. The names and sectors of the companies, and the color assigned to each sector, are shown in Appendix 2.
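For reproducibility, the window boundaries above can be encoded as in the Python sketch below. The post-turbulence start is given in the text only as "early August 2016", so 1 August 2016 is an assumption here, as are the function names.

```python
from datetime import date

# Inclusive window boundaries from the text; 1 Aug 2016 is an assumed start date.
WINDOWS = [
    ("pre-turbulence", date(2012, 1, 1), date(2015, 7, 31)),
    ("turbulence", date(2015, 8, 1), date(2016, 7, 31)),
    ("post-turbulence", date(2016, 8, 1), date(2018, 7, 31)),
]


def window_of(day):
    """Return the name of the window containing `day`, or None if outside all windows."""
    for name, start, end in WINDOWS:
        if start <= day <= end:
            return name
    return None


def split_by_window(days, values):
    """Group aligned (day, value) pairs into the three non-overlapping windows."""
    groups = {name: [] for name, _, _ in WINDOWS}
    for day, value in zip(days, values):
        name = window_of(day)
        if name is not None:
            groups[name].append(value)
    return groups


print(window_of(date(2015, 8, 18)))  # turbulence
```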

Results of Mutual Information Based MSTs
The topological structure of the pre-turbulence MST, depicted in Fig. 22 (Appendix 1), shows that 3M Company (MMM), Honeywell (HON) and Citigroup Inc (C) play a pivotal role in this MST and form a hub. In this MST, the companies of most sectors are connected to each other or lie in the same region. Fig. 23 (Appendix 1) shows that MasterCard Inc (MA), 3M Company (MMM) and Citigroup Inc (C) are the leading companies in the turbulence MST and form a hub; the majority of companies are also grouped by sector. As seen in Fig. 24

A high eigenvector score indicates that a company is connected to companies that themselves have high eigenvector scores. As shown in Fig. 2, the distributions of eigenvector centrality in the pre-turbulence, turbulence and post-turbulence periods are approximately indistinguishable. The highest eigenvector centrality in the pre-turbulence, turbulence and post-turbulence periods belongs to 3M Company (MMM), MasterCard Inc (MA) and Microsoft (MSFT), respectively.
Analyzing the sectors across the three periods gives a better understanding of the interconnectedness of the constructed MSTs. Fig. 3 shows that the sectors behave differently in the three periods. The pre-turbulence period has more sectors with a high average eigenvector centrality; therefore, during pre-turbulence, companies with high eigenvector centrality tend to connect with other companies with high eigenvector centrality.

A higher closeness centrality means that a company is more central, i.e. closer to all other companies. Fig. 4 shows that the closeness scores of the companies during pre-turbulence are higher than in the other two periods; in other words, companies are closer to each other during the pre-turbulence period. Honeywell (HON), Morgan Stanley (MS) and 3M Company (MMM) obtain the highest closeness scores. Fig. 5 shows that the average closeness of the sectors during pre-turbulence is lower than in the other two periods.

As shown in Fig. 6, the betweenness centrality of the MSTs during turbulence and post-turbulence is similar and differs slightly from the pre-turbulence period. The betweenness centrality of 50 companies is zero during pre-turbulence; this number decreases to 47 and 44 companies during the turbulence and post-turbulence periods, respectively. A zero betweenness score means that no shortest path between any two other companies in the network passes through that company. The highest betweenness scores belong to Honeywell (HON), MasterCard Inc (MA) and 3M Company (MMM). Fig. 7 shows that the average betweenness during the turbulence period is higher in most sectors than in the other periods.

As seen in Fig. 8, node strength during the pre-turbulence period is generally higher than during the turbulence and post-turbulence periods. Furthermore, Fig. 9 shows that the majority of sectors have a higher average node strength during the pre-turbulence period than in the other periods. As depicted in Fig. 10, the pre-turbulence and post-turbulence domination strengths are approximately alike, with a slight difference from the turbulence period. The companies with the highest domination strength during the pre-turbulence, turbulence and post-turbulence periods are 3M Company (MMM), Alphabet Inc (GOOGL) and Microsoft (MSFT), respectively. Fig. 11 shows that the eccentricity of the companies during pre-turbulence and post-turbulence is smaller than in the turbulence MST; therefore, the maximum number of edges in the shortest paths of the companies during turbulence is higher than in the other two periods.

Results of Distance Covariance Based MSTs
In this section, we examine the MSTs constructed from (12) using the tools defined in Section 2.4. As shown in Fig. 25 (Appendix 1), companies gather around a few dominant companies, and the sectors are scattered across the MST. Fig. 26 (Appendix 1) shows that Southern Company (SO) is the most pivotal company in this MST, and the majority of companies connect to it. As seen in Fig. 27 (Appendix 1), the shape of the post-turbulence MST differs from the turbulence MST and resembles the pre-turbulence MST.

Figure 12. Node degree ratio (degree ratio versus node degree) of distance covariance based MSTs: pre-turbulence (12a), turbulence (12b), post-turbulence (12c).

As seen in Fig. 12, most companies have a node degree of one, although the number of degree-one companies decreases during the turbulence and post-turbulence periods. Netflix (NFLX) has the highest node degree in the pre-turbulence period, and Southern Company (SO) has the highest node degree in the turbulence and post-turbulence periods.

Figure 14. Average eigenvector centrality of sectors for MSTs based on distance covariance, during pre-turbulence, turbulence and post-turbulence.

Fig. 13 and Fig. 14 show that the post-turbulence period differs from the other periods and that the average eigenvector centrality of the sectors during pre-turbulence is greater than in the other periods. This means that during the pre-turbulence period, companies with a high eigenvector score tend to connect with companies with a higher eigenvector score. Netflix (NFLX) has the highest score in the pre-turbulence period, and Southern Company (SO) has the highest score during the turbulence and post-turbulence periods.

As seen in Fig. 15 and Fig. 16, post-turbulence closeness scores are smaller than in the other periods, and the average closeness of most sectors in the turbulence period is greater than in the others; therefore, companies are close together during the turbulence period. Netflix (NFLX) has the highest closeness score in the pre-turbulence period, and Southern Company (SO) has the highest closeness score during the turbulence and post-turbulence periods. As shown in Fig. 17 and Fig. 18, the betweenness scores of companies in the post-turbulence period are greater than in the other periods, and companies in the utilities sector have the greatest average betweenness among all sectors in all periods. Netflix (NFLX) has the highest betweenness score in the pre-turbulence period, and Southern Company (SO) has the highest betweenness score during the turbulence and post-turbulence periods.

As seen in Fig. 19, node strength is approximately similar in all periods, but Fig. 20 gives a clearer view of the MSTs: more than 50% of the sectors have a higher average strength in the turbulence period than in the other periods. As depicted in Fig. 21a, the domination strengths of all periods are approximately similar, and Southern Company (SO) has the highest score among all companies in all periods. Fig. 21b shows that the eccentricity of the companies during post-turbulence is higher than in the MSTs of the other periods; therefore, the maximum number of edges in the shortest paths of the companies in post-turbulence is larger than in the other two periods.

Topological stability of MSTs
The topological stability of financial networks is typically measured by removing vertices and edges from the networks. If the characteristics of a financial network change drastically under vertex or edge attacks, the network is not robust against those attacks. To evaluate vertex attacks, we randomly remove different percentages of vertices and examine the resulting characteristics of the constructed financial networks. A similar procedure for evaluating the topological stability of a network can be found in [15].
Definition (Maximal Connected Component): A financial network F is connected if there is a path from any stock to any other stock in the financial network. A disconnected network can be broken down into several connected sub-networks F′, which are referred to as the connected components of F. A connected sub-network F′ of F is a maximal connected component of F if, for every connected sub-network F″ of F that contains F′, F″ = F′.
In this paper, we use the size of the maximal connected component of a financial network to measure its connectivity. We then evaluate the stability of the networks by examining the change in the maximal connected component size before and after the vertex attack during the pre-turbulence, turbulence and post-turbulence periods.
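The vertex-attack experiment can be sketched in Python as below. The number of random trials, the averaging over trials and the rounding of the number of removed nodes are assumptions of this sketch, since the text does not specify them; nodes are integers 0..n-1, the edges are the MST edges, and the function names are hypothetical.

```python
import random


def largest_component_size(n, edges, removed):
    """Size of the largest connected component after deleting the `removed` nodes."""
    alive = [v for v in range(n) if v not in removed]
    adj = {v: [] for v in alive}
    for i, j in edges:
        if i in adj and j in adj:  # keep only edges between surviving nodes
            adj[i].append(j)
            adj[j].append(i)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0  # depth-first search over one component
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def vertex_attack(n, edges, fraction, trials=100, seed=0):
    """Average largest-component size after removing round(fraction * n) random nodes."""
    rng = random.Random(seed)
    k = round(fraction * n)
    total = 0
    for _ in range(trials):
        removed = set(rng.sample(range(n), k))
        total += largest_component_size(n, edges, removed)
    return total / trials


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(largest_component_size(5, path, {2}))  # 2
```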
The results in Table 2 show that, compared with the turbulence period, the size of the maximal connected component of the pre-turbulence and post-turbulence MSTs, built with both the mutual information and the distance covariance methods, changes only slightly as different percentages of stocks are removed. When more than 40% of the companies are removed, this size decreases significantly in the pre-turbulence and post-turbulence MSTs. In summary, the topological stability of the pre-turbulence and post-turbulence MSTs built with the MI and DC methods is greater than that of the turbulence MST, and these MSTs are robust against vertex attacks.

Table 2. Size of the maximal connected component of the MSTs after removing a percentage of nodes, during pre-turbulence, turbulence and post-turbulence, for Mutual Information (MI) and Distance Covariance (DC).

Percentage removed | Pre-Turbulence DC | Pre-Turbulence MI | Turbulence DC | Turbulence MI | Post-Turbulence DC | Post-Turbulence MI
10 | 76 | 81 | 52 | 57 | 77 | 82
20 | 82 | 86 | 61 | 66 | 83 | 86
30 | 81 | 84 | 59 | 67 | 80 | 83
40 | 83 | 85 | 58 | 62 | 82 | 83
50 | 67 | 72 | 46 | 50 | 68 | 73
60 | 68 | 73 | 45 | 52 | 69 | 75
70 | 68 | 74 | 48 | 51 | 66 | 73
80 | 69 | 76 | 49 | 52 | 68 | 77

Conclusions
In this paper, the structure of the 2015-2016 turbulence in the United States stock market is analyzed by constructing minimum spanning trees of the top 96 companies of the S&P 100 index from 2012 to July 2018. The time series of log-returns and trading volumes of each company in the index are transformed into symbolized data, and then the two methods of mutual information and distance covariance are used to construct the financial networks. Both measures derived from mutual information and distance covariance follow the axioms of a metric distance. After the correlation matrices for these two distances are constructed, the MSTs of the pre-turbulence, turbulence and post-turbulence periods are created.
Several network measures are derived from the MSTs for each method. The results show that the degree distribution follows a power law and that the topological structure and network measures of the pre-turbulence period, such as degree ratio, betweenness, closeness, eigenvector centrality, node eccentricity, node strength and node domination, differ significantly from those of the other two periods. In addition, the most important companies are identified for each period.
The behavior of sectors is examined through the average eigenvector centrality, average betweenness, average closeness and average node strength. These measures give a better understanding of the interconnectedness of the MSTs.
Comparing the results of Sections 3.1 and 3.2 shows that the two methods of building the MSTs, MI and distance covariance, lead to significantly different analyses of the complex financial networks. The MSTs built by the two methods have completely different structures and shapes, and the pivotal companies differ between the methods. However, with both methods the pre-turbulence MSTs differ from the MSTs of the other two periods. Moreover, removing random companies shows that the pre-turbulence and post-turbulence correlation networks are topologically robust, whereas the turbulence network is fragile against random node failures.
Further research could investigate how to choose the best method for analyzing financial networks nonlinearly. In addition, different time-series symbolization methods could be applied and compared.