The paper “Deep Learning and Factor Investing in China’s Stock Market, a GAN-based Approach”, co-authored by Professor Jiang Fuwei of our school, Ma Tian, a PhD candidate at our school, and Tang Guohao, Associate Professor at the College of Finance and Statistics, Hunan University, has been officially accepted by the China Economic Quarterly.
With advances in cloud computing and related technologies, deep neural network models have found extensive applications across industries. As a representative of unsupervised deep learning, generative adversarial network (GAN) models have been successfully applied to image and video generation, but few studies have applied them to China’s financial markets. This paper applies GAN to China’s stock market for the first time, while emphasizing the economic theory behind deep learning. Compared with developed overseas markets, China’s stock market has high retail investor participation and large volatility. In our GAN models, the discriminator makes model adaptation more dynamic: the memory unit in the generator preserves the trend components of the time-series data, and the discriminator further filters out noise when the model faces a new period of sample data. The empirical results show that the GAN models substantially outperform classical linear models in both stock return prediction and factor investing. When portfolios are constructed by ranking stocks on their predicted values, the long-short portfolio earns a monthly average return of 1.13% with a Sharpe ratio of 0.71, both higher than those of the other models, and its excess returns relative to the Fama-French three-factor (FF3) and five-factor (FF5) models are also significant.
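The ranking-based portfolio construction described above can be illustrated with a minimal sketch. The function names, the equal weighting, the quantile cutoff, and the annualization by the square root of 12 are all assumptions made for illustration; the announcement does not specify the paper's breakpoints, weighting scheme, or Sharpe ratio convention.

```python
import math
import statistics


def long_short_return(predicted, realized, quantile=0.1):
    """Return of a long-short portfolio for one month.

    Stocks are ranked by predicted return; the top `quantile` share is
    bought and the bottom `quantile` share is sold short, both legs
    equal-weighted (an assumption, not the paper's stated scheme).
    """
    if len(predicted) != len(realized):
        raise ValueError("predicted and realized must have the same length")
    n = len(predicted)
    k = max(1, int(n * quantile))  # number of stocks in each leg
    # Stock indices ordered from lowest to highest predicted return.
    order = sorted(range(n), key=lambda i: predicted[i])
    short_leg, long_leg = order[:k], order[-k:]
    long_ret = sum(realized[i] for i in long_leg) / k
    short_ret = sum(realized[i] for i in short_leg) / k
    return long_ret - short_ret


def sharpe_ratio(monthly_returns, periods_per_year=12):
    """Annualized Sharpe ratio of a monthly long-short return series.

    A long-short portfolio is self-financing, so no risk-free rate is
    subtracted here (again an illustrative convention).
    """
    mean = statistics.mean(monthly_returns)
    std = statistics.stdev(monthly_returns)  # sample standard deviation
    return mean / std * math.sqrt(periods_per_year)


# Toy data for a single month with ten hypothetical stocks.
predicted = [0.03, -0.01, 0.02, -0.02, 0.00, 0.01, -0.03, 0.04, 0.015, -0.015]
realized = [0.05, -0.02, 0.01, -0.04, 0.00, 0.02, -0.01, 0.03, 0.01, 0.00]
spread = long_short_return(predicted, realized, quantile=0.2)
```

Repeating `long_short_return` for every month yields a return series that can be passed to `sharpe_ratio`.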
The paper also examines the economic mechanisms behind the deep learning models. Portfolio analysis finds that technology stocks earn higher returns than stocks in traditional sectors. Factor importance analysis shows that the most important characteristics fall into three categories: price and volume trend indicators, liquidity indicators, and fundamental indicators. Under the misvaluation hypothesis, deep learning models are more effective for stocks with low financial friction, low volatility and high liquidity. An analysis of macroeconomic conditions from multiple perspectives, including macroeconomic activity, economic policy and financial market uncertainty, and investor sentiment, finds that deep learning models can effectively capture potential risk factors in China’s macroeconomy and financial markets. Firm-level analysis finds that the deep learning models can effectively predict fundamentals such as earnings, revenue and cash flow over the short and medium term (within one year).
The paper makes three innovations in deep learning modeling and big data analysis:
First, more effective extraction and use of nonlinear information. Traditional linear regression models ignore data properties such as latent factors, sparsity and nonlinear interactions. Building on a large dataset of stock characteristics, this paper uses a GAN model to extract and analyze the nonlinear features of China’s stock market. The empirical results show a significant improvement over the linear models, indicating that these nonlinear features carry important predictive information.
Second, more effective processing of time-series data. In building the GAN, we use a long short-term memory (LSTM) network, which is well suited to time-series data. Financial data have long been known to be autocorrelated, and the momentum effect is widely used as a classic anomaly factor in asset pricing. The LSTM retains useful information through its memory unit and filters out “noise” through its forget gate, matching different memory lengths to different asset classes.
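To make the roles of the memory unit and the forget gate concrete, here is a minimal sketch of one step of a standard LSTM cell, reduced to a single scalar unit. It follows the textbook LSTM equations, not the paper's actual architecture; the function names and the `params` layout are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar (single-unit) LSTM cell.

    `params` maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state h and the new cell (memory) state c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of old memory kept
    i = sigmoid(pre_activation("input"))        # share of new candidate admitted
    g = math.tanh(pre_activation("candidate"))  # candidate memory content
    o = sigmoid(pre_activation("output"))       # share of memory exposed

    c = f * c_prev + i * g  # memory update: keep part of the old, add part of the new
    h = o * math.tanh(c)
    return h, c


# Illustrative parameters: the large forget bias keeps almost all of the old
# memory, and the very negative input bias admits almost nothing new.
keep_memory = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.5, params=keep_memory)
```

A forget gate near 1 carries a trend forward across many periods, while a forget gate near 0 discards the old state, which is how a single cell can realize different effective memory lengths.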
Third, a more “intelligent” forecasting model. Unlike a traditional neural network, which simply minimizes a fixed loss by gradient descent during optimization, the GAN introduces a “game”: the discriminator compares the predictions produced by the generator with the real data and classifies them, thereby evaluating and rejecting poor predictions. Training converges only when the data produced by the generator can fool the discriminator. By introducing the discriminator as a “competitor”, the GAN models gain a structural advantage over simpler models.
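The “game” between the two networks can be sketched through the loss functions of a standard GAN. This is the common binary cross-entropy formulation with the non-saturating generator loss, shown only as an illustration; the announcement does not state the paper's exact objective, and the function names below are hypothetical. The inputs are the discriminator's probabilities that a sample is real.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs on real samples (target label 1).
    d_fake: discriminator outputs on generated samples (target label 0).
    """
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss.

    It is low when the discriminator assigns generated samples a high
    probability of being real, that is, when the generator fools it.
    """
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# The generator is rewarded as the discriminator's score on its samples rises.
fooled = generator_loss([0.9])
caught = generator_loss([0.1])
```

In training, the two losses are minimized in alternation, so each network's improvement raises the bar for the other.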