Professor Fuwei Jiang’s paper “Scaled PCA: A New Approach to Dimension Reduction” has recently been accepted by Management Science, a leading international journal in economics and management. The acceptance marks an important achievement for our school in interdisciplinary fintech research. Professor Jiang graduated from Singapore Management University and is currently a professor of finance, a doctoral supervisor, and director of the Department of Financial Engineering at our school. Management Science was launched in 1954 and is now published by the Institute for Operations Research and the Management Sciences (INFORMS). It is one of the oldest and most respected journals in management and operations research. The journal receives about 1,000 submissions from around the world each year and publishes roughly 5% of them. Its impact factor was 3.94 in 2020, and its five-year impact factor was 5.47.
Data and information are ubiquitous in the era of big data and artificial intelligence. Big data gives researchers richer information, but its high dimensionality also causes problems: models can break down, and the curse of dimensionality sets in. How to reduce the dimension of big data efficiently, extract the features most useful for subsequent prediction, suppress redundant noise, and avoid the curse of dimensionality is currently a central question in data science research.
Principal Component Analysis (PCA) is probably the most widely used dimension reduction method in economics and management. By choosing linear projections that maximize variance, PCA maps high-dimensional data to a low-dimensional representation, substantially reducing the impact of redundant noise and the curse of dimensionality. PCA has become an integral part of many data-driven decision models in economics and management. However, Professor Jiang and his co-authors identify, and remedy, a crucial weakness of classical PCA: it is an unsupervised method that completely ignores information about the prediction target.
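To make the contrast concrete, here is a minimal sketch of classical PCA factor extraction in Python with NumPy. The helper name `pca_factors` and the choice to standardize every column first are our own assumptions for illustration, not code from the paper. Note that no target variable enters the computation at any point, which is exactly the weakness described above.

```python
import numpy as np

def pca_factors(X: np.ndarray, k: int):
    """Hypothetical helper: extract k PCA factors from a T x N panel X."""
    # Standardize each column so no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Thin SVD: Z = U diag(s) Vt; rows of Vt are the principal directions,
    # ordered by the variance they explain.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:k].T        # N x k linear projection weights
    factors = Z @ loadings     # T x k factor scores (equal to U[:, :k] * s[:k])
    return factors, loadings

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
F, L = pca_factors(X, k=3)
print(F.shape, L.shape)  # (200, 3) (50, 3)
```

The factors are mutually uncorrelated and ordered by variance, but that ordering reflects only how much each direction varies, not how useful it is for forecasting anything.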
In the paper, Professor Jiang proposes a new supervised dimension reduction method, “Scaled PCA,” and shows that it predicts more accurately than classical PCA. Unlike classical PCA, Scaled PCA uses information about the target to rescale the original data before extracting factors: variables with strong predictive power receive larger weights and hence larger variances, while variables with weak predictive power receive smaller ones. This reduces the influence of redundant variables and noise on the dimension reduction. In other words, Scaled PCA retains as much of the target-relevant information in the original data as possible using fewer dimensions, which makes dimension reduction more efficient and predictions more accurate, while the algorithm itself remains simple and easy to understand. Scaled PCA is therefore likely to be of considerable practical value in data-driven empirical research in economics and management.
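The scale-then-extract idea can be sketched in a few lines. This is our own hypothetical helper, not the authors' code, and it rests on two assumptions: each standardized predictor is multiplied by the slope from a univariate OLS regression of the target on that predictor, and `y` has already been aligned with `X` (for an h-step-ahead forecast, `y[t]` would hold the value at t + h). PCA is then applied to the scaled panel as usual.

```python
import numpy as np

def scaled_pca_factors(X: np.ndarray, y: np.ndarray, k: int):
    """Hypothetical helper: k 'scaled PCA' factors of a T x N panel X for target y.

    Assumes y[t] is the value to be predicted from X[t] (already shifted by the horizon).
    """
    # Standardize predictors and demean the target.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Univariate OLS slope of y on each predictor (intercept absorbed by demeaning).
    beta = Z.T @ yc / (Z ** 2).sum(axis=0)
    # Scale each predictor by its slope: strong predictors gain variance, weak ones lose it.
    Zs = Z * beta
    # Ordinary PCA on the scaled panel (columns are still centered, so SVD suffices).
    _, _, Vt = np.linalg.svd(Zs, full_matrices=False)
    loadings = Vt[:k].T
    factors = Zs @ loadings
    return factors, loadings, beta

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
y = 2.0 * X[:, 0] + rng.standard_normal(300)   # only the first predictor matters
F, L, beta = scaled_pca_factors(X, y, k=1)
print(int(np.argmax(np.abs(beta))))  # 0
```

Because the slope for the first predictor is close to 2 while the others are near zero, the scaled panel's variance is concentrated in that predictor, so the first factor essentially tracks it. Classical PCA on the same `X` would treat all 20 columns alike, since it never looks at `y`.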
The paper shows, both theoretically and empirically, that Scaled PCA forecasts better than classical PCA. On the theoretical side, it shows that under certain weak-factor structures, in which noise accounts for most of the variation in the data, Scaled PCA still estimates the relevant factors consistently while classical PCA fails; under strong-factor structures, Scaled PCA outperforms classical PCA in small samples and gives greater prominence to the factors with strong predictive power for the target. Empirically, the paper studies forecasts of economic growth, the unemployment rate, inflation, and financial market volatility. It finds that the factors relevant for forecasting financial volatility and inflation may be weak: classical PCA cannot extract them effectively, but Scaled PCA can, so Scaled PCA consistently beats classical PCA both in-sample and out-of-sample when forecasting volatility and inflation. By contrast, the factors relevant for forecasting economic growth and unemployment may be strong and can be extracted effectively by both methods; even so, Scaled PCA delivers more accurate forecasts when only a few factors are used and out of sample, which is the setting that matters most for policy practice.
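The following toy simulation uses a data-generating process we made up for illustration; it is not a replication of the paper's theory or data. It shows how scaling can reorder factors. A target-relevant factor `f` loads on only 10 of 100 predictors, while an irrelevant factor `g` loads on the other 90 and therefore dominates total variance. The scaling step assumes each standardized predictor is multiplied by its univariate OLS slope on the target.

```python
import numpy as np

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_factor(Z):
    """First principal-component factor of a column-centered panel Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def r2(a, b):
    """Squared correlation (sign-invariant, since PCA signs are arbitrary)."""
    return np.corrcoef(a, b)[0, 1] ** 2

rng = np.random.default_rng(42)
T, n_rel, n_irr = 500, 10, 90
f = rng.standard_normal(T)   # target-relevant factor, loads on 10 predictors only
g = rng.standard_normal(T)   # irrelevant factor, loads on the other 90 predictors
X = np.hstack([
    f[:, None] + rng.standard_normal((T, n_rel)),
    g[:, None] + rng.standard_normal((T, n_irr)),
])
y = f + rng.standard_normal(T)   # the target depends on f only

Z = standardize(X)
yc = y - y.mean()
beta = Z.T @ yc / (Z ** 2).sum(axis=0)   # univariate OLS slopes of y on each predictor

pca_f1 = first_factor(Z)          # classical PCA: ranks directions by raw variance
spca_f1 = first_factor(Z * beta)  # scaled PCA: ranks directions by target-weighted variance

print(f"R^2 with f:  PCA {r2(pca_f1, f):.2f}   scaled PCA {r2(spca_f1, f):.2f}")
print(f"R^2 with g:  PCA {r2(pca_f1, g):.2f}   scaled PCA {r2(spca_f1, g):.2f}")
```

With this setup, classical PCA's first factor should track `g` and be nearly uncorrelated with `f`, whereas the first scaled-PCA factor should track `f`. This is an in-sample comparison in which the slopes see the same `y` used to judge them; an honest out-of-sample check would estimate the slopes on a training window and forecast the remaining periods.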