Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals


Alberg, John, and Zachary C. Lipton. “Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals.” arXiv preprint arXiv:1711.04837 (2017).

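Authors: John Alberg, Zachary C. Lipton

Translator: 笪洁琼

Proofreader: 吴谣

Editor: 陈益恒

Summary: This paper proposes an automated stock-market prediction method based on time-series analysis. Deep neural networks are trained on fundamental indicators over a trailing 5-year observation window, and a stock-selection model built on the predicted fundamentals is evaluated by backtesting.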

Abstract

On a periodic basis, publicly traded companies are required to report fundamentals: financial data such as revenue, operating income, and debt, among others. These data points provide some insight into the financial health of a company. Academic research has identified some factors, i.e., features computed from the reported data, that are known through retrospective analysis to outperform the market average. Two popular factors are the book value normalized by market capitalization (book-to-market) and the operating income normalized by the enterprise value (EBIT/EV).

In this paper, we first show through simulation that if we could (clairvoyantly) select stocks using factors calculated on future fundamentals (via an oracle), then our portfolios would far outperform a standard factor approach. Motivated by this analysis, we train deep neural networks to forecast future fundamentals based on a trailing 5-year window. Quantitative analysis demonstrates a significant improvement in MSE over a naive strategy. Moreover, in retrospective analysis using an industry-grade stock portfolio simulator (backtester), we show an improvement in compounded annual return to 17.1% (MLP) vs. 14.4% for a standard factor model.

1 Introduction

Public stock markets provide a venue for buying and selling shares, which represent fractional ownership of individual companies. Prices fluctuate frequently, and the myriad drivers of price movements operate on multiple time scales. In the short run, price movements might reflect the dynamics of order execution and the behavior of high-frequency traders. On the scale of days, price fluctuations might be driven by the news cycle: individual stocks may rise or fall on rumors or on reports of sales numbers, product launches, etc. In the long run, we expect a company's market value to reflect its financial performance, as captured in fundamental data, i.e., reported financial information such as income, revenue, assets, dividends, and debt. In other words, shares represent ownership in a company, so share prices should ultimately move toward the company's intrinsic value: the cumulative discounted cash flows associated with that ownership. One popular strategy, called value investing, is predicated on the idea that long-run prices reflect this intrinsic value and that the best features for predicting long-term intrinsic value are the currently available fundamental data.

In a typical quantitative (systematic) investing strategy, we sort the set of available stocks according to some factor and construct investment portfolios composed of the stocks that score highest. Many quantitative investors engineer value factors by taking the ratio of fundamental data to the stock's price, such as EBIT/EV or book-to-market. Stocks with high value-factor ratios are called value stocks, and those with low ratios are called growth stocks. Academic researchers have demonstrated empirically that, over the long run, portfolios that overweight value stocks have significantly outperformed portfolios that overweight growth stocks [12, 7].
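
The sort-and-select step described above can be sketched in a few lines of Python. This is an illustration, not the authors' code: the Stock record, its field names, and the rule of skipping non-positive enterprise values are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """Hypothetical one-month snapshot of a stock's fundamentals."""
    ticker: str
    ebit: float              # operating income, trailing twelve months
    enterprise_value: float  # assumed precomputed upstream


def ebit_ev(stock: Stock) -> float:
    """Value factor: operating income normalized by enterprise value."""
    return stock.ebit / stock.enterprise_value


def top_by_factor(stocks: list[Stock], n: int) -> list[str]:
    """Sort stocks by EBIT/EV, highest first, and return the top-n tickers.

    Stocks with non-positive enterprise value are skipped because the ratio
    is not meaningful for them (a choice made for this sketch).
    """
    eligible = [s for s in stocks if s.enterprise_value > 0]
    ranked = sorted(eligible, key=ebit_ev, reverse=True)
    return [s.ticker for s in ranked[:n]]


universe = [
    Stock("AAA", ebit=10.0, enterprise_value=100.0),  # EBIT/EV = 0.10
    Stock("BBB", ebit=30.0, enterprise_value=150.0),  # EBIT/EV = 0.20
    Stock("CCC", ebit=5.0, enterprise_value=200.0),   # EBIT/EV = 0.025
    Stock("DDD", ebit=8.0, enterprise_value=-50.0),   # skipped: EV < 0
]
print(top_by_factor(universe, n=2))  # ['BBB', 'AAA']
```

In the paper's simulations the same kind of ranking selects the top 50 stocks each month; a lookahead factor model would feed a predicted EBIT into the ratio instead of the reported one.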

To be presented at the Time Series Workshop at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

In this paper, we propose an investment strategy that constructs portfolios of stocks today based on predicted future fundamentals. Recall that value factors should identify companies that are inexpensively priced with respect to current company fundamentals such as earnings or book value. We suggest that the long-term success of an investment should depend on how well priced the stock currently is with respect to its future fundamentals. We run simulations with a clairvoyant model that can access future financial reports (via an oracle).

Figure 1: Annualized return for various factor models for different degrees of clairvoyance

In Figure 1, we demonstrate that for the 2000-2014 time period, a clairvoyant model applying the EBIT/EV factor to fundamentals 12 months in the future (were such foresight possible) would achieve a 44% compound annualized return.

Motivated by the performance of factors applied to clairvoyant future data, we propose to predict future fundamental data based on a trailing time series of 5 years of fundamental data. We denote these algorithms Lookahead Factor Models (LFMs). Both multilayer perceptrons (MLPs) and recurrent neural networks (RNNs) can make informative predictions, achieving an out-of-sample MSE of 0.47, vs. 0.53 for linear regression and 0.62 for a naive predictor. Simulations demonstrate that investing with LFMs based on the predicted factors yields a compound annualized return (CAR) of 17.1%, vs. 14.4% for a normal factor model, and a Sharpe ratio of 0.68 vs. 0.55.
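
For reference, CAR and the Sharpe ratio quoted above can be computed from a series of monthly portfolio returns roughly as sketched below. The paper does not spell out its formulas, so the monthly frequency, the square-root-of-12 annualization, and the zero default risk-free rate are assumptions of this sketch.

```python
import math
import statistics


def compound_annual_return(monthly_returns: list[float]) -> float:
    """Annualized geometric growth rate implied by a series of monthly returns."""
    growth = math.prod(1.0 + r for r in monthly_returns)
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0


def annualized_sharpe(monthly_returns: list[float], monthly_risk_free: float = 0.0) -> float:
    """Mean monthly excess return over its sample standard deviation, times sqrt(12).

    The sqrt(12) annualization and the zero default risk-free rate are
    common conventions assumed here, not details taken from the paper.
    """
    excess = [r - monthly_risk_free for r in monthly_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(12)


# Two years of hypothetical returns alternating between +2% and -1% per month.
returns = [0.02, -0.01] * 12
print(round(compound_annual_return(returns), 4))
print(round(annualized_sharpe(returns), 2))
```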

Related Work Deep neural network models have proven powerful for tasks as diverse as language translation [14, 1], video captioning [11, 16], video recognition [6, 15], and time series modeling [9, 10, 3]. A number of recent papers consider deep learning approaches to predicting stock market performance. [2] evaluates MLPs for stock market prediction. [5] uses recursive tensor nets to extract events from CNN news reports and uses convolutional neural nets to predict future performance from a sequence of extracted events. Several preprint drafts consider deep learning for stock market prediction [4, 17, 8]; however, in all cases, the empirical studies are limited to a few stocks and short time periods.

2 Deep Learning for Forecasting Fundamentals

Data In this research, we consider all stocks that were publicly traded on the NYSE, NASDAQ, or AMEX exchanges for at least 12 consecutive months between January 1970 and September 2017. From this list, we exclude non-US-based companies, financial sector companies, and any company with an inflation-adjusted market capitalization below 100 million dollars. The final list contains 11,815 stocks. Our features consist of reported financial information as archived by the Compustat North America and Compustat Snapshot databases. Because reported information arrives intermittently throughout a financial period, we discretize the raw data to a monthly time step. Because we are interested in long-term predictions, and to smooth out seasonality in the data, at every month we feed in inputs with a 1-year lag between time frames and predict the fundamentals 12 months into the future.
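
A minimal sketch of the monthly discretization, assuming each report is stored as a (month index, value) pair. The encoding is hypothetical, and the carry-forward rule mirrors the fill-forward treatment of missing values that the paper applies to all features.

```python
from bisect import bisect_right
from typing import Optional


def to_monthly(reports: list[tuple[int, float]], n_months: int) -> list[Optional[float]]:
    """Discretize intermittently reported values onto a monthly grid.

    reports holds (month_index, value) pairs, with month_index counting months
    from the start of the sample (a hypothetical encoding). Each month takes
    the most recent value reported at or before it, so gaps are filled
    forward; months before the first report stay None.
    """
    reports = sorted(reports)
    report_months = [month for month, _ in reports]
    series: list[Optional[float]] = []
    for month in range(n_months):
        i = bisect_right(report_months, month)  # reports available by this month
        series.append(reports[i - 1][1] if i > 0 else None)
    return series


# Filings arriving at uneven intervals.
print(to_monthly([(1, 10.0), (4, 12.0), (8, 11.5)], n_months=10))
# [None, 10.0, 10.0, 10.0, 12.0, 12.0, 12.0, 12.0, 11.5, 11.5]
```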

For each stock and at each time step t, we consider a total of 20 input features. We engineer 16 features from the fundamentals as inputs to our models. Income statement features are cumulative over the trailing twelve months, denoted TTM, and balance sheet features are taken from the most recent quarter, denoted MRQ. These items are: revenue (TTM); cost of goods sold (TTM); selling, general and administrative expense (TTM); earnings before interest and taxes, or EBIT (TTM); net income (TTM); cash and cash equivalents (MRQ); receivables (MRQ); inventories (MRQ); other current assets (MRQ); property, plant and equipment (MRQ); other assets (MRQ); debt in current liabilities (MRQ); accounts payable (MRQ); taxes payable (MRQ); other current liabilities (MRQ); and total liabilities (MRQ). For all features, we deal with missing values by filling forward previously observed values, following the methods of [9]. Additionally, we incorporate 4 momentum features, which indicate the price movement of the stock over the previous 1, 3, 6, and 9 months, respectively. So that our model picks up on relative changes and doesn't focus overly on trends in specific time periods, we use the percentile among all stocks as a feature (vs. absolute numbers).
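
The four momentum features and the cross-sectional percentile transform could look like the sketch below. It assumes monthly closing prices are available per stock and that the percentile is taken across all stocks in the same month; the function names, the [0, 1] scaling, and the tie rule are illustrative choices, not details from the paper.

```python
def momentum(prices: list[float], t: int, horizons: tuple[int, ...] = (1, 3, 6, 9)) -> list[float]:
    """Fractional price change over each look-back horizon (in months), at month t."""
    return [prices[t] / prices[t - h] - 1.0 for h in horizons]


def percentile_ranks(values: dict[str, float]) -> dict[str, float]:
    """Cross-sectional percentile of each stock's value within one month.

    Results lie in [0, 1] and tied values share their average rank; both the
    scaling and the tie rule are choices made for this sketch.
    """
    items = sorted(values.items(), key=lambda kv: kv[1])
    n = len(items)
    ranks: dict[str, float] = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1  # extend the group of tied values
        pct = (i + j) / 2 / (n - 1) if n > 1 else 0.5
        for k in range(i, j + 1):
            ranks[items[k][0]] = pct
        i = j + 1
    return ranks


# Hypothetical 1-month momentum for three stocks, converted to percentiles.
prices = {
    "AAA": [100.0, 110.0],
    "BBB": [50.0, 45.0],
    "CCC": [20.0, 21.0],
}
mom_1m = {ticker: momentum(p, t=1, horizons=(1,))[0] for ticker, p in prices.items()}
print(percentile_ranks(mom_1m))  # {'BBB': 0.0, 'CCC': 0.5, 'AAA': 1.0}
```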

Preprocessing Each of the fundamental features exhibits a wide dynamic range over the universe of considered stocks. For example, Apple's 52-week revenue as of September 2016 was $215 billion (USD). By contrast, National Presto Industries (NYSE: NPK), which manufactures pressure cookers, had revenue of $340 million. Intuitively, these statistics are more meaningful when scaled by some measure of a company's size. In preprocessing, we scale all fundamental features in a given time series by the market capitalization at the last input time step of the series. We scale all time steps by the same value so that the neural network can assess the relative change in fundamental values between time steps. While other notions of size are used, such as enterprise value and book equity, we avoid these measures because they can, although rarely, take negative values. We then further scale the features so that each individually has zero mean and unit standard deviation.
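
A sketch of the two preprocessing steps, assuming each training example is a window of per-step feature vectors with the matching market caps. The function names are hypothetical, and the text does not say which rows the mean and standard deviation are estimated on.

```python
import statistics


def scale_by_market_cap(window: list[list[float]], market_caps: list[float]) -> list[list[float]]:
    """Divide every fundamental in a window by the market cap at its last input step.

    window[s][f] is feature f at input time step s; using one divisor for all
    steps preserves the relative change between time steps.
    """
    last_cap = market_caps[-1]
    return [[x / last_cap for x in step] for step in window]


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each feature column to zero mean and unit standard deviation.

    A constant column is only centered (its std of 0 is replaced by 1), a
    guard added for this sketch.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Two input steps, two features, market cap (hypothetical units) rising from 50 to 100.
window = [[10.0, 20.0], [15.0, 30.0]]
print(scale_by_market_cap(window, [50.0, 100.0]))  # [[0.1, 0.2], [0.15, 0.3]]
print(standardize([[1.0, 2.0], [3.0, 2.0]]))        # [[-1.0, 0.0], [1.0, 0.0]]
```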

Modeling In our experiments, we divide the timeline into an in-sample and an out-of-sample period. Then, even within the in-sample period, we need to partition some of the data as a validation set. In forecasting problems, we face distinct challenges in guarding against overfitting. First, we are concerned with the traditional form of overfitting: within the in-sample period, we do not want to overfit to the finite observed training sample. To protect against and quantify this form of overfitting, we randomly hold out a validation set consisting of 30% of all stocks. On this in-sample validation set, we determine all hyperparameters, such as learning rate, model architecture, and objective function weighting. We also use the in-sample validation set to determine early stopping criteria. When training, we record the validation set performance after each training epoch, saving the model whenever it achieves a new best score. When 25 epochs have passed without improving on the best validation set performance, we halt training and select the model with the best validation performance. Second, in addition to generalizing well to the in-sample holdout set, the model must generalize forward in time, so we evaluate whether it can predict future out-of-sample stock performance. Since this research is focused on long-term investing, we chose large in-sample and out-of-sample periods of the years 1970-1999 and 2000-2017, respectively.
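
The validation protocol can be summarized in code. This sketch is not the authors' training loop: it assumes training is exposed through two hypothetical callbacks, and it shows the three ingredients named above, a 30% holdout drawn by stock, the 1970-1999 / 2000-2017 period split, and early stopping with a patience of 25 epochs.

```python
import random
from typing import Any, Callable


def split_by_stock(tickers: list[str], val_fraction: float = 0.3,
                   seed: int = 0) -> tuple[set[str], set[str]]:
    """Randomly hold out a fraction of stocks (all of their months) for validation."""
    shuffled = sorted(tickers)  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return set(shuffled[n_val:]), set(shuffled[:n_val])


def is_in_sample(year: int) -> bool:
    """1970-1999 is in-sample; 2000-2017 is held back for out-of-sample tests."""
    return 1970 <= year <= 1999


def train_with_early_stopping(
    train_epoch: Callable[[], Any],
    validation_mse: Callable[[Any], float],
    patience: int = 25,
    max_epochs: int = 1000,
) -> tuple[Any, float]:
    """Stop once `patience` epochs pass without a new best validation MSE.

    train_epoch() runs one epoch and returns a model snapshot;
    validation_mse(model) scores it (lower is better). Both callbacks and
    max_epochs are hypothetical placeholders.
    """
    best_model, best_mse, epochs_since_best = None, float("inf"), 0
    for _ in range(max_epochs):
        model = train_epoch()
        mse = validation_mse(model)
        if mse < best_mse:  # new best: keep this snapshot
            best_model, best_mse, epochs_since_best = model, mse, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best_model, best_mse


train, val = split_by_stock([f"S{i:02d}" for i in range(10)])
print(len(train), len(val))  # 7 3
```

Holding out whole stocks rather than individual rows plausibly keeps windows from the same company on one side of the split, so the validation score is not flattered by near-duplicate examples.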

In previous experiments, we tried predicting price movements directly with RNNs. While the RNN outperformed other approaches on the in-sample period, it failed to meaningfully outperform a linear model (see results in Table 2a).

Given only price data, RNNs easily overfit the training data while failing to improve performance on in-sample validation. One key benefit of our approach is that by doing multi-task learning, predicting all 16 future fundamentals, we provide the model with considerable training signal and may thus make it less susceptible to overfitting.

The price movement of stocks is extremely noisy [13], so, suspecting that the relationships among fundamental data may have a larger signal-to-noise ratio than the relationship between fundamentals and price, we set up the problem as follows. For MLPs, at each month t, given features for 5 time steps spaced 1 year apart (t − 48, t − 36, t − 24, t − 12, t), we predict the fundamental data at time t + 12. For RNNs, the setup is identical, with the small modification that for each input in the sequence, we predict the corresponding data 12 months ahead.
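
The windowing for one stock and one month can be sketched as follows, assuming the monthly features have already been assembled into a list indexed by month. The function name and the list-of-lists layout are hypothetical; the offsets follow the setup above, and the RNN's last target coincides with the MLP target.

```python
def make_example(series: list[list[float]], t: int, n_steps: int = 5,
                 lag: int = 12, horizon: int = 12):
    """Build the inputs and targets for one stock at month t.

    series[m] is the feature vector for month m. Inputs are n_steps months
    spaced `lag` apart and ending at t (t-48, t-36, t-24, t-12, t with the
    defaults). The MLP target is month t + horizon; the RNN gets one target
    per input step, `horizon` months after that step.
    """
    input_months = [t - lag * k for k in range(n_steps - 1, -1, -1)]
    if input_months[0] < 0 or t + horizon >= len(series):
        raise ValueError("window extends beyond the available history")
    inputs = [series[m] for m in input_months]
    mlp_target = series[t + horizon]
    rnn_targets = [series[m + horizon] for m in input_months]
    return inputs, mlp_target, rnn_targets


# Toy series whose single feature equals the month index.
toy = [[float(m)] for m in range(80)]
inputs, mlp_target, rnn_targets = make_example(toy, t=60)
print(inputs)       # [[12.0], [24.0], [36.0], [48.0], [60.0]]
print(mlp_target)   # [72.0]
print(rnn_targets)  # [[24.0], [36.0], [48.0], [60.0], [72.0]]
```

For the MLP, the five input vectors would be concatenated into a single input; the RNN consumes them as a sequence.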

We evaluate two classes of deep neural networks: MLPs and RNNs. For each of these, we tune hyperparameters on the in-sample period and then evaluate the resulting model on the out-of-sample period. For both MLPs and RNNs, we consider architectures with 1, 2, or 4 layers of 64, 128, 256, 512, or 1024 nodes each. We also evaluate the use of dropout both on the inputs and between hidden layers. For MLPs, we use ReLU activations and apply batch normalization between layers. For RNNs, we test both GRU and LSTM cells with layer normalization. We also searched over various optimizers (SGD, AdaGrad, AdaDelta), settling on AdaDelta. We also apply L2-norm gradient clipping to RNNs to prevent exploding gradients. Our optimization objective is to minimize square loss.
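
To make the size of this search concrete, the sketch below enumerates the architecture grid described above and shows one common form of L2-norm gradient clipping (rescaling the whole gradient vector when its norm exceeds a threshold). The dropout values are placeholders because the text does not list them, and the paper does not say whether clipping was applied per tensor or globally; both are assumptions here.

```python
import itertools
import math

# Search space from the text; the dropout rates are hypothetical values.
N_LAYERS = [1, 2, 4]
WIDTHS = [64, 128, 256, 512, 1024]
DROPOUT_RATES = [0.0, 0.25, 0.5]  # assumption: not reported in the text
RNN_CELLS = ["GRU", "LSTM"]


def mlp_grid() -> list[dict]:
    """Every MLP configuration: depth x width x input dropout x hidden dropout."""
    return [
        {"layers": depth, "width": width, "input_dropout": d_in, "hidden_dropout": d_hid}
        for depth, width, d_in, d_hid in itertools.product(
            N_LAYERS, WIDTHS, DROPOUT_RATES, DROPOUT_RATES)
    ]


def rnn_grid() -> list[dict]:
    """The same grid crossed with the choice of recurrent cell."""
    return [{"cell": cell, **cfg} for cell in RNN_CELLS for cfg in mlp_grid()]


def clip_by_l2_norm(grads: list[float], max_norm: float) -> list[float]:
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


print(len(mlp_grid()), len(rnn_grid()))  # 135 270
print(clip_by_l2_norm([3.0, 4.0], max_norm=1.0))
```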

Because we care more about our prediction of EBIT than about the other fundamental values, we up-weight it in the loss (introducing a hyperparameter α1). For RNNs, because we care primarily about the accuracy of the prediction at the final time step (of 5), we up-weight the loss at the final time step by a further hyperparameter (as in [9]). Some results from our hyperparameter search on in-sample data are displayed in Table 1. These hyperparameters resulted in MSE on in-sample validation data of 0.6141 and 0.6109 for the MLP and RNN, respectively.
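
One way to write this weighted multi-task objective is sketched below. The text names the EBIT weight α1 and an up-weighting of the RNN's final time step but gives no formula, so the per-element weighting and the normalization by total weight are assumptions; the parameter names are hypothetical.

```python
def weighted_square_loss(
    pred: list[list[float]],
    target: list[list[float]],
    ebit_index: int,
    alpha_ebit: float,
    final_step_weight: float = 1.0,
) -> float:
    """Weighted squared error over all time steps and fundamentals.

    pred[s][f] / target[s][f]: time step s, fundamental f. The EBIT column is
    up-weighted by alpha_ebit (alpha_1 in the text) and, for RNNs, the last
    time step by final_step_weight. Dividing by the total weight is an
    assumption of this sketch.
    """
    last = len(pred) - 1
    total, weight_sum = 0.0, 0.0
    for s, (p_row, t_row) in enumerate(zip(pred, target)):
        step_weight = final_step_weight if s == last else 1.0
        for f, (p, y) in enumerate(zip(p_row, t_row)):
            w = step_weight * (alpha_ebit if f == ebit_index else 1.0)
            total += w * (p - y) ** 2
            weight_sum += w
    return total / weight_sum


# MLP case: a single output step, EBIT in column 0 weighted 3x.
print(weighted_square_loss([[1.0, 2.0]], [[0.0, 0.0]], ebit_index=0, alpha_ebit=3.0))  # 1.75
# RNN case: two steps, the last one weighted 4x.
print(weighted_square_loss([[1.0], [1.0]], [[0.0], [3.0]], ebit_index=0,
                           alpha_ebit=1.0, final_step_weight=4.0))  # 3.4
```

With a single output step, as for the MLP, the final-step weight cancels out of the normalized loss, so only α1 matters there.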

(a) Out-of-sample performance for the 2000-2014 time period. All factor models use EBIT/EV. QFM uses current EBIT, while our proposed LFMs use predicted EBIT. Price-LSTM is trained to predict price directly.

(b) MSE over the out-of-sample period for the MLP (orange) and the naive predictor (black)

Figure 2: Quantitative results

Evaluation As a first step in evaluating the forecasts produced by the neural networks, we compare the MSE of the predicted fundamentals on out-of-sample data with that of a naïve prediction, in which the predicted fundamentals at time t are assumed to be the same as the fundamentals at t − 12. To compare the practical utility of traditional factor models vs. lookahead factor models, we employ an industry-grade investment simulator. The simulator evaluates hypothetical stock portfolios constructed on out-of-sample data. Simulated investment returns reflect how an investor might have performed had they invested in the past according to a given strategy.
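
The MSE comparison against the naive predictor can be sketched as below for a single fundamental of a single stock. The data layout (a monthly list plus a dict of forecasts keyed by target month) is an assumption for illustration; a full evaluation would presumably pool the errors across stocks and fundamentals.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


def model_vs_naive(series: list[float], model_preds: dict[int, float],
                   horizon: int = 12) -> tuple[float, float]:
    """MSE of model forecasts and of the naive 'no change' forecast.

    series[m] is a scaled fundamental at month m; model_preds[m] is the model's
    forecast for month m, made `horizon` months earlier. The naive forecast
    for month m is simply series[m - horizon].
    """
    months = sorted(model_preds)
    actual = [series[m] for m in months]
    model_mse = mse([model_preds[m] for m in months], actual)
    naive_mse = mse([series[m - horizon] for m in months], actual)
    return model_mse, naive_mse


# A fundamental that steps from 0 to 1; the model forecasts 0.8 for every month.
series = [0.0] * 12 + [1.0] * 12
preds = {m: 0.8 for m in range(12, 24)}
print(model_vs_naive(series, preds))
```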

Table 1: Final hyperparameters for MLP and RNN

The simulation results reflect assets under management at the start of each month that, when adjusted by the S&P 500 Index price to January 2010, are equal to $100 million. We construct portfolios by ranking all stocks according to the factor EBIT/EV in each month and investing equal amounts of capital in the top 50 stocks, holding each stock for one year. When a stock falls out of the top 50 after one year, it is sold, and the proceeds are reinvested in another highly ranked stock that is not currently in the simulated portfolio. We limit the number of shares of a security bought or sold in a month to no more than 10% of the security's monthly volume. Simulated prices for stock purchases and sales are based on the volume-weighted daily closing price of the security during the first 10 trading days of each month. If a stock paid a dividend during the period it was held, the dividend was credited to the simulated fund in proportion to the shares held. Transaction costs are factored in as $0.01 per share, plus an additional slippage factor that increases with the square of the simulation's volume participation in a security. Specifically, if participating at the maximum 10% of monthly volume, the simulation buys at 1% more than the average market price and sells at 1% less than the average market price. Slippage accounts for transaction friction, such as bid/ask spreads, that exists in real-life trading.
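
The trading-friction rules above translate into a small cost model. This sketch assumes the quadratic slippage is scaled so that it equals exactly 1% at the 10% participation cap, which matches the numbers given; the function names are hypothetical, and the simulator's handling of dividends, rebalancing, and AUM scaling is not reproduced.

```python
MAX_PARTICIPATION = 0.10     # trade at most 10% of a security's monthly volume
SLIPPAGE_AT_MAX = 0.01       # 1% price penalty at maximum participation
COMMISSION_PER_SHARE = 0.01  # $0.01 per share


def capped_shares(desired: int, monthly_volume: int) -> int:
    """Limit an order to the maximum share of monthly volume."""
    return min(desired, int(monthly_volume * MAX_PARTICIPATION))


def execution_cash(shares: int, avg_price: float, monthly_volume: int, side: str) -> float:
    """Cash paid for a buy, or received for a sell, after slippage and commission.

    Slippage grows with the square of volume participation, reaching 1% at
    10% participation; avg_price stands in for the volume-weighted price.
    """
    participation = shares / monthly_volume
    slippage = SLIPPAGE_AT_MAX * (participation / MAX_PARTICIPATION) ** 2
    if side == "buy":
        return shares * (avg_price * (1 + slippage) + COMMISSION_PER_SHARE)
    if side == "sell":
        return shares * (avg_price * (1 - slippage) - COMMISSION_PER_SHARE)
    raise ValueError("side must be 'buy' or 'sell'")


volume = 1_000_000
shares = capped_shares(250_000, volume)  # capped at 100,000 shares
print(shares, execution_cash(shares, 20.0, volume, "buy"))
```

At half the maximum participation the quadratic rule gives a quarter of the penalty (0.25%), so smaller orders are penalized much less than proportionally.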

Our results demonstrate a clear advantage for the lookahead factor model. In nearly all months, however turbulent the market, the neural networks outperform the naive predictor, which assumes that fundamentals remain unchanged (Figure 2b). Simulated portfolios built on lookahead factor strategies with the MLP and the RNN perform similarly, both beating traditional factor models (Table 2a).

3 Discussion

In this paper, we demonstrate a new approach to automated stock market prediction based on time series analysis. Rather than predicting price directly, we predict future fundamental data from a trailing window of values. Retrospective analysis with an oracle motivates the approach, demonstrating the superiority of LFMs over standard factor approaches. In future work, we will thoroughly investigate the relative advantages of LFMs vs. directly predicting price. We also plan to investigate the effects of the sampling window, input length, and lookahead distance.
