Forecasting ETFs with Machine Learning Algorithms, 2017



Liew, Jim Kyung-Soo, and Boris Mayster. “Forecasting ETFs with Machine Learning Algorithms.” (2017).


Authors: Jim Kyung-Soo Liew and Boris Mayster

Translated by: 笪洁琼

Proofread by: 吴谣

Edited by: 小满

Summary: Against the backdrop of the efficient market hypothesis, this article uses three machine learning algorithms, deep neural networks (DNN), random forests (RF), and support vector machines (SVM), to predict the direction of future price moves for ETFs such as SPY (S&P 500), TIP (U.S. inflation-protected bonds), and FXE (long euro).


Machine learning and artificial intelligence (AI) algorithms have come home to roost. These algorithms, whether we like it or not, will continue to permeate our daily lives. Nowhere is this more evident than in their current uses in self-driving cars, spam filters, movie recommendation systems, credit fraud detection, geo-fencing marketing campaigns, and so forth. The usage of these algorithms will only expand and deepen going forward. Recently, Stephen Hawking issued a forewarning: “The automation of factories has already decimated jobs in traditional manufacturing, and the rise of AI is likely to extend this job destruction deep into the middle classes” (Price [2016]). Whether we agree or disagree with the virtues of automation, the only way to better utilize its potentials and evade its dangers is to gain a deeper knowledge and appreciation of these algorithms. Moreover, quite nearly upon us is the next big wave called the Internet of Things (IoT), whereby increasingly more devices and common household items will be interconnected and stream terabytes of data. As our society is deluged with data, the critical question that emerges is whether machine learning algorithms contribute a net benefit to or extract a net cost from society.


While the future looms large for machine learning and AI, one pocket of their development appears to have been deliberately left behind—namely, in finance and more so in hedge funds that attempt to predict asset prices to generate alpha for their clients. The reason is clear: One trader's gain in applying a well-trained learning algorithm is another's loss. This edge becomes a closely guarded secret, in many cases defining the hedge fund's secret sauce. In this work, we investigate the benefits of applying machine learning algorithms to this corner of the financial industry, which academic researchers have left unexamined.


We are essentially interested in understanding whether machine learning algorithms can be applied to predicting financial assets. More specifically, the goal is to program an AI to throw off profits by learning and besting other traders and, possibly, other machines. This goal is rumored to have already been achieved by a hedge fund, Renaissance Technologies' Medallion fund. The Medallion fund, cloaked in mystery to all but a few insiders, has generated amazing performance over an extended time period. Renaissance Technologies clearly has access to enough brain power to be on the cutting edge of any machine learning implementation, and as much as others have tried to replicate their success, Medallion's secret sauce recipe has yet to be cracked. In this article, we attempt to unravel several potential research paths in an attempt to shed light on how machine learning algorithms could be employed to trade financial market assets. We employ machine learning algorithms to build and test models of prediction of asset price direction for several well-known and highly liquid exchange-traded funds (ETFs).


Markets are efficient or, at a minimum, semi-strong-form efficient. All publicly available information is reflected in stock prices, and the pricing mechanism is extremely quick and efficient in processing new information sets. Attempting to gain an edge is nearly impossible, especially when one tries to process widely accessible public information. Investors are therefore better off holding a well-diversified portfolio of stocks. Clearly, Fama would side with those who have little faith in the ability of these machine learning algorithms to process known publicly available information and, with such information, gain an edge by trading.


Many researchers have documented evidence that asset prices are predictable. Jegadeesh and Titman [1993] and Rouwenhorst [1998] showed that past prices help predict returns. Fama and French [1992, 1993, 1995] showed that the fundamental factors of book-to-market and size affect returns. Liew and Vassalou [2000] documented the predictability of fundamental factors and linked these factors to future real gross domestic product growth. Keim [1983] documented seasonality in returns, with more pronounced performance in January. For more recent evidence on international predictability, see Asness, Moskowitz, and Pedersen [2013]. Whether the predictability stems from suboptimal behavior along the reasoning of Lakonishok, Shleifer, and Vishny [1994]; limits to arbitrage by Shleifer and Vishny [1997]; or some unidentified risk-based explanation by Fama and French [1992, 1993], it nonetheless appears that predictability exists in markets.


We employ the most advanced machine learning algorithms, namely, deep neural networks (DNNs), random forests (RFs), and support vector machines (SVMs). Our results are generally similar across the algorithms employed, with a slight advantage for RFs and SVMs over DNNs. We report results for the three distinct algorithms and are interested in predicting price changes in 10 ETFs. These ETFs were chosen for their popularity as well as their liquidity, and their historical data were sourced from Yahoo Finance. Because we are interested in predicting the change in prices over varying future periods, we employ daily data. The horizons that we attempt to predict range from trading days to weeks and months.


We test several information sets to determine which sets are most important in predicting across differing horizons. Our information sets are based on (A) prior prices, (B) prior volume, (C) dummies for days of the week and months of the year, and (ABC) all our information sets combined. We find that (B) volume is very important for predicting across the 20- to 60-day horizon. Additionally, we document that each feature has very low predictability, so we recommend that model builders use a wide range of features guided by financial experience and intuition. Our methodology was constructed to be robust and to allow for easy switching and testing of different information set specifications and securities.


The next section describes the procedures we employed and the assumptions made in this work, with a focus on applying best practices when applicable. Afterward, we discuss the machine learning algorithms employed. We then move into the details of our methodology and present our main results. Finally, we present our thoughts on implementation, weaknesses in our approach, implications, and conclusions.


PROCEDURE WITH EMBEDDED BEST PRACTICES

Machine learning algorithms are extremely powerful, and most can easily overfit any dataset. In machine learning parlance, overfitting is known as high variance. An overly complex model fit on the training set does not perform well out of sample, that is, on the test set. Prior to the introduction of cross-validation, researchers would rely on their best judgment as to whether a model was overfit. Currently, the best practices for building a machine learning predictive model are based on the holdout cross-validation procedure.


We therefore employ the holdout cross-validation procedure in our article. We split the data into three components: the training set, the validation set, and the test set. Best practices state that we should construct the model on the training set and validation set only and use the test set once. Repeatedly using the training set and validation set on that part of the data that does not contain the test set is known as holdout cross-validation. Although cross-validation is readily employed in many other fields, some criticize its use in financial time-series data. Nonetheless, we believe our results are still interesting because we are investigating the foundational question of whether using machine learning algorithms works in modeling changes in prices.
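A minimal sketch of this split, assuming scikit-learn (the article does not name its software stack) and synthetic placeholder data in place of the real ETF features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 20))                  # placeholder feature matrix
y = (rng.uniform(size=500) >= 0.5).astype(int)   # placeholder up/down labels

# 70% training / 30% test, as in the article; validation folds are
# later carved out of the training portion only, and the test set
# is touched exactly once.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
```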


The Irksome Stability Assumption

Arguably the most famous cross-sectional relationship is the capital asset pricing model (CAPM), which states that the expected return on any security is equal to the risk-free rate of return plus a risk premium. The CAPM's risk premium is defined as the security's beta multiplied by the excess return on the market. The observability of the market portfolio has been challenged by Roll's [1977] critique; however, generally speaking, the theoretical CAPM has become a mainstay in academics as well as in practice. To estimate beta, students are taught to run a time-series linear regression of the excess return of a given security on the excess return on the market. The covariance of a security and the market provides the fundamental building block on which the CAPM has been constructed.


In this work, however, breaking from some finance tradition, we view predictability disregarding the time-series structure. We make the irksome stability assumption that there is a stable relationship between future price changes and the many features employed across our information sets. That is, like modeling credit card fraud and email spam, we assume that the relationship between the response and the features is independent of time. For example, we allow our algorithms to capture the relationship that maps the feature input matrix (X) to the output responses (y). With that said, the features are always known prior to the change in prices.


To sum up, we incorporate the current best practice of balancing the overfitting (or high-variance) problem against the underfitting (or high-bias) problem. This is accomplished by separating out the training sample, performing k-fold cross-validation on the training sample, and employing the test sample only once. In this work, we adhere to this best practice when applicable.


We attempt to use machine learning algorithms to answer the following questions:

  1. What is the optimal prediction horizon for ETFs?
  2. What are the best information sets for such prediction horizons?


Because our dependent variable (y) is the direction of future price movements, either up or down, we are dealing with a supervised learning problem: The true value of the dependent variable is known a priori. We can also test the accuracy of our forecasts. Accuracy is measured as the percentage of correct predictions over the total number of predictions. Although we could choose from a vast number of algorithms, we restrict our analysis to the following three powerful and popular algorithms: DNNs, RFs, and SVMs.


DNNs

DNNs are neural networks with more than one hidden layer. Neural networks are composed of perceptrons, first introduced by Rosenblatt [1957], who built on the prior work of McCulloch and Pitts [1943]. McCulloch and Pitts introduced the first concept of a simplified brain cell: the McCulloch–Pitts neuron. Widrow and Hoff [1960] improved upon Rosenblatt's [1957] work by introducing a linear activation function, thus allowing the solution to be cast as the minimization of a cost function. The cost function is defined as the sum of squared errors, where an error, in the context of a supervised machine learning algorithm, is the predicted or hypothesized value minus the true value. The advantage of this setting is that changing only the activation function yields different techniques. Setting the activation function to either the logistic or the hyperbolic tangent function gives a multilayered neural network. If the network has more than one hidden layer, we arrive at a deep artificial neural network (see Raschka [2015]).


The parameters, or weights, in our DNN setting are determined by gradient descent. The process consists of initializing the weights across the neural network to small random numbers and then forward propagating the inputs throughout the network. At each node, the weights and input data are multiplied and aggregated, then sent through the prespecified activation function. Each layer's output is employed as input into the next layer, and the process repeats. Once the errors have been computed at the output, backward propagation is employed to adjust the weights. Weights are adjusted until some maximum number of iterations has been met or some minimum error threshold has been achieved. It should be noted that the reemergence of neural networks can be attributed to the contribution of backward propagation, which allowed for much quicker convergence to the optimal parameters. Without backward propagation, this technique would have taken too long to converge and would remain much less popular.
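The loop described above can be illustrated for the simplest case, a single linear unit trained by gradient descent on the sum-of-squared-errors cost (the Widrow–Hoff setting). This is a sketch with synthetic data, not the article's actual network:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(200, 3))
true_w = np.array([0.5, -0.3, 0.8])
y = X @ true_w                          # noiseless linear target

w = rng.uniform(-0.01, 0.01, size=3)    # small random initial weights
lr = 0.1
for _ in range(500):
    pred = X @ w                        # forward pass (linear activation)
    error = pred - y                    # predicted minus true value
    w -= lr * (X.T @ error) / len(y)    # gradient step on mean squared error
```

After enough iterations the weights recover the generating coefficients; a multilayer network repeats the same idea layer by layer via backpropagation.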


In our analysis, we employ a DNN algorithm (i.e., a neural network with more than one hidden layer). Between the input and output layers are the hidden layers. We employ two- and three-hidden-layer neural networks, and thus a DNN, in this work. Recently, a deep learning neural network beat the best human champion in the game of Go, showing that this algorithm can be employed in surprising ways.
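One point from the hyperparameter grid listed later can be sketched with scikit-learn's MLPClassifier (an assumed implementation; the article does not name its library, and the data here are synthetic):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 10))
y = (X[:, 0] > 0.5).astype(int)          # synthetic binary target

# Two hidden layers of 100 units each, i.e., "deep" in the article's
# sense of more than one hidden layer.
dnn = MLPClassifier(hidden_layer_sizes=(100, 100),
                    activation="logistic",  # one activation from the grid
                    solver="sgd",           # stochastic gradient descent
                    alpha=0.01,             # L2 regularization term
                    max_iter=500,
                    random_state=0)
dnn.fit(X, y)
preds = dnn.predict(X)
```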


RFs

RFs, introduced by Breiman [2001], have become extremely popular as a machine learning algorithm. Much of this popularity stems from their quick speed and ease of use. Unlike the DNN and SVM, the RF classifier does not require any standardization or normalization of input features in the preprocessing stage. By taking the raw data of features and responses and specifying the number of trees in the forest, RF will return a model quickly and often outperforms even the most sophisticated algorithms.


Decision trees can easily overfit the data, a problem that many have tried to overcome by making the decision tree more robust. Limiting the depth of the tree and number of leaves in the terminal nodes are some methods that have been employed in an attempt to reduce the high variance problem. RFs take a very different approach to gaining robustness in resultant predictions. Given that decision trees can easily overfit the data, RFs attempt to reduce such overfitting along two dimensions. The first is bootstrapping with replacement of the row samples used in a given decision tree. The second is a subset of features that are randomly sampled without replacement at each node split, with the objective of maximizing the information gain for this subsample of features. Parent and child nodes are examined and features are chosen that provide for the lowest impurity of the child node. The more homogeneous the elements within the child split, the better the branch is at separating the data.


Many statisticians were irked by RFs when they were initially introduced because each tree considers only a limited subset of the features. At that time, the model-building intuition was to employ as much data as possible and to avoid limiting the feature space. By limiting the feature space, each tree has slightly different variations, and thus averaging across the many trees, also known as bagging the trees within the forest, provides a very robust prediction that easily incorporates the complexity in the data. RFs will continue to gain even more appeal with the added benefit of allowing researchers to see the features that are most important for a given RF prediction model. We present a list of the most important features per ETF later in this work, and our results show the complexity of predicting across our ETF asset classes.
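The feature-importance ranking mentioned above can be sketched as follows, again assuming scikit-learn and synthetic placeholder data (the article's actual inputs are the ETF information sets):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 8))   # raw, unscaled features: fine for RF
y = (X[:, 0] + 0.1 * rng.standard_normal(300) > 0.5).astype(int)

# n_estimators and criterion mirror values from the article's search
# space ({100, 200, 300} trees; Gini impurity).
rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                            random_state=0).fit(X, y)

# Impurity-based importances, the kind of per-feature ranking the
# article reports per ETF; highest-importance feature first.
ranked = np.argsort(rf.feature_importances_)[::-1]
```

Here the target is driven almost entirely by the first feature, so the ranking places it on top.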


SVMs

SVMs, introduced by Vapnik [1995], attempt to separate the data by finding supporting vectors that provide the largest separation between groups, that is, maximize the margin. The margin is defined as the distance between the supporting hyperplanes.

One of the main advantages of this approach is that SVMs generate separations that are less influenced by outliers and are potentially more robust vis-à-vis alternative classifiers. Additionally, SVMs allow for the option of applying the radial basis function, which enables nonlinear separation by leveraging the kernel trick. The kernel trick casts the data into a higher dimension; the separation is linear in that higher dimension but nonlinear when projected back down into the original dimensional space.
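A sketch of the RBF kernel handling a target that no single hyperplane can separate in the original space (scikit-learn assumed; data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(400, 2))
# Inside vs. outside a circle: linearly inseparable in 2 dimensions.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# The RBF kernel applies the kernel trick: the boundary is linear in
# an implicit higher-dimensional space, nonlinear in the original one.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
train_acc = svm.score(X, y)
```

A linear kernel on the same data would do little better than guessing the majority class, while the RBF kernel recovers the circular boundary.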


METHODOLOGY

As mentioned earlier, we have chosen widely used and liquid ETFs from various asset classes. The cross-section of ETFs allows us to include cross-asset correlation to boost predictive power. Presumably, investors make their decisions depending on their risk preferences as well as the ability to hold a well-diversified portfolio of assets. Although our list of ETFs is not exhaustive, it does represent well-known ETFs with which most practitioners and registered investment advisors should be well acquainted. The list of ETFs is as follows.


ETF Opportunity Set

  • SPY—SPDR S&P 500; U.S. large-cap equities
  • IWM—iShares Russell 2000; U.S. small-cap equities
  • EEM—iShares MSCI Emerging Markets; global emerging markets equities
  • TLT—iShares 20+ Year Treasury Bond; U.S. Treasury bonds
  • LQD—iShares iBoxx $ Investment Grade Corporate Bond; U.S. liquid investment-grade corporate bonds
  • TIP—iShares TIPS Bond; U.S. Treasury inflation-protected securities
  • IYR—iShares U.S. Real Estate; real estate
  • GLD—SPDR Gold Shares; gold
  • OIH—VanEck Vectors Oil Services ETF; oil
  • FXE—CurrencyShares Euro ETF; euro


We test the predictability of ETF returns over a set of varying horizons. Although it is common knowledge that stock prices, and thus ETF prices, follow a random walk over shorter horizons, making shorter-term predictability very difficult, longer horizons may be driven by asset class linkages and attention. Asset classes ebb and flow in and out of investors' favor. With this intuition, we attempt to predict the direction of price moves, not the magnitude. Thus, we cast our research as a supervised classification problem. The returns are calculated by employing adjusted closing prices (adjusted for stock splits and dividends) for the given time periods as measured in trading days (1, 2, 3, 5, 10, 20, 40, 60, 120, and 250 days), using the following formula:

r(t, n) = AdjClose(t) / AdjClose(t − n) − 1


For each horizon of n days and each ETF, we examine four dataset combinations as explanatory information sets. We employ the term information set as the set of features based on the following explicit definitions. Note that, for any given asset’s change in price, we allow for its own information as well as the other ETFs’ information to influence the sign of the price change over the given horizon. We define our four information sets A, B, C, and ABC as follows:


  • Information set A: the previous n-day return and j lagged n-day returns, where j is equivalent to the previous horizon (i.e., for a 20-day horizon, the number of lagged returns will be 10), for all ETFs.


  • Information set B: the average volume over n days and j lagged n-day average volumes, where j is equivalent to the previous horizon, for all ETFs.


  • Information set C: day of the week and month dummy variables.

  • Information set ABC: A, B, and C combined.


We concentrate our presentation of results on information set ABC, but the other information sets provide insight into the drivers of the predictions across our three algorithms. A priori, we believe that past returns will be the most beneficial in terms of future return predictions and that volume will be useful to boost the results. Many have shown volume to capture the notion of investors' attention. Higher volumes are typically associated with more trading activity. If trading releases information, then those ETFs with a higher volume of trading should adjust more quickly to their true values. Note that we implicitly assume the dollar volume of trading is approximately equal across ETFs and confine our study to share volume. Clearly, the ETF prices are not equal at any given time. However, the intuition is clear: More relative trading volume is an important feature for predictability across ETFs. We would suspect that volume and prior returns should work in tandem. However, we find that volume in isolation works very well; that is, B works well even without A—a rather surprising result.


Dummy variables are assumed to boost the performance of algorithms on shorter time periods and to be insignificant on longer horizons. It is important to note that A and ABC datasets are equivalent in number of observations, whereas B and C are not equivalent to A and have one less observation. This occurs because we need n+1 days of adjusted closing prices to compute returns. We only need n days, however, to compute average volume. We are employing the daily volume for that day.

The dependent variable is defined as 1 if the n-day return is equal to or greater than 0 and 0 otherwise:

y(t, n) = 1 if the n-day return r(t, n) ≥ 0, else 0
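The return and label construction can be sketched in a few lines of pandas; the price series below is a hypothetical stand-in for the Yahoo Finance adjusted closes the article uses:

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted closing prices (already split- and
# dividend-adjusted in the real data).
adj_close = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5, 102.5])

n = 2  # prediction horizon in trading days
n_day_return = adj_close / adj_close.shift(n) - 1  # needs n+1 prices per value
y = (n_day_return >= 0).astype(int)                # 1 if up or flat, else 0
```

The first n entries of the return series are undefined, which is why computing returns costs one more day of history than computing average volume.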


Next, we employ cross-validation. We divide the datasets and corresponding dependent variables into training and test sets. Division is done randomly in the following proportions: 70% training set and 30% test set. The training set is used to train the model and the test set to estimate the predictive power of the algorithms. Following best-practice procedures, we use our training set and perform holdout cross-validation: A validation set is obtained from the training set. Once the k-fold cross-validation has been performed and the optimal hyperparameters have been selected, we employ this model only once on our test set. Many textbooks recommend 10-fold cross-validation in the training set; however, we used only threefold cross-validation, given that larger cross-validation tests would take an even longer time to generate results. We exhaustively searched for the best hyperparameters for each of our algorithms.¹ The possible values for the hyperparameters for each algorithm are as follows.


Hyperparameter Search Space²

DNN

  • alpha (L2 regularization term)—{0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0}
  • activation function—{rectified linear unit function, logistic sigmoid function, hyperbolic tangent function}³
  • solver for weight optimization—stochastic gradient descent
  • hidden layers—{(100, 100), (100, 100, 100)}

RF

  • number of decision trees—{100, 200, 300}
  • function to measure quality of a split—{Gini impurity, information gain}

SVM

  • C (penalty parameter of the error term)—{0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0}
  • kernel—{linear, radial basis function}
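An exhaustive search over a grid like the SVM one above, with threefold cross-validation inside the training set and a single use of the test set, could look as follows (scikit-learn assumed, synthetic data):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.uniform(size=(300, 5))
y = (X[:, 0] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                          random_state=42)

# Threefold cross-validation over the SVM grid; the best estimator is
# refit on the training set and applied to the held-out test set once.
grid = {"C": [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0],
        "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), grid, cv=3, scoring="accuracy")
search.fit(X_tr, y_tr)
test_accuracy = search.best_estimator_.score(X_te, y_te)
```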


The best estimator is determined by the accuracy of all possible combinations of the hyperparameter values listed. After the best estimator is found, we use the test set data to see how the algorithm performs. Scoring for test performance is based on accuracy, where accuracy is defined by how well the predicted outcome of the model compares to the actual outcome. To estimate the performance of each algorithm, we introduce our gain criteria. These criteria show whether the explanatory dataset explains the dependent variable better than randomly generated noise.


Introduce Gain Criteria

The gain criterion is computed as the difference between the accuracy of the model given the input information set and the accuracy of the model given noise data. We define noise as random data drawn from a uniform distribution bounded by 0 and 1, which replaces the original input data in the modeling process. We substitute this noise directly into the input feature data space, which preserves the shape of the actual data. We compute the gains by rerunning the same code used previously, now on the noise data, to obtain accuracy scores and subtracting those results from the scores obtained by testing the best estimators on the actual data. Formally,

Gain = Accuracy(information set) − Accuracy(noise)
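A sketch of this noise baseline, with a random forest standing in for whichever best estimator the search selected (synthetic data; scikit-learn assumed):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.uniform(size=(400, 10))          # stand-in for an information set
y = (X[:, 0] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                          random_state=0)

def accuracy(features_tr, features_te):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return model.fit(features_tr, y_tr).score(features_te, y_te)

# Uniform(0, 1) noise with the same shape as the real inputs, as the
# text specifies; the gain is the real-data accuracy minus the
# noise-data accuracy.
noise_tr = rng.uniform(size=X_tr.shape)
noise_te = rng.uniform(size=X_te.shape)
gain = accuracy(X_tr, X_te) - accuracy(noise_tr, noise_te)
```

Because the noise carries no information about y, its accuracy hovers near the base rate, so an informative feature set yields a clearly positive gain.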


Using this score, it is easy to compare the performance of each algorithm and choose the one with the highest explanatory power.


RESULTS

First, we compare the test scoring accuracy (Exhibit 1) and prediction gain criteria accuracy (Exhibit 2) of each algorithm on different horizons for the ABC dataset. In Exhibit 1, test score accuracy increases as the prediction horizon increases. We would expect such a pattern given that some of our ETFs have had a natural positive drift over the sample period examined. The gain criteria would adjust for such positive drifts. Notice that the gain criterion ranges from 0% to 35%, compared to the scoring accuracy, which ranges from 0% to 100%.

结果

首先,我们比较了ABC数据集上每种算法在不同预测期限上的测试评分精准度(示例1)和预测增益标准的精准度(示例2)。在示例1中,测试分数的准确性随着预测期限的增加而增加。鉴于我们的一些ETF在所考察的样本期内存在自然的正漂移,我们预计会出现这样的模式。增益标准会对这种正漂移进行调整。注意,增益标准在0%到35%之间,而评分精准度在0%到100%之间。

RF and SVM show close results for all horizons in terms of test set accuracy and gain criterion, whereas DNN converges to the other algorithms at 40 days or more. This may be due to an insufficient number of hidden layers or the default values of other hyperparameters of the DNN classifier. Overall, we see that, even though test set accuracy on average increases with horizon length, the gain criterion peaks at 40 days and steadily falls thereafter. This occurs because for some ETFs, such as SPY, IWM, IYR, and GLD, we see a strong trend toward increases or decreases in price changes at longer horizons (see the chart for deep neural nets TLT in Exhibit A6 in the Appendix), which increases the test set accuracy score but lowers the gain criterion score.

在所有预测期限上,RF和SVM的测试集精准度和增益标准结果都很接近,而DNN在40天或更长期限上才收敛于其他算法。这可能是由于DNN分类器的隐藏层数不足或其他超参数采用了默认值所致。总之,我们看到,尽管测试集的准确度平均随期限长度的增加而增加,但增益标准在40天时达到峰值,此后稳步下降。这是因为对于一些ETF,如SPY、IWM、IYR和GLD,我们在较长期限上看到了价格变化持续上升或下降的强烈趋势(参见附录中示例A6中TLT的深度神经网络图),这提高了测试集的准确性分数,但降低了增益标准分数。

示例1

示例2

Not surprisingly, the predictive power of the algorithms increases with the forecast horizon. The result for the gain criterion was anticipated because examination during data pre-processing suggested that the noise level would increase with the horizon, thus decreasing the gain. However, the gain in predictive power at 120 and 250 days is still high, which raises the question of why such long-term predictions using only technical analysis still have good results. To understand this phenomenon, more analysis is required with the introduction of fundamental data, which is outside the scope of this article.

The next step in our analysis is to see how the algorithms performed in more detail. Using the gain criterion, we compare the performance of algorithms for each ETF from our list (see the Appendix). Generally, we see two patterns in gain behavior. The first is a peak at the 10- to 40-days level and a fall at consequent horizons, and the second is an increase for up to 20 to 60 days and a plateau with fluctuations or a slight rising trend to 250 days. We believe the reasons for such behaviors are the same as previously described.

不足为奇的是,算法的预测能力随着预测期限的延长而增加。增益标准的结果符合预期,因为数据预处理阶段的检验表明,噪声水平会随着期限的增加而增加,从而降低增益。然而,120天和250天的预测能力增益仍然很高,这就提出了一个问题:为什么仅使用技术分析的长期预测仍然有好的结果?要理解这一现象,需要引入基本面数据进行更多分析,而这超出了本文的研究范围。

我们分析的下一步是更详细地了解各算法的表现。使用增益标准,我们比较了列表中每个ETF上各算法的性能(参见附录)。一般来说,我们在增益行为中看到两种模式:第一种是在10至40天期限处达到峰值,随后在更长期限上下降;第二种是在20至60天内上升,之后直到250天保持在一个有波动或略有上升趋势的平台期。我们认为,出现这种行为的原因与前面所述相同。

示例4

示例5

The examples of SPY and EEM are shown in Exhibits 3 and 4, respectively.

Because overall DNN performance was lower than that of the other two algorithms on 3- to 20-day horizons, it is not surprising that it shows the same pattern at the single-ETF level. Nonetheless, for some ETFs (i.e., IWM and TLT), the DNN algorithm was able to catch up to the rest at 20 days. SVM and RF show close results for all of the ETFs, but the latter seems to produce less volatile results with respect to horizon. We find horizons of 10 to 60 days to be most interesting across all ETFs. Although for some instances 3-, 5-, 120-, and 250-day periods also draw attention and deserve more rigorous analysis, our goal is to compare the algorithms’ performances and to estimate the possibility and feasibility of meaningful predictions rather than to investigate the specifics of prediction of individual ETFs.

Next, we develop a deeper understanding of the explanatory variables and their significance in terms of predictive power. For that purpose, we examine the average test scores and gain criterion for the A, B, and C datasets for each algorithm and compare them to the ABC results.

SPY和EEM的例子分别见示例3和示例4。由于DNN在3至20天期限上的整体性能低于其他两种算法,因此它在单一ETF层面表现出相同的模式并不令人惊讶。然而,对于一些ETF(如IWM和TLT),DNN算法能够在20天期限上追上其他算法。对所有ETF而言,SVM和RF的结果都很接近,但后者随预测期限变化的波动似乎更小。我们发现,对所有ETF而言,10至60天的期限最为有趣。虽然在某些情况下,3天、5天、120天和250天的周期也会引起注意,值得进行更严格的分析,但我们的目标是比较算法的性能,评估有意义预测的可能性和可行性,而不是研究单个ETF预测的细节。

接下来,我们将更深入地理解解释变量及其在预测能力方面的意义。为此,我们考察每种算法在A、B、C数据集上的平均测试分数和增益标准,并将它们与ABC数据集的结果进行比较。

示例6

示例7

示例8

示例9

Let's start with RF, displayed in Exhibits 5 and 6. Volume (dataset B) effectively explains all the results obtained in previous sections, which is a surprising and unexpected result. Returns (dataset A) have decent predictive power; at horizons of 40 days and longer, returns show the same performance as the volume and combined datasets. Calendar dummies (dataset C), however, seem to explain a small portion of daily returns and monthly (20-day) returns. We assume this is because the dummies are for day of the week and month. Nonetheless, the predictive power of this set is negligible, and a clear contribution can only be seen at a one-day horizon.

SVM results (Exhibits 7 and 8) have the same pattern as RF. However, the overall performance and gain for the returns dataset is closer to those of the combined dataset in comparison with RF. What is more interesting, the combined dataset seems to outperform individual datasets on one- to three-day horizons. Furthermore, calendar dummy variables seem to yield better results but are still not large enough to be significant, and they do not add any predictive power to the combined dataset.

让我们从RF开始(示例5和6)。成交量指标(数据集B)有效地解释了前几节中获得的所有结果,这是一个令人惊讶和意外的结果。收益(数据集A)具有良好的预测能力;在40天及更长的期限上,收益数据显示出与成交量数据集和组合数据集相同的性能。然而,日历虚拟变量(数据集C)似乎只能解释一小部分每日收益和每月(20天)收益。我们推测这是因为虚拟变量对应星期几和月份。尽管如此,这组数据的预测能力微乎其微,只有在一天的期限上才能看到明确的贡献。

SVM的结果(示例7和8)具有与RF相同的模式。但是,与RF相比,收益数据集的总体性能和增益更接近组合数据集。更有趣的是,组合数据集似乎在一到三天的期限上优于单个数据集。此外,日历虚拟变量似乎产生了更好的结果,但仍不够显著,而且它们也没有为组合数据集增加任何预测能力。

As mentioned earlier, DNNs (Exhibits 9 and 10) struggle to show competitive results on horizons of less than 20 to 40 days. One- to five-day horizon predictions have effectively no predictive power. Unlike the other algorithms, DNN benefits from a combination of datasets. However, the gradation of datasets with respect to gain is the same as for the previous algorithms, as are the patterns of change in gains and test scores with varying horizons.

The results of DNN on 10- to 60-day horizons suggest that there is a possibility of improvement in algorithm predictions with combinations of datasets, which is not the case for other algorithms. Overall, we see poor ability to predict short-term returns for all algorithms. The solution to boost results might be an ensemble of algorithms, but such an analysis is beyond the scope of this article.

如前所述,DNN(示例9和10)在小于20至40天的期限上难以展示有竞争力的结果,一至五天期限的预测实际上没有任何预测能力。与其他算法不同,DNN得益于数据集的组合。不过,各数据集在增益上的排序与前述算法相同,增益和测试分数随期限变化的模式也一样。DNN在10到60天期限上的结果表明,通过数据集的组合有可能改进算法的预测,而其他算法则没有这种情况。总的来说,我们认为所有算法预测短期收益的能力都很差。提高结果的办法可能是算法的集成,但这种分析超出了本文的范围。

As mentioned earlier, we find horizons from 10 through 60 days to be most interesting in terms of predictive power. Thus, we will examine the performance of the algorithms in these time periods in more detail using the receiver operating characteristic (ROC), which will allow us to compare algorithms from a different angle. Based on the ROC, we can compute another measure for algorithm comparison, the ROC area under the curve (AUC). We generated ROC curves for horizons of 10 to 60 days (see the Appendix). The results follow the same pattern as in all previous sections. For example, see the ROC graphs for EEM (Exhibits 11, 12, and 13). We also calculated the AUC for each of the selected horizons, ETFs, and algorithms (Exhibits A2 through A4 in the Appendix).

Longer-horizon ROC curves have an almost ideal form and AUCs close to 1, which suggest high predictive ability with high accuracy. Altogether, we can conclude that predictions for these ETFs are possible.

正如前面提到的,我们发现10至60天的期限在预测能力方面最为有趣。因此,我们将利用受试者工作特征曲线(ROC)对算法在这些时间段内的性能进行更详细的研究,这使我们能够从不同的角度对算法进行比较。基于ROC,我们可以计算另一个用于算法比较的度量,即ROC曲线下面积(AUC)。我们生成了10到60天期限的ROC曲线(参见附录)。结果与前面所有章节的模式相同。例如,见EEM的ROC图(示例11、12和13)。我们还计算了每个选定期限、ETF和算法的AUC(附录中的示例A2至A4)。较长期限的ROC曲线形状接近理想,AUC接近1,表明预测能力强、准确性高。总之,我们可以得出结论:对这些ETF的预测是可能的。
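AUC need not be read off a plotted ROC curve; it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small self-contained sketch of that computation (not the authors' code):

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney identity: the probability that a random
    positive example is scored above a random negative one (ties count half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs. every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum())
                 / (len(pos) * len(neg)))

print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0 (perfect ranking)
print(roc_auc([0, 0, 1, 1], [0.9, 0.8, 0.2, 0.1]))  # 0.0 (perfectly inverted)
```

An AUC near 1, as reported for the longer horizons here, means the classifier ranks almost every actual "up" day above every "down" day; 0.5 corresponds to random ordering.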

示例10

示例11

示例12

示例13

Feature Importance with RF

We also try to shed some light on which data drive the performance of algorithms by assessing feature importance with RFs. As previously discussed, volume is a good predictor by itself but is more powerful in combination with returns. However, it is unclear which features are actually driving the performance of the algorithms. In the case of SVM, it is only possible to interpret weights of each feature if the kernel is linear. For DNN, it is hard to explain and grasp what relationships exist within the hidden layers. For RF, however, we can compute and interpret the importance of each feature in an easy way.

We decided to examine the importance of 20-day horizon RF features for ETFs (Exhibits 14 and 15). The results show that there is no single feature that would explain most of the returns. Note that on the graph, features are sorted in descending order for each ETF. As one can see, the pattern is the same for all ETFs, meaning that almost all information in the dataset is contributing and is useful for prediction. With the features’ importance structured this way, we see that there is not a single factor that contributes more than 1.6%. However, the dataset also contains lagged variables. The questions that immediately arise are whether a group or groups of features (i.e., volume of SPY and returns on EEM) are more beneficial, and whether we can drop them.

For that purpose, we grouped returns and volumes for each ETF and summed up importance within each group (Exhibit 16). Volume is more important than returns, which confirms the difference in results for the A and B datasets. One of the reasons for such behavior might be a relationship between volume and returns; we assume that might be the result of a relationship obtained by Chen, Hong, and Stein [2001] and Chordia and Swaminathan [2000] in the sense that past volume is a good predictor of future returns’ skewness and patterns. Calendar dummies show little to no influence on predictions, as expected from dataset results.

特征重要性与RF

我们还试图通过用RF评估特征重要性,来揭示是哪些数据在驱动算法的性能。如前所述,成交量本身就是一个很好的预测指标,但与收益结合起来则更强。然而,目前还不清楚哪些特征真正在推动算法的性能。对于SVM,只有在核函数是线性的情况下,才有可能解释每个特征的权重;对于DNN,很难解释和把握隐藏层中存在什么关系;而对于RF,我们可以用一种简单的方法来计算和解释每个特征的重要性。

我们决定考察20天期限下RF对各ETF的特征重要性(示例14和15)。结果表明,没有单一特征能够解释大部分收益。注意,在图中,每个ETF的特征都按重要性降序排列。可以看出,所有ETF的模式都是一样的,这意味着数据集中几乎所有的信息都有贡献、都对预测有用。按这种方式来看特征重要性的结构,没有任何单一因素的贡献超过1.6%。但是,数据集还包含滞后变量。随之而来的问题是:某一组或某几组特征(如SPY的成交量和EEM的收益)是否更有用,以及我们能否舍弃它们。

为此,我们将每只ETF的收益和成交量分组,并将每组内的重要性加总(示例16)。成交量比收益更重要,这印证了A、B两个数据集在结果上的差异。出现这种情况的原因之一可能是成交量和收益之间的关系;我们推测这可能源于Chen、Hong和Stein[2001]以及Chordia和Swaminathan[2000]所发现的关系,即过去的成交量是未来收益的偏度和模式的良好预测因子。正如各数据集的结果所预示的,日历虚拟变量对预测的影响很小,甚至没有影响。
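The grouping-and-summing step can be sketched as follows; the feature names and importance values below are purely hypothetical illustrations, not figures from the paper (in scikit-learn, the per-feature values would come from a fitted `RandomForestClassifier`'s `feature_importances_` attribute):

```python
from collections import defaultdict

# Hypothetical per-feature importances from a fitted random forest,
# keyed as "<ETF>_<type>_lag<k>" or a calendar dummy name.
# All numbers are illustrative, not figures from the paper.
importances = {
    "SPY_ret_lag1": 0.012, "SPY_ret_lag2": 0.010,
    "SPY_vol_lag1": 0.016, "SPY_vol_lag2": 0.014,
    "EEM_ret_lag1": 0.009, "EEM_vol_lag1": 0.015,
    "dow_monday": 0.001,   "month_january": 0.002,
}

def group_key(name):
    """Map a feature name to its group: (ETF, ret|vol) or 'calendar'."""
    parts = name.split("_")
    if len(parts) == 3 and parts[1] in ("ret", "vol"):
        return (parts[0], parts[1])
    return "calendar"

grouped = defaultdict(float)
for name, imp in importances.items():
    grouped[group_key(name)] += imp

# Rank groups by their summed importance, largest first.
for key, total in sorted(grouped.items(), key=lambda kv: -kv[1]):
    print(key, round(total, 3))
```

Summing within groups turns many small per-lag importances into a comparable per-group total, which is how the volume-versus-returns ranking in Exhibit 16 is obtained.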

示例14

示例15

示例16

CONCLUSIONS

In this work, we examined the ability of three popular machine learning algorithms to predict ETF returns. Although we restricted our initial analysis to only the direction of future price movements, we still procured valuable results. First, machine learning algorithms do a good job of predicting price changes at the 10- to 60-day horizon. Not surprisingly, these algorithms fail to predict returns on short-term horizons of five days or less. We introduce our gain measure to help assess efficacy across algorithms and horizons. We also segmented our input feature variables into different information sets so as to cast our research in the framework of the efficient markets hypothesis. We find that the volume information set (B) works extremely well across our three algorithms. Moreover, we find that the most important predictive features vary depending on the ETFs being predicted. Financial intuition helps us to understand the prediction variables: the complex relationships embedded within the prediction of the S&P 500, as proxied by SPY, require a more diverse set of features compared to the top feature set needed to explain GLD or OIH.

In practice, the information set could be vastly extended to include other important features, such as social media, along the lines of Liew and Budavari [2017], who identified the social media factor. Additionally, the forecasting time horizons could have been extended even further beyond one trading year or shortened to examine intraday performance. However, we leave this more ambitious research agenda to future work.

One interesting application is to use several different horizon models launched at staggered times within a day, thereby gaining slight diversification benefits for the resultant portfolio of strategies.

In sum, we hope that our application of machine learning algorithms motivates others to move this body of knowledge forward. These algorithms possess great potential in their applications to the many problems in finance.

结论

在这项研究中,我们检验了三种流行的机器学习算法预测ETF收益的能力。虽然我们的初步分析仅限于未来价格走势的方向,但我们仍然获得了有价值的结果。首先,机器学习算法在预测10至60天期限的价格变化方面表现良好。毫不奇怪,这些算法无法预测5天或更短期限的收益。我们引入增益度量,以帮助评估不同算法和不同期限下的效果。我们还将输入特征变量分割成不同的信息集,从而将我们的研究置于有效市场假说的框架之中。我们发现,成交量信息集(B)在三种算法中都表现得非常好。此外,我们发现最重要的预测特征因被预测的ETF而异。金融直觉帮助我们理解其中蕴含复杂关系的预测变量:与解释GLD或OIH所需的主要特征集相比,预测标普500指数(以SPY为代理)需要更多样化的特征集。

在实践中,信息集可以大幅扩展,纳入其他重要特征,例如社交媒体,正如Liew和Budavari[2017]所识别出的社交媒体因子。此外,预测期限本可以进一步延长到超过一个交易年度,或缩短以检验日内表现。不过,我们把这一更雄心勃勃的研究议程留给今后的工作。

一个有趣的应用是在一天内错开时间启动几种不同期限的模型,从而为最终的策略组合获得一定的多样化收益。

总之,我们希望机器学习算法的应用能够激励其他人将这一知识体系向前推进。这些算法在解决金融领域的许多问题中具有很大的应用潜力。

示例A1

示例A2

示例A3

示例A4

示例A5

示例A6

ENDNOTES

1 For the DNN and SVM approaches, we also standardize the data using the training set prior to training and testing estimators. Standardization computes the ratio of the demeaned feature divided by the standard deviation of that feature. Thus, in the training set, each feature has a mean of zero and a standard deviation of one; however, in the test set, the features' means and standard deviations vary from zero and one.

2 All other parameters have default values. For more information, see http://scikit-learn.org/.

3 f(x)=max(0, x), f(x)=1/(1+exp(-x)) and f(x)=tanh(x), respectively.

尾注

1 对于DNN和SVM方法,我们在训练和测试估计器之前,使用训练集对数据进行标准化。标准化是将去均值后的特征除以该特征的标准差。因此,在训练集中,每个特征的均值为0、标准差为1;然而,在测试集中,特征的均值和标准差会偏离0和1。
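Endnote 1's train-only standardization can be sketched as follows (a minimal numpy version of what scikit-learn's `StandardScaler` does when fit on the training set alone; the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic train/test features (three columns, arbitrary scale).
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

# Fit the mean and standard deviation on the training set only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# ...then apply the same transform to both sets. The training features
# end up with mean 0 and std 1; the test features' moments deviate slightly.
Z_train = (X_train - mu) / sigma
Z_test = (X_test - mu) / sigma

print(Z_train.mean(axis=0).round(6))  # ~[0 0 0]
print(Z_train.std(axis=0).round(6))   # ~[1 1 1]
```

Fitting the scaler on the training set only avoids leaking test-set information into the model, which is why the test-set moments are not exactly zero and one.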

2 所有其他参数均取默认值。欲了解更多信息,请参见 http://scikit-learn.org/。

3 分别为 f(x)=max(0, x)、f(x)=1/(1+exp(-x)) 和 f(x)=tanh(x)。
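The three activations in endnote 3 (ReLU, logistic sigmoid, and tanh) written out as a quick sketch:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x)"""
    return np.maximum(0.0, x)

def sigmoid(x):
    """f(x) = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))        # [0. 0. 2.]
print(sigmoid(0.0))   # 0.5
print(np.tanh(0.0))   # 0.0
```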