Global Quantitative & Derivatives Strategy
07 January 2020

J.P. Morgan

Big Data and AI Strategies
Applications of Machine Learning in Equity Derivatives

Global Quantitative and Derivatives Strategy
Peng Cheng, CFA AC
(1-212) 622-5036
peng.cheng
Thomas J Murphy, PhD AC
(1-212) 270-7377
thomas.x.murphy
Marko Kolanovic, PhD
(1-212) 622-3677
marko.kolanovic
J.P. Morgan Securities LLC

In this report, we illustrate the applications of machine learning techniques to equity derivatives trading strategies. Specifically, we focus on the topics below:

A Practitioner Introduction to Neural Networks: We aim to demystify neural networks for our readers in a practitioner-friendly way. The neural network architecture is explained by comparing it to the familiar linear regression model. We then move on to real-world data and examine the correspondence between neural networks and existing, well-known financial models for volatility forecasting. Finally, an LSTM is used to forecast the volatility of the S&P 500 and EURUSD, and its performance is compared against GARCH.

Sentiment Signals for Macro Dividend Trading: We look at the relationship between the sentiment information contained in management call transcripts and subsequent dividend futures returns. Our analysis shows that, after controlling for SX5E returns, sentiment signals contain orthogonal information on dividend futures returns. Moreover, we develop a trading strategy incorporating the sentiment signal, which is shown to offer performance improvements over a long-only dividend futures trading strategy.

Sentiment Factor Returns: We analyze sentiment data from the vendor Alexandria Technology between 2000 and 2019 and find that the data contains a significant short-term signal. We use a factor model, which controls for traditional factors in order to produce a pure sentiment factor, and examine its risk-adjusted performance, signal decay and correlation with other factors. We also evaluate methods for combining the sentiment factor with traditional style factors, using value as an example.

Cross Asset Volatility Optimal Portfolio Construction - Beyond Risk Parity: There has been increasing evidence of non-normal return distributions in cross asset risk premium strategies. Popular portfolio models such as mean-variance optimization and risk parity are ill-equipped to address the issue. We propose a framework for constructing cross asset portfolios that aims to address the challenge and achieves superior performance by incorporating all higher moments (skewness, kurtosis, etc.) and controlling for excess turnover. The framework is first demonstrated with simple stylized cases, followed by a more comprehensive real-world cross asset example.

Please refer to our previous volumes for additional research on similar topics: May 2018, Nov 2018, and Jun 2019.

See page 60 for analyst certification and important disclosures.

J.P. Morgan does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. Investors should consider this report as only a single factor in making their investment decision.

Figure 8: Define sigmoid function
1 sigmoid <- function(x) {
2   exp(x)/(1+exp(x))
3 }
Source: J.P. Morgan

We now move on to the forward propagation step. The cbind(1, .) calls in lines 6 and 8 add the intercepts (biases). The sigmoid function is applied to the hidden node in line 7.

Figure 9: Define forward propagation function
5 fwdprop <- function(x, wh, wy) {
6   h <- cbind(1, x) %*% wh
7   h <- sigmoid(h)                      #hidden layer
8   output <- cbind(1, h) %*% wy         #output layer
9   return(list(output = output, h = h))
10 }
Source: J.P. Morgan

Similar to OLS, our loss function is defined to be least squares. The init.w variable is a vector which contains all the parameters, including the intercepts (biases). For now we hard code the first four values to the wh terms and the last two to the wy terms, for the sake of simplicity.

Figure 10: Define loss function
12 loss.fun <- function(init.w, x, y) {
13   wh <- init.w[1:4]
14   wy <- init.w[5:6]
15   y.hat <- fwdprop(x, wh, wy)$output
16   return(sum((y - y.hat)^2))
17 }
Source: J.P. Morgan

The code above constitutes a one-layer, one-neuron neural network model. It can be expanded relatively easily to accommodate multiple layers and neurons.
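As an illustration of that expansion (a sketch, not the report's code), a one-layer, two-neuron variant only changes the shapes of the weights: wh becomes a 4x2 matrix (one column per hidden neuron) and wy gains a third element, so the parameter vector holds 8 + 3 = 11 values. The hypothetical fwdprop2 and loss.fun2 below reuse the sigmoid defined in Figure 8:

fwdprop2 <- function(x, wh, wy) {
  h <- sigmoid(cbind(1, x) %*% wh)       #hidden layer, now n x 2
  output <- cbind(1, h) %*% wy           #output layer, still n x 1
  return(list(output = output, h = h))
}
loss.fun2 <- function(init.w, x, y) {
  wh <- matrix(init.w[1:8], nrow = 4, ncol = 2)   #first 8 values form the 4x2 hidden weights
  wy <- init.w[9:11]                              #last 3 values are the output weights
  y.hat <- fwdprop2(x, wh, wy)$output
  return(sum((y - y.hat)^2))
}

The training call would mirror Figure 12, with optim(par = rnorm(11), fn = loss.fun2, x = cbind(x1, x2, x3), y = y1).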

Before moving on to the backpropagation step, we first simulate some sample data in order to train the neural network. As opposed to using actual data, we are able to specify the exact data generating process in our simulation. Lines 20 - 25 generate three vectors of 500 normal random variables with mean zero and standard deviation 0.1. In line 26 we define y as a linear function of x1, x2, x3, and some added noise. The neural network will attempt to use the sigmoid to fit the linear relationship. This exercise will demonstrate the implication of the choice of activation functions.

Figure 11: Simulate in-sample data
19 set.seed(1)   #seed for random number generator
20 nobs <- 500
21 mymean <- 0
22 mysd <- 0.1
23 x1 <- rnorm(nobs, mean=mymean, sd=mysd)
24 x2 <- rnorm(nobs, mean=mymean, sd=mysd)
25 x3 <- rnorm(nobs, mean=mymean, sd=mysd)
26 y1 <- 1 + 1*x1 + 0.5*x2 - 0.75*x3 + rnorm(nobs, sd = mysd)
Source: J.P. Morgan

Given the data, we use R's built-in optimizer optim() to train the neural network. The first argument is the initial guess of the weights, for which we use six normally distributed random values. The second argument is the loss function, and the subsequent arguments pass the data into the loss function.

Figure 12: Train the model using optim()
28 mysolution <- optim(par = rnorm(6), fn = loss.fun,
29                     x = cbind(x1, x2, x3), y = y1)
Source: J.P. Morgan

We can retrieve the weights from the variable mysolution by examining its par object. However, we will not be able to recover the true coefficients, since we used a sigmoid activation function in our neural network.
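To make the comparison with OLS concrete (a quick sketch, not from the report), the fitted weights and a plain linear regression on the same simulated data can be inspected side by side; lm() approximately recovers the true coefficients 1, 1, 0.5 and -0.75, while mysolution$par has no such direct interpretation:

mysolution$convergence          #0 indicates optim() reports convergence
mysolution$par                  #six fitted weights: wh (4 values) followed by wy (2 values)
coef(lm(y1 ~ x1 + x2 + x3))     #OLS benchmark on the simulated data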

To perform an out-of-sample test, we simulate another 500 observations with the same data generating process, but different random variables (lines 31 - 34). In lines 35 - 37 we feed the trained parameters into the forward propagation function and obtain the predictions.

Figure 13: Simulate out-of-sample data
31 xx1 <- rnorm(nobs, mean = mymean, sd = mysd)
32 xx2 <- rnorm(nobs, mean = mymean, sd = mysd)
33 xx3 <- rnorm(nobs, mean = mymean, sd = mysd)
34 yy1 <- 1 + 1*xx1 + 0.5*xx2 - 0.75*xx3
35 mypredictions <- fwdprop(x = cbind(xx1, xx2, xx3),
36                          wh = mysolution$par[1:4],
37                          wy = mysolution$par[5:6])$output
Source: J.P. Morgan

The out-of-sample predictions appear reasonable (Figure 14), which is reassuring given the mismatch between the data generating process and our chosen activation function. We observe some nonlinearity for the more extreme values as a result of the sigmoid function.

Figure 14: Predicted vs. actual y values (scatter of neural network predictions against actual values)
Source: J.P. Morgan
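A scatter along the lines of Figure 14 can be reproduced from the objects above (a minimal sketch; the plotting code is not shown in the report):

plot(yy1, mypredictions,
     xlab = "Actual values", ylab = "Neural network predictions")
abline(a = 0, b = 1, lty = 2)   #45-degree reference line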

The above exercise is of course an idealized example. In practice, to ensure a reasonable solution, there are many important considerations in the backpropagation process, such as the choice of weight initialization, learning rate, etc. Although the specifics are beyond the scope of the report, we provide a simple illustration by slightly modifying the data. We set the mysd value to 1 instead of 0.1 in line 22. By having variables with a wider dispersion and larger magnitude, we end up with a number of observations falling outside of the linear region of the sigmoid function (i.e. between +5 and -5, as shown in Figure 5). As a result, the fit becomes much worse (Figure 15). There are a couple of ways to remedy the problem. One option, as discussed previously, is to normalize our data to make sure that it works well with the activation function (a sketch of this route follows Figure 17). Another option is to use a more sophisticated optimization routine. Here we choose simulated annealing, a stochastic optimization method, and can see that the fit is vastly improved (Figure 16).

Figure 15: Predicted vs. actual y values using the Nelder-Mead algorithm (neural network predictions against actual values)
Source: J.P. Morgan

Figure 16: Predicted vs. actual y values using simulated annealing (neural network predictions against actual values)
Source: J.P. Morgan

Fortunately, in our code, simulated annealing only involves one additional parameter (method = "SANN" in Figure 17).

Figure 17: Extra argument required for simulated annealing
28 mysolution <- optim(par = rnorm(6), fn = loss.fun,
29                     x = cbind(x1, x2, x3), y = y1, method = "SANN")
Source: J.P. Morgan
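The normalization route is not spelled out in the report; a minimal sketch, assuming one simply rescales the wider-dispersion data back to the magnitude of the original example (standard deviation around 0.1) before re-running the Figure 12 training call. The object names below are illustrative:

x.scaled <- cbind(x1, x2, x3) / 10    #inputs back to sd of roughly 0.1
y.scaled <- y1 / 10                   #keep the target on a comparable scale
mysolution.scaled <- optim(par = rnorm(6), fn = loss.fun,
                           x = x.scaled, y = y.scaled)
#predictions from fwdprop() on x.scaled would be multiplied back by 10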

Case study: volatility forecasting

Moving on to real-world examples, some of the more sophisticated models in finance are also special cases of neural networks. Specifically, we show that there is a one-to-one correspondence between the ARCH/GARCH models used in volatility forecasting and feedforward/recurrent neural networks, respectively. For simplicity, all neural networks shown in this section consist of one hidden layer with one neuron. Adding more neurons and layers does not alter the relationships discussed.

In addition, for demonstration with real data, we use EURUSD daily data over the last 10 years. The first six rows of the data are shown in Figure 18. Ret1D is the log return of ClosePrice, and Var is Ret1D squared. The VarLn variables are Var lagged by n days. Here we lag the data for up to five days for our ARCH(5) model below.
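The report does not show how the eurusd data frame is built; a minimal sketch of the construction it describes, assuming a vector of daily close prices close.px with matching dates close.dates (both hypothetical names):

ret1d <- c(NA, diff(log(close.px)))                 #Ret1D: daily log return of ClosePrice
var0  <- ret1d^2                                    #Var: squared return
lagn  <- function(v, n) c(rep(NA, n), head(v, -n))  #shift a series back by n days
eurusd <- data.frame(FileCloseDate = close.dates, ClosePrice = close.px,
                     Ret1D = ret1d, Var = var0,
                     VarL1 = lagn(var0, 1), VarL2 = lagn(var0, 2),
                     VarL3 = lagn(var0, 3), VarL4 = lagn(var0, 4),
                     VarL5 = lagn(var0, 5))
eurusd <- na.omit(eurusd)                           #drop rows without a full lag history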

Figure 18: First six rows of the EURUSD data (head(eurusd))

  FileCloseDate ClosePrice         Ret1D          Var        VarL1        VarL2        VarL3        VarL4        VarL5
1    2009-08-10     1.4140 -0.0061339072 3.762482e-05 8.529179e-05 6.270020e-06 8.148381e-07 1.926603e-08 1.458829e-04
2    2009-08-11     1.4131 -0.0006366949 4.053804e-07 3.762482e-05 8.529179e-05 6.270020e-06 8.148381e-07 1.926603e-08
3    2009-08-12     1.4211  0.0056453470 3.186994e-05 4.053804e-07 3.762482e-05 8.529179e-05 6.270020e-06 8.148381e-07
4    2009-08-13     1.4277  0.0046335382 2.146968e-05 3.186994e-05 4.053804e-07 3.762482e-05 8.529179e-05 6.270020e-06
5    2009-08-14     1.4215 -0.0043521057 1.894082e-05 2.146968e-05 3.186994e-05 4.053804e-07 3.762482e-05 8.529179e-05
6    2009-08-17     1.4077 -0.0097554853 9.516949e-05 1.894082e-05 2.146968e-05 3.186994e-05 4.053804e-07 3.762482e-05

Source: J.P. Morgan

Figure 19: ARCH expressed as a feedforward neural network (diagram of the input, hidden and output layers)
Source: J.P. Morgan

To make the neural network consistent with ARCH, we choose the linear activation function for the hidden layer and the identity function for the output layer. Moreover, since ARCH is commonly estimated using maximum likelihood estimation (MLE), we do the same for our loss function. Specifically, we set the loss function to minimize the negative sum of the log likelihood, as seen below, where $f(y_t;\,0,\,\sigma_t)$ is the likelihood function of the normal distribution:

$$-\sum_{t=1}^{n} \log f(y_t;\,0,\,\sigma_t)$$

Compare the code for the classical ARCH and the neural network expression:

Figure 20: Classical ARCH definition in R
myARCHfit <- function(params) {
  sigma2t <- with(eurusd, params[1] +
                  params[2] * VarL1 +
                  params[3] * VarL2 +
                  params[4] * VarL3 +
                  params[5] * VarL4 +
                  params[6] * VarL5)
  sigmat <- sqrt(sigma2t)
  log.likelihood <- log(dnorm(eurusd$Ret1D, 0, sigmat))
  return(-1 * sum(log.likelihood))
}

init.params <- c(0, rep(0.2, 5))
arch <- optim(par = init.params, fn = myARCHfit)
Source: J.P. Morgan

Figure 21: ARCH expressed as a feedforward neural network in R
actfun <- function(x) x   #linear activation function
fwdprop <- function(x, w) {
  h <- actfun(cbind(1, x) %*% w)
  y <- h                  #output layer: identity function
  list(output = y)
}

loss.fun <- function(init.w, x, y) {
  y.hat <- fwdprop(x, init.w)$output
  return(-sum(log(dnorm(y, 0, sqrt(y.hat)))))
}
init.guess <- c(0, rep(0.2, 5))

arch.nn <- optim(par = init.guess, fn = loss.fun,
                 x = as.matrix(eurusd[, c('VarL1', 'VarL2', 'VarL3',
                                          'VarL4', 'VarL5')]),
                 y = eurusd[, 'Ret1D'])
Source: J.P. Morgan

In Figure 22 and Table 2 we compare the output and parameter values, and find the two models to be identical.

Figure 22: ARCH vs. neural network output values (scatter of the neural network output against $\sigma_t^2$ from ARCH)
Source: J.P. Morgan

Table 2: ARCH vs. neural network parameter values
            ARCH        NN
omega   0.000014  0.000014
alpha1  0.065149  0.065149
alpha2  0.277980  0.277980
alpha3  0.186251  0.186251
alpha4  0.195344  0.195344
alpha5  0.087819  0.087819
Source: J.P. Morgan
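Table 2 can be tabulated directly from the two fits (a sketch, assuming the arch and arch.nn objects returned by optim() in Figures 20 and 21):

param.names <- c("omega", paste0("alpha", 1:5))
round(data.frame(ARCH = arch$par, NN = arch.nn$par, row.names = param.names), 6)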

Recurrent Neural Network and GARCH(1, 1)

It is generally accepted that ARCH with longer lags tends to produce better results. GARCH(1, 1) is a parsimonious way of parametrizing ARCH($\infty$):

$$r_t = \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma_t^2)$$

$$\sigma_t^2 = \omega + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$$

By expanding the $\sigma_{t-1}^2$ term we can see that it contains all the previous $\epsilon$ terms, and that the weights of $\epsilon_{t-i}^2$ decay exponentially with $i$:

$$\sigma_t^2 = \omega + \alpha\,\epsilon_{t-1}^2 + \beta\big(\omega + \alpha\,\epsilon_{t-2}^2 + \beta\big(\omega + \alpha\,\epsilon_{t-3}^2 + \beta\big(\omega + \alpha\,\epsilon_{t-4}^2 + \cdots\big)\big)\big)$$

$$\sigma_t^2 = \sum_{i=0}^{\infty} \beta^i \big(\omega + \alpha\,\epsilon_{t-1-i}^2\big)$$
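The expansion can be checked numerically (an illustrative sketch, not from the report): for arbitrary parameter values, the recursive definition and a suitably truncated version of the infinite sum agree closely once the initial condition has decayed.

set.seed(1)
omega <- 1e-6; alpha <- 0.08; beta <- 0.90
eps2  <- rnorm(2000)^2 * 1e-4                #stand-in squared shocks
sig2  <- rep(mean(eps2), length(eps2))
for (k in 2:length(eps2))                    #recursive GARCH(1,1) definition
  sig2[k] <- omega + alpha * eps2[k-1] + beta * sig2[k-1]
k <- length(eps2); i <- 0:500                #truncated ARCH(infinity) representation
c(recursive = sig2[k],
  expansion = sum(beta^i * (omega + alpha * eps2[k - 1 - i])))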

While a feedforward neural network is not able to capture this recursive structure, a recurrent neural network (RNN) does exactly that. In Figure 23, we show what happens in an RNN. The horizontal arrows no longer indicate connections between neurons, but connections from one time period to the next within a single neuron. The input and output layers remain the same as in a feedforward neural network, but $h_t$ in the hidden layer takes $h_{t-1}$ as an additional input. In other words, $h_t$ plays the role of $\sigma_t^2$ in the GARCH model.

Figure 23: GARCH expressed as a recurrent neural network
Source: J.P. Morgan

Again, compare the classical GARCH vs. the RNN expression:

Figure 24: Classical GARCH definition in R
myGARCHfit <- function(params) {
  omega   <- params[1]
  alphaL1 <- params[2]
  betaL1  <- params[3]
  sigma2t <- rep(mean(eurusd$VarL1), nrow(eurusd))
  for (i in 2:nrow(eurusd)) {
    sigma2t[i] <- omega + alphaL1 * eurusd$VarL1[i] + betaL1 * sigma2t[i-1]
  }
  sigmat <- sqrt(sigma2t)
  log.likelihood <- log(dnorm(eurusd$Ret1D, 0, sigmat))
  return(-1 * sum(log.likelihood))
}

init.params <- c(0, rep(0.5, 2))
garch <- optim(par = init.params, fn = myGARCHfit)
Source: J.P. Morgan

Figure 25: GARCH expressed as a recurrent neural network in R
actfun <- function(x) x   #linear activation function
rnn.fwdprop <- function(x, w) {
  h <- matrix(nrow = nrow(x))
  h[1, 1] <- mean(x)
  for (k in 2:nrow(h)) {
    h[k, 1] <- actfun(cbind(1, x[k, 1], h[k-1, 1]) %*% w)
  }
  y <- h                  #identity function output
  list(output = y)
}

loss.fun <- function(init.w, x, y) {
  y.hat <- rnn.fwdprop(x, init.w)$output
  return(-sum(log(dnorm(y, 0, sqrt(y.hat)))))
}
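The recurrent network can then be trained with the same optim() pattern used for the feedforward case in Figure 21. A minimal sketch (illustrative rather than the report's code), assuming three parameters ordered as intercept (omega), VarL1 weight (alpha) and recurrent weight (beta), matching the starting values in Figure 24:

init.guess <- c(0, rep(0.5, 2))
garch.nn <- optim(par = init.guess, fn = loss.fun,
                  x = as.matrix(eurusd[, 'VarL1', drop = FALSE]),
                  y = eurusd[, 'Ret1D'])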
