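Constructing a Portfolio Model Based on an Adaptive Multi-Armed Bandit Algorithm Improved by a Genetic Algorithm
Abstract (Chinese version)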
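With the development of artificial intelligence, quantitative investment research
grounded in financial mathematics and computer technology has entered a stage of
quantitative analysis, prompting modern financial investment theory to move away from
its reliance on personal experience and descriptive research. Most quantitative
investment studies use machine learning, reinforcement learning, optimization
algorithms, and similar tools to build mathematical models for quantitative stock
selection, and these models have performed well in both developed and emerging capital
markets. Since the start of the 21st century, China's capital market has developed
rapidly, but overall it is still not fully developed or mature, and it also faces
challenges from financial crises and frequent market fluctuations. Research on
constructing portfolios through quantitative strategies, as a way to withstand market
risk and stabilize market order, is therefore of growing theoretical and practical
significance.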
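This paper uses reinforcement learning to formulate portfolio selection as a Markov
decision process, that is, a sequential decision problem over portfolios under
uncertainty. It adopts the contextual LinUCB (Linear Upper Confidence Bound) algorithm
from the bandit family, introduces an expected utility function to quantify the
investor's preference for a portfolio, and defines the selection criterion as the upper
confidence bound of that preference. Because reinforcement learning algorithms learn
online, the model can learn over many iterations during the experimental period and
ultimately select the portfolio that best satisfies the investor. In addition, a
genetic algorithm is introduced to optimize the parameters of LinUCB, so that the model
remains effective throughout its operating environment and ultimately maximizes
cumulative return. The work proceeds in three parts: first, an adaptive multi-armed
bandit model is built on the investor's utility function to select portfolios and
update their weights online; second, a genetic algorithm is used to optimize the
parameters of the adaptive multi-armed bandit model, yielding the strategy model of
this paper; third, the backtest results of the selected optimal portfolios are
presented and compared.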
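As a rough illustration of the selection rule described above, the following is a minimal sketch of a disjoint LinUCB model in plain Python. It is not the model built in this paper: the class `LinUCBArm`, the helpers `select_arm` and `simulate`, the one-dimensional context, and the fixed toy rewards standing in for the investor's utility are all assumptions made for the example.

```python
import math


class LinUCBArm:
    """One arm (a candidate portfolio) of a disjoint LinUCB model."""

    def __init__(self, dim):
        # A_inv starts as the identity (ridge prior); b accumulates reward-weighted contexts.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, vec):
        return [sum(a * v for a, v in zip(row, vec)) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width for context x."""
        A_inv_x = self._A_inv_times(x)
        theta = self._A_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison) update of A_inv, then update b."""
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, contexts, alpha):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm, x in zip(arms, contexts)]
    return max(range(len(arms)), key=scores.__getitem__)


def simulate(rewards, rounds=50, alpha=0.5):
    """Toy run: constant context [1.0] and a fixed, hypothetical utility per arm."""
    arms = [LinUCBArm(dim=1) for _ in rewards]
    counts = [0] * len(rewards)
    for _ in range(rounds):
        contexts = [[1.0] for _ in arms]
        chosen = select_arm(arms, contexts, alpha)
        arms[chosen].update(contexts[chosen], rewards[chosen])
        counts[chosen] += 1
    return counts
```

Calling `simulate([0.1, 0.9])` lets the rule try the weaker arm early and then concentrate on the arm with the higher toy reward. The exploration weight `alpha` is one example of a LinUCB parameter that could be tuned; the source does not say which parameters the genetic algorithm optimizes.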
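The empirical results show that the portfolio selection model built in this paper, an
adaptive multi-armed bandit algorithm improved by a genetic algorithm, delivered strong
returns during the experimental period; for investors with different risk preferences,
its backtest results outperformed both the CSI 300 index and four different types of
funds. Further analysis shows that during the experimental period the model learns the
investor's preferences progressively: the closer the model runs to the rebalancing
date, the better the recommended portfolios perform.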
Keywords: portfolio; sequential decision making; multi-armed bandit algorithm; genetic algorithm; quantitative investment
Abstract
Against the backdrop of advances in artificial intelligence, quantitative investment
research based on financial mathematics and computer technology has entered the stage
of quantitative analysis, and modern financial investment theory is beginning to move
away from its reliance on personal experience and descriptive research. Most
quantitative investment studies use machine learning, reinforcement learning,
optimization algorithms, and similar tools to build models for quantitative stock
selection, and these models have achieved good returns in both developed and emerging
capital markets. Since the beginning of the 21st century, China's capital market has
developed rapidly, but overall it is still not fully developed or mature. At the same
time, China faces challenges brought about by financial crises and frequent market
fluctuations. Research on constructing portfolios through quantitative strategies, in
order to resist market risk and stabilize market order, is therefore of increasing
theoretical and practical significance, and this paper uses reinforcement learning and
machine learning algorithms to construct a portfolio selection model.
This paper formulates portfolio selection as a Markov decision process, that is, as
sequential portfolio decisions made under uncertainty. It adopts the contextual Linear
Upper Confidence Bound (LinUCB) algorithm from the bandit family of algorithms, uses
expected utility theory to define the degree of the investor's preference for a
portfolio, and defines the portfolio selection criterion as the upper confidence bound
of that preference. Because reinforcement learning algorithms learn online, the model
can learn through multiple iterations during the experimental period and finally select
the most satisfactory portfolio for the investor. In addition, a genetic algorithm is
introduced to optimize the parameters of the LinUCB algorithm, so that the model
remains effective throughout its operating environment and ultimately maximizes the
cumulative return. The research proceeds in three parts. First, we build an adaptive
multi-armed bandit model based on expected utility theory to select portfolios and
update their weights. Second, we optimize the parameters of the adaptive multi-armed
bandit model with a genetic algorithm. Third, we present and compare the results of the
selected optimal portfolios.
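The parameter-tuning step described above could be sketched, under stated assumptions, as a small real-coded genetic algorithm over a single bounded parameter. The function `genetic_search`, the toy objective `toy_cumulative_return` (a stand-in for a backtest of the bandit model), the search range, and all hyperparameter values are hypothetical and chosen only for illustration; the source does not specify the encoding, operators, or fitness function it uses.

```python
import random


def toy_cumulative_return(alpha):
    """Hypothetical fitness: stands in for the cumulative return of a backtest."""
    return -(alpha - 0.7) ** 2


def genetic_search(fitness, low, high, pop_size=20, generations=30,
                   mutation_sigma=0.1, seed=0):
    """Maximise fitness(x) over [low, high] with a small real-coded GA."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []  # best fitness found in each generation

    def tournament():
        # Binary tournament: return the fitter of two random individuals.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2          # arithmetic crossover
            child += rng.gauss(0.0, mutation_sigma)  # Gaussian mutation
            children.append(min(high, max(low, child)))  # clamp to the bounds
        population = children

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

A call such as `best, history = genetic_search(toy_cumulative_return, 0.0, 2.0)` returns the best parameter value found and the best fitness per generation. Because the elite individual is always carried over, `history` never decreases; in a real setting the fitness function would run the bandit model over the training window, which is far more expensive than this toy objective.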
The results show that the portfolio selection model constructed in this paper, an
adaptive multi-armed bandit algorithm improved by a genetic algorithm, achieves
excellent returns in the experimental period. For investors with different risk
preferences, its backtest results are better than those of the CSI 300 index and four
different types of funds, indicating the effectiveness of the strategy model. Further
analysis shows that the model comes to understand the investor better as it learns
during the experimental period: the closer the model runs to the rebalancing date, the
better the recommended portfolios perform.
Keywords: portfolio selection; sequential decision making; multi-armed bandit
algorithm; genetic algorithm; quantitative investment
... (remainder omitted)