Trying to Understand Reinforcement Learning

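Reinforcement learning is learning from evaluation. How does it differ from deep learning?

My understanding is that deep learning (in the usual supervised setting) needs a label for each set of features. The model is then trained over and over so that, for each chunk of feature data, its output comes as close as possible to the label.

Reinforcement learning, by contrast, starts with a chunk of features and a model that does not know what the label should be. The model simply outputs some value y. We then implement a reward function that scores this output; the higher the score, the more that arbitrary output can be treated as a temporary label. In effect, labels are set dynamically during training.

In other words, the core of reinforcement learning is a scoring system; no labels need to be prepared in advance.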

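At the start, a deep learning model can be thought of as producing a random value. That value is compared with the label, and the smaller the difference (the loss), the better the model.

A reinforcement learning model can likewise be thought of as producing a random value at first. That value is then scored, and the higher the score (the reward), the better the model.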
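A toy sketch of this contrast in Python, assuming a "model" that is just a single number and a hypothetical target of 0.7. `supervised_fit` is given the label and shrinks the squared error. `reward_fit` never sees the label, only a score from a reward function, and treats its highest-scoring outputs as temporary labels. That second loop is essentially the cross-entropy method, swapped in here because it matches the "temporary label" idea closely; it is an illustration, not a complete reinforcement learning algorithm (there are no states or multi-step decisions).

```python
import random
import statistics
from typing import Callable


def supervised_fit(label: float, steps: int = 200, lr: float = 0.1) -> float:
    """Supervised learning: the label is known, so minimise the squared error (w - label) ** 2."""
    w = 0.0  # the model's initial, arbitrary output
    for _ in range(steps):
        grad = 2.0 * (w - label)  # derivative of the loss with respect to w
        w -= lr * grad            # smaller loss means a better model
    return w


def reward_fit(reward: Callable[[float], float], steps: int = 50, samples: int = 20,
               elite: int = 5, seed: int = 0) -> float:
    """Reward-driven learning (cross-entropy method): no label, only a score per output."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the model's outputs are drawn around mu
    for _ in range(steps):
        ys = [rng.gauss(mu, sigma) for _ in range(samples)]  # try some outputs
        ys.sort(key=reward, reverse=True)                     # higher score is better
        best = ys[:elite]                                     # top outputs act as temporary labels
        mu = statistics.mean(best)                            # move toward those temporary labels
        sigma = max(statistics.stdev(best), 1e-3)             # keep a little exploration
    return mu


if __name__ == "__main__":
    print(supervised_fit(label=0.7))
    hidden_target = 0.7  # hypothetical; reward_fit only ever sees the scores
    print(reward_fit(lambda y: -abs(y - hidden_target)))
```

Both calls end up near 0.7, but only the first one was ever told the answer.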

===============================

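Applying deep learning to buying funds:

For example, use the fund's daily gains over the previous 30 days as features and today's gain as the label. Train the model on these pairs, then use it to predict each day's gain.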
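A minimal sketch of how the training pairs could be built, assuming the fund's history is available as a plain Python list of daily returns (the function name `make_windows` and this data layout are hypothetical). Each sample pairs the previous 30 days with the next day's return; any regression model could then be trained on these pairs.

```python
from typing import List, Tuple


def make_windows(daily_returns: List[float], window: int = 30) -> List[Tuple[List[float], float]]:
    """Turn a fund's daily returns into supervised (features, label) pairs.

    features: the returns of the previous `window` days; label: the return of the next day.
    """
    pairs = []
    for t in range(window, len(daily_returns)):
        features = daily_returns[t - window:t]  # the previous 30 days
        label = daily_returns[t]                # "today"
        pairs.append((features, label))
    return pairs


if __name__ == "__main__":
    returns = [0.001 * i for i in range(35)]  # made-up data, not a real fund
    pairs = make_windows(returns)
    print(len(pairs))  # 35 - 30 = 5 samples
```

With about 2,500 trading days in ten years, this windowing yields roughly 2,470 samples.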

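How accurate those predictions are depends heavily on the features. With only one feature dimension, past gains, good predictions are hard to achieve even after training, because far too many factors affect a fund's gains.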

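Applying reinforcement learning to buying funds:

First we need to design a fund-trading environment. Its output is the gains over the last 30 days, and its input is one of buy, sell, or hold. Assuming a starting principal of 10,000, the scoring system is the rate of return.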
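A minimal sketch of such an environment, loosely following the `reset()`/`step()` convention of Gym environments but simplified (the real Gym API returns more values from `step`). Everything here is an assumption for illustration: the class name `FundTradingEnv`, a list of daily returns as the market data, and the simplification that "buy" invests all cash while "sell" cashes out every share. Rewarding each step with the change in portfolio value divided by the principal means the rewards over an episode add up to exactly the overall rate of return, which is the score proposed above.

```python
import random
from typing import List, Tuple


class FundTradingEnv:
    """Hypothetical toy fund-trading environment.

    Observation: the daily returns of the last `window` days.
    Action: "buy", "sell" or "hold" (assumption: buy uses all cash, sell sells all shares).
    Reward: change in portfolio value this step, divided by the initial principal.
    """

    def __init__(self, daily_returns: List[float], principal: float = 10_000.0, window: int = 30):
        self.returns = daily_returns
        self.principal = principal
        self.window = window
        self.reset()

    def reset(self) -> List[float]:
        self.t = self.window        # first day we can act on
        self.price = 1.0            # fund price, normalised to 1.0 at the start
        self.cash = self.principal
        self.shares = 0.0
        return self.returns[self.t - self.window:self.t]

    def value(self) -> float:
        return self.cash + self.shares * self.price

    def step(self, action: str) -> Tuple[List[float], float, bool]:
        before = self.value()
        if action == "buy":
            self.shares += self.cash / self.price
            self.cash = 0.0
        elif action == "sell":
            self.cash += self.shares * self.price
            self.shares = 0.0
        # "hold": do nothing
        self.price *= 1.0 + self.returns[self.t]  # the market moves by today's return
        self.t += 1
        reward = (self.value() - before) / self.principal  # rate of return earned this step
        done = self.t >= len(self.returns)
        obs = self.returns[self.t - self.window:self.t]
        return obs, reward, done


if __name__ == "__main__":
    rng = random.Random(0)
    fake_returns = [rng.gauss(0.0005, 0.01) for _ in range(300)]  # made-up data, not a real fund
    env = FundTradingEnv(fake_returns)
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(rng.choice(["buy", "sell", "hold"]))  # random policy
        total += reward
    print(f"rate of return: {total:.2%}")
```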

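Then, with the previous 30 days of gains as features, the model outputs a value y in the range [-1, 1], where 0 means hold.

We are free to define what the output y means, for example:

y > 0 means buy: y = 0.2 means buy 2,000 (20% of the 10,000 principal).

y = 0 means hold: neither buy nor sell.

y < 0 means sell: y = -0.5 means sell half of the shares held.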
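The convention above can be sketched as a small decoding function. The name `apply_action`, the clipping of y to [-1, 1], and capping purchases at the available cash are assumptions added for illustration.

```python
from typing import Tuple


def apply_action(y: float, cash: float, shares: float, price: float,
                 principal: float = 10_000.0) -> Tuple[float, float]:
    """Map a model output y in [-1, 1] to a trade and return the new (cash, shares).

    y > 0: buy y * principal worth of the fund (y = 0.2 -> 2,000 with a 10,000 principal),
           capped at the cash available (assumption).
    y = 0: hold.
    y < 0: sell the fraction |y| of the shares held (y = -0.5 -> sell half).
    """
    y = max(-1.0, min(1.0, y))  # clip to the valid output range (assumption)
    if y > 0:
        amount = min(y * principal, cash)
        return cash - amount, shares + amount / price
    if y < 0:
        sold = -y * shares
        return cash + sold * price, shares - sold
    return cash, shares


if __name__ == "__main__":
    print(apply_action(0.2, cash=10_000.0, shares=0.0, price=1.0))    # buy 2,000
    print(apply_action(-0.5, cash=0.0, shares=1_000.0, price=1.2))   # sell half
```

A discrete buy/sell/hold action would be simpler, but a continuous y lets the policy express how much to trade, which is what makes adding to a position or taking partial profits learnable.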

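When it comes to buying funds, reinforcement learning and deep learning are in the same position: neither is very accurate, though both have the advantage of being rational. Another drawback is that the training set is very small: ten years of a fund's history give only about 2,500 data points (one per trading day).

A simple example: an epidemic outbreak can send healthcare-related funds soaring, yet an AI model cannot predict that an epidemic will happen.

That does not mean these methods cannot be applied to buying funds, though, because the model learns a strategy: when to take profits, when to buy, and when to add to a position. Such a strategy is more than simple fixed-amount periodic investing.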
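To judge whether a learned strategy actually beats fixed-amount periodic investing, it helps to score the plain strategy with the same rate-of-return metric. A minimal sketch, assuming daily returns as input and hypothetical parameters: invest 1,000 every 20 trading days out of a 10,000 principal and never sell. The function name `dca_rate_of_return` is illustrative.

```python
from typing import List


def dca_rate_of_return(daily_returns: List[float], every: int = 20, amount: float = 1_000.0,
                       principal: float = 10_000.0) -> float:
    """Fixed-amount periodic investing as a baseline (hypothetical parameters).

    Buys `amount` every `every` days until the cash runs out, never sells,
    and returns the final rate of return on the principal.
    """
    price, cash, shares = 1.0, principal, 0.0  # price normalised to 1.0 on day 0
    for day, r in enumerate(daily_returns):
        if day % every == 0 and cash > 0:
            spend = min(amount, cash)
            cash -= spend
            shares += spend / price
        price *= 1.0 + r  # the market moves after the day's purchase
    return (cash + shares * price - principal) / principal
```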

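Applying reinforcement learning to games

Reinforcement learning is especially good at games, because the game itself is the environment and the game screen is its output. Almost every game has a score or a win condition, which serves as the scoring system.

Take PopStar (消灭星星) as an example.

The game itself is the environment: its input is the position you tap, its output is the game screen, and the points scored for eliminating stars form the scoring system.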
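PopStar itself is too complex for a short example, so here is a minimal sketch with a hypothetical stand-in game, `CorridorGame`, that has the same three parts: an input (the move), an output (the current position, standing in for the screen), and a score. The learner is standard tabular Q-learning, chosen here for illustration rather than anything the text prescribes; it never sees a label, only the reward, yet ends up preferring the move that leads to the star.

```python
import random
from typing import List, Tuple


class CorridorGame:
    """Hypothetical toy game: walk along a corridor to reach the star in the last cell.

    Input (action): 0 = left, 1 = right. Output (observation): the current cell.
    Score (reward): 1.0 for reaching the star, 0.0 otherwise.
    """

    def __init__(self, length: int = 6):
        self.length = length

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done


def q_learning(game: CorridorGame, episodes: int = 200, alpha: float = 0.5,
               gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning: estimate how good each move is in each cell from the score alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(game.length)]
    for _ in range(episodes):
        state, done = game.reset(), False
        while not done:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore (or break ties randomly)
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit the best known move
            nxt, reward, done = game.step(action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


if __name__ == "__main__":
    table = q_learning(CorridorGame())
    print(["right" if a[1] > a[0] else "left" for a in table[:-1]])
```

For a real game the `reset`/`step` pair would wrap the game itself, and once the observation is a full screen the table is typically replaced by a neural network.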

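Gym (maintained today as Gymnasium) includes many games built on physics engines, which are ideal for hands-on practice and learning.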
