Reinforcement learning (RL) is a machine learning technique that trains an algorithm through trial and error. The algorithm (agent) evaluates the current situation (state), takes an action, and receives feedback (reward) from the environment after each act. Positive feedback is a reward in the everyday sense of the word; negative feedback is a penalty for making a mistake.
We introduce an ensemble strategy that uses deep reinforcement learning to maximize returns. A learning agent employs three actor-critic algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble combines the strengths of each algorithm and adapts well to changing market conditions. To manage memory usage when training with continuous action spaces, we use a load-on-demand data processing technique. We tested the strategy on 30 Dow Jones stocks and compared its performance against the Dow Jones Industrial Average and the traditional min-variance portfolio allocation. The ensemble strategy outperformed the individual algorithms and both baselines in risk-adjusted return, as measured by the Sharpe ratio.
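As a hedged illustration of the selection mechanism (not the published implementation), the sketch below trains the three agents with stable-baselines3 and picks the one with the best validation Sharpe ratio; the toy `make_env` factory, training budget, and window lengths are all assumptions.

```python
# Illustrative sketch of ensemble selection by validation Sharpe ratio.
import gymnasium as gym
import numpy as np
from stable_baselines3 import A2C, DDPG, PPO

def make_env(start=None, end=None):
    # Placeholder factory: a real version would build a market environment
    # for the given date window instead of this toy continuous-control env.
    return gym.make("Pendulum-v1")

def sharpe(returns: np.ndarray) -> float:
    # Annualized Sharpe ratio of a daily-return series.
    return np.sqrt(252) * returns.mean() / (returns.std() + 1e-8)

def validation_sharpe(model, env, n_steps: int = 252) -> float:
    # Roll the trained agent through a validation window, treating the
    # per-step reward as a proxy for the portfolio's daily return.
    obs, _ = env.reset(seed=0)
    rewards = []
    for _ in range(n_steps):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        rewards.append(float(reward))
        if terminated or truncated:
            obs, _ = env.reset()
    return sharpe(np.array(rewards))

# Train PPO, A2C, and DDPG on the same window, then pick the agent with the
# best validation Sharpe ratio to trade the next window.
agents = {name: cls("MlpPolicy", make_env(), verbose=0).learn(10_000)
          for name, cls in {"ppo": PPO, "a2c": A2C, "ddpg": DDPG}.items()}
scores = {name: validation_sharpe(m, make_env()) for name, m in agents.items()}
best = max(scores, key=scores.get)
print("selected agent for the next window:", best)
```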
Keywords: Deep Reinforcement Learning, Markov Decision Process, Automated Stock Trading, Ensemble Strategy, Actor-Critic Framework
- Deep Reinforcement Learning for Automated Stock Trading
- Outline
- Overview
- Reinforcement, supervised, and unsupervised learning
- Multi-level deep Q-networks for Bitcoin trading strategies
- File Structure
- Supported Data Sources
- Deep Q-network
- News articles
- Installation
- Status Update
- Tutorials
- Publications
- News
- Citing
- Join and Contribute
- LICENSE
Jojo has three layers: market environments, agents, and applications. For a trading task (at the top), an agent (in the middle) interacts with a market environment (at the bottom), making sequential decisions.
In supervised learning, an agent “knows” what task to perform and which actions are correct. Data scientists train the agent on historical data that includes target variables (the desired answers), also known as labeled data, so the agent receives direct feedback. After training, the agent can predict whether the target variables appear in new data. Supervised learning is suited to classification and regression tasks.
Reinforcement learning doesn’t rely on labeled datasets: the agent isn’t told which actions to take or the optimal way to perform a task. Instead of labels attached to each decision, RL uses rewards and penalties to signal whether a taken action is good or bad, and the agent may only get this feedback once it completes the task. Time-delayed feedback and the trial-and-error principle are what differentiate reinforcement learning from supervised learning.
Since one of the goals of RL is to find a sequence of actions that maximizes reward, sequential decision making is another significant difference between these training styles: each of the agent’s decisions can affect its future actions.
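This interaction loop is straightforward to express in code. The sketch below runs a random agent against a standard Gym environment purely as an illustration; in trading, the states would be market observations and the rewards portfolio returns.

```python
# The basic agent-environment loop: observe a state, act, receive a reward.
import gymnasium as gym

env = gym.make("CartPole-v1")            # stand-in for a market environment
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()   # a trained agent would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # feedback arrives after each action
    if terminated or truncated:          # episode ends; feedback can be delayed
        obs, info = env.reset()
print(f"cumulative reward: {total_reward}")
```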
Reinforcement learning vs unsupervised learning. In unsupervised learning, the algorithm analyzes unlabeled data to find hidden interconnections between data points and structures them by similarities or differences. RL, by contrast, aims to find the policy that earns the biggest long-term reward, so the two differ in their key goal.
Reinforcement and deep learning. Most reinforcement learning implementations employ deep learning models: deep neural networks serve as the core method for training the agent. Deep learning is especially well suited to recognizing complex patterns in images, sounds, and text. Neural networks also let data scientists fit the whole process into a single model rather than breaking the agent’s architecture into multiple modules.
We propose a multi-level deep Q-network (M-DQN) that leverages historical Bitcoin price data and Twitter sentiment analysis. We also introduce a preprocessing pipeline that extracts valuable insights from the data before feeding them into the M-DQN model. In our experiments, this integration produced a noteworthy increase in investment value over the initial amount and a Sharpe ratio in excess of 2.7, indicating strong risk-adjusted return.
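To make the idea concrete, here is a hypothetical sketch of the kind of state such a preprocessing pipeline could produce, combining recent log-returns with a tweet-sentiment score; the window size and feature choices are assumptions, not the paper's.

```python
# Hypothetical M-DQN-style state: recent Bitcoin log-returns plus a Twitter
# sentiment score in [-1, 1]; all feature choices here are illustrative.
import numpy as np

def build_state(prices: np.ndarray, sentiment: float, window: int = 7) -> np.ndarray:
    log_returns = np.diff(np.log(prices[-(window + 1):]))  # last `window` returns
    return np.concatenate([log_returns, [sentiment]])      # input vector for the DQN

prices = np.array([42000.0, 42500, 41800, 43000, 43500, 44000, 43800, 44200])
state = build_state(prices, sentiment=0.62)
print(state.shape)  # (8,): 7 log-returns + 1 sentiment score
```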
Reinforcement learning is applicable in numerous industries, including internet advertising and eCommerce, finance, robotics, and manufacturing. Let’s take a closer look at these use cases.
The main folder finrl has three subfolders: applications, agents, and meta. We employ a train-test-trade pipeline with three files, train.py, test.py, and trade.py; a sketch of this pipeline follows the file tree below.
FinRL
├── finrl (main folder)
│   ├── applications
│   │   ├── Stock_NeurIPS2018
│   │   ├── imitation_learning
│   │   ├── cryptocurrency_trading
│   │   ├── high_frequency_trading
│   │   ├── portfolio_allocation
│   │   └── stock_trading
│   ├── agents
│   │   ├── elegantrl
│   │   ├── rllib
│   │   └── stablebaseline3
│   ├── meta
│   │   ├── data_processors
│   │   ├── env_cryptocurrency_trading
│   │   ├── env_portfolio_allocation
│   │   ├── env_stock_trading
│   │   ├── preprocessor
│   │   ├── data_processor.py
│   │   ├── meta_config_tickers.py
│   │   └── meta_config.py
│   ├── config.py
│   ├── config_tickers.py
│   ├── main.py
│   ├── plot.py
│   ├── train.py
│   ├── test.py
│   └── trade.py
│
├── examples
├── unit_tests (unit tests to verify code for environments & data)
│   ├── environments
│   │   └── test_env_cashpenalty.py
│   └── downloaders
│       ├── test_yahoodownload.py
│       └── test_alpaca_downloader.py
├── setup.py
├── requirements.txt
└── README.md
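Here is the sketch of the train-test-trade pipeline promised above. It uses a generic Gym-style environment factory rather than the repository's actual train.py, test.py, and trade.py signatures; every name below is illustrative.

```python
# Illustrative train-test-trade pipeline; `make_env(start, end)` stands in
# for building a market environment over a date window.
import gymnasium as gym
from stable_baselines3 import PPO

def make_env(start: str, end: str) -> gym.Env:
    # Placeholder: a real factory would build a market env for the window.
    return gym.make("Pendulum-v1")

# train.py stage: fit the agent on an in-sample window.
model = PPO("MlpPolicy", make_env("2015-01-01", "2019-12-31"), verbose=0)
model.learn(total_timesteps=10_000)

# test.py stage: run the frozen policy on a held-out window, no learning.
env = make_env("2020-01-01", "2020-12-31")
obs, _ = env.reset(seed=0)
for _ in range(252):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break

# trade.py stage: the same predict() loop, pointed at live or paper data.
```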
Data Source | Type | Range and Frequency | Request Limits | Raw Data | Preprocessed Data |
---|---|---|---|---|---|
Akshare | CN Securities | 2015-now, 1day | Account-specific | OHLCV | Prices&Indicators |
Alpaca | US Stocks, ETFs | 2015-now, 1min | Account-specific | OHLCV | Prices&Indicators |
Baostock | CN Securities | 1990-12-19-now, 5min | Account-specific | OHLCV | Prices&Indicators |
Binance | Cryptocurrency | API-specific, 1s, 1min | API-specific | Tick-level daily aggregated trades, OHLCV | Prices&Indicators |
CCXT | Cryptocurrency | API-specific, 1min | API-specific | OHLCV | Prices&Indicators |
EODhistoricaldata | US Securities | Frequency-specific, 1min | API-specific | OHLCV | Prices&Indicators |
IEXCloud | NMS US securities | 1970-now, 1 day | 100 per second per IP | OHLCV | Prices&Indicators |
JoinQuant | CN Securities | 2005-now, 1min | 3 requests each time | OHLCV | Prices&Indicators |
QuantConnect | US Securities | 1998-now, 1s | NA | OHLCV | Prices&Indicators |
RiceQuant | CN Securities | 2005-now, 1ms | Account-specific | OHLCV | Prices&Indicators |
Sinopac | Taiwan securities | 2023-04-13-now, 1min | Account-specific | OHLCV | Prices&Indicators |
Tushare | CN Securities, A share | -now, 1 min | Account-specific | OHLCV | Prices&Indicators |
WRDS | US Securities | 2003-now, 1ms | 5 requests each time | Intraday Trades | Prices&Indicators |
YahooFinance | US Securities | Frequency-specific, 1min | 2,000/hour | OHLCV | Prices&Indicators |
OHLCV: open, high, low, and close prices, plus volume. adjusted_close: adjusted close price.
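For instance, daily OHLCV rows from the YahooFinance source can be pulled with the yfinance package, which data processors of this kind typically wrap (a minimal sketch; the ticker and dates are arbitrary):

```python
# Minimal sketch: download daily OHLCV data from Yahoo Finance via yfinance.
import yfinance as yf

# auto_adjust=False keeps both the raw close and the adjusted close.
df = yf.download("AAPL", start="2020-01-01", end="2020-12-31",
                 interval="1d", auto_adjust=False)
print(df.head())  # columns: Open, High, Low, Close, Adj Close, Volume
```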
Technical indicators: 'macd', 'boll_ub', 'boll_lb', 'rsi_30', 'dx_30', 'close_30_sma', 'close_60_sma'. Users can also add new features.
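These indicator names follow the stockstats convention, so a minimal sketch of computing them might look like the following; the tiny DataFrame is a stand-in for real OHLCV history.

```python
# Compute the listed technical indicators with the stockstats package.
import pandas as pd
from stockstats import StockDataFrame

df = pd.DataFrame({
    "open": [1.0, 1.1, 1.2], "high": [1.1, 1.2, 1.3],
    "low": [0.9, 1.0, 1.1], "close": [1.05, 1.15, 1.25],
    "volume": [100, 120, 90],
})
sdf = StockDataFrame.retype(df.copy())
for name in ["macd", "boll_ub", "boll_lb", "rsi_30",
             "dx_30", "close_30_sma", "close_60_sma"]:
    df[name] = sdf[name].values  # stockstats computes indicators on access
print(df.columns.tolist())
```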
Building upon the foundations of Q-learning, DQN combines reinforcement learning with deep learning. It uses a deep neural network as a function approximator to estimate the action-value function Q(s, a). DQN addresses the main challenges of traditional Q-learning, such as learning stability, and by employing deep learning it can handle high-dimensional state spaces, such as those encountered in image-based tasks or large-scale problems.
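A minimal PyTorch sketch of the function-approximation idea follows; the layer sizes and the three-action example are assumptions, and a complete DQN additionally needs a replay buffer and a target network.

```python
# Q-network: maps a state vector to one Q-value per discrete action.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Temporal-difference target for one transition (s, a, r, s'):
#   y = r + gamma * max_a' Q_target(s', a')
q = QNetwork(state_dim=8, n_actions=3)  # e.g. buy / hold / sell
print(q(torch.randn(1, 8)))             # Q-values for each action
```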
News recommendation. Machine learning has made it possible for businesses to personalize customer interactions at scale by analyzing data on customer preferences, backgrounds, and online behavior patterns.
However, recommending content such as online news is still a complex task: news features are dynamic by nature and become irrelevant rapidly, and user preferences in topics change as well. A Deep Reinforcement Learning Framework for News Recommendation discusses three main challenges of news recommendation methods. We used its Deep Q-Learning based recommendation framework, which considers the current reward and future reward simultaneously and incorporates user return as additional feedback.
- Installation instructions for all operating systems (macOS, Ubuntu, Windows 10)
- FinRL for Quantitative Finance: Install and Setup Tutorial for Beginners
Version History
- 2022-06-25 0.3.5: Formal release of FinRL; neo_finrl is changed to FinRL-Meta, with related files in the directory meta.
- 2021-08-25 0.3.1: PyTorch version with a three-layer architecture: apps (financial tasks), drl_agents (DRL algorithms), neo_finrl (gym envs)
- 2020-12-14: Upgraded to PyTorch with stable-baselines3; removed TensorFlow 1.0 for now, with TensorFlow 2.0 support under development
- 2020-11-27 0.1: Beta version with TensorFlow 1.5
- [Towardsdatascience] Deep Reinforcement Learning for Automated Stock Trading
Publications:
Title | Conference/Journal | Link | Citations | Year |
---|---|---|---|---|
Dynamic Datasets and Market Environments for Financial Reinforcement Learning | Machine Learning - Springer Nature | paper code | 7 | 2024 |
FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning | NeurIPS 2022 | paper code | 37 | 2022 |
FinRL: Deep reinforcement learning framework to automate trading in quantitative finance | ACM International Conference on AI in Finance (ICAIF) | paper | 49 | 2021 |
FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance | NeurIPS 2020 Deep RL Workshop | paper | 87 | 2020 |
Deep reinforcement learning for automated stock trading: An ensemble strategy | ACM International Conference on AI in Finance (ICAIF) | paper code | 154 | 2020 |
Practical deep reinforcement learning approach for stock trading | NeurIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services | paper code | 164 | 2018 |
Returns the latest news articles across stocks and crypto; by default, the latest 10 articles are returned. Example using symbol BTCUSD (Bitcoin), start 2024-08-04, end 2024-11-04:
{ "news": [ { "author": "Bibhu Pattnaik", "content": "
Crypto analyst Kevin Svenson predicts that Bitcoin (CRYPTO: <a class="ticker" href="https://www.benzinga.com/quote/btc/usd">BTC) could see an upsurge of up to 86%, <a href="https://www.benzinga.com/markets/cryptocurrency/23/12/36274982/crypto-analyst-forecasts-monumental-bitcoin-rally-by-2026">potentially hitting the $100,000 mark.
\n\n\n\nWhat Happened: Svenson said that Bitcoin has formed a bullish divergence pattern on the daily chart. This pattern emerges when the asset’s price is trading down or sideways, while an oscillator like the relative strength index (RSI) is in an uptrend, indicating increasing bullish momentum.
\n\n\n\nIn a video <a href="https://www.youtube.com/watch?v=dU_qfQRtxks">post, Svenson said, “We got actually a slightly higher low in the RSI – you could also call it flat support, horizontal support. And upon that flat support, we had lower lows in price. That is a bullish divergence.”
\n\n\n\nHe also observed a broadening pattern in the Bitcoin chart, characterized by lower highs and even lower lows, which could be interpreted as a bullish continuation pattern if the asset breaks its diagonal resistance.
\n\n\n\n“And so what is happening on the Bitcoin chart? Well, what we see is a broadening pattern of sorts – lower highs but even lower lows," he added.
\n\n\n\nAlso Read: <a href="https://www.benzinga.com/markets/cryptocurrency/24/06/39241558/analyst-predicts-bitcoin-to-reach-groundbreaking-100-000-milestone">Analyst Predicts Bitcoin To Reach Groundbreaking $100,000 Milestone
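A hedged sketch of requesting such a payload is shown below; the endpoint, header names, and parameters mirror Alpaca's news API conventions and are assumptions here, so consult your provider's documentation.

```python
# Hypothetical request for the news payload shown above; the endpoint and
# headers follow Alpaca's data API conventions and are assumptions.
import requests

resp = requests.get(
    "https://data.alpaca.markets/v1beta1/news",
    headers={
        "APCA-API-KEY-ID": "YOUR_KEY",        # placeholder credentials
        "APCA-API-SECRET-KEY": "YOUR_SECRET",
    },
    params={"symbols": "BTCUSD", "start": "2024-08-04", "end": "2024-11-04"},
    timeout=10,
)
for article in resp.json().get("news", [])[:10]:  # latest 10 by default
    print(article["author"])
```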
- DRL-based trading agents: Risk-driven learning for financial rules-based policy
- A General Portfolio Optimization Environment
Welcome to the JojoFinance community!
Thank you!
MIT License
Disclaimer: We are sharing this code for academic purposes under the MIT License. Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.