Adventures in Algorithmic Trading on the Runescape Grand Exchange

Overview

Runescape has been a game near and dear to my heart since I was a child. Though I do not actively play anymore, it still functions as an interesting programming project substrate. Most recently, I created a bot that automatically executes trades on the Grand Exchange in order to conduct market making via common machine learning techniques. This blog post will explain the individual components of the bot, the various trading algorithms used, and the results of an experiment comparing the various trading algorithms' performance.

The Old School Runescape Grand Exchange is a market where any player can place buy or sell offers for almost any item in the game. Like most markets it is a chaotic system, but it becomes modelable on small enough time scales or microstructures. The only constraint placed on offers is a four-hour buy limit that differs per item. For example, coal has a buy limit of 13,000, meaning a player can buy at most 13,000 coal every four hours. Another interesting feature of the Grand Exchange is the 1% tax applied to all executed sell offers. The tax applies to any item whose individual trading price is greater than 100 gold, is computed per item rounded down to the nearest integer, and is capped at 5 million gold per offer. The proceeds of the tax are removed from the economy by the game developer to control inflation. As an example, if I were to sell 10,000 coal at today's price of 142 gold per item, my executed trade would be subject to a tax of floor(0.01 * 142) * 10,000 = 10,000 gold.
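The tax rule above can be sketched as a small Python function (the function name is mine; the thresholds mirror the ones just described):

```python
def ge_tax(price_per_item: int, quantity: int) -> int:
    """Grand Exchange sell tax: 1% per item rounded down, only for items
    priced above 100 gold, capped at 5 million gold per executed offer."""
    if price_per_item <= 100:
        return 0
    per_item_tax = price_per_item // 100  # floor(0.01 * price) for integer prices
    return min(per_item_tax * quantity, 5_000_000)

# Selling 10,000 coal at 142 gold each incurs a 10,000 gold tax:
ge_tax(142, 10_000)  # 10,000
```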

Components

The bot is composed of three separate applications: a JavaScript API that interacts with the OSRS Wiki's real-time item price stream, a Java client that controls character actions, and a Python API that ranks a set of possible offers by their forecasted profitability. The OSRS Wiki project maintains a useful API that publishes data about every item trading on the Grand Exchange every 5 minutes. It records fields such as each item's average price spread over a configurable past period, its traded volume over a configurable past period, and its buy limit.

The data pipeline I use to train the ML models is composed of two cronjobs that interact with the OSRS Wiki API: one that polls it every 5 minutes, and another that polls it every hour. Each records the price spreads and volumes of every item traded on the Grand Exchange over that period and writes the resulting data to a database. The bot also records data about each offer that it successfully executes: gold/second generated, absolute profit generated, the timestamp of the buy offer initialization, and the ID of the item traded. These two tables are joined on timestamp so that each row contains aggregate trade data for the item across the entire Grand Exchange in the period leading up to a successful trade, alongside the profitability of that individual executed trade. The target of the model's loss function is gold/second generated. Lastly, to prevent temporal leakage, the training set consists of labeled trades from 63 days prior through 14 days prior, while the validation set consists of the most recent 14 days.
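The timestamp join can be sketched with pandas; all column names and values below are illustrative, not the real schema. `merge_asof` attaches, per item, the most recent market snapshot preceding each executed trade:

```python
import pandas as pd

# Hypothetical market-snapshot table (written by the cronjobs).
market = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:05"]),
    "item_id": [453, 453],          # e.g. coal
    "avg_spread": [3, 4],
    "volume": [120_000, 95_000],
})

# Hypothetical executed-trade table (written by the bot).
trades = pd.DataFrame({
    "buy_ts": pd.to_datetime(["2024-01-01 00:06"]),
    "item_id": [453],
    "gold_per_second": [41.7],      # the model's regression target
})

# For each trade, pull the latest snapshot at or before the buy offer.
merged = pd.merge_asof(
    trades.sort_values("buy_ts"),
    market.sort_values("ts"),
    left_on="buy_ts", right_on="ts", by="item_id",
)

# Temporal split: train on older rows, validate on the most recent 14 days.
cutoff = merged["buy_ts"].max() - pd.Timedelta(days=14)
train, val = merged[merged["buy_ts"] <= cutoff], merged[merged["buy_ts"] > cutoff]
```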

Baseline Method

In any modeling scenario, it is always good practice to establish a naive baseline method that can be used to determine if non-trivial methods are actually improving performance. The baseline method I came up with is as follows:

Given an item, its price spread over the last 5 mins, and its trade volume over the last hour, compute the following variables:

ROI: (sell_total - {1% tax} - buy_total) / buy_total

Volume ratio: (1h_volume_traded_high_price / 1h_volume_traded_low_price)

Average gold/second of item trades over last two weeks

Then do the following:

  1. Compute ROI Z score for each item.
  2. Compute volume ratio Z score for each item.
  3. Filter out any items with a historically negative average gold/second metric.
  4. Sort each item by (roi_zscore + volume_ratio_zscore) descending.
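The four steps above can be sketched as follows; the item names and numbers are made up for illustration:

```python
from statistics import mean, stdev

# (name, roi, volume_ratio, historical gold/second) -- illustrative values.
items = [
    ("coal",        0.012, 1.4,  35.0),
    ("yew logs",    0.008, 0.9,  12.0),
    ("rune ore",    0.020, 1.1,  -5.0),  # negative history: filtered out
    ("nature rune", 0.015, 1.6,  50.0),
]

def zscores(xs):
    mu, sd = mean(xs), stdev(xs)
    return [(x - mu) / sd for x in xs]

roi_z = zscores([roi for _, roi, _, _ in items])          # step 1
vol_z = zscores([ratio for _, _, ratio, _ in items])      # step 2

ranked = sorted(
    (
        (name, rz + vz)
        for (name, _, _, hist), rz, vz in zip(items, roi_z, vol_z)
        if hist > 0                                       # step 3
    ),
    key=lambda pair: pair[1],
    reverse=True,                                         # step 4
)
# ranked[0] is the most attractive remaining item.
```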

Machine Learning Methods

After generating results for the baseline method, I ran a one-week experiment comparing the baseline to random forest and neural network regression models. To avoid a sample ratio mismatch, I programmed the game client to choose uniformly at random among the three methods' rankings when evaluating potential offers. The results are displayed in the following table, sorted by mean profit/hour in descending order:

Model type     | n   | L2 loss | Mean profit/hr | 95% CI profit/hr
random forest  | 257 | 53      | 150,892        | 129,140 - 172,643
neural network | 191 | 57      | 123,923        | 103,279 - 144,566
baseline       | 216 | N/A     | 87,353         | 79,493 - 95,212
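The uniform random assignment between rankers can be sketched as below; the ranker stubs are placeholders for the real implementations:

```python
import random

# Illustrative ranker stubs; the real implementations return candidate
# offers sorted by forecasted profitability.
def baseline_rank(candidates):
    return candidates

def forest_rank(candidates):
    return candidates

def nn_rank(candidates):
    return candidates

RANKERS = {
    "baseline": baseline_rank,
    "random forest": forest_rank,
    "neural network": nn_rank,
}

def choose_ranker(rng=random):
    # Uniform assignment keeps each arm's sample size comparable in
    # expectation, avoiding a sample ratio mismatch in the experiment.
    name = rng.choice(sorted(RANKERS))
    return name, RANKERS[name]
```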

Conclusion

Both machine learning methods outperformed the baseline by a wide margin. Among the ML architectures, the random forest performed best, edging out the neural network slightly, which aligns with each model's validation loss during training. I am somewhat surprised that the random forest generated the highest profit/hour, as random forest predictions are typically constrained to the range of target values in the training data. That said, the training data has relatively low variance, since the market-making trades these models predict tend to be high frequency and low ROI.

Please reach out to me if you are interested in obtaining the training data used for this project. The code used to train the models can be found linked above as well as on my GitHub.