This is an open-source Python library for a 1-on-1 Mahjong game. It provides a standard API for developing and evaluating learning algorithms in a Mahjong environment.
The detailed game rules are given in the appendix of the paper.
Clone the Git project and import the Python library. Python 3.x is supported, and the only requirement is numpy.

    git clone https://github.com/yata0/Mahjong.git

Then import the wrapper and create an environment:

    from p2_mahjong.wrapper import MJWrapper as Wrapper
    wrapper = Wrapper()
The following example shows how to interact with the environment by playing 100 games with random legal actions.

    import random
    from p2_mahjong.wrapper import MJWrapper as Wrapper

    wrapper = Wrapper()
    is_game_over = False
    index = 0
    game_count = 100
    for game_index in range(game_count):
        # Start a new game and fetch the first set of legal actions.
        is_game_over = False
        wrapper.reset()
        legal_actions = wrapper.get_legal_actions()
        while not is_game_over:
            # Pick a random legal action and advance the environment.
            action_label = random.choice(legal_actions)
            cards, actions, reward, is_game_over, legal_actions = wrapper.step([action_label])
            if is_game_over:
                # Report the payoffs if there is a winner, otherwise a tie.
                _, winner_id = wrapper.get_game_status()
                if winner_id is not None:
                    print(game_index, wrapper.get_payoffs())
                else:
                    print(game_index, "tie")
        index += 1
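The random choice in the example can be replaced by any function that maps the list of legal actions to one of its entries. The block below is a minimal sketch of such a function, not part of the library: `choose_action` is a hypothetical name, and the only rule it encodes is to take Hu (action ID 1 in the action table) whenever it is legal and otherwise fall back to a random legal action.

```python
import random

HU_ACTION_ID = 1  # "Hu" in the action table of this README


def choose_action(legal_actions, rng):
    """Hypothetical policy: declare Hu when possible, otherwise act randomly."""
    if not legal_actions:
        raise ValueError("no legal actions to choose from")
    if HU_ACTION_ID in legal_actions:
        return HU_ACTION_ID
    return rng.choice(legal_actions)


# Stand-alone demonstration on hand-written action lists (no environment needed).
rng = random.Random(0)
picked_with_hu = choose_action([0, 1, 12], rng)
picked_without_hu = choose_action([12, 13, 14], rng)
print(picked_with_hu, picked_without_hu in [12, 13, 14])
```

To try it in the loop above, the line that calls `random.choice(legal_actions)` would become `choose_action(legal_actions, rng)`.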
The 1-on-1 Mahjong game uses 24 unique tiles and 72 tiles in total. The tile IDs defined in the source code are listed in the table below.
Tile name | ID |
---|---|
Character 1 | 9 |
Character 2 | 10 |
Character 3 | 11 |
Character 4 | 12 |
Character 5 | 13 |
Character 6 | 14 |
Character 7 | 15 |
Character 8 | 16 |
Character 9 | 17 |
Green | 27 |
Red | 28 |
White | 29 |
East | 30 |
West | 31 |
North | 32 |
South | 33 |
Spring | 34 |
Summer | 35 |
Autumn | 36 |
Winter | 37 |
Mei | 38 |
Lan | 39 |
Zhu | 40 |
Ju | 41 |
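The table can be mirrored as a small lookup for printing observations in readable form. This is a sketch rather than library code: `TILE_NAMES`, `tile_name`, and `copies_of` are hypothetical names. It also assumes four copies of each ordinary tile and one copy of each of the eight tiles from Spring to Ju, which is not stated here but reproduces the total of 72.

```python
# Tile IDs copied from the table above.
TILE_NAMES = {
    **{9 + i: f"Character {i + 1}" for i in range(9)},
    27: "Green", 28: "Red", 29: "White",
    30: "East", 31: "West", 32: "North", 33: "South",
    34: "Spring", 35: "Summer", 36: "Autumn", 37: "Winter",
    38: "Mei", 39: "Lan", 40: "Zhu", 41: "Ju",
}

# Assumption: IDs 34 to 41 are the single-copy flower tiles.
FLOWER_IDS = set(range(34, 42))


def tile_name(tile_id):
    """Return the readable name of a tile ID, or a marker for unknown IDs."""
    return TILE_NAMES.get(tile_id, f"unknown({tile_id})")


def copies_of(tile_id):
    """Assumed number of copies of a tile in the full set."""
    return 1 if tile_id in FLOWER_IDS else 4


total_tiles = sum(copies_of(t) for t in TILE_NAMES)
print(len(TILE_NAMES), total_tiles)
```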
There are 10 types of actions and 105 distinct actions in total. The action IDs defined in the source code are listed in the table below. In rows that list several tiles, the tiles and IDs correspond by position.
Action type | Auxiliary tiles | Target tile | ID |
---|---|---|---|
Get Card | - | - | 0 |
Hu | - | - | 1 |
Discard | - | Character 1; Character 2; Character 3; Character 4; Character 5; Character 6; Character 7; Character 8; Character 9; Green; Red; White; East; West; North; South | 12; 13; 14; 15; 16; 17; 18; 19; 20; 30; 31; 32; 33; 34; 35; 36 |
Pong | - | Character 1; Character 2; Character 3; Character 4; Character 5; Character 6; Character 7; Character 8; Character 9; Green; Red; White; East; West; North; South | 46; 47; 48; 49; 50; 51; 52; 53; 54; 64; 65; 66; 67; 68; 69; 70 |
Gong | - | Character 1; Character 2; Character 3; Character 4; Character 5; Character 6; Character 7; Character 8; Character 9; Green; Red; White; East; West; North; South | 80; 81; 82; 83; 84; 85; 86; 87; 88; 98; 99; 100; 101; 102; 103; 104 |
Chow | Character 2,3; Character 1,3; Character 1,2; Character 3,4; Character 2,4; Character 2,3; Character 4,5; Character 3,5; Character 3,4; Character 5,6; Character 4,6; Character 4,5; Character 6,7; Character 5,7; Character 5,6; Character 7,8; Character 6,8; Character 6,7; Character 8,9; Character 7,9; Character 7,8 | Character 1; Character 2; Character 3; Character 2; Character 3; Character 4; Character 3; Character 4; Character 5; Character 4; Character 5; Character 6; Character 5; Character 6; Character 7; Character 6; Character 7; Character 8; Character 7; Character 8; Character 9 | 126; 127; 128; 129; 130; 131; 132; 133; 134; 135; 136; 137; 138; 139; 140; 141; 142; 143; 144; 145; 146 |
Concealed Gong | - | Character 1; Character 2; Character 3; Character 4; Character 5; Character 6; Character 7; Character 8; Character 9; Green; Red; White; East; West; North; South | 177; 178; 179; 180; 181; 182; 183; 184; 185; 195; 196; 197; 198; 199; 200; 201 |
Pass Hu | - | - | 202 |
Ting | - | - | 203 |
Add Gong | - | Character 1; Character 2; Character 3; Character 4; Character 5; Character 6; Character 7; Character 8; Character 9; Green; Red; White; East; West; North; South | 213; 214; 215; 216; 217; 218; 219; 220; 221; 231; 232; 233; 234; 235; 236; 237 |
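The IDs in the table follow a regular pattern: each tile-targeted action ID equals the tile ID plus a fixed offset, and the Chow IDs run in groups of three per run of consecutive Character tiles. The sketch below rebuilds the table from that pattern. The offsets are inferred from the listed IDs rather than taken from the source code, and `build_action_table` is a hypothetical helper, so treat it as a reading aid and check it against the library before relying on it.

```python
# Ordinary (non-flower) tile IDs from the tile table.
TILE_NAMES = {
    **{9 + i: f"Character {i + 1}" for i in range(9)},
    27: "Green", 28: "Red", 29: "White",
    30: "East", 31: "West", 32: "North", 33: "South",
}

SIMPLE_ACTIONS = {0: "Get Card", 1: "Hu", 202: "Pass Hu", 203: "Ting"}

# Inferred from the table: action ID = tile ID + offset.
TILE_ACTION_OFFSETS = {
    "Discard": 3, "Pong": 37, "Gong": 71,
    "Concealed Gong": 168, "Add Gong": 204,
}
CHOW_BASE = 126  # first Chow ID in the table


def build_action_table():
    """Map each action ID to (action type, auxiliary tiles, target tile)."""
    table = {}
    for action_id, name in SIMPLE_ACTIONS.items():
        table[action_id] = (name, None, None)
    for name, offset in TILE_ACTION_OFFSETS.items():
        for tile_id, tile in TILE_NAMES.items():
            table[tile_id + offset] = (name, None, tile)
    # Seven runs (1-2-3 up to 7-8-9); each tile of a run can be the target.
    for k in range(7):
        run = [k + 1, k + 2, k + 3]
        for j, target in enumerate(run):
            aux = tuple(f"Character {r}" for r in run if r != target)
            table[CHOW_BASE + 3 * k + j] = ("Chow", aux, f"Character {target}")
    return table


ACTION_TABLE = build_action_table()
print(len(ACTION_TABLE))
print(ACTION_TABLE[129])
```

A dictionary keeps the lookup explicit and makes it easy to confirm the total of 105 actions.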
step(self, action: int) -> Tuple[list of list, list of list, list, bool, list]
Run one step of the environment's dynamics. You can call reset() to reset the environment's state. This function accepts an action ID and returns a tuple (tiles, actions, rewards, is_game_over, legal_actions).
- action(int): the action chosen by the player. This is an integer and should be one of the action IDs in the action table. Note that the example above passes it wrapped in a one-element list, `[action_label]`.
- tiles(list of list): the tile part of the player's observation of the current environment. It contains the player's hand; the player's Chow, Pong, and Kong; the player's concealed Kong; the player's discards; the opponent's Chow, Pong, and Kong; the opponent's concealed Kong; the opponent's discards; the latest discarded tile on the table; and both players' flowers. Each tile is an integer ID from the tile table. The details are shown in the table below:

  class | description | length | remarks |
  ---|---|---|---|
  self_hand | the player's hand | 34 | padded to full length with -1 |
  self_piles | the player's Chow, Pong, and Kong | 34 | padded to full length with -1 |
  self_hidden_piles | the player's concealed Kong | 34 | padded to full length with -1 |
  self_history_tiles | the player's discards | 34 | padded to full length with -1 |
  opp_piles | the opponent's Chow, Pong, and Kong | 34 | padded to full length with -1 |
  opp_hidden_piles | the opponent's concealed Kong (invisible; the real ID is replaced with 34) | 34 | padded to full length with -1 |
  opp_history_tiles | the opponent's discards | 34 | padded to full length with -1 |
  last_table_tile | the latest discarded tile on the table | 1 | padded to full length with -1 |
  self_flower | the player's flowers | 8 | padded to full length with -1 |
  opp_flower | the opponent's flowers | 8 | padded to full length with -1 |

- actions(list of list): the action part of the player's observation of the current environment. It contains the player's action history, the opponent's action history, the player's Ting state, and the opponent's Ting state. The details are shown in the table below:

  class | description | length | remarks |
  ---|---|---|---|
  self_his_actions | the player's action history | 50 | padded to full length with -1 |
  opp_his_actions | the opponent's action history | 50 | padded to full length with -1 |
  self_wait | the player's Ting state | 1 | 1 for Ting, 0 otherwise |
  opp_wait | the opponent's Ting state | 1 | 1 for Ting, 0 otherwise |

- rewards(list): the rewards returned after the previous action. The first item is the reward of player 0, and the second item is the reward of player 1.
- is_game_over(bool): whether the episode has ended.
- legal_actions(list): the legal actions available at the next step, given as integer action IDs.
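Because the observation lists are padded with -1, it is often convenient to attach field names and drop the padding before inspecting them. The sketch below does this on hand-made data. It assumes that the inner lists arrive in the same order as the rows of the tables above, which this README does not state explicitly; `label_observation` and `pad` are hypothetical helpers, not library functions.

```python
PAD = -1  # padding value used in the observation tables

# Assumed order: same as the rows of the tiles and actions tables.
TILE_FIELDS = [
    "self_hand", "self_piles", "self_hidden_piles", "self_history_tiles",
    "opp_piles", "opp_hidden_piles", "opp_history_tiles",
    "last_table_tile", "self_flower", "opp_flower",
]
ACTION_FIELDS = ["self_his_actions", "opp_his_actions", "self_wait", "opp_wait"]


def label_observation(rows, field_names):
    """Pair each inner list with its field name and strip the -1 padding."""
    if len(rows) != len(field_names):
        raise ValueError(f"expected {len(field_names)} rows, got {len(rows)}")
    return {
        name: [value for value in row if value != PAD]
        for name, row in zip(field_names, rows)
    }


def pad(values, length):
    """Build a padded row for the synthetic demonstration below."""
    return list(values) + [PAD] * (length - len(values))


# Synthetic observation shaped like the tables (not real environment output).
fake_tiles = (
    [pad([9, 10, 11], 34)]
    + [pad([], 34) for _ in range(6)]
    + [pad([30], 1), pad([], 8), pad([], 8)]
)
fake_actions = [pad([0, 12], 50), pad([0], 50), [1], [0]]

tiles_view = label_observation(fake_tiles, TILE_FIELDS)
actions_view = label_observation(fake_actions, ACTION_FIELDS)
print(tiles_view["self_hand"], tiles_view["last_table_tile"])
print(actions_view["self_his_actions"], actions_view["self_wait"])
```

With a real environment, the first two items returned by `step` would be passed in place of the synthetic lists, provided the ordering assumption holds.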
get_legal_actions(self) -> list
- legal_actions(list)
reset(self) -> Tuple[list of list, list of list, list]
- tiles(list of list)
- actions(list of list)
- legal_actions(list)
get_current_player(self) -> int
- current_player(int): the player who must provide the next action, given as an integer.
get_current_obs(self) -> Tuple[list of list, list of list]
- tiles(list of list)
- actions(list of list)
get_payoffs(self) -> Tuple[list, list, list]
- payoffs(list): the payoffs of both players. The first item is player 0's score, and the second item is player 1's score. The payoff is the same as the reward.
- fan_names(list): the categories to which the winner's completed legal hand belongs, given as a list of strings.
- fan_score(list): the scores corresponding one-to-one to fan_names, given as a list of integers.
get_game_status(self) -> Tuple[bool, int]
- is_over(bool): whether the episode has ended.
- winner(int): the winner, 0 for player 0 and 1 for player 1. The example above treats a value of None as a tie.
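The values returned by get_game_status and get_payoffs can be combined into a one-line summary of a finished game. The following is a sketch on hand-written inputs: `summarize_result` is a hypothetical helper, the fan names are placeholders rather than real categories from the rules, and a winner of None is read as a tie, following the example earlier in this README.

```python
def summarize_result(is_over, winner, payoffs, fan_names, fan_score):
    """Format the outcome of a game from status and payoff values."""
    if not is_over:
        return "game in progress"
    if winner is None:
        return "tie"
    # fan_names and fan_score correspond one-to-one.
    fans = ", ".join(
        f"{name} ({score})" for name, score in zip(fan_names, fan_score)
    )
    return f"player {winner} wins, payoffs {payoffs}, fans: {fans}"


# Placeholder values, not real output from the environment.
summary = summarize_result(True, 0, [8, -8], ["fan_a", "fan_b"], [6, 2])
print(summary)
print(summarize_result(True, None, [0, 0], [], []))
```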