Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phantom Tic-Tac-Toe information state bug #1273

Closed
nathanlct opened this issue Aug 21, 2024 · 2 comments · Fixed by #1275
Closed

Phantom Tic-Tac-Toe information state bug #1273

nathanlct opened this issue Aug 21, 2024 · 2 comments · Fixed by #1275
Labels
bug Something isn't working

Comments

@nathanlct
Copy link
Contributor

nathanlct commented Aug 21, 2024

Hey there, I was looking at the phantom tic-tac-toe implementation, specifically the information state tensor:

void PhantomTTTState::InformationStateTensor(Player player,
                                             absl::Span<float> values) const {
  SPIEL_CHECK_GE(player, 0);
  SPIEL_CHECK_LT(player, num_players_);

  // First 27 bits encodes the player's view in the same way as TicTacToe.
  // Then the action sequence follows (one-hot encoded, per action).
  // Encoded in the same way as InformationStateAsString, so full sequences
  // which may contain action value 10 to represent "I don't know."
  const auto& player_view = player == 0 ? x_view_ : o_view_;
  SPIEL_CHECK_EQ(values.size(), kNumCells * kCellStates +
                                    kLongestSequence * (1 + kBitsPerAction));
  std::fill(values.begin(), values.end(), 0.);
  for (int cell = 0; cell < kNumCells; ++cell) {
    values[kNumCells * static_cast<int>(player_view[cell]) + cell] = 1.0;
  }

  // Now encode the sequence. Each (player, action) pair uses 11 bits:
  //   - first bit is the player taking the action (0 or 1)
  //   - next 10 bits is the one-hot encoded action (10 = "I don't know")
  int offset = kNumCells * kCellStates;
  for (const auto& player_with_action : action_sequence_) {
    if (player_with_action.first == player) {
      // Always include the observing player's actions.
      values[offset] = player_with_action.first;  // Player 0 or 1
      values[offset + 1 + player_with_action.second] = 1.0;
    } else if (obs_type_ == ObservationType::kRevealNumTurns) {
      // If the number of turns are revealed, then each of the other player's
      // actions will show up as unknowns.
      values[offset] = player_with_action.first;
      values[offset + 1 + 10] = 1.0;  // I don't know.
    } else {
      // Do not reveal anything about the number of actions taken by opponent.
      SPIEL_CHECK_EQ(obs_type_, ObservationType::kRevealNothing);
    }

    offset += (1 + kBitsPerAction);
  }
}

Firstly, values[offset + 1 + 10] = 1.0; // I don't know. should be 1 + 9 I believe since actions range from 0-8 and not 1-9.

Second, we offset offset += (1 + kBitsPerAction); in any case. In the default case of ObservationType::kRevealNothing, I think we should only offset in the if and the else if, but now in the else (right now we always offset).

For instance, with ObservationType::kRevealNumTurns we could get, for player 0's info string:

move 0: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 1: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 2: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 3: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
move 4: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
move 5: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

(and you can see that the last "i don't know" action isn't set to 1, as per my previous point).

With ObservationType::kRevealNothing, we get:

move 0: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 1: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
move 3: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
move 4: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
move 5: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

So well, we can still see the number of moves the opponent did by counting the lines of zeros.

@nathanlct
Copy link
Contributor Author

I see that the problem is exactly the same in Dark Hex 3. Am I maybe missing something?

@lanctot
Copy link
Collaborator

lanctot commented Aug 21, 2024

I believe you're right (at least for both issues in Phantom TTT -- haven't looked at Dark Hex). Nice find.

Can you submit a PR fix?

I think the information state size also be smaller then in the case of kRevealNothing (since its upper bound is smaller). And then we can put a check that the offset has reached the exactly the size of the container at the end of the function (like we do in other games).

If/when you're convinced that this is also an issue in Dark Hex, can you open another issue that refrers to this one? (Maybe also listing the code as you did here)

@lanctot lanctot added the bug Something isn't working label Aug 21, 2024
@nathanlct nathanlct changed the title Problem with phantom tic tac toe Phantom Tic-Tac-Toe information state bug Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants