Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Phase 1][colocation][DocDB][SQLsmith] Colocated table corruption after crash loop #11415

Closed
def- opened this issue Feb 10, 2022 · 14 comments
Closed
Labels
area/docdb YugabyteDB core features area/ysql Yugabyte SQL (YSQL) duplicate kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@def-
Copy link
Contributor

def- commented Feb 10, 2022

Jira Link: [DB-372](https://yugabyte.atlassian.net/browse/DB-372)

Description

After running SQLsmith for a while (Release build of code state 596eecc) with many trigger crashes (#10152) the data state becomes invalid and database can't start up anymore. Starting up:

$ bin/yugabyted start
Starting yugabyted...
✅ System checks

+--------------------------------------------------------------------------------------------------+
|                                            yugabyted                                             |
+--------------------------------------------------------------------------------------------------+
| Status              : Running. Leader Master is present                                          |
| Web console         : http://127.0.0.1:7000                                                      |
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte?user=yugabyte&password=yugabyte  |
| YSQL                : bin/ysqlsh   -U yugabyte -d yugabyte                                       |
| YCQL                : bin/ycqlsh   -u cassandra                                                  |
| Data Dir            : /nfusr/dev-server/dfelsing/var/data                                        |
| Log Dir             : /nfusr/dev-server/dfelsing/var/logs                                        |
| Universe UUID       : c3d09506-a6b4-44ea-b4a1-c37ea2f5ed60                                       |
+--------------------------------------------------------------------------------------------------+
🚀 yugabyted started successfully! To load a sample dataset, try 'yugabyted demo'.
🎉 Join us on Slack at https://www.yugabyte.com/slack
👕 Claim your free t-shirt at https://www.yugabyte.com/community-rewards/

master log during startup:

Log file created at: 2022/02/10 10:02:57
Running on machine: $ip
Application fingerprint: version 2.11.3.0 build PRE_RELEASE revision 596eeccb063c9e5cf000c6c8fb03ff20f2a97554 build_type RELEASE built at 08 Feb 2022 17:24:13 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0210 10:02:57.094101 15742 catalog_manager.cc:1225] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:9441): Node ae1da62ce17645969b28e43d02e48b7b peer not initialized.
W0210 10:02:57.094103 15743 catalog_manager.cc:1225] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:9441): Node ae1da62ce17645969b28e43d02e48b7b peer not initialized.
W0210 10:02:57.423324 15747 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000010: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:02:57.423382 15747 log_reader.cc:188] T 00000000000000000000000000000000 P ae1da62ce17645969b28e43d02e48b7b: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000010 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:02:57.946458 15747 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000011: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:02:57.946511 15747 log_reader.cc:188] T 00000000000000000000000000000000 P ae1da62ce17645969b28e43d02e48b7b: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000011 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:02:58.270313 15747 transaction_participant.cc:1198] T 00000000000000000000000000000000 P ae1da62ce17645969b28e43d02e48b7b: Transaction not found: e6204b7d-850a-4bfd-8bac-88dc94886f93, for: metadata with write id
W0210 10:02:58.286698 15747 remove_intents_task.cc:55] Remove intents task failed: Aborted (yb/tablet/tablet_peer.cc:1266): Thread pool not ready
W0210 10:02:59.286842 15747 remove_intents_task.cc:55] Remove intents task failed: Aborted (yb/tablet/tablet_peer.cc:1266): Thread pool not ready
W0210 10:03:00.125685 15758 client_master_rpc.cc:90] LookupByIdRpc(tablet: 8b8844f8b96f4369b656b7727924e2bb, num_attempts: 1): Leader Master has changed (127.0.0.1:7100 is no longer the leader), re-trying...
W0210 10:03:00.214926 15808 master.cc:467] ListMasters: Service unavailable (yb/master/scoped_leader_shared_lock.cc:91): Unable to get registration information for peer ([127.0.0.1:7100, 127.0.0.1:7100]) id (ae1da62ce17645969b28e43d02e48b7b): Catalog manager is not initialized. State: 1
W0210 10:03:00.290989 15823 catalog_manager_bg_tasks.cc:124] Catalog manager background task thread going to sleep: Service unavailable (yb/master/scoped_leader_shared_lock.cc:91): Catalog manager is not initialized. State: 1
W0210 10:03:00.626838 15758 client_master_rpc.cc:90] LookupByIdRpc(tablet: 6ac75df927db4a36b07bf63f628ff1e8, num_attempts: 1): Leader Master has changed (127.0.0.1:7100 is no longer the leader), re-trying...
W0210 10:03:04.109220 15839 catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2 required by tablet b862a57f8caa406299a377321bc2e35d, metadata: committed_consensus_state {
  current_term: 2
  leader_uuid: "98b7c7a15928410e8bdf4cdb2b9b33ca"
  config {
    opid_index: -1
    peers {
      permanent_uuid: "98b7c7a15928410e8bdf4cdb2b9b33ca"
      member_type: VOTER
      last_known_private_addr {
        host: "$ip"
        port: 9100
      }
      last_known_broadcast_addr {
        host: "127.0.0.1"
        port: 9100
      }
      cloud_info {
        placement_cloud: "cloud1"
        placement_region: "datacenter1"
        placement_zone: "rack1"
      }
    }
  }
}
state: RUNNING
state_msg: "Tablet reported with an active leader"
table_id: "000043b1000030008000000000000000.colocated.parent.uuid"
partition {
  partition_key_start: ""
  partition_key_end: ""
}
table_ids: "000043b1000030008000000000000000.colocated.parent.uuid"

tserver log looks similar:

Log file created at: 2022/02/10 10:03:06
Running on machine: $ip
Application fingerprint: version 2.11.3.0 build PRE_RELEASE revision 596eeccb063c9e5cf000c6c8fb03ff20f2a97554 build_type RELEASE built at 08 Feb 2022 17:24:13 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0210 10:03:06.070217 15885 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-b750582d33394537b85b2da58e4e71
7b/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.070490 15885 log_reader.cc:188] T b750582d33394537b85b2da58e4e717b P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-b750582d33394537b85b2da58e4e717b/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.070806 15886 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-eda0ea9c63804fdaafac7c368dab95
dd/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.070830 15886 log_reader.cc:188] T eda0ea9c63804fdaafac7c368dab95dd P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-eda0ea9c63804fdaafac7c368dab95dd/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.086746 15887 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-05818ae782514a83804c3f38ffa643
1f/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.086795 15887 log_reader.cc:188] T 05818ae782514a83804c3f38ffa6431f P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-05818ae782514a83804c3f38ffa6431f/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.092182 15842 pg_wrapper.cc:307] No hba configuration lines found, defaulting to trust all configuration.
W0210 10:03:06.208655 15886 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-eda0ea9c63804fdaafac7c368dab95
dd/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.208696 15886 log_reader.cc:188] T eda0ea9c63804fdaafac7c368dab95dd P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-eda0ea9c63804fdaafac7c368dab95dd/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.211668 15887 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-05818ae782514a83804c3f38ffa643
1f/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.211707 15887 log_reader.cc:188] T 05818ae782514a83804c3f38ffa6431f P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-05818ae782514a83804c3f38ffa6431f/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.216943 15885 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-b750582d33394537b85b2da58e4e71
7b/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.216998 15885 log_reader.cc:188] T b750582d33394537b85b2da58e4e717b P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-b750582d33394537b85b2da58e4e717b/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.389780 15885 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-3ec9dfae98d9459394b5b5acf92c60
82/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.389812 15885 log_reader.cc:188] T 3ec9dfae98d9459394b5b5acf92c6082 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-3ec9dfae98d9459394b5b5acf92c6082/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.395673 15886 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-6ac75df927db4a36b07bf63f628ff1
e8/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.395716 15886 log_reader.cc:188] T 6ac75df927db4a36b07bf63f628ff1e8 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-6ac75df927db4a36b07bf63f628ff1e8/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.418133 15887 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-ceaa05ea815849c090c12c71c24f95
d3/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.418180 15887 log_reader.cc:188] T ceaa05ea815849c090c12c71c24f95d3 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-ceaa05ea815849c090c12c71c24f95d3/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.503043 15885 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-3ec9dfae98d9459394b5b5acf92c60
82/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.503087 15885 log_reader.cc:188] T 3ec9dfae98d9459394b5b5acf92c6082 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-3ec9dfae98d9459394b5b5acf92c6082/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.509040 15887 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-ceaa05ea815849c090c12c71c24f95
d3/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.509075 15887 log_reader.cc:188] T ceaa05ea815849c090c12c71c24f95d3 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-ceaa05ea815849c090c12c71c24f95d3/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.534430 15886 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-6ac75df927db4a36b07bf63f628ff1
e8/wal-000000004: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.534509 15886 log_reader.cc:188] T 6ac75df927db4a36b07bf63f628ff1e8 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-6ac75df927db4a36b07bf63f628ff1e8/wal-000000004 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.674021 15885 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-5ecb36a0627c4efc9839c52078e2cc
b9/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.674052 15885 log_reader.cc:188] T 5ecb36a0627c4efc9839c52078e2ccb9 P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-5ecb36a0627c4efc9839c52078e2ccb9/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.681702 15887 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca2183898c57a0/tablet-8b8844f8b96f4369b656b7727924e2
bb/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
W0210 10:03:06.681746 15887 log_reader.cc:188] T 8b8844f8b96f4369b656b7727924e2bb P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-3d1f6f31365f448eb4ca
2183898c57a0/tablet-8b8844f8b96f4369b656b7727924e2bb/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0210 10:03:06.689651 15908 pg_wrapper.cc:714] PostgreSQL server exited with code 256
W0210 10:03:06.692298 15908 pg_wrapper.cc:307] No hba configuration lines found, defaulting to trust all configuration.

ysqlsh then fails:

ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?

I can provide the ~/var directory if it helps (1 GB).

@def- def- added kind/bug This issue is a bug area/docdb YugabyteDB core features labels Feb 10, 2022
@bmatican
Copy link
Contributor

W0210 10:03:04.109220 15839 catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2 required by tablet b862a57f8caa406299a377321bc2e35d, metadata: committed_consensus_state {
@jaki I think we've seen some flavor of these, for colocated tables. Any thoughts on this?

@def- can you clarify if this was during some DDL phase of the app, or pure CRUD operations?

@jaki
Copy link
Contributor

jaki commented Feb 10, 2022

@def-

  • Do you have postgres logs?
  • Is the tablet b862a57f8caa406299a377321bc2e35d for colocation?
  • Was the table 000043b10000300080000000000045b2 dropped?

Then that warning might be issue #11129. I don't know what's causing "Footer not found".

@bmatican
Copy link
Contributor

Note, I would even ignore the footer not found warnings -- that's just because you're restarting nodes and we don't close the WALs on hard restart

@def-
Copy link
Contributor Author

def- commented Feb 11, 2022

After the initial DDL phase, so this was only SELECT/INSERT/UPDATE.

I have postgres logs, but it's 4 GB uncompressed, so not sure what to search for. Should I upload it somewhere?

Is the tablet b862a57f8caa406299a377321bc2e35d for colocation?

Is there an easy way to find this out from the logs? I ran:

$ grep b862a57f8caa406299a377321bc2e35d yb-tserver.dev-server-dfelsing* | grep colo
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.944008  6186 tablet_service.cc:1208] Processing CreateTablet for T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca (table=000043b1000030008000000000000000.colocated.parent.tablename [id=000043b1000030008000000000000000.colocated.parent.uuid]), partition=range: [<start>, <end>)
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.944048  6186 ts_tablet_manager.cc:2078] Get and update data/wal directory assignment map for table: 000043b1000030008000000000000000.colocated.parent.uuid and tablet b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.948024  6186 ts_tablet_manager.cc:666] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Created tablet metadata for table: 000043b1000030008000000000000000.colocated.parent.uuid
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.950886  7216 tablet.cc:393] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Schema version for 000043b1000030008000000000000000.colocated.parent.tablename is 0
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.951164  7216 tablet.cc:509] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Creating RocksDB database in dir /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.954224  7216 tablet.cc:700] Opening RocksDB at: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.957321  7216 version_set.cc:2842] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [R]: Recovered from manifest file:/nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is <NULL>
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.958966  7216 tablet.cc:715] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Opening intents DB at: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.962368  7216 version_set.cc:2842] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [I]: Recovered from manifest file:/nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is <NULL>
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:55:27.963740  7216 tablet.cc:758] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Successfully opened a RocksDB database at /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d, obj: 0xac92d00
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:58:01.229540  7863 log.cc:666] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Last appended OpId in segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000001: 1.944
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:58:01.232887  7863 log.cc:675] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Rolled over to a new segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000002
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:58:08.239727  7857 log.cc:666] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Last appended OpId in segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000002: 1.1072
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220208-175253.5492:I0208 17:58:08.243881  7857 log.cc:675] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Rolled over to a new segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:35.221828 27511 ts_tablet_manager.cc:2141] Update data/wal directory assignment map for table: 000043b1000030008000000000000000.colocated.parent.uuid and tablet b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.569167 27554 tablet.cc:509] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Creating RocksDB database in dir /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.569226 27554 tablet.cc:700] Opening RocksDB at: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.570528 27554 version_set.cc:2842] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [R]: Recovered from manifest file:/nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is <NULL>
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.571112 27554 db_impl.cc:1148] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [R]: [JOB 2] Delete /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d//MANIFEST-000001 type=4 #1 -- OK
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.571820 27554 db_impl.cc:1148] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [R]: [JOB 2] Delete /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d//000003.log type=0 #3 -- OK
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.573180 27554 tablet.cc:715] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Opening intents DB at: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.574545 27554 version_set.cc:2842] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [I]: Recovered from manifest file:/nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is <NULL>
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.576347 27554 db_impl.cc:1148] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [I]: [JOB 2] Delete /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents//MANIFEST-000001 type=4 #1 -- OK
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.576386 27554 db_impl.cc:1148] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca [I]: [JOB 2] Delete /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d.intents//000003.log type=0 #3 -- OK
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.604442 27554 tablet.cc:758] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Successfully opened a RocksDB database at /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/data/rocksdb/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d, obj: 0x263c5200
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.604555 27554 tablet_bootstrap.cc:686] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Will attempt to recover log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000001
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.604579 27554 tablet_bootstrap.cc:686] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Will attempt to recover log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000002
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.604590 27554 tablet_bootstrap.cc:686] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Will attempt to recover log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:W0209 00:00:52.637569 27554 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:W0209 00:00:52.637612 27554 log_reader.cc:188] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.721117 27554 log_util.cc:623] Scanning /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 for valid entry headers following offset 2456101...
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.722450 27554 log_util.cc:546] Ignoring log segment corruption in /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:742): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #428 at offset 2456101 for log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003: Prior batch offsets: 2455811 2455840 2456072 2456101; Last log entries read: [REPLICATE (1.1288)] [REPLICATE (1.1289)] [REPLICATE (1.1290)] [REPLICATE (1.1291)]
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.722505 27554 log_util.cc:282] Successfully rebuilt footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 (valid entries through byte offset 2456101)
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.736361 27554 log.cc:597] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Opened existing logs. Last segment is /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.736727 27554 tablet_bootstrap.cc:1172] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 cannot be used as the first segment to start replay with according to our OpId and retention criteria. Continuing to earlier segments.op_id=1.1073, first_op_time=8312276860.424s, min_seconds_to_retain_logs=120.000s, *replay_from_this_or_earlier_time=8312276740.424s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=0
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.737680 27554 tablet_bootstrap.cc:1172] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000002 cannot be used as the first segment to start replay with according to our OpId and retention criteria. Continuing to earlier segments.op_id=1.945, first_op_time=8312276853.411s, min_seconds_to_retain_logs=120.000s, *replay_from_this_or_earlier_time=8312276740.424s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=0
yb-tserver.dev-server-dfelsing.dfelsing.log.INFO.20220209-000034.27511:I0209 00:00:52.737726 27554 tablet_bootstrap.cc:1172] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000001 cannot be used as the first segment to start replay with according to our OpId and retention criteria. However, this is already the earliest segment so we have to start replay here. We should probably investigate how we got into this situation. op_id=1.1, first_op_time=8312276700.157s, min_seconds_to_retain_logs=120.000s, *replay_from_this_or_earlier_time=8312276740.424s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=1
yb-tserver.dev-server-dfelsing.dfelsing.log.WARNING.20220209-000035.27511:W0209 00:00:52.637569 27554 log_util.cc:224] Could not read footer for segment: /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003: Not found (yb/consensus/log_util.cc:466): Footer not found. Footer magic doesn't match
yb-tserver.dev-server-dfelsing.dfelsing.log.WARNING.20220209-000035.27511:W0209 00:00:52.637612 27554 log_reader.cc:188] T b862a57f8caa406299a377321bc2e35d P 98b7c7a15928410e8bdf4cdb2b9b33ca: Log segment /nfusr/dev-server/dfelsing/var/data/yb-data/tserver/wals/table-000043b1000030008000000000000000.colocated.parent.uuid/tablet-b862a57f8caa406299a377321bc2e35d/wal-000000003 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.

Was the table 000043b10000300080000000000045b2 dropped?

No drops were run during the test. A grep 000043b10000300080000000000045b2 yb-tserver.dev-server-dfelsing.dfelsing.log.* shows nothing.

@jaki
Copy link
Contributor

jaki commented Feb 11, 2022

@def-

I think you cropped the log entry that starts with W0210 10:03:04.109220 15839 catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2. Can you show the rest of it?

After the initial DDL phase

Were there any DROP TABLE or ALTER TABLE ADD PRIMARY KEY?

I have postgres logs, but it's 4 GB uncompressed, so not sure what to search for.

For all of the logs, are there any fatals or errors? So far, you've only shown warning logs, and the real reason is probably somewhere else. Does postgres crash loop? Is the "can't start up anymore" only for when you try to connect to the database (e.g. with ysqlsh)? Are postgres background workers and the postmaster running fine? A lot of information can be answered with a pstree -p <tserver_pid>.

Is the tablet b862a57f8caa406299a377321bc2e35d for colocation?

Is there an easy way to find this out from the logs?

The log shows table_id: "000043b1000030008000000000000000.colocated.parent.uuid", so it is for colocation.

A grep 000043b10000300080000000000045b2 yb-tserver.dev-server-dfelsing.dfelsing.log.* shows nothing.

It's more likely to appear in the master logs.

@def-
Copy link
Contributor Author

def- commented Feb 14, 2022

I think you cropped the log entry that starts with W0210 10:03:04.109220 15839 catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2. Can you show the rest of it?

Full file (new run) is attached: yb-master.dev-server-dfelsing.dfelsing.log.INFO.20220214-103230.4236.txt

For all of the logs, are there any fatals or errors?

No fatals, two errors, attached:
yb-master.dev-server-dfelsing.dfelsing.log.ERROR.20220208-175402.5415.txt
yb-tserver.dev-server-dfelsing.dfelsing.log.ERROR.20220209-000035.27511.txt

Does postgres crash loop?

Doesn't crash, but keeps getting restarted:

-bash-4.2$ pstree -p 4349
yb-tserver(4349)─┬─postgres(26894)─┬─postgres(26895)
                 │                 └─postgres(26897)
                 ├─{yb-tserver}(4379)
                 ├─{yb-tserver}(4380)
                 ├─{yb-tserver}(4381)
                 ├─{yb-tserver}(4382)
                 ├─{yb-tserver}(4383)
                 ├─{yb-tserver}(4384)
                 ├─{yb-tserver}(4385)
                 ├─{yb-tserver}(4386)
                 ├─{yb-tserver}(4387)
                 ├─{yb-tserver}(4388)
                 ├─{yb-tserver}(4389)
                 ├─{yb-tserver}(4390)
                 ├─{yb-tserver}(4391)
                 ├─{yb-tserver}(4409)
                 ├─{yb-tserver}(4411)
                 ├─{yb-tserver}(4413)
                 ├─{yb-tserver}(4414)
                 ├─{yb-tserver}(4415)
                 ├─{yb-tserver}(4416)
                 ├─{yb-tserver}(4418)
                 ├─{yb-tserver}(4420)
                 ├─{yb-tserver}(4421)
                 ├─{yb-tserver}(4423)
                 ├─{yb-tserver}(4424)
                 ├─{yb-tserver}(4427)
                 ├─{yb-tserver}(4428)
                 ├─{yb-tserver}(4429)
                 ├─{yb-tserver}(4430)
                 ├─{yb-tserver}(4431)
                 ├─{yb-tserver}(4432)
                 ├─{yb-tserver}(4433)
                 ├─{yb-tserver}(4434)
                 ├─{yb-tserver}(4435)
                 ├─{yb-tserver}(4436)
                 ├─{yb-tserver}(4437)
                 ├─{yb-tserver}(4446)
                 ├─{yb-tserver}(4463)
                 ├─{yb-tserver}(4558)
                 ├─{yb-tserver}(4567)
                 ├─{yb-tserver}(4568)
                 ├─{yb-tserver}(4569)
                 ├─{yb-tserver}(4570)
                 ├─{yb-tserver}(4571)
                 ├─{yb-tserver}(4572)
                 ├─{yb-tserver}(4573)
                 ├─{yb-tserver}(4574)
                 ├─{yb-tserver}(4575)
                 ├─{yb-tserver}(4576)
                 ├─{yb-tserver}(4577)
                 ├─{yb-tserver}(4578)
                 ├─{yb-tserver}(4579)
                 ├─{yb-tserver}(4580)
                 ├─{yb-tserver}(4581)
                 ├─{yb-tserver}(4582)
                 ├─{yb-tserver}(4583)
                 ├─{yb-tserver}(4584)
                 ├─{yb-tserver}(4585)
                 ├─{yb-tserver}(4586)
                 ├─{yb-tserver}(4587)
                 ├─{yb-tserver}(4588)
                 ├─{yb-tserver}(4589)
                 ├─{yb-tserver}(4590)
                 ├─{yb-tserver}(4591)
                 ├─{yb-tserver}(4592)
                 ├─{yb-tserver}(4593)
                 ├─{yb-tserver}(4594)
                 ├─{yb-tserver}(4595)
                 ├─{yb-tserver}(4596)
                 ├─{yb-tserver}(4597)
                 ├─{yb-tserver}(4598)
                 ├─{yb-tserver}(4599)
                 ├─{yb-tserver}(4600)
                 ├─{yb-tserver}(4601)
                 ├─{yb-tserver}(4602)
                 ├─{yb-tserver}(4603)
                 ├─{yb-tserver}(4604)
                 ├─{yb-tserver}(4605)
                 ├─{yb-tserver}(4606)
                 ├─{yb-tserver}(4607)
                 ├─{yb-tserver}(4608)
                 ├─{yb-tserver}(4609)
                 ├─{yb-tserver}(4610)
                 ├─{yb-tserver}(4612)
                 ├─{yb-tserver}(4613)
                 ├─{yb-tserver}(4614)
                 ├─{yb-tserver}(4615)
                 ├─{yb-tserver}(4616)
                 ├─{yb-tserver}(4617)
                 ├─{yb-tserver}(4620)
                 ├─{yb-tserver}(4621)
                 ├─{yb-tserver}(4623)
                 ├─{yb-tserver}(4624)
                 ├─{yb-tserver}(4625)
                 ├─{yb-tserver}(5057)
                 ├─{yb-tserver}(5059)
                 ├─{yb-tserver}(5060)
                 ├─{yb-tserver}(5061)
                 ├─{yb-tserver}(5062)
                 └─{yb-tserver}(5063)
-bash-4.2$ pstree -p 4349
yb-tserver(4349)─┬─pg_supervisorxx(26975)
                 ├─{yb-tserver}(4379)
                 ├─{yb-tserver}(4380)
                 ├─{yb-tserver}(4381)
                 ├─{yb-tserver}(4382)
                 ├─{yb-tserver}(4383)
                 ├─{yb-tserver}(4384)
                 ├─{yb-tserver}(4385)
                 ├─{yb-tserver}(4386)
                 ├─{yb-tserver}(4387)
                 ├─{yb-tserver}(4388)
                 ├─{yb-tserver}(4389)
                 ├─{yb-tserver}(4390)
                 ├─{yb-tserver}(4391)
                 ├─{yb-tserver}(4409)
                 ├─{yb-tserver}(4411)
                 ├─{yb-tserver}(4413)
                 ├─{yb-tserver}(4414)
                 ├─{yb-tserver}(4415)
                 ├─{yb-tserver}(4416)
                 ├─{yb-tserver}(4418)
                 ├─{yb-tserver}(4420)
                 ├─{yb-tserver}(4421)
                 ├─{yb-tserver}(4423)
                 ├─{yb-tserver}(4424)
                 ├─{yb-tserver}(4427)
                 ├─{yb-tserver}(4428)
                 ├─{yb-tserver}(4429)
                 ├─{yb-tserver}(4430)
                 ├─{yb-tserver}(4431)
                 ├─{yb-tserver}(4432)
                 ├─{yb-tserver}(4433)
                 ├─{yb-tserver}(4434)
                 ├─{yb-tserver}(4435)
                 ├─{yb-tserver}(4436)
                 ├─{yb-tserver}(4437)
                 ├─{yb-tserver}(4446)
                 ├─{yb-tserver}(4463)
                 ├─{yb-tserver}(4558)
                 ├─{yb-tserver}(4567)
                 ├─{yb-tserver}(4568)
                 ├─{yb-tserver}(4569)
                 ├─{yb-tserver}(4570)
                 ├─{yb-tserver}(4571)
                 ├─{yb-tserver}(4572)
                 ├─{yb-tserver}(4573)
                 ├─{yb-tserver}(4574)
                 ├─{yb-tserver}(4575)
                 ├─{yb-tserver}(4576)
                 ├─{yb-tserver}(4577)
                 ├─{yb-tserver}(4578)
                 ├─{yb-tserver}(4579)
                 ├─{yb-tserver}(4580)
                 ├─{yb-tserver}(4581)
                 ├─{yb-tserver}(4582)
                 ├─{yb-tserver}(4583)
                 ├─{yb-tserver}(4584)
                 ├─{yb-tserver}(4585)
                 ├─{yb-tserver}(4586)
                 ├─{yb-tserver}(4587)
                 ├─{yb-tserver}(4588)
                 ├─{yb-tserver}(4589)
                 ├─{yb-tserver}(4590)
                 ├─{yb-tserver}(4591)
                 ├─{yb-tserver}(4592)
                 ├─{yb-tserver}(4593)
                 ├─{yb-tserver}(4594)
                 ├─{yb-tserver}(4595)
                 ├─{yb-tserver}(4596)
                 ├─{yb-tserver}(4597)
                 ├─{yb-tserver}(4598)
                 ├─{yb-tserver}(4599)
                 ├─{yb-tserver}(4600)
                 ├─{yb-tserver}(4601)
                 ├─{yb-tserver}(4602)
                 ├─{yb-tserver}(4603)
                 ├─{yb-tserver}(4604)
                 ├─{yb-tserver}(4605)
                 ├─{yb-tserver}(4606)
                 ├─{yb-tserver}(4607)
                 ├─{yb-tserver}(4608)
                 ├─{yb-tserver}(4609)
                 ├─{yb-tserver}(4610)
                 ├─{yb-tserver}(4612)
                 ├─{yb-tserver}(4613)
                 ├─{yb-tserver}(4614)
                 ├─{yb-tserver}(4615)
                 ├─{yb-tserver}(4616)
                 ├─{yb-tserver}(4617)
                 ├─{yb-tserver}(4620)
                 ├─{yb-tserver}(4621)
                 ├─{yb-tserver}(4623)
                 ├─{yb-tserver}(4624)
                 ├─{yb-tserver}(4625)
                 ├─{yb-tserver}(5057)
                 ├─{yb-tserver}(5059)
                 ├─{yb-tserver}(5060)
                 ├─{yb-tserver}(5061)
                 ├─{yb-tserver}(5062)
                 └─{yb-tserver}(5063)

Postgres log:

2022-02-14 10:49:33.040 UTC [28983] LOG:  database system was interrupted; last known up at 2022-02-08 20:19:34 UTC
I0214 10:49:33.044778 28984 webserver.cc:168] Starting webserver on 0.0.0.0:13000
I0214 10:49:33.044919 28984 webserver.cc:173] Document root: /nfusr/dev-server/dfelsing/code/yugabyte-db/www
I0214 10:49:33.045089 28984 webserver.cc:163] Webserver listen spec is 0.0.0.0:13000
I0214 10:49:33.045274 28984 webserver.cc:262] Webserver started. Bound to: http://0.0.0.0:13000/
2022-02-14 10:49:33.046 UTC [28983] LOG:  invalid checkpoint record
2022-02-14 10:49:33.046 UTC [28983] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.046 UTC [28983] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".
2022-02-14 10:49:33.052 UTC [28981] LOG:  startup process (PID 28983) exited with exit code 1
2022-02-14 10:49:33.052 UTC [28981] LOG:  aborting startup due to startup process failure
2022-02-14 10:49:33.053 UTC [28981] LOG:  database system is shut down
2022-02-14 10:49:33.247 UTC [28988] LOG:  database system was interrupted; last known up at 2022-02-08 20:19:34 UTC
I0214 10:49:33.251636 28989 webserver.cc:168] Starting webserver on 0.0.0.0:13000
I0214 10:49:33.251785 28989 webserver.cc:173] Document root: /nfusr/dev-server/dfelsing/code/yugabyte-db/www
I0214 10:49:33.251961 28989 webserver.cc:163] Webserver listen spec is 0.0.0.0:13000
I0214 10:49:33.252140 28989 webserver.cc:262] Webserver started. Bound to: http://0.0.0.0:13000/
2022-02-14 10:49:33.254 UTC [28988] LOG:  invalid checkpoint record
2022-02-14 10:49:33.254 UTC [28988] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.254 UTC [28988] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".
2022-02-14 10:49:33.259 UTC [28986] LOG:  startup process (PID 28988) exited with exit code 1
2022-02-14 10:49:33.259 UTC [28986] LOG:  aborting startup due to startup process failure
2022-02-14 10:49:33.261 UTC [28986] LOG:  database system is shut down
2022-02-14 10:49:33.455 UTC [28993] LOG:  database system was interrupted; last known up at 2022-02-08 20:19:34 UTC
I0214 10:49:33.459271 28994 webserver.cc:168] Starting webserver on 0.0.0.0:13000
I0214 10:49:33.459398 28994 webserver.cc:173] Document root: /nfusr/dev-server/dfelsing/code/yugabyte-db/www
I0214 10:49:33.459564 28994 webserver.cc:163] Webserver listen spec is 0.0.0.0:13000
I0214 10:49:33.459758 28994 webserver.cc:262] Webserver started. Bound to: http://0.0.0.0:13000/
2022-02-14 10:49:33.462 UTC [28993] LOG:  invalid checkpoint record
2022-02-14 10:49:33.462 UTC [28993] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.462 UTC [28993] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".
2022-02-14 10:49:33.467 UTC [28991] LOG:  startup process (PID 28993) exited with exit code 1
2022-02-14 10:49:33.467 UTC [28991] LOG:  aborting startup due to startup process failure
2022-02-14 10:49:33.468 UTC [28991] LOG:  database system is shut down
2022-02-14 10:49:33.669 UTC [28998] LOG:  database system was interrupted; last known up at 2022-02-08 20:19:34 UTC
I0214 10:49:33.672663 28999 webserver.cc:168] Starting webserver on 0.0.0.0:13000
I0214 10:49:33.672816 28999 webserver.cc:173] Document root: /nfusr/dev-server/dfelsing/code/yugabyte-db/www
I0214 10:49:33.672992 28999 webserver.cc:163] Webserver listen spec is 0.0.0.0:13000
I0214 10:49:33.673172 28999 webserver.cc:262] Webserver started. Bound to: http://0.0.0.0:13000/
2022-02-14 10:49:33.676 UTC [28998] LOG:  invalid checkpoint record
2022-02-14 10:49:33.676 UTC [28998] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.676 UTC [28998] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".
2022-02-14 10:49:33.681 UTC [28996] LOG:  startup process (PID 28998) exited with exit code 1
2022-02-14 10:49:33.681 UTC [28996] LOG:  aborting startup due to startup process failure
2022-02-14 10:49:33.682 UTC [28996] LOG:  database system is shut down
2022-02-14 10:49:33.878 UTC [29003] LOG:  database system was interrupted; last known up at 2022-02-08 20:19:34 UTC
I0214 10:49:33.882380 29004 webserver.cc:168] Starting webserver on 0.0.0.0:13000
I0214 10:49:33.882505 29004 webserver.cc:173] Document root: /nfusr/dev-server/dfelsing/code/yugabyte-db/www
I0214 10:49:33.882673 29004 webserver.cc:163] Webserver listen spec is 0.0.0.0:13000
I0214 10:49:33.882891 29004 webserver.cc:262] Webserver started. Bound to: http://0.0.0.0:13000/
2022-02-14 10:49:33.885 UTC [29003] LOG:  invalid checkpoint record
2022-02-14 10:49:33.885 UTC [29003] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.885 UTC [29003] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".
2022-02-14 10:49:33.890 UTC [29001] LOG:  startup process (PID 29003) exited with exit code 1
2022-02-14 10:49:33.890 UTC [29001] LOG:  aborting startup due to startup process failure
2022-02-14 10:49:33.892 UTC [29001] LOG:  database system is shut down

@bmatican bmatican assigned jasonyb and unassigned bmatican Feb 14, 2022
@bmatican bmatican added the area/ysql Yugabyte SQL (YSQL) label Feb 14, 2022
@bmatican bmatican changed the title [DocDB][SQLsmith] Corrupted data after crash, can't start up anymore [DocDB][SQLsmith] Colocated table corruption after crash loop Feb 14, 2022
@jaki
Copy link
Contributor

jaki commented Feb 15, 2022

@def-

You didn't answer this question:

Were there any DROP TABLE or ALTER TABLE ADD PRIMARY KEY?

Full file (new run) is attached

Are you sure the issue is in this log? I don't see any CreateTable at all. I was originally asking only for the rest of that "Missing table" log entry since I wanted to know what other table_ids were in there.

No fatals, two errors, attached:

These don't look serious.

Doesn't crash, but keeps getting restarted:

That's what I'm looking for. The postgres logs are more helpful, then.

Regarding

2022-02-14 10:49:33.046 UTC [28983] LOG:  invalid checkpoint record
2022-02-14 10:49:33.046 UTC [28983] FATAL:  could not locate required checkpoint record
2022-02-14 10:49:33.046 UTC [28983] HINT:  If you are not restoring from a backup, try removing the file "/nfusr/dev-server/dfelsing/var/data/pg_data/backup_label".

that's probably the main culprit. I never saw it before. Did you try removing the file like the hint says?

@bmatican, I wouldn't call this "Colocated table corruption" yet since there's no evidence of that. The master log "Missing table" is nonfatal and looks more like issue #11129 like I previously mentioned.

@def-
Copy link
Contributor Author

def- commented Feb 15, 2022

Were there any DROP TABLE or ALTER TABLE ADD PRIMARY KEY?

Sorry, forgot to answer. There should have been no DROP TABLE commands, but ALTER TABLES yes. For an initial setup I ran: (cat src/postgres/src/test/regress/sql/yb_*.sql | grep -vi "DROP") | bin/ysqlsh

Are you sure the issue is in this log? I don't see any CreateTable at all. I was originally asking only for the rest of that "Missing table" log entry since I wanted to know what other table_ids were in there.

There is a line catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2 required by tablet b862a57f8caa406299a377321bc2e35d, metadata: committed_consensus_state { in there, and a long log after that, I didn't want to fill up the entire comments with that.

@def-
Copy link
Contributor Author

def- commented Feb 15, 2022

that's probably the main culprit. I never saw it before. Did you try removing the file like the hint says?

Tried it now, that helps. Starts up fine now. The content of the file was:

START WAL LOCATION: 4/79000028 (file 000000010000000400000079)
CHECKPOINT LOCATION: 4/79000070
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2022-02-08 20:19:22 UTC
LABEL: a
START TIMELINE: 1

@jaki
Copy link
Contributor

jaki commented Feb 15, 2022

@def-

For an initial setup I ran: (cat src/postgres/src/test/regress/sql/yb_*.sql | grep -vi "DROP") | bin/ysqlsh

That initial setup is interesting.

  • The files sometimes depend on other ones, so if you apply them alphabetically, you're likely to get a lot of failed queries since dependencies are out of order.
  • The files may depend on connections and session variables being reset per-file, so you may end up running some files not intended for colocation on a colocated database or other files with an unintended session variable.
  • The grep filtering out DROP possibly could have issues if there's a multiline statement involving the word "DROP", but I doubt that exists.

There's still a chance ALTER TABLE ADD PRIMARY KEY happened. That internally replaces a nonpk table with a pk one, meaning it internally drops the nonpk table. Then, issue #11129 is possible. If you try filtering out lines with "ADD PRIMARY KEY" and rerun, you may not see that warning any more.

There is a line catalog_loaders.cc:191] Missing table 000043b10000300080000000000045b2 required by tablet b862a57f8caa406299a377321bc2e35d, metadata: committed_consensus_state { in there

Sorry, I missed that and the many others in the file. It looks like expected, and I wonder what that table was. If CreateTable were in the master logs, we could find out. Do you have those earlier master logs?


About that backup_label file, it's probably the main reason for this issue. Is it reproducible?

@def-
Copy link
Contributor Author

def- commented Feb 17, 2022

About that backup_label file, it's probably the main reason for this issue. Is it reproducible?

Not so easy. I ran this a few times for a full day (initial setup, 4 SQLsmith instances against it for the rest of the day) and only two times I got this issue. The initial setup alone is not enough, crashing less commonly by running with (cat src/postgres/src/test/regress/sql/yb_*.sql | grep -vi "DROP" | grep -vi "TRIGGER") | bin/ysqlsh it never got into this state of not starting up anymore.

Sorry, I missed that and the many others in the file. It looks like expected, and I wonder what that table was. If CreateTable were in the master logs, we could find out. Do you have those earlier master logs?

I uploaded the full var directory: https://ddnet.tw/var-11415.tar.gz

@jaki
Copy link
Contributor

jaki commented Feb 17, 2022

@def-

Found the CreateTable for 000043b10000300080000000000045b2 in yb-master.dev-server-dfelsing.dfelsing.log.INFO.20220208-175245.5415: name: "ft_h_tab_xid". That table failed to create:

I0208 17:56:27.527658  7387 ybccmds.c:482] Creating Table colocation_test.public.ft_h_tab_xid
2022-02-08 17:56:27.574 UTC [7387] ERROR:  data type xid has no default operator class for access method "lsm"
2022-02-08 17:56:27.574 UTC [7387] HINT:  You must specify an operator class for the index or define a default operator class for the data type.
2022-02-08 17:56:27.574 UTC [7387] STATEMENT:  CREATE TABLE ft_h_tab_xid (feature_col XID PRIMARY KEY);

so master dropped the docdb table

I0208 17:56:27.734095  7453 catalog_manager.cc:3473] Table transaction failed, deleting: ft_h_tab_xid [id=000043b10000300080000000000045b2]
I0208 17:56:27.734205  7453 catalog_manager.cc:4553] Servicing DeleteTable request from internal request: table { table_id: "000043b10000300080000000000045b2" table_name: "ft_h_tab_xid" } is_index_table: false
I0208 17:56:27.736016  7453 catalog_manager_ent.cc:2476] Deleting CDC streams for tables: 000043b10000300080000000000045b2
I0208 17:56:27.736049  7453 catalog_manager.cc:4655] Deleting table: ft_h_tab_xid, retained by: []
I0208 17:56:27.736196  7453 catalog_manager.cc:4699] Successfully initiated deletion of table with table_id: "000043b10000300080000000000045b2"
table_name: "ft_h_tab_xid"
 per request from internal request
I0208 17:56:27.741055  7454 catalog_manager.cc:4895] Marking table as DELETED: ft_h_tab_xid [id=000043b10000300080000000000045b2]

but issue #11129 means the drop doesn't fully clean up the metadata. This is not anything new.


The backup_label issue needs more investigation.

git grep -rl 'backup_label' | sort -r

gave me postgresql-2022-02-08_201935.log as the first log with the backup_label issue. Inside there, I see segmentation fault.

2022-02-08 20:19:35.683 UTC [18115] LOG:  server process (PID 20859) was terminated by signal 11: Segmentation fault
2022-02-08 20:19:35.683 UTC [18115] DETAIL:  Failed process was running: select
    subq_0.c3 as c0,
    pg_catalog.interval_pl(
      cast(case when case when cast(null as varbit) >= cast(null as varbit) then case when cast(null as timestamptz) > cast(null as "timestamp") then cast(null as "bit") else cast(null as "bit") end
               else case when cast(null as timestamptz) > cast(null as "timestamp") then cast(null as "bit") else cast(null as "bit") end
               end
             > case when cast(null as int8) = (select k from public.single_row_complex limit 1 offset 1)
                 then case when subq_0.c4 is NULL then cast(null as "bit") else cast(null as "bit") end
               else case when subq_0.c4 is NULL then cast(null as "bit") else cast(null as "bit") end
               end
             then cast(coalesce(cast(null as "interval"),
          pg_catalog.time_mi_time(
            cast(cast(null as "time") as "time"),
            cast(
              pg_catalog.min(
                cast((select c from public.tbl13 limit 1 offset 6)
                   as "time")) over (partition by su
2022-02-08 20:19:35.683 UTC [18115] LOG:  terminating any other active server processes

Can you get a backtrace from the corresponding core, core.20859?

@def-
Copy link
Contributor Author

def- commented Feb 17, 2022

Thanks for checking, good to know this is a known issue.

Can you get a backtrace from the corresponding core, core.20859?

Core was generated by `postgres: yugabyte yugabyte 127.0.0.1(50984) SELECT                           '.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000005b27bc in YbDatumToText (datum=0, data=0x7ffe67ab6180, bytes=0x7ffe67ab6178) at ../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_type.c:254
254	../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_type.c: No such file or directory.
(gdb) bt
#0  0x00000000005b27bc in YbDatumToText (datum=0, data=0x7ffe67ab6180, bytes=0x7ffe67ab6178) at ../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_type.c:254
#1  0x00007f42ac0620ec in yb::pggate::PgConstant::PgConstant(PgTypeEntity const*, bool, char const*, unsigned long, bool, yb::pggate::PgExpr::Opcode) (this=0x395ec00, type_entity=<optimized out>,
    collate_is_valid_non_c=<optimized out>, collation_sortkey=0x0, datum=0, is_null=false, opcode=PG_EXPR_CONSTANT) at ../../src/yb/yql/pggate/pg_expr.cc:544
#2  0x00007f42ac03c4e0 in yb::pggate::PgApiImpl::NewConstant(yb::pggate::PgStatement*, PgTypeEntity const*, bool, char const*, unsigned long, bool, yb::pggate::PgExpr**) (this=<optimized out>, stmt=0x26e8780,
    type_entity=0x349b68 <YbTypeEntityTable+360>, collate_is_valid_non_c=<optimized out>, collation_sortkey=0x0, datum=0, is_null=<optimized out>, expr_handle=0x7ffe67ab6838)
    at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/memory:2622
#3  0x00007f42ac02a299 in YBCPgNewConstant (stmt=0x7ffe67ab6178, type_entity=0x4, collate_is_valid_non_c=false, collation_sortkey=0x0, datum=0, is_null=true, expr_handle=0x7ffe67ab6838)
    at ../../src/yb/yql/pggate/ybc_pggate.cc:784
#4  0x00000000005a5caa in ybcBindScanKeys (ybScan=0x4971c30, scan_plan=0x7ffe67ab6890)
    at /nfusr/dev-server/dfelsing/code/yugabyte-db/src/postgres/src/backend/executor/../../../../../../src/postgres/src/backend/executor/ybcExpr.c:64
#5  0x00000000005a4107 in ybcBeginScan (relation=<optimized out>, index=<optimized out>, xs_want_itup=false, nkeys=<optimized out>, key=<optimized out>, pg_scan_plan=0x43c81e8)
    at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:1325
#6  0x00000000005a901a in ybcinrescan (scan=0x4971a90, scankey=0x4970468, nscankeys=1, orderbys=<optimized out>, norderbys=<optimized out>)
    at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_lsm.c:414
#7  0x000000000076603a in IndexNext (node=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/nodeIndexscan.c:131
#8  0x0000000000746f17 in ExecScan (node=0x38de518, accessMtd=0x765f20 <IndexNext>, recheckMtd=0x766200 <IndexRecheck>) at ../../../../../../src/postgres/src/backend/executor/execScan.c:171
#9  0x00000000007675f6 in ExecProcNode (node=0x38de518) at ../../../../../../src/postgres/src/include/executor/executor.h:249
#10 ExecLimit (pstate=0x38de328) at ../../../../../../src/postgres/src/backend/executor/nodeLimit.c:113
#11 0x000000000073a52e in standard_ExecutorRun (queryDesc=0x1ffc518, direction=<optimized out>, count=0, execute_once=<optimized out>) at ../../../../../../src/postgres/src/include/executor/executor.h:249
#12 0x00007f42a5419565 in pgss_ExecutorRun (queryDesc=0x1ffc518, direction=ForwardScanDirection, count=0, execute_once=true) at ../../../../../src/postgres/contrib/pg_stat_statements/pg_stat_statements.c:947
#13 0x00007f42a54103da in ybpgm_ExecutorRun (queryDesc=0x1ffc518, direction=ForwardScanDirection, count=0, execute_once=true) at ../../../../../src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c:499
#14 0x000000000093b9c6 in PortalRunSelect (portal=0x1d7a118, forward=<optimized out>, count=0, dest=<optimized out>)
    at /nfusr/dev-server/dfelsing/code/yugabyte-db/src/postgres/src/backend/executor/../../../../../../src/postgres/src/backend/executor/execMain.c:305
#15 0x000000000093b4a6 in PortalRun (portal=0x1d7a118, count=9223372036854775807, isTopLevel=true, run_once=<optimized out>, dest=0x43c8d78, altdest=0x43c8d78, completionTag=0x7ffe67ab6f70 "")
    at ../../../../../../src/postgres/src/backend/tcop/pquery.c:784
#16 0x0000000000937776 in yb_exec_simple_query_impl (query_string=0x1c7e938) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1161
#17 0x0000000000937e4e in yb_exec_query_wrapper (exec_context=0x1c7e000, restart_data=<optimized out>, functor=0x936ed0 <yb_exec_simple_query_impl>, functor_context=0x1c7e938)
    at ../../../../../../src/postgres/src/backend/tcop/postgres.c:4445
#18 0x000000000093472d in PostgresMain (argc=1, argv=<optimized out>, dbname=<optimized out>, username=0x1c61fe8 "yugabyte") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:4479
#19 0x000000000089013b in BackendRun (port=0x1afa1e0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4470
#20 0x000000000088f872 in ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4136
#21 0x000000000088c0b1 in PostmasterMain (argc=<optimized out>, argv=0x1b106c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1417
#22 0x00000000007ac8d3 in PostgresServerProcessMain (argc=23, argv=0x1b106c0) at ../../../../../../src/postgres/src/backend/main/main.c:234
#23 0x00000000004de432 in main ()

So this is #11363

@tverona1 tverona1 changed the title [DocDB][SQLsmith] Colocated table corruption after crash loop [Phase 1] [colocated] [DocDB][SQLsmith] Colocated table corruption after crash loop May 17, 2022
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label May 17, 2022
@tverona1 tverona1 changed the title [Phase 1] [colocated] [DocDB][SQLsmith] Colocated table corruption after crash loop [Phase 1] [colocation] [DocDB][SQLsmith] Colocated table corruption after crash loop May 17, 2022
@frozenspider frozenspider changed the title [Phase 1] [colocation] [DocDB][SQLsmith] Colocated table corruption after crash loop [Phase 1][colocation][DocDB][SQLsmith] Colocated table corruption after crash loop May 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features area/ysql Yugabyte SQL (YSQL) duplicate kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

6 participants