
Master and slave servers suddenly went out of sync and did not recover for a long time; only a restart fixed it #933

Closed
epubreader opened this issue Jun 29, 2020 · 6 comments


@epubreader

For some unknown reason the master and slave servers suddenly went out of sync, did not recover for a long time, and only a restart fixed it.
Master server log:

| I0623 18:01:52.907841    22 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 71, ip_port: 10.0.1.150:35924
| I0623 18:02:02.309511     1 pika_server.cc:273] Goodbye...
| I0623 18:01:56.026744     1 pika_dispatch_thread.cc:27] dispatch thread 140276433671936 exit!!!
| I0623 18:01:56.036780     1 pika_server.cc:113] Delete slave success
| I0623 18:02:08.311497     1 pika_dispatch_thread.cc:27] dispatch thread 139996992329472 exit!!!
| I0623 18:01:56.997107     1 pika_auxiliary_thread.cc:17] PikaAuxiliary thread 140276416886528 exit!!!
| I0623 18:02:08.314826     1 pika_auxiliary_thread.cc:17] PikaAuxiliary thread 139996975544064 exit!!!
| sh: line 0: kill: -: arguments must be process or job IDs
| sh: line 0: kill: -: arguments must be process or job IDs
| I0623 18:02:08.319499     1 pika_rsync_service.cc:35] PikaRsyncService exit!!!
| I0623 18:02:08.319536     1 pika_monitor_thread.cc:28] PikaMonitorThread 139998511958528 exit!!!
| I0623 18:02:08.323971     1 pika_server.cc:132] PikaServer 139998511958528 exit!!!
| I0623 18:02:08.324070     1 pika_repl_client.cc:38] PikaReplClient exit!!!
| I0623 18:02:08.324097     1 pika_repl_server.cc:31] PikaReplServer exit!!!
| path : /pika/conf/pika.conf
| -----------Pika server 3.2.9 ----------
| -----------Pika config list----------
|  1 port 9221
|  2 thread-num 5
|  3 thread-pool-size 12
|  4 sync-thread-num 6
|  5 log-path ./log/
|  6 db-path ./db/
|  7 write-buffer-size 268435456
|  8 timeout 60
|  9 requirepass 1111
| 10 masterauth 1111
| 11 userpass 2222
| 12 userblacklist
| 13 instance-mode classic
| 14 databases 1
| 15 default-slot-num 1024
| 16 dump-prefix
| 17 dump-path ./dump/
| 18 dump-expire 0
| 19 pidfile ./pika.pid
| 20 maxclients 20000
| 21 target-file-size-base 20971520
| 22 expire-logs-days 7
| 23 expire-logs-nums 10
| 24 root-connection-num 2
| 25 slowlog-write-errorlog no
| 26 slowlog-log-slower-than 10000
| 27 slowlog-max-len 128
| 28 db-sync-path ./dbsync/
| 29 db-sync-speed -1
| 30 slave-priority 100
| 31 server-id 1
| 32 sync-window-size 9000
| 33 max-conn-rbuf-size 268435456
| 34 write-binlog yes
| 35 binlog-file-size 104857600
| 36 max-cache-statistic-keys 0
| 37 small-compaction-threshold 5000
| 38 max-write-buffer-size 10737418240
| 39 max-client-response-size 1073741824
| 40 compression snappy
| 41 max-background-flushes 1
| 42 max-background-compactions 2
| 43 max-cache-files 5000
| 44 max-bytes-for-level-multiplier 10
| -----------Pika config end----------
| I0623 18:01:57.233522     1 pika_rsync_service.cc:35] PikaRsyncService exit!!!
| I0623 18:01:57.233603     1 pika_monitor_thread.cc:28] PikaMonitorThread 140278029359616 exit!!!
| I0623 18:01:57.632009     1 pika_server.cc:132] PikaServer 140278029359616 exit!!!
| I0623 18:01:57.638080     1 pika_repl_client.cc:38] PikaReplClient exit!!!
| I0623 18:01:57.638273     1 pika_repl_server.cc:31] PikaReplServer exit!!!
| path : /pika/conf/pika.conf
| -----------Pika server 3.2.9 ----------
| -----------Pika config list----------
|  1 port 9221
|  2 thread-num 5
|  3 thread-pool-size 12
|  4 sync-thread-num 6
|  5 log-path ./log/
|  6 db-path ./db/
|  7 write-buffer-size 268435456
|  8 timeout 60
|  9 requirepass 1111
| 10 masterauth 1111
| 11 userpass 2222
| 12 userblacklist
| 13 instance-mode classic
| 14 databases 1
| 15 default-slot-num 1024
| 16 dump-prefix
| 17 dump-path ./dump/
| 18 dump-expire 0
| 19 pidfile ./pika.pid
| 20 maxclients 20000
| 21 target-file-size-base 20971520
| 22 expire-logs-days 7
| 23 expire-logs-nums 10
| 24 root-connection-num 2
| 25 slowlog-write-errorlog no
| 26 slowlog-log-slower-than 10000
| 27 slowlog-max-len 128
| 28 db-sync-path ./dbsync/
| 29 db-sync-speed -1
| 30 slave-priority 100
| 31 server-id 1
| 32 sync-window-size 9000
| 33 max-conn-rbuf-size 268435456
| 34 write-binlog yes
| 35 binlog-file-size 104857600
| 36 max-cache-statistic-keys 0
| 37 small-compaction-threshold 5000
| 38 max-write-buffer-size 10737418240
| 39 max-client-response-size 1073741824
| 40 compression snappy
| 41 max-background-flushes 1
| 42 max-background-compactions 2
| 43 max-cache-files 5000
| 44 max-bytes-for-level-multiplier 10
| -----------Pika config end----------

Slave server log:

| 38 small-compaction-threshold 5000
| I0528 03:09:56.015735    50 pika_partition.cc:617] db0 Success purge 1
| 39 max-write-buffer-size 10737418240
| 40 max-client-response-size 1073741824
| I0530 03:40:00.199074     6 pika_repl_client_thread.cc:21] ReplClient Close conn, fd=92, ip_port=pika:11221
| 41 compression snappy
| W0530 03:40:00.199463     6 pika_repl_client_thread.cc:31] Master conn disconnect : pika:11221 try reconnect
| 42 max-background-flushes 1
| W0530 03:40:00.217085    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| 43 max-background-compactions 2
| 44 max-cache-files 5000
| W0530 03:40:03.321048    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:06.423389    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| 45 max-bytes-for-level-multiplier 10
| -----------Pika config end----------
| W0530 03:40:09.526119    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:12.628903    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:15.731853    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:18.834913    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:21.937832    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:25.039788    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:28.142222    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:31.245103    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:34.348099    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| W0530 03:40:37.449506    49 pika_repl_client.cc:114] Failed to connect master, Master (pika:9221), try reconnect
| I0530 03:40:40.551244    49 pika_repl_client.cc:145] Try Send Meta Sync Request to Master (pika:9221)
| I0530 03:40:40.555188     9 pika_server.cc:543] Mark try connect finish
| I0530 03:40:40.555253     9 pika_repl_client_conn.cc:139] Finish to handle meta sync response
| I0530 03:40:40.652730    10 pika_repl_client_conn.cc:220] Partition: db0 TrySync Ok
| I0531 06:51:09.089020    50 pika_partition.cc:617] db0 Success purge 1
| I0602 18:29:21.031162     6 pika_repl_client_thread.cc:37] ReplClient Timeout conn, fd=67, ip_port=pika:11221
| W0602 18:29:21.031205     6 pika_repl_client_thread.cc:48] Master conn timeout : pika:11221 try reconnect
| I0602 18:29:21.122483    49 pika_repl_client.cc:145] Try Send Meta Sync Request to Master (pika:9221)
| W0602 18:29:21.125631    11 pika_repl_client_conn.cc:100] Meta Sync Failed: Slave AlreadyExist
| W0602 18:29:21.125690    11 pika_server.cc:761] Sync error, set repl_state to PIKA_REPL_ERROR
| I0602 18:29:21.125988     6 pika_repl_client_thread.cc:21] ReplClient Close conn, fd=67, ip_port=pika:11221
| I0603 00:03:18.296692     1 pika.cc:98] Catch Signal 15, cleanup...
| I0603 00:03:18.296782     1 pika_server.cc:273] Goodbye...
| I0603 00:03:24.651252     1 pika_dispatch_thread.cc:27] dispatch thread 139759930283776 exit!!!
| I0603 00:03:25.453156     1 pika_auxiliary_thread.cc:17] PikaAuxiliary thread 139759703811840 exit!!!
| sh: line 0: kill: -: arguments must be process or job IDs
| I0603 00:03:25.480860     1 pika_rsync_service.cc:35] PikaRsyncService exit!!!
| I0603 00:03:25.481288     1 pika_monitor_thread.cc:28] PikaMonitorThread 139761466087936 exit!!!
| I0603 00:03:25.509163     1 pika_server.cc:132] PikaServer 139761466087936 exit!!!
| I0603 00:03:25.509490     1 pika_repl_client.cc:38] PikaReplClient exit!!!
| I0603 00:03:25.509539     1 pika_repl_server.cc:31] PikaReplServer exit!!!
@kernelai
Collaborator

Do you have the master's logs from around 06-02 18:29?

@epubreader
Author

Ran into this again today; the server version is pikadb/pika:v3.2.9.

Slave server log:
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1103 15:05:48.621208 50 pika_partition.cc:617] db0 Success purge 1
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1106 10:49:08.022241 50 pika_partition.cc:617] db0 Success purge 1
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1110 02:16:35.936123 50 pika_partition.cc:617] db0 Success purge 1
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1114 05:12:28.891971 50 pika_partition.cc:617] db0 Success purge 1
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1118 01:19:40.028442 6 pika_repl_client_thread.cc:37] ReplClient Timeout conn, fd=364, ip_port=pika:11221
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | W1118 01:19:40.028730 6 pika_repl_client_thread.cc:48] Master conn timeout : pika:11221 try reconnect
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1118 01:19:40.105703 49 pika_repl_client.cc:145] Try Send Meta Sync Request to Master (pika:9221)
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | W1118 01:19:40.108279 11 pika_repl_client_conn.cc:100] Meta Sync Failed: Slave AlreadyExist
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | W1118 01:19:40.108314 11 pika_server.cc:761] Sync error, set repl_state to PIKA_REPL_ERROR
vudswggb2xcp@Ubuntu-1910-eoan-64-minimal | I1118 01:19:40.108405 6 pika_repl_client_thread.cc:21] ReplClient Close conn, fd=364, ip_port=pika:11221
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 37 max-cache-statistic-keys 0
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 38 small-compaction-threshold 5000
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 39 max-write-buffer-size 10737418240
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 40 max-client-response-size 1073741824
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 41 compression snappy
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 42 max-background-flushes 1
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 43 max-background-compactions 2
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 44 max-cache-files 5000
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | 45 max-bytes-for-level-multiplier 10
cx30kyrklc14@Ubuntu-1910-eoan-64-minimal | -----------Pika config end----------

Master server log:

I0901 04:10:02.364224 1 pika.cc:187] Server at: /pika/conf/pika.conf
I0901 04:10:02.372625 1 pika_server.cc:167] Using Networker Interface: eth2
I0901 04:10:02.373453 1 pika_server.cc:210] host: 172.21.0.3 port: 9221
I0901 04:10:02.373473 1 pika_server.cc:87] Worker queue limit is 4100
I0901 04:10:07.839260 1 pika_partition.cc:92] db0 DB Success
I0901 04:10:07.839316 1 pika_binlog.cc:106] Binlog: Find the exist file.
I0901 04:10:07.840185 1 pika_server.cc:264] Pika Server going to start
I0901 04:10:08.240557 20 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.1.106, Slave port:9222
I0901 04:10:08.240618 20 pika_server.cc:745] Add New Slave, 10.0.1.106:9222
I0901 04:10:08.340098 21 pika_repl_server_conn.cc:109] Receive Trysync, Slave ip: 10.0.1.106, Slave port:9222, Partition: db0, filenum: 700, pro_offset: 27570382
I0901 04:10:08.340250 21 pika_rm.cc:163] Add Slave Node, partition: (db0:0), ip_port: 10.0.1.106:9222
I0901 04:10:08.340271 21 pika_repl_server_conn.cc:175] Partition: db0 TrySync Success, Session: 0
I0902 15:24:30.369750 50 pika_partition.cc:617] db0 Success purge 1
I0905 09:55:44.243384 50 pika_partition.cc:617] db0 Success purge 1
I0909 01:49:26.009101 50 pika_partition.cc:617] db0 Success purge 1
I0913 03:19:02.033335 50 pika_partition.cc:617] db0 Success purge 1
I0914 18:31:28.627552 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.2:64764
I0914 18:36:22.788204 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 835, ip_port: 10.0.0.136:65218
I0915 00:08:44.055868 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.2:64245
I0915 00:13:41.264639 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 835, ip_port: 10.0.0.136:64645
I0915 05:33:52.884670 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.2:65425
I0915 05:38:44.045670 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.136:64637
I0916 04:24:18.839614 50 pika_partition.cc:617] db0 Success purge 1
I0919 00:47:33.991093 50 pika_partition.cc:617] db0 Success purge 1
I0920 13:47:14.317665 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:61104
I0921 06:43:36.434998 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:63803
I0922 04:08:51.768266 50 pika_partition.cc:617] db0 Success purge 1
I0922 13:44:23.716714 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.2:64282
I0922 14:15:03.829587 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:63064
I0923 10:56:43.835459 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.2:64075
I0923 11:27:29.668740 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:63656
I0924 05:37:49.914567 50 pika_partition.cc:617] db0 Success purge 1
I0924 09:44:42.913715 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:63327
I0924 09:44:42.914286 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 849, ip_port: 10.0.0.62:64958
I0924 09:44:51.920141 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 850, ip_port: 10.0.0.62:64231
I0924 09:44:51.920186 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 851, ip_port: 10.0.0.62:64814
I0924 09:44:54.921677 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 852, ip_port: 10.0.0.62:64170
I0924 10:31:59.685631 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.136:65463
I0924 10:55:42.596652 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 89, ip_port: 10.0.0.2:65021
I0927 01:21:04.879813 50 pika_partition.cc:617] db0 Success purge 1
I1001 00:36:49.808722 50 pika_partition.cc:617] db0 Success purge 1
I1002 07:49:59.762763 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:63398
I1002 07:54:29.941581 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:63187
I1002 07:54:32.944775 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 850, ip_port: 10.0.0.62:63019
I1002 08:05:39.383591 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:63967
I1002 09:22:06.217175 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:63524
I1002 09:26:27.372673 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:62665
I1002 09:26:30.374841 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 850, ip_port: 10.0.0.62:61438
I1002 09:37:33.785661 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:62389
I1002 12:45:33.122915 50 pika_partition.cc:617] db0 Success purge 1
I1005 01:30:45.272097 50 pika_partition.cc:617] db0 Success purge 1
I1006 19:54:09.535692 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:62629
I1008 00:54:01.476917 50 pika_partition.cc:617] db0 Success purge 1
I1008 14:27:19.351547 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:65241
I1008 14:34:07.620779 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:63139
I1009 07:14:20.206580 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:63634
I1009 07:20:17.413683 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:64393
I1010 10:05:02.469635 50 pika_partition.cc:617] db0 Success purge 1
I1014 05:36:45.834579 50 pika_partition.cc:617] db0 Success purge 1
I1016 15:07:46.814539 50 pika_partition.cc:617] db0 Success purge 1
I1019 02:42:48.491452 50 pika_partition.cc:617] db0 Success purge 1
I1022 01:32:14.208393 50 pika_partition.cc:617] db0 Success purge 1
I1024 07:50:27.507771 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:60005
I1025 03:37:31.192142 50 pika_partition.cc:617] db0 Success purge 1
I1028 23:21:14.693290 50 pika_partition.cc:617] db0 Success purge 1
I1101 00:46:31.760932 50 pika_partition.cc:617] db0 Success purge 1
I1101 21:15:52.458124 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:48995
I1101 21:17:35.445365 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.136:49325
I1103 15:05:44.642551 50 pika_partition.cc:617] db0 Success purge 1
I1105 00:09:44.079906 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.2:63669
I1105 13:19:51.560529 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 867, ip_port: 10.0.0.62:57476
I1105 13:19:51.564136 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.62:57469
I1105 13:19:51.573916 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 866, ip_port: 10.0.0.62:57473
I1105 13:19:52.558544 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 868, ip_port: 10.0.0.62:57472
I1105 13:20:00.543669 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 869, ip_port: 10.0.0.62:57471
I1105 16:31:18.228132 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 834, ip_port: 10.0.0.2:55384
I1105 16:31:18.275168 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 866, ip_port: 10.0.0.136:55433
I1106 10:48:59.522523 50 pika_partition.cc:617] db0 Success purge 1
I1109 19:17:42.218080 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.2:31382
I1109 19:19:25.149984 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.2:32611
I1110 02:16:41.661084 50 pika_partition.cc:617] db0 Success purge 1
I1113 07:35:43.929067 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:64987
I1113 07:53:21.577018 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:64327
I1114 05:12:28.130618 50 pika_partition.cc:617] db0 Success purge 1
I1114 07:04:57.056501 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:55328
I1114 07:04:59.079994 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 867, ip_port: 10.0.0.62:47482
I1114 07:05:02.971405 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 789, ip_port: 10.0.0.62:58176
W1118 01:18:48.027034 49 pika_rm.cc:530] (db0:0) Master del Recv Timeout slave success 10.0.1.106:9222
I1118 01:19:40.107879 20 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.1.106, Slave port:9222
W1118 01:19:40.108070 20 pika_server.cc:738] Slave Already Exist, ip_port: 10.0.1.106:9222
I1118 01:19:40.108561 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 868, ip_port: 10.0.1.150:47474
I1118 01:21:31.269938 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 70, ip_port: 10.0.1.150:55004
I1118 01:21:31.270294 19 pika_server.cc:649] Delete Slave Success, ip_port: 10.0.1.106:9222
I1118 02:42:02.538383 50 pika_partition.cc:617] db0 Success purge 1
I1119 10:07:39.898283 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 866, ip_port: 10.0.0.2:45025
I1119 10:07:40.271010 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 868, ip_port: 10.0.0.136:47559
I1119 15:42:27.444428 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 866, ip_port: 10.0.0.136:50918
I1119 17:08:45.986027 19 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 866, ip_port: 10.0.0.2:50624
I1119 22:17:58.105327 22 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.1.244, Slave port:9222
I1119 22:17:58.105684 22 pika_server.cc:745] Add New Slave, 10.0.1.244:9222
I1119 22:17:58.201710 21 pika_repl_server_conn.cc:109] Receive Trysync, Slave ip: 10.0.1.244, Slave port:9222, Partition: db0, filenum: 725, pro_offset: 87694191
I1119 22:17:58.204950 21 pika_rm.cc:163] Add Slave Node, partition: (db0:0), ip_port: 10.0.1.244:9222
I1119 22:17:58.204969 21 pika_repl_server_conn.cc:175] Partition: db0 TrySync Success, Session: 1
I1119 22:26:59.705724 1 pika.cc:98] Catch Signal 15, cleanup...
I1119 22:26:59.706132 1 pika_server.cc:273] Goodbye...

@epubreader
Author

epubreader commented Nov 19, 2020

This problem happens occasionally, and restarting the slave server fixes it.
Could some cached information about the master server be causing it? The slave is configured with the master as pika:11221, a hostname rather than an IP address.
I have now upgraded to pikadb/pika:v3.3.6 to see whether that resolves the problem.

@epubreader
Author

> Do you have the master's logs from around 06-02 18:29?

Can this bug be reopened?
My guess at the cause: the master server keeps the slave server's IP address, but in a docker swarm cluster the slave's IP address can change, after which the slave can no longer sync from the master.
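The suspected failure mode can be sketched with a small model. This is purely illustrative and assumes, based on the "Slave Already Exist" and "Master del Recv Timeout slave success" log lines above, that the master keys its slave registry by ip:port; it is not Pika's actual data structure.

```python
# Illustrative model (not Pika's real code) of why a stale slave entry on
# the master blocks re-registration after the slave's connection dies.

class MasterModel:
    def __init__(self):
        self.slaves = {}  # slave registry, keyed by "ip:port" from MetaSync

    def meta_sync(self, ip, port):
        key = f"{ip}:{port}"
        if key in self.slaves:
            # Corresponds to the master log: "Slave Already Exist, ip_port: ..."
            return "Slave AlreadyExist"
        self.slaves[key] = {"session": len(self.slaves)}
        return "OK"

    def recv_timeout_purge(self, ip, port):
        # Corresponds to: "(db0:0) Master del Recv Timeout slave success ..."
        self.slaves.pop(f"{ip}:{port}", None)


master = MasterModel()
assert master.meta_sync("10.0.1.106", 9222) == "OK"
# The slave's conn times out and it reconnects before the master has
# purged the old entry, so re-registration is rejected:
assert master.meta_sync("10.0.1.106", 9222) == "Slave AlreadyExist"
# Only after the master's own recv-timeout purge does a retry succeed:
master.recv_timeout_purge("10.0.1.106", 9222)
assert master.meta_sync("10.0.1.106", 9222) == "OK"
```

The slave logs above fit this model: the MetaSync retry at 01:19:40 arrives while the master still holds the old registration, so it is rejected, and the master only deletes the stale entry later (01:21:31).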

@epubreader epubreader reopened this Nov 20, 2020
@kernelai
Collaborator

Please upgrade to v3.3.6 first. From the logs, under an unstable network, when the slave reconnects it finds that the master has not yet cleared its information about the slave, so the slave sets its state to error. v3.3.6 has fixed this problem.
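The version difference described above is visible in the two slave logs in this thread and can be sketched as a tiny decision function. This is an illustrative model of the observed behavior, not the actual C++ implementation:

```python
# Illustrative sketch (not Pika's real code) of the slave-side reaction to a
# "Slave AlreadyExist" MetaSync reply in the two versions discussed here.

def handle_meta_sync_reply(version, reply):
    if reply != "Slave AlreadyExist":
        return "PIKA_REPL_CONNECTED"
    if version == "3.2.9":
        # 3.2.9 slave log: "Sync error, set repl_state to PIKA_REPL_ERROR"
        # -- a terminal state; only a restart retriggers the handshake.
        return "PIKA_REPL_ERROR"
    # 3.3.6 slave log: "Slave AlreadyExist will keep sending MetaSync msg"
    # -- stay in the retry loop until the master purges the stale entry.
    return "PIKA_REPL_META_SYNC"


assert handle_meta_sync_reply("3.2.9", "Slave AlreadyExist") == "PIKA_REPL_ERROR"
assert handle_meta_sync_reply("3.3.6", "Slave AlreadyExist") == "PIKA_REPL_META_SYNC"
```

In other words, 3.3.6 turns the fatal error into a retry, so the slave recovers on its own once the master's recv-timeout cleanup removes the stale registration.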

@c93614
Contributor

c93614 commented Jun 25, 2021

We are currently on v3.3.6, with 3 Pika master/slave pairs set up directly between two servers (i.e. the network environment is identical). Last night there was network jitter; two of the pairs recovered quickly, but one pair hit this same problem of "the master not clearing its information about the slave," so the slave could not reconnect and we had to fix it manually.

Can anything be gleaned from the logs? @kernelai

Master node log:

path : conf/pika.conf
-----------Pika server----------
pika_version: 3.3.6
pika_git_sha:9e74c8cd0040a0a63c35e9d426c7d3b6464b378e
pika_build_compile_date: Dec  4 2020
...
...
...
W0624 22:21:58.540555    58 pika_rm.cc:407] (db0:0) Master del Recv Timeout slave success 10.0.0.2:9221
I0624 22:21:58.573012    23 pika_repl_server_conn.cc:108] Receive Trysync, Slave ip: 10.0.0.2, Slave port:9221, Partition: db0, filenum: 0, pro_offset: 82948733
I0624 22:21:58.573614    23 pika_rm.cc:79] Add Slave Node, partition: (db0:0), ip_port: 10.0.0.2:9221
I0624 22:21:58.573644    23 pika_repl_server_conn.cc:181] Partition: db0 TrySync Success, Session: 17
W0624 22:22:18.653208    58 pika_rm.cc:407] (db0:0) Master del Recv Timeout slave success 10.0.0.2:9221
I0624 22:23:00.986393    20 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 50, ip_port: 10.0.0.2:18217
I0624 22:23:00.986515    20 pika_server.cc:740] Delete Slave Success, ip_port: 10.0.0.2:9221
I0624 22:23:00.986547    20 pika_rm.cc:90] Remove Slave Node, Partition: (db0:0), ip_port: 10.0.0.2:9221
I0624 22:23:21.231633    21 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
I0624 22:23:21.231693    21 pika_server.cc:843] Add New Slave, 10.0.0.2:9221
I0624 22:23:42.002051    22 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
I0624 22:23:42.002051    23 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:23:42.002104    22 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
W0624 22:23:42.002159    23 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
I0624 22:24:03.504433    21 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:24:03.504493    21 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
I0624 22:24:32.689980    22 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:24:32.690040    22 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
I0624 22:24:53.553961    23 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
I0624 22:24:53.553970    21 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:24:53.554008    23 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
W0624 22:24:53.554059    21 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
I0624 22:24:56.139577    22 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:24:56.139647    22 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221
I0624 22:25:06.146618    23 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 10.0.0.2, Slave port:9221
W0624 22:25:06.146677    23 pika_server.cc:836] Slave Already Exist, ip_port: 10.0.0.2:9221

Slave node log:

path : conf/pika.conf
-----------Pika server----------
pika_version: 3.3.6
pika_git_sha:9e74c8cd0040a0a63c35e9d426c7d3b6464b378e
pika_build_compile_date: Dec  4 2020
...
...
...
I0624 22:23:00.981081     7 pika_repl_client_thread.cc:38] ReplClient Timeout conn, fd=51, ip_port=10.0.0.1:11221
W0624 22:23:00.981707     7 pika_repl_client_thread.cc:49] Master conn timeout : 10.0.0.1:11221 try reconnect
I0624 22:23:10.014001    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:23:21.521927    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:23:23.548414    14 pika_rm.cc:989] Failed to connect remote node(10.0.0.1:9221)
W0624 22:23:23.548641    14 pika_server.cc:612] Corruption: connect remote node error
I0624 22:23:23.548672    14 pika_server.cc:618] Mark try connect finish
I0624 22:23:23.548718    14 pika_repl_client_conn.cc:146] Finish to handle meta sync response
I0624 22:23:24.626231    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
I0624 22:23:35.037050    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:23:43.450937    15 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
W0624 22:23:43.450942    16 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
W0624 22:23:46.545130    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:23:51.146888    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:23:55.748637    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:24:00.350386    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
I0624 22:24:03.454694    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:24:03.502259    17 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
W0624 22:24:14.562423    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
I0624 22:24:17.666697    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:24:28.574239    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:24:33.176010    58 pika_repl_client.cc:115] Failed to connect master, Master (10.0.0.1:9221), try reconnect
W0624 22:24:34.135169    18 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
I0624 22:24:36.280272    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
I0624 22:24:46.086923    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:24:53.548004    19 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
W0624 22:24:53.548195     8 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
I0624 22:24:56.093751    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:24:56.133637     9 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg
I0624 22:25:06.100692    58 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (10.0.0.1:9221)
W0624 22:25:06.140516    10 pika_repl_client_conn.cc:101] Meta Sync Failed: Slave AlreadyExist will keep sending MetaSync msg

Thanks, and best regards!

@luky116 luky116 closed this as completed May 19, 2023