ighack

Activity

冰海落花 replied to ighack in "Kafka shuts down because of a 'too many open files' error?":

Has this alert been resolved?

3 months ago
ighack commented on "Every 30-odd days, replicas in the Kafka cluster get kicked out of the ISR":

My partitions should be fairly well balanced:

    Topic: mdb_Fd_Route_GD    Partition: 0    Leader: 2    Replicas: 2,3,4    Isr: 2,4,3
    Topic: mdb_Fd_Route_GD    Partition: 1    Leader: 3    Replicas: 3,4,0    Isr: 4,3,0
    Topic: mdb_Fd_Route_GD    Partition: 2    Leader: 4    Replicas: 4,0,1    Isr: 4,1,0
    Topic: mdb_Fd_Route_GD    Partition: 3    Leader: 0    Replicas: 0,1,2    Isr: 2,1,0
    Topic: mdb_Fd_Route_GD    Partition: 4    Leader: 1    Replicas: 1,2,3    Isr: 2,3,1
    Topic: mdb_Fd_Route_GD    Partition: 5    Leader: 2    Replicas: 2,4,0    Isr: 2,4,0
    Topic: mdb_Fd_Route_GD    Partition: 6    Leader: 3    Replicas: 3,0,1    Isr: 3,1,0
    Topic: mdb_Fd_Route_GD    Partition: 7    Leader: 4    Replicas: 4,1,2    Isr: 4,2,1
    Topic: mdb_Fd_Route_GD    Partition: 8    Leader: 0    Replicas: 0,2,3    Isr: 2,3,0
    Topic: mdb_Fd_Route_GD    Partition: 9    Leader: 1    Replicas: 1,3,4    Isr: 4,3,1

Most topics look like this.
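To check the "fairly well balanced" claim mechanically, here is a minimal Python sketch. It assumes the listing above is topic describe output in the `Topic / Partition / Leader / Replicas / Isr` layout shown; the names `ROW`, `parse_rows` and `summarize` are made up for this illustration. It counts leaders per broker and lists any partition whose ISR differs from its replica set.

```python
import re
from collections import Counter

# The ten rows quoted in the post. Real input would be the full describe output.
DESCRIBE_OUTPUT = """\
Topic: mdb_Fd_Route_GD    Partition: 0    Leader: 2    Replicas: 2,3,4    Isr: 2,4,3
Topic: mdb_Fd_Route_GD    Partition: 1    Leader: 3    Replicas: 3,4,0    Isr: 4,3,0
Topic: mdb_Fd_Route_GD    Partition: 2    Leader: 4    Replicas: 4,0,1    Isr: 4,1,0
Topic: mdb_Fd_Route_GD    Partition: 3    Leader: 0    Replicas: 0,1,2    Isr: 2,1,0
Topic: mdb_Fd_Route_GD    Partition: 4    Leader: 1    Replicas: 1,2,3    Isr: 2,3,1
Topic: mdb_Fd_Route_GD    Partition: 5    Leader: 2    Replicas: 2,4,0    Isr: 2,4,0
Topic: mdb_Fd_Route_GD    Partition: 6    Leader: 3    Replicas: 3,0,1    Isr: 3,1,0
Topic: mdb_Fd_Route_GD    Partition: 7    Leader: 4    Replicas: 4,1,2    Isr: 4,2,1
Topic: mdb_Fd_Route_GD    Partition: 8    Leader: 0    Replicas: 0,2,3    Isr: 2,3,0
Topic: mdb_Fd_Route_GD    Partition: 9    Leader: 1    Replicas: 1,3,4    Isr: 4,3,1
"""

# One describe row: topic, partition, leader, replica list, ISR list.
ROW = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '2,4,3' into [2, 4, 3]; an empty string becomes []."""
    return [int(x) for x in csv.split(",") if x]


def parse_rows(text):
    """Parse every describe row found in text into a dict."""
    return [
        {
            "topic": m["topic"],
            "partition": int(m["partition"]),
            "leader": int(m["leader"]),
            "replicas": _ids(m["replicas"]),
            "isr": _ids(m["isr"]),
        }
        for m in ROW.finditer(text)
    ]


def summarize(rows):
    """Return (leaders per broker, partitions whose ISR != replica set)."""
    leaders = Counter(r["leader"] for r in rows)
    under_replicated = [
        (r["topic"], r["partition"])
        for r in rows
        if set(r["isr"]) != set(r["replicas"])
    ]
    return leaders, under_replicated


if __name__ == "__main__":
    leaders, under_replicated = summarize(parse_rows(DESCRIBE_OUTPUT))
    print("leaders per broker:", dict(sorted(leaders.items())))
    print("under-replicated partitions:", under_replicated)
```

For the ten rows quoted here, every broker leads two partitions and no partition is missing a replica from its ISR, which is consistent with the post. It says nothing about the state at the moment a replica was dropped.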

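8 months ago
ighack replied to 半兽人 in "Every 30-odd days, replicas in the Kafka cluster get kicked out of the ISR":

On machine 27 I see many files such as controller.log.2020-03-25-08, with content like this: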

[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 0 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 1 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 2 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 3 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 4 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:05:01,450] TRACE [Controller 4]: checking need to trigger partition rebalance (kafka.controller.KafkaController)
[2020-03-25 08:05:01,454] DEBUG [Controller 4]: preferred replicas by broker Map(0 -> Map([gtp_data_log,1] -> List(0, 3, 4), [wlpt_to_mdb,2] -> List(0, 2, 3), [JLP_TO_LMIS_CHONGQ1,3] -> List(0, 4, 1), [JLP_TO_LMIS_SHANGH,5] -> List(0, 1, 2), [mdb_Fd_Route_NM,4] -> List(0, 2, 3), [TMP_TO_LMIS_SD,1] -> List(0, 3, 4), [JLP_TO_LMIS_GD,0] -> List(0, 1, 2), [__consumer_offsets,30] -> List(0, 2, 3), [JLP_TO_LMIS_HEN,7] -> List(0, 2, 3), [TMP_TO_LMIS_GD,7] -> List(0, 4, 1), [TMP_TO_LMIS_LZ,9] -> List(0, 2, 3), [gtp_data_log,6] -> List(0, 4, 1), [TMP_TO_LMIS_CHONGQ,4] -> List(0, 4, 1), [TMP_TO_LMIS_HAIN,7] -> List(0, 2, 3), [JTmdb_Fd_Good,2] -> List(0, 3, 4), [JLP_TO_LMIS_FJ,6] -> List(0, 2, 3), [sendemail,2] -> List(0, 4), [JLP_TO_LMIS_SHANGH,0] -> List(0, 4, 1), [mdb_Fd_Route_LZ,0] -> List(0, 2, 3), [__consumer_offsets,10] -> List(0, 2, 3), [JLP_TO_LMIS_FJ1,2] -> List(0, 3, 4), [mdb_Fd_Route_HAIN,2] -> List(0, 1, 2), [JLP_TO_LMIS_HEN,2] -> List(0, 1, 2), [TMP_TO_LMIS_FJ,6] -> List(0, 4, 1), [TMP_TO_LMIS_XM,0] -> List(0, 1, 2), [JLP_TO_LMIS_SD,2] -> List(0, 3, 4), [TMP_TO_LMIS_JIANGX,1] -> List(0, 1, 2), [__consumer_offsets,40] -> List(0, 4, 1), [TMP_TO_LMIS_BEIJ,4] -> List(0, 3, 4), [Parallel_Computing_Stock,0]

The other machines also have controller.log.2020-03- files, but they are not generated every hour, and their content does not look like the above:
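The hourly files on machine 27 appear to come from the active controller's periodic preferred-leader check (the entries at 08:00:01 and 08:05:01 are five minutes apart). As a rough, simplified sketch of what the "leader imbalance ratio" lines report, assuming the ratio is the share of a broker's preferred partitions that it does not currently lead, and assuming a threshold of 10 percent (the documented default of `leader.imbalance.per.broker.percentage`): the function names and the `SKEWED` scenario below are hypothetical, and only the four partition assignments are taken from the log.

```python
def leader_imbalance_ratio(assignments, leaders, broker):
    """Share of partitions preferring `broker` that are led by someone else.

    assignments: {(topic, partition): [replica ids]}; the first replica is
                 treated as the preferred leader.
    leaders:     {(topic, partition): current leader id}
    """
    preferred = [
        tp for tp, replicas in assignments.items()
        if replicas and replicas[0] == broker
    ]
    if not preferred:
        return 0.0
    not_led = sum(1 for tp in preferred if leaders.get(tp) != broker)
    return not_led / len(preferred)


def needs_rebalance(ratio, threshold_percent=10):
    """True when the ratio exceeds the (assumed) per-broker threshold."""
    return ratio > threshold_percent / 100.0


# Four of broker 0's preferred partitions, copied from the DEBUG line.
ASSIGNMENTS = {
    ("gtp_data_log", 1): [0, 3, 4],
    ("wlpt_to_mdb", 2): [0, 2, 3],
    ("JLP_TO_LMIS_GD", 0): [0, 1, 2],
    ("sendemail", 2): [0, 4],
}

# Illustrative leader maps: every partition on its preferred leader, and a
# hypothetical case where one leader has moved to broker 2.
BALANCED = {tp: 0 for tp in ASSIGNMENTS}
SKEWED = dict(BALANCED)
SKEWED[("wlpt_to_mdb", 2)] = 2

if __name__ == "__main__":
    for name, leaders in (("balanced", BALANCED), ("skewed", SKEWED)):
        ratio = leader_imbalance_ratio(ASSIGNMENTS, leaders, 0)
        print(f"{name}: ratio={ratio:.6f} rebalance={needs_rebalance(ratio)}")
```

A ratio of 0.000000 for every broker, as in the log, would mean each partition is led by its preferred replica at that moment, so the check itself has nothing to move.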

[2020-03-03 10:57:28,711] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)
[2020-03-03 10:57:31,354] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
[2020-03-03 10:57:31,354] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
[2020-03-03 10:57:31,356] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
[2020-03-03 10:57:31,357] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
[2020-03-03 10:57:31,358] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
[2020-03-03 10:57:33,325] INFO [Controller 1]: Controller starting up (kafka.controller.KafkaController)
[2020-03-03 10:57:33,342] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)

That looks fairly normal. Only when replicas were kicked out of the ISR did some machines log entries like these:

[2020-03-03 10:59:54,553] DEBUG [Controller 2]: Removing replica 1 from ISR 3,0 for partition [TMP_TO_LMIS_SHANGH,6]. (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] WARN [Controller 2]: Cannot remove replica 1 from ISR of partition [TMP_TO_LMIS_SHANGH,6] since it is not in the ISR. Leader = 3 ; ISR = List(3, 0) (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is  (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=TMP_TO_LMIS_SHANGH,Partition=6,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is  (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] INFO [Replica state machine on controller 2]: Invoking state change to OfflineReplica for replicas [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ReplicaStateMachine)
[2020-03-03 10:59:54,554] DEBUG [Controller 2]: Removing replica 1 from ISR 2,0 for partition [__consumer_offsets,17]. (kafka.controller.KafkaController)
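One way to see which broker keeps being dropped is to tally the "Removing replica" entries. The sketch below is only an illustration: the regular expression is written against the message wording shown in this post, and `REMOVAL`, `parse_removals` and the sample list are hypothetical names. It also flags entries where the replica is already absent from the ISR the controller lists, which is the situation the WARN line reports.

```python
import re
from collections import Counter

# Three of the entries quoted above, one per line.
LOG_LINES = [
    "[2020-03-03 10:59:54,553] DEBUG [Controller 2]: Removing replica 1 from ISR 3,0 "
    "for partition [TMP_TO_LMIS_SHANGH,6]. (kafka.controller.KafkaController)",
    "[2020-03-03 10:59:54,554] WARN [Controller 2]: Cannot remove replica 1 from ISR of "
    "partition [TMP_TO_LMIS_SHANGH,6] since it is not in the ISR. Leader = 3 ; "
    "ISR = List(3, 0) (kafka.controller.KafkaController)",
    "[2020-03-03 10:59:54,554] DEBUG [Controller 2]: Removing replica 1 from ISR 2,0 "
    "for partition [__consumer_offsets,17]. (kafka.controller.KafkaController)",
]

# Matches only the "Removing replica N from ISR a,b for partition [topic,p]" form.
REMOVAL = re.compile(
    r"Removing replica (\d+) from ISR ([\d,]*) for partition \[([^,\]]+),(\d+)\]"
)


def parse_removals(lines):
    """Return one dict per removal entry found in the given log lines."""
    removals = []
    for line in lines:
        m = REMOVAL.search(line)
        if not m:
            continue
        replica = int(m.group(1))
        isr = [int(x) for x in m.group(2).split(",") if x]
        removals.append({
            "replica": replica,
            "isr": isr,
            "topic": m.group(3),
            "partition": int(m.group(4)),
            # True when the replica had already left the ISR the controller saw.
            "already_out": replica not in isr,
        })
    return removals


if __name__ == "__main__":
    removals = parse_removals(LOG_LINES)
    print("removals per broker:", dict(Counter(r["replica"] for r in removals)))
    for r in removals:
        print(r["topic"], r["partition"], "already out of ISR:", r["already_out"])
```

In the quoted sample both removals concern broker 1, and in both cases broker 1 is already missing from the listed ISR, so whatever pushed it out happened before these controller entries were written.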
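ZooKeeper

/controller_epoch records how many times the controller has changed, that is, how many failovers have happened. A large value means the cluster is unstable and the controller keeps being re-elected.
Mine is 225, but I can't tell where the instability comes from.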
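To line controller changes up against the ISR events, it can help to read the epoch together with the time the znode was last modified. The following is a minimal sketch, assuming the third-party `kazoo` client and a placeholder ZooKeeper address; `read_controller_epoch` and `connect_and_read` are hypothetical helper names. It relies on `/controller_epoch` holding the epoch as a plain integer string and on the znode stat exposing `mtime` in milliseconds.

```python
from datetime import datetime, timezone


def read_controller_epoch(zk):
    """Return (epoch, last_change_time_utc) from an already started client.

    `zk` is any object whose get(path) returns (bytes, stat) where stat has
    an `mtime` attribute in milliseconds, as kazoo's KazooClient does.
    """
    data, stat = zk.get("/controller_epoch")
    epoch = int(data.decode("utf-8"))
    changed_at = datetime.fromtimestamp(stat.mtime / 1000, tz=timezone.utc)
    return epoch, changed_at


def connect_and_read(hosts="127.0.0.1:2181"):
    """Open a kazoo connection, read the epoch, and close the connection.

    The default hosts value is a placeholder, not taken from the post.
    """
    from kazoo.client import KazooClient  # third-party: pip install kazoo

    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        return read_controller_epoch(zk)
    finally:
        zk.stop()
```

Sampling this value periodically and recording when it increases would show whether controller changes coincide with the roughly 30-day ISR drops, rather than only showing a lifetime total such as 225.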

8 months ago
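半兽人 commented on "Every 30-odd days, replicas in the Kafka cluster get kicked out of the ISR":

You need to start by checking the logs on the broker nodes.

8 months ago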