Kafka cluster appears to hang

不二の · Posted: 2021-03-22 · Last updated: 2021-04-03 01:46:47 · 411 views

Consuming data from the Kafka cluster fails intermittently. I checked the logs on all three brokers around the time of each incident. Lately this has been happening roughly every two to three days. Could someone point me in the right direction?

Version info:

  • Kafka: 0.11.0.1, 3 brokers
  • ZooKeeper: 3.4.5-cdh5.12.1, 3 nodes

broker 0

[2021-03-19 15:48:09,221] INFO Partition [__consumer_offsets,2] on broker 0: Shrinking ISR from 0,2,1 to 0 (kafka.cluster.Partition)
[2021-03-19 16:15:59,128] INFO Found deletable segments with base offsets [3128945856] due to retention time 604800000ms breach (kafka.log.Log)
[2021-03-19 16:15:59,155] INFO Scheduling log segment 3128945856 for log message_data_kakou-2 for deletion. (kafka.log.Log)
[2021-03-19 16:15:59,177] INFO Incrementing log start offset of partition message_data_kakou-2 to 3129658564 in dir /cdpdata/data/cdwdata/data2 (kafka.log.Log)

broker 1 and broker 2 (their logs are identical)

 [2021-03-19 15:48:26,650] WARN [ReplicaFetcherThread-0-0]: Error in fetch to broker 0, request (type=FetchRequest, replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={__consumer_offsets-17=(offset=2051566, logStartOffset=2051566, maxBytes=1048576), __consumer_offsets-32=(offset=22, logStartOffset=0, maxBytes=1048576), __consumer_offsets-47=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-14=(offset=509154393, logStartOffset=0, maxBytes=1048576), __consumer_offsets-44=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-29=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-41=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-26=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-38=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-20=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-5=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-35=(offset=95593484, logStartOffset=0, maxBytes=1048576), __consumer_offsets-2=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-11=(offset=33791003, logStartOffset=0, maxBytes=1048576), __consumer_offsets-8=(offset=1226467258, logStartOffset=0, maxBytes=1048576), __consumer_offsets-23=(offset=3540661929, logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)

No ERROR-level log entries were found.



Replied 1 month ago

  • It doesn't look like Kafka itself is hanging; is it rather the consumer client that has stopped consuming? Is the client stuck?

    • Check the cluster state:

      bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181
      

      Does this command still run, or do all commands fail?
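
      As a small sketch (both flags are standard kafka-topics.sh options), you can also narrow the output to just the problematic partitions; if both commands print nothing, the cluster side looks healthy:

      ## Only partitions whose ISR is smaller than the replica list
      bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --under-replicated-partitions

      ## Only partitions that currently have no live leader
      bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --unavailable-partitions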

      Also check the OS logs for anything abnormal, for example the open-file limit being exhausted:

      /var/log/messages
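
      If you suspect the open-file limit, a minimal check (assuming the broker was started via the kafka.Kafka main class so that pgrep can find it) is to compare the broker's file-descriptor usage against its limit:

      ## PID of the broker process (assumes it was launched via kafka.Kafka)
      BROKER_PID=$(pgrep -f kafka.Kafka)

      ## Effective open-file limit for that process
      grep 'open files' /proc/$BROKER_PID/limits

      ## Number of file descriptors it currently holds
      ls /proc/$BROKER_PID/fd | wc -l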
      
        • Did it recover after you restarted Kafka, or after you restarted the Python client?
          The point is to first narrow down whether this is a problem with the Kafka cluster or with the client.

          Next time the problem occurs, you can use the command-line tools to verify whether the Kafka cluster itself is still working:

          ## Producer
          bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
          
          ## Consumer
          bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
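
          If both console tools work, the cluster is probably fine and the suspicion shifts to the client. As a further sketch (the group name is a placeholder; substitute the group.id used by your Python client), you can check whether the group's lag keeps growing while the client appears stuck:

          ## Per-partition current offset, log-end offset and lag for the group
          bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <your-group-id>

          Steadily growing lag on a healthy cluster usually points at the consumer side rather than the brokers.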
          
            • During that period this error keeps scrolling: ERROR [ReplicaFetcherThread-0-1]: Error for partition [__consumer_offsets,30] to broker 1:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)

                • __consumer_offsets is the internal topic in which Kafka stores consumer offsets. If this topic has more than one replica, it is fine; if not, it can cause exactly the problem you describe. Please post the describe output for this topic so we can take a look.
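
                  For example, a minimal way to check is the same describe command as above, scoped to this internal topic:

                  bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --topic __consumer_offsets

                  If the Replicas/Isr columns list only a single broker per partition, a leader change on that broker would leave those offset partitions unavailable, which would be consistent with the NotLeaderForPartitionException and the stalled consumers.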
