Client cannot start: initializing transactions fails with "init_transactions failed,err is Failed to initialize Producer ID:"

祖晓晖   Posted: 2022-07-21   Last updated: 2022-07-22 10:09:29   102 views

Kafka 2.12-2.3.1, 12 machines, 48 cores / 256 GB memory / 48 TB storage

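After one node on the Kafka server side reported errors, the client for one topic started throwing exceptions, and it still cannot produce after a restart. The other partitions of that topic are still written and produced to normally, and all other topics are written and consumed normally.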

Server-side errors:

[2022-07-21 10:06:04,019] ERROR TransactionMetadata(transactionalId=52001_100084_8, producerId=61008, producerEpoch=42, txnTimeoutMs=600000, state=CompleteCommit, pendingState=Some(Ongoing), topicPartitions=Set(), txnStartTimestamp=1658369167977, txnLastUpdateTimestamp=1658369173970)'s transition to TxnTransitMetadata(producerId=61008, producerEpoch=42, txnTimeoutMs=600000, txnState=Ongoing, topicPartitions=Set(cy_prep_52001-15, cy_prep_52001-14), txnStartTimestamp=1658369163948, txnLastUpdateTimestamp=1658369163948) failed: this should not happen (kafka.coordinator.transaction.TransactionMetadata)
[2022-07-21 10:06:04,020] ERROR [KafkaApi-12] Error when handling request: clientId=broker-4-fetcher-0, correlationId=-1788778357, api=FETCH, body={replica_id=4,max_wait_time=500,min_bytes=1,max_bytes=10485760,isolation_level=0,session_id=1013979821,session_epoch=358705288,topics=[{topic=__transaction_state,partitions=[{partition=7,current_leader_epoch=3,fetch_offset=9540696,log_start_offset=0,partition_max_bytes=1048576}]}],forgotten_topics_data=[],rack_id=} (kafka.server.KafkaApis)
java.lang.IllegalStateException: TransactionalId 52001_100084_8 failed transition to state TxnTransitMetadata(producerId=61008, producerEpoch=42, txnTimeoutMs=600000, txnState=Ongoing, topicPartitions=Set(cy_prep_52001-15, cy_prep_52001-14), txnStartTimestamp=1658369163948, txnLastUpdateTimestamp=1658369163948) due to unexpected metadata
[2022-07-21 10:06:16,138] ERROR TransactionMetadata(transactionalId=52000_100084_3, producerId=61006, producerEpoch=15, txnTimeoutMs=600000, state=CompleteCommit, pendingState=Some(Ongoing), topicPartitions=Set(), txnStartTimestamp=1658369191978, txnLastUpdateTimestamp=1658369176091)'s transition to TxnTransitMetadata(producerId=61006, producerEpoch=15, txnTimeoutMs=600000, txnState=Ongoing, topicPartitions=Set(cy_prep_52000-12, cy_prep_52000-13), txnStartTimestamp=1658369176137, txnLastUpdateTimestamp=1658369176137) failed: this should not happen (kafka.coordinator.transaction.TransactionMetadata)
[2022-07-21 10:06:16,139] ERROR [KafkaApi-12] Error when handling request: clientId=broker-5-fetcher-0, correlationId=76531944, api=FETCH, body={replica_id=5,max_wait_time=500,min_bytes=1,max_bytes=10485760,isolation_level=0,session_id=2050385498,session_epoch=76531940,topics=[{topic=__transaction_state,partitions=[{partition=7,current_leader_epoch=3,fetch_offset=9540699,log_start_offset=0,partition_max_bytes=1048576}]}],forgotten_topics_data=[],rack_id=} (kafka.server.KafkaApis)
java.lang.IllegalStateException: TransactionalId 52000_100084_3 failed transition to state TxnTransitMetadata(producerId=61006, producerEpoch=15, txnTimeoutMs=600000, txnState=Ongoing, topicPartitions=Set(cy_prep_52000-12, cy_prep_52000-13), txnStartTimestamp=1658369176137, txnLastUpdateTimestamp=1658369176137) due to unexpected metadata

The client reports this error as soon as it starts:

initTransactions|inittransactions failed,err is Failed to initialize Producer ID: Broker: Producer attempted to update a transaction while another concurrent operation on the same transaction was ongoing[56564]

Creating a new topic and switching to it did not help; switching to a different Kafka cluster resolved the problem.

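Using a tool to inspect the data in Kafka's __transaction_state topic, I found that one partition only has records up to the time of the failure. I suspect the client's inability to write is related to this. Can the data in that partition be deleted, and what needs to be done so the client can start?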
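To tie a stalled partition to specific producers, it can help to work out which __transaction_state partition a given transactional.id maps to. The sketch below assumes the broker picks the partition as the absolute value of the Java String.hashCode() of the transactional.id, modulo transaction.state.log.num.partitions, and that this setting is at its default of 50. Both assumptions should be checked against the actual cluster, and the function names are illustrative only.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() (32-bit signed) for ASCII/BMP text."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like a Java int
    # Reinterpret the unsigned 32-bit value as a signed int
    return h - 0x100000000 if h >= 0x80000000 else h


def non_negative(n: int) -> int:
    """Absolute value that maps Integer.MIN_VALUE to 0 (assumed broker behaviour)."""
    return 0 if n == -2**31 else abs(n)


def txn_state_partition(transactional_id: str, num_partitions: int = 50) -> int:
    """Partition of __transaction_state assumed to hold this transactional.id.

    num_partitions is assumed to equal transaction.state.log.num.partitions.
    """
    return non_negative(java_string_hash(transactional_id)) % num_partitions


if __name__ == "__main__":
    for tid in ("52001_100084_8", "52000_100084_3"):
        print(tid, "->", txn_state_partition(tid))
```

Under these assumptions, both transactional IDs from the server log above map to partition 7, which matches the partition=7 shown in the failing FETCH requests for __transaction_state.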

Posted 2022-07-21

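This error is fairly clear: contention resulted in concurrent transactions (ConcurrentTransactions), and the cause is most likely in the client code:
1. A hung process is still holding the transaction and has not released it (or another producer has taken it over), which is why switching to another cluster worked.
2. The code is not robust, and under certain conditions it ends up with competing commits.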

Producer attempted to update a transaction while another concurrent operation on the same transaction was ongoing

In other words, the producer tried to update a transaction while another operation on that same transaction had not yet finished.
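If the condition is transient (for example, an earlier transaction for the same transactional.id is still being completed), retrying the initialization after a pause may be enough. The sketch below is a generic retry wrapper with exponential backoff, not any client library's own API: `producer` is assumed to be any object exposing `init_transactions()`, and `is_retriable` is a caller-supplied, hypothetical predicate that decides whether a given exception is worth retrying. It will not help if the coordinator's state for that transactional.id is stuck on the broker side.

```python
import time


def init_transactions_with_retry(producer, is_retriable, attempts=5,
                                 base_delay=1.0, sleep=time.sleep):
    """Call producer.init_transactions(), retrying with exponential backoff.

    producer     -- assumed to expose init_transactions() (hypothetical interface)
    is_retriable -- caller-supplied predicate: exception -> bool
    Returns the attempt number that succeeded; re-raises the last error otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            producer.init_transactions()
            return attempt
        except Exception as exc:
            # Give up on the last attempt or on errors the caller deems permanent
            if attempt == attempts or not is_retriable(exc):
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))
```

How to recognise the concurrent-transactions error depends on the client library in use, so that mapping is left to the `is_retriable` predicate.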

祖晓晖 -> 半兽人, 19 days ago

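Thank you for the expert reply. How can this problem be resolved?
1. How can the client be made to start normally?
2. What can be done on the Kafka server side to let the client start again?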
