Kafka logs "Unexpected ApiKeys id `-173`" when its port is scanned by nmap and then hits an OOM. Is there any way to fix this?

Posted: 2021-09-09 · Last updated: 2021-09-09 09:56:32 · 2,429 views

Kafka version: 2.11-0.10.1.0 (Kafka 0.10.1.0 on Scala 2.11)

1. Our production Kafka cluster is regularly scanned by 绿盟 (NSFOCUS). Scanning the Kafka port locally with nmap reproduces the problem: the broker hits an OOM immediately afterwards. The likely cause is that nmap probes the service with its built-in probes, and the "requests" those probes send carry nothing but garbage parameters; it still feels like a Kafka bug (a sketch of the suspected mechanism follows this list).

2. Without touching buffer.memory, a single scan is enough to trigger the OOM. Increasing buffer.memory lets the broker survive a few more scans, but that is not a real fix, and putting a firewall in front of the brokers is too costly because a lot of security-scanning traffic needs to be able to reach them.
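
The "Direct buffer memory" OOM in the log below is consistent with the broker trusting the 4-byte size prefix at the front of every request: when a scanner's probe bytes get decoded as a very large size, the broker allocates, and reads into, a buffer of roughly that size, and the NIO read path backs that read with an equally large temporary direct buffer. The sketch below only illustrates this suspected pattern; it is not Kafka's source code, and the class and method names are made up.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Illustration only, not Kafka source: a size-prefixed read that trusts the
// length field sent by the peer. Garbage bytes decoded as a large length lead
// to a correspondingly large allocation, and the NIO read is backed by a
// temporary direct buffer of the same size.
public class SizePrefixedReadSketch {

    static ByteBuffer readFrame(SocketChannel channel) throws IOException {
        ByteBuffer sizeBuf = ByteBuffer.allocate(4);
        while (sizeBuf.hasRemaining()) {
            if (channel.read(sizeBuf) < 0) throw new IOException("connection closed");
        }
        sizeBuf.flip();
        int declaredSize = sizeBuf.getInt();     // untrusted: whatever the peer sent

        // The real broker rejects sizes above socket.request.max.bytes (100 MB by
        // default), but any accepted size is allocated in full before parsing.
        ByteBuffer body = ByteBuffer.allocate(declaredSize);
        while (body.hasRemaining()) {
            if (channel.read(body) < 0) throw new IOException("connection closed");
        }
        body.flip();
        return body;
    }
}

Since a declared size just under that limit is still accepted, a handful of concurrent bogus connections can demand far more direct memory than -XX:MaxDirectMemorySize allows, which would match the "raising the memory limit only buys a few more scans" behaviour described above.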

nmap scan

nmap -p 9092 -T4 -A -v 172.17.1.6

Kafka error log

[2021-08-04 22:00:01,092] ERROR Closing socket for 172.17.1.6:6667-172.17.1.1:47865 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error parsing request header. Our best guess of the apiKey is: 27265
Caused by: org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'client_id': Error reading string of length 513, only 103 bytes available
        at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
        at org.apache.kafka.common.requests.RequestHeader.parse(RequestHeader.java:80)
        at kafka.network.RequestChannel$Request.liftedTree1$1(RequestChannel.scala:82)
        at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:82)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:492)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
        at kafka.network.Processor.run(SocketServer.scala:417)
        at java.lang.Thread.run(Thread.java:748)
[2021-08-04 22:00:01,094] ERROR Closing socket for 172.17.1.6:6667-172.17.1.1:47867 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: -173 and apiVersion: 19778
Caused by: java.lang.IllegalArgumentException: Unexpected ApiKeys id `-173`, it should be between `0` and `20` (inclusive)
        at org.apache.kafka.common.protocol.ApiKeys.forId(ApiKeys.java:73)
        at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:39)
        at kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:96)
        at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:91)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:492)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
        at kafka.network.Processor.run(SocketServer.scala:417)
        at java.lang.Thread.run(Thread.java:748)
[2021-08-04 22:00:39,516] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
        at sun.nio.ch.IOUtil.read(IOUtil.java:195)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:110)
        at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
        at kafka.network.Processor.poll(SocketServer.scala:476)
        at kafka.network.Processor.run(SocketServer.scala:416)
        at java.lang.Thread.run(Thread.java:748)
Posted 2021-09-09

Is it -XX:MaxDirectMemorySize that you adjusted? You also need to make sure the machine itself has enough memory to back it.
Reference: Kafka out of memory – java.lang.OutOfMemoryError: Direct buffer memory
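
If it helps to confirm that the broker really is approaching its direct-memory limit before it dies, one option is to poll the JVM's standard java.nio:type=BufferPool,name=direct MBean over JMX while a scan is running. This is only a sketch: it assumes the broker was started with remote JMX enabled (for example by exporting JMX_PORT before kafka-server-start.sh), and the address and class name below are placeholders.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads the JVM's built-in direct-buffer-pool statistics from a remote broker.
public class DirectBufferUsageProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder address: assumes the broker exposes JMX on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://172.17.1.6:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName direct = new ObjectName("java.nio:type=BufferPool,name=direct");
            long count = (Long) conn.getAttribute(direct, "Count");
            long used = (Long) conn.getAttribute(direct, "MemoryUsed");
            long capacity = (Long) conn.getAttribute(direct, "TotalCapacity");
            System.out.printf("direct buffers: count=%d, used=%d bytes, capacity=%d bytes%n",
                    count, used, capacity);
        }
    }
}

If the reported usage climbs toward the configured -XX:MaxDirectMemorySize whenever the scanner connects, that points at direct-memory exhaustion rather than heap.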

-> 半兽人 2 years ago

It has been adjusted in production. With the larger setting a broker lasts roughly one to two months, then one day a scan comes in and it immediately goes into a zombie state (the whole cluster can no longer be consumed from); the hung node has to be killed before the cluster recovers.

I just tried a newer version and it does not have this problem: it throws the access exception cleanly instead of running out of memory. Sigh.

[Log from a surviving node]

[2020-09-09 16:58:59,103] WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@7a8b66ef (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1 was disconnected before the response was read
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
        at scala.Option.foreach(Option.scala:257)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
        at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
        at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
        at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
半兽人 -> 2 years ago

A hang like this is usually caused either by very high resource utilization inside the Kafka node, or by a thread waiting forever on some resource it can never acquire (a deadlock). It's worth monitoring CPU and memory usage at the system level to rule out a resource problem.

Also, you're right: this is most likely a bug at Kafka's network entry point. My guess is that when certain requests are processed at the same moment, the parallelism leads to resource contention and a deadlock.

I don't have a better answer (0.10.1.0 is simply too old and is no longer maintained), and I haven't found a matching report anywhere...
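
On the hang-versus-deadlock question: besides watching CPU and memory, the JVM can report deadlocked threads itself. A small, hedged sketch is below; it again assumes remote JMX is enabled on the broker, and the address and class name are placeholders. If the hung broker no longer answers JMX at all, a jstack thread dump taken on the host is the fallback.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Asks a (possibly hung) broker JVM whether it can see any deadlocked threads.
public class BrokerDeadlockCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder address: assumes the broker exposes JMX on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://172.17.1.6:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            long[] deadlocked = threads.findDeadlockedThreads();   // null if none found
            if (deadlocked == null) {
                System.out.println("JVM reports no deadlocked threads.");
            } else {
                for (ThreadInfo info : threads.getThreadInfo(deadlocked)) {
                    System.out.println(info);   // thread name, state and lock owner
                }
            }
        }
    }
}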

-> 半兽人 2 years ago

Thanks a lot. We only dropped to this version to stay consistent with what the customer is running.
The nmap scans basically all happen in the middle of the night (when resource usage should be very low), and they open 16 bad connections within a few seconds. Looks like we'll have to upgrade after all, haha.
