Kafka Disks and Filesystem

Original

Posted by 半兽人 on 2015-03-10; last updated 2019-11-09 14:17:12

We recommend using multiple drives to get good throughput. To keep latency low, do not share the drives that hold Kafka data with application logs or other OS filesystem activity. As of 0.8 you can either RAID these drives together into a single volume, or format and mount each drive as its own directory. Because Kafka has its own replication, the redundancy that RAID provides can also be provided at the application level. This choice involves several tradeoffs.

If you configure multiple data directories, partitions are assigned round-robin to the data directories. Each partition resides entirely in one data directory. If data is not well balanced among partitions, this can lead to load imbalance between disks.
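
To make the imbalance concrete, here is a minimal sketch that simulates round-robin placement as described above. It is a simplified model, not the broker's actual placement code, and the behaviour may differ between Kafka versions. The partition names, their sizes, and the directory paths are all hypothetical.

```python
from itertools import cycle


def assign_round_robin(partition_sizes, data_dirs):
    """Place each partition wholly in one directory, cycling through the directories in order."""
    usage = {d: 0 for d in data_dirs}
    placement = {}
    for (name, size_gb), directory in zip(partition_sizes.items(), cycle(data_dirs)):
        placement[name] = directory
        usage[directory] += size_gb
    return placement, usage


# Hypothetical partition sizes in GB; the skew is deliberate.
partition_sizes = {"orders-0": 400, "orders-1": 10, "clicks-0": 350, "clicks-1": 20}
# Hypothetical mount points, one per physical drive.
data_dirs = ["/data/kafka1", "/data/kafka2"]

placement, usage = assign_round_robin(partition_sizes, data_dirs)
print(usage)  # {'/data/kafka1': 750, '/data/kafka2': 30}
```

Both directories receive two partitions each, yet one drive ends up holding 750 GB and the other 30 GB, because placement counts partitions and ignores their size.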

RAID can potentially do better at balancing load between disks (although it doesn't always seem to) because it balances load at a lower level. The primary downsides of RAID are that it usually imposes a significant performance hit on write throughput and that it reduces the available disk space.
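
The disk-space cost can be illustrated with simple arithmetic. The sketch below compares usable capacity for a few common layouts using the standard capacity formulas for each RAID level. The text above does not name specific RAID levels, so the levels, the "jbod" label for separately mounted drives, and the drive counts and sizes are assumptions chosen for illustration.

```python
def usable_capacity_tb(layout, drives, drive_tb):
    """Return usable capacity in TB for identical drives under the given layout."""
    if layout == "jbod":
        # Each drive mounted as its own directory: no capacity lost to redundancy.
        return drives * drive_tb
    if layout == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of space holds parity.
        return (drives - 1) * drive_tb
    if layout == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        # Two drives' worth of space hold parity.
        return (drives - 2) * drive_tb
    if layout == "raid10":
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Every block is mirrored, so half the raw space is usable.
        return (drives // 2) * drive_tb
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical server: six 4 TB drives.
for layout in ("jbod", "raid5", "raid6", "raid10"):
    print(layout, usable_capacity_tb(layout, drives=6, drive_tb=4))
```

For six 4 TB drives this prints 24, 20, 16, and 12 TB respectively, so a mirrored layout gives up half of the raw capacity.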

Another potential benefit of RAID is the ability to tolerate disk failures. However, our experience has been that rebuilding a RAID array is so I/O intensive that it effectively disables the server, so it does not provide much real availability improvement.
