Ceph ingress stuck creating, status 0/2

Posted by what on 2025-11-28   Last updated: 2025-11-28 09:43:26

I created an ingress spec and applied it:

ceph orch apply -i nfs-ingress.yaml

# cat nfs-ingress.yaml 
service_type: ingress
service_id: nfs.nfs-cephfs-ha
placement:
  hosts:
    - ww209                 # hostname to deploy on
spec:
  backend_service: nfs.nfs-cephfs       # name of your backend NFS service
  virtual_ip: 10.0.19.203/24            # must include the prefix length, e.g. /24 or /23
  frontend_port: 2049                   # NFS port that clients connect to
  monitor_port: 9000                    # HAProxy stats page port (optional)

But it stays stuck. Here is the output of ceph orch ls | grep nfs:

# ceph orch ls | grep nfs
ingress.nfs.nfs-cephfs-ha  10.0.19.203:2049,9000      0/2  -          20m  ww209      
nfs.nfs-cephfs             ?:2049                     1/1  7m ago     15h  count:1

The detailed view shows:

# ceph orch ls --service-name ingress.nfs.nfs-cephfs-ha --format yaml
service_type: ingress
service_id: nfs.nfs-cephfs-ha
service_name: ingress.nfs.nfs-cephfs-ha
placement:
  hosts:
  - ww209
spec:
  backend_service: nfs.nfs-cephfs
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9000
  virtual_ip: 10.0.19.203/24
status:
  created: '2025-11-28T01:20:14.776574Z'
  ports:
  - 2049
  - 9000
  running: 0
  size: 2
  virtual_ip: 10.0.19.203/24
events:
- '2025-11-28T01:37:23.031504Z service:ingress.nfs.nfs-cephfs-ha [ERROR] "Failed while
  placing keepalived.nfs.nfs-cephfs-ha.ww209.keldtu on ww209: Failed to generate keepalived.conf:
  No daemons deployed for ingress.nfs.nfs-cephfs-ha"'
- "2025-11-28T01:38:51.752066Z service:ingress.nfs.nfs-cephfs-ha [ERROR] \"Failed\
  \ while placing haproxy.nfs.nfs-cephfs-ha.ww209.mizxcw on ww209: cephadm exited\
  \ with an error code: 1, stderr: Non-zero exit code 1 from /usr/bin/docker container\
  \ inspect --format {{.State.Status}} ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy-nfs-nfs-cephfs-ha-ww209-mizxcw\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error response from daemon: No\
  \ such container: ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy-nfs-nfs-cephfs-ha-ww209-mizxcw\n\
  Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}}\
  \ ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy.nfs.nfs-cephfs-ha.ww209.mizxcw\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error response from daemon: No\
  \ such container: ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy.nfs.nfs-cephfs-ha.ww209.mizxcw\n\
  Deploy daemon haproxy.nfs.nfs-cephfs-ha.ww209.mizxcw ...\nVerifying port 10.0.19.203:2049\
  \ ...\nCannot bind to IP 10.0.19.203 port 2049: [Errno 98] Address already in use\n\
  Verifying port 0.0.0.0:9000 ...\nERROR: TCP Port(s) '10.0.19.203:2049,0.0.0.0:9000'\
  \ required for haproxy already in use\""
- '2025-11-28T01:38:51.756738Z service:ingress.nfs.nfs-cephfs-ha [ERROR] "Failed while
  placing keepalived.nfs.nfs-cephfs-ha.ww209.mcznzd on ww209: Failed to generate keepalived.conf:
  No daemons deployed for ingress.nfs.nfs-cephfs-ha"'
- "2025-11-28T01:40:19.799027Z service:ingress.nfs.nfs-cephfs-ha [ERROR] \"Failed\
  \ while placing haproxy.nfs.nfs-cephfs-ha.ww209.tijdnt on ww209: cephadm exited\
  \ with an error code: 1, stderr: Non-zero exit code 1 from /usr/bin/docker container\
  \ inspect --format {{.State.Status}} ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy-nfs-nfs-cephfs-ha-ww209-tijdnt\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error response from daemon: No\
  \ such container: ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy-nfs-nfs-cephfs-ha-ww209-tijdnt\n\
  Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}}\
  \ ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy.nfs.nfs-cephfs-ha.ww209.tijdnt\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error response from daemon: No\
  \ such container: ceph-52eabf04-b7ba-11ef-8287-bba5b8705822-haproxy.nfs.nfs-cephfs-ha.ww209.tijdnt\n\
  Deploy daemon haproxy.nfs.nfs-cephfs-ha.ww209.tijdnt ...\nVerifying port 10.0.19.203:2049\
  \ ...\nCannot bind to IP 10.0.19.203 port 2049: [Errno 98] Address already in use\n\
  Verifying port 0.0.0.0:9000 ...\nERROR: TCP Port(s) '10.0.19.203:2049,0.0.0.0:9000'\
  \ required for haproxy already in use\""
- '2025-11-28T01:40:19.803933Z service:ingress.nfs.nfs-cephfs-ha [ERROR] "Failed while
  placing keepalived.nfs.nfs-cephfs-ha.ww209.jwsiyg on ww209: Failed to generate keepalived.conf:
  No daemons deployed for ingress.nfs.nfs-cephfs-ha"'

The count stays at 0/2. What should I do?

Posted on 2025-11-28

Your log shows a conflict on port 2049.

The key error messages in the log are:

Cannot bind to IP 10.0.19.203 port 2049: [Errno 98] Address already in use
ERROR: TCP Port(s) '10.0.19.203:2049,0.0.0.0:9000' required for haproxy already in use

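Cause

  1. The ingress (HAProxy) tries to bind to port 2049 (the standard NFS port, used to serve clients).
  2. However, the backend NFS daemon (nfs.nfs-cephfs...) is currently also configured to listen on port 2049, and it is already running.
  3. Because the port is already taken by the NFS daemon, HAProxy cannot start.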
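
To confirm which listener already holds port 2049 on ww209, one option is to filter the socket list on that host. This is a minimal sketch, assuming `ss` from iproute2 is installed there; `listeners_on_port` is a hypothetical helper name, not a Ceph command.

```shell
# Read `ss -tlnH` style lines on stdin and print the local address of every
# listener whose port matches the first argument.
# Column 4 of that output is "Local Address:Port".
listeners_on_port() {
  port="$1"
  awk -v p="$port" '{
    n = split($4, parts, ":")        # the port is the last ":"-separated piece
    if (parts[n] == p) print $4
  }'
}

# On the host, the real invocation would be:
#   ss -tlnH | listeners_on_port 2049
```

A wildcard listener such as 0.0.0.0:2049 or [::]:2049 would typically also block a bind on 10.0.19.203:2049, which matches the "Address already in use" error above.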

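Solution

You need to change the backend NFS service to listen on a non-standard port (for example 12049), leaving port 2049 for the ingress (HAProxy).

Follow these steps:

Step 1: Export the current NFS service configuration

Find the name of your backend NFS service (nfs.nfs-cephfs, according to your ceph orch ls output):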

ceph orch ls --service-name nfs.nfs-cephfs --export > nfs_backend.yaml

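Step 2: Edit the configuration file

Open nfs_backend.yaml in a text editor (such as vi or nano).
Add or modify the port field under spec, setting it to a port other than 2049 (for example 12049).

The edited file should look something like this: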

service_type: nfs
service_id: nfs-cephfs
service_name: nfs.nfs-cephfs
placement:
  count: 1
spec:
  port: 12049
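
Before applying, it may be worth checking that the edited file no longer uses 2049, since the ingress frontend_port needs that port. The sketch below is an assumption-laden convenience, not part of Ceph: `spec_port` and `check_backend_port` are hypothetical helper names, and the parsing assumes the spec contains a single `port:` key as in the file above.

```shell
# Print the value of the first `port:` key found in a service spec file.
spec_port() {
  awk '$1 == "port:" { print $2; exit }' "$1"
}

# Fail if the backend port is missing or would still collide with the
# ingress frontend_port (2049); otherwise print the port that will be used.
check_backend_port() {
  p=$(spec_port "$1")
  if [ -z "$p" ] || [ "$p" = "2049" ]; then
    echo "backend port is unset or still 2049" >&2
    return 1
  fi
  echo "backend port: $p"
}

# Example: check_backend_port nfs_backend.yaml
```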

Step 3: Apply the new configuration

Apply the edited configuration to the cluster:

ceph orch apply -i nfs_backend.yaml

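Step 4: Verify that NFS restarted

After you apply the configuration, the Ceph orchestrator restarts the backend NFS daemon.
Run the following command to confirm that the NFS daemon is now listening on the new port (the PORTS column should show *:12049):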

ceph orch ps --service_name nfs.nfs-cephfs

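Step 5: Verify that the ingress recovers

Once the backend NFS daemon releases port 2049, the ingress service (HAProxy) should automatically retry and bind successfully.

Run the following command to check the status: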

ceph orch ps --service_name ingress.nfs.nfs-cephfs-ha

Or check the service list:

ceph orch ls --service-name ingress.nfs.nfs-cephfs-ha

If the RUNNING column shows 2/2, the problem is solved.
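
If you would rather check the RUNNING value from a script than read it by eye, the running/size pair can be pulled out of the ceph orch ls output. This is a rough sketch that assumes the table layout shown earlier in this thread; `running_ratio` is a hypothetical helper name.

```shell
# Read `ceph orch ls` output on stdin and print the first field that looks
# like "<running>/<size>", e.g. 0/2 or 2/2. The header line has no such field.
running_ratio() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\/[0-9]+$/) { print $i; exit }
  }'
}

# On an admin node, the real invocation would be:
#   ceph orch ls --service-name ingress.nfs.nfs-cephfs-ha | running_ratio
```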

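Summary

The ingress architecture is:
Client (2049) -> HAProxy (listening on 2049) -> NFS Ganesha (listening on 12049)

Right now both are trying to listen on 2049, so the backend one has to move to a different port.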
