Pod creation fails when requesting a vfio-pci bound resource via SRIOV CNI, as DANM is unable to set up a dummy kernel interface for the device #231

@superfix906

Description

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

The network could not be set up through the SRIOV CNI when the allocated resource is a vfio-pci bound device. Setup fails while creating the dummy interface, with the error: "cannot create dummy interface for DPDK because:cannot assign requested address"

What you expected to happen:

The network should have been set up, and the pod requesting a DPDK (vfio-pci) interface should have started with a dummy kernel interface in the pod's network namespace.

How to reproduce it:

Install DANM in lightweight mode using the installer job. Once all services are running, create a DanmNet and a pod that requests a vfio-pci bound interface via the SRIOV CNI.
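Before launching the pod, it can help to confirm which driver a given VF is actually bound to on the node. The sketch below reads the standard sysfs `driver` symlink for a PCI device. The function name `bound_driver` and the `sysfs_root` parameter are hypothetical helpers introduced here for illustration, and the PCI address in the usage note is a placeholder, not a value from this report.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device.

    Returns None when the device directory has no "driver" symlink,
    which means no driver is bound or the device does not exist.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(link))
```

For example, `bound_driver("0000:3b:02.0")` (placeholder address) would be expected to return `"vfio-pci"` for a VF handed to DPDK and a kernel driver name for a netdevice-bound VF.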

Anything else we need to know?:

I am using flannel for IPv4-based cluster networking. DANM is installed in lightweight mode as described in the installer job document, and all DANM services are up and running. I am able to create a pod with SRIOV as the CNI when the resource is bound to a kernel (netdevice) driver, and IPAM allocates an IP for it. This does not work when the resource is bound to the vfio-pci driver: the CNI setup fails to create the dummy kernel interface, with the following error message:

Events:
Type Reason Age From Message
Normal Scheduled default-scheduler Successfully assigned example/app to test
Warning FailedCreatePodSandBox 2s kubelet, test Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ee0d160e99d6bb410d8b75d2fef6f0f546811537598ab1282c8b2cd29e8cf925" network for pod "app": networkPlugin cni failed to set up pod "app_example" network: CNI network could not be set up: CNI operation for network:sriov-vfio failed with:Post-processing failed for interface:eth1 because:failed to create dummy kernel interface for eth1 because:cannot create dummy interface for DPDK because:cannot assign requested address
Normal SandboxChanged 2s kubelet, test Pod sandbox changed, it will be killed and re-created.
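As a triage aid, the trailing text of the event above matches the Linux `strerror` message for a specific errno. The sketch below looks up which errno symbols produce a given message on the local host. The helper name `errno_names_for` is hypothetical, and the match only suggests (it does not confirm) which errno the failing call inside DANM returned.

```python
import errno
import os


def errno_names_for(message):
    """Return errno symbols whose local strerror text equals the message.

    The comparison is case-insensitive because Go lower-cases the first
    letter of syscall error strings while the C library capitalizes it.
    """
    wanted = message.strip().lower()
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code).lower() == wanted
    )
```

On a glibc-based Linux host, `errno_names_for("cannot assign requested address")` should include `EADDRNOTAVAIL`.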

Pod YAML

apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: example
  labels:
    env: test
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"network":"management", "ip":"dynamic"},
        {"network":"sriov-vfio", "ip":"dynamic"}
      ]
spec:
  containers:
  - name: sriov-pod
    image: centos:latest
    args:
    - sleep
    - "10000"
    resources:
      requests:
        intel.com/sriov_vfio_vf: '1'
      limits:
        intel.com/sriov_vfio_vf: '1'

DanmNet YAML

apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: management
  namespace: example
spec:
  NetworkID: 10-flannel
  NetworkType: flannel
---
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: sriov-vfio
  namespace: example
spec:
  NetworkID: sriov-vfio
  NetworkType: sriov
  Options:
    device_pool: "intel.com/sriov_vfio_vf"
    cidr: 10.1.20.0/24

SRIOV resources

{
  "cpu": "48",
  "ephemeral-storage": "280411618864",
  "hugepages-1Gi": "17Gi",
  "intel.com/sriov_dpdk_vf": "0",
  "intel.com/sriov_fec_vf": "1",
  "intel.com/sriov_netdevice_vf": "15",
  "intel.com/sriov_vfio_vf": "1",
  "memory": "79530372Ki",
  "pods": "110"
}
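To read the resource map above at a glance, the following sketch filters it down to the SR-IOV device pools and their counts. The helper name `sriov_pools` and the `intel.com/sriov_` prefix filter are assumptions made for illustration; the JSON literal is copied from the output above.

```python
import json


def sriov_pools(resources, prefix="intel.com/sriov_"):
    """Map each SR-IOV pool name in a node resource dict to its integer count."""
    return {
        name: int(count)
        for name, count in resources.items()
        if name.startswith(prefix)
    }


# Resource map as reported in this issue.
node_resources = json.loads("""
{
  "cpu": "48",
  "ephemeral-storage": "280411618864",
  "hugepages-1Gi": "17Gi",
  "intel.com/sriov_dpdk_vf": "0",
  "intel.com/sriov_fec_vf": "1",
  "intel.com/sriov_netdevice_vf": "15",
  "intel.com/sriov_vfio_vf": "1",
  "memory": "79530372Ki",
  "pods": "110"
}
""")

pools = sriov_pools(node_resources)
# Pools that advertise at least one VF.
available = sorted(name for name, count in pools.items() if count > 0)
```

Here the `intel.com/sriov_vfio_vf` pool referenced by the `sriov-vfio` DanmNet advertises a single VF, so the pod's request for one device can be satisfied by the device plugin; the failure occurs later, during CNI setup.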

Environment:

  • DANM version (use danm -version): v4.2.0, commit: c0a4c15

  • Kubernetes version (use kubectl version): v1.18.6

  • DANM configuration (K8s manifests, kubeconfig files, CNI config file):

  • /etc/cni/net.d/00-danm.conf

{
  "cniVersion": "0.3.1",
  "name": "danm_meta_cni",
  "type": "danm",
  "kubeconfig": "/etc/cni/net.d/danm-kubeconfig",
  "cniDir": "/etc/cni/net.d",
  "namingScheme": ""
}

  • /etc/cni/net.d/10-flannel.conf

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "type": "flannel",
  "delegate": {
    "hairpinMode": true,
    "isDefaultGateway": true
  }
}

  • kubeadm config view

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.18.6
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/24
  serviceSubnet: 10.96.0.0/16
scheduler: {}

  • /var/lib/kubelet/config.yaml

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

  • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): 3.10.0-1062.18.1.rt56.1044.el7.x86_64
  • Others:
