Glossary of Terms:
DOCA = Data Center Infrastructure-on-a-Chip Architecture
DPU = Data Processing Unit
NIC = Network Interface Card
OVS = Open vSwitch (Open Virtual Switch)
SF = Scalable Function
VF = Virtual Function
VM = Virtual Machine
Motivation
In part one of this series, we showed how to set up an NVIDIA BlueField-2 DPU with the aim of simply using it as a NIC (https://rcblog.erc.monash.edu.au/blog/2022/02/how-do-i-use-a-dpu-as-nic/). Today we’re going to delve a bit deeper and offload network layer encryption to the DPUs so that VMs running on the Host can use more of their allocated resources while still communicating securely.
Background
Servers and processors are becoming faster and more powerful, but at the same time the requirement for greater network security is increasing, and end-to-end network encryption is mandatory for some of the more sensitive research data workloads. As data throughput pushes the CPU to its limits, the processors begin to steal compute cycles from research computing tasks in order to encrypt network traffic.
Our aim is to mitigate part of this issue by offloading the work of encrypting and decrypting traffic from the processors on the hypervisor (host) to the DPU, enabling our researchers to work more efficiently and securely.
Set up
Below is a diagram of the setup we used while testing:
Figure: Diagram of the setup we are describing. Two hosts running one VM each, each host has a DPU installed in a PCI slot, the DPUs are connected directly to each other (back to back).
As we explained in our previous blog post, DPUs have programmable hardware for data processing. In this example the DPU has been programmed to process data coming in on port p0, which is passed to the host and presented to the VM as a Virtual Function (vf0). The virtual function interface appears in the VM as eth1.
Setting up the DPU
Once the environment is set up as above, we can move on to configuring the Open vSwitch and strongSwan encryption software on the DPU (yes, that is strongSwan's correct capitalisation).
To set up the OVS bridges and the strongSwan (5.9.0bf) IPSec tunnel, we followed NVIDIA's example from this website:
https://docs.nvidia.com/doca/sdk/east-west-overlay-encryption/index.html
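For reference, the bridge and VXLAN tunnel creation in that guide looks roughly like the sketch below. The bridge name, tunnel key (100) and endpoint IPs match the ovs-vsctl output further down, but treat this as an outline rather than the guide's exact commands:

# On DPU 16 (sketch; swap local_ip and remote_ip on DPU 17)
ovs-vsctl add-br vxlan-br0
ovs-vsctl add-port vxlan-br0 pf0hpf
ovs-vsctl add-port vxlan-br0 vxlan11 -- set interface vxlan11 type=vxlan \
    options:local_ip=192.168.50.1 options:remote_ip=192.168.50.2 \
    options:key=100 options:dst_port=4789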
The change we needed to make was to add the interface pf0vf0 to the OVS bridge vxlan-br0 on both DPUs to allow the VMs to communicate with each other, as sketched below.
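A minimal sketch of that change, using the bridge and interface names that appear in the output below:

# On each DPU: attach the VF representor to the VXLAN bridge
ovs-vsctl add-port vxlan-br0 pf0vf0
# Confirm that pf0vf0 now appears under vxlan-br0
ovs-vsctl show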
Example of OVS settings from one of the DPUs
ubuntu@localhost:~$ sudo ovs-vsctl show
352ec404-2519-4751-9a1b-3fd33780543c
    Bridge vxlan-br0
        Port pf0vf0
            Interface pf0vf0
        Port vxlan-br0
            Interface vxlan-br0
                type: internal
        Port vxlan11
            Interface vxlan11
                type: vxlan
                options: {dst_port="4789", key="100", local_ip="192.168.50.1", remote_ip="192.168.50.2"}
        Port pf0hpf
            Interface pf0hpf
    ovs_version: "2.15.1-d246dab"
With the above OVS configuration, the logical flow of packets through the DPU is as shown in the diagram below:
Figure: Logical flow of packets through the DPU to the VM.
Our current setup uses the Slow Path.
The diagram is based on NVIDIA's original from https://docs.nvidia.com/doca/sdk/l4-ovs-firewall/index.html
Setting IPSec Full Offload Using strongSwan
strongSwan configures IPSec hardware full offload using a new value, hw_offload = full, added to its configuration file.
By default, two files are created in /etc/swanctl/conf.d when the DPUs are flashed with the DOCA SDK: BFL.swanctl.conf and BFR.swanctl.conf. We only want one of these on each DPU: BFL on the DPU in Host 16 and BFR on the DPU in Host 17. We also need to make some changes to the .conf files.
On DPU 16
cd /etc/swanctl/conf.d/
mv BFR.swanctl.conf BFR.swanctl.conf.old
vi /etc/swanctl/conf.d/BFL.swanctl.conf
# Note: edit this file manually; copying the below output will probably result in issues

cat /etc/swanctl/conf.d/BFL.swanctl.conf
# LEFT: strongswan BF-2 config file
connections {
   BFL-BFR {
      local_addrs = 3.3.3.2
      remote_addrs = 3.3.3.3
      local {
         auth = psk
         id = host2
      }
      remote {
         auth = psk
         id = host1
      }
      children {
         bf {
            local_ts = 3.3.3.2/24 [udp/4789]
            remote_ts = 3.3.3.3/24 [udp/4789]
            esp_proposals = aes128gcm128-x25519-esn
            mode = transport
            policies_fwd_out = yes
            hw_offload = full
         }
      }
      version = 2
      mobike = no
      reauth_time = 0
      proposals = aes128-sha256-x25519
   }
}

secrets {
   ike-BF {
      id-host1 = host1
      id-host2 = host2
      secret = 0sv+NkxY9LLZvwj4qCC2o/gGrWDF2d21jL
   }
}
On DPU 17
cd /etc/swanctl/conf.d/
mv BFL.swanctl.conf BFL.swanctl.conf.old
vi /etc/swanctl/conf.d/BFR.swanctl.conf
# Note: edit this file manually; copying the below output will probably result in issues

cat /etc/swanctl/conf.d/BFR.swanctl.conf
# RIGHT: strongswan BF-2 config file
connections {
   BFL-BFR {
      local_addrs = 3.3.3.3
      remote_addrs = 3.3.3.2
      local {
         auth = psk
         id = host1
      }
      remote {
         auth = psk
         id = host2
      }
      children {
         bf {
            local_ts = 3.3.3.3/24 [udp/4789]
            remote_ts = 3.3.3.2/24 [udp/4789]
            esp_proposals = aes128gcm128-x25519-esn
            mode = transport
            policies_fwd_out = yes
            hw_offload = full
         }
      }
      version = 2
      mobike = no
      reauth_time = 0
      proposals = aes128-sha256-x25519
   }
}

secrets {
   ike-BF {
      id-host1 = host1
      id-host2 = host2
      secret = 0sv+NkxY9LLZvwj4qCC2o/gGrWDF2d21jL
   }
}
Note: make sure there is a newline at the end of these files, or the config may not be applied correctly.
Commands to load strongSwan configuration
On both DPUs:
systemctl stop strongswan-starter.service
systemctl start strongswan-starter.service
swanctl --load-all

On the left DPU (DPU 16):
swanctl -i --child bf
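At this point the tunnel should be established. As a sanity check (our own sketch, not a step from NVIDIA's guide), the strongSwan SAs and the kernel's IPSec state can be inspected on the DPU:

# List IKE and child SAs; expect the BFL-BFR connection with child 'bf' installed
swanctl --list-sas
# Show installed ESP states; on kernels with XFRM hardware offload support,
# offloaded states include an offload line naming the device
ip xfrm state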
Commands to switch offloading on and off
# To enable offloading:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch

# To disable offloading:
ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
systemctl restart openvswitch-switch

# Check the current offloading state:
ovs-vsctl get Open_vSwitch . other_config:hw-offload
Experiments and results
We can now transmit data between the VMs, and it will automatically be encrypted by the DPUs as it goes over the wire. We use iperf3 to generate network traffic between the VMs while switching the hardware offload capability on the DPU on and off, as sketched below.
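A minimal sketch of the traffic generation, assuming iperf3 is installed in both VMs; the address placeholder is hypothetical and stands in for whatever VM 1's eth1 interface is configured with:

# On VM 1: start an iperf3 server
iperf3 -s
# On VM 2: run a 30-second test against VM 1's eth1 address (placeholder, not a real value)
iperf3 -c <VM1_ETH1_IP> -t 30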
We start with hw-offload disabled and observe the following results.
Offload Disabled
# To disable offloading:
ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
systemctl restart openvswitch-switch
With offloading turned off via OVS on the DPU, we can see a ksoftirqd process with high CPU utilisation on the DPU, as in the sketch below.
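One way to spot this (a sketch; any process monitor will do) is to sort by CPU usage on the DPU's Arm cores:

# On the DPU: sort processes by CPU usage; with offload disabled,
# a ksoftirqd/N kernel thread sits near the top of the list
top -o %CPU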
Next we turn hw-offload on and observe the change in behaviour.
Offload Enabled
# To enable offloading:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch
With hw-offload enabled, we no longer see ksoftirqd with high CPU utilisation on the DPU, and the rate of data transfer increased to more than 10 times what we saw with hw-offload disabled.
We can also view the offloaded flows on the DPU when hw-offload is enabled:
ovs-appctl dpctl/dump-flows type=offloaded

# Example output
root@localhost:/home/ubuntu# ovs-appctl dpctl/dump-flows type=offloaded
recirc_id(0),in_port(4),eth(src=7a:9a:41:5d:11:81,dst=1e:e1:b3:b7:cb:db),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:3987309, bytes:6229832925, used:0.710s, actions:set(tunnel(tun_id=0x64,src=192.168.50.2,dst=192.168.50.1,ttl=64,tp_dst=4789,flags(key))),3
tunnel(tun_id=0x64,src=192.168.50.1,dst=192.168.50.2,tp_dst=4789,flags(+key)),recirc_id(0),in_port(3),eth(src=1e:e1:b3:b7:cb:db,dst=7a:9a:41:5d:11:81),eth_type(0x0800),ipv4(frag=no), packets:49131, bytes:5899556, used:0.710s, actions:4
This confirms that the offloaded flows were indeed executed on the DPU. The output shows each offloaded flow with its source and destination MAC addresses, the VXLAN tunnel endpoint IPs and destination port, and the number of packets and bytes in the flow.
Conclusion
At this point we've set up two VMs and built an IPSec tunnel between them on the DPUs, which automatically encrypts the data they send over the wire.
We’ve shown a noticeable speed difference between enabling and disabling hw-offload and shown how to check the flows when they are being offloaded and the impact on performance when they are not being offloaded.
Future
As we provide IaaS (Infrastructure-as-a-Service) using OpenStack, the next step is to automate this configuration in the infrastructure using our CI/CD configuration management on the DPUs, so that network traffic from the research workload VMs is encrypted underneath. We are also considering dedicated IPSec tunnels for VMs from different projects running on the same host, so that the encrypted underlay network fabric is separated per project.
Written by Ben Boreham, Shahaan Ayyub, Swe Aung and Steve Quenette as part of a partnership between NVIDIA, the Australian Research Data Commons (ARDC), and Monash University.