Go through the instructions below in sequence. Please submit your answers to the questions in the "submit your assignment" section, along with your DPDK echo code (just the .c file), to Canvas by 11:59 pm on Tuesday, September 24.
The goal of this assignment is to familiarize you with CloudLab so that you can use it as an experimentation and development platform for your research project. In addition, you will analyze the overheads of different network stacks (Linux's, RDMA, and DPDK) by measuring round-trip times (RTTs) between two servers. Note: CloudLab is a public resource, and when demand is high, every machine of a given type may already be in use. We have chosen a machine type that historically has not had much demand, so you should be able to get a machine allocation in time to complete the assignment. Please start early so that a shortage of available machines does not prevent you from submitting on time.
Sign up for a CloudLab Account using the signup link. Select "Join Existing Project" and enter "browncs2690fa24" as the Project ID. You will also need to upload an SSH public key. The instructor will approve your account application.
A "project profile" describes the configuration of one or more servers, including their network connections and disk images. Create your own profile:
From your dashboard, select your new profile and instantiate it. It may take several minutes for your experiment to start. Once it has started, ssh to your servers (see the "List View" tab for ssh commands).
Remember to terminate your experiment when you are not using it. Experiments will expire automatically after 16 hours. When your experiment ends, the contents of the local disks will be lost, so remember to save your code elsewhere (e.g., in git or by using scp to copy it to your own machine).
The m510 nodes each have two network interfaces. One is for control traffic, including connecting to the public internet, and the other is for running experiments. List the network interfaces on each node using the lshw tool: sudo lshw -class net. The interface labeled "eno1d1" is the experimental interface. Find the IP and Ethernet (MAC) addresses of this interface on each server by running ifconfig. If you don't see the "eno1d1" interface listed by ifconfig, you can bring it up and set its IP (if you are unsure what IP to use, you can try 10.10.1.1 for one server and 10.10.1.2 for the other):
$ sudo ifconfig eno1d1 up
$ sudo ifconfig eno1d1 [IP]
Begin by using the Linux ping utility to measure the round-trip time between your two servers. For starters, on server 0 run: ping [IP of server 1]. Next, use the "count" and "flood" options to send 1 million pings back-to-back (you will need to use sudo). This will take about 30 seconds to complete. Record the average RTT.
Run ping -h to see the help options.
Next, measure the RTT between your two servers using RDMA with the InfiniBand Verbs Performance Tests (perftest). First, install perftest:
$ sudo apt-get update
$ sudo apt-get install perftest
Use the tool ib_read_lat to measure the RTT between the two servers using RDMA READ operations. For example, run ib_read_lat on server 1 and then ib_read_lat [IP of server 1] on server 0. Use the "iters" and "size" options to again perform 1 million operations and to set the message size to 64 bytes to match the size of ping packets. Record the average RTT.
Next, use the tool ib_send_lat to measure the RTT between the two servers using RDMA SEND operations. Again, perform 1 million operations and set the message size to 64 bytes (you must specify these options on both servers). Note that the ib_send_lat tool divides the RTT by 2 before outputting results, so you will need to double the reported values to obtain the RTT. Record the average RTT.
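When you compare the measurements, it helps to put them all in the same unit. The sketch below is a hypothetical helper (the enum and function names are not part of any tool or of the starter code). It assumes that ping prints its average in milliseconds and that the perftest tools print theirs in microseconds; check the units in your own output before relying on it.

```c
#include <assert.h>

/* Which tool produced the reported number (hypothetical names). */
enum rtt_tool { TOOL_PING, TOOL_IB_READ_LAT, TOOL_IB_SEND_LAT };

/* Convert a tool's reported average latency to an RTT in microseconds.
 * Assumptions: ping reports milliseconds; perftest reports microseconds;
 * ib_send_lat reports RTT / 2, as noted in the text above. */
double to_rtt_usec(enum rtt_tool tool, double reported)
{
    assert(reported >= 0.0);
    switch (tool) {
    case TOOL_PING:        return reported * 1000.0; /* ms -> usec */
    case TOOL_IB_READ_LAT: return reported;          /* already a full RTT */
    case TOOL_IB_SEND_LAT: return reported * 2.0;    /* undo the halving */
    }
    return reported;
}
```

For example, to_rtt_usec(TOOL_IB_SEND_LAT, 1.5) yields 3.0 microseconds.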
Finally, implement a simple echo server using DPDK, and use it to again measure the RTT between your two servers. These instructions will explain how to download and build DPDK and its dependencies; you should do this on both servers. We will provide starter code that implements a DPDK client that generates packets and measures RTTs, and your job is to implement the DPDK server that echoes packets back to the client.
The NICs on the m510 nodes are NVIDIA ConnectX-3 NICs. To use these NICs with DPDK, you need to download and install the NVIDIA MLNX_OFED, which provides several libraries and drivers for these NICs.
First, download the OFED version that is compatible with these NICs and OS. Open the downloads page in a browser and download OFED version 4.9-7.1.0.0-LTS for Ubuntu 20.04 x86_64, in tar format. Then copy it to both of your CloudLab nodes (e.g., using scp).
Next, untar the OFED and install it. The installation process takes about 7 minutes.
$ tar -xvzf MLNX_OFED_LINUX-4.9-7.1.0.0-ubuntu20.04-x86_64.tgz
$ pushd MLNX_OFED_LINUX-4.9-7.1.0.0-ubuntu20.04-x86_64
$ sudo ./mlnxofedinstall --upstream-libs --dpdk
$ popd
Now reload the driver:
$ sudo /etc/init.d/openibd restart
We found that the interface sometimes goes down after this step; run ifconfig to see if the eno1d1 interface is still up, and if not, follow the instructions above to bring the interface back up with ifconfig.
First install DPDK's dependencies:
$ sudo apt-get install meson python3-pyelftools
Next, clone DPDK and checkout version 22.11 (other versions may also work):
$ git clone https://github.com/DPDK/dpdk
$ cd dpdk
$ git checkout tags/v22.11 -b v22.11
Build DPDK:
$ meson build
$ ninja -C build
$ sudo ninja -C build install
$ sudo ldconfig
DPDK relies on huge pages (we will discuss these later when we discuss memory management). Configure huge pages:
$ echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Download the starter code from Canvas (Makefile and dpdk_echo.c) and copy the files to each server using scp. Then build the app by running make in the directory with the copied files.
To run the echo app, first start the server running on server 0:
$ sudo ./dpdk_echo -l2 --socket-mem=128 -- UDP_SERVER [IP of server 0]
Next, run the client on server 1:
$ sudo ./dpdk_echo -l2 --socket-mem=128 -- UDP_CLIENT [IP of server 1] [IP of server 0] [MAC of server 0]
The server should print out that it receives a packet. However, the current server code does not respond, so the client should exit after 5 seconds and output that 0 echos were completed.
Your job is to implement the DPDK echo server by filling in the code where it currently says /* TODO: YOUR CODE HERE */. For each packet the server receives, it should immediately echo a packet back to the client. The echoed packet should contain the same contents as the received packet, except for the addresses in the packet headers.
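To make "same contents, different addresses" concrete, here is a minimal sketch that works on a raw byte buffer rather than on DPDK types. It assumes the frame is Ethernet + IPv4 + UDP, and the function name echo_swap_addresses is hypothetical. Whether the UDP ports also need to be exchanged depends on what the starter client checks; the sketch swaps them as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets in an Ethernet + IPv4 + UDP frame (standard wire layout). */
enum {
    ETH_DST_OFF = 0, ETH_SRC_OFF = 6, ETH_TYPE_OFF = 12, ETH_HDR_LEN = 14,
    IP_PROTO_OFF = 9, IP_SRC_OFF = 12, IP_DST_OFF = 16, IP_MIN_HDR_LEN = 20,
    UDP_SRC_OFF = 0, UDP_DST_OFF = 2, UDP_HDR_LEN = 8
};

/* Exchange two non-overlapping fields of n bytes each (n <= 6). */
static void swap_field(uint8_t *a, uint8_t *b, size_t n)
{
    uint8_t tmp[6];
    assert(n <= sizeof(tmp));
    memcpy(tmp, a, n);
    memcpy(a, b, n);
    memcpy(b, tmp, n);
}

/* Hypothetical helper: turn a received frame into its echo, in place, by
 * exchanging source and destination MAC addresses, IPv4 addresses, and
 * UDP ports. The payload is left untouched.
 * Returns 0 on success, -1 if the buffer is not a plausible IPv4/UDP frame. */
int echo_swap_addresses(uint8_t *pkt, size_t len)
{
    if (len < (size_t)(ETH_HDR_LEN + IP_MIN_HDR_LEN + UDP_HDR_LEN))
        return -1;
    /* EtherType 0x0800 = IPv4 (stored big-endian on the wire). */
    if (pkt[ETH_TYPE_OFF] != 0x08 || pkt[ETH_TYPE_OFF + 1] != 0x00)
        return -1;

    uint8_t *ip = pkt + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)
        return -1;
    /* The low nibble is the header length in 32-bit words. */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < (size_t)IP_MIN_HDR_LEN ||
        len < (size_t)ETH_HDR_LEN + ihl + (size_t)UDP_HDR_LEN)
        return -1;
    if (ip[IP_PROTO_OFF] != 17) /* 17 = UDP */
        return -1;

    uint8_t *udp = ip + ihl;
    swap_field(pkt + ETH_DST_OFF, pkt + ETH_SRC_OFF, 6);
    swap_field(ip + IP_SRC_OFF, ip + IP_DST_OFF, 4);
    swap_field(udp + UDP_SRC_OFF, udp + UDP_DST_OFF, 2);
    return 0;
}
```

In the actual server, the bytes would come from the received struct rte_mbuf (for example via rte_pktmbuf_mtod), and you may prefer DPDK's own header structs over raw offsets. Because the IPv4 and UDP checksums are one's-complement sums, exchanging source and destination fields leaves them valid, so this sketch does not recompute them.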
As a starting point, you may find it helpful to read through the code for run_client, to see an example of how to send, receive, and manipulate packets using DPDK. Note that struct rte_mbuf is the data structure that DPDK uses to represent a packet. You can also consult the DPDK API documentation (e.g., DPDK's functions for manipulating Ethernet addresses). To debug your code, consider using DPDK's rte_pktmbuf_dump() function to view packets or Linux's inet_ntop() to convert IP addresses to a human-readable format.
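As an illustration of the debugging advice above, the sketch below wraps inet_ntop() for an IPv4 address in network byte order and adds a small formatter for MAC addresses. The helper names ipv4_to_str and mac_to_str are hypothetical and not part of the starter code or of DPDK.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* "xx:xx:xx:xx:xx:xx" plus the terminating NUL. */
#define MAC_STR_LEN 18

/* Hypothetical helper: format an IPv4 address given in network byte order.
 * buf should hold at least INET_ADDRSTRLEN bytes.
 * Returns buf on success, NULL on failure (as inet_ntop does). */
const char *ipv4_to_str(uint32_t addr_be, char *buf, size_t buflen)
{
    struct in_addr addr;
    assert(buf != NULL);
    addr.s_addr = addr_be;
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen);
}

/* Hypothetical helper: format a 6-byte MAC address as lowercase hex.
 * buf should hold at least MAC_STR_LEN bytes. */
void mac_to_str(const uint8_t mac[6], char *buf, size_t buflen)
{
    assert(buf != NULL);
    snprintf(buf, buflen, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Printing the source and destination addresses of a received packet with helpers like these makes it easy to confirm that the echoed packet has them exchanged. Remember to remove such prints before measuring RTTs.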
Once your echo server works, use your echo app to measure the RTT between your two servers. Make sure to remove the print statement in the server before you measure the RTT!
Briefly answer the questions below. Feel free to consult the Internet, but answer the questions in your own words.