Optimize Citrix ADC VPX performance on VMware ESX, Linux KVM, and Citrix Hypervisors

Citrix ADC VPX performance varies greatly depending on the hypervisor, the allocated system resources, and the host configuration. To achieve the desired performance, first follow the recommendations in the VPX data sheet, and then optimize further using the best practices provided in this document.

Citrix ADC VPX instance on VMware ESX hypervisors

This section contains details of configurable options and settings, and other suggestions that help you achieve optimal performance of a Citrix ADC VPX instance on VMware ESX hypervisors.

To achieve high performance for VPX with E1000, VMXNET3, SR-IOV, and PCI passthrough network interfaces, follow these recommendations:

  • The total number of virtual CPUs (vCPUs) provisioned on the ESX host must be less than or equal to the total number of physical CPUs (pCPUs) on the ESX host.
  • Non-uniform Memory Access (NUMA) affinity and CPU affinity must be set for the ESX host to achieve good results.

    – To find the NUMA affinity of a Vmnic, log in to the host locally or remotely, and type:

    # vsish -e get /net/pNics/vmnic7/properties | grep NUMA
    Device NUMA Node: 0

Citrix ADC VPX with E1000 network interfaces

Perform the following settings on the VMware ESX host:

  • On the VMware ESX host, create two vNICs from one pNIC vSwitch. Multiple vNICs create multiple Rx threads in the ESX host. This increases the Rx throughput of the pNIC interface.
  • Enable VLANs on the vSwitch port group level for each vNIC that you have created.
  • To increase vNIC transmit (Tx) throughput, use a separate Tx thread in the ESX host per vNIC. Use the following ESX command:
    • For ESX version 5.5:

      esxcli system settings advanced set -o /Net/NetTxWorldlet -i
    • For ESX version 6.0 onwards:

      esxcli system settings advanced set -o /Net/NetVMTxType -i 1
  • To further increase the vNIC Tx throughput, use a separate Tx completion thread and Rx threads per device (NIC) queue. Use the following ESX command:

    esxcli system settings advanced set -o /Net/NetNetqRxQueueFeatPairEnable -i 0

Note:

Make sure that you reboot the VMware ESX host to apply the updated settings.

Two vNICs per pNIC deployment

The following is a sample topology and configuration commands for the Two vNICs per pNIC model of deployment that delivers better network performance.

Two vNICs per pNIC deployment

Citrix ADC VPX sample configuration:

To achieve the deployment shown in the preceding sample topology, perform the following configuration on the Citrix ADC VPX instance:

  • On the client side, bind the SNIP (1.1.1.2) to network interface 1/1 and enable the VLAN tag mode.

    bind vlan 2 -ifnum 1/1 -tagged
    bind vlan 2 -IPAddress 1.1.1.2 255.255.255.0
  • On the server side, bind the SNIP (2.2.2.2) to network interface 1/2 and enable the VLAN tag mode.

    bind vlan 3 -ifnum 1/2 -tagged
    bind vlan 3 -IPAddress 2.2.2.2 255.255.255.0
  • Add an HTTP virtual server (1.1.1.100) and bind it to a service (2.2.2.100).

    add lb vserver v1 HTTP 1.1.1.100 80 -persistenceType NONE -Listenpolicy None -cltTimeout 180
    add service s1 2.2.2.100 HTTP 80 -gslb NONE -maxClient 0 -maxReq 0 -cip DISABLED -usip NO -useproxyport YES -sp ON -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP NO
    bind lb vserver v1 s1

Note:

Make sure that you include the following two entries in the route table:

  • 1.1.1.0/24 subnet with gateway pointing to SNIP 1.1.1.2
  • 2.2.2.0/24 subnet with gateway pointing to SNIP 2.2.2.2

Citrix ADC VPX with VMXNET3 network interfaces

To achieve high performance for VPX with VMXNET3 network interfaces, apply the following settings on the VMware ESX host:

  • Create two vNICs from one pNIC vSwitch. Multiple vNICs create multiple Rx threads in the ESX host. This increases the Rx throughput of the pNIC interface.
  • Enable VLANs on the vSwitch port group level for each vNIC that you have created.
  • To increase vNIC transmit (Tx) throughput, use a separate Tx thread in the ESX host per vNIC. Use the following ESX commands:
    • For ESX version 5.5:

      esxcli system settings advanced set -o /Net/NetTxWorldlet -i
    • For ESX version 6.0 onwards:

      esxcli system settings advanced set -o /Net/NetVMTxType -i 1

On the VMware ESX host, perform the following configuration:

  • Create two vNICs from one pNIC vSwitch. Multiple vNICs create multiple Tx and Rx threads in the ESX host. This increases the Tx and Rx throughput of the pNIC interface.
  • Enable VLANs on the vSwitch port group level for each vNIC that you have created.
  • To increase Tx throughput of a vNIC, use a separate Tx completion thread and Rx threads per device (NIC) queue. Use the following command:

    esxcli system settings advanced set -o /Net/NetNetqRxQueueFeatPairEnable -i 0
  • Configure a VM to use one transmit thread per vNIC, by adding the following setting to the VM’s configuration:

    ethernetX.ctxPerDev = "1"

For more information, see Best Practices for Performance Tuning of Telco and NFV Workloads in vSphere.

Note:

Make sure that you reboot the VMware ESX host to apply the updated settings.

You can configure VMXNET3 as a Two vNICs per pNIC deployment. For more information, see Two vNICs per pNIC deployment.

Citrix ADC VPX with SR-IOV and PCI passthrough network interfaces

To achieve high performance for VPX with SR-IOV and PCI passthrough network interfaces, see Recommended configuration on ESX hosts.

Citrix ADC VPX instance on Linux-KVM platform

This section contains details of configurable options and settings, and other suggestions that help you achieve optimal performance of a Citrix ADC VPX instance on the Linux-KVM platform.

Performance settings for KVM

Perform the following settings on the KVM host:

Find the NUMA domain of the NIC using the lstopo command:

Make sure that memory for the VPX and the CPU is pinned to the same location. In the following output, the 10G NIC “ens2” is tied to NUMA domain #1.

NUMA domain #1

Allocate the VPX memory from the NUMA domain.

The numactl command indicates the NUMA domain from which the memory is allocated. In the following output, around 10 GB RAM is allocated from NUMA node #0.

NUMA node #0

To change the NUMA node mapping, follow these steps.

  1. Edit the .xml of the VPX on the host.

    /etc/libvirt/qemu/<VPX_name>.xml
  2. Add the following tag:

    <numatune>
      <memory mode="strict" nodeset="1"/>  <!-- nodeset is the NUMA domain -->
    </numatune>
  3. Shut down the VPX.

  4. Run the following command:

    virsh define /etc/libvirt/qemu/<VPX_name>.xml

    This command updates the configuration information for the VM with the NUMA node mappings.

  5. Power on the VPX. Then check the numactl --hardware command output on the host to see the updated memory allocations for the VPX.

    Output of the numactl hardware command
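
To make the check in step 5 easier to read, the per-node memory in use can be derived from the `size` and `free` lines that `numactl --hardware` prints. The following is a minimal sketch, not part of the product tooling: `numa_used_mb` is a hypothetical helper name, and the sample figures are invented for illustration.

```shell
# numa_used_mb: read "numactl --hardware" output on stdin and print, for each
# NUMA node, the memory in use (size minus free) in MB.
# Hypothetical helper for illustration only.
numa_used_mb() {
    awk '
        $1 == "node" && $3 == "size:" { size[$2] = $4 }
        $1 == "node" && $3 == "free:" { free[$2] = $4 }
        END {
            for (n in size)
                printf "node %s used: %d MB\n", n, size[n] - free[n]
        }
    ' | sort -k2,2n
}

# Sample run on invented numbers (not taken from a real host).
numa_used_mb <<'EOF'
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32768 MB
node 0 free: 22528 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 32768 MB
node 1 free: 31744 MB
EOF
```

On a real host, the equivalent call would be `numactl --hardware | numa_used_mb`. After the VPX is pinned to a node, most of its memory is expected to show up as used on that node.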

Pin vCPUs of VPX to physical cores.

  • To view the vCPU to pCPU mappings of a VPX, type the following command:

    virsh vcpupin <VPX name>

    Output of the virsh-vcpupin command

    The vCPUs 0–3 are mapped to physical cores 8–11.

  • To view the current pCPU usage, type the following command:

    mpstat -P ALL 5

    Output of the mpstat command

    In this output, 8 is management CPU, and 9–11 are packet engines.

  • To change the vCPU to pCPU pinning, there are two options.

    • Change it at runtime after the VPX boots up using the following command:

      virsh vcpupin <VPX name> <vCPU id> <pCPU number>

      virsh vcpupin NetScaler-VPX-XML 0 8
      virsh vcpupin NetScaler-VPX-XML 1 9
      virsh vcpupin NetScaler-VPX-XML 2 10
      virsh vcpupin NetScaler-VPX-XML 3 11
    • To make static changes to the VPX, edit the .xml file as before with the following tags:

      1. Edit the .xml file of the VPX on the host

        /etc/libvirt/qemu/<VPX_name>.xml
      2. Add the following tag:

        <vcpu placement='static' cpuset='8-11'>4</vcpu>
        <cputune>
          <vcpupin vcpu='0' cpuset='8'/>
          <vcpupin vcpu='1' cpuset='9'/>
          <vcpupin vcpu='2' cpuset='10'/>
          <vcpupin vcpu='3' cpuset='11'/>
        </cputune>
      3. Shut down the VPX.

      4. Update the configuration information for the VM with the vCPU pinning using the following command:

        virsh define /etc/libvirt/qemu/<VPX_name>.xml
      5. Power on the VPX. Then check the virsh vcpupin <VPX name> command output on the host to see the updated CPU pinning.
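
The runtime pinning commands follow a regular pattern: vCPU i is pinned to a consecutive pCPU starting from a chosen core. The sketch below only prints the `virsh vcpupin` commands so that they can be reviewed before being run. `gen_vcpupin_cmds` is a hypothetical helper name, and the consecutive mapping is an assumption that happens to match the example in this section.

```shell
# gen_vcpupin_cmds: print (do not run) one "virsh vcpupin" command per vCPU,
# pinning vCPU i to pCPU (first_pcpu + i).
# Hypothetical helper for illustration only.
gen_vcpupin_cmds() {
    domain=$1      # libvirt domain name of the VPX
    vcpus=$2       # number of vCPUs to pin
    first_pcpu=$3  # pCPU that vCPU 0 is pinned to
    i=0
    while [ "$i" -lt "$vcpus" ]; do
        echo "virsh vcpupin $domain $i $(( first_pcpu + i ))"
        i=$(( i + 1 ))
    done
}

# Prints the same four commands as the runtime example in this section.
gen_vcpupin_cmds NetScaler-VPX-XML 4 8
```

Piping the output to `sh` would apply the pinning. Keeping generation and execution separate makes it easier to confirm that the chosen pCPUs belong to the intended NUMA domain first.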

Eliminate host interrupt overhead.

  • Detect VM_EXITS using the kvm_stat command.

    At the hypervisor level, host interrupts are mapped to the same pCPUs on which the vCPUs of the VPX are pinned. This might cause vCPUs on the VPX to get kicked out periodically.

    To find the VM exits caused by VMs running on the host, use the kvm_stat command.

    [root@localhost ~]# kvm_stat -1 | grep EXTERNAL
    kvm_exit(EXTERNAL_INTERRUPT) 1728349 27738
    [root@localhost ~]#

    A value on the order of 1 M or more indicates an issue.

    If a single VM is present, the expected value is 30–100 K. Anything more than that can indicate that there are one or more host interrupt vectors mapped to the same pCPU.

  • Detect host interrupts and migrate host interrupts.

    When you run the cat command on the /proc/interrupts file, it displays all the host interrupt mappings. If one or more active IRQs map to the same pCPU, the corresponding counter increments.

    Move any interrupts that overlap with your Citrix ADC VPX’s pCPUs to unused pCPUs:

    echo 0000000f > /proc/irq/55/smp_affinity

    The value 0000000f is a bitmap. Its four least significant bits are set, which means that IRQ 55 can be scheduled only on pCPUs 0–3.
  • Disable IRQ balance.

    Disable the IRQ balance daemon so that no rescheduling happens on the fly.

    service irqbalance stop
    service irqbalance show     # To check the status
    service irqbalance start    # Enable if needed

    After you make these changes, run the kvm_stat command again to confirm that the exit counters are no longer high.
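
The smp_affinity value is a hexadecimal bitmap in which bit N stands for pCPU N, so the mask for a set of unused pCPUs can be computed rather than written by hand. The following is a minimal sketch: `cpus_to_mask` is a hypothetical helper name, and it covers only pCPUs 0–31 (a single 32-bit group).

```shell
# cpus_to_mask: print the 8-digit hex smp_affinity bitmap for the given pCPU ids.
# Bit N of the mask corresponds to pCPU N. Limited to pCPUs 0-31.
# Hypothetical helper for illustration only.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then
            echo "cpus_to_mask: pCPU id out of range (0-31): $cpu" >&2
            return 1
        fi
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%08x\n' "$mask"
}

# pCPUs 0-3 give the same value as the example above.
cpus_to_mask 0 1 2 3
```

The result could then be written to the IRQ's affinity file, for example `cpus_to_mask 0 1 2 3 > /proc/irq/55/smp_affinity`, where 55 is the IRQ number from the example above.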

Citrix ADC VPX with PV network interfaces

You can configure para-virtualization (PV), SR-IOV, and PCIe passthrough network interfaces as a Two vNICs per pNIC deployment. For more information, see Two vNICs per pNIC deployment.

For optimal performance of PV (virtio) interfaces, follow these steps:

  • Identify the NUMA domain to which the PCIe slot/NIC is tied.
  • The memory and vCPUs for the VPX must be pinned to the same NUMA domain.
  • The vhost threads must be bound to CPUs in the same NUMA domain.

Bind the virtual host threads to the corresponding CPUs:

  1. Once the traffic is started, run the top command on the host.

    Run the top command

  2. Identify the affinity of the virtual host (vhost) processes, which are named vhost-<pid-of-qemu>.
  3. Bind the vhost processes to the physical cores in the NUMA domain identified earlier using the following command:

    taskset -pc <core-id> <process-id>

    Example:

    taskset -pc 12 29838
  4. The processor cores corresponding to the NUMA domain can be identified with the following command:

    [root@localhost ~]# virsh capabilities | grep cpu
    <cpu>
    </cpu>
    <cpus num='8'>
      <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
      <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
      <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
      <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
      <cpu id='4' socket_id='0' core_id='4' siblings='4'/>
      <cpu id='5' socket_id='0' core_id='5' siblings='5'/>
      <cpu id='6' socket_id='0' core_id='6' siblings='6'/>
      <cpu id='7' socket_id='0' core_id='7' siblings='7'/>
    </cpus>
    <cpus num='8'>
      <cpu id='8' socket_id='1' core_id='0' siblings='8'/>
      <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
      <cpu id='10' socket_id='1' core_id='2' siblings='10'/>
      <cpu id='11' socket_id='1' core_id='3' siblings='11'/>
      <cpu id='12' socket_id='1' core_id='4' siblings='12'/>
      <cpu id='13' socket_id='1' core_id='5' siblings='13'/>
      <cpu id='14' socket_id='1' core_id='6' siblings='14'/>
      <cpu id='15' socket_id='1' core_id='7' siblings='15'/>
    </cpus>
    <cpuselection/>
    <cpuselection/>
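
To pick core IDs for taskset, it can help to reduce the virsh capabilities output to the list of CPU IDs on one socket. The sketch below assumes, as in the sample output above, that each NUMA domain corresponds to one socket; on hosts where that does not hold, the socket is not a reliable stand-in for the NUMA domain. `cpus_in_socket` is a hypothetical helper name.

```shell
# cpus_in_socket: read "virsh capabilities" output on stdin and print the ids
# of all CPUs whose socket_id equals $1, space-separated on one line.
# Hypothetical helper for illustration only.
cpus_in_socket() {
    awk -v want="socket_id='$1'" '
        /<cpu id=/ && index($0, want) {
            # Split on the single-quote character; parts[2] is the cpu id.
            split($0, parts, "\047")
            ids = ids (ids == "" ? "" : " ") parts[2]
        }
        END { print ids }
    '
}

# Usage on a KVM host (not run here):
#   virsh capabilities | cpus_in_socket 1
```

With the sample output above, socket 1 yields CPUs 8 through 15, which is consistent with the core 12 used in the taskset example.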

Bind the QEMU process to the corresponding physical core:

  1. Identify the physical cores on which the QEMU process is running. For more information, see the preceding output.
  2. Bind the QEMU process to the same physical cores to which you bind the vCPUs, using the following command:

    taskset -pc 8-11 29824

Citrix ADC VPX with SR-IOV and Fortville PCIe passthrough network interfaces

For optimal performance of the SR-IOV and Fortville PCIe passthrough network interfaces, follow these steps:

  • Identify the NUMA domain to which the PCIe slot/NIC is tied.
  • The memory and vCPUs for the VPX must be pinned to the same NUMA domain.

Sample VPX XML file for vCPU and memory pinning for Linux KVM:

<domain type='kvm'>
  <name>NetScaler-VPX</name>
  <uuid>138f7782-1cd3-484b-8b6d-7604f35b14f4</uuid>
  <memory unit='KiB'>8097152</memory>
  <currentMemory unit='KiB'>8097152</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
</domain>

Citrix ADC VPX instance on Citrix Hypervisors

This section contains details of configurable options and settings, and other suggestions that help you achieve optimal performance of a Citrix ADC VPX instance on Citrix Hypervisors.

Performance settings for Citrix Hypervisors

Find the NUMA domain of the NIC using the “xl” command:

xl info -n

Pin vCPUs of VPX to physical cores.

xl vcpu-pin <NetScaler VM Name> <vCPU id> <physical CPU id>

Check binding of vCPUs.

xl vcpu-list

Allocate more than 8 vCPUs to Citrix ADC VMs.

For configuring more than 8 vCPUs, run the following commands from the Citrix Hypervisor console:

xe vm-param-set uuid=your_vms_uuid VCPUs-max=16
xe vm-param-set uuid=your_vms_uuid VCPUs-at-startup=16

Citrix ADC VPX with SR-IOV network interfaces

For optimal performance of the SR-IOV network interfaces, follow these steps:

  • Identify the NUMA domain to which the PCIe slot or NIC is tied.
  • Pin the memory and vCPUs for the VPX to the same NUMA domain.
  • Bind the Domain-0 vCPU to the remaining CPU.

Citrix ADC VPX with para-virtualized interfaces

For optimal performance, two vNICs per pNIC and one vNIC per pNIC configurations are advised, as in other PV environments.

To achieve optimal performance of para-virtualized (netfront) interfaces, follow these steps:

  • Identify the NUMA domain to which the PCIe slot or NIC is tied.
  • Pin the memory and vCPUs for the VPX to the same NUMA domain.
  • Bind the Domain-0 vCPU to the remaining CPU of the same NUMA domain.
  • Pin host Rx/Tx threads of vNIC to Domain-0 vCPUs.

Pin host threads to Domain-0 vCPUs:

  1. Find the Xen-ID of the VPX by using the xl list command on the Citrix Hypervisor host shell.
  2. Identify host threads by using the following command:

    ps -ax | grep vif<Xen-ID>

    In the following example, these values indicate:

    • vif5.0 - The threads for first interface allocated to VPX in XenCenter (management interface).
    • vif5.1 - The threads for second interface assigned to VPX and so on.

    Output of the xl list command

  3. Pin the threads to Domain-0 vCPUs using the following command:

    taskset -pc <core-id> <process-id>

    Example:

    taskset -pc 1 29189
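
Steps 2 and 3 can be combined into a dry run that lists one taskset command per host thread of the VPX. This is a sketch under stated assumptions: `vif_pin_cmds` is a hypothetical helper name, the thread names in the usage comment are illustrative, and spreading threads round-robin over the given Domain-0 vCPUs is an illustrative choice, not something this document prescribes.

```shell
# vif_pin_cmds: read "ps -ax" output on stdin and print (do not run) a
# "taskset -pc" command for every vif thread of the given Xen-ID, cycling
# through the listed Domain-0 core ids.
# Hypothetical helper for illustration only.
vif_pin_cmds() {
    if [ "$#" -lt 2 ]; then
        echo "usage: vif_pin_cmds <Xen-ID> <core-id>..." >&2
        return 1
    fi
    xen_id=$1
    shift
    awk -v needle="vif${xen_id}." -v cores="$*" '
        BEGIN { n = split(cores, core, " ") }
        # Match lines naming a vif of this Xen-ID; skip the grep process itself.
        index($0, needle) && $0 !~ /grep/ {
            printf "taskset -pc %s %s\n", core[(k % n) + 1], $1
            k++
        }
    '
}

# Usage on a Citrix Hypervisor host (not run here), for Xen-ID 5 and
# Domain-0 vCPUs 1 and 2:
#   ps -ax | vif_pin_cmds 5 1 2
```

Reviewing the printed commands before running them makes it easier to confirm that each process ID belongs to the intended interface (vif5.0, vif5.1, and so on).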