Configure rate limiting based on token consumption
Generative AI workloads are uniquely resource-intensive, often consuming large volumes of tokens per request when generating text. Without proper controls, this can lead to unpredictable costs, degraded performance, and unfair resource consumption across users and applications. Token-based rate limiting provides a precise mechanism to manage usage by measuring consumption at the token level rather than merely counting requests. This ensures that lightweight queries and heavy prompts are treated proportionally, enabling enterprises to protect infrastructure, enforce quotas, and maintain consistent service quality while integrating Large Language Models (LLMs) into production environments.
Note:

Token-based rate limiting is supported only for the OpenAI Chat Completions API.
1. Create a stream selector to identify the entity or parameter of the HTTP request for which you want to throttle requests. In this example, the AI application sends the user ID in the `X-user-id` HTTP header. NetScaler can perform rate limiting on any attribute that is part of the HTTP header or body. For more information on stream selectors, see Configure a selector.

   ```
   add stream selector <Selector Name> <Attribute>
   ```

   Example:

   ```
   add stream selector UserIdHeader "HTTP.REQ.HEADER(\"X-user-id\")"
   ```
2. Create a rate limit identifier to check whether the number of tokens exceeds a specified value within a particular time interval. For example, suppose you want to rate limit users based on token consumption per minute, with the interval start and end aligned to UTC minute boundaries, and with at most 100 rate-limit alerts per minute.

   Note:

   In a multi-PE deployment, the configured threshold is split equally across the packet engines (PEs).

   ```
   add ns limitIdentifier <Identifier Name> -threshold <Rate Threshold> -timeSlice <millisec> -mode TOKEN_RATE -selectorName <Selector Name> -alertsInTimeSlice <Number of Alerts> -timeAlign <MINUTE>
   ```

   In this configuration:

   - `alertsInTimeSlice`: Number of AppFlow alerts to send in the configured timeslice. A value of 0 indicates that alerts are disabled. A value of 65535 indicates no limit on the number of AppFlow alerts.
   - `timeAlign`: Possible values are:
     - `MINUTE`: Aligns the time windows for the configured timeslice to minute boundaries. If you choose the `MINUTE` option, the timeslice value must be an integral multiple of 60000 ms.
     - `NONE`: The default value. Time slice alignment happens every 10 ms.

   Example:

   ```
   add ns limitIdentifier RateToken -threshold 500 -timeSlice 60000 -mode TOKEN_RATE -selectorName UserIdHeader -alertsInTimeSlice 100 -timeAlign MINUTE
   ```
3. Create a responder action to specify the response that is sent to the client when NetScaler applies the rate limit.

   ```
   add responder action <Action Name> respondwith <HTTP Response Expression>
   ```

   Example:

   ```
   add responder action TokenRateLimitAction respondwith q<"\"HTTP/1.1 429 Too Many Requests\r\nContent-Type: application/json\r\nRetry-After: 60\r\n\r\n{\"error\": {\"message\": \"You exceeded your current token quota. Please check your plan and billing details.\", \"type\": \"rate_limit_exceeded\", \"param\": null, \"code\": \"429\"}}\"">
   ```
4. Create a responder policy that combines the condition from step 2 (the rate limit identifier) with the action from step 3.

   ```
   add responder policy <Policy Name> "SYS.CHECK_LIMIT(\"<Identifier Name>\")" <Action Name>
   ```

   Example:

   ```
   add responder policy DemoRatePolicy "SYS.CHECK_LIMIT(\"RateToken\")" TokenRateLimitAction
   ```
5. Bind the responder policy to the load balancing virtual server.

   ```
   bind lb vserver <Vserver Name> -policyName <Policy Name> -priority <Priority Number> -gotoPriorityExpression END -type REQUEST
   ```

   Example:

   ```
   bind lb vserver AzureOpenAIGPT5.1 -policyName DemoRatePolicy -priority 3 -gotoPriorityExpression END -type REQUEST
   ```
6. Optionally, if the `X-user-id` header contains personally identifiable information (PII), or you do not want it to be part of the OpenAI query, you can configure a rewrite policy to drop the header from the backend request to OpenAI.

   1. Add a rewrite action to delete the `X-user-id` header from the HTTP request.

      ```
      add rewrite action <Rewrite Action Name> <Type> <Target>
      ```

      Example:

      ```
      add rewrite action drop_user_header delete_http_header X-user-id
      ```

   2. Add a rewrite policy using the rewrite action.

      ```
      add rewrite policy <Rewrite Policy Name> <Rule> <Action>
      ```

      Example:

      ```
      add rewrite policy drop_user_policy "HTTP.REQ.HEADER(\"X-user-id\").EXISTS" drop_user_header
      ```

   3. Bind the rewrite policy to the load balancing virtual server.

      ```
      bind lb vserver <Vserver Name> -policyName <Policy Name> -priority <int> -gotoPriorityExpression <expression> -type REQUEST
      ```

      Example:

      ```
      bind lb vs AzureOpenAIGpt5.1 -policyName drop_user_policy -priority 2 -gotoPriorityExpression NEXT -type REQUEST
      ```
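Taken together, the limit identifier and responder policy enforce a minute-aligned, fixed-window token budget per user. The following Python sketch illustrates those semantics only; it is not NetScaler's implementation, and the class and parameter names (`TokenRateLimiter`, `num_pe`) are hypothetical. The per-PE split mirrors the note above about multi-PE deployments.

```python
import time

class TokenRateLimiter:
    """Illustrative minute-aligned fixed-window token budget.

    A hypothetical model of the configured behavior, not NetScaler code:
    threshold ~ -threshold, timeslice_ms ~ -timeSlice with -timeAlign MINUTE.
    """

    def __init__(self, threshold, timeslice_ms=60000, num_pe=1):
        if timeslice_ms % 60000 != 0:
            raise ValueError("timeAlign MINUTE needs a multiple of 60000 ms")
        # Per the note above, a multi-PE threshold is split equally per PE.
        self.per_pe_threshold = threshold // num_pe
        self.window_s = timeslice_ms // 1000
        self.used = {}  # (user_id, window_start) -> tokens consumed

    def allow(self, user_id, tokens, now=None):
        now = time.time() if now is None else now
        # Windows start on UTC-aligned boundaries of the timeslice.
        window = int(now) - int(now) % self.window_s
        key = (user_id, window)
        if self.used.get(key, 0) + tokens > self.per_pe_threshold:
            return False  # the responder policy would answer HTTP 429 here
        self.used[key] = self.used.get(key, 0) + tokens
        return True
```

With the example values (threshold 500, timeslice 60000 ms), a user who consumes 400 tokens and then requests 200 more within the same UTC minute is rejected, but is admitted again once the next minute window begins.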
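On the client side, applications calling the LLM endpoint through NetScaler should treat the 429 response produced by the responder action as retriable. A minimal sketch, assuming the JSON error body and `Retry-After: 60` header from the example responder action above; the helper name `retry_delay` is ours, not part of any API.

```python
import json

def retry_delay(status, headers, body):
    """Return seconds to wait before retrying, or None if not rate limited.

    Mirrors the 429 response configured in the responder action example:
    a Retry-After header plus a JSON error body of type rate_limit_exceeded.
    """
    if status != 429:
        return None
    try:
        error = json.loads(body).get("error", {})
        if error.get("type") != "rate_limit_exceeded":
            return None
    except (ValueError, AttributeError):
        pass  # fall back to the header even if the body is not JSON
    return int(headers.get("Retry-After", "60"))
```

A caller would sleep for the returned number of seconds and resubmit the request, rather than failing the user-visible operation outright.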
Rate-limited requests can be exported to Splunk for observability.

1. Add the Splunk endpoint as a service.

   ```
   add service <Service Name> <IP Address> <Type> <Port>
   ```

   Example:

   ```
   add service splunk_collector 10.0.0.2 HTTP 8088
   ```

2. Add an analytics profile of type Web Insight that uses the Splunk service as the collector.

   ```
   add analytics profile <Profile Name> -collectors <Splunk Service> -type webinsight -dataFormatFile splunk_new.txt -analyticsAuthToken "Splunk {HEC token}" -analyticsEndpointUrl "/services/collector/event" -analyticsEndpointContentType "application/json"
   ```

   Example:

   ```
   add analytics profile demowebinsight -collectors splunk_collector -type webinsight -dataFormatFile splunk_new.txt -analyticsAuthToken "Splunk {HEC token}" -analyticsEndpointUrl "/services/collector/event" -analyticsEndpointContentType "application/json"
   ```

   Note:

   See Web insight records for information on the JSON fields that must be part of `splunk_new.txt` for exporting rate limit alerts.

3. Bind the analytics profile to the load balancing virtual server where the rate limit responder policies are bound.

   ```
   bind lb vserver <Vserver Name> -analyticsProfile <Analytics Profile Name>
   ```

   Example:

   ```
   bind lb vserver AzureOpenAIGpt5.1 -analyticsProfile demowebinsight
   ```
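To verify the Splunk HEC collector independently of NetScaler, you can post a test event to the same endpoint and authorization scheme that the analytics profile uses. This is an illustrative sketch under stated assumptions: the event fields are placeholders, not the actual Web Insight record schema (see Web insight records for the real JSON fields), and `build_hec_request` is our own helper name.

```python
import json
import urllib.request

def build_hec_request(collector, hec_token, event):
    """Build an HTTP request for Splunk's HTTP Event Collector, using the
    same endpoint and auth scheme as the analytics profile example."""
    return urllib.request.Request(
        url=f"http://{collector}/services/collector/event",
        data=json.dumps({"event": event}).encode(),
        headers={
            "Authorization": f"Splunk {hec_token}",  # as in -analyticsAuthToken
            "Content-Type": "application/json",      # as in -analyticsEndpointContentType
        },
        method="POST",
    )

# Example with placeholder event fields (not the Web Insight schema):
req = build_hec_request("10.0.0.2:8088", "00000000-0000-0000-0000-000000000000",
                        {"message": "rate limit hit", "user": "u1"})
# urllib.request.urlopen(req) would send it; omitted here to stay offline.
```

If the collector and HEC token are valid, Splunk answers such a request with a small JSON acknowledgment, which confirms the path NetScaler's analytics profile will use.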