AI gateway - Observability

NetScaler AI gateway collects AI-specific metrics and logs and exposes them to Splunk by default. A sample Splunk dashboard can be downloaded from the Citrix Download website to visualize the metrics and logs exported by the AI gateway.

Entity: server_svc_cfg

Metric Name                           Description
si_tot_llm_input_tokens               Total number of input tokens processed by the server
si_tot_llm_output_tokens              Total number of output tokens processed by the server
si_tot_llm_tokens                     Total number of tokens (input + output)
si_cur_llm_tpm                        Total number of tokens (input + output) per frequency interval
si_err_llm_token_limit_hit_on_server  Number of times the token limit was reached on the server
si_cur_llm_latency                    Token latency for this server
si_llm_tokenspermin                   Configured token limit for the server
si_err_llm_token_limit                Number of times the token limit was reached for the service in NetScaler

Entity: vserver_lb

Metric Name                       Description
vsvr_llm_apptype                  Configured Large Language Model (LLM) app type for the virtual server (currently Azure OpenAI)
vsvr_err_llm_unsupported_request  Error counter when NetScaler receives an unsupported request
si_tot_llm_input_tokens           Total number of input tokens processed by the load balancing virtual server
si_tot_llm_output_tokens          Total number of output tokens processed by the load balancing virtual server
si_tot_llm_tokens                 Total number of tokens (input + output) processed by the load balancing virtual server

Entity: vserver_cs

Metric Name               Description
si_tot_llm_input_tokens   Total number of input tokens processed by the content switching virtual server
si_tot_llm_output_tokens  Total number of output tokens processed by the content switching virtual server
si_tot_llm_tokens         Total number of tokens (input + output) processed by the content switching virtual server
si_cur_llm_tpm            Total number of tokens (input + output) per frequency interval
vsvr_llm_apptype          Configured Large Language Model (LLM) app type for the virtual server (currently Azure OpenAI)

Entity: cs_pol

Metric Name     Description
pcb_hits        Number of hits on the policy on this binding
pcb_undef_hits  Number of undef hits on the policy on this binding

Note:

These counters are not exported by default and must be added to the schema file. To make an analytics time series profile pick up a change in schema.json, set -metrics DISABLED and then -metrics ENABLED on the profile that uses the schema.

For information on sending metrics, refer to the NetScaler observability integrations.

Web Insight records

The following fields are exported as part of Web Insight records if there are rate limit alerts.

JSON Field Name                  Description
rate_limit_identifier_name       Configured name of the ns limitIdentifier
rate_limit_selector_stream_name  Stream name, based on selector expressions, for which rate limiting was applied
rate_limit_mode                  Configured rate limit mode
rate_limit_threshold             Configured rate limiting threshold per stream
rate_limit_value                 Value at which rate limiting was applied

Note:

  • These fields are not exported by default and must be added to the data format file. If the data format file changes, run the update analytics profile <profile name> -dataFormatFile <filename> command so that the analytics profile uses the updated data format file.
  • Set the log_all_json_field attribute in the NetScaler CPX YAML file to send all the JSON fields for insights. If the log_all_json_field attribute is not set, then the data format file in the NetScaler CPX must be updated manually for the relevant fields, which is not recommended for the NetScaler CPX form factor.

The rate-limiting logs can be sent to Splunk. For information on sending logs to Splunk, see Export transaction logs directly from NetScaler to Splunk.
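
As an illustration of how a downstream consumer might pick these fields out of an exported record, here is a minimal Python sketch. It assumes each Web Insight record arrives as a flat JSON object; the field names come from the table above, while the function name, the record layout, and all sample values are hypothetical.

```python
import json

# Field names taken from the Web Insight table above.
RATE_LIMIT_FIELDS = (
    "rate_limit_identifier_name",
    "rate_limit_selector_stream_name",
    "rate_limit_mode",
    "rate_limit_threshold",
    "rate_limit_value",
)


def extract_rate_limit_alert(raw_record):
    """Return the rate-limit fields of one JSON record, or None if it has none.

    Assumption: the record is a flat JSON object. The real export layout
    depends on the data format file in use.
    """
    record = json.loads(raw_record)
    alert = {name: record[name] for name in RATE_LIMIT_FIELDS if name in record}
    return alert or None


# Hypothetical record; every value here is invented for illustration.
sample = json.dumps({
    "rate_limit_identifier_name": "limit_per_user",
    "rate_limit_selector_stream_name": "user-42",
    "rate_limit_mode": "REQUEST_RATE",
    "rate_limit_threshold": 100,
    "rate_limit_value": 101,
})
alert = extract_rate_limit_alert(sample)
no_alert = extract_rate_limit_alert(json.dumps({"http_status": 200}))
```

Records without any of the rate-limit fields return None, which matches the statement above that the fields appear only when there are rate limit alerts.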

Usage tracking

Usage tracking allows you to track input and output tokens or requests based on criteria such as team, user, or application. NetScaler expects the AI application to send attributes such as the user ID or team ID in an HTTP header (such as X-user-id or X-org-id). This feature uses processed insights for tracking.

JSON Field Name     Description
observationPointId  Identifier of an Observation Point that is unique per Observation Domain
nsPartitionId       Identifier of the NetScaler partition exporting the records
stream_usecase      Stream Insights use case
stream_sess_name    Stream Insights stream session name
stream_iden_name    Stream Insights stream identifier name
Requests            Number of requests consumed in the stream
Bandwidth           Bandwidth used in the stream
Connections         Number of active connections in the stream
Resptime            Average response time
Tokens              Number of input and output tokens consumed for LLM traffic in the stream
stream_sort_key     Sort identifier for the top N results (for example, REQUESTS or TOKENS)
Timestamp           Timestamp of the export
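
The following Python sketch shows one way a consumer could total token usage from such records. It rests on two assumptions that the text above does not confirm: that stream_sess_name carries the selector value (for example, the user ID), and that each exported record covers a window that does not overlap the others. The function name and the sample records are hypothetical.

```python
from collections import defaultdict


def tokens_per_session(records):
    """Sum the Tokens field per stream session name.

    Assumption: summing is only meaningful when the exported records
    cover non-overlapping time windows.
    """
    totals = defaultdict(int)
    for record in records:
        totals[record["stream_sess_name"]] += int(record.get("Tokens", 0))
    return dict(totals)


# Hypothetical exported records; only fields from the table above are used.
records = [
    {"stream_iden_name": "si_gpt41_user_token", "stream_sess_name": "alice",
     "Requests": 4, "Tokens": 1200, "stream_sort_key": "TOKENS"},
    {"stream_iden_name": "si_gpt41_user_token", "stream_sess_name": "bob",
     "Requests": 1, "Tokens": 300, "stream_sort_key": "TOKENS"},
    {"stream_iden_name": "si_gpt41_user_token", "stream_sess_name": "alice",
     "Requests": 2, "Tokens": 800, "stream_sort_key": "TOKENS"},
]

totals = tokens_per_session(records)
# Rank sessions by token usage, highest first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```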

Here is a sample configuration in which tokens are tracked per user and the user ID is sent in the X-user-id HTTP header.

  1. Create a stream selector. The selector defines the key by which statistics are aggregated, in this case the user ID.

    add stream selector <stream selector name> <rule>

    Example:

    add stream selector user_header "HTTP.REQ.HEADER(\"X-user-id\")"
  2. Create a stream identifier.

    add stream identifier <stream identifier name> <stream selector name> -interval <interval in mins> -logInterval <log interval in minutes> -logLimit <log limit> -sort TOKENS -trackTransactions TOKENS

    Example:

    add stream identifier si_gpt41_user_token user_header -interval 10 -logInterval 10 -logLimit 20 -sort TOKENS -trackTransactions TOKENS

    In this configuration:

    • interval: Number of minutes of data used to calculate session statistics (number of requests, number of tokens). The interval is a moving window that keeps the most recently collected data. Older data is discarded at regular intervals.
    • logInterval: Time interval in minutes for logging the collected objects. The log interval must be greater than or equal to the interval of the stream identifier.
    • logLimit: Maximum number of objects to be logged in the log interval.
  3. Create a collector service for Splunk.

    add service <collector> <splunk-server-ip-address> <protocol> <port>

    Example:

    add service splunk_service 10.102.34.155 HTTP 8088

    In this configuration:

    • collector: Name of the collector service.
    • splunk-server-ip-address: IP address of the Splunk server.
    • protocol: Specify the protocol as HTTP or SSL.
    • port: Port number.
  4. Create an analytics profile of type streaminsight and enable top N (-topn ENABLED).

    add analytics profile <profile-name> -type <insight> -collectors <collector-name> -analyticsAuthToken "<auth-scheme> <authorization-parameters>" -analyticsEndpointContentType "application/json" -analyticsEndpointUrl <endpoint-url> -topn ENABLED

    Example:

    add analytics profile topn_stream_profile -type streaminsight -topn ENABLED -analyticsAuthToken "Splunk 0471e73f-ee4b-44c3-90db-2461341d7b24" -analyticsEndpointUrl "/services/collector/event" -analyticsEndpointContentType "application/json" -collectors splunk_service -dataFormatFile splunk_new1.txt
  5. Bind the analytics profile to the stream identifier.

    bind stream identifier <stream identifier name> -analyticsProfile <analytics profile name>

    Example:

    bind stream identifier si_gpt41_user_token -analyticsProfile topn_stream_profile
  6. Create a responder policy to collect stats for the given identifier.

    add responder policy pol_collect_gpt41_user_token 'analytics.stream("si_gpt41_user_token").COLLECT_STATS' NOOP
  7. Bind the responder policy to the target AI gateway virtual server for which the traffic must be analyzed by the identifier. To enable the same stream identifier to process traffic from multiple virtual servers, bind the responder policy to all the virtual servers.

    bind lb vserver <LBVserver Name> -policyName <Responder Policy Name> -priority 1 -gotoPriorityExpression NEXT -type REQUEST

    Example:

    bind lb vserver gpt-4.1 -policyName pol_collect_gpt41_user_token -priority 220 -gotoPriorityExpression NEXT -type REQUEST
    
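
To build intuition for how the interval, logLimit, and -sort TOKENS settings in the steps above interact, the following Python sketch models a moving window with a top N report. It is a conceptual illustration only and does not reflect how NetScaler implements stream identifiers. The class name, method names, and sample data are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class MovingWindowTokenTracker:
    """Conceptual model of a per-user token window (not NetScaler code)."""

    def __init__(self, interval_minutes, log_limit):
        self.interval_seconds = interval_minutes * 60
        self.log_limit = log_limit
        # Each sample is (timestamp_seconds, user_id, tokens), oldest first.
        self.samples = deque()

    def record(self, timestamp, user_id, tokens):
        """Store the token count of one request."""
        self.samples.append((timestamp, user_id, tokens))

    def top_users(self, now):
        """Return up to log_limit (user_id, tokens) pairs, highest usage first."""
        cutoff = now - self.interval_seconds
        # Discard samples that have fallen out of the moving window.
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        totals = defaultdict(int)
        for _, user_id, tokens in self.samples:
            totals[user_id] += tokens
        # Sort by tokens descending, then by user ID for a stable order.
        ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
        return ranked[: self.log_limit]


tracker = MovingWindowTokenTracker(interval_minutes=10, log_limit=2)
tracker.record(0, "alice", 500)    # falls outside the window by t=700
tracker.record(300, "bob", 900)
tracker.record(540, "carol", 100)
tracker.record(660, "alice", 200)
report = tracker.top_users(now=700)
```

In this model, the request recorded at t=0 is older than the 10-minute window at t=700 and is dropped, so only the two heaviest users within the window appear in the report.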