Prometheus Probe
The Prometheus probe lets you run PromQL queries and match the result against specified conditions. You can use it to define metrics-based SLOs declaratively and base the experiment verdict on whether they are met. The probe runs the query against the Prometheus server given by the endpoint and checks whether the output satisfies the specified criteria. The probe passes or fails based on that comparison.
For complex queries that span multiple lines, use the queryPath attribute to provide the path to a file that contains the query. You can make this file available in the experiment pod through a ConfigMap resource, with the ConfigMap name passed in the ChaosEngine or the ChaosExperiment CR. The probe schema is listed below.
Input
The Prometheus probe takes a PromQL query and a Prometheus service endpoint as input, and checks the query result against the specified criteria.
Schema
Listed below is the probe schema for the Prometheus probe, with properties shared across all the probes and properties unique to the Prometheus probe.
| Field | Description | Required | Type / Range | Notes |
|---|---|---|---|---|
| name | Flag to hold the name of the probe | Mandatory | string | The name holds the name of the probe. It can be set based on the use case. |
| endpoint | Flag to hold the Prometheus endpoint for the promProbe | Mandatory | string | The endpoint contains the Prometheus endpoint. |
| query | Flag to hold the PromQL query for the promProbe | Mandatory (unless queryPath is set) | string | The query contains the PromQL query that extracts the desired Prometheus metrics when run against the given Prometheus endpoint. |
| queryPath | Flag to hold the path of the PromQL query for the promProbe | Optional | string | Use queryPath for complex queries that span multiple lines. It holds the path to a file that contains the query. This file can be made available to the experiment pod through a ConfigMap resource, with the ConfigMap name defined in the ChaosEngine or the ChaosExperiment CR. |
Comparator
| Field | Description | Required | Type / Range | Notes |
|---|---|---|---|---|
| type | Flag to hold the type of data used for comparison | Mandatory | float | The type contains the type of data to compare as part of the comparison operation. The Prometheus probe only compares float data. |
| criteria | Flag to hold the criteria for the comparison | Mandatory | <, >, <=, >=, !=, ==, oneOf, between for int and float types; equal, notEqual, contains, matches, notMatches, oneOf for string type | The criteria contains the criteria of the comparison operation. |
| value | Flag to hold the value for the comparison | Mandatory | string | The value contains the expected value. The query output is compared against it using the given criteria. |
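For example, you can express a metrics-based SLO as a ratio query with a float threshold. The sketch below is illustrative only. The metric http_requests_total and the 0.95 threshold are assumptions, not values from a real deployment. The explicit `type: float` is included because the table above marks the comparator type as mandatory.

```yaml
promProbe/inputs:
  endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
  # hypothetical SLO: share of 2xx responses among all requests over the last minute
  query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) / sum(rate(http_requests_total[1m]))"
  comparator:
    # the Prometheus probe compares float data only
    type: float
    # pass while at least 95% of requests succeed (assumed threshold)
    criteria: ">="
    value: "0.95"
```

The query aggregates without a `by` clause, so the result is a single series. That makes it easier to compare against one threshold.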
Authentication
These fields configure authentication for the Prometheus server. Provide the base64-encoded "username:password" (for basic authentication) or a bearer token in one of two ways: directly in the credentials field, or in a file whose path you set in the credentialsFile field.
The credentials and credentialsFile fields are mutually exclusive; you can't use both at the same time.
| Field | Description | Required | Type / Range | Notes |
|---|---|---|---|---|
| type | Flag to hold the authentication type | Optional | string | The type holds the authentication method. Both basic and bearer authentication types are supported. |
| credentials | Flag to hold the basic auth credentials in base64 format or a bearer token | Optional | string | The credentials holds the basic authentication credentials (username:password encoded in base64 format) or a bearer token, depending on the authentication type. |
| credentialsFile | Flag to hold the file path of the basic auth credentials or bearer token | Optional | string | The credentialsFile holds the path of a file that contains the basic authentication credentials or a bearer token. The file is mounted into the experiment pod from a secret volume. The secret contains the username:password encoded in base64 format or a bearer token, depending on the authentication type. |
TLS
These fields validate the TLS certificates of the Prometheus server. To perform the validation, supply the CA certificate, or the client certificate and client key.
Alternatively, enable insecureSkipVerify to bypass certificate validation.
| Field | Description | Required | Type / Range | Notes |
|---|---|---|---|---|
| caFile | Flag to hold the CA file path | Optional | string | The caFile holds the file path of the CA certificates used for server TLS verification. |
| certFile | Flag to hold the client certificate file path | Optional | string | The certFile holds the file path of the client certificate used for TLS verification. |
| keyFile | Flag to hold the client key file path | Optional | string | The keyFile holds the file path of the client key used for TLS verification. |
| insecureSkipVerify | Flag to skip the TLS certificate checks | Optional | boolean | When set to true, insecureSkipVerify skips the TLS certificate checks. |
| serverName | Flag to hold the server name | Optional | string | The serverName holds the name of the server used for TLS verification. |
Run properties
| Field | Description | Required | Type / Range | Notes |
|---|---|---|---|---|
| probeTimeout | Flag to hold the timeout of the probe | Mandatory | string | The probeTimeout is the time limit for the probe to execute the specified check and return the expected data. |
| attempt | Flag to hold the number of attempts of the probe | Mandatory | integer | The attempt is the number of times a check runs, retrying after each failed attempt, before the probe status is declared failed. |
| interval | Flag to hold the interval of the probe | Mandatory | string | The interval is the time the probe waits between subsequent retries. |
| probePollingInterval | Flag to hold the polling interval for the probes (applicable for all modes) | Optional | string | The probePollingInterval is the time that Continuous and OnChaos probes sleep after each iteration. |
| initialDelaySeconds | Flag to hold the initial delay interval for the probes | Optional | integer | The initialDelaySeconds is the initial waiting time before the probes start. |
| stopOnFailure | Flag that decides whether to stop or continue the experiment on probe failure | Optional | boolean | Set stopOnFailure to true to stop, or to false to continue, the experiment execution after the probe fails. |
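The sketch below shows one way to combine the optional run properties for a probe in Continuous mode, one of the modes the probePollingInterval row refers to. The timing values are arbitrary assumptions chosen for illustration, not recommended defaults.

```yaml
probe:
  - name: "check-probe-success"
    type: "promProbe"
    promProbe/inputs:
      endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
      query: "sum(rate(http_requests_total{code=~\"2..\"}[1m]))"
      comparator:
        criteria: ">"
        value: "0"
    # the probe is executed repeatedly during the experiment
    mode: "Continuous"
    runProperties:
      probeTimeout: 5s
      interval: 2s
      attempt: 1
      # sleep 5s between successive iterations (assumed value)
      probePollingInterval: 5s
      # wait 10 seconds before the first probe run (assumed value)
      initialDelaySeconds: 10
      # stop the experiment as soon as the probe fails
      stopOnFailure: true
```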
Definition
probe:
  - name: "check-probe-success"
    type: "promProbe"
    promProbe/inputs:
      endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
      query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
      comparator:
        criteria: ">" #supports >=,<=,>,<,==,!= comparison
        value: "0"
      auth:
        credentials: "base64(<username:password>)"
      tlsConfig:
        insecureSkipVerify: true
    mode: "Edge"
    runProperties:
      probeTimeout: 5s
      interval: 2s
      attempt: 1
Prometheus query (simple query)
The query field holds the PromQL query that extracts the desired Prometheus metrics when executed on the specified Prometheus endpoint. To use it, set the promProbe/inputs.query field.
Use the following example to tune this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
          # promql query, which should be executed
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            # criteria which should be followed by the actual output and the expected output
            # supports >=,<=,>,<,==,!= comparison
            criteria: ">" 
            # expected value, which should follow the specified criteria
            value: "0"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
Prometheus query (complex query)
For intricate queries that extend across multiple lines, you can use the 'queryPath' attribute to specify the path to a file containing the query. This file can be accessed by the experiment pod through a ConfigMap resource, with the ConfigMap name defined in either the ChaosEngine or the ChaosExperiment CR. To set this up, configure the promProbe/inputs.queryPath field.
The queryPath and query fields are mutually exclusive. If query is specified, it is used and queryPath is ignored; otherwise, queryPath is used.
Use the following example to tune this:
# contains the prom probe, which executes the query and matches the result against the expected criteria
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
          # the ConfigMap containing the PromQL query must be mounted into the experiment pod
          # use the mounted path here
          queryPath: "/etc/config/prometheus-query"
          comparator:
            # criteria which should be followed by the actual output and the expected output
            # supports >=,<=,>,<,==,!= comparison
            criteria: ">" 
            # expected value, which should follow the specified criteria
            value: "0"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
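The example above assumes the query file already exists at /etc/config/prometheus-query. Below is a minimal sketch of how to provide that file. It uses a ConfigMap whose data key becomes the file name, mounted into the experiment pod through the ChaosEngine's components.configMaps list. The ConfigMap name prometheus-query and the default namespace are assumptions; adjust them to your setup.

```yaml
# Hypothetical ConfigMap that stores the multi-line PromQL query
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-query
  # assumed to be the namespace in which the experiment pod runs
  namespace: default
data:
  # mounted at /etc/config, this key becomes the file /etc/config/prometheus-query
  prometheus-query: |
    sum(
      rate(http_requests_total{code=~"2.."}[1m])
    ) by (job)
---
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        # mount the ConfigMap above into the experiment pod
        configMaps:
          - name: prometheus-query
            mountPath: /etc/config
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
          # path of the mounted query file
          queryPath: "/etc/config/prometheus-query"
          comparator:
            criteria: ">"
            value: "0"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
```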
Authentication
These fields configure authentication for the Prometheus server. Provide the base64-encoded "username:password" or a bearer token in one of two ways: directly in the credentials field, or in a file whose path you set in the credentialsFile field.
The credentials and credentialsFile fields are mutually exclusive; you can't use both at the same time.
Use the following example to tune this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
          # promql query, which should be executed
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            # criteria which should be followed by the actual output and the expected output
            #supports >=,<=,>,<,==,!= comparison
            criteria: ">"
            # expected value, which should follow the specified criteria
            value: "0"
          auth:
            type: Basic
            credentials: "base64(<username:password>)"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
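If you prefer not to place credentials inline, you can point credentialsFile at a file mounted from a Kubernetes secret. The sketch below assumes a hypothetical secret named prom-basic-auth with a key named credentials, mounted through components.secrets at /etc/secret. Following the credentialsFile description, the file holds the base64-encoded username:password.

```yaml
# Hypothetical secret; the file content is the base64-encoded "username:password"
apiVersion: v1
kind: Secret
metadata:
  name: prom-basic-auth
  # assumed to be the namespace in which the experiment pod runs
  namespace: default
type: Opaque
stringData:
  credentials: "<base64(username:password)>"
---
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        # mounts the secret key as the file /etc/secret/credentials
        secrets:
          - name: prom-basic-auth
            mountPath: /etc/secret
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            criteria: ">"
            value: "0"
          auth:
            type: Basic
            # file path instead of inline credentials; don't set both
            credentialsFile: "/etc/secret/credentials"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
```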
TLS with custom certificates
These fields validate the TLS certificates of the Prometheus server. To perform the validation, supply the CA certificate, or the client certificate and client key.
The CA certificate file must be mounted into the experiment pod from either a ConfigMap or a secret. Specify the volume name and its mountPath in the ChaosEngine under spec.experiments[].spec.components.secrets (or components.configMaps for a ConfigMap), as shown in the following example.
Use the following example to tune this:
# contains the prom probe, which executes the query and matches the result against the expected criteria
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        secrets:
          - name: ca-cert
            mountPath: /etc/config
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "https://prometheus-server.harness.io"
          # promql query, which should be executed
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            # criteria which should be followed by the actual output and the expected output
            # supports >=,<=,>,<,==,!= comparison
            criteria: ">" 
            # expected value, which should follow the specified criteria
            value: "0"
          tlsConfig:
            caFile: "/etc/config/ca.crt"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
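To also present a client certificate to the Prometheus server, you can supply certFile and keyFile alongside caFile. The sketch below assumes a hypothetical secret named prom-client-tls that contains ca.crt, tls.crt, and tls.key and is mounted at /etc/config. The serverName value is also an assumption and should match the name in the server certificate.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        secrets:
          # hypothetical secret holding ca.crt, tls.crt and tls.key
          - name: prom-client-tls
            mountPath: /etc/config
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          endpoint: "https://prometheus-server.harness.io"
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            criteria: ">"
            value: "0"
          tlsConfig:
            # CA used to verify the server certificate
            caFile: "/etc/config/ca.crt"
            # client certificate and key presented to the server
            certFile: "/etc/config/tls.crt"
            keyFile: "/etc/config/tls.key"
            # assumed server name expected in the server certificate
            serverName: "prometheus-server.harness.io"
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1
```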
TLS skip certificate verification
You can bypass the TLS certificate checks by enabling the insecureSkipVerify option.
Use the following example to tune this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "https://prometheus-server.harness.io"
          # promql query, which should be executed
          query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
          comparator:
            # criteria which should be followed by the actual output and the expected output
            # supports >=,<=,>,<,==,!= comparison
            criteria: ">"
            # expected value, which should follow the specified criteria
            value: "0"
          tlsConfig:
            insecureSkipVerify: true
        mode: "Edge"
        runProperties:
          probeTimeout: 5s
          interval: 2s
          attempt: 1