
User Application or Workload Monitoring on OpenShift Container Platform

Let’s assume a running application has been deployed to the RHOCP cluster inside a project (or namespace) called uat1, and that the Prometheus metrics endpoint is exposed on path /metrics.


In RHOCP 4.6, application monitoring can be set up by enabling monitoring for user-defined projects, without installing an additional monitoring solution. Enabling this feature deploys a second Prometheus Operator instance inside the openshift-user-workload-monitoring namespace, configured to monitor all namespaces except the openshift- prefixed namespaces that are already covered by the cluster's default platform monitoring.


Note: To understand the monitoring stack for OpenShift Container Platform, see: https://docs.openshift.com/container-platform/4.6/monitoring/understanding-the-monitoring-stack.html

Let’s start the configuration as follows.


Step 1: Enable monitoring for user-defined projects:

Source Link: https://docs.openshift.com/container-platform/4.6/monitoring/enabling-monitoring-for-user-defined-projects.html

# oc -n openshift-monitoring edit configmap cluster-monitoring-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      retention: 24h
kind: ConfigMap
metadata:
  creationTimestamp: "2021-09-27T12:00:54Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "4912259"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: 4590cb83-99e3-404b-92da-ffdeacbccc0d
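
Note: If the cluster-monitoring-config ConfigMap does not already exist in the openshift-monitoring namespace, create it first rather than editing it. A minimal sketch matching the settings used above (the file name cluster-monitoring-config.yaml is arbitrary):

# cat cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

# oc apply -f cluster-monitoring-config.yaml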
Note: Check that the prometheus-operator, prometheus-user-workload and thanos-ruler-user-workload pods are running in the openshift-user-workload-monitoring project.
# oc -n openshift-user-workload-monitoring get pod
NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-operator-646cb67c9-qbr8z   2/2     Running   0          3d21h
prometheus-user-workload-0            4/4     Running   1          11h
prometheus-user-workload-1            4/4     Running   1          11h
thanos-ruler-user-workload-0          3/3     Running   0          4d15h
thanos-ruler-user-workload-1          3/3     Running   0          4d11h
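
The user-workload monitoring stack has its own configuration ConfigMap, user-workload-monitoring-config, in the openshift-user-workload-monitoring namespace. A minimal sketch that sets a retention period for the user-workload Prometheus (the 7d value is only an example; adjust as required):

# cat user-workload-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 7d

# oc apply -f user-workload-monitoring-config.yaml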

Step 2 (Optional): Grant the necessary permissions to your user:

Source Link: https://docs.openshift.com/container-platform/4.6/monitoring/enabling-monitoring-for-user-defined-projects.html#granting-users-permission-to-monitor-user-defined-projects_enabling-monitoring-for-user-defined-projects


Cluster administrators can monitor all core OpenShift Container Platform projects as well as user-defined projects, but you can also grant developers and other users permission to monitor their own projects if required.


As an example, to assign the user-workload-monitoring-config-edit role to the user ocp4user1 in the openshift-user-workload-monitoring project:

# oc -n openshift-user-workload-monitoring adm policy add-role-to-user user-workload-monitoring-config-edit ocp4user1 --role-namespace openshift-user-workload-monitoring
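
Separately, monitoring access within an individual project can be granted with the monitoring-rules-view, monitoring-rules-edit, and monitoring-edit roles. A sketch that grants ocp4user1 the monitoring-edit role in the uat1 project used below:

# oc policy add-role-to-user monitoring-edit ocp4user1 -n uat1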

Step 3: Set up metrics collection for application projects (that is, projects other than openshift-*):

Source Link: https://docs.openshift.com/container-platform/4.6/monitoring/managing-metrics.html#deploying-a-sample-service_managing-metrics


We are going to deploy the prometheus-example-app sample application and then verify which metrics it exposes. Create a YAML file for the namespace, deployment, service, and route configuration; in this example it is called sample-http-service.yaml.

# cat sample-http-service.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: uat1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: http-sample
  name: http-sample
  namespace: uat1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-sample
  template:
    metadata:
      labels:
        app: http-sample
    spec:
      containers:
      - image: ghcr.io/rhobs/prometheus-example-app:0.3.0
        imagePullPolicy: IfNotPresent
        name: http-sample
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: http-sample
  name: http-sample
  namespace: uat1
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
    name: 8080-tcp
  selector:
    app: http-sample
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: http-sample
  name: http-sample
  namespace: uat1
spec:
  host: http-sample-uat1.apps.ocp-prod.jazakallah.info
  port:
    targetPort: 8080-tcp
  to:
    kind: Service
    name: http-sample
    weight: 100
  wildcardPolicy: None

# oc apply -f sample-http-service.yaml

# oc get pod -n uat1
NAME                           READY   STATUS    RESTARTS   AGE
http-sample-6b47b86c6d-wc54g   1/1     Running   0          8m48s

# oc get route -n uat1
NAME          HOST/PORT                                        PATH   SERVICES      PORT       TERMINATION   WILDCARD
http-sample   http-sample-uat1.apps.ocp-prod.jazakallah.info          http-sample   8080-tcp                 None

# curl http-sample-uat1.apps.ocp-prod.jazakallah.info
Hello from example application.

The application's metrics are now exposed through an HTTP service endpoint under the canonical /metrics path. We can list all available metrics for a service by running a curl query against http://<endpoint>/metrics.


For instance, we have exposed a route to the http-sample application, so run the following to view all of its available metrics:

# curl http-sample-uat1.apps.ocp-prod.jazakallah.info/metrics
# HELP http_request_duration_seconds Duration of all HTTP requests
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.005"} 6

::::::::::::: CUT SOME OUTPUT :::::::::::::

http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="+Inf"} 6
http_request_duration_seconds_sum{code="200",handler="found",method="get"} 7.1784e-05
http_request_duration_seconds_count{code="200",handler="found",method="get"} 6
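
To have the user-workload Prometheus actually scrape this endpoint, a ServiceMonitor resource is required in the application project. A minimal sketch, assuming the service labels and port name defined above (the resource name http-sample-monitor and the 30s interval are arbitrary choices):

# cat sample-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: http-sample
  name: http-sample-monitor
  namespace: uat1
spec:
  endpoints:
  - interval: 30s
    port: 8080-tcp
    scheme: http
  selector:
    matchLabels:
      app: http-sample

# oc apply -f sample-service-monitor.yaml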