Newly introduced critical-op PDB causes tons of alerts with kube-prometheus-stack #3020

@Luzifer

Description

The newly introduced postgres-<instance>-critical-op-pdb is created expecting the maximum number of healthy instances and with a selector that does not match any pods in normal operation. With kube-prometheus-stack in its default configuration, this raises an alert because the PDB does not have any healthy pods:

PDB does not have enough healthy pods.
PDB /keycloak/postgres-keycloak-pg-critical-op-pdb expects 3 more healthy pods. The desired number of healthy pods has not been met for at least 15m.

The alert rule is defined here: templates/prometheus/rules-1.14/kubernetes-apps.yaml#L568-L601
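
To illustrate why this fires, here is a minimal Python sketch of the condition. It assumes the rule compares the desired number of healthy pods with the current number and alerts when the difference is positive, which is consistent with the "expects 3 more healthy pods" message but is not copied from the linked template. The `PdbStatus` type, the helper functions, and the name of the regular PDB are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PdbStatus:
    """Subset of a PodDisruptionBudget status relevant to the alert (hypothetical type)."""
    name: str
    desired_healthy: int  # pods the PDB requires to be healthy
    current_healthy: int  # healthy pods currently matched by the selector


def missing_healthy_pods(pdb: PdbStatus) -> int:
    """Return how many more healthy pods the PDB expects (0 if satisfied)."""
    return max(pdb.desired_healthy - pdb.current_healthy, 0)


def alert_fires(pdb: PdbStatus) -> bool:
    # Assumed rule condition: desired healthy minus current healthy > 0.
    return missing_healthy_pods(pdb) > 0


# Regular PDB (name assumed): its selector matches all three running instances.
regular = PdbStatus("postgres-keycloak-pg-pdb", desired_healthy=1, current_healthy=3)

# Critical-op PDB in normal operation: it expects all three instances,
# but its selector matches no pods, so zero pods count as healthy.
critical_op = PdbStatus(
    "postgres-keycloak-pg-critical-op-pdb", desired_healthy=3, current_healthy=0
)

for pdb in (regular, critical_op):
    print(pdb.name, "fires:", alert_fires(pdb), "missing:", missing_healthy_pods(pdb))
```

Under that assumption the critical-op PDB is permanently three pods short whenever no critical operation is running, so the alert stays active once its 15m window has passed.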

After the first "what's going on here?!?" it turns out this is not critical, but it causes a lot of noise in alerting. The available mitigations are to

  • disable the PDB for postgres-operator,
  • disable PDB monitoring, or
  • introduce silences muting the postgres-<instance>-critical-op-pdb alerts,

none of which is ideal: in every case either something is missing or alerts are not seen when it's important to see them.
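
The drawback of the silence option can be sketched in Python. This is only a model of label matching, not Alertmanager itself. The alert name `KubePdbNotEnoughHealthyPods` and the label `poddisruptionbudget` are assumptions about what the default rule and kube-state-metrics emit, and `SILENCE_MATCHERS` and `is_silenced` are hypothetical names.

```python
import re

# Hypothetical silence matchers; alert name and label name are assumptions.
SILENCE_MATCHERS = {
    "alertname": r"KubePdbNotEnoughHealthyPods",
    "poddisruptionbudget": r"postgres-.*-critical-op-pdb",
}


def is_silenced(alert_labels: dict[str, str]) -> bool:
    """True if every matcher fully matches the corresponding alert label."""
    return all(
        re.fullmatch(pattern, alert_labels.get(label, "")) is not None
        for label, pattern in SILENCE_MATCHERS.items()
    )


# The noisy alert from this issue.
noisy = {
    "alertname": "KubePdbNotEnoughHealthyPods",
    "namespace": "keycloak",
    "poddisruptionbudget": "postgres-keycloak-pg-critical-op-pdb",
}

# The same alert for a PDB with a different name (name assumed) is left alone.
other = {
    "alertname": "KubePdbNotEnoughHealthyPods",
    "namespace": "keycloak",
    "poddisruptionbudget": "postgres-keycloak-pg-pdb",
}

print(is_silenced(noisy), is_silenced(other))
```

A matcher like this cannot tell an idle critical-op PDB from one that is actually short of healthy pods during a critical operation, so the silence would also hide the alert in the one situation where it carries information.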

A better solution would be, for example, to create those PDBs only on demand and not to let them stick around all the time. That way the alerts would be meaningful and not noisy while retaining the use case those PDBs fulfill.
