
CLM and observability

What is observability?

Observability is the ability to understand what is happening in an IT system in real time, even when the system is complex or distributed. It relies on a set of contextualized events collected from various sources, including systems that may be dynamic.

  • Event: in CLM, an event is a log entry.
  • Contextualized: the log indicates its source (service name, environment, etc.).
  • From various sources: logs come from all types of devices and applications, so different kinds of data can be correlated. For example, technical data can be cross-referenced with sales data.
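To make the idea of a contextualized event more concrete, here is a minimal sketch of a log entry that carries its own context. The field names (`service`, `environment`, `order_id`) and the helper `make_log_event` are hypothetical and chosen for illustration only; they are not CLM's actual log schema.

```python
import json
from datetime import datetime, timezone


def make_log_event(message, service, environment, **extra):
    """Build a contextualized log event as a plain dict (hypothetical field names)."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        # Context: where the log comes from.
        "service": service,
        "environment": environment,
    }
    # Any additional context, such as business data, travels with the event.
    event.update(extra)
    return event


event = make_log_event(
    "payment declined",
    service="checkout-api",
    environment="production",
    order_id="A-1042",  # business context next to technical context
)
print(json.dumps(event, indent=2))
```

Because the source and the business identifier are part of the event itself, the same log line can later be filtered by service, by environment, or by order.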

Observability meets two main types of needs:

  • Response: for operational teams, observability is a rapid response tool. It allows them to detect a symptom, understand what is really happening, and perform a root cause analysis to correct the problem. The goal is to reduce the impact of incidents and restore service as quickly as possible.
  • Decision-making: for managers, observability provides a comprehensive and reliable overview of the system's status. Dashboards enable them to track trends, evaluate performance, anticipate risks, and make informed decisions to improve services or optimize resources.

How does observability complement monitoring?

  • Monitoring detects problems that can be anticipated, as it relies on metrics declared in the tool and alerts based on thresholds defined through prior analysis. It answers the following question: "Is the system working as expected?"
  • Observability, on the other hand, allows you to discover and address unexpected problems, even in dynamic environments. In CLM, this means analyzing detailed, contextualized logs to investigate unknown problems in a complex system (microservices, events, queues, etc.).
  • Once unknown issues have been diagnosed using CLM, you can integrate their detection into your monitoring tool.
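The contrast between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not how Centreon Infra Monitoring or CLM are implemented: the threshold value, the sample logs, and the helpers `cpu_alert` and `search` are all hypothetical.

```python
# Monitoring: the metric and its threshold are declared in advance.
CPU_THRESHOLD = 90.0  # percent; hypothetical value set through prior analysis


def cpu_alert(cpu_percent):
    """Answer the predefined question: is the system working as expected?"""
    return cpu_percent > CPU_THRESHOLD


# Observability: the question is not known in advance, so logs are
# filtered on whichever context fields turn out to be relevant.
logs = [
    {"service": "checkout-api", "level": "error", "message": "timeout calling payment-gateway"},
    {"service": "checkout-api", "level": "error", "message": "timeout calling payment-gateway"},
    {"service": "catalog", "level": "info", "message": "cache refreshed"},
    {"service": "queue-worker", "level": "error", "message": "message redelivered"},
]


def search(events, **criteria):
    """Return the events whose fields match every given criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


print(cpu_alert(72.5))  # False: the declared threshold is not exceeded
errors = search(logs, level="error")
checkout_errors = search(logs, level="error", service="checkout-api")
print(len(errors), len(checkout_errors))
```

The threshold check can only answer the question it was built for, whereas the search accepts any combination of fields, which is what makes the exploration open-ended.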

Example:

  1. I notice an incident in Centreon Infra Monitoring, but I can't find enough information to determine the cause.
  2. In CLM, I investigate the relevant logs and explore their context to identify the root cause of the problem.
  3. Once the cause is understood, I can create an alert in Centreon Infra Monitoring to detect the issue automatically in the future.
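The three steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the sample logs, the five-minute window, and the helpers `logs_around` and `make_rule` are hypothetical. In practice, the investigation happens in CLM and the resulting alert is configured in Centreon Infra Monitoring.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical logs collected around an incident.
logs = [
    {"timestamp": datetime(2024, 5, 1, 10, 2), "level": "info", "message": "deploy finished"},
    {"timestamp": datetime(2024, 5, 1, 10, 12), "level": "error", "message": "connection pool exhausted"},
    {"timestamp": datetime(2024, 5, 1, 10, 13), "level": "error", "message": "connection pool exhausted"},
    {"timestamp": datetime(2024, 5, 1, 10, 14), "level": "error", "message": "timeout calling database"},
    {"timestamp": datetime(2024, 5, 1, 10, 40), "level": "error", "message": "disk quota warning"},
]

# Step 1: the monitoring tool reports an incident at a given time.
incident_time = datetime(2024, 5, 1, 10, 15)


def logs_around(events, moment, minutes=5):
    """Step 2a: keep only the logs within a time window around the incident."""
    window = timedelta(minutes=minutes)
    return [e for e in events if abs(e["timestamp"] - moment) <= window]


nearby = logs_around(logs, incident_time)

# Step 2b: look for the most frequent error message in that context.
error_counts = Counter(e["message"] for e in nearby if e["level"] == "error")
top_message, occurrences = error_counts.most_common(1)[0]


def make_rule(pattern):
    """Step 3: turn the diagnosed cause into a reusable detection rule."""
    return lambda event: pattern in event["message"]


rule = make_rule(top_message)
print(top_message, occurrences)
print(rule({"message": "connection pool exhausted on db-1"}))
```

Once the cause is known, the open-ended investigation collapses into a simple predicate, which is the kind of check a monitoring tool can run automatically.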

Simple summary

Aspect       | Monitoring                    | Observability
Purpose      | Know that there is a problem  | Understand why and where it occurs
Nature       | Predefined (known thresholds) | Exploratory, open-ended
Data         | Simple metrics                | Enriched logs
Relevant for | Simple systems                | Microservices, event-driven, cloud
Capability   | Detect                        | Diagnose