Data collection issues troubleshooting

Delay or loss of events

If the collector is still overloaded after the recommendations for data collection and parsing have been applied, which shows up as delayed or lost events, examine the following metrics.

Backpressure

This metric should stay close to zero. Its value grows when the collector cannot keep up with the volume of incoming data and starts to slow down or suspend processing. Possible causes include bandwidth limitations, high CPU load, or lack of memory. Use the _node/stats/flow endpoint to track this metric.
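As an illustration, the check described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the field names queue_backpressure, current and lifetime are assumed to match the flow statistics returned by recent Logstash releases and should be verified against your version, and the 0.1 threshold is an arbitrary example value.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """Download and parse a JSON document from the collector API."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def backpressure(stats, window="current"):
    """Return the backpressure value for the given window, or None if absent.

    `stats` is the parsed response of _node/stats/flow. The key names are an
    assumption about the response layout, not taken from this document.
    """
    return stats.get("flow", {}).get("queue_backpressure", {}).get(window)


def is_overloaded(stats, threshold=0.1):
    """True when current backpressure exceeds the (example) threshold."""
    value = backpressure(stats)
    return value is not None and value > threshold
```

Assuming the API listens on the default address, a call could look like `is_overloaded(fetch_json("http://localhost:9600/_node/stats/flow"))`; the host and port here are assumptions and depend on your deployment.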

Throughput

Throughput shows the amount of data Logstash processes per unit of time. A drop below the target value may indicate a misconfigured resource allocation or a shortage of resources. The metric is also useful for assessing the load on the storage that receives data from the collector. If throughput is low because available resources are underutilized, increase the value of pipeline.workers or pipeline.batch.size. Use the _node/stats/flow endpoint to track the metric and the _node/stats/pipelines endpoint to get statistics for each pipeline.
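Comparing per-pipeline throughput with a target value can be sketched as follows. The functions take the already parsed JSON response of _node/stats/pipelines. The key names (pipelines, flow, input_throughput, filter_throughput, output_throughput, current) are assumptions about the response layout, and target_eps is a hypothetical target in events per second that you choose yourself.

```python
def pipeline_throughput(stats, window="current"):
    """Map each pipeline name to its input/filter/output throughput.

    Missing values are reported as None so that older collector versions
    without flow statistics do not raise an error.
    """
    result = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        flow = pipeline.get("flow", {})
        result[name] = {
            stage: flow.get(f"{stage}_throughput", {}).get(window)
            for stage in ("input", "filter", "output")
        }
    return result


def below_target(stats, target_eps, stage="output"):
    """Return the sorted names of pipelines whose throughput is under target."""
    return sorted(
        name
        for name, rates in pipeline_throughput(stats).items()
        if rates[stage] is not None and rates[stage] < target_eps
    )
```

Pipelines returned by below_target are the first candidates for reviewing pipeline.workers and pipeline.batch.size, provided the host still has spare CPU and memory.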

Queue

The queue in the collector stores events that are waiting to be processed. Monitoring the queue status is important for preventing data loss. There are two types of queues: memory and persistent. The in-memory queue provides fast data processing but is prone to data loss during service failures and restarts. The persistent (disk) queue saves data to disk, which keeps it safe but reduces processing speed. The choice of queue type depends on the characteristics of the stream and on data safety requirements. Whichever queue type is used, make sure that the space allocated for it is actually available, in memory or on disk respectively. Use the _node/stats/pipelines endpoint to track the queue status for each pipeline.
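A queue status summary per pipeline can be sketched like this. The function takes the parsed JSON response of _node/stats/pipelines. The key names (queue, type, events_count, queue_size_in_bytes, max_queue_size_in_bytes) are assumptions about the response layout and may differ between versions, so treat this as a starting point rather than a reference implementation.

```python
def queue_usage(stats):
    """Summarize queue type, queued events and fill ratio for each pipeline.

    fill_ratio is None when no size limit is reported, which is assumed to
    be the case for in-memory queues.
    """
    report = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        queue = pipeline.get("queue", {})
        used = queue.get("queue_size_in_bytes", 0)
        limit = queue.get("max_queue_size_in_bytes", 0)
        report[name] = {
            "type": queue.get("type"),
            "events": queue.get("events_count", 0),
            "fill_ratio": used / limit if limit else None,
        }
    return report
```

A fill ratio that keeps approaching 1.0 for a persistent queue suggests that the outputs cannot drain events as fast as they arrive, so the queue limit or the available disk space may soon be exhausted.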