Version: 5.0

Self-Monitoring

Self-Monitoring is a module that includes a set of dashboards for centralized cluster health monitoring. It enables timely detection and resolution of cluster and data collection layer anomalies before failures occur.

Conventions

  • SAF_INSTALLER - Directory where the Search Anywhere Framework installation package is extracted
  • USER - System administrator user, typically admin
  • OS_HOME - OpenSearch home directory, typically /app/opensearch/
  • OSD_HOME - OpenSearch Dashboards home directory, typically /app/opensearch-dashboards/
  • LOGSTASH_HOME - Logstash home directory, typically /app/logstash/
  • SBM_HOME - Search Anywhere Framework Beat Manager home directory, typically /app/safBeatManager/
  • SB_HOME - Search Anywhere Framework Beat home directory, typically /app/safBeat/

Self-Monitoring Usage

The module consists of dashboards displaying various critical metrics of the target (monitored) cluster, enabling rapid response to any changes or issues.

Monitored components include:

  • SA Data Storage: Tracks all processes related to cluster operations
  • SA Data Collector: Monitors event ingestion and proper data collection
  • SA Master Node: Monitors control components status and cluster coordination

Architecture

Self-monitoring can be deployed using two distinct architecture types:

  1. Deployment on a dedicated monitoring cluster
  2. Deployment within the target cluster

Terminology

Target cluster - the cluster being monitored for metrics.

Type I Architecture (Dedicated Monitoring Cluster)

Recommendations

We recommend deploying all self-monitoring server components on a single node, particularly for small to medium-sized solutions, as this simplifies deployment and maintenance.

Minimum self-monitoring server specifications:

  • CPU: from 4 cores

  • RAM: from 16 GB

  • Storage: from 100 GB

    Storage capacity depends on the volume of collected data, the number of monitored clusters, and the data retention period; see the worked example below. These parameters can be scaled up as needed.
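As a rough sizing illustration (the daily volume here is an assumption, not a measured value): with the default ISM policy (age_to_delete = 30d) and index templates (number_of_replicas = 0) described later in this guide, a single monitored cluster producing about 3 GB of logs and metrics per day would occupy roughly 3 GB/day x 30 days = 90 GB, which is why 100 GB is treated as the lower bound. Scale the estimate up for additional clusters, replica shards, or longer retention.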

In Type I architecture, the self-monitoring server is a fully isolated cluster comprising:

  • SA Data Storage - Handles data storage and full-text search
  • SA Dashboards - Provides data visualization
  • SA Data Collector - Receives data from target cluster agents and polls the target cluster's REST API
  • SAF Beat Manager - Manages agents installed on target clusters

Advantages

Deploying the self-monitoring server on a dedicated cluster offers key benefits:

  • Availability during target cluster outages

    • The independent self-monitoring server remains operational even if the target cluster fails
  • Reduced target cluster load

    • Monitoring processes are resource-intensive by nature; offloading them to separate infrastructure eliminates resource contention and ensures stable production cluster performance

Disadvantages

While this is the recommended default configuration, consider these factors:

  • Additional infrastructure resources

    • Requires dedicated servers or compute resources, which may increase operational costs
  • Increased configuration complexity

    • To get self-monitoring data from the target cluster, you need to configure the cross-cluster search mechanism (for configuration information, see Useful Links). This adds steps to the configuration process and requires special attention when maintaining the system

Type II Architecture (Primary Cluster)

In the second architecture type, all previously described operations occur within a single cluster where self-monitoring is deployed.

Advantages

  • No need for additional servers or their configuration

    • Eliminates the requirement to allocate and configure extra resources for self-monitoring servers, potentially reducing infrastructure costs and simplifying deployment
  • Immediate access to self-monitoring data on the primary cluster

    • Self-monitoring data becomes instantly available within the target cluster, facilitating easier access and analysis for system operators and administrators

Disadvantages

  • Self-monitoring unavailability during cluster failures

    • If the primary cluster experiences a critical failure, the self-monitoring server also becomes unavailable, potentially complicating system status and performance monitoring
  • Additional load on the primary cluster

    • Running self-monitoring on the primary cluster may impose extra resource demands, which could impact application or service performance

Data Collection

The data collected by self-monitoring can be categorized into two types:

  1. Log files
  2. Metrics and other statistics

Log Collection

Log collection is performed from all SA Data Storage, SA Data Collector, and SA Master Node nodes in the target cluster. SAF Beat agents are installed and configured on the target cluster hosts (installation instructions can be found in the Useful Links section). During configuration, the SAF Beat Manager address of the self-monitoring server is specified; the manager holds the configurations for the data collection and transmission agents, which are then installed and run on the cluster hosts. For log collection, Filebeat is configured to collect the following logs (a minimal configuration sketch is shown after the lists below):

For SA Data Storage hosts:

  • Cluster logs
  • sme logs
  • sme-re logs
  • job scheduler logs

For SA Data Collector hosts:

  • SA Data Collector logs
  • SA Data Collector pipeline logs
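For illustration, a minimal Filebeat configuration for shipping one of these logs to the self-monitoring SA Data Collector might look like the sketch below. This is not the configuration generated by the module (that is produced by generate_pipelines.py and distributed via SAF Beat Manager); the path, address, and port simply mirror the config.ini defaults described later (smos_cluster_log_path, logstash_ips, filebeat_selfmon_port).

filebeat.inputs:
  - type: filestream
    id: sm-cluster-log
    paths:
      - /app/logs/opensearch/sm-cluster.log    # smos_cluster_log_path

output.logstash:
  hosts: ["172.17.0.10:5050"]                  # logstash_ips : filebeat_selfmon_port
  ssl.certificate_authorities: ["/app/safBeat/ca-cert.pem"]
  ssl.certificate: "/app/safBeat/cert.pem"
  ssl.key: "/app/safBeat/key.pem"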

Metrics Collection

For collecting metrics from SA Data Storage, the http_poller input plugin in SA Data Collector is used. This plugin periodically polls specified REST endpoints.

note

In the self-monitoring pipeline templates, you can see that the http_poller plugin can send identical requests to all master nodes of the target cluster and then use the throttle filter plugin to filter duplicate responses. This ensures data retrieval even if one or more master nodes fail, as requests will be executed to the remaining operational nodes.
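As an illustration of this pattern, a simplified pipeline (a sketch only, not the shipped template) could combine an http_poller input that queries the _cluster/health endpoint on every master node with a throttle filter that drops the duplicate responses; the addresses, user, and polling period below correspond to the poller fields of config.ini described in the Configuration section.

input {
  http_poller {
    urls => {
      # identical request to each master node of the target cluster
      master1 => "https://172.16.0.1:9200/_cluster/health"
      master2 => "https://172.16.0.2:9200/_cluster/health"
      master3 => "https://172.16.0.3:9200/_cluster/health"
    }
    schedule => { every => "60s" }   # poller.request_period
    user => "admin"                  # poller.user
    password => "${os_pwd}"          # poller.pwd_token, resolved from the Logstash keystore
    # TLS/truststore settings from the logstash section of config.ini are omitted in this sketch
  }
}

filter {
  # keep the first response per polling interval and tag the duplicates
  throttle {
    key => "clusterhealth"
    period => 60
    max_age => 120
    after_count => 1
    add_tag => ["throttled"]
  }
  if "throttled" in [tags] {
    drop { }
  }
}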

For collecting metrics from SA Data Collector, Metricbeat is used.
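A minimal Metricbeat sketch for this (again, not the generated configuration) could use the Metricbeat logstash module against the local Logstash monitoring API and forward the results to the self-monitoring SA Data Collector; the output address and port mirror the beats defaults in config.ini:

metricbeat.modules:
  - module: logstash
    metricsets: ["node", "node_stats"]
    period: 30s
    hosts: ["http://localhost:9600"]   # local Logstash monitoring API

output.logstash:
  hosts: ["172.17.0.10:5052"]          # logstash_ips : metricbeat_logstash_port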

Configuration

The self-monitoring package includes scripts for pipeline generation, agent configurations, and automation of other necessary operations:

  1. generate_pipelines.py: This script generates pipelines and agent configurations. It automatically creates the required elements for data collection and agent setup

  2. generate_opensearch_configs.py: This script generates Index State Management (ISM) policies, creates index templates, and copies dashboards. It can also connect to SA Data Storage and create corresponding policies, index templates, and indices when needed

  3. import_certs.sh: This script adds host certificates to the truststore, which is essential for establishing secure TLS connections when SA Data Collector accesses target hosts via API

Configuration file

All the scripts mentioned above extract the settings from the configuration file config.ini.

The fields of the configuration file, grouped by section, are described below; an illustrative excerpt follows the descriptions.

rest

Data for loading ISM policies, index templates, and index creation in SA Data Storage (used by the generate_opensearch_configs.py script):

  • user = admin — SA Data Storage user (the password will be requested interactively by the script)
  • master_node = 127.0.0.1 — any master node of the self-monitoring server for accessing the REST API
  • dashboards_node = 127.0.0.1 — the address of the SA Web node
ism

Data for configuring the ISM policy for indexes with self-monitoring data:

  • patterns = clusterhealth-*, clusterlogs-*, sme_re_logs-*, job_scheduler_logs-*, job_scheduler_audit_logs-*, core_logs-*, clusterstats-*, logstashstats-*, logstashlogs-*, shardstats-*, indexstats-*, cluster_query_types-* — index patterns
  • policy_id = selfmonitoring — ISM policy ID
  • age_to_rollover = 7d — index age that triggers rollover
  • primary_size_to_rollover = 10gb — primary shard size that triggers rollover
  • rollover_retry_count = 5 — number of rollover retry attempts
  • rollover_retry_backoff = constant — backoff function for rollover retries
  • rollover_retry_delay = 5m — delay between rollover retries
  • age_to_delete = 30d — index age at which the index is deleted
  • delete_retry_count = 3 — number of delete retry attempts
  • delete_retry_backoff = linear — backoff function for delete retries
  • delete_retry_delay = 1h — delay between delete retries
  • delete_timeout = 1h — delete attempt timeout
index_templates

Data for creating index templates with self-monitoring data:

  • indexes = clusterhealth, clusterlogs, sme_re_logs, job_scheduler_logs, job_scheduler_audit_logs, core_logs, clusterstats, logstashstats, logstashlogs, shardstats, indexstats, cluster_query_types — list of aliases
  • routing_mode = warm — node type used for index allocation
  • number_of_shards = 1 — number of primary shards
  • number_of_replicas = 0 — number of replica shards
poller

Data for the http_poller input plugin:

  • request_period = 60 — polling interval for target cluster nodes (in seconds)
  • master_nodes = 172.16.0.1, 172.16.0.2, 172.16.0.3 — target cluster nodes with the master role
  • user = admin — SA Data Storage user under which REST API requests to the target cluster are executed
  • pwd_token = os_pwd — keystore token holding the password of that target cluster SA Data Storage user
logstash

Configuration parameters of the SA Data Collector on the self-monitoring server:

  • user = admin — SA Data Storage user under which the self-monitoring server's SA Data Collector sends data to SA Data Storage
  • pwd_token = os_pwd — keystore token holding the password of the self-monitoring SA Data Storage user
  • data_nodes = 172.16.0.4, 172.16.0.5, 172.16.0.6 — list of self-monitoring server nodes with the data role to which data is sent
  • scripts_path = /app/logstash/config/conf.d/scripts/ — directory containing auxiliary scripts for the SA Data Collector pipelines
  • truststore_path = /app/logstash/config/truststore.jks — location of the truststore with the imported certificates of the target cluster hosts
  • truststore_pwd_token = ts_pwd — keystore token holding the truststore password
  • ca_cert_path = /app/logstash/config/ca-cert.pem — location of the CA certificate of the self-monitoring server
  • node_cert_path = /app/logstash/config/node-cert.pem — location of the node certificate for the self-monitoring SA Data Collector
  • node_key_path = /app/logstash/config/node-key.pem — location of the private key for that node certificate
beats

Agent Configuration Settings:

  • logstash_ips = 172.17.0.10 — address of the self-monitoring cluster's SA Data Collector instances to which data from the target cluster is sent
  • filebeat_selfmon_port = 5050 — port for the Filebeat instance that sends SA Data Storage logs
  • filebeat_logstash_port = 5051 — port for the Filebeat instance that sends SA Data Collector logs
  • metricbeat_logstash_port = 5052 — port for the Metricbeat instance that sends SA Data Collector metrics
  • cert_path = /app/safBeat/cert.pem — location of the SAF Beat certificate
  • key_path = /app/safBeat/key.pem — location of the SAF Beat certificate key
  • ca_cert_path = /app/safBeat/ca-cert.pem — location of the CA certificate of the self-monitoring server
  • smos_cluster_log_path = /app/logs/opensearch/sm-cluster.log — location of cluster logs
  • sme_log_path = /app/logs/opensearch/sme.log — location of SA Engine (sme) logs
  • sme_re_log_path = /app/logs/sme-re/main.log — location of SA Engine RE (sme-re) logs
  • job_scheduler_log_path = /app/logs/job_scheduler.log — location of job scheduler logs
  • job_scheduler_audit_log_path = /app/logs/job_scheduler_audit.log — location of job scheduler audit logs
  • core_log_path = /app/logs/core.log — location of Core logs
  • logstash_logs_path = /app/logs/logstash/*.log — location of SA Data Collector logs
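For reference, an abridged config.ini assembled from the defaults above might look like the following excerpt. It assumes each field group is an INI section and omits many fields for brevity; the file shipped with the package is authoritative.

[rest]
user = admin
master_node = 127.0.0.1
dashboards_node = 127.0.0.1

[ism]
policy_id = selfmonitoring
age_to_rollover = 7d
primary_size_to_rollover = 10gb
age_to_delete = 30d

[poller]
request_period = 60
master_nodes = 172.16.0.1, 172.16.0.2, 172.16.0.3
user = admin
pwd_token = os_pwd

[logstash]
user = admin
pwd_token = os_pwd
data_nodes = 172.16.0.4, 172.16.0.5, 172.16.0.6
truststore_path = /app/logstash/config/truststore.jks
truststore_pwd_token = ts_pwd

[beats]
logstash_ips = 172.17.0.10
filebeat_selfmon_port = 5050
filebeat_logstash_port = 5051
metricbeat_logstash_port = 5052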

Installation

note

The SA Master Node, SA Data Storage, SA Data Collector, and SAF Beat Manager components (installation information can be found in the Useful Links section) must be properly installed and configured before proceeding with this guide.

The selfmonitoring component is included in the base SAF package and is located in the ${SAF_INSTALLER}/utils/selfmonitoring/ directory. We recommend using the Python interpreter that is also included in the installer package and located at ${SAF_INSTALLER}/utils/python/bin/python3.

Configuration Setup

Populate the config.ini configuration file with actual parameters.

note

Most parameters related to paths, polling settings, ISM, and index templates can remain unchanged. Required modifications typically involve IP addresses only. However, we strongly recommend carefully reviewing each parameter before use to ensure all values are correct and meet your system's current requirements.

Running Scripts

Generating Files for SA Data Storage

When executed without arguments, the generate_opensearch_configs.py script generates ISM policies, index templates, and dashboards, saving them to the ${SAF_INSTALLER}/utils/selfmonitoring/generated_data directory:

${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py

When executed with the --upload argument, the script will both generate files and upload the content to the self-monitoring SA Data Storage (also creating the actual indices):

${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py --upload
note

You may first run the script without the --upload flag to review the output, then rerun with --upload to upload to SA Data Storage after verifying the results.

Generating Files for SA Data Collector and SAF Beat Manager

Execute the ${SAF_INSTALLER}/utils/selfmonitoring/generate_pipelines.py script:

${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_pipelines.py

After execution, the ${SAF_INSTALLER}/utils/selfmonitoring/generated_data directory will contain new subdirectories with data for SA Data Collector and SAF Beat Manager.

Certificate Import

Execute the ${SAF_INSTALLER}/utils/selfmonitoring/import_certs.sh script:

cd ${SAF_INSTALLER}/utils/selfmonitoring/ && chmod +x import_certs.sh
sudo -u logstash ./import_certs.sh
Important

The script must be executed under the logstash user account.

Script execution flow:

  1. Creating a new truststore: When launched, the script will prompt twice for a new password for the certificate store. Remember this password as it will be required in subsequent steps
  2. Certificate retrieval: The script will connect to each host and retrieve certificates. When prompted Trust this certificate? [no]:, answer yes
  3. Password storage: The script will request passwords for ts_pwd and os_pwd tokens. For ts_pwd: Enter the password created in step 1. For os_pwd: Enter the SA Data Storage user password specified in config.ini

If authentication credentials differ between target and monitoring clusters, and a different token is specified in the logstash.pwd_token configuration field, add it manually on the monitoring cluster's SA Data Collector server:

sudo -u logstash ${LOGSTASH_HOME}/bin/logstash-keystore add <TOKEN_NAME>
note
  • If a truststore already exists at the path specified in the configuration on the SA Data Collector when the script is launched, the script will ask for its current password once

  • If no keystore exists when the script runs, you will see: The keystore password is not set... Continue without password protection on the keystore? [y/N]. Answer y. If a keystore already exists and is password-protected, the script will prompt for its password; if it exists without a password, no additional input is required

Deploying Generated Files

After completing the previous steps, all required files (build artifacts) will be available in ${SAF_INSTALLER}/utils/selfmonitoring/generated_data. Directory references below are relative to this location unless absolute paths are specified.
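Based on the directories referenced in the steps below, the generated_data layout looks roughly like this (subdirectory names are taken from this guide; the exact contents and the location of pipelines.yml may differ between versions):

generated_data/
  ism/                # ISM policies for SA Data Storage
  index_templates/    # index templates for SA Data Storage
  dashboards/         # dashboard JSON files for import
  logstash/
    pipelines/        # pipeline .conf files for SA Data Collector
    scripts/          # auxiliary scripts for the pipelines
    pipelines.yml     # entries to append to ${LOGSTASH_HOME}/config/pipelines.yml
  sbm/                # app packages for SAF Beat Manager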

SA Data Storage Configuration

Create ISM policies and index templates using the files in the ism and index_templates directories respectively, then create the indices (an example request sequence is shown after the note below).

note

Skip this step if you automatically uploaded SA Data Storage configurations (by running generate_opensearch_configs.py with the --upload flag)
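If you are applying these configurations manually, they can be loaded through the SA Data Storage REST API. The sketch below uses curl against the rest.master_node address; the file names are illustrative placeholders, so substitute the actual files from the ism and index_templates directories:

# create the ISM policy (policy_id from config.ini)
curl -k -u admin -X PUT "https://127.0.0.1:9200/_plugins/_ism/policies/selfmonitoring" \
  -H 'Content-Type: application/json' -d @ism/<policy_file>.json

# create an index template
curl -k -u admin -X PUT "https://127.0.0.1:9200/_index_template/clusterhealth" \
  -H 'Content-Type: application/json' -d @index_templates/<template_file>.json

# create the initial write index behind the clusterhealth alias
curl -k -u admin -X PUT "https://127.0.0.1:9200/clusterhealth-000001" \
  -H 'Content-Type: application/json' -d '{"aliases": {"clusterhealth": {"is_write_index": true}}}'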

SA Data Collector Configuration

  1. Copy pipelines from ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/logstash/pipelines/ to ${LOGSTASH_HOME}/config/conf.d/
  2. Transfer scripts from ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/logstash/scripts/ to the directory specified in the logstash.scripts_path configuration parameter (default: ${LOGSTASH_HOME}/config/conf.d/scripts/)
  3. Append the contents of pipelines.yml to ${LOGSTASH_HOME}/config/pipelines.yml (example entries are shown below)
  4. Modify ownership of transferred files
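For reference, appended pipelines.yml entries follow the standard Logstash format; the pipeline IDs and file names below are illustrative, and the actual values come from the generated pipelines.yml:

- pipeline.id: selfmon_clusterhealth
  path.config: "/app/logstash/config/conf.d/selfmon_clusterhealth.conf"
- pipeline.id: selfmon_filebeat
  path.config: "/app/logstash/config/conf.d/selfmon_filebeat.conf"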

After completing the steps, restart the SA Data Collector service:

sudo chown -R logstash:logstash ${LOGSTASH_HOME}/
sudo systemctl restart logstash.service
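After the restart, you can optionally confirm that the new pipelines are loaded using the standard Logstash monitoring API (port 9600 is the Logstash API default; adjust it if it has been changed in your installation):

curl -s "http://localhost:9600/_node/pipelines?pretty"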

SAF Beat Manager Configuration

  1. Copy contents from ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/sbm to ${SBM_HOME}/apps/

  2. Download the required binaries, filebeat and metricbeat (contact technical support if download issues occur), and place them in ${SBM_HOME}/binaries/

  3. Edit ${SBM_HOME}/etc/serverclasses.yml to include the following groups: linux_selfmon and linux_selfmon_logstash:

groups:
  - name: linux_selfmon
    filters:
      - "<IP 1st node opensearch>"
      - "<IP 2nd node opensearch>"
      ...
      - "<IP N-th node opensearch>"
    apps:
      - filebeat_selfmon
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz

  - name: linux_selfmon_logstash
    filters:
      - "<IP 1-st logstash>"
      - "<IP 2-nd logstash>"
      ...
      - "<IP N-th logstash>"
    apps:
      - filebeat_logstash
      - metricbeat_logstash
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz
      - metricbeat-oss-8.9.2-linux-x86_64.tar.gz
Important

Pay special attention to the filters and binaries sections. In filters, specify the IP addresses of all target cluster nodes for the first group and the IP addresses of all Logstash instances in the target cluster for the second group. In binaries, verify that the archive names match those you downloaded to ${SBM_HOME}/binaries/.

After completing these steps, restart the SAF Beat Manager service:

sudo systemctl restart safBeatManager.service

Dashboard Deployment

note

SAF Beat must be installed on all SA Data Collector, SA Data Storage, and SA Master Node instances in the target cluster (installation instructions available in Useful Links). Configure SAF Beat to use the monitoring cluster's SAF Beat Manager server.

  1. In the monitoring web interface, navigate to Dashboards (Main menu - General - Dashboards) and create new dashboards using the JSON files from the dashboards directory
  2. Update node selection filters in dashboards with current cluster data
  3. In the Module Settings section, go to Index Templates (Main Menu - System Settings - Module Settings - Index Settings Templates) and create templates corresponding to the index aliases (clusterstats, clusterhealth, etc.)

The self-monitoring installation is now complete.

info

For dedicated monitoring cluster architectures, configure cross-cluster search (configuration details available in Useful Links) to enable queries from the target cluster.