Self-Monitoring
Self-Monitoring is a module that includes a set of dashboards for centralized cluster health monitoring. It enables timely detection and resolution of cluster and data collection layer anomalies before failures occur.
Conventions
- SAF_INSTALLER - Directory where the Search Anywhere Framework installation package is extracted
- USER - System administrator user, typically admin
- OS_HOME - OpenSearch home directory, typically /app/opensearch/
- OSD_HOME - OpenSearch Dashboards home directory, typically /app/opensearch-dashboards/
- LOGSTASH_HOME - Logstash home directory, typically /app/logstash/
- SBM_HOME - Search Anywhere Framework Beat Manager home directory, typically /app/safBeatManager/
- SB_HOME - Search Anywhere Framework Beat home directory, typically /app/safBeat/
Useful Links
- Installing SA Master Node
- Installing SA Data Storage
- Installing SA Data Collector
- Installing Search Anywhere Framework Beat Manager
- Installing Search Anywhere Framework Beat for Linux
- Configuring Cross Cluster Search
Self-Monitoring Usage
The module consists of dashboards displaying various critical metrics of the target (monitored) cluster, enabling rapid response to any changes or issues.
Monitored components include:
- SA Data Storage: Tracks all processes related to cluster operations
- SA Data Collector: Monitors event ingestion and proper data collection
- SA Master Node: Monitors control components status and cluster coordination
Architecture
Self-monitoring can be deployed using two distinct architecture types:
- Deployment on a dedicated monitoring cluster
- Deployment within the target cluster
Target cluster - the cluster being monitored (the one from which metrics are collected).
Type I Architecture (Dedicated Monitoring Cluster)
We recommend deploying all self-monitoring server components on a single node, particularly for small to medium-sized solutions - this simplifies deployment and maintenance.
Minimum self-monitoring server specifications:
- CPU: at least 4 cores
- RAM: at least 16 GB
- Storage: at least 100 GB
Storage capacity depends on the volume of collected data, the number of monitored clusters, and the data retention period. These parameters can be scaled up as needed.
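As a purely illustrative sizing example (the per-day figure below is an assumption, not a measured value): if a single monitored cluster produces roughly 2 GB of self-monitoring data per day, then monitoring two clusters with a 30-day retention period requires about 2 GB × 2 × 30 = 120 GB, so the 100 GB minimum would need to be increased accordingly.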
In Type I architecture, the self-monitoring server is a fully isolated cluster comprising:
- SA Data Storage - Handles data storage and full-text search
- SA Dashboards - Provides data visualization
- SA Data Collector - Receives data from target cluster agents and polls via REST API
- SAF Beat Manager - Manages agents installed on target clusters
Advantages
Deploying the self-monitoring server on a dedicated cluster offers key benefits:
- Availability during target cluster outages - The independent self-monitoring server remains operational even if the target cluster fails
- Reduced target cluster load - Offloading monitoring eliminates resource contention. Monitoring processes are resource-intensive by nature, so separate infrastructure keeps production cluster performance stable
Disadvantages
While this is the recommended default configuration, consider these factors:
- Additional infrastructure resources - Requires dedicated servers and compute resources, which may increase operational costs
- Increased configuration complexity - To get self-monitoring data from the target cluster, you need to configure the cross-cluster search mechanism (for configuration information, see Useful Links). This adds steps to the configuration process and requires special attention when maintaining the system
Type II Architecture (Primary Cluster)
In the second architecture type, all previously described operations occur within a single cluster where self-monitoring is deployed.
Advantages
- No need for additional servers or their configuration - Eliminates the requirement to allocate and configure extra resources for self-monitoring servers, potentially reducing infrastructure costs and simplifying deployment
- Immediate access to self-monitoring data on the primary cluster - Self-monitoring data becomes instantly available within the target cluster, facilitating easier access and analysis for system operators and administrators
Disadvantages
- Self-monitoring unavailability during cluster failures - If the primary cluster experiences a critical failure, the self-monitoring server also becomes unavailable, potentially complicating system status and performance monitoring
- Additional load on the primary cluster - Running self-monitoring on the primary cluster may impose extra resource demands, which could impact application or service performance
Data Collection
The data collected by self-monitoring can be categorized into two types:
- Log files
- Metrics and other statistics
Log Collection
Log collection is performed from all SA Data Storage, SA Data Collector, and SA Master Node nodes in the target cluster. SAF Beat agents are installed and configured on the target cluster hosts (installation instructions can be found in the Useful Links section). During configuration, the address of the self-monitoring server's SAF Beat Manager is specified; the manager holds the configurations for the data collection and transmission agents, which are then installed and run on the cluster hosts. For log collection, Filebeat is configured to collect the following logs:
For SA Data Storage hosts:
- Cluster logs
- sme logs
- sme-re logs
- job scheduler logs
For SA Data Collector hosts:
- SA Data Collector logs
- SA Data Collector pipeline logs
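As a rough illustration of the kind of Filebeat input the SAF Beat Manager distributes for these hosts, the sketch below tails the OpenSearch log directory; the path and options are assumptions for illustration only, not the actual generated configuration:
# Illustrative Filebeat input; the real apps are produced by generate_pipelines.py
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /app/opensearch/logs/*.log   # assumed location under OS_HOME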
Metrics Collection
For collecting metrics from SA Data Storage, the http_poller input plugin in SA Data Collector is used. This plugin periodically polls the specified REST endpoints.
In the self-monitoring pipeline templates, the http_poller plugin sends identical requests to all master nodes of the target cluster and then uses the throttle filter plugin to discard duplicate responses. This ensures data retrieval even if one or more master nodes fail, as the requests are still executed against the remaining operational nodes.
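For orientation only, a minimal pipeline of this shape might look as follows; the endpoint, schedule, credentials, throttle key, and output settings are illustrative assumptions and do not reproduce the generated self-monitoring pipelines:
input {
  http_poller {
    urls => {
      # in the real templates the same request is defined for every master node;
      # only one node is shown here
      cluster_health => {
        method => get
        url => "https://<master-node-ip>:9200/_cluster/health"
        auth => { user => "<os_user>" password => "${os_pwd}" }
      }
    }
    schedule => { every => "30s" }
    codec => "json"
  }
}
filter {
  # keep the first response per polling interval, tag the rest as duplicates
  throttle {
    key => "%{cluster_name}"   # any field shared by all duplicate responses
    after_count => 1
    period => "30"
    add_tag => ["throttled"]
  }
  if "throttled" in [tags] {
    drop { }
  }
}
output {
  opensearch {
    # credentials and TLS options omitted for brevity
    hosts => ["https://localhost:9200"]
    index => "clusterhealth"
  }
}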
For collecting metrics from SA Data Collector, Metricbeat is used.
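As an illustration, Metricbeat's Logstash module can poll the SA Data Collector monitoring API; the module settings below are assumptions for illustration rather than the configuration generated by this package:
# Illustrative Metricbeat module configuration only
metricbeat.modules:
  - module: logstash
    metricsets: ["node", "node_stats"]
    period: 30s
    hosts: ["localhost:9600"]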
Configuration
The self-monitoring package includes scripts for pipeline generation, agent configurations, and automation of other necessary operations:
- generate_pipelines.py - Generates pipelines and agent configurations, automatically creating the required elements for data collection and agent setup
- generate_opensearch_configs.py - Generates Index State Management (ISM) policies, creates index templates, and copies dashboards. It can also connect to SA Data Storage and create the corresponding policies, index templates, and indices when needed
- import_certs.sh - Adds host certificates to the truststore, which is essential for establishing secure TLS connections when SA Data Collector accesses target hosts via API
Configuration File
All of the scripts mentioned above read their settings from the configuration file config.ini.
The fields of the configuration file are described in the table below:
Field | Description
---|---
rest | Data for loading ISM policies, index templates, and creating indices in SA Data Storage (used by the generate_opensearch_configs.py script)
ism | Data for configuring the ISM policy for indices with self-monitoring data
index_templates | Data for creating index templates for indices with self-monitoring data
poller | Data for the http_poller input plugin that polls the target cluster REST API
logstash | Configuration parameters of the SA Data Collector on the self-monitoring server
beats | Agent configuration settings
Installation
The SA Master Node, SA Data Storage, SA Data Collector, and SAF Beat Manager components (installation information can be found in the Useful Links section) must be properly installed and configured before proceeding with this guide.
The selfmonitoring component is included in the base SAF package and is located in the ${SAF_INSTALLER}/utils/selfmonitoring/ directory. We recommend using the Python interpreter that is also included in the installer package and located at ${SAF_INSTALLER}/utils/python/bin/python3.
Configuration Setup
Populate the config.ini configuration file with actual parameters.
Most parameters related to paths, polling settings, ISM, and index templates can remain unchanged. Required modifications typically involve IP addresses only. However, we strongly recommend carefully reviewing each parameter before use to ensure all values are correct and meet your system's current requirements.
Running Scripts
Generating Files for SA Data Storage
When executed without arguments, the script generates ISM policies, index templates, and dashboards, saving them to the ${SAF_INSTALLER}/utils/selfmonitoring/generated_data directory:
${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py
When executed with the --upload argument, the script both generates the files and uploads their content to the self-monitoring SA Data Storage (also creating the actual indices):
${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py --upload
You may first run the script without the --upload flag to review the output, then rerun it with --upload to upload the results to SA Data Storage after verifying them.
Generating Files for SA Data Collector and SAF Beat Manager
Execute the ${SAF_INSTALLER}/utils/selfmonitoring/generate_pipelines.py script:
${SAF_INSTALLER}/utils/python/bin/python3 ${SAF_INSTALLER}/utils/selfmonitoring/generate_pipelines.py
After execution, the ${SAF_INSTALLER}/utils/selfmonitoring/generated_data directory will contain new subdirectories with data for SA Data Collector and SAF Beat Manager.
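You can verify the result with a quick listing; the exact set of subdirectories shown in the comment below is inferred from the deployment steps later in this guide and may differ slightly in your installation:
ls ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/
# expected entries include: dashboards/  index_templates/  ism/  logstash/  sbm/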
Certificate Import
Execute the ${SAF_INSTALLER}/utils/selfmonitoring/import_certs.sh script:
cd ${SAF_INSTALLER}/utils/selfmonitoring/ && chmod +x import_certs.sh
sudo -u logstash ./import_certs.sh
The script must be executed under the logstash user account.
Script execution flow:
- Creating a new truststore: When launched, the script prompts twice for a new password for the certificate store. Remember this password, as it will be required in subsequent steps
- Certificate retrieval: The script connects to each host and retrieves its certificate. When prompted Trust this certificate? [no]:, answer yes
- Password storage: The script requests passwords for the ts_pwd and os_pwd tokens. For ts_pwd, enter the password created in the first step. For os_pwd, enter the SA Data Storage user password specified in config.ini
If authentication credentials differ between the target and monitoring clusters, and a different token is specified in the logstash.pwd_token configuration field, add it manually on the monitoring cluster's SA Data Collector server:
sudo -u logstash ${LOGSTASH_HOME}/bin/logstash-keystore add <TOKEN_NAME>
- If a truststore already exists on the SA Data Collector at the path specified in the configuration when the script is launched, the script will ask for its current password once
- If no Logstash keystore exists when running the script, you will see The keystore password is not set... Continue without password protection on the keystore? [y/N]; answer y. For an already initialized keystore, the script will prompt for its password if one is set; if no password is set, no additional input is required
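To confirm that the tokens were stored, the keystore contents can be listed; this is a suggested verification step, not part of the script's own flow:
sudo -u logstash ${LOGSTASH_HOME}/bin/logstash-keystore list
# the output should include the ts_pwd and os_pwd tokens (and any custom pwd_token)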
Deploying Generated Files
After completing the previous steps, all required files (build artifacts) will be available in ${SAF_INSTALLER}/utils/selfmonitoring/generated_data.
The directory references below are relative to this location unless absolute paths are specified.
SA Data Storage Configuration
Create ISM policies and index templates using the files in the ism and index_templates directories, respectively. Then create the indices.
Skip this step if you uploaded the SA Data Storage configurations automatically (by running generate_opensearch_configs.py with the --upload flag).
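If you load the artifacts manually, this can be done through the SA Data Storage REST API; the sketch below is an illustration only, in which the host and the policy, template, and index names are placeholders rather than the actual generated artifact names:
# Illustration only: replace the placeholders with the generated file and object names;
# curl will prompt for the admin password
curl -k -u admin -X PUT "https://<sa-data-storage-host>:9200/_plugins/_ism/policies/<policy_name>" \
  -H 'Content-Type: application/json' -d @ism/<policy_file>.json
curl -k -u admin -X PUT "https://<sa-data-storage-host>:9200/_index_template/<template_name>" \
  -H 'Content-Type: application/json' -d @index_templates/<template_file>.json
curl -k -u admin -X PUT "https://<sa-data-storage-host>:9200/<index_name>"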
SA Data Collector Configuration
- Copy the pipelines from ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/logstash/pipelines/ to ${LOGSTASH_HOME}/config/conf.d/
- Transfer the scripts from ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/logstash/scripts/ to the directory specified in the logstash.scripts_path configuration parameter (default: ${LOGSTASH_HOME}/config/conf.d/scripts/)
- Append the contents of the generated pipelines.yml to ${LOGSTASH_HOME}/config/pipelines.yml (an illustrative entry format is shown after this list)
- Modify ownership of the transferred files
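For reference, entries in pipelines.yml follow the standard Logstash format; the pipeline id and path below are illustrative placeholders, not the actual generated values:
# Illustrative pipelines.yml entry; real ids and paths come from the generated file
- pipeline.id: <selfmon_pipeline_id>
  path.config: "/app/logstash/config/conf.d/<selfmon_pipeline>.conf"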
After completing the steps, restart the SA Data Collector service:
sudo chown -R logstash:logstash ${LOGSTASH_HOME}/
sudo systemctl restart logstash.service
SAF Beat Manager Configuration
- Copy the contents of ${SAF_INSTALLER}/utils/selfmonitoring/generated_data/sbm to ${SBM_HOME}/apps/
- Download the required binaries, filebeat and metricbeat (contact technical support if download issues occur), and place them in ${SBM_HOME}/binaries/
- Edit ${SBM_HOME}/etc/serverclasses.yml so that it includes the following groups: linux_selfmon and linux_selfmon_logstash:
groups:
  - name: linux_selfmon
    filters:
      - "<IP 1st node opensearch>"
      - "<IP 2nd node opensearch>"
      ...
      - "<IP N-th node opensearch>"
    apps:
      - filebeat_selfmon
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz
  - name: linux_selfmon_logstash
    filters:
      - "<IP 1-st logstash>"
      - "<IP 2-nd logstash>"
      ...
      - "<IP N-th logstash>"
    apps:
      - filebeat_logstash
      - metricbeat_logstash
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz
      - metricbeat-oss-8.9.2-linux-x86_64.tar.gz
Pay special attention to the filters and binaries sections. In filters, specify the IP addresses of all target cluster nodes for the first group and the IP addresses of all Logstash instances in the target cluster for the second group. In binaries, verify that the archive names match those you downloaded to ${SBM_HOME}/binaries/.
After completing these steps, restart the SAF Beat Manager service:
sudo systemctl restart safBeatManager.service
Dashboard Deployment
SAF Beat must be installed on all SA Data Collector, SA Data Storage, and SA Master Node instances in the target cluster (installation instructions available in Useful Links). Configure SAF Beat to use the monitoring cluster's SAF Beat Manager server.
- In the monitoring web interface, navigate to Dashboards (Main menu - General - Dashboards) and create new dashboards using the JSON files from the dashboards directory
- Update the node selection filters in the dashboards with current cluster data
- In the Module Settings section, go to Index Templates (Main Menu - System Settings - Module Settings - Index Settings Templates) and create templates corresponding to the index aliases (clusterstats, clusterhealth, etc.)
The self-monitoring installation is now complete.
For dedicated monitoring cluster architectures, configure cross-cluster search (configuration details available in Useful Links) to enable queries from the target cluster.
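For orientation, registering the monitoring cluster as a remote on the target cluster typically takes the form shown below; the remote alias, addresses, and transport port are assumptions for illustration, so follow the Configuring Cross Cluster Search guide from Useful Links for the actual procedure:
# Illustration only: the alias "selfmonitoring" and the node addresses are placeholders;
# curl will prompt for the admin password
curl -k -u admin -X PUT "https://<target-cluster-node>:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.remote.selfmonitoring.seeds": ["<monitoring-node-ip>:9300"]
  }
}'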