SAFL optimization recommendations
This section provides examples of search optimization in the Search Anywhere Framework Language query language
(hereinafter SAFL). For more information about the list of commands and their purpose, see here.
Using full-text search
Full-text search makes it easier to work with raw events, but it consumes more computing resources. If you know exactly which fields to search, specify them explicitly.
Example query for full-text search of login events:
source sm_cs_auth_indexes
| search "*4624*"
The event code is stored in the event.code field, so the following search is recommended instead:
source sm_cs_auth_indexes
| search event.code="4624"
Using wildcards
SAF supports wildcards for searching by value pattern. A wildcard search increases the number of events to scan and requires more computing resources. In the example below, the file.path field contains both the path to the file and its name:
source sm_cs_auth_indexes
| search file.path="*/example.json"
In such cases, it is recommended to parse the source data according to the ECS standard. After ECS parsing, two fields are available: file.path (the path to the file) and file.name (the file name). The search is then performed on an exact field value, in this case file.name with the value example.json:
source sm_cs_auth_indexes
| search file.name="example.json"
Both examples perform the same task, but the second search is more efficient and uses fewer cluster resources.
Use filtering before data manipulation
For best performance, filter data as early as possible in the query. If you need to filter data that has already been processed, use the where command.
The example below searches for all events with event.code="4624", aggregates them, and only then filters by the user.name field:
source sm_cs_auth_indexes
| search event.code="4624"
| aggs count by user.name, host.name, source.ip
| where user.name=="maksimov.m"
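Because the condition on user.name does not depend on the aggregation result, it can likely be applied before aggs so that fewer events are aggregated. A minimal sketch, assuming the search command accepts the same AND syntax used in the other examples of this section:

```
source sm_cs_auth_indexes
| search event.code="4624" AND user.name="maksimov.m"
| aggs count by user.name, host.name, source.ip
```

The where command remains the right choice for conditions on values that exist only after processing, such as the computed count.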
Search with direct condition
A search with a direct (positive) condition uses fewer resources than a search with a negation condition.
Search with negation condition:
source sm_cs_auth_indexes
| search user.name!="SMART-DC$" AND event.action!="logged-in"
Search with direct condition:
source sm_cs_auth_indexes
| search user.name="maksimov.m" AND event.action="logged-out"
Statistics calculation commands
SAF provides two commands for calculating statistics: aggs and stats.
The aggs command runs at the SAF Data Storage level, which allows it to process a larger data set. For the stats command, take into account the qsize parameter, which limits the number of events processed. Increasing the qsize parameter increases the load on RAM.
The commands for calculating statistics on a timeline work in a similar way: timeaggs and timechart.
Using commands effectively
To optimize the search and reduce the load on the system, you need to choose the right commands.
Each command in a query receives the results of the previous command as its input. For example, if you need to transform data before aggregation (the aggs command), the eval command will not work and an error is displayed. Example:
source sm_cs_auth_indexes
| eval user.domain=lower(user.domain)
| aggs count by user.domain
In this case, you must use the peval command, which uses the internal SAF Data Storage mechanism:
source sm_cs_auth_indexes
| peval user.domain=lower(user.domain)
| aggs count by user.domain
To view the fields available in an event after processing, use the | table * command.
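For instance, appending the command to the peval query above should list the fields that remain after aggregation. This is only a sketch; the exact columns returned depend on the data and on how the preceding commands name their results:

```
source sm_cs_auth_indexes
| peval user.domain=lower(user.domain)
| aggs count by user.domain
| table *
```

Checking the output this way before adding further commands helps confirm the field names that later steps can reference.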
In SAF, you can enrich search query data using a nested subquery. To do this, use the join command:
source sm_cs_auth_indexes
| search user.name!="*$" AND user.name!="unknown" AND event.code=4624
| peval source.address=coalesce(source.address, source.ip)
| aggs count, latest(@timestamp) as latest_time, earliest(@timestamp) as earliest_time, values(source.address) as source.address_values, values(winlog.computer_name) as winlog.computer_name_values by user.name
| eval duration=strptime(latest_time, "YYYY-MM-dd'T'HH:mm:ss.SSS'Z'") - strptime(earliest_time, "YYYY-MM-dd'T'HH:mm:ss.SSS'Z'")
| eval user.name=lower(user.name)
| join user.name
[ source sm_cs_auth_indexes
| search event.code=4634
| peval user.name=lower(user.name)
| aggs max(@timestamp) as time_password_last_change by user.name ]
SAF allows you to use multiple sources in one search query and combine the resulting data.