baler  2.3.1
Baler daemon

SYNOPSIS

balerd [OPTIONS]

DESCRIPTION

balerd (Baler Daemon) is the core program that processes input messages (prepared by various input plugins; see Baler Input Plugin Interface). Processing starts when an input plugin prepares an input entry and posts it to balerd's input queue. In an input entry, a message is decomposed into roughly three fields: a timestamp, a host or component name, and the list of tokens composing the message.

balerd processes an input entry by transforming the host name and tokens into numbers (IDs). At this stage, the message is described as a sequence of token IDs instead of a sequence of tokens. The mapping (token_ID <–> token) is stored in balerd's internal store.
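The token-to-ID step can be pictured with a minimal Python sketch. This is not balerd's implementation: the class name TokenMap, the in-memory dictionaries, and the sequential ID assignment are assumptions made for illustration. The only detail taken from this page is that IDs below 128 are reserved.

```python
class TokenMap:
    """Illustrative bidirectional token <-> ID mapping (not balerd's store)."""

    FIRST_ID = 128  # IDs below 128 are reserved for balerd internal use

    def __init__(self):
        self.id_of = {}     # token -> ID
        self.token_of = {}  # ID -> token
        self.next_id = self.FIRST_ID  # assumption: IDs are handed out in order

    def get_or_insert(self, token):
        """Return the ID of token, assigning a new one on first sight."""
        if token not in self.id_of:
            self.id_of[token] = self.next_id
            self.token_of[self.next_id] = token
            self.next_id += 1
        return self.id_of[token]


def encode(token_map, tokens):
    """Describe a message as a sequence of token IDs."""
    return [token_map.get_or_insert(t) for t in tokens]
```

A repeated token maps to the same ID each time, so the ID sequence can be translated back to the original strings through token_of.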

Next, balerd extracts a pattern from the message by preserving the static tokens and replacing each variable token with the special token '*'. Currently, the heuristic for identifying a static token is to check whether it is an English word; if so, it is treated as a static token. The extracted pattern is checked against, or inserted into, a pattern mapping (pattern_ID <–> pattern) to obtain a pattern_ID. The pattern mapping is also stored inside balerd's internal store. The message is then reduced to the form <pattern_ID, token_ID0, token_ID1, ...>, where the token_ID#'s are the token IDs at the variable positions.
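The pattern extraction and message reduction described above can be sketched as follows. This is a hedged illustration, not balerd's code: the word set ENGLISH_WORDS stands in for a loaded ENG token list, the helper names are hypothetical, pattern IDs starting at 0 are an assumption, and the token IDs in the usage note are made up.

```python
# Stand-in for an English word list; a real list would be loaded from a file.
ENGLISH_WORDS = {"connection", "from", "port", "closed"}


def extract_pattern(tokens, words=ENGLISH_WORDS):
    """Keep static (English) tokens, replace variable tokens with '*'."""
    pattern = []
    variables = []
    for tok in tokens:
        if tok in words:
            pattern.append(tok)
        else:
            pattern.append("*")
            variables.append(tok)
    return tuple(pattern), variables


class PatternMap:
    """Illustrative pattern -> pattern_ID mapping."""

    def __init__(self):
        self.id_of = {}

    def get_or_insert(self, pattern):
        # Assumption: new patterns receive consecutive IDs starting at 0.
        return self.id_of.setdefault(pattern, len(self.id_of))


def reduce_message(tokens, pattern_map, token_ids, words=ENGLISH_WORDS):
    """Reduce a message to (pattern_ID, token_ID0, token_ID1, ...)."""
    pattern, variables = extract_pattern(tokens, words)
    pattern_id = pattern_map.get_or_insert(pattern)
    return (pattern_id, *[token_ids[v] for v in variables])
```

With a hypothetical token table {"10.0.0.1": 200, "22": 201}, the message "connection from 10.0.0.1 port 22" yields the pattern ("connection", "from", "*", "port", "*") and reduces to (0, 200, 201). A second message that differs only in the address shares the same pattern ID.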

The processed (reduced) message is then forwarded to Baler Output Plugins (see Baler Output Plugin) for further processing and message storage.

balerd input and output plugins can be configured via the Baler configuration file. Please see the CONFIGURATION section below.

To support large-scale systems, multiple balerd instances are needed to process the large amount of input in parallel. One can run multiple balerd instances independently, but this can be problematic because the same pattern (or token) appearing in two different balerd instances can be assigned different IDs. To solve this problem, balerd can run in one of two modes: master and slave (see option -m). There should be only one master and multiple slaves. The master balerd knows all of the mappings. Each slave balerd knows mappings only on a need-to-know basis. When a slave balerd encounters a new token (or pattern), it asks the master to assign the ID. Once the ID is assigned, the slave also stores that ID mapping locally to reduce further network traffic.
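The master/slave ID assignment can be sketched in Python as below. This is a simplified model under stated assumptions: the Master and Slave classes are hypothetical, a direct method call stands in for the network request over the Zap transport, and the requests counter exists only to show the effect of the slave's local cache.

```python
class Master:
    """Illustrative master: the single authority for ID assignment."""

    def __init__(self, first_id=128):
        self.id_of = {}
        self.next_id = first_id
        self.requests = 0  # counts simulated slave-to-master requests

    def assign(self, token):
        self.requests += 1
        if token not in self.id_of:
            self.id_of[token] = self.next_id
            self.next_id += 1
        return self.id_of[token]


class Slave:
    """Illustrative slave: asks the master once, then answers locally."""

    def __init__(self, master):
        self.master = master
        self.cache = {}  # local copy of the mappings this slave has needed

    def get_id(self, token):
        if token not in self.cache:
            # Stand-in for a network round trip to the master.
            self.cache[token] = self.master.assign(token)
        return self.cache[token]
```

Two slaves that see the same token receive the same ID from the master, and a slave that sees the token again answers from its cache without contacting the master.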

OPTIONS

-l LOG_PATH
Path to the log file (default: none; balerd logs to stdout)
-s STORE_PATH
Path to a baler store (default: ./store)
-C CONFIG_FILE
Path to the configuration (Baler commands) file. This is optional as users may use ocmd to configure baler.
-F
Run in foreground mode (default: daemon mode)
-m SM_MODE
Either 'master' or 'slave' mode (default: master).
-x SM_XPRT
Zap transport to be used in slave-master communication (default: sock).
-h M_HOST
For slave balerd, this option specifies master hostname or IP address to connect to. This option is ignored in master-mode. This option is required in slave-mode balerd and has no default value.
-p M_PORT
For slave balerd, this specifies the port of the master to connect to. For master balerd, this specifies the port number to listen on. (default: ':30003').
-z OCM_PORT
Specify the port for receiving OCM connections and configuration (default: 20005).
-I NUMBER
Specify the number of input worker threads (default: 1).
-O NUMBER
Specify the number of output worker threads (default: 1).
-?
Display help message.
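
As a rough illustration of how these options combine, the following Python sketch assembles argument lists for one master and one slave. The helper names, store paths, and host name are hypothetical. The -p option is omitted so that both sides fall back to the default port. Nothing here launches balerd; the lists could be passed to subprocess.run by the caller.

```python
def master_argv(store, log=None):
    """Build an argument list for a master-mode balerd (illustrative)."""
    argv = ["balerd", "-m", "master", "-s", store]
    if log is not None:
        argv += ["-l", log]
    return argv


def slave_argv(store, master_host, log=None):
    """Build an argument list for a slave-mode balerd.

    -h is required in slave mode and has no default value.
    """
    argv = ["balerd", "-m", "slave", "-h", master_host, "-s", store]
    if log is not None:
        argv += ["-l", log]
    return argv
```

For example, slave_argv("/tmp/store.s1", "head0") produces the equivalent of the command line "balerd -m slave -h head0 -s /tmp/store.s1".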

CONFIGURATION

Baler configuration file (OPTION -C) contains a sequence of balerd config commands to configure balerd. The available commands are documented as follows.

CONFIGURATION COMMANDS

tokens type=(ENG|HOST) path=PATH
Load ENG or HOST tokens from PATH. Please see HOST AND TOKEN FILE FORMAT below for more information.
plugin name=PLUGIN_NAME [PLUGIN-SPECIFIC-OPTIONS]
Load the plugin PLUGIN_NAME and configure it with PLUGIN-SPECIFIC-OPTIONS. The specified plugin can be an input plugin, an output plugin, or both, if the developer implements it that way. Conventionally, input plugin names start with 'bin_' and output plugin names start with 'bout_'. It is advisable to load output plugins BEFORE input plugins to prevent loss of output data, as balerd could finish processing some of the input before the output plugins finish loading. Please see each plugin's documentation for its specific options (e.g. bin_rsyslog_tcp.config(5)).
# comment
A '#' comment at the beginning of a line is supported. However, an in-line trailing '#' comment is not supported. For example:
# This is a good comment.
tokens type=ENG path=my_dict # This is a bad comment.
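
The comment rule can be illustrated with a small Python sketch of a line-oriented parser. This is not balerd's parser: the function parse_config is hypothetical, and raising an error on a trailing '#' comment is an assumption chosen to make the problem visible (balerd's actual behavior for such lines is not documented here). The sketch also assumes that '#' must be the very first character of a comment line.

```python
def parse_config(text):
    """Parse config text into a list of (command, options) pairs."""
    commands = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and whole-line comments only.
        if not line.strip() or line.startswith("#"):
            continue
        name, *args = line.split()
        options = {}
        for arg in args:
            key, sep, value = arg.partition("=")
            if not sep:
                # A trailing '# ...' ends up here because it is not stripped.
                raise ValueError(
                    f"line {lineno}: expected key=value, got {arg!r}")
            options[key] = value
        commands.append((name, options))
    return commands
```

Because trailing comments are never stripped, the line "tokens type=ENG path=my_dict # This is a bad comment." fails in this sketch instead of being silently accepted.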

CONFIGURATION EXAMPLE

tokens type=ENG path=/path/to/word.list
tokens type=HOST path=/path/to/host.list
# Image output with 3600 seconds (1 hour) pixel granularity.
plugin name=bout_sos_img delta_ts=3600
# Another image output with 60 seconds (1 minute) pixel granularity.
plugin name=bout_sos_img delta_ts=60
# Message output
plugin name=bout_sos_msg
# Input plugin for rsyslog, don't forget to configure rsyslog in each
# node to forward messages to balerd host, port 11111.
plugin name=bin_rsyslog_tcp port=11111
# Input processing plugin for metric data. The metric data will be converted
# into message-based event data (metricX is in range [A, B]) to feed to
# balerd.
plugin name=bin_metric port=22222 bin_file=METRIC_BIN_FILE

For the details of each plugin's configuration, please see the respective plugin configuration page (e.g. bin_rsyslog_tcp.config(5)).

HOST AND TOKEN FILE FORMAT

Each line of the file contains a token with an optional ID assignment: TOKEN [ID]

Token aliasing can be done by assigning those tokens the same token ID.

TOKEN FILE

The following is an example of a token file with aliasing:

ABC 128
DEF 128
XYZ

Please note that token IDs less than 128 are reserved for balerd internal use. In the above example, if ABC or DEF appears in a message, both are recognized as the same token. If the ID is not present, balerd automatically assigns max_ID + 1.

balerd output always shows the first alias, because balerd stores messages as sequences of token IDs, which are translated back to strings at output.
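
The aliasing and automatic ID rules can be sketched in Python as follows. This is an illustration under assumptions, not balerd's loader: the function name is hypothetical, and max_ID is taken to mean the largest ID seen so far in the file, starting just below the first non-reserved ID.

```python
RESERVED_BELOW = 128  # token IDs below this are reserved for balerd


def load_token_file(text):
    """Return (id_of, first_alias) for token file text with 'TOKEN [ID]' lines."""
    id_of = {}        # token -> ID (aliases share an ID)
    first_alias = {}  # ID -> first token listed with that ID
    max_id = RESERVED_BELOW - 1  # assumption: automatic IDs begin at 128
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        token = fields[0]
        # Explicit ID if given, otherwise max_ID + 1.
        token_id = int(fields[1]) if len(fields) > 1 else max_id + 1
        max_id = max(max_id, token_id)
        id_of[token] = token_id
        first_alias.setdefault(token_id, token)
    return id_of, first_alias
```

For the example file above, ABC and DEF both map to 128, XYZ is assigned 129, and ID 128 translates back to ABC because it is the first alias listed.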

HOST FILE

The following is an example of a host file with aliasing:

nid00000 0
login0 0
nid00001 1
login1 1

Host IDs start from 0 to make things more convenient for users. balerd converts them into the real token ID space (which starts from 128) internally.

In the above example, the host field of messages generated from nid00000 and login0 will be recognized and stored as 0. As with the token file, if the ID is not present, balerd automatically assigns max_ID + 1.

Please note that on the output side, the first alias will be printed.
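
The host file rules can be sketched in the same way. The sketch below is illustrative only: the function names are hypothetical, and the conversion to the internal token ID space is assumed to be a fixed offset of 128, which this page implies but does not state exactly.

```python
HOST_ID_OFFSET = 128  # assumption: internal ID = host ID + 128


def load_host_file(text):
    """Return (host_id_of, first_alias) for host file text with 'HOST [ID]' lines."""
    host_id_of = {}   # host name -> user-visible host ID
    first_alias = {}  # host ID -> first name listed with that ID
    max_id = -1       # host IDs start from 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        host_id = int(fields[1]) if len(fields) > 1 else max_id + 1
        max_id = max(max_id, host_id)
        host_id_of[name] = host_id
        first_alias.setdefault(host_id, name)
    return host_id_of, first_alias


def to_token_id(host_id):
    """Map a user-visible host ID into the internal token ID space."""
    return host_id + HOST_ID_OFFSET
```

With the example host file, nid00000 and login0 both resolve to host ID 0, and output for host ID 0 would print nid00000, the first alias.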