OpenSearch is a common choice for storing application logs. While its inverted index provides full-text search capabilities that help with log searching, using OpenSearch as an observability solution comes with a few drawbacks:
- The inverted index grows quickly in proportion to the raw data, adding a large storage overhead.
- In the observability use case, indexing the fields of every document can be unnecessary, because users usually focus on a short time range of data.
- Metrics are usually not stored in OpenSearch alongside logs. This makes it difficult to correlate logs with metrics, and the user experience becomes inconsistent if other products are used for metrics.
This document explores an alternative approach that stores logs in S3 to address these issues and bring down cost.
Data flow
Indexing
1. A standalone PPL library holds log patterns, either configured by the user or extracted from existing logs.
2. The ingester receives raw logs from the application/collector.
3. The ingester uses the PPL library to process the raw logs and sends derived metrics to an OpenSearch metrics index.
4. The ingester compresses logs once certain conditions are met (elapsed time, log size) and sends them to S3.
5. The ingester sends log chunk metadata to an OpenSearch S3 metadata index.
6. Dashboards Observability displays visualizations by querying metrics through the PPL plugin in OpenSearch.
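The buffering and flushing behavior of the ingester (steps 2, 4, and 5) could be sketched roughly as below. This is a minimal illustration, not an existing implementation; the class name, size/time thresholds, and metadata fields are assumptions modeled on the metadata example later in this document.

```python
import gzip

MAX_CHUNK_BYTES = 25 * 1024 * 1024  # illustrative cap on buffered raw bytes
MAX_CHUNK_SECONDS = 3600            # illustrative time limit (one hour)

class Ingester:
    """Sketch: buffer raw log lines, then flush a compressed chunk
    (destined for S3) together with its metadata document (destined
    for the S3 metadata index) once a size or time condition is met."""

    def __init__(self):
        self.buffer = []
        self.chunk_start = None

    def receive(self, timestamp, raw_line):
        """Accept one log line; return (chunk_body, metadata) on flush, else None."""
        if self.chunk_start is None:
            self.chunk_start = timestamp
        self.buffer.append((timestamp, raw_line))
        if self._should_flush(timestamp):
            return self._flush()
        return None

    def _should_flush(self, now):
        size = sum(len(line) for _, line in self.buffer)
        return size >= MAX_CHUNK_BYTES or (now - self.chunk_start) >= MAX_CHUNK_SECONDS

    def _flush(self):
        body = gzip.compress("\n".join(line for _, line in self.buffer).encode())
        metadata = {
            "meta": {
                "type": "s3",
                "uri": "example-bucket/...",  # hypothetical upload destination
                "startTime": self.buffer[0][0],
                "endTime": self.buffer[-1][0],
            }
        }
        self.buffer, self.chunk_start = [], None
        return body, metadata
```

A real ingester would upload `body` to S3 and index `metadata` into OpenSearch; those calls are omitted here to keep the sketch self-contained.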
User sample workflow
1. The user notices spikes in metrics in Dashboards Observability.
2. Observability uses the S3 metadata index to locate the S3 objects containing the logs from when the spike happened.
3. The user queries the S3 metadata index with a parse pattern and filters.
4. PPL pulls the objects from S3 and applies the parse pattern and filters.
5. The user identifies the root cause of the metric spike in the returned logs.
Functional requirements
- Users should be able to define log metric patterns in the PPL library.
- Users should be able to configure the PPL library to connect to an S3 bucket and an OpenSearch endpoint.
- Users should be able to integrate the PPL library with existing ingestion solutions.
- Users should be able to view metrics in Dashboards Observability and tail the corresponding logs in S3 using PPL.
- Users should be able to use regular PPL commands on S3 results.
Non-functional requirements
- Ingesting to S3 should be more CPU- and memory-efficient than ingesting to OpenSearch.
- Latency between input and output (S3) will exist but should be small.
Non-goals
- Full push-down to S3 Select (needs evaluation)
Terms
Log chunk: S3 object size does not impact the performance of queries with the same LIMIT, but it does impact pagination performance. As a result, logs will be divided by a fixed time period (e.g. one hour) and a maximum file size (e.g. 25 MB compressed); each compressed S3 object is a log chunk. Chunks cannot be too small, otherwise they reduce the compression ratio and increase the overhead of retrieving objects.
S3 metadata index: each log chunk corresponds to a document in the S3 metadata index on OpenSearch, containing the S3 object URI and the start and end timestamps of the logs in the chunk.
```
// metadata example
"_source" : {
  "meta" : {
    "type" : "s3",
    // second log chunk for apache logs between 5PM and 6PM, containing logs from
    // 2022-04-04 17:42:57 to 2022-04-04 17:59:59
    "uri" : "sample-s3-ppl-logs-bucket/2022/04/04/apache-logs.17.2.log.gz",
    "startTime" : "2022-04-04T17:42:57.754Z",
    "endTime" : "2022-04-04T17:59:59.185Z"
  }
}
```
Implementation
How to query by time range
Each document has a startTime and an endTime; to get all S3 objects within a given time range (e.g. 2022-04-04 17:11:00 to 2022-04-04 19:43:00), the query would be
```
... | where
  `startTime` <= '2022-04-04 17:11:00' and `endTime` >= '2022-04-04 17:11:00'
  or `startTime` <= '2022-04-04 19:43:00' and `endTime` >= '2022-04-04 19:43:00'
  or `startTime` >= '2022-04-04 17:11:00' and `endTime` <= '2022-04-04 19:43:00'
```
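The three clauses above together implement a standard interval-overlap test: a chunk matches when it starts before the query range ends and ends after the range starts. A minimal sketch of that selection (the chunk tuples and time values here are illustrative, expressed in minutes since 17:00):

```python
def overlaps(chunk_start, chunk_end, range_start, range_end):
    """A chunk overlaps the query range iff it starts no later than the
    range end and ends no earlier than the range start. This single
    predicate is equivalent to the three-clause PPL filter above."""
    return chunk_start <= range_end and chunk_end >= range_start

# Illustrative chunks as (start, end) pairs, in minutes since 17:00.
chunks = [(0, 20), (21, 59), (60, 119), (120, 179), (180, 200)]
query = (11, 163)  # 17:11:00 to 19:43:00
selected = [c for c in chunks if overlaps(*c, *query)]
```

Every chunk touching the range is selected, including the first one, which only partially overlaps it; trimming the excess logs inside a partially overlapping chunk is what the offset metadata below addresses.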
To exclude logs from 17:00:00 to 17:11:00 and from 19:43:00 to 20:00:00, pagination and additional metadata would be needed. One implementation could be to store the latest log line number after every fixed interval in the object metadata. For example, using a fixed interval of 10 minutes, the metadata of apache-logs.17.1.log.gz could have
```
"offset": [10318, 19908, 30631, 40710]
// 17:00:00 to 17:10:00 corresponds to log lines 0 to 10318
// 17:10:00 to 17:20:00 corresponds to log lines 10319 to 19908
// ...
```
Then 17:11:00 rounds down to 17:10:00, and PPL uses pagination to skip to the offset: `... | head 10000 from 10318`
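The offset lookup described above could be sketched as follows; the function name and the assumption that the query start time and chunk start time are given in seconds are both illustrative.

```python
INTERVAL_SECONDS = 600  # 10-minute fixed interval, as in the example

def skip_offset(offsets, chunk_start_second, query_start_second):
    """Round the query start time down to the enclosing fixed interval
    and return the line-number offset to pass to pagination. offsets[i]
    records the last line number of interval i within the chunk; a query
    starting in the first interval needs no skipping."""
    interval = (query_start_second - chunk_start_second) // INTERVAL_SECONDS
    if interval <= 0:
        return 0
    return offsets[interval - 1]
```

For the metadata above, a query starting at 17:11:00 against a chunk starting at 17:00:00 falls in interval 1, so pagination skips to offset 10318, matching the `... | head 10000 from 10318` example.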
How to configure metrics from logs
The PPL library will use an expression to extract fields from logs, then run an aggregation query after every fixed interval to derive metrics.
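As one concrete illustration of deriving a metric from a field-extraction expression, the sketch below parses a status code out of Apache-style access log lines with a regular expression and aggregates a request count per status. The pattern, field names, and function name are assumptions, not part of the PPL library.

```python
import re
from collections import Counter

# Hypothetical extraction pattern for Apache-style access logs.
APACHE_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def count_by_status(lines):
    """Extract the status field from each log line and aggregate a
    request count per status code -- one example of a derived metric
    that could be sent to the OpenSearch metrics index each interval."""
    counts = Counter()
    for line in lines:
        m = APACHE_PATTERN.match(line)
        if m:  # lines that do not match the pattern are skipped
            counts[m.group("status")] += 1
    return dict(counts)
```

Run once per fixed interval over the buffered lines, this yields a time series of per-status request counts, which is the kind of metric the user workflow above inspects for spikes.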