“One Log” by Arnaud Bailly & Yann Schwartz
- Applications have input/output, the functional plane
- Also have runtime controls and accidental outputs, the operational plane
- accidental output is logs, tracing, monitoring
- [naming seems misleading, I’d prefer operational output]
Shaping the Log
- Make the accidental output:
- no longer an afterthought
- useful, available
- Log schema:
- Each log entry is an event; all output should conform to a schema that describes the event
- don’t interpolate values into strings; emit type/value pairs
- Structured log:
- log events are parsable, can be filtered, categorized
- Collect infra info
- e.g. query the container, infra, OS info
- Centralize info in one log location:
- application + infra same place
- Log a lot, view using tools instead of raw text
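A minimal sketch of the points above: schema-checked events, type/value pairs instead of interpolated strings, infra info attached to every event, and one output stream. This is not the speakers' implementation. The schema, the event types (`backup_completed`, `rpc_failed`), the field names, and the helpers `infra_info`, `make_event` and `log` are all hypothetical and chosen only for illustration.

```python
import json
import os
import platform
import socket
import sys
import time


def infra_info():
    """Collect basic host/OS facts once, to attach to every event."""
    return {
        "host": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "pid": os.getpid(),
    }


# Hypothetical schema: event type -> required field names and their types.
SCHEMA = {
    "backup_completed": {"size_bytes": int, "duration_s": float},
    "rpc_failed": {"method": str, "peer_ip": str},
}

INFRA = infra_info()


def make_event(event_type, **fields):
    """Build a schema-checked event; raise instead of emitting free-form text."""
    expected = SCHEMA[event_type]
    if set(fields) != set(expected):
        raise ValueError(f"{event_type}: expected fields {sorted(expected)}")
    for name, typ in expected.items():
        if not isinstance(fields[name], typ):
            raise TypeError(f"{event_type}.{name} must be {typ.__name__}")
    return {"type": event_type, "ts": time.time(), "infra": INFRA, **fields}


def log(event_type, stream=sys.stdout, **fields):
    """Write one event as a single JSON line to the shared log stream."""
    event = make_event(event_type, **fields)
    stream.write(json.dumps(event) + "\n")
    return event


# Application and infra info end up in the same line of the same log.
log("rpc_failed", method="GetUser", peer_ip="10.0.0.7")
```

Because every line is a JSON object with a known `type`, downstream tools can parse, filter and categorize events without regexes over free text.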
Log as a Language
[The nomenclature and the metaphor are confusing so far]
Words
- each event/log is considered a word
- we can group events/words
- certain events are similar enough in nature to be considered part of the same group; some details will differ
- e.g. events showing a backup was performed: we ignore the size and timestamp; we only care that this group is about the backup being performed successfully
- e.g. events showing a specific RPC failed: we ignore the exact IP; we only care that this group is about a specific RPC failing
- we can count events/words
- e.g. how many events/words are in each group?
- we can consider monitoring based on this
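The grouping and counting described above can be sketched as follows. Each event is mapped to a "word" by keeping only the fields that define its group and dropping incidental details such as size, timestamp, or IP. The sample events, the `GROUP_KEYS` table and the `word` helper are assumptions made for this sketch, not something shown in the talk.

```python
from collections import Counter

# Hypothetical sample events; field names are illustrative only.
events = [
    {"type": "backup_completed", "size_bytes": 1024, "ts": 100.0},
    {"type": "backup_completed", "size_bytes": 2048, "ts": 200.0},
    {"type": "rpc_failed", "method": "GetUser", "peer_ip": "10.0.0.7", "ts": 210.0},
    {"type": "rpc_failed", "method": "GetUser", "peer_ip": "10.0.0.9", "ts": 215.0},
    {"type": "rpc_failed", "method": "PutUser", "peer_ip": "10.0.0.7", "ts": 220.0},
]

# Fields that identify a group; everything else (size, timestamp, IP) is ignored.
GROUP_KEYS = {
    "backup_completed": (),
    "rpc_failed": ("method",),
}


def word(event):
    """Map an event to its group ("word") by dropping incidental details."""
    keys = GROUP_KEYS.get(event["type"], ())
    return (event["type"],) + tuple(event[k] for k in keys)


# Counting words per group gives a simple basis for monitoring.
counts = Counter(word(e) for e in events)

for group, n in counts.most_common():
    print(group, n)
```

A monitoring rule could then be a threshold on one of these counts, for example alerting when a given `rpc_failed` group exceeds some number per interval.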
Grammar (Relation Between Words)
- We can derive system behavior by looking at sequences of words/events
- Accidental output (e.g. structured logs) can be fed into decision-making parts of the program
- e.g. implement circuit breaker pattern based on the frequency of RPC-failed messages within a time span
- i.e. the accidental output of component 1 becomes the functional input of component 2
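As a rough illustration of the circuit breaker example above, the sketch below consumes structured log events and opens when the number of `rpc_failed` events inside a sliding time window reaches a threshold. The class name `LogDrivenBreaker`, its parameters, and the event fields `type` and `ts` are assumptions for this sketch. It is deliberately simplified: a full circuit breaker would usually also have a half-open state for probing recovery.

```python
from collections import deque


class LogDrivenBreaker:
    """Opens when too many rpc_failed events arrive within a sliding window."""

    def __init__(self, max_failures=3, window_s=10.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures = deque()  # timestamps of recent failure events

    def _evict(self, now):
        # Drop failures that fell out of the time window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()

    def observe(self, event):
        """Feed one structured log event into the breaker."""
        if event["type"] == "rpc_failed":
            self.failures.append(event["ts"])
        self._evict(event["ts"])

    def is_open(self, now):
        """True when calls should be short-circuited at time `now`."""
        self._evict(now)
        return len(self.failures) >= self.max_failures


breaker = LogDrivenBreaker(max_failures=3, window_s=10.0)
for ts in (100.0, 103.0, 106.0):
    breaker.observe({"type": "rpc_failed", "ts": ts})

print(breaker.is_open(now=107.0))  # three failures within the last 10 s
print(breaker.is_open(now=120.0))  # all failures have aged out of the window
```

Here the log stream produced by one component is the only input the breaker needs, which is the sense in which accidental output turns into functional input.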
NLP on Logs
- Apply NLP on logs
- Detect relationships between words using Log2Vec
- e.g. a high-CPU word can correlate with a failure word somewhere else
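To make the idea of related words concrete, here is a much simpler stand-in: counting which words occur close together in time. This is plain co-occurrence counting, not Log2Vec and not an embedding method. The sample stream, the word names, and the `cooccurrences` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (timestamp, word) stream, sorted by time; words are grouped events.
stream = [
    (0.0, "cpu_high"), (1.0, "rpc_failed"),
    (30.0, "backup_completed"),
    (60.0, "cpu_high"), (61.5, "rpc_failed"),
    (90.0, "backup_completed"),
    (120.0, "cpu_high"), (121.0, "rpc_failed"),
]


def cooccurrences(stream, window_s=5.0):
    """Count ordered pairs (earlier word, later word) seen within window_s seconds."""
    pairs = Counter()
    for i, (t1, w1) in enumerate(stream):
        for t2, w2 in stream[i + 1:]:
            if t2 - t1 > window_s:
                break  # stream is time-sorted, so later events are even further away
            if w1 != w2:
                pairs[(w1, w2)] += 1
    return pairs


pairs = cooccurrences(stream)
print(pairs.most_common(1))
```

In this toy stream the pair `("cpu_high", "rpc_failed")` dominates, which is the kind of relationship the notes describe. Raw counts like these only hint at correlation; an embedding-based approach would aim to capture subtler relationships.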