MELT

Acronyms

MELT - Metrics, Events, Logs, Traces
USE Method - Utilisation, Saturation, Errors
RED Method - Rate, Errors, Duration
"Golden Signals" - Latency, Errors, Traffic, Saturation
"Core Web Vitals" - Largest Contentful Paint, Interaction to Next Paint (which replaced First Input Delay in March 2024), Cumulative Layout Shift

MELT

MELT refers to the four types of telemetry used to observe a system: Metrics, Events, Logs and Traces.

Metrics

Metrics are numeric measurements taken from a system, such as CPU usage %, concurrent user sessions or database transactions per second. Because they are numbers rather than text, they are usually efficient to store and query, and need little processing before they can be presented.
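As a rough illustration of why metrics are cheap to work with, the sketch below treats a metric as a list of (timestamp, value) pairs and summarises it with plain arithmetic. The sample values and the summarise helper are made up for this example and do not come from any particular monitoring tool.

```python
from statistics import mean

# Hypothetical samples: (unix_timestamp, cpu_usage_percent), one per minute.
cpu_samples = [
    (1700000000, 41.0),
    (1700000060, 47.5),
    (1700000120, 93.0),
    (1700000180, 52.5),
]


def summarise(samples):
    """Return the min, max and mean of the numeric values in a series."""
    values = [value for _, value in samples]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


summary = summarise(cpu_samples)
print(summary)
```

No parsing or text matching is needed here, which is the main reason metrics tend to scale well compared with logs.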

Events

An event is a collated set of activities that happened at a point in time, for example a user logging into a web portal, a backup process starting, or an alert being sent from the system. Events and logs are closely related and can look very similar. My own differentiation is that a log is a more granular substep that happened as part of an event. As an example, an event may be that a vending machine purchase was made for $1.40. The individual logs might record the specific denominations that were inserted, any change given, transaction attempts to a payment provider, the inventory being updated and so on.
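The vending machine example can be sketched in code. This is only one way to model the distinction, with invented field names: the granular steps are kept as individual log records, and a single event is derived from them. Amounts are held in cents to avoid floating-point rounding.

```python
# Hypothetical log records: one entry per granular step of the purchase.
logs = [
    {"step": "coin_inserted", "cents": 100},
    {"step": "coin_inserted", "cents": 25},
    {"step": "coin_inserted", "cents": 25},
    {"step": "change_given", "cents": 10},
    {"step": "inventory_updated", "item": "A4", "remaining": 6},
]


def total_cents(records, step):
    """Sum the 'cents' field of every record with the given step name."""
    return sum(r["cents"] for r in records if r["step"] == step)


inserted = total_cents(logs, "coin_inserted")
change = total_cents(logs, "change_given")

# The event is the single, collated fact that the logs add up to.
event = {
    "name": "purchase_completed",
    "timestamp": "2024-01-15T09:30:05Z",  # illustrative value
    "amount_cents": inserted - change,
    "log_count": len(logs),
}
print(event)
```

Here five log records collapse into one event describing a purchase of 140 cents, i.e. the $1.40 purchase described above.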

Logs

A log is a text-based record of something done by the system. Logs vary in granularity and structure, but together they provide a historical record of what happened in a system. Because their structure and content vary, logs usually need some processing to make them consistent before they can be used for reporting.
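The processing step might look something like the sketch below, which maps two different log formats onto one common shape. The line formats, the JSON keys (ts, level, msg) and the normalise function are assumptions chosen for illustration. Real log sources will need their own parsing rules.

```python
import json
import re

# Assumed plain-text format: "<timestamp> <LEVEL> <message>"
PLAIN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def normalise(line):
    """Convert a JSON or plain-text log line into a common dict shape."""
    line = line.strip()
    if line.startswith("{"):
        record = json.loads(line)
        return {
            "timestamp": record.get("ts"),
            "level": record.get("level", "").upper(),
            "message": record.get("msg"),
        }
    match = PLAIN.match(line)
    if match is None:
        # Keep unparseable lines rather than silently dropping them.
        return {"timestamp": None, "level": "UNKNOWN", "message": line}
    return match.groupdict()


raw_lines = [
    "2024-01-15T09:30:00Z INFO coin inserted: 100",
    '{"ts": "2024-01-15T09:30:02Z", "level": "warn", "msg": "payment provider retry"}',
    "garbled line",
]
records = [normalise(line) for line in raw_lines]
for record in records:
    print(record)
```

Once every record has the same keys, filtering by level or sorting by timestamp becomes straightforward.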

Traces

A trace follows a single transaction end to end across multiple systems. Traces are most useful in distributed environments but can also be used in single-system software. A trace tracks the activity of a workflow across different components of the software and infrastructure stack and represents the full lifecycle of that transaction. For example, a user logon may have a trace from the load balancer to the web server, through to the third-party identity provider, and on to the resulting page that is generated and sent back to the user. This can help to identify problem areas across the whole service, such as delays between the web server and database layer that may not be visible when looking at the two systems individually.
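A trace is commonly represented as a set of spans that share a trace ID, where each span records one component's part of the transaction and points at its parent. The sketch below uses an invented Span class and made-up timings for the logon example. It is not the data model of any specific tracing library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    component: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# Hypothetical spans for one user logon, all sharing trace_id "t1".
spans = [
    Span("t1", "a", None, "load_balancer", 0, 480),
    Span("t1", "b", "a", "web_server", 5, 470),
    Span("t1", "c", "b", "identity_provider", 20, 140),
    Span("t1", "d", "b", "database", 150, 450),
]


def self_time(span, all_spans):
    """Time spent in a span, excluding its direct children.

    Assumes the children of a span do not overlap in time.
    """
    children = [s for s in all_spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


slowest = max(spans, key=lambda s: self_time(s, spans))
print(slowest.component, self_time(slowest, spans))
```

With these timings the database span accounts for most of the request, which would be hard to see from the web server's total duration alone.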

More Information

Splunk - MELT Explained: Metrics, Events, Logs & Traces

New Relic - MELT 101 - An introduction to the four essential telemetry data types