Structured Message Formats: Thoughts on What To Log

In general, logging and messaging aren’t the most glamorous aspects of software engineering, but necessary. Getting logging right, out of the gate, for your project is important. Specifically, what should be in a structured log message?

We’ve built a semi-opinionated library structured-messaging, that enables you to easily create structured logs using Winston.

Original logging dating back hundreds of years.

The post is not about persisting messages or aggregating all messages generated from different compute instances to a single logging destination. It is not about a logging library (see Winston, Bunyan, etc.). It is not about defining an industry-standard (see RFC5424 Syslog, etc.).

This post discusses why you should structure your log and event messages and what kind of information you may want to consider when logging.

Logging is Really Messaging

Logging and events both use messaging but for different purposes. Event messages are similar to"a user logged in” or “a user failed to login.” Logging messages are similar to “File ‘xys.json’ opened” or "An exception was thrown while trying to access a file."

Even though this post focuses on logging messages, we will also consider event messages.

Why Structured Format For Messages and Logging?

Logging information using an ad-hoc format makes it difficult for people to gain insight into what is happening in a computing environment: this is especially the case for data analysts who may need to use the logs for just such a purpose.

Providing a consistent message format enables:

  • systems like Apache Spark to easily consume and analyze logs.
  • Automation of routing messages in Kafka.
  • Reformating log messages for easy human review (JSON -> YAML).

Structured logging requires a data format. In this article, we’ll recommend using JSON even though other data formats could be considered.

Design Considerations

After reviewing different logging formats and chatting with people of different backgrounds and needs, here are some of our takeaways:

  • Local Development: Logging can’t be too noisy. So, minimizing the number of elements in the message is needed.
  • Local Development: Having a nice visual representation of the log messages is helpful: red for errors, green for info, yellow for warnings, etc.
  • Idempotence: Sending the same message twice doesn’t result in double counting.
  • Verification: In some situations, we may need to verify that the message has not changed during transit.

A Structured Message Format

Covered in detail later, the following is a complete message:

{
"id": "d96b195b-b24b-4981-8d28-46a68f07ff0a",
"level": "info",
"message": "Running SQL for Alan: 'SELECT * FROM types;'.",
"topics": ["log", "notification"],
"priority": "1",
"template": "Running SQL for %{data.name}: '%{data.query}'.",
"transactId": "63bd5a30-398a-4ebb-9e83-ec2c4e92aa89",
"sessionId": "5565d96b-2b1f-4889-8e66-5b994036d1bf",
"traceId": "e95d9220-62e1-42e5-a0c5-b158dab7f2ec",
"memberId": "someuser",
"time": {
"format": "iso8601",
"created": "2020-10-20T04:39:34+07:00",
"expires": "2020-12-20T09:00:34+07:00",
},
"pii": ["data.name"],
"dataSchema" : {
"name": "sql",
"ver": "1.0.0"
},
"data" : {
"query": "SELECT * FROM types;",
"name": "Alan",
},
"context" : {
"app": {
"env": "dev",
"name": "loginService",
"platform": "magicCrm",
"file": "/Users/me/proj/login-service/query.ts",
"line": 34,
},
"compute": {
"host": "localhost",
"shardId" : "9bc423dc-61f2-419c-9f53-8f417a74863e",
"processId": "4545",
},
"hostedBy": {
"name": "aws",
"___": "hosted by specific properties",
},
},
"hash": {
"algo": "Sha1",
"code": "4d90a79012905e89ea4473258e1874fcce12a593"
}
}

Starting With the Minimum

Running an application locally would require a totally different message format than running an application in a company with hundreds of applications located within multiple cloud providers. IT would need to know which cloud provider the log message came from. A developer running an application locally won’t care what platform the message came from.

However! We want our message format to be consistent irrelevant to whoever is consuming that message. So, we’ll start with a format that is small and build on that.

This is a valid display of the message: a developer may want to use it when debugging locally.

{
"id": "d96b195b-b24b-4981-8d28-46a68f07ff0a",
"level": "info",
"message": "Running SQL for Alan: 'SELECT * FROM types;'."
}

NOTE: The message isn’t displaying all the required fields, but the original message had all the required fields.

Structured Message Properties

id (required)

Every message generated is given a globally unique id. A system consuming a message can assure it only stores the message one time (should the same message somehow be sent twice). This means systems that consume messages can be idempotent.

Example: "id": “c7963876–135c-11eb-adc1–0242ac120002”

NOTE: No specific UUID type is recommended.

level (required)

A log level enables a log message to be filtered based on the message's importance.

Example: "level": "warn" , "level": "info"

NOTE: Different logging platforms use different log level enumerations: a few examples being Winston, Log4J, log4net, and python’s logging. It is essential to consider the differences because a large company may be using one or more of these logging libraries. This would negatively affect groups like IT, infrastructure, dev-ops, and data science.

topics (required)

A topic helps in routing messages. This is especially useful if you use services that route based on Topics: Kafka, Kinesis, etc.

At the minimum, a message should contain a log or event topic.

A topic can also be used to scope the message's purpose further, especially if it is an event.

Examples: "topics": ["event", "auth", "login"], "topics": ["event", "auth", "logout"]

It may even be easier to consume messages on the backend if you add the log level in the topics: "topics": ["log", "warn"],, "topics": ["log": "error"].

NOTE: We considered moving level to the topic property, but it’s generally expected that logs have alevel property.

priority (required)

Priority can further help with routing: enabling events to “jump” the processing queue. Different systems may have values that represent different priorities. For example, Microsoft’s message priority, Mail Kit, etc.

Examples: “priority": 1, “priority": 2

message (required)

A simple human-readable message. The message is the final formatted message requiring no further processing.

Examples: “message": "User logged out.”, “message": "Error opening file.”

template (optional)

The template field enables localization for error messages.

An original template such as:

“template”: “Running SQL for ${data.name}: ‘${data.query}’.”

could then be run through a translator resulting in :

“template”: “Ejecutando SQL para ${data.name}: ‘${data.query}’.”.

In these examples, we’re using Javascript’s template literals. The expression contains a string that defines the location in the message where the value should be pulled. The expression data.name will pull the name value from the data object of the message. The expression context.app.env will pull the env value from the appobject which is under the context object.

Further, pulling out the values that change for each message enables the message's reformatting based on the consumer. A developer's error message could be logged and then re-formatted to forward to the user of a front-end application.

transactId (required)

For transactions that span multiple services, an id that followed the transaction to completion is super helpful. The tarnsactId is generated at the transaction source (say a button click by a user in the UI) and propagates throughout the service.

The transactId can be used for debugging and also used to gain insight into the performance aspects of your system.

NOTE: If a transactId has not been provided, the logging service would generate one.

NOTE: A transactIddoes require extra effort on the part of developers. For example, a developer would need to pass a transactId to the client's service via an HTTP header (for example) and pass the id on to other services that may be called, etc.

Examples: "transactId": "d5d98834–443d-4ac4-b3e0–5935fe476b56"

sessionId (optional)

The sessionId A unique id used to manage a session.

Examples: "sessionId": "d9af77b4–1dfe-4a73-a8c2-f0c454e1dc70"

time (required)

The time object contains:

  • format (optional) — The date format. We’re using iso8601 with the UTC offset.
  • created (required) — The date and time the message was created.
  • expires (optional) —The date and time the message should be expired.

Example:

"time": {
"format": "iso8601",
"created": "2020-10-20T04:39:34+0700",
"expires": "2020-12-20T09:00:34+0700"
}

pii (optional)

There may be cases where information within the message is considered personally identifiable information. It isn’t recommended that PII is logged, but there may be cases where you really need to know the email address to debug an issue.

Providing a PII list enables a system to mangle the information based on who is viewing the data.

The PII property contains a string that defines the location in the message where the value should be obfuscated upon output. The expression data.email would obfuscate the email address.

Examples: "pii": ["data.email", "message"] would result in the following output:

{
level: "warning",
message: "***************************",
data: {
"email": "s******r@*******.com"
}
}

schema (optional)

The schema object defines the name and version of the schema that could be applied to the data property when dehydrating the data: turning data into a native object instance. This enables tools like JOI to validate the information located in the data property.

  • name (optional) — The name or type of schema located in the data property.
  • ver (required) — The version of the schema that is located in the data property.

Example:

"schema" : {
"name": "sql",
"ver": "1.0.0"
}

data (optional)

The data property provides an area in the message where you can put any data. For example, you could put any information that varies in the message property in the data and take advantage of the template property.

Example:

"data" : {
"query": "SELECT * FROM types;",
"name": "Alan",
}

context (required)

The context provided a picture of the situation (context) when the message was generated. This is probably the most useful section of the message when trying to track down bugs.

  • app.env (required)— The environment the software was running in when the message was logged. Examples: dev, stage, prod, etc.
  • app.name (optional) — The name of the application/service.
  • app.platform (optional) — The platform the application/service resides in. A platform is an aggregation of applications/services that are installed together.
  • app.file (optional) — The line number the message was sent from.
  • compute.host (optional) — The hostname of the application if it has one.
  • shardId (optional) — If the application runs in a cluster, then the shardId is used to identify a compute instance within the cluster.
  • processId (optional) — The id of the process that was running when the message was sent.
  • hostedBy.name (optional) — The service name or cloud platform name the message was sent from.
  • hostedBy.___(optional) — Properties specific to the host. For example, with AWS, you might include the region, the role the application was running under, etc.

Example:

"context" : {
"app": {
"env": "dev",
"name": "loginService",
"platform": "magicCrm",
"file": "/Users/me/proj/login-service/query.ts",
"line": 34,
},
"compute": {
"host": "localhost",
"shardId" : "9bc423dc-61f2-419c-9f53-8f417a74863e",
"processId": "4545",
},
"hostedBy": {
"name": "aws",
"___": "hosted by specific properties",
},
},

hash (optional)

In some situations, it may be necessary to verify that a message has not been tampered with. For situations like this, use the hash property. NOTE! The hash value should be calculated BEFORE the hash property is added to the message.

  • algo (required)— The algorithm used to generate the hash value. Some example algorithms are Md5, Sha1, Sha256, Sha384, Sha512, Crc32, Crc32b, Gost, Whirlpool, Ripemd160, and Crypt hash.
  • code (required) — The hash code value.

Example:

"hash": {
"algo": "Sha1",
"code": "4d90a79012905e89ea4473258e1874fcce12a593"
}

NOTE: The hash code is generated against a properly formatted JSON message. Property order also matters. As such, deserializing and then serializing the message may change the order of properties invalidating the hash code.

Conclusion

Hopefully, this post has given you some ideas about what types of information you would like to include in your logs.

Creator, entrepreneur, software architect, lecturer, and technologist. https://www.linkedin.com/in/erichosick/