Lowering log levels and outlining a guideline for log levels in general #2434

@celesteking

Hey. Let's have a discussion about log levels across the HK codebase.
My point is that there should be an established baseline "alert the sysadmin" log level, and every important message that requires sysadmin attention should be logged at it.

For example, consider the following cases:

  • A plugin connects to some external service, e.g., the clamav or SA plugin. The remote service (spamd, clamd, etc.) is unreachable, e.g., because it's down when it should be up.
  • The sysadmin sends an internalcmd (flush the queue, delete a message, reconfigure, etc.) to the HK daemon, the command fails, and the reply is "check the logs".
  • A plugin misbehaves because of a bug in its code, so it doesn't work as its author intended.

These should be logged at ERROR level. This level means something is not right, but the server keeps operating. Some functionality may be lost, e.g., messages won't be scanned for spam, ES won't receive events, etc.

Next group:

  • Any user-supplied data: incorrectly crafted rcpt or envelope addresses, message headers containing invalid data, incorrect encoding, lines not terminated with the correct line ending.
  • Anything derived from user-supplied data, e.g., a failed MX lookup for a misspelled recipient domain, or a bounce that occurs for ordinary reasons.
  • User restrictions: max message size exceeded, rate limit hit, access restrictions.

These should be logged at NOTICE(?) level.

Next group:

  • Something that affects basic functionality, possibly only temporarily, but is very important. E.g., an out-of-disk-space situation, or a plugin that sources routing/authentication information from redis while the redis server is down.

Log these at CRIT level.

Next group:

  • "Can't start" or abnormal termination events: can't listen on an IP, an exception in base code (either on start or while in operation). Same as CRIT, but permanent (won't resolve by itself).

Log these at EMERG or ALERT level (possibly only once, as the server probably won't be able to continue running).
Tell me what you think.
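
To make the proposal easier to argue about, here is a rough sketch of the mapping in Python. It is not HK code: the category names are made up for illustration, the severity numbers assume the usual syslog ordering (lower number is more severe), and picking ALERT rather than EMERG for the last group is an arbitrary choice for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog-style severities; a lower number means a more severe event."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERROR = 3
    WARN = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


# Hypothetical event categories, named only for this illustration.
PROPOSED_LEVELS = {
    # "Can't start" or abnormal termination (the proposal says EMERG or ALERT).
    "startup_or_fatal_failure": Severity.ALERT,
    # Basic functionality affected, possibly temporarily (disk full, redis down).
    "basic_functionality_affected": Severity.CRIT,
    # Something is not right, but the server keeps operating.
    "external_service_unreachable": Severity.ERROR,
    "admin_command_failed": Severity.ERROR,
    "plugin_bug": Severity.ERROR,
    # Caused by user-supplied data or user restrictions.
    "invalid_user_data": Severity.NOTICE,
    "derived_from_user_data": Severity.NOTICE,
    "user_restriction_hit": Severity.NOTICE,
}

# The baseline "alert the sysadmin" level from the proposal.
SYSADMIN_BASELINE = Severity.ERROR


def needs_sysadmin_attention(category: str) -> bool:
    """Return True if the category is logged at the baseline level or worse."""
    return PROPOSED_LEVELS[category] <= SYSADMIN_BASELINE
```

With a table like this, a sysadmin could set the alerting threshold at ERROR and never be paged for user-caused noise such as a misspelled recipient domain.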
