Inside Terraform: tfdiags - Error handling in Terraform

Disclaimer: I am working at HashiCorp (now IBM) as part of the Terraform Core team. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.
Since I am involved in Terraform my opinions can sometimes be (unconsciously) biased. I hope you enjoy the post anyway.

This is part of my Inside Terraform series, where I deep dive into different parts of Terraform and explain how they work under the hood. This is the fourth post in the series, and it covers an important topic: error handling in Terraform. You don’t need to have read the previous posts; this one is mostly self-contained.

What is `tfdiags`?

The tfdiags package is responsible for handling diagnostics (errors & warnings) in Terraform. We use this package instead of the standard error interface in Go because we need more context and structure around errors in Terraform.

The package includes:

types to represent diagnostics
helpers to enhance diagnostics with additional context (e.g. source code locations)
serializers to convert diagnostics to different formats (e.g. JSON for the CLI output or RPC friendly formats)
utilities to aggregate multiple diagnostics into a single error value
display helpers to format diagnostics for human consumption
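
To make the list above a bit more concrete, here is a minimal, self-contained sketch of the idea. It is not the real tfdiags code: `Severity`, `Diagnostic`, `Diagnostics`, `HasErrors`, and `Err` below are simplified stand-ins written for illustration, using only the standard library. The point is just to show what a diagnostic carries beyond a plain Go error and how a collection can be folded back into a single error value.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// Severity distinguishes errors from warnings (stand-in type).
type Severity int

const (
    Error Severity = iota
    Warning
)

// Diagnostic carries more structure than a plain Go error:
// a severity, a short summary, and a longer detail text.
type Diagnostic struct {
    Severity Severity
    Summary  string
    Detail   string
}

// Diagnostics is an ordered collection of diagnostics.
type Diagnostics []Diagnostic

// HasErrors reports whether at least one diagnostic is an error.
func (d Diagnostics) HasErrors() bool {
    for _, diag := range d {
        if diag.Severity == Error {
            return true
        }
    }
    return false
}

// Err aggregates all error diagnostics into a single error value.
// It returns nil if the collection holds only warnings or is empty.
func (d Diagnostics) Err() error {
    var msgs []string
    for _, diag := range d {
        if diag.Severity == Error {
            msgs = append(msgs, diag.Summary+": "+diag.Detail)
        }
    }
    if len(msgs) == 0 {
        return nil
    }
    return errors.New(strings.Join(msgs, "; "))
}

func main() {
    diags := Diagnostics{
        {Warning, "Deprecated argument", "Use the new argument instead."},
        {Error, "Invalid value", "The value must be a number."},
    }
    fmt.Println(diags.HasErrors())
    fmt.Println(diags.Err())
}
```

A warning on its own neither counts as an error nor produces an error value in this sketch, which is the main reason a plain `error` return is not enough: it has no place to put non-fatal findings.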

Why do we need `tfdiags`?

Terraform is a programming language and therefore needs a robust error handling system. There are a lot of (sometimes unexpected) ways one can misconfigure Terraform and we need to have a reliable way to explain to the user what went wrong and how to fix it.

We also have a lot of different sources of errors / warnings to handle:

Providers can return errors when they fail to communicate with the underlying API or when the user misconfigures a resource.
Providers can have bugs that we need to handle gracefully.
The language itself can have errors when the user writes invalid code (e.g. syntax errors, type errors, reference errors, etc.).
The CLI can have errors when the user provides invalid input (e.g. invalid command line arguments, invalid configuration files, etc.).
The state management system can have errors when the state is corrupted or when there are conflicts between the state and the configuration.

All of this requires a structured way to represent and handle errors, which is where tfdiags comes in.

flowchart LR
H["HCL Parsing and Validation"] --> H1[hcl.Diagnostics] --> D["tfdiags.Diagnostics"]
P["Providers"] --> D
L["Language Validation"] --> D
CLI["CLI Input"] --> D
ST["State Management"] --> D

D -->|"Serialize"| JSON["JSON Output"]
D -->|"Serialize"| RPC["RPC Format"]
D -->|"Format"| HUM["Human-readable CLI Output"]
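
The "Serialize" edges in the diagram can be pictured with a small sketch. This is a toy model, not Terraform's serializer: the struct names and the JSON field names (`severity`, `summary`, `detail`, `range`) are assumptions chosen for illustration and should not be read as the exact wire format.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Pos and Range describe a source location (illustrative stand-ins).
type Pos struct {
    Line   int `json:"line"`
    Column int `json:"column"`
}

type Range struct {
    Filename string `json:"filename"`
    Start    Pos    `json:"start"`
    End      Pos    `json:"end"`
}

// Diagnostic is a serializable view of one diagnostic.
// Range is a pointer so that sourceless diagnostics omit it entirely.
type Diagnostic struct {
    Severity string `json:"severity"`
    Summary  string `json:"summary"`
    Detail   string `json:"detail"`
    Range    *Range `json:"range,omitempty"`
}

// toJSON renders a list of diagnostics as a JSON array.
func toJSON(diags []Diagnostic) (string, error) {
    out, err := json.Marshal(diags)
    if err != nil {
        return "", err
    }
    return string(out), nil
}

func main() {
    out, err := toJSON([]Diagnostic{
        {Severity: "error", Summary: "Invalid CLI flag", Detail: "The flag is not supported."},
        {
            Severity: "warning",
            Summary:  "Deprecated argument",
            Detail:   "Use the new argument instead.",
            Range: &Range{
                Filename: "main.tf",
                Start:    Pos{Line: 10, Column: 5},
                End:      Pos{Line: 10, Column: 20},
            },
        },
    })
    if err != nil {
        fmt.Println("serialization failed:", err)
        return
    }
    fmt.Println(out)
}
```

Because every source of diagnostics ends up in the same structure, one serializer like this can serve all of them, whether the diagnostic has a source location or not.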

How to use `tfdiags`

As a general guideline, we almost always want to return tfdiags.Diagnostics from functions instead of the standard error interface. The usual pattern is to declare var diags tfdiags.Diagnostics at the top of a function, fill it with diags = diags.Append(myNewDiag) as we go, and return it at the end. Append does a lot of heavy lifting: it unwraps and normalizes different diagnostic types and aggregates them into a single Diagnostics slice.
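
To illustrate what "unwrapping / normalizing" means, here is a simplified model of such an Append method. It is a sketch under assumptions, not the real implementation: the types are stand-ins, and the real method knows about more input types than the four handled here. The technique is a type switch that turns every accepted input into entries of one slice.

```go
package main

import (
    "errors"
    "fmt"
)

// Stand-in types, reduced to the minimum needed for this sketch.
type Severity int

const (
    Error Severity = iota
    Warning
)

type Diagnostic struct {
    Severity Severity
    Summary  string
}

type Diagnostics []Diagnostic

// Append accepts several kinds of values and normalizes them into
// Diagnostic entries. Unsupported types cause a panic, because passing
// them is a programming mistake rather than a runtime condition.
func (d Diagnostics) Append(items ...any) Diagnostics {
    for _, item := range items {
        switch v := item.(type) {
        case nil:
            // Nothing to add; this makes "append whatever came back" safe.
        case Diagnostic:
            d = append(d, v)
        case Diagnostics:
            d = append(d, v...) // flatten nested collections
        case error:
            // Plain Go errors become error diagnostics.
            d = append(d, Diagnostic{Severity: Error, Summary: v.Error()})
        default:
            panic(fmt.Sprintf("can't append %T to diagnostics", item))
        }
    }
    return d
}

// demo builds a collection from a mix of input types.
func demo() Diagnostics {
    var diags Diagnostics
    diags = diags.Append(Diagnostic{Warning, "Deprecated argument"})
    diags = diags.Append(
        Diagnostics{{Error, "Invalid value"}},
        errors.New("connection refused"),
    )
    var noErr error // a nil error is silently ignored
    diags = diags.Append(noErr)
    return diags
}

func main() {
    for _, diag := range demo() {
        fmt.Println(diag.Severity, diag.Summary)
    }
}
```

The caller never has to check what kind of value a callee returned; it appends and moves on, which is what keeps the calling code short.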

Another common pattern is first appending diagnostics from a called function and then checking the accumulated diagnostics for errors before proceeding:

import "github.com/hashicorp/terraform/internal/tfdiags"

func MyFunc() tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    diags = diags.Append(OtherFunc())

    if diags.HasErrors() {
        return diags
    }

    // Continue with function logic if no errors

    return diags
}

Using tfdiags boils down to using the right error for the right job. The following diagram can help you decide which diagnostic type to use:

flowchart TD
START["Need to create a diagnostic"]
START -->|"No connection to source code?"| SL["tfdiags.Sourceless"]
START -->|"Error in a specific attribute / block?"| AV["tfdiags.AttributeValue + .InConfigBody(...)"]
START -->|"General error with source location?"| HCL["&hcl.Diagnostic (most common)"]

We will go over the ones you will most likely encounter when working with Terraform.

`tfdiags.Sourceless`: Errors with no connection to source code

This is the simplest way of creating an error and should be used if the error is not directly connected with any Terraform code, e.g. when a CLI flag is invalid or some other external error happens (e.g. connecting to a Terraform Cloud instance fails).

import "github.com/hashicorp/terraform/internal/tfdiags"

func MyFunc() tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    if someErrorCondition {
        diags = diags.Append(tfdiags.Sourceless(
            tfdiags.Error, // Could also be tfdiags.Warning
            "An error occurred",
            "Detailed description of the error and how to fix it.",
        ))
    }

    return diags
}

`tfdiags.AttributeValue`: Errors connected to a specific attribute in the configuration

You got an error in a specific attribute / block of a top-level block (e.g. resource / variable / action) in the configuration? Using tfdiags.AttributeValue is the way to go. Just make sure to call .InConfigBody(...) on the returned diagnostics to attach the source code location of the attribute.

import (
    "github.com/hashicorp/terraform/internal/tfdiags"
    "github.com/zclconf/go-cty/cty"
)

func ValidateThisConfig(value cty.Value) tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    // This would look very different in a real implementation
    // probably we would use cty's Transform / Walk functions to find issues
    // or we get the issues back from the provider.
    errorPaths := findPathsWithErrors()
    
    for _, path := range errorPaths {
        diags = diags.Append(
            tfdiags.AttributeValue(
                tfdiags.Error,
                "Invalid value",
                "The value provided for this attribute is invalid.",
                path, // This is a cty.Path
            ),
        )
    }

    return diags
}

// The eval context is part of the graph / node evaluation system in Terraform.
// We won't cover it in this post, but probably in the next one :)
func (n *myVerySpecialNode) Execute(ctx EvalContext) tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    // This takes an hcl.Body and evaluates it according to the schema and context
    // We saw this in the previous post about evaluation.
    configVal, _, valDiags := ctx.EvaluateBlock(n.config, n.schema, nil, n.keyData)

    // Our append and check pattern
    diags = diags.Append(valDiags)
    if diags.HasErrors() {
        return diags
    }

    // Now let's validate the value we got back
    valueDiags := ValidateThisConfig(configVal)
    // We know the diags are connected to the config body so we can attach
    // the source location here
    diags = diags.Append(valueDiags.InConfigBody(n.config, n.Addr.String()))
    // Because these diags are within this node that is connected to only this config, we could
    // also call InConfigBody on diags directly. Both ways work in this case, and probably in
    // general, as each diagnostic can only be "elaborated" once.

    return diags
}

The process of elaborating diagnostics through the InConfigBody method is pretty interesting. Here is my recently refactored / improved version; I think it’s a bit easier to understand than the previous one. We basically take the path from the AttributeValue diagnostic and drill into the hcl.Body to find the source location of the attribute that caused the error. This is not super important for using tfdiags, but interesting nonetheless.
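
The "drill into the body" idea can be sketched with a toy model. Nothing here is the real InConfigBody code: `Range`, `Node`, `locate`, and `exampleBody` are hypothetical names, the body is a plain tree instead of an hcl.Body, and the path is a list of strings instead of a cty.Path. The fallback to the deepest reachable node is also a choice I made for this sketch.

```go
package main

import "fmt"

// Range is a simplified source location (file and line only).
type Range struct {
    Filename string
    Line     int
}

// Node is a toy stand-in for a configuration body: every block or
// attribute knows where it was declared and which children it has.
type Node struct {
    Range    Range
    Children map[string]*Node
}

// locate follows path step by step into body and returns the range of
// the deepest node it can reach. If a step does not exist, the range of
// the last node found is returned, so there is always something to show.
func locate(body *Node, path []string) Range {
    current := body
    for _, step := range path {
        child, ok := current.Children[step]
        if !ok {
            break
        }
        current = child
    }
    return current.Range
}

// exampleBody models a block declared on line 1 with a nested
// "network" block on line 3 and its "cidr" attribute on line 4.
func exampleBody() *Node {
    return &Node{
        Range: Range{"main.tf", 1},
        Children: map[string]*Node{
            "network": {
                Range: Range{"main.tf", 3},
                Children: map[string]*Node{
                    "cidr": {Range: Range{"main.tf", 4}},
                },
            },
        },
    }
}

func main() {
    body := exampleBody()
    fmt.Println(locate(body, []string{"network", "cidr"}))
    fmt.Println(locate(body, []string{"network", "missing"}))
}
```

The validation code only needs to know the path to the bad value; attaching the file and line happens later, in the one place that has access to the configuration body.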

`&hcl.Diagnostic`: The workhorse behind the scenes

In the vast majority of cases this is the diagnostic you want to use. It is very flexible: you can add the source location yourself, and you can quite easily add extra information, which will be used to print extra context to the user.

import (
    "github.com/hashicorp/terraform/internal/tfdiags"
    "github.com/hashicorp/hcl/v2"
)


func MyFunc() tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    if someErrorCondition {
        // Using a pointer here is crucially important, otherwise Append will panic!
        diags = diags.Append(&hcl.Diagnostic{
            Severity: hcl.DiagError, // Could also be hcl.DiagWarning
            Summary:  "An error occurred",
            Detail:   "Detailed description of the error and how to fix it.",
            // Optionally add Subject, Context, and Extra information here
            // Normally you take the subject from the hcl.Expression that causes the
            // error and only fill in Context if you have a broader range to show 
            // (e.g. the whole block the expression is in) 
            
            // Subject is what will be displayed as the underlined code snippet
            Subject: &hcl.Range{
                Filename: "main.tf",
                Start:    hcl.Pos{Line: 10, Column: 5},
                End:      hcl.Pos{Line: 10, Column: 20},
            },
            // Context is the broader range that will be shown around 
            // the subject (not underlined); defaults to Subject if not set
            Context: &hcl.Range{
                Filename: "main.tf",
                Start:    hcl.Pos{Line: 9, Column: 1},
                End:      hcl.Pos{Line: 11, Column: 1},
            },
        })
    }

    return diags
}
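
To show how a Subject range turns into highlighted source code, here is a toy renderer. It is only a sketch: `underline` is a hypothetical helper, it marks the subject with `^` characters, and it assumes ASCII content and a subject that sits on a single line. Terraform's real output looks different. Columns are 1-based and the end column is exclusive, matching hcl.Pos and hcl.Range.

```go
package main

import (
    "fmt"
    "strings"
)

// underline returns the given source line followed by a marker line that
// highlights columns startCol (inclusive) to endCol (exclusive).
// Out-of-range columns are clamped so the function never panics.
func underline(line string, startCol, endCol int) string {
    if startCol < 1 {
        startCol = 1
    }
    if endCol > len(line)+1 {
        endCol = len(line) + 1
    }
    if endCol < startCol {
        endCol = startCol
    }
    marker := strings.Repeat(" ", startCol-1) + strings.Repeat("^", endCol-startCol)
    return line + "\n" + marker
}

func main() {
    // Highlight the attribute name "ami", which spans columns 5 to 7.
    fmt.Println(underline(`    ami = "abc"`, 5, 8))
}
```

A Context range would work the same way, except that it selects which surrounding lines get printed rather than which characters get marked.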

Closing Remarks

There is more to tfdiags than this, but these are the most common patterns you will encounter when working with Terraform. Here is a quick overview of what else is in the package, so you can look into it yourself if you are interested or need it:

tfdiags.Override: lets you change the severity and messages of existing diagnostics or add extra information to them. This can be useful when you want to wrap external errors or need to broadly apply e.g. extras.
Extras: I haven’t covered extras as a whole. Think of them as little tags you can add to diagnostics to give more context. All extras are defined here. You would want to add an extra if you’d like to achieve a different UI handling.
Testing helpers: tfdiags.AssertDiagnosticsMatch and tfdiags.AssertNoDiagnostics are the most common ones. They are very useful when writing tests for functions that return diagnostics, which is most functions in Terraform.

In the next post in this series we will look at Expanding for_each and count. This will prepare us for the graph work ahead. Stay tuned!
