YAML Tutorial: Everything You Need to Get Started in Minutes

Written by: Erik Francis
8 min read

YAML Ain't Markup Language (YAML) is a data serialization language that is consistently listed as one of the most popular programming languages. It's often used as a format for configuration files, but its object serialization abilities make it a viable replacement for languages like JSON. This YAML tutorial will demonstrate the language syntax with a guide and some simple coding examples in Python. YAML has broad language support and maps easily into native data structures. It's also easy for humans to read, which is why it's a good choice for configuration. The YAML acronym was shorthand for Yet Another Markup Language. But the maintainers renamed it to YAML Ain't Markup Language to place more emphasis on its data-oriented features.

YAML Tutorial Quick Start: A Simple File

Let's take a look at a YAML file for a brief overview.

---
 doe: "a deer, a female deer"
 ray: "a drop of golden sun"
 pi: 3.14159
 xmas: true
 french-hens: 3
 calling-birds:
   - huey
   - dewey
   - louie
   - fred
 xmas-fifth-day:
   calling-birds: four
   french-hens: 3
   golden-rings: 5
   partridges:
     count: 1
     location: "a pear tree"
   turtle-doves: two

The file starts with three dashes. These dashes indicate the start of a new YAML document. YAML supports multiple documents, and compliant parsers will recognize each set of dashes as the beginning of a new one. Next, we see the construct that makes up most of a typical YAML document: a key-value pair. The “doe” is a key that points to a string value: “a deer, a female deer”. YAML supports more than just string values. The file starts with six key-value pairs. They have four different data types. “doe” and “ray” are strings. The “pi” is a floating-point number. The “xmas” is a boolean. The “french-hens” is an integer. You can enclose strings in single(‘) or double-quotes(“) or no quotes at all. YAML recognizes unquoted numerals as integers or floating point. The seventh item is an array. The “calling-birds” has four elements, each denoted by an opening dash. I indented the elements in “calling-birds” with two spaces. Indentation is how YAML denotes nesting. The number of spaces can vary from file to file, but tabs are not allowed. We'll look at how indentation works below. Finally, we see “xmas-fifth-day”, which has five more elements inside it, each of them indented. We can view “xmas-fifth-day” as a dictionary that contains two string, two integers, and another dictionary. YAML supports nesting of key-values, and mixing types. Before we take a deeper dive, let's look at how this document looks in JSON. I'll throw it in this handy JSON to YAML converter.

{
  "doe": "a deer, a female deer",
  "ray": "a drop of golden sun",
  "pi": 3.14159,
  "xmas": true,
  "french-hens": 3,
  "calling-birds": [
     "huey",
     "dewey",
     "louie",
     "fred"
  ],
  "xmas-fifth-day": {
  "calling-birds": "four",
  "french-hens": 3,
  "golden-rings": 5,
  "partridges": {
    "count": 1,
    "location": "a pear tree"
  },
  "turtle-doves": "two"
  }
}

JSON and YAML have similar capabilities, and you can convert most documents between the formats.

Outline Indentation and Whitespace

Whitespace is part of YAML's formatting. Unless otherwise indicated, newlines indicate the end of a field. You structure a YAML document with indentation. The indentation level can be one or more spaces. The specification forbids tabs because tools treat them differently. Consider this document. The items inside are indented with two spaces.

foo: bar
pleh: help
stuff:
  foo: bar
  bar: foo

Let's take a look at how a simple python script views this document. We'll save it as a file named foo.yaml. The PyYAML package will map a YAML file stream into a dictionary. We'll iterate through the outermost set of keys and values and print the key and the string representation of each value. You can find a processor for your favorite platform here.

Import yaml 

from yaml import load
try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader

if __name__ == '__main__':

    stream = open("foo.yaml", 'r')
    dictionary = yaml.load(stream)
    for key, value in dictionary.items():
        print (key + " : " + str(value))

The output is:

foo : bar
pleh : help
stuff : {'foo': 'bar', 'bar': 'foo'}

When we tell python to print a dictionary as a string, it uses the inline syntax we'll see below. We can see from the output that our document is a python dictionary with two strings and another dictionary nested inside it. YAML's simple nesting gives us the power to build sophisticated objects. But that's only the beginning.

Comments

Comments begin with a pound sign. They can appear after a document value or take up an entire line.

___
# This is a full line comment
foo: bar # this is a comment, too

Comments are for humans. YAML processors will discard them.

YAML Data Types

Values in YAML's key-value pairs are scalar. They act like the scalar types in languages like Perl, Javascript, and Python. It's usually good enough to enclose strings in quotes, leave numbers unquoted, and let the parser figure it out. But that's only the tip of the iceberg. YAML is capable of a great deal more.

Key-Value Pairs and Dictionaries

The key-value is YAML's basic building block. Every item in a YAML document is a member of at least one dictionary. The key is always a string. The value is a scalar so that it can be any datatype. So, as we've already seen, the value can be a string, a number, or another dictionary.

Numeric types

YAML recognizes numeric types. We saw floating point and integers above. YAML supports several other numeric types. An integer can be decimal, hexadecimal, or octal.

---
 foo: 12345
 bar: 0x12d4
 plop: 023332

Let's run our python script on this document.

foo : 12345
bar : 4820
plop : 9946

As you expect, Ox indicates a value is hex, and a leading zero denotes an octal value. YAML supports both fixed and exponential floating point numbers.

---
foo: 1230.15
bar:  12.3015e+05

When we evaluate these entries we see:

foo : 1230.15
bar : 1230150.0

Finally, we can represent not-a-number (NAN) or infinity.

---
foo: .inf
bar: -.Inf
plop: .NAN

The value of foo is infinity. Bar is negative infinity, and plop is NAN.

Strings

YAML strings are Unicode. In most situations, you don't have to specify them in quotes.

---
foo: this is a normal string

Our test program processes this as:

foo: this is a normal string

But if we want escape sequences handled, we need to use double quotes.

---
foo: "this is not a normal string\n"
bar: this is not a normal string\n

YAML processes the first value as ending with a carriage return and linefeed. Since the second value is not quoted, YAML treats the \n as two characters.

foo: this is not a normal string
bar: this is not a normal string\n

YAML will not escape strings with single quotes, but the single quotes do avoid having string contents interpreted as document formatting. String values can span more than one line. With the fold (greater than) character, you can specify a string in a block.

bar: >
  this is not a normal string it
  spans more than
  one line
  see?

But it's interpreted without the newlines.

bar : this is not a normal string it spans more than one line see?

The block (pipe) character has a similar function, but YAML interprets the field exactly as is.

bar: |
  this is not a normal string it
  spans more than
  one line
  see?

So, we see the newlines where they are in the document.

bar : this is not a normal string it
spans more than
one line
see?

Nulls

You enter nulls with a tilde(~) or the unquoted null string literal.

---
foo: ~
bar: null

Our program prints:

foo : None
bar : None

Python's representation for null is None.

Booleans

YAML indicates boolean values with the keywords True, On and Yes for true. False is indicated with False, Off, or No.

---
foo: True
bar: False
light: On
TV: Off

Arrays

You can specify arrays or lists on a single line.

---
items: [ 1, 2, 3, 4, 5 ]
names: [ "one", "two", "three", "four" ]

Or, you can put them on multiple lines.

---
items:
  - 1
  - 2
  - 3
  - 4
  - 5
names:
  - "one"
  - "two"
  - "three"
  - "four"

The multiple line format is useful for lists that contain complex objects instead of scalars.

___
items:
  - things:
      thing1: huey
      things2: dewey
      thing3: louie
  - other things:
      key: value

An array can contain any valid YAML value. The values in a list do not have to be the same type.

Dictionaries

We covered dictionaries above, but there's more to them. Like arrays, you can put dictionaries inline. We saw this format above. It's how python prints dictionaries.

---
foo: { thing1: huey, thing2: louie, thing3: dewey }

We've seen them span lines before.

---
foo: bar
bar: foo

And, of course, they can be nested and hold any value.

---
foo:
  bar:
    - bar
    - rab
    - plop

Advanced Options

Chomp Modifiers

Multiline values may end with whitespace, and depending on how you want the document to be processed you might not want to preserve it. YAML has the strip chomp and preserve chomp operators. To save the last character, add a plus to the fold or block operators.

bar: >+
  this is not a normal string it
  spans more than
  one line
  see?

So, if the value ends with whitespace, like a newline, YAML will preserve it. To strip the character, use the strip operator.

bar: |-
  this is not a normal string it
  spans more than
  one line
  see?

Multiple documents

A document starts with three dashes and ends with three periods. Some YAML processors require the document start operator. The end operator is usually optional. For example, Java's Jackson will not process a YAML document without the start, but Python's PyYAML will. You'll usually use the end document operator when a file contains multiple documents. Let's modify our python code.

import yaml

from yaml import load
try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader

if __name__ == '__main__':
    stream = open("foo.yaml", 'r')
    dictionary = yaml.load_all(stream, Loader)

    for doc in dictionary:
        print("New document:")
        for key, value in doc.items():
            print(key + " : " + str(value))
            if type(value) is list:
                print(str(len(value)))

PyYAML's load_all will process all of the documents in a stream. Now, let's process a compound document with it.

---
bar: foo
foo: bar
...
---
one: two
three: four

The script finds two YAML documents.

New document:
bar : foo
foo : bar
New document:
one : two
three : four

Conclusion

YAML is a powerful language that can be used for configuration files, messages between applications, and saving application state. We covered its most commonly used features, including how to use the built-in datatypes and structure complex documents. Some platforms support YAML's advanced features, including custom data types.

CloudBees provides numerous blog posts that discuss the benefits of using YAML to manage application and configuration data. If you're interested in learning more about YAML and its applications, we invite you to explore the many blog posts CloudBees has created on YAML-related technologies. Below are some of the blog posts discussing some of the essential YAML-related technologies:

These blog posts detail how YAML can help manage complex application and configuration data, providing a deeper understanding of the language and its benefits.

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.