Day 22 : Advanced YAML Syntax #90DaysofDevOps
Ayushi Tiwari
Java Software Developer | Certified Microsoft Technology Associate (MTA)
Advanced YAML Syntax
Documents
A single YAML file can have more than one document. Each document is interpreted as if it were a separate YAML file, so different documents can reuse the same keys, even though duplicate keys are not allowed within a single document.
The beginning of a document is denoted by three hyphens (---).
A YAML file with multiple documents would look like this, where each new document is indicated by ---.
---
# document 1
codename: YAML
name: YAML ain't markup language
release: 2001
---
# document 2
uses:
- configuration language
- data persistence
- internet messaging
- cross-language data sharing
---
# document 3
company: spacelift
domain:
- devops
- devsecops
tutorial:
- name: yaml
- type: awesome
- rank: 1
- born: 2001
author: omkarbirade
published: true
...
Finally, three dots (...) are used to end a document without starting a new one.
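As a rough illustration of how a parser sees a multi-document file, here is a minimal sketch that assumes the PyYAML package is installed. The stream below is a shortened, hypothetical version of the example above, and the variable names are made up for this sketch.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# A hypothetical two-document stream; note that both documents use the key "codename".
STREAM = """\
---
codename: YAML
release: 2001
---
codename: demo
uses:
  - configuration language
...
"""

# safe_load_all yields one Python object per YAML document in the stream.
documents = list(yaml.safe_load_all(STREAM))

for index, document in enumerate(documents, start=1):
    print(f"document {index}: {document}")
```

Each document comes back as its own dictionary, which is why repeating a key across documents causes no conflict.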
Before we learn more about YAML, this is a good time to practice writing your own YAML file. You can check it with an online YAML validator.
Now that we have seen an online YAML parser in action, it’s time we learn about schemas and tags.
Schemas and Tags
Let’s take a moment to consider how YAML will interpret the given document. Is the sequence’s first literal a string or a boolean?
literals:
- true
- random
You are correct if you answer that the first item on the list is a boolean, and you are also correct if you answer that it is a string. The way it is resolved is determined by the YAML schema that the parser has implemented. But what exactly are schemas?
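Schemas can be thought of as the way a parser resolves or understands nodes (values) present in a YAML file. The YAML 1.2 specification recommends three schemas:
FailSafe schema: understands only maps, sequences, and strings, and is guaranteed to work with any YAML file.
JSON schema: adds the types that JSON supports (null, boolean, integer, and float) on top of the FailSafe schema.
Core schema: extends the JSON schema with more human-friendly spellings of the same types; for example, null, Null, and NULL all resolve to the same null value.
Note: It is also possible to create your own custom schemas based on the above default schemas.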
So coming back to the original question, if the parser supports only the basic schema (FailSafe Schema), the first item will be evaluated as a string. Otherwise, it will be evaluated as a boolean.
This leads to the next question: What if we explicitly want a value to be parsed in a specific way?
Let’s say from the same example that we want the first true value to be parsed as a string instead of a boolean, even when the parser uses the JSON or the core schema.
This is where tags come into the picture. Tags can be thought of as types in YAML.
Even though we didn’t explicitly mention the tags/types in any of the YAML snippets we saw so far, they are inferred automatically by the YAML parser. For instance, maps have the tag/type tag:yaml.org,2002:map, sequences are tag:yaml.org,2002:seq, and strings are tag:yaml.org,2002:str. The !! prefix is shorthand for tag:yaml.org,2002:, so !!str means tag:yaml.org,2002:str.
The below snippet works perfectly fine, even when we specify the tags explicitly.
---
# A sample yaml file
company: !!str spacelift
domain:
- !!str devops
- !!str devsecops
tutorial:
- name: !!str yaml
- type: !!str awesome
- rank: !!int 1
- born: !!int 2001
author: !!str omkarbirade
published: !!bool true
We can use these tags to explicitly specify a type. For our example, all we have to do is specify the type as a string, and the YAML parser will parse it as a string.
scalars:
- !!str true
- random
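To see the effect of an explicit tag from code, the following sketch loads both variants of the list. It assumes PyYAML is installed; the constant and variable names are only for illustration.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# The untagged list from the question above.
PLAIN = """\
literals:
  - true
  - random
"""

# The same list with the first item explicitly tagged as a string.
TAGGED = """\
literals:
  - !!str true
  - random
"""

plain_items = yaml.safe_load(PLAIN)["literals"]
tagged_items = yaml.safe_load(TAGGED)["literals"]

# Print the Python type each parser result ended up with.
print(type(plain_items[0]).__name__)
print(type(tagged_items[0]).__name__)
```

With PyYAML's default resolution rules, the untagged value becomes a Python bool, while the tagged one stays a str.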
Anchors and Aliases
With a lot of configuration, configuration files can become quite large.
In YAML files, anchors (&) and aliases (*) are used to avoid duplication. When writing large configurations in YAML, it is common for a specific configuration to be repeated. For example, the vars config is repeated for all three services in the following YAML snippet.
---
vars:
  service1:
    config:
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      env: prod
      retries: 3
      version: 4.8
  service3:
    config:
      env: prod
      retries: 3
      version: 4.8
...
As more and more things are repeated for large configuration files, this becomes tedious.
Anchors and aliases allow us to rewrite the same snippet without having to repeat any configuration.
Anchors (&) are used to define a chunk of configuration, and aliases are used to refer to that chunk at a different part of the configuration.
---
vars:
  service1:
    config: &service_config
      env: prod
      retries: 3
      version: 4.8
  service2:
    config: *service_config
  service3:
    config: *service_config
...
Anchors and aliases here helped us cut down the repeated configuration.
But in practice, configurations are rarely identical; they vary here and there. For instance, what if the services above run different versions? Does this mean we have to rewrite and repeat the whole config again?
This is where the merge key (<<) comes to the rescue. We can still use aliases and override only the values that differ. Note that the merge key comes from YAML 1.1 and is not part of the YAML 1.2 core schema, so parser support varies.
---
vars:
  service1:
    config: &service_config
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      <<: *service_config
      version: 5
  service3:
    config:
      <<: *service_config
      version: 4.2
...
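The sketch below shows what a parser produces from anchors, aliases, and the merge key. It assumes PyYAML, which supports the YAML 1.1 merge key; the configuration is a trimmed copy of the snippet above, and the variable names are illustrative.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

CONFIG = """\
vars:
  service1:
    config: &service_config
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      <<: *service_config   # copy every key from the anchored map...
      version: 5            # ...then override just this one
  service3:
    config: *service_config # plain alias, no changes
"""

services = yaml.safe_load(CONFIG)["vars"]

for name, service in services.items():
    print(name, service["config"])
```

After loading, service2 carries the shared env and retries values with its own version, while service1 and service3 keep the anchored values unchanged.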
YAML treats characters such as : , { , } , [ , ] , , , & , * , # , ? , | , - , < , > , = , ! , % , @ , and \ as special characters. But what if these special characters are actually a part of the data/value? How do we escape them?
Special characters can be escaped in several ways:
Entity Escapes
Unicode Escapes
Quoted Escapes
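As a small sketch of two of these techniques, quoted escapes and Unicode escapes, the example below parses values that contain special characters. It assumes PyYAML is installed; the keys and values are made up for illustration, and entity escapes are not covered here.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# A raw Python string, so the backslashes reach the YAML parser untouched.
DOCUMENT = r"""
single: 'It''s a "quoted" value: with # special chars'
double: "tab\there, colon \u003A, backslash \\"
"""

data = yaml.safe_load(DOCUMENT)

# Single quotes keep everything literal; a quote is escaped by doubling it.
print(data["single"])
# Double quotes enable backslash escapes such as \t, \uXXXX, and \\.
print(repr(data["double"]))
```

Inside single quotes the colon and the # lose their special meaning, and inside double quotes the \u003A escape resolves to a literal colon.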
YAML vs JSON
How is YAML different from JSON? Let’s try to figure it out.
Check out the code snippet below, a piece of Kubernetes configuration written in JSON. Don’t pay attention to what it does; just observe the file.
{
"description": "APIService represents a server for a particular GroupVersion. Name must be \"version.group\".",
"properties": {
"apiVersion": {
"description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
"type": [
"string",
"null"
]
},
"kind": {
"description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
"type": [
"string",
"null"
],
"enum": [
"APIService"
]
},
"metadata": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
},
"spec": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceSpec",
"description": "Spec contains information for locating and communicating with a server"
},
"status": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceStatus",
"description": "Status contains derived information about an API server"
}
},
"type": "object",
"x-kubernetes-group-version-kind": [
{
"group": "apiregistration.k8s.io",
"kind": "APIService",
"version": "v1beta1"
}
],
"$schema": "https://json-schema.org/schema#"
}
Doesn’t it look like a pure JSON file? Let’s see if we can validate it in our YAML parser.
It’s odd that the YAML parser didn’t report the file as invalid. Does this imply that JSON is also YAML?
YAML (as of version 1.2) is, in fact, a superset of JSON. Every JSON file is also a valid YAML file, but not the other way around.
Can we combine JSON and YAML? Is it still a valid YAML file? Let’s put this hypothesis to the test. Let us change some of the above snippet to make it look more like the YAML we are familiar with:
description: "APIService represents a server for a particular GroupVersion. Name must be \"version.group\"."
"properties": {
"apiVersion": {
"description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
"type": [
"string",
"null"
]
},
"kind": {
"description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
"type": [
"string",
"null"
],
"enum": [
"APIService"
]
},
"metadata": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
},
"spec": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceSpec",
"description": "Spec contains information for locating and communicating with a server"
},
"status": {
"$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceStatus",
"description": "Status contains derived information about an API server"
}
}
"type": "object"
"x-kubernetes-group-version-kind": [
{
"group": "apiregistration.k8s.io",
"kind": "APIService",
"version": "v1beta1"
}
]
"$schema": "https://json-schema.org/schema#"
Notice that there isn’t a root JSON wrapper {} anymore; there are just maps at the root level, although most of the content is still JSON. Validate the file once more in a YAML parser: it is a valid YAML file. But when we try to validate it in a JSON parser, it is reported as invalid, because the file is no longer JSON, only YAML. This demonstrates that YAML is, in fact, a superset of JSON.
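The same experiment can be run from code. This is a minimal sketch that assumes PyYAML is installed and uses Python's standard json module as the JSON parser; both snippets are tiny, hypothetical stand-ins for the long schema above.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)

# A small piece of pure JSON.
PURE_JSON = '{"type": "object", "enum": ["APIService"]}'

# A mix of block-style YAML at the root and JSON-style values below it.
MIXED = """\
type: object
"enum": ["APIService"]
"properties": {"kind": {"type": ["string", "null"]}}
"""

# Pure JSON is accepted by both parsers and yields the same data.
from_json = json.loads(PURE_JSON)
from_yaml = yaml.safe_load(PURE_JSON)

# The mixed document is fine for the YAML parser...
mixed = yaml.safe_load(MIXED)

# ...but the JSON parser rejects it.
try:
    json.loads(MIXED)
    mixed_is_json = True
except json.JSONDecodeError:
    mixed_is_json = False

print(from_json == from_yaml, mixed_is_json)
```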
Where is YAML Used?
We learned a lot about YAML and saw that it works great as a configuration language. Let us see it in action with some of the most famous tools.
Ansible
Ansible playbooks are used to automate repetitive tasks.
Playbooks are expressed in YAML format and perform the actions defined in their plays.
Here is a simple Ansible playbook that installs Nginx, applies the specified template to replace the existing default Nginx landing page, and finally allows TCP access on port 80.
---
- hosts: all
  become: yes
  vars:
    page_title: Spacelift
    page_description: Spacelift is a sophisticated CI/CD platform for Terraform, CloudFormation, Pulumi, and Kubernetes.
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: latest
    - name: Apply Page Template
      template:
        src: files/spacelift-intro.j2
        dest: /var/www/html/index.nginx-debian.html
    - name: Allow all access to tcp port 80
      ufw:
        rule: allow
        port: '80'
        proto: tcp
Kubernetes
Kubernetes, also known as K8s, is an open-source system for automating the deployment, scaling, and management of containerized applications.
Kubernetes works on a state model: it declaratively tries to move from the current state to the desired state. Kubernetes uses YAML files to define Kubernetes objects, which are applied to the cluster to create resources like pods, services, and deployments.
Here is a YAML file that describes a deployment that runs Nginx.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
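Because a manifest is just nested maps and lists, it can be inspected like any other data structure once parsed. The sketch below assumes PyYAML is installed and uses an abbreviated copy of the manifest above (the selector and labels are left out for brevity, so it is not a complete Deployment); the variable names are illustrative.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# Abbreviated copy of the Deployment above, kept only for parsing.
MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
"""

deployment = yaml.safe_load(MANIFEST)

# Walk the nested maps and lists to pull out a few fields.
replicas = deployment["spec"]["replicas"]
containers = deployment["spec"]["template"]["spec"]["containers"]
images = [container["image"] for container in containers]

print(deployment["kind"], replicas, images)
```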
Interesting Things About YAML
YAML works great as a configuration language, but it is important to be aware of certain challenges as well when using it.
The curious case of the Norway problem
Imagine listing the abbreviations of all the countries where it snows:
countries:
- GB # Great Britain
- IE # Ireland
- FR # France
- DE # Germany
- NO # Norway
All looks good, right? But when we read this YAML file in Python, NO is read as False instead of 'NO'.
>>> from yaml import safe_load
>>> safe_load(the_configuration)
{'countries': ['GB', 'IE', 'FR', 'DE', False]}
So why does this happen?
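Remember how the core schema interprets null, Null, and NULL the same way? PyYAML follows YAML 1.1, where booleans get the same treatment with even more spellings: false, off, and no (in lower, title, or upper case) all resolve to the boolean false. So instead of parsing NO as a string, it parses it as a boolean. The YAML 1.2 core schema only recognizes spellings of true and false, but many widely used parsers still apply the 1.1 rules. This can easily be solved by quoting NO.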
countries:
- GB # Great Britain
- IE # Ireland
- FR # France
- DE # Germany
- 'NO' # Norway
To avoid such surprises altogether, we can use StrictYAML, which parses every value as a string by default.
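For a quick way to compare the behaviours without installing anything beyond PyYAML, the sketch below loads the same list twice. It assumes PyYAML is installed and uses its BaseLoader, which leaves every scalar as a string, purely as a stand-in for the "everything is a string" idea; it is not StrictYAML and does not show StrictYAML's API.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# A shortened, hypothetical version of the country list with both spellings.
COUNTRIES = """\
countries:
  - GB
  - NO
  - 'NO'
"""

# Default resolution: the unquoted NO is turned into a boolean.
resolved = yaml.safe_load(COUNTRIES)["countries"]

# BaseLoader performs no type resolution, so every scalar stays a string.
as_strings = yaml.load(COUNTRIES, Loader=yaml.BaseLoader)["countries"]

print(resolved)
print(as_strings)
```

The trade-off of the string-only approach is that numbers and booleans must then be converted explicitly by the application.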