Inventory & Data

What is inventory?

A pyinfra inventory contains hosts, groups and data. Hosts are where pyinfra will make changes (a server via SSH, a Docker container or the local machine). Groups are collections of one or more hosts. Data can be assigned to both groups and hosts.

Inventory Files

Inventory files contain groups of hosts. A group is defined either as a list of hosts or as a (hosts, data) tuple. For example, this inventory creates two groups, app_servers and db_servers. Note that group names cannot start with an underscore (_):

app_servers = [
    "app-1.net",
    "app-2.net"
]

db_servers = (["db-1.net", "db-2.net", "db-3.net"], {})

If you save this file as inventory.py, you can then use it when executing pyinfra:

pyinfra inventory.py OPERATIONS...
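Because an inventory file is ordinary Python, host lists do not have to be typed out by hand. The following is a minimal sketch, assuming the hostnames follow a numbered pattern (the pattern and ranges are illustrative only). It builds the same two groups as the example above:

```python
# inventory.py (sketch): build the host lists programmatically.
# The numbering scheme is an assumption made for illustration.

# A plain list of hosts
app_servers = [f"app-{n}.net" for n in range(1, 3)]

# A (hosts, data) tuple with an empty data dict
db_servers = ([f"db-{n}.net" for n in range(1, 4)], {})
```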

Autogenerated Groups

A special group, all, will contain every host from the inventory file (whatever the file is called).

All hosts are also added to a group named after the inventory file (e.g. any hosts defined in inventories/production.py belong to the group production).

inventories/

Files in the inventories/ directory are not automatically joined during inventory processing. The directory is simply a convention for storing multiple related inventories. Code can be imported from files under inventories/ into the main inventory.py if desired.

Host Data

Data can be assigned to individual hosts in the inventory by using a tuple (hostname, data_dict):

app_servers = [
    ("app-1.net", {"install_postgres": False}),
    ("db-1.net", {"install_postgres": True}),
]

This data can then be used in operations:

from pyinfra import host
from pyinfra.operations import apt

# Only install PostgreSQL on hosts flagged in the inventory
if host.data.get("install_postgres"):
    apt.packages(
        packages=["postgresql-server"],
    )
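The (hostname, data_dict) tuples can also be generated rather than written out. This is a small sketch under the assumption that a lookup set decides which hosts get PostgreSQL; the helper names `_postgres_hosts` and `_hostnames` are hypothetical. It produces the same app_servers group as the inventory above:

```python
# inventory.py (sketch): derive per-host data from a lookup set.
# Names starting with "_" cannot be group names, so these helpers
# should not be mistaken for groups.
_postgres_hosts = {"db-1.net"}
_hostnames = ["app-1.net", "db-1.net"]

app_servers = []
for _name in _hostnames:
    # Each entry is a (hostname, data_dict) tuple
    app_servers.append((_name, {"install_postgres": _name in _postgres_hosts}))
```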

Host Name

The name of the host can be accessed via the pyinfra.host.name attribute:

from pyinfra import host

print(host.name)  # prints "app-1.net" for example

Function-based Inventories (alpha)

In addition to inventory files, pyinfra can load an inventory from a Python function or module-level iterable passed on the command line as module.path.attribute (or module.path:attribute):

# Call a function that returns a dict of groups
pyinfra myproject.inventory.make_prod OPERATIONS...

# Or point at a module-level list/tuple of hosts
pyinfra myproject.inventory:HOSTS OPERATIONS...

The function must return a dict mapping each group name to either a list of hosts or a (hosts, data) tuple:

# myproject/inventory.py

def make_prod():
    return {
        "app_servers": (
            ["app-1.net", "app-2.net"],
            {"app_user": "myuser", "app_dir": "/opt/pyinfra"},
        ),
        "db_servers": ["db-1.net", "db-2.net"],
    }

group_data/ is not loaded for function-based inventories

The group_data/ directory convention described below is tied to the location of an inventory file. It is not loaded when the inventory comes from a function or module attribute — there is no on-disk inventory to anchor the lookup against.

Provide per-group data via the (hosts, data) tuple in the function's return value, or compose the data inside the function itself (e.g. by reading files, calling APIs, or importing Python modules). The function-based inventory loader is also still in alpha and will log a warning at startup.
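Composing the data inside the function might look like the following sketch. It assumes the group definitions arrive as JSON: the `_RAW_GROUPS` string is a hypothetical stand-in for a file read or an API response, and the `hosts` / `data` keys are this example's own schema, not a pyinfra format. The function returns the same structure as make_prod above:

```python
# myproject/inventory.py (sketch)
import json

# Hypothetical stand-in for content loaded from a file or an API.
_RAW_GROUPS = """
{
  "app_servers": {
    "hosts": ["app-1.net", "app-2.net"],
    "data": {"app_user": "myuser", "app_dir": "/opt/pyinfra"}
  },
  "db_servers": {"hosts": ["db-1.net", "db-2.net"]}
}
"""


def make_prod():
    groups = {}
    for name, spec in json.loads(_RAW_GROUPS).items():
        hosts = spec["hosts"]
        data = spec.get("data")
        # Use a (hosts, data) tuple only when the group carries data
        groups[name] = (hosts, data) if data else hosts
    return groups
```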

Project Layout

There is no enforced project structure: pyinfra works with anything from a single deploy.py file to a multi-directory tree. The CLI only applies two conventions automatically:

group_data/<group>.py (relative to the current directory or the inventory file's directory) is loaded as group data; see Group Data Files below
An inventory file at inventories/<name>.py has all of its hosts placed into a group named <name>

Beyond that, the following layout is what most non-trivial pyinfra projects converge on:

my-project/
├── inventory.py             # or inventories/{dev,prod,...}.py
├── group_data/
│   ├── all.py               # defaults shared by every host
│   └── <group_name>.py      # data scoped to a single group
├── deploy.py                # entrypoint operations (any filename works)
├── tasks/                   # reusable snippets, included via local.include()
│   └── install_postgres.py
├── templates/               # Jinja2 templates rendered by files.template()
│   └── nginx.conf.j2
├── files/                   # static files uploaded via files.put()
│   └── motd
└── my_project/              # your own Python package
    ├── __init__.py
    ├── facts.py             # custom FactBase / ShortFactBase subclasses
    └── operations.py        # custom @operation functions

A few notes:

Custom facts and operations are just Python modules. They aren't auto-discovered — import them where you need them: from my_project.facts import SwapEnabled, then host.get_fact(SwapEnabled). The facts.py / operations.py filenames are convention only.
tasks/, templates/ and files/ are pure convention — they're directories your own code refers to (local.include("tasks/install_postgres.py"), files.template("templates/nginx.conf.j2", ...)). pyinfra doesn't treat them specially.
For shared/reusable deploys (e.g. install Docker, configure Prometheus) the convention is to wrap them in a function decorated with @deploy and package the whole thing as an installable Python package. See Packaging Deploys.

Groups

It is often useful to access the list of groups a host belongs to in operation code. This can be done via the pyinfra.host.groups attribute:

from pyinfra import host
from pyinfra.operations import server

# Only run on hosts that belong to the app_servers group
if "app_servers" in host.groups:
    server.shell(...)

Group Data Files

Group data can be stored in separate files under the group_data directory (there's also a --group-data $DIR flag). Files matching group_data/<group_name>.py are loaded, and every host in the matching group receives the variables defined in that file as data:

app_user = "myuser"
app_dir = "/opt/pyinfra"

The special file group_data/all.py is loaded for the autogenerated all group, so any variables defined there become defaults available on every host. It is the lowest-priority source of data and the natural place for project-wide defaults that individual groups or hosts can later override (see Data Hierarchy below).

These variables can then be used in operations:

from pyinfra import host
from pyinfra.operations import git

# app_dir and app_user come from the group data file
git.repo(
    src="git@github.com:Fizzadar/pyinfra.git",
    dest=host.data.app_dir,
    user=host.data.app_user,
)

Note

The group_data directory is looked up relative to the current working directory or the inventory file's directory. The working directory can be changed at runtime via the --chdir flag.

Note

group_data/ is only loaded for file-based inventories. When the inventory is provided as a Python function or module attribute (see Function-based Inventories (alpha)), supply group data via the (hosts, data) tuple in the function's return value instead.

Data Hierarchy

The same keys can be defined in both host and group data, so a default set in all.py can be overridden per group or per host. When accessing data, the first match in the following order is returned:

1. "Override" data passed in via CLI args
2. Host data as defined in the inventory file
3. Normal group data
4. all group data
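The "first match wins" lookup can be modelled with Python's collections.ChainMap. This is only an illustration of the ordering described above, not pyinfra's implementation, and every key and value in it is made up for the example:

```python
from collections import ChainMap

# Hypothetical data at each level, highest priority first
override_data = {"app_dir": "/srv/override"}   # passed in via CLI args
host_data = {"install_postgres": True}         # from the inventory file
group_data = {"app_user": "deploy", "app_dir": "/opt/app"}
all_data = {"app_user": "myuser", "app_dir": "/opt/pyinfra", "timezone": "UTC"}

# ChainMap searches its mappings left to right and returns the first match
resolved = ChainMap(override_data, host_data, group_data, all_data)

print(resolved["app_dir"])    # /srv/override (override beats group and all)
print(resolved["app_user"])   # deploy (group beats all)
print(resolved["timezone"])   # UTC (only defined in all)
```

A key missing from every level raises KeyError here; in a deploy, host.data.get("key") is the safer way to read optional values, as in the Host Data example.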

Note

pyinfra contains a debug-inventory command which can be used to explore the per-host data for a given inventory/deploy, for example: pyinfra inventory.py debug-inventory.

Connecting with Data

Data can be used to configure connectors. For example, setting SSH connection details can be done like so:

ssh_user = "ubuntu"
ssh_key = "~/.ssh/some_key"
ssh_key_password = "password for key"

The Connectors Index contains full details of which data keys are available in each connector.

Global Arguments with Data

Data can also provide default values for global arguments, for example:

_sudo = True
_sudo_user = "pyinfra"

These values can also be changed during a deploy by updating host.data:

from pyinfra import host
host.data._sudo_user = "apache"

External Sources for Data

Because pyinfra is configured in Python, you can pull in data from pretty much anywhere just using other Python packages.
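As one possible sketch, a group data file could read connection details from the environment of the machine running pyinfra instead of hardcoding them. The variable names DEPLOY_SSH_USER and DEPLOY_SSH_KEY_PASSWORD are hypothetical, chosen for this example; the data keys are the ones shown under Connecting with Data:

```python
# group_data/all.py (sketch): pull values from outside the repository.
# DEPLOY_SSH_USER and DEPLOY_SSH_KEY_PASSWORD are hypothetical
# environment variable names.
import os

# Fall back to a default user when the variable is not set
ssh_user = os.environ.get("DEPLOY_SSH_USER", "ubuntu")
ssh_key = "~/.ssh/some_key"

# None when the variable is unset, so no password lives in the repo
ssh_key_password = os.environ.get("DEPLOY_SSH_KEY_PASSWORD")
```

The same pattern extends to any other Python-accessible source, such as a JSON file or a secrets manager client.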

Environment Variables

config.INHERIT_ENV lets you forward specific environment variables from the machine running pyinfra to all remote operations. This is useful for tools that authenticate via environment variables (SOPS, cloud CLIs, etc.):

# deploy.py
from pyinfra import config

config.INHERIT_ENV = ["SOPS_AGE_KEY_FILE", "AWS_PROFILE", "GITLAB_TOKEN"]

The priority for environment variables passed to operations is (lowest to highest):

1. config.INHERIT_ENV: inherited from the caller's os.environ
2. config.ENV: explicitly set globally in the deploy script
3. _env per-operation argument: overrides for a single operation
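The ordering above amounts to a layered dictionary merge. The helper below, effective_env, is a hypothetical function written for this illustration and is not part of pyinfra; it only models how a later source overrides an earlier one:

```python
def effective_env(inherit_names, caller_env, config_env, op_env):
    """Merge the three sources, lowest priority first."""
    # 1. Only the named variables are inherited from the caller's environment
    env = {name: caller_env[name] for name in inherit_names if name in caller_env}
    # 2. Globally configured values override inherited ones
    env.update(config_env)
    # 3. Per-operation values override everything else
    env.update(op_env)
    return env


result = effective_env(
    inherit_names=["AWS_PROFILE", "GITLAB_TOKEN"],
    caller_env={"AWS_PROFILE": "dev", "GITLAB_TOKEN": "abc", "HOME": "/home/me"},
    config_env={"AWS_PROFILE": "staging"},
    op_env={"GITLAB_TOKEN": "xyz"},
)
print(result)  # {'AWS_PROFILE': 'staging', 'GITLAB_TOKEN': 'xyz'}
```

HOME is dropped because it is not listed in inherit_names, mirroring the fact that only the variables named in config.INHERIT_ENV are forwarded.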