Bazel Primer

Introduction [1]

Bazel is an open-source build/test framework similar to Maven, Make, and Gradle.

It features:

This post is a set of reading notes on the official Bazel documentation for version 3.4.0. You can skip these intros and jump directly to the sample repo to get started.

Bazel Setup

Follow the instructions here to install the latest release for your system.

Concepts

In general, Bazel builds software from source code organized in a directory called a workspace.

Source files in the workspace are organized in a nested hierarchy of packages.

Each package is a directory containing a set of related source files and one BUILD file for that package.

A simple example of a C++ project structure for one package is shown below:

.
├── README.md
├── WORKSPACE
└── main
    ├── BUILD
    └── hello-world.cc
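For this layout, main/BUILD can be as small as a single cc_binary rule. The snippet below is a sketch modeled on the official C++ tutorial, so adjust the names to your own project:

```python
# main/BUILD: one package that contains one C++ binary target.
cc_binary(
    name = "hello-world",        # target name, also used as the executable's name
    srcs = ["hello-world.cc"],   # source file that lives in this package
)
```

Running bazel build //main:hello-world from the workspace root should then leave the executable at bazel-bin/main/hello-world.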

Workspace

A workspace is a directory containing your source files and symbolic links to other directories that contain the build output.

Have a look at the following project structure after Bazel has built the target:

.
├── README.md
├── WORKSPACE
├── bazel-bin -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out/darwin-fastbuild/bin
├── bazel-out -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out
├── bazel-stage1 -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__
├── bazel-testlogs -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out/darwin-fastbuild/testlogs
└── main
    ├── BUILD
    └── hello-world.cc

Note the new symbolic links created from that build.

Bazel identifies a directory as a workspace root by looking for a file named WORKSPACE or WORKSPACE.bazel. This file may be empty, or it may contain references to the external dependencies required to build the outputs. If both WORKSPACE and WORKSPACE.bazel exist, Bazel ignores the WORKSPACE file.

If a subdirectory under the workspace root contains its own file called WORKSPACE, Bazel simply ignores that file. In other words, Bazel does not support nested workspaces.

Packages

As mentioned earlier, source files are usually organized in a nested hierarchy of packages.

Conceptually, a package is a collection of related files together with a specification of the dependencies among them.

In practice, a package is any directory beneath the workspace root that contains a BUILD or BUILD.bazel file. A package includes all files and subdirectories beneath the package root, except for subdirectories that themselves contain a BUILD (or BUILD.bazel) file. Those subdirectories become subpackages.

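For example, the directory tree below (taking src as the workspace root) contains two packages: my/app and its subpackage my/app/tests. Note that my/app/data is not a package. It is just a directory belonging to my/app.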

src/my/app/BUILD
src/my/app/app.cc
src/my/app/data/input.txt
src/my/app/tests/BUILD
src/my/app/tests/test.cc
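Here is a sketch of what the two BUILD files in this tree might contain. The rule kinds and target names are assumptions for illustration. Because data/ has no BUILD file of its own, data/input.txt belongs to my/app and is referenced from my/app's BUILD file:

```python
# src/my/app/BUILD -> package my/app (hypothetical targets)
cc_binary(
    name = "app",
    srcs = ["app.cc"],
    data = ["data/input.txt"],  # data/ is a plain directory inside package my/app
)

# src/my/app/tests/BUILD -> package my/app/tests, a separate package
cc_test(
    name = "test",
    srcs = ["test.cc"],
    # Files of the parent package cannot be reached with "../" paths here;
    # they need a full label such as //my/app:data/input.txt.
)
```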

Repositories

So far we have talked about workspaces and packages, so what is a repository? We know GitHub repos as a way of organizing source code, and a Bazel repository is a similar concept.

Bazel defines the root of the main repository as the directory containing the WORKSPACE file, also called @.

We can depend on external repositories such as googletest. These external repos are defined in the main repo’s WORKSPACE file using workspace rules.
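As a sketch, a WORKSPACE entry that pulls in googletest through the http_archive repository rule might look like the snippet below. The repository name com_google_googletest is the conventional choice, not a requirement. The release tag is only an example, so pin whichever version you actually use:

```python
# WORKSPACE of the main repository
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "com_google_googletest",  # referenced in labels as @com_google_googletest
    urls = ["https://github.com/google/googletest/archive/release-1.10.0.zip"],
    strip_prefix = "googletest-release-1.10.0",  # top-level directory inside the zip
    # sha256 = "...",  # recommended: pin the archive's checksum
)
```

Targets from that repository can then be referenced with labels such as @com_google_googletest//:gtest_main.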

Note that external repos are repos themselves, which means they have their own WORKSPACE file as well! However, these WORKSPACE files are ignored by Bazel and hence those transitively dependent repos are not added automatically.

Targets

The elements defined within a package are called targets. The name of a target is referred to as its label.

Target categories include:

Files

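We can further divide files into two kinds. Source files are written by people and checked in to the repository. Generated files (sometimes called derived or output files) are not checked in; they are produced by the build from other files according to rules.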

Rules

A rule specifies the relationship between inputs and outputs and the necessary steps to derive the outputs from the inputs.

Attributes

Each rule has a set of attributes. The rule’s class determines which attributes apply to a given rule and what each of them means. Each attribute has a name and a type.

For example, common attribute types are integer, label, list of labels, string, list of strings, output label, and list of output labels.

Not all attributes need to be specified in every rule (i.e. some attributes are optional). Attributes thus form a dictionary from keys (names) to optional, typed values.

Below we introduce several common attributes.

name attribute

Every rule has a name attribute of type string. Its value must be a syntactically valid target name, as explained in the labels section below.

In some cases, such as genrules, a rule’s name is somewhat arbitrary.

In other cases, the name is significant. For example, for *_binary and *_test rules, the name attribute determines the name of the executable produced by the build.

srcs attribute

This attribute has type list of labels. Its value, if present, is a list of labels, and each label names a target that is an input to this rule.

outs attribute

This attribute has type list of output labels. It is similar to the srcs attribute but differs in two significant ways:

  1. due to the invariant that the outputs of a rule belong to the same package as the rule itself (see Outputs below), output labels cannot include a package component and must be in one of the “relative” forms (discussed below in the labels section)
  2. the relationship implied by an (ordinary) label attribute is inverse to that implied by an output label: a rule depends on its srcs, whereas a rule is depended on by its outs.

The two types of label attributes (srcs and outs) thus assign direction to the edges between targets. This gives rise to a dependency graph, also known as the target graph or build dependency graph. It is a directed acyclic graph (DAG) over targets, and it is the domain over which the Bazel Query tool operates.
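Here is a small sketch of both kinds of edges, using a hypothetical package with a genrule (file and target names are assumptions). The genrule depends on its srcs. Its outs depend on the genrule. A second rule that consumes the generated file depends on it in turn:

```python
# my/app/BUILD (hypothetical)
genrule(
    name = "copy_message",        # arbitrary name; output file names come from outs
    srcs = ["message.txt"],       # srcs edge: copy_message depends on message.txt
    outs = ["message_copy.txt"],  # outs edge: message_copy.txt depends on copy_message
    cmd = "cp $< $@",             # $< is the single input, $@ the single output
)

filegroup(
    name = "runtime_files",
    srcs = [":message_copy.txt"],  # generated file as input, so this also depends on copy_message
)
```

With this file in place, bazel query 'deps(//my/app:runtime_files)' should list message_copy.txt, copy_message and message.txt among the dependencies, along with some implicit tool dependencies.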

Inputs

The inputs may be source files, generated files, or even other rules. Allowing generated files as the inputs means outputs of one rule may be the inputs to another rule, thus allowing rule chaining. Allowing other rules to be the inputs of one rule is more complex and language/rule-dependent.

For example, a C++ library rule A may have another C++ library rule B as input. The effect of this dependency is that B’s header files are available to A during compilation, B’s symbols are available to A during linking, and B’s runtime data is available to A during execution.
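The A/B example could be written as the following sketch. The target names a and b and their files are hypothetical:

```python
# Hypothetical package containing two C++ libraries.
cc_library(
    name = "b",
    srcs = ["b.cc"],
    hdrs = ["b.h"],   # headers that dependents are allowed to #include
)

cc_library(
    name = "a",
    srcs = ["a.cc"],  # a.cc can #include b.h using its workspace-relative path
    deps = [":b"],    # a rule as input: b's headers, symbols and runtime data flow to a
)
```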

Note that a rule’s inputs may come from another package.

Outputs

The outputs are usually generated files, and they always belong to the same package as the rule itself.

Class (or Categories)

A rule can be one of many different kinds, or classes, depending on the type of output it produces. Examples include rules that produce compiled executables and libraries, test executables, and other supported outputs.

Package groups

A package group is a set of packages whose purpose is to limit accessibility of certain rules.

It is defined by the package_group function and neither generates nor consumes files.
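A minimal sketch of a package group used to restrict who may depend on a rule. The group name and package paths are hypothetical:

```python
# my/app/BUILD (hypothetical)
package_group(
    name = "friends",
    packages = [
        "//my/app/...",  # my/app and all of its subpackages
        "//my/test",     # exactly this one package
    ],
)

cc_library(
    name = "core",
    srcs = ["core.cc"],
    hdrs = ["core.h"],
    visibility = [":friends"],  # only packages in the group may depend on :core
)
```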

Labels

As mentioned in the targets intro above, a target’s name is its label and the label uniquely identifies the target.

A typical label in canonical form looks like:

@myrepo//my/app/main:app_binary

Note that @myrepo is the repo’s identifier.

Usually a label refers to a target in the same repo, so we can omit the repo identifier and write it as:

//my/app/main:app_binary

A label starts with // and consists of two parts separated by a colon (:), the package name (e.g. my/app/main) and the target name (e.g. app_binary).

A label’s second part (i.e. the target name) can be omitted if it is the same as the last component of the package name. Such short-form labels are just abbreviations, and the two forms are equivalent.

For example, if we have label //my/app:app, we can also write it as //my/app.


Quick quiz:

What are the types of the following representations:


We can shorten the label identifier even further! Within the BUILD file for package my/app, we can omit the package-name part of labels for this package’s targets, similar to relative paths…

For example, if we have targets //my/app and //my/app:app_binary, we can refer to them in the file my/app/BUILD as :app and :app_binary. The colon is optional in such relative labels, so app and app_binary work as well.

Don’t be confused by all these forms! Just remember to be consistent in the style you use for labels.

Usually the colon : is omitted for file targets, but retained for rule targets. This allows us to reference files by their unadorned name relative to the package directory in the package’s BUILD file, e.g.

generate.cc
testdata/input.txt

If you want to reference targets outside the current package in a BUILD file, you need to refer to them using their complete label.

For example, suppose there is another package named my/test, and you want to refer to a file in package my/app from my/test’s BUILD file. In that case you need to use //my/app:generate.cc.

If you refer to a target with an incorrect label, you may get errors complaining that the label crosses a package boundary.
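To tie the label forms together, here is a sketch of two BUILD files. It assumes package my/app contains generate.cc and testdata/input.txt; the rules, the sh_test and app_test.sh are hypothetical. Note the visibility settings, since rule targets are private to their package by default:

```python
# my/app/BUILD: same-package references can use relative labels.
exports_files(["generate.cc"])  # lets other packages reference this source file

cc_binary(
    name = "app_binary",
    srcs = ["generate.cc"],             # relative form of //my/app:generate.cc
    data = ["testdata/input.txt"],      # file in a subdirectory of this package
    visibility = ["//my/test:__pkg__"], # allow package my/test to depend on it
)

# my/test/BUILD: references into another package need the complete label.
sh_test(
    name = "app_test",
    srcs = ["app_test.sh"],
    data = [
        "//my/app:app_binary",   # rule target in my/app
        "//my/app:generate.cc",  # file target in my/app
    ],
)
```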

Labels starting with @// are references to the main repo and still work even from external repos.

Therefore @//a/b/c is different from //a/b/c when referenced from an external repo. The former refers back to the main repo while the latter looks for target //a/b/c in the current external repo itself.

This subtle difference is especially important when you write rules in the main repo that refer to targets in the main repo but will be used from external repos.

I know the label syntax is strict, but Bazel enforces it intentionally, for many reasons. The precise details can be found here.
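A macro is the simplest way to illustrate this. The sketch below assumes a hypothetical tools/defs.bzl in the main repo and a hypothetical //common:logging target:

```python
# tools/defs.bzl in the main repository (hypothetical file and labels)
def app_binary(name, srcs):
    native.cc_binary(
        name = name,
        srcs = srcs,
        deps = [
            # "@//" always points at the main repository, even when this
            # macro is called from a BUILD file inside an external repo.
            # A plain "//common:logging" would instead be resolved in the
            # repository of the BUILD file that calls the macro.
            "@//common:logging",
        ],
    )
```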

The BUILD files

In the above sections, we discussed packages, targets, labels and the build dependency graph abstractly. These are Bazel’s building blocks, and they are all expressed concretely in BUILD files.

A BUILD file defines a package and is interpreted as a sequential list of statements written in Starlark, an imperative language.

By saying “sequential list”, we emphasize that order matters, especially for variables: a variable must be defined before it is used.

On the other hand, the relative order of rule declarations is immaterial. All that matters is which rules were declared, and with what values, by the time package evaluation completes.

So, in simple BUILD files that consist only of rule declarations, these declarations can be re-ordered freely without changing the behavior.
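Here is a small sketch of both points, with hypothetical file and target names. The COPTS variable has to be assigned above its first use. The app target can still refer to lib even though lib is declared later in the file:

```python
# Order matters for variables: COPTS must be assigned before it is used.
COPTS = ["-Wall"]

# Order does not matter for rules: :lib is declared further down.
cc_binary(
    name = "app",
    srcs = ["main.cc"],
    copts = COPTS,
    deps = [":lib"],
)

cc_library(
    name = "lib",
    srcs = ["lib.cc"],
    hdrs = ["lib.h"],
    copts = COPTS,
)
```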

Limitations

Best practices

Bazel extensions

Bazel extensions are files ending in .bzl.

As mentioned in the BUILD file limitations, such files can be used to define new rules, functions or constants. Use a load statement in a BUILD file to import a symbol from an extension.

For example, the following code loads the file foo/bar/file.bzl and adds the some_library symbol to the environment:

load("//foo/bar:file.bzl", "some_library")

load also supports additional arguments to import multiple symbols.

Limitations of the load statement:

Another typical usage of load is to assign different names (i.e. aliases) to the imported symbols:

E.g.

load("//foo/bar:file.bzl", library_alias = "some_library")

# multiple symbols and a mix of aliases and regular symbol names
load(":my_rules.bzl", "some_rule", nice_alias = "some_other_rule")

In a .bzl file, symbols starting with _ are not exported and thus cannot be loaded from another file.
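A sketch of an extension with private and public symbols, and a BUILD file that loads the public one under an alias. The file my/rules/defs.bzl and the macro my_cc_library are hypothetical:

```python
# my/rules/defs.bzl (hypothetical extension)
_DEFAULT_COPTS = ["-Wall"]  # leading "_": private, cannot be loaded elsewhere

def _with_defaults(copts):
    # Private helper: prepend the default compiler flags.
    return _DEFAULT_COPTS + copts

def my_cc_library(name, srcs, hdrs = [], copts = []):
    """Public macro: a cc_library with some default compiler flags."""
    native.cc_library(
        name = name,
        srcs = srcs,
        hdrs = hdrs,
        copts = _with_defaults(copts),
    )

# my/app/BUILD
load("//my/rules:defs.bzl", lib = "my_cc_library")  # imported under an alias
# load("//my/rules:defs.bzl", "_with_defaults")     # would fail: private symbol

lib(
    name = "core",
    srcs = ["core.cc"],
    hdrs = ["core.h"],
)
```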

Build rules

The majority of Bazel build rules come in families grouped by language. For example, cc_binary, cc_library and cc_test are the build rules for C++ binaries, libraries, and tests.

As you can imagine, the naming scheme for other languages is similar, with a different prefix identifying the language, e.g. java_* for Java. The suffix identifies the kind of rule: *_binary rules build executable programs, *_test rules build automated tests (run with bazel test), and *_library rules build separately compiled modules (libraries).
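A sketch of one package that uses all three members of the cc_* family. Target and file names are hypothetical. The test also assumes a googletest repository named com_google_googletest has been declared in the WORKSPACE:

```python
cc_library(                   # *_library: a separately compiled module
    name = "greeter",
    srcs = ["greeter.cc"],
    hdrs = ["greeter.h"],
)

cc_binary(                    # *_binary: an executable program
    name = "hello",
    srcs = ["main.cc"],
    deps = [":greeter"],
)

cc_test(                      # *_test: an executable run by bazel test
    name = "greeter_test",
    srcs = ["greeter_test.cc"],
    deps = [
        ":greeter",
        "@com_google_googletest//:gtest_main",  # assumed external repo name
    ],
)
```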

Dependencies

We discussed the dependency graph in the sections above. It models the “depends on” relationship among targets.

A target A depends on a target B if B is needed by A at build or execution time.

With the dependency graph defined, a target’s direct dependencies are its direct neighbors in the graph, i.e. the targets reachable by a path of length 1 in the DAG. Similarly, a target’s transitive dependencies are all the targets it depends on via a path of any length through the graph.

In the context of builds, there are two types of dependency graphs:

  1. the graph of actual dependencies, where X depends on Y if Y must be present, built and up to date for X to be built correctly
  2. the graph of declared dependencies, where X depends on Y if X’s BUILD file declares a dependency edge from X to Y

In order to have a correct build, the actual dependency graph (denoted by A) must be a subgraph of the declared dependency graph (denoted by D). That is, every pair of directly connected nodes in A must also be directly connected in D. We therefore say D is an overapproximation of A.

What all this means is that BUILD file writers should keep D as close to A as possible. Every rule must explicitly declare all of its actual direct dependencies to the build system, and no more.
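Here is a sketch of what "declare all actual direct dependencies" means in practice. It assumes a hypothetical package where a.cc includes both b.h and c.h, and b itself depends on c:

```python
cc_library(
    name = "c",
    srcs = ["c.cc"],
    hdrs = ["c.h"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    hdrs = ["b.h"],
    deps = [":c"],
)

cc_library(
    name = "a",
    srcs = ["a.cc"],  # a.cc #includes both b.h and c.h
    # c is an actual direct dependency of a, so it is declared here too,
    # even though a already reaches it transitively through b.
    deps = [
        ":b",
        ":c",
    ],
)
```

Relying only on the path through b might happen to build today. However, it would break as soon as b dropped its dependency on c.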

Types of dependencies

Most build rules have 3 attributes for specifying different kinds of generic dependencies: srcs, deps, and data. Other attributes also exist for rule-specific kinds of dependencies e.g. compiler, resources, etc.
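A sketch of a single rule that uses all three generic dependency attributes. The :parser target and the file names are hypothetical:

```python
cc_test(
    name = "parser_test",
    srcs = ["parser_test.cc"],        # srcs: files processed directly by this rule
    deps = [":parser"],               # deps: other rules compiled and linked in
    data = ["testdata/sample.json"],  # data: files needed only at runtime
)
```

Files listed in data are not compiled. They are made available in the test's runfiles when it runs.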

Example project

I tried rebuilding the project from our previous post, Value-Parameterized GTest, with Bazel, and you can find the source code here. It’s interesting to compare the two branches (the master branch uses CMake while the bazel branch uses Bazel) and appreciate how much more elegant the build becomes with Bazel.

  1. https://www.bazel.build