Introduction
Bazel is an open-source build/test framework similar to Maven, Make, and Gradle.
It features:
- Human-readable, high-level build language
- Fast and reliable via caching
- Scalable
- Extensible to other languages and frameworks
This post collects my reading notes on the official documentation for Bazel version 3.4.0. You can skip these intros and jump directly to the sample repo to get started.
Bazel Setup
Follow the instructions here to install the latest release for your system.
Concepts
In general, Bazel builds software from source code organized in a directory called a workspace.
Source files in the workspace are organized in a nested hierarchy of packages.
Each package is a directory containing a set of related source files plus one BUILD file for that package.
A simple example of a C++ project structure for one package is shown below:
.
├── README.md
├── WORKSPACE
└── main
    ├── BUILD
    └── hello-world.cc
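For a layout like this, main/BUILD might contain a single rule. This is only a minimal sketch: the target name hello-world is my assumption for illustration, not something the directory layout dictates.

```starlark
# main/BUILD (sketch; the target name is an assumption)
cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
)
```

With such a file in place, running bazel build //main:hello-world from the workspace root should build the executable.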
Workspace
A workspace is a directory containing your source files and symbolic links to other directories that contain the build output.
Have a look at the following project structure after Bazel has built the target:
.
├── README.md
├── WORKSPACE
├── bazel-bin -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out/darwin-fastbuild/bin
├── bazel-out -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out
├── bazel-stage1 -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__
├── bazel-testlogs -> /private/var/tmp/_bazel_mxin/a122b7b4d9e8cf33d3804073143b4e06/execroot/__main__/bazel-out/darwin-fastbuild/testlogs
└── main
    ├── BUILD
    └── hello-world.cc
Note the new symbolic links created from that build.
Bazel identifies a directory as a workspace root by looking for a file named WORKSPACE or WORKSPACE.bazel. The file may be empty or may contain references to external dependencies required to build the outputs. If both WORKSPACE and WORKSPACE.bazel exist, Bazel ignores the WORKSPACE file.
If a subdirectory under the workspace root contains its own WORKSPACE file, Bazel ignores that subdirectory tree because it forms a separate workspace. In other words, Bazel does not support nested workspaces.
Packages
As mentioned earlier, source files are usually organized in a nested hierarchy of packages.
Conceptually, a package is
- the primary unit of code organization in a repository
- a collection of logically related files
- a specification of the dependencies among these files
In reality (ps: joking), it is a directory beneath the workspace root that contains a BUILD or BUILD.bazel file. A package includes all files in its directory plus all subdirectories beneath it, except those that themselves contain a BUILD (or BUILD.bazel) file, which become subpackages.
For example, the directory tree below contains two packages: my/app and its subpackage my/app/tests. Note that my/app/data is not a package but a directory belonging to package my/app.
src/my/app/BUILD
src/my/app/app.cc
src/my/app/data/input.txt
src/my/app/tests/BUILD
src/my/app/tests/test.cc
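To make the package boundary concrete, here is a rough sketch of what the two BUILD files could contain. The rule kinds, target names, and visibility setting are my assumptions, and I assume src is the workspace root.

```starlark
# src/my/app/BUILD (sketch; rule kinds and names are assumptions)
cc_library(
    name = "app",
    srcs = ["app.cc"],
    # data/ has no BUILD file of its own, so input.txt belongs to package
    # my/app and is referenced relative to the package directory.
    data = ["data/input.txt"],
    # tests/ is a separate package, so it needs explicit visibility.
    visibility = ["//my/app:__subpackages__"],
)

# src/my/app/tests/BUILD
cc_test(
    name = "test",
    srcs = ["test.cc"],
    deps = ["//my/app"],
)
```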
Repositories
The description of packages above mentioned a repository, so what is it? We know GitHub repos as a way of organizing source code; a Bazel repository is a similar concept.
Bazel defines the root of the main repository as the directory containing the WORKSPACE file; the main repository is also called @.
We can depend on external repositories such as googletest; these external repos are defined in the main repo’s WORKSPACE file using workspace rules.
Note that external repos are repos themselves, which means they have their own WORKSPACE file as well! However, Bazel ignores these WORKSPACE files, and hence transitive dependencies are not added automatically.
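As an illustration, an external repo such as googletest could be declared in the main repo’s WORKSPACE file roughly like this. It is a sketch under assumptions: the repository name, URL, and version are my choices, and a real setup would normally also pin a sha256 checksum.

```starlark
# WORKSPACE (sketch; repository name, URL and version are assumptions)
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "com_google_googletest",
    urls = ["https://github.com/google/googletest/archive/release-1.10.0.zip"],
    strip_prefix = "googletest-release-1.10.0",
)
```

Targets in the main repo can then refer to targets in that repo with labels starting with @com_google_googletest//.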
Targets
Within a package, the elements we define are called targets. The name of a target is called its label.
Target categories include:
- files
- rules
- package groups (less numerous)
Files
We can further divide files into:
- Source files
  - usually written by people and checked in to the repo
- Generated files (or derived files)
  - not checked in to the repo, but generated by the build tool from source files according to specific rules
Rules
A rule specifies the relationship between inputs and outputs and the necessary steps to derive the outputs from the inputs.
Attributes
Each rule has a set of attributes; which attributes apply to a given rule, and what each attribute means, depend on the rule’s class. Each attribute has a name and a type.
For example, common attribute types are:
- integer
- label
- list of labels
- string
- list of strings
- output label
- list of output labels
Not all attributes need to be specified in every rule (i.e. some attributes are optional). Attributes thus form a dictionary from keys (names) to optional, typed values.
Below we introduce several common attributes.
name attribute
Every rule has a name attribute of type string, whose value must be a syntactically valid target name as explained below (labels section).
In some cases, a rule’s name is somewhat arbitrary, such as for genrules.
In other cases, the name is significant. For example, for *_binary and *_test rules, the name attribute determines the name of the executable produced by the build.
srcs attribute
This attribute has type list of labels, which means its value, if present, is a list of labels, each being the name of a target that is an input to this rule.
outs attribute
This attribute has type list of output labels. It is similar to the srcs attribute but differs in two significant ways:
- due to the invariant that the outputs of a rule belong to the same package as the rule itself (see Outputs below), output labels cannot include a package component and must be in one of the “relative” forms (discussed below in the labels section)
- the relationship implied by an (ordinary) label attribute is inverse to that implied by an output label: a rule depends on its srcs, whereas a rule is depended on by its outs.
The two types of label attributes (srcs and outs) thus assign direction to the edges between targets, giving rise to a dependency graph (a DAG over targets, a.k.a. the target graph or build dependency graph), which is the domain over which the Bazel Query tool operates.
Inputs
The inputs may be source files, generated files, or even other rules. Allowing generated files as the inputs means outputs of one rule may be the inputs to another rule, thus allowing rule chaining. Allowing other rules to be the inputs of one rule is more complex and language/rule-dependent.
For example, a C++ library rule A may have another C++ library rule B as input. The effect of this dependency is that B’s header files are available to A during compilation, B’s symbols are available to A during linking, and B’s runtime data is available to A during execution.
Note that a rule’s inputs may come from another package.
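The A and B example above could be sketched as follows. The package paths, file names, and visibility setting are hypothetical; the point is only that B appears in A’s deps and lives in a different package.

```starlark
# lib/b/BUILD (sketch; package and file names are hypothetical)
cc_library(
    name = "b",
    srcs = ["b.cc"],
    hdrs = ["b.h"],
    # Allow targets in package lib/a to depend on this rule.
    visibility = ["//lib/a:__pkg__"],
)

# lib/a/BUILD
cc_library(
    name = "a",
    srcs = ["a.cc"],
    hdrs = ["a.h"],
    # Rule B is an input of rule A, coming from another package.
    deps = ["//lib/b"],
)
```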
Outputs
The outputs are usually generated files, and these files always belong to the same package as the rule itself.
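A small sketch may help tie srcs, outs, and rule chaining together. Everything here is an assumption for illustration: a genrule copies a hypothetical template into a generated header, and a second rule consumes that generated file as one of its inputs.

```starlark
# BUILD (sketch; file and target names are hypothetical)
genrule(
    name = "gen_config",
    srcs = ["config.h.in"],  # the rule depends on its srcs
    outs = ["config.h"],     # the output file depends on the rule
    cmd = "cp $< $@",        # $< is the single input, $@ the single output
)

cc_library(
    name = "config",
    # A generated file used as an input: the output of one rule
    # feeds another rule in the same package.
    hdrs = ["config.h"],
)
```

Note that config.h is written without a package component, since an output always belongs to the package of the rule that generates it.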
Class (or Categories)
A rule can be one of many different kinds, or classes, based on its output type: for example, rules that produce compiled executables and libraries, test executables, and other supported outputs.
Package groups
A package group is a set of packages whose purpose is to limit the accessibility of certain rules.
It is defined by the package_group function and neither generates nor consumes files.
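As a sketch of how this looks in practice, a package group can be named in a rule’s visibility attribute. The group name, package paths, and library below are hypothetical.

```starlark
# my/app/BUILD (sketch; all names are hypothetical)
package_group(
    name = "app_clients",
    packages = [
        "//my/app/...",  # my/app and all of its subpackages
        "//my/tools",    # exactly this one package
    ],
)

cc_library(
    name = "internal",
    srcs = ["internal.cc"],
    # Only packages listed in the group may depend on this rule.
    visibility = [":app_clients"],
)
```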
Labels
As mentioned in the targets intro above, a target’s name is its label and the label uniquely identifies the target.
A typical label in canonical form looks like:
@myrepo//my/app/main:app_binary
Note that @myrepo is the repo’s identifier.
Usually a label refers to a target in the same repo, and hence we can omit the repo identifier and write it as:
//my/app/main:app_binary
A label starts with // and consists of two parts separated by a colon (:):
- package name
  - e.g. my/app/main in the above example
- target name
  - e.g. app_binary in the above example
A label’s second part (i.e. the target name) can be omitted if the target name is the same as the last component of the package name. Such short-form labels are just an abbreviation and these two forms are equivalent.
For example, if we have the label //my/app:app, we can also write it as //my/app.
Quick quiz:
What do the following representations refer to?
- my/app
  - a package named my/app
- //my/app
  - a target in the my/app package, with its label in short form, so the target name is assumed to be app
- //my/app:app
  - a target in the my/app package, with target name app
- @myrepo//my/app/main:app_binary
  - a target in repo myrepo, package my/app/main, with target name app_binary
We can shorten labels even further! Within the BUILD file for package my/app, we can omit the package-name part of labels for this package’s targets, similar to relative paths.
For example, if we have targets //my/app and //my/app:app_binary, we can refer to them in the file my/app/BUILD as
- //my/app:app or //my/app or :app or app
- //my/app:app_binary or :app_binary or app_binary
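In a BUILD file, these forms typically show up in attributes such as deps. The sketch below assumes both targets are C++ rules with hypothetical source files.

```starlark
# my/app/BUILD (sketch; rule kinds and file names are assumptions)
cc_library(
    name = "app",
    srcs = ["app.cc"],
)

cc_binary(
    name = "app_binary",
    srcs = ["main.cc"],
    # "//my/app:app", "//my/app", ":app" and "app" would all name
    # the same target from inside this BUILD file.
    deps = [":app"],
)
```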
Don’t be confused by all these forms! Just remember to be consistent in the label style you use.
Usually the colon (:) is omitted for file targets, but retained for rule targets. This allows us to reference files by their unadorned name relative to the package directory in the package’s BUILD file, e.g.
generate.cc
testdata/input.txt
If you want to reference targets outside the current package in the BUILD file, you need to refer to them using their complete label.
For example, given another package named my/test, to refer to the file generate.cc of package my/app in my/test’s BUILD file, you need to use //my/app:generate.cc.
If you refer to a target with an incorrect label, you may get errors like crosses a package boundary.
Labels starting with @// are references to the main repo and still work even from external repos.
Therefore @//a/b/c is different from //a/b/c when referenced from an external repo. The former refers back to the main repo, while the latter looks for target //a/b/c in the external repo itself.
This subtle difference can be especially important when you write rules in the main repo that refer to targets in the main repo, but these rules will be used from external repos.
I know the label syntax is strict, but Bazel intentionally enforces that for many reasons. The precise details can be found here.
The BUILD files
In the above sections, we discussed packages, targets, labels, and the build dependency graph abstractly. They are the building blocks of Bazel and can all be found in a BUILD file.
A BUILD file defines a package and is interpreted as a sequential list of statements written in the imperative language Starlark.
By saying “sequential list”, we emphasize that order does matter, especially for variables. Variables must be defined before they are used.
Meanwhile, the relative order of rule declarations is immaterial; all that matters is which rules were declared, and with what values, by the time package evaluation completes.
So, in simple BUILD files that consist only of rule declarations, these declarations can be re-ordered freely without changing the behavior.
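The ordering rules can be seen in a small sketch. The variable and target names are hypothetical: the variable has to come first, while the binary may refer to a library that is only declared further down.

```starlark
# BUILD (sketch; names are hypothetical)

# A variable must be defined before any statement that uses it.
COMMON_COPTS = ["-Wall"]

# This rule refers to :util, which is declared later in the file.
# That is fine, because the relative order of rule declarations is immaterial.
cc_binary(
    name = "tool",
    srcs = ["tool.cc"],
    copts = COMMON_COPTS,
    deps = [":util"],
)

cc_library(
    name = "util",
    srcs = ["util.cc"],
    hdrs = ["util.h"],
    copts = COMMON_COPTS,
)
```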
Limitations
- no function definitions, for statements or if statements, to encourage a clean separation between code and data
  - functions should be declared in .bzl files instead
- no *args or **kwargs arguments
  - all the arguments have to be listed explicitly
- unable to perform arbitrary I/O
  - hence the interpretation of BUILD files is hermetic, i.e. dependent only on a known set of inputs, which is essential for ensuring that builds are reproducible
- should be written using only ASCII characters
Best practices
- use comments liberally to document the role of each build target, whether or not it is intended for public use, and the role of the package itself
  - this matters because BUILD files need to be updated whenever the dependencies of the underlying code change, and are typically maintained by multiple people on a team
Bazel extensions
Bazel extensions are files ending in .bzl.
As mentioned in the BUILD file limitations, such files can be used to define new rules, functions or constants. Use a load statement in the BUILD file to import a symbol from an extension.
E.g. the following code loads the file foo/bar/file.bzl and adds the some_library symbol to the environment.
load("//foo/bar:file.bzl", "some_library")
load also supports additional arguments to import multiple symbols.
Limitations of the load statement:
- arguments must be string literals (i.e. no variables)
- load statements must appear at the top level (i.e. cannot be in a function body)
- the first argument is a label (discussed above) identifying the .bzl file (i.e. a file target). If it is a relative label,
  - it is resolved w.r.t. the package (not directory) containing the current .bzl file
  - it should use a leading :
Another typical usage of load is to assign different names (i.e. aliases) to the imported symbols:
E.g.
load("//foo/bar:file.bzl", library_alias = "some_library")
# multiple symbols and a mix of aliases and regular symbol names
load(":my_rules.bzl", "some_rule", nice_alias = "some_other_rule")
In a .bzl file, symbols starting with _ are not exported and thus cannot be loaded from another file.
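To see both sides, here is a sketch of what foo/bar/file.bzl might define and how a BUILD file would use it. The macro body, the private constant, and the greeter target are my assumptions; the label //foo/bar:file.bzl also assumes that foo/bar contains a BUILD file, so that it is a package.

```starlark
# foo/bar/file.bzl (sketch; the macro body is an assumption)

# Leading underscore: private to this file, cannot be loaded elsewhere.
_DEFAULT_COPTS = ["-Wall"]

def some_library(name, srcs, deps = []):
    """Declares a cc_library with shared compiler options."""
    native.cc_library(
        name = name,
        srcs = srcs,
        copts = _DEFAULT_COPTS,
        deps = deps,
    )

# In some BUILD file (hypothetical target):
#
#   load("//foo/bar:file.bzl", "some_library")
#
#   some_library(
#       name = "greeter",
#       srcs = ["greeter.cc"],
#   )
```

Keeping the function in a .bzl file respects the BUILD file limitation that function definitions are not allowed there.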
Build rules
The majority of Bazel build rules come in families, grouped by language. For example, cc_binary, cc_library and cc_test are the build rules for C++ binaries, libraries, and tests.
As you can imagine, the naming scheme for other languages is similar, with a different prefix identifying the language, e.g. java_* for Java. The suffix identifies the kind of rule:
- *_binary rules build executables. The executable is placed in the build tool’s binary output tree at the path corresponding to the rule’s label, so //my:program will appear at $(BINDIR)/my/program.
- *_test rules are a specialization of *_binary rules and are used for automated testing.
  - tests return 0 on success
  - a test can only open files beneath its runfiles tree at runtime
- *_library rules specify separately-compiled modules in the given programming language. Libraries can depend on other libs, and binaries and tests can depend on libs.
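Putting the three suffixes side by side for C++, a package named my could look roughly like this. The library, test, and file names are hypothetical; only the //my:program label comes from the text above.

```starlark
# my/BUILD (sketch; library, test and file names are hypothetical)
cc_library(
    name = "greeting",
    srcs = ["greeting.cc"],
    hdrs = ["greeting.h"],
)

cc_binary(
    name = "program",  # the executable appears at $(BINDIR)/my/program
    srcs = ["main.cc"],
    deps = [":greeting"],
)

cc_test(
    name = "greeting_test",  # passes when the test binary returns 0
    srcs = ["greeting_test.cc"],
    deps = [":greeting"],
)
```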
Dependencies
We discussed the dependency graph in the above sections; it models the depends-on relationship among targets.
A target A depends on a target B if B is needed by A at build or execution time.
With the dependency graph defined, we further define a target’s direct dependencies as those direct neighbors in the dependency graph, i.e. targets reachable by a path of length 1 in the DAG. Similarly, a target’s transitive dependencies are those targets on which it depends via a path through the graph.
In the context of builds, there are two types of dependency graphs:
- the graph of actual dependencies
  - a target X is actually dependent on target Y if and only if Y must be present, built and up-to-date in order for X to be built correctly
    - built could mean generated, processed, compiled, linked, archived, compressed, executed, or any other kind of task that routinely occurs during a build
- the graph of declared dependencies
  - a target X has a declared dependency on target Y if and only if there’s a dependency edge from X to Y in the package of X
In order to have a correct build, the actual dependency graph (denoted by A) must be a subgraph of the declared dependency graph (denoted by D), i.e. every pair of directly-connected nodes in A must also be directly connected in D. We therefore say D is an overapproximation of A.
What all this means is that BUILD file writers should try to make D as close to A as possible: every rule must explicitly declare all of its actual direct dependencies to the build system, and no more.
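A sketch with three hypothetical packages a, b, and c shows what declaring every direct dependency means. Suppose a.cc includes headers from both b and c: even though b already depends on c, a should list c itself rather than rely on the transitive edge.

```starlark
# c/BUILD (sketch; packages a, b and c are hypothetical)
cc_library(
    name = "c",
    hdrs = ["c.h"],
    visibility = ["//visibility:public"],
)

# b/BUILD
cc_library(
    name = "b",
    hdrs = ["b.h"],
    deps = ["//c"],
    visibility = ["//visibility:public"],
)

# a/BUILD
cc_binary(
    name = "a",
    srcs = ["a.cc"],
    deps = [
        "//b",
        # a.cc uses c.h directly, so this actual dependency is declared
        # here instead of being reached only through //b.
        "//c",
    ],
)
```

If //c were left out of a’s deps, the build would keep working only for as long as b happens to depend on c.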
Types of dependencies
Most build rules have 3 attributes for specifying different kinds of generic dependencies: srcs, deps, and data. Other attributes also exist for rule-specific kinds of dependencies, e.g. compiler, resources, etc.
- srcs dependencies
  - represent files consumed directly by the rule or rules that output source files
- deps dependencies
  - rules pointing to separately-compiled modules providing header files, symbols, libraries, data, etc.
- data dependencies
  - the build system runs tests in an isolated directory where only files listed as data are available
E.g.
# I need all the files under the testdata directory:
java_binary(
    name = "setenv",
    ...
    data = glob(["testdata/**"]),
)

# I need test data from another directory
sh_test(
    name = "regtest",
    srcs = ["regtest.sh"],
    data = [
        "//data:file1.txt",
        "//data:file2.txt",
        ...
    ],
)
Example project
I rebuilt the project from our previous post, Value-Parameterized GTest, with Bazel, and you can find the source code here. It’s interesting to compare the two branches (the master branch uses CMake while the bazel branch uses Bazel) and appreciate the elegance we gained by adopting Bazel.