Buck: What Makes Buck so Fast?

Buck exploits a number of strategies to reduce build times.

Buck builds dependencies in parallel

Buck is designed so that any input files required by a build target must be specified in the build rule for that target. Therefore, we know that the directed acyclic graph (DAG) that Buck constructs from the build rule is an accurate reflection of the build's dependencies, and that once a rule's dependencies are satisfied, the target for that rule can be built.

Having a DAG makes it straightforward to build rules in parallel, which can dramatically reduce build times. Buck starts with the leaf nodes of the graph, that is, targets that have no dependencies, and adds them to a queue of targets to build. When a thread is available, Buck removes a target from the front of the queue and builds it. Assuming the target builds successfully, Buck notifies all of the rules that depend on that target. When all of a rule's dependencies have been satisfied, Buck adds that rule's target to the build queue. Computation proceeds in this manner until all of the nodes in the graph have been built. This execution model means that breaking modules into finer dependencies creates opportunities for increased parallelism, which improves throughput.
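The scheduling loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Buck's implementation: the function name build_in_parallel, the shape of the deps mapping, and the build callback are all assumptions made for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def build_in_parallel(deps, build, max_workers=4):
    """Build every target in `deps` (target -> set of dependencies).

    `build` is a hypothetical callback that builds one target.
    Returns the targets in the order in which they finished.
    """
    # How many dependencies each target is still waiting on.
    remaining = {target: len(ds) for target, ds in deps.items()}
    # Reverse edges: who must be notified when a target finishes.
    dependents = {target: [] for target in deps}
    for target, ds in deps.items():
        for dep in ds:
            dependents[dep].append(target)

    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start with the leaf nodes: targets that have no dependencies.
        running = {pool.submit(build, t): t for t, n in remaining.items() if n == 0}
        while running:
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                target = running.pop(future)
                future.result()  # re-raises if the build step failed
                finished.append(target)
                # Notify dependents; queue any whose dependencies are all satisfied.
                for dependent in dependents[target]:
                    remaining[dependent] -= 1
                    if remaining[dependent] == 0:
                        running[pool.submit(build, dependent)] = dependent
    return finished
```

With a graph such as {"app": {"lib", "res"}, "lib": {"util"}, "res": set(), "util": set()}, the targets res and util can be built at the same time, and splitting app's dependencies further would expose more such independent work.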

Graph enhancement increases rule granularity

Frequently, the granularity at which users declare build rules is different from the granularity at which we want the build system to model them. Users want coarse-grained rules, such as android_binary, for simplicity. However, the build system wants fine-grained rules, such as aapt package and dex merge, that allow for parallelism and more granular caching (see caching below).

Internally, Buck uses a mechanism called graph enhancement, which transforms the target graph specified by the build rules into an action graph, the DAG that Buck actually uses for building. Graph enhancement can add new synthetic rules to break a monolithic task, such as android_binary, into independent subtasks, each of which might have only a subset of the original task's dependencies. So, for example, dex merging would not depend on running a full aapt package. Graph enhancement can also move dependency edges, so that compiling Android libraries does not depend on dexing their dependencies.
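As a rough illustration of the transformation, the sketch below expands a coarse android_binary target into two synthetic actions, each with only a subset of the original dependencies. The enhance function, the "#"-suffixed action names, and the tuple-based graph format are invented for this example; the real action graph contains many more steps.

```python
def enhance(target_graph):
    """Turn a target graph into a finer-grained action graph.

    `target_graph` maps a target name to (rule_type, deps).
    Returns a mapping from action name to the set of actions it depends on.
    """
    actions = {}
    for name, (rule_type, deps) in target_graph.items():
        if rule_type == "android_binary":
            libs = {d for d in deps if target_graph[d][0] == "android_library"}
            resources = {d for d in deps if target_graph[d][0] == "android_resource"}
            # Synthetic subtasks: each sees only the dependencies it needs,
            # so dex merging does not wait on resource packaging.
            actions[name + "#aapt_package"] = resources
            actions[name + "#dex_merge"] = libs
            actions[name] = {name + "#aapt_package", name + "#dex_merge"}
        else:
            actions[name] = set(deps)
    return actions
```

Because the two synthetic actions share no dependency edge, a scheduler is free to run them in parallel and to cache their outputs separately.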

Note: Adding or removing dependencies from your build causes Buck to rebuild the action graph, which can significantly increase the time required for your next build. However, changing the contents of a dependency, such as a source file, does not cause Buck to rebuild the action graph.

Buck caches build artifacts

A build rule, together with other aspects of the build environment, specifies all of the inputs that can affect the rule's output. Therefore, we can combine that information into a hash that represents the totality of those inputs. This hash is used as a cache key, and the associated value in the cache is the output produced by the rule. (See .buckconfig for information on how to set up a cache.) All of the following information contributes to the cache key for a build rule:

  • The values of the arguments used to define the build rule in the build file.
  • The contents of any file arguments for the build rule.
  • The cache key for each of the rule's deps.
  • The version of Buck being used to build the rule. (This means that upgrading Buck to a new version invalidates all of the cache keys generated by the old version.)
  • The components of the toolchain that are used to build the rule, including their configurations, such as compiler or linker flags.
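A minimal sketch of how these five ingredients could be folded into one key is shown below. The rule_key function, its parameter names, and the use of SHA-1 over a JSON payload are assumptions chosen for brevity; Buck's actual key computation differs in its details.

```python
import hashlib
import json


def rule_key(args, file_contents, dep_keys, buck_version, toolchain):
    """Combine everything that can affect a rule's output into one hash.

    args:          rule arguments from the build file (JSON-serializable)
    file_contents: mapping of file path -> bytes for every file argument
    dep_keys:      cache keys of the rule's deps
    buck_version:  version string of the build tool
    toolchain:     tool versions and flags (JSON-serializable)
    """
    payload = {
        "args": args,
        # Hash file contents, not paths alone, so edits change the key.
        "files": {p: hashlib.sha1(data).hexdigest() for p, data in file_contents.items()},
        "deps": sorted(dep_keys),
        "buck_version": buck_version,
        "toolchain": toolchain,
    }
    # sort_keys makes the serialization, and therefore the key, deterministic.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()
```

Because each rule's key includes the keys of its deps, a change to any input changes the key of every rule above it in the graph.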

When Buck determines whether to build the target for a rule, the first thing it does is compute the cache key for the rule. If there is a hit in any of the caches specified in .buckconfig, Buck fetches the rule's output from the cache instead of building the rule locally. For outputs that are expensive to compute, this is a substantial savings. This caching can also make it fast to rebuild when switching between branches in a DVCS such as Git or Mercurial—assuming that relatively few files differ between branches.
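The lookup can be pictured as follows. In this sketch each cache is modeled as a plain dictionary, and build_or_fetch and build_locally are hypothetical names; storing a freshly built output in every cache is also an assumption of the example rather than documented behavior.

```python
def build_or_fetch(key, caches, build_locally):
    """Return (output, how) for the rule with cache key `key`.

    `caches` is an ordered list of dict-like caches; `build_locally`
    is a zero-argument callback that builds the rule.
    """
    # Try each configured cache in order; the first hit wins.
    for cache in caches:
        if key in cache:
            return cache[key], "fetched"
    # Cache miss everywhere: build, then store the result for next time.
    output = build_locally()
    for cache in caches:
        cache[key] = output
    return output, "built"
```

Switching back to a branch that was built earlier produces keys that are already in the cache, so the expensive build step is skipped.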

If you are using a continuous integration (CI) system, such as Jenkins, you should configure your CI builds to populate a cache that can be read by local builds. That way, when a developer syncs to a revision that has already been built on your CI system, a local build with buck build can pull build artifacts from the cache. In order for this strategy to work, the cache key computed by Buck on the CI system must match the key computed by Buck on the developer's local computer.

The importance of deterministic builds

In order to take full advantage of caching, all the factors that affect the output of the build should be kept safe from unintended changes. The build should be deterministic in the sense that it should reliably produce identical output across different build servers or different developers' computers. For this reason, we recommend that you put the Buck configuration file, .buckconfig, under source control. In addition, we recommend that you also put the components of the toolchains used to build the outputs, such as compilers and linkers, under source control; this way, you can ensure that all developers on the project are using the same versions of these tools.

Note that Buck reparses the build files in a project if it detects certain changes in the build's configuration. Such a configuration change could be a change in the .buckconfig file itself or the result of specifying the --config command-line option.

If a Java library's API doesn't change, code that uses the library doesn't need to be rebuilt

Developers often modify Java code in ways that do not affect the code's externally-visible API. For example, adding or removing private methods, or modifying the implementation of existing methods—regardless of their visibility—does not change the API exposed by the Java file.

When Buck builds a java_library rule, it also computes its API. Normally, modifying a private method in a java_library would cause it and all rules that depend on it to be rebuilt because the change in cache keys would propagate up the DAG. However, Buck has special logic for a java_library where, if the .java input files have not changed since the previous build, and the API for each of its Java dependencies has not changed since the previous build, then the java_library will not be recompiled. This is valid because we know that neither the input .java files nor the API against which they would be compiled has changed, so the result would be the same if the rule were rebuilt. This localizes how much Java code needs to be recompiled in response to a change, again reducing build times.

Rules can calculate their own "ABI" keys

As a generalization of the Java library API optimization, every rule type has the freedom to determine whether or not to rebuild itself based on information about the state of its dependencies. For example, when editing a file in an android_resource rule, we don't need to recompile all dependent resources and libraries if the set of exposed symbols doesn't change (for example, if we just changed a padding value). If we recompile an android_library due to a dependency change, but the resulting classes are identical, we don't need to re-run DX.

This mechanism is fairly general. When the build engine is preparing to build a rule, in addition to the normal cache key, it generates a key that excludes the keys of the dependencies. This is combined with a key that the rule generates by hashing whatever parts of its dependencies it considers "visible". Usually, the dependency will help with this process by outputting the relevant information (like the Java API or hash of all classes) to a single small file. If both keys match the values from the last build, then there is no need to rebuild.
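The two-key check might look like the sketch below. The helper names (abi_keys, needs_rebuild) and the string-based inputs are assumptions for illustration; the "visible" information stands in for whatever small summary a dependency publishes, such as a hash of its Java API.

```python
import hashlib


def _digest(*parts):
    h = hashlib.sha1()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def abi_keys(own_inputs, visible_dep_info):
    """Return (key excluding dependency keys, key over visible parts of deps).

    own_inputs:       list of strings describing the rule's own inputs
    visible_dep_info: mapping of dependency name -> summary it publishes
    """
    own_key = _digest(*own_inputs)
    visible_key = _digest(
        *(f"{dep}={info}" for dep, info in sorted(visible_dep_info.items()))
    )
    return own_key, visible_key


def needs_rebuild(current, previous):
    """Rebuild unless both keys match the values from the last build."""
    return previous is None or current != previous
```

If a dependency changes only a private method, its published API hash stays the same, both keys match the previous build, and needs_rebuild returns False even though the dependency's normal cache key changed.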

Note that this optimization is currently separate from the distributed cache. We'd like to combine them so that the cache can be used to fetch rules built by a continuous integration server as long as the source files and visible parts of the dependencies match.

Buck prefers to use first-order dependencies

By default, Buck uses first-order dependencies when compiling Java. This means that compilation can only see explicitly declared dependencies, not other libraries that your dependencies depend on.

We recommend keeping this default. First-order dependencies dramatically shrink the set of APIs that your library is exposed to, which in turn reduces the scope of changes that can force your library to be rebuilt.
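To make the difference concrete, the sketch below computes what a compilation would see under each policy. The function names and the dictionary-based dependency graph are illustrative assumptions, not Buck APIs.

```python
def first_order_classpath(target, deps):
    """Only the dependencies that `target` declares explicitly."""
    return set(deps[target])


def transitive_classpath(target, deps):
    """Everything reachable from `target` through declared dependencies."""
    seen = set()
    stack = list(deps[target])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return seen
```

For deps = {"app": ["lib"], "lib": ["guava"], "guava": []}, the first-order classpath of app contains only lib, while the transitive one also contains guava. Under the first-order policy, an API change in guava cannot by itself force app to recompile.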

Fast Dex merging for Android

Other build tools also use Android's DX merge support to merge your main program's dex file with third-party libraries. However, Buck's support for fine-grained libraries allows dex merging to work at a much finer granularity.

Buck also ships a customized version of DX with significant performance improvements. It uses a faster algorithm for merging many dex files. It also supports running multiple copies of DX concurrently within a single long-lived buckd process, which eliminates most of DX's start-up time.

As a result, when editing a small module and performing an incremental build, we frequently see less than 1 second spent generating classes.dex.

Dependency files to trim overspecified inputs

Buck's low-level build rules specify all inputs, such as source files or outputs from other build rules, that may be used when the build rule is executed, so that changes to any of these inputs result in a new cache key and therefore trigger rebuilds. However, in practice, it's not uncommon for these build rules to over-specify their inputs. A good example is the C/C++ compilation rules that Buck generates. C/C++ compilation rules specify as inputs all headers found from the transitive closure of C/C++ library dependencies, even though in many cases only a small subset of these headers is actually used. For example, a C/C++ source might use only one of many headers exported by a C/C++ library dependency. However, there's not enough information available before running the step to know if any given input is used, and so all inputs must be considered, which can lead to unnecessary rebuilding.

In some cases, we can determine after the build the exact subset of the listed inputs that were actually used. In C/C++, compilers such as gcc provide a -M option, which produces a dependency file that identifies the exact headers used during compilation. For supported rules, Buck uses this dependency file before the build to try to avoid an unnecessary rebuild:

  • If no dependency file is available before the build, Buck runs the rule as normal and produces a dependency file for it, which lists the inputs that were used. The dependency file is then available for subsequent builds.
  • If the dependency file is available before the build, Buck reads the file and uses it to filter out unused inputs when constructing its rule key.

Note that dependency files are used only if the standard rule key—which considers all inputs—doesn't match.
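The decision procedure above can be sketched as follows. The function names (key_for, record, must_rebuild) and the dictionary used to remember the previous build are hypothetical, and the sketch ignores subtleties a real implementation has to handle, such as a newly added header that would shadow one of the previously used ones.

```python
import hashlib


def key_for(inputs):
    """Hash a mapping of input path -> file contents (bytes)."""
    h = hashlib.sha1()
    for path in sorted(inputs):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(inputs[path])
        h.update(b"\0")
    return h.hexdigest()


def record(inputs, used_paths):
    """State saved after a build: the full key plus the dependency file."""
    used = {p: inputs[p] for p in used_paths}
    return {
        "rule_key": key_for(inputs),      # considers every declared input
        "dep_file": set(used_paths),      # inputs the compiler actually read
        "dep_file_key": key_for(used),    # key over the used inputs only
    }


def must_rebuild(inputs, previous):
    if previous is None:
        return True   # no dependency file yet: build as normal
    if key_for(inputs) == previous["rule_key"]:
        return False  # standard rule key matches; dep file not consulted
    if not previous["dep_file"] <= inputs.keys():
        return True   # a previously used input is gone
    used = {p: inputs[p] for p in previous["dep_file"]}
    return key_for(used) != previous["dep_file_key"]
```

If main.c includes only a.h, editing the unused header b.h changes the standard rule key but not the key over the used inputs, so must_rebuild returns False.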