Buck: What Makes Buck so Fast?

What Makes Buck so Fast?

Buck exploits a number of strategies to reduce build times:

A build rule knows all of the inputs that can affect its output

Buck is designed so that anything that can affect the output of a build rule must be specified as an input to the build rule: hidden state is not allowed. (This is also important for ensuring that results are consistent and reproducible for all developers.) Therefore, we can be sure that once a rule's deps are satisfied, the rule itself can be built. This gives us confidence that the DAG that results from build rules and their deps is true: all dependencies are captured in the graph.

Having a DAG makes it straightforward for rules to be built in parallel, which can dramatically reduce build times. The execution model for Buck is very simple: starting with the leaf nodes of the graph, add them to a queue of rules to be built. When a thread is available, a rule is removed from the queue, and built. Assuming it is built successfully, it notifies all of the rules that depend on it that it is done. When a rule gets such a notification, it checks whether all its dependencies have been satisfied, and if so, it gets added to the queue. Computation proceeds in this manner until all of the nodes in the graph have gone through the queue. Therefore, breaking modules into finer dependencies creates opportunities for increased parallelism, improving throughput.

Buck can store the outputs it generates in a cache

A build rule knows all of the inputs that can affect its output, and therefore it can combine that information into a hash that represents the total input. This hash is used as a cache key where the associated value in the cache is the output produced by the rule. (See .buckconfig for information on how to set up a cache.) The following information contributes to the cache key for a build rule:

  • The values of the arguments used to define the build rule in the build file.
  • The contents of any file arguments for the build rule.
  • The version of Buck being used to build the rule. (This means that upgrading Buck to a new version invalidates all of the cache keys generated by the old version.)
  • The cache key for each of the rule's deps.

When Buck begins to build a build rule, the first thing it does is compute the cache key for the rule. If there is a hit in any of the caches specified in .buckconfig, then it will fetch the rule's output from the cache instead of building the rule locally. For outputs that are expensive to compute, this is a substantial savings. It also makes it fast to rebuild when switching between branches in a DVCS such as Git or Mercurial.

Because Buck uses the cache key to determine whether to rebuild a rule, you should never have to run buck clean. If anything that could affect the output of the rule changes, then the cache key should change, as well. Because the change in input will cause a cache miss, Buck will rebuild the rule, overwriting its old outputs. Since out-of-date outputs are guaranteed to be overwritten, there is no reason to clean the build.

If you are using some sort of continuous integration (CI) system, you will likely want your CI builds to populate a cache that can be read by your local builds. That way, when a developer syncs to a revision that has already been built on your CI system, running buck build should not build anything locally, as all outputs should be able to be pulled from the cache. This works because the cache key computed by Buck when run on the CI system should match the key computed by Buck on your local machine.

If a Java library's API doesn't change, code that uses the library doesn't need to be rebuilt

Oftentimes, a developer will modify Java code in a way that does not affect its externally-visible API. For example, adding or removing private methods, as well as modifying the implementation of existing methods (regardless of their visibility), does not change the API of a Java file.

When Buck builds a java_library rule, it also computes its API. Normally, modifying a private method in a java_library would cause it and all rules that depend on it to be rebuilt because the change in cache keys would propagate up the DAG. However, Buck has special logic for a java_library where, if the .java input files have not changed since the previous build, and the API for each of its Java dependencies has not changed since the previous build, then the java_library will not be recompiled. This is valid because we know that neither the input .java files nor the API against which they would be compiled has changed, so the result would be the same if the rule were rebuilt. This localizes how much Java code needs to be recompiled in response to a change, again reducing build times.

Rules can calculate their own "ABI" keys

As a generalization of the Java library API optimization, every rule type has the freedom to determine whether or not to rebuild itself based on information about the state of its dependencies. For example, when editing a file in an android_resource rule, we don't need to recompile all dependent resources and libraries if the set of exposed symbols doesn't change (for example, if we just changed a padding value). If we recompile an android_library due to a dependency change, but the resulting classes are identical, we don't need to re-run DX.

This mechanism is fairly general. When the build engine is preparing to build a rule, in addition to the normal cache key, it generates a key that excludes the keys of the dependencies. This is combined with a key that the rule generates by hashing whatever parts of its dependencies it considers "visible". Usually, the dependency will help with this process by outputting the relevant information (like the Java API or hash of all classes) to a single small file. If both keys match the values from the last build, then there is no need to rebuild.

Note that this optimization is currently separate from the distributed cache. We'd like to combine them so that the cache can be used to fetch rules built by a continuous integration server as long as the source files and visible parts of the dependencies match.

Buck prefers to use first-order dependencies

By default, Buck uses first-order dependencies when compiling Java. This means that compilation can only see explicitly declared dependencies, not other libraries that your dependencies depend on.

We recommend keeping the default, however. First-order dependencies dramatically shrink the set of APIs that your library is exposed to, which dramatically reduces the scope of changes that will force your library to be rebuilt.

Fast Dex merging for Android

Other build tools use also Android's DX merge support to merge your main program's dex file with third-party libraries. However, Buck's support for fine-grained libraries allows dex merging to work at a much higher granularity.

Buck also includes a customized version of DX that includes significant performance improvements. It uses a faster algorithm for merging many dex files. It also has support for running multiple copies of DX concurrently within a single long-lived buckd process, which eliminates most of DX's start-up time.

As a result, when editing a small module and performing an incremental build, we frequently see less than 1 second spent generating classes.dex.

Graph enhancement for increased rule granularity

Frequently, the granularity at which we expect users to declare build rules is very different from the granularity at which we want the build system to model them. Users want coarse-grained rules for simplicity (like "android_binary"), but the build system wants fine-grained rules (like "aapt package" and "dex merge") to allow for parallelism and fine-grained caching.

Internally, Buck uses a mechanism called "graph enhancement" that allows its internal "action graph" (the DAG used for building) to be different from what the user declared (internally called the "target graph"). Graph enhancement can add new synthetic rules to break a monolithic task (like android_binary) into independent subtasks that might only have a subset of the original dependencies, so dex merging does not depend on running a full aapt package. It can also move dependency edges, so compiling Android libraries does not depend on dexing their dependencies.

Dependency files to trim overspecified inputs

Buck's low-level build rules specify all inputs (e.g. sources, or outputs from other rules) that may be used when the build rule is executed, so that changes to any of these inputs will trigger rebuilds and use a new cache key. However, in practice, it's not uncommon for these build rules to over-specify their inputs. A good example of this is the C/C++ compilation rules Buck generates. C/C++ compilation rules specify all headers found from the transitive closure of C/C++ library dependencies as inputs, even though in almost all cases only a small subset of these headers are actually used (e.g. a C/C++ source may only use one of many headers exported by a C/C++ library dependency). However, there's not enough information available before running the step to know if any given input is used, and so all inputs must be considered, which can lead to substantial unnecessary rebuilding.

In some cases, we can figure out the exact subset of the listed inputs which were actually used after the build. In C/C++, compilers like gcc provide a -M option which produces a dependency file which contains the exact headers that were actually used by during compilation. For supported rules, Buck uses this dependency file before the build, to try avoid an unnecessary rebuild:

  • If no dependency file is initially available before the build, the rule is run as normal and also produces a dependency file for next time, which lists the inputs that were used.
  • If the dependency file is available before the build, Buck will read this in and use this to filter out unused inputs when constructing it's rule key.
It should be noted that dependency files are only used if the standard rule key (which considers all inputs) doesn't match.