Buck: worker_tool()

worker_tool()

This rule is liable to change in the future.

Some external tools have high startup costs. To amortize that cost over the whole build rather than paying it in each rule invocation, use the worker_tool() rule in conjunction with genrule. Buck will start the external tool once and then reuse it, communicating with it over stdin and stdout using a simple JSON protocol.

A worker_tool rule can be referenced in the cmd parameter of a genrule by using a macro:
$(worker //path/to:target)

Arguments

  • name (required)

    The name of the rule.

  • exe (required)

    A build target for a rule that outputs an executable, such as an sh_binary. Buck will only run this executable once per build.

  • args (defaults to None)

    A string of args that will be passed to the executable represented by exe on initial startup.

  • max_workers (defaults to 1)

    The maximum number of workers of this type that Buck will start. Use -1 to allow the creation of as many workers as necessary.

  • env (defaults to None)

    A map of environment variables that will be passed to the executable represented by exe on initial startup.

  • persistent (defaults to False)

    If set to True, Buck will not restart the tool unless the tool itself changes. This means the tool will persist across multiple buck commands without being shut down, and it may see the same rule being built more than once. Be careful when using this with tools that don't expect to process the same input (with different contents) twice!

  • visibility (defaults to [])

    List of build target patterns that identify the build rules that can include this rule in their deps.

  • licenses (defaults to [])

    Set of license files for this library. To get the list of license files for a given build rule and all of its dependencies, you can use buck query.

  • labels (defaults to [])

    Set of arbitrary strings which allow you to annotate a build rule with tags that can be searched for over an entire dependency tree using buck query attrfilter.

Examples

worker_tool(
  name = 'ExternalToolWorker',
  exe = ':ExternalTool',
  args = '--arg1 --arg2'
)

sh_binary(
  name = 'ExternalTool',
  main = 'external_tool.sh',
)

genrule(
  name = 'TransformA',
  out = 'OutputA.txt',
  cmd = '$(worker :ExternalToolWorker) argA',
)

genrule(
  name = 'TransformB',
  out = 'OutputB.txt',
  cmd = '$(worker :ExternalToolWorker) argB',
)

genrule(
  name = 'TransformC',
  out = 'OutputC.txt',
  cmd = '$(worker :ExternalToolWorker) argC',
)

When you run buck build on all three of the above genrules, Buck will first create the worker process by invoking:
./external_tool.sh --arg1 --arg2
Buck will then communicate with this process using JSON over stdin, starting with a handshake:

[
  {
    id: 0,
    type: 'handshake',
    protocol_version: '0',
    capabilities: []
  }

Buck will then wait for the tool to reply on stdout:

[
  {
    id: 0,
    type: 'handshake',
    protocol_version: '0',
    capabilities: []
  }
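
The reply side of the handshake can be sketched in Python as follows. This is a minimal illustration, not Buck's reference implementation: it assumes the messages on the wire are strict JSON (the unquoted keys and single quotes in the listings being shorthand), and the helper names `handshake_reply` and `encode_handshake_reply` are hypothetical.

```python
import json

def handshake_reply(request):
    """Build the reply to Buck's handshake, reusing the id of the request."""
    if request.get("type") != "handshake":
        raise ValueError("expected a handshake message")
    return {
        "id": request["id"],        # must match the id Buck sent
        "type": "handshake",
        "protocol_version": "0",    # the version shown in the example above
        "capabilities": [],
    }

def encode_handshake_reply(request):
    """Serialize the reply as the first element of the tool's JSON array."""
    # The reply opens the array on stdout, so it starts with "[" and the
    # array is deliberately left unclosed until shutdown.
    return "[" + json.dumps(handshake_reply(request))
```

A real tool would write the returned string to stdout and flush it so that Buck sees the reply immediately.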

Then, when building the first genrule, Buck will write to stdin:

  ,{
    id: 1,
    type: 'command',
    args_path: '/tmp/1.args',
    stdout_path: '/tmp/1.out',
    stderr_path: '/tmp/1.err'
  }

The file /tmp/1.args would contain argA. The tool should then perform the necessary work for this job and then write the job's output to the files supplied by Buck. Once the job is done, the tool should reply to Buck on stdout with:

  ,{
    id: 1,
    type: 'result',
    exit_code: 0
  }
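
One way a tool might process such a command message is sketched below in Python. It assumes the message has already been parsed into a dictionary; `run_job` and the `work` callable are hypothetical names introduced for illustration, and treating the contents of the args file as a single string is an assumption based on the example above.

```python
def run_job(command, work):
    """Handle one 'command' message and return the matching 'result' message.

    `work` is a hypothetical callable that takes the job's argument string
    and returns a tuple (stdout_text, stderr_text, exit_code).
    """
    # Buck passes the job's arguments in a file rather than inline.
    with open(command["args_path"]) as f:
        args = f.read().strip()

    out_text, err_text, exit_code = work(args)

    # The job's output goes to the files supplied by Buck.
    with open(command["stdout_path"], "w") as f:
        f.write(out_text)
    with open(command["stderr_path"], "w") as f:
        f.write(err_text)

    # The id of the result must match the id of the command.
    return {"id": command["id"], "type": "result", "exit_code": exit_code}
```

For the job above, `work` would receive the string argA, and the returned dictionary would be serialized to stdout with a leading comma so that it continues the reply array.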

Once Buck hears back from the first genrule's job, it will submit the second genrule's job in the same fashion and await the response, and then do the same for the third. When the build is finished, Buck will close the JSON array by writing to stdin:

]

which signals to the tool that it should exit after replying on stdout:

]
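
Because the whole conversation is a single JSON array that stays open until shutdown, the tool cannot wait for a complete document before parsing. The Python sketch below shows one possible way to read such a stream incrementally with the standard `json.JSONDecoder.raw_decode` method. It assumes strict JSON on the wire; `iter_messages`, `serve`, `write`, and `handle_command` are hypothetical names, not part of Buck.

```python
import json

def iter_messages(chunks):
    """Yield each JSON object from a stream shaped like '[{...},{...}]'.

    `chunks` is any iterable of text pieces. Iteration stops when the
    closing ']' is seen or when the input runs out.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            if buf[0] in "[,":
                buf = buf[1:]      # array opener or element separator
                continue
            if buf[0] == "]":
                return             # Buck closed the array: time to shut down
            try:
                message, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break              # object is incomplete; wait for more input
            yield message
            buf = buf[end:]

def serve(chunks, write, handle_command):
    """Answer every message from `chunks`, sending reply text through `write`."""
    for message in iter_messages(chunks):
        kind = message.get("type")
        if kind == "handshake":
            reply = {"id": message["id"], "type": "handshake",
                     "protocol_version": "0", "capabilities": []}
            write("[" + json.dumps(reply))    # opens the reply array
        elif kind == "command":
            write("," + json.dumps(handle_command(message)))
        else:
            error = {"id": message.get("id"), "type": "error", "exit_code": 1}
            write("," + json.dumps(error))
    write("]")  # reached after Buck's ']' (or end of input): close the reply array
```

In a real tool, `chunks` would be fed from stdin and `write` would write to stdout and flush after each reply; `handle_command` would do the job's work and return the result message.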

In this example, Buck is guaranteed to invoke
./external_tool.sh --arg1 --arg2
just once during the build. The three jobs corresponding to the three genrules are submitted synchronously, one at a time, to the single worker process.

Note that the id values in the messages are not necessarily increasing or sequential, but they have to match between the request message and the response message of a given job as well as in the initial handshake.

If the tool receives a message type it cannot interpret, it should answer with:

{
  id: <n>,
  type: 'error',
  exit_code: 1
}

If the tool receives a message type it can interpret, but the other attributes of the message are in an inconsistent state, it should answer with:

{
  id: <n>,
  type: 'error',
  exit_code: 2
}
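
The two error replies can be sketched in Python as a small validation step. This is only an illustration: `error_for` and `REQUIRED_COMMAND_KEYS` are hypothetical names, and reading "inconsistent state" as "a command message with a required path missing" is an assumption, since the text above does not define it more precisely.

```python
# Assumption: a command is usable only if all three paths are present.
REQUIRED_COMMAND_KEYS = ("args_path", "stdout_path", "stderr_path")

def error_for(message):
    """Return an 'error' reply for a bad message, or None if it looks usable."""
    msg_type = message.get("type")
    if msg_type not in ("handshake", "command"):
        # Unknown message type: exit_code 1.
        return {"id": message.get("id"), "type": "error", "exit_code": 1}
    if msg_type == "command" and any(k not in message for k in REQUIRED_COMMAND_KEYS):
        # Known type, but the other attributes are unusable: exit_code 2.
        return {"id": message.get("id"), "type": "error", "exit_code": 2}
    return None
```

A tool could call this before dispatching each message and, when the result is not None, send it as the reply in place of a result message.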