docs/design: Move the run doc to github.

This ticks another box in #1869.

Co-Authored-By: arxanas <me@waleedkhan.name>
Co-Authored-By: hooper <hooper@google.com>
Co-Authored-By: martinvonz <martinvonz@google.com>
2023-09-06 00:49:59 +02:00 · 2023-09-06 00:49:59 +02:00 · 5b729a90a9
commit 5b729a90a9
parent 2c74fa8c7c
2 changed files with 278 additions and 0 deletions
--- a/docs/design/run.md
+++ b/docs/design/run.md
@@ -0,0 +1,277 @@
+# Introducing JJ run
+
+Authors: [Philip Metzger](mailto:philipmetzger@bluewin.ch), [Martin von Zweigbergk](mailto:martinvonz@google.com), [Danny Hooper](mailto:hooper@google.com), [Waleed Khan](mailto:me@waleedkhan.name)
+
+Initial Version, 10.12.2022 (view full history [here](https://docs.google.com/document/d/14BiAoEEy_e-BRPHYpXRFjvHMfgYVKh-pKWzzTDi-v-g/edit))
+
+
+**Summary:** This document describes the design of a new `run` command for
+Jujutsu, which will be used to integrate seamlessly with build systems, linters,
+and formatters. It achieves this by running a user-provided command or script
+across multiple revisions. For more details, read
+[Use-Cases of jj run](#use-cases-of-jj-run).
+
+## Preface
+
+The goal of this design document is to specify the correct behavior of `jj run`.
+I (Philip Metzger) will try to implement the points we decide on here. There is
+some prior work in other DVCSs:
+
+* `git test`: part of [git-branchless]. Similar to this proposal for `jj run`.
+* `hg run`: Google's internal Mercurial extension. Similar to this proposal for
+`jj run`. Details not available.
+* `hg fix`: Google's open-source Mercurial extension ([source code][fix-src]). A
+more specialized approach to rewriting file content without the full context of
+the working directory.
+* `git rebase -x`: runs commands opportunistically as part of a rebase.
+* `git bisect run`: runs a command to determine which commit introduced a bug.
+
+## Context and Scope
+
+The initial need for some kind of command runner integrated into the VCS
+surfaced in a [GitHub issue][pre-commit]. In a [discussion on Discord][hooks]
+about the Git hook model, there was consensus about not repeating its mistakes.
+
+For `jj run`, there is prior art in Mercurial, git-branchless, and Google's
+internal Mercurial. Currently, git-branchless's `git test` and `hg fix` implement
+some kind of command runner. Google's internal `hg run`, on the other hand, works
+in conjunction with CitC (Clients in the Cloud), which allows it to lazily apply
+the current command to any affected file. The base Jujutsu backend does not
+have a virtual filesystem like that supporting it, so we can't apply this
+optimization.
+
+## Goals and Non-Goals
+
+### Goals
+
+* We should be able to apply the command to any revision, published or unpublished.
+* We should be able to parallelize running the actual command, while keeping the
+console output readable.
+* The run command should be able to work in the working copy.
+* There should be some way to signal a hard failure.
+* The command should build enough infrastructure for `jj test`, `jj fix`, and
+`jj format`.
+* The main goal is to be good enough, as we can always expand the functionality
+in the future.
+
+### Non-Goals
+
+* While we should build a base for `jj test`, `jj format`, and `jj fix`, we
+shouldn't mash their use cases into `jj run`.
+* The command shouldn't be too smart, as too many assumptions about workflows
+make the command confusing for users.
+* Smart caching of outputs, as user-provided commands can be unpredictable.
+* Fine-grained user-facing configuration, as it's unwarranted complexity.
+* A `fix` subcommand, as it cuts too much design space.
+
+## Use-Cases of jj run
+
+**Linting and Formatting:**
+
+- `jj run 'pre-commit run' -r $revset`
+- `jj run 'cargo clippy' -r $revset`
+- `jj run 'cargo +nightly fmt'`
+
+**Large scale changes across repositories, local and remote:**
+
+- `jj run 'sed s/some/test/' -r 'draft() & ~remote_branches()'`
+- `jj run '$rewrite-tool' -r '$revset'`
+
+**Build systems:**
+
+- `jj run 'bazel build //some/target:somewhere'`
+- `jj run 'ninja check-lld'`
+
+Some of these use cases should get a specialized command, as this allows
+further optimization. One such command could be `jj format`, which runs a list
+of formatters over a subset of a file in a revision. Another could be `jj fix`,
+which runs a command like `rustfmt` or `cargo clippy --fix` over a subset of a
+file in a revision.
+
+## Design
+
+### Base Design 
+
+All the work will be done in the `.jj/` directory. This allows us to hide all 
+complexity from the users, while preserving the user's current workspace.
+
+We will copy the approach from git-branchless's `git test` of creating a 
+temporary working copy for each parallel command. The working copies will be 
+reused between `jj run` invocations. They will also be reused within a `jj run`
+invocation if there are more commits to run on than there are parallel jobs.
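A minimal sketch of the slot reuse described above, assuming a fixed pool of `jobs` temporary working copies stored under a hypothetical `.jj/run/<slot>` directory (the actual layout is not specified by this design). It uses a static round-robin assignment as a simplification; a real scheduler would more likely hand the next commit to whichever slot frees up first.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical location of the `slot`-th temporary working copy.
/// The `.jj/run/<slot>` layout is an assumption, not a decided design.
fn working_copy_dir(repo_root: &Path, slot: usize) -> PathBuf {
    repo_root.join(".jj").join("run").join(slot.to_string())
}

/// Statically assigns commits to `jobs` reusable working copies, round-robin.
/// Each slot runs its commits one after another in the same directory, so
/// ignored files (e.g. cargo's `target/`) are reused between them.
fn assign_slots<'a>(commits: &[&'a str], jobs: usize) -> Vec<Vec<&'a str>> {
    let jobs = jobs.max(1); // always use at least one working copy
    let mut slots: Vec<Vec<&'a str>> = vec![Vec::new(); jobs];
    for (i, commit) in commits.iter().enumerate() {
        slots[i % jobs].push(*commit);
    }
    slots
}

fn main() {
    let commits = ["a", "b", "c", "d", "e"];
    for (slot, assigned) in assign_slots(&commits, 2).iter().enumerate() {
        let dir = working_copy_dir(Path::new("/repo"), slot);
        println!("{} -> {:?}", dir.display(), assigned);
    }
}
```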
+
+We will leave ignored files in the temporary directory between runs. That 
+enables incremental builds (e.g. by letting cargo reuse its `target/` directory).
+However, it also means that runs potentially become less reproducible. We will 
+provide a flag for removing ignored files from the temporary working copies to
+address that. 
+
+Another problem with leaving ignored files in the temporary directories is that
+they take up space. That is especially problematic in the case of cargo (the 
+`target/` directory often takes up tens of GBs). The same flag for cleaning up
+ignored files can be used to address that. We may also want a flag for
+cleaning up the temporary working copies *after* running the command.
+
+An early version of the command will directly use [Treestate] to manage the
+temporary working copies. That means that running `jj` inside the temporary
+working copies will not work. We can later extend that to use a full
+[Workspace]. To prevent operations in the working copies from impacting the
+repo, we can use a separate [OpHeadsStore] for it.
+
+### Modifying the Working Copy
+
+Since the subprocesses will run in temporary working copies by default, they 
+won't interfere with the user's working copy. The user can therefore continue
+to work in it while `jj run` is running. 
+
+We want subprocesses to be able to make changes to the repo by updating their
+assigned working copy. Let's say the user runs `jj run` on just commits A and
+B, where B's parent is A. Any changes made on top of A would be squashed into
+A, forming A'. Similarly, B' would be formed by squashing the changes made on
+top of B into B. We can then either do a normal rebase of B' onto A', or we can
+simply update its parent to A'. The former is useful, e.g. when the subprocess
+only makes a partial update of the tree based on the parent commit. In addition
+to these two modes, we may want to have an option to ignore any changes made in
+the subprocess's working copy.
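To make the two modes concrete, here is a toy model in which a tree is just a map from path to file contents (not jj's real tree types, and with conflicts ignored). `UpdateMode`, `rebase_tree`, and `integrate_child` are hypothetical names. Rebasing re-applies the diff from A to B' on top of A', while reparenting keeps the tree of B' verbatim. The sample data shows why rebasing helps with partial updates: if the run on B only formatted `b.rs`, reparenting would silently undo A's formatting of `a.rs`.

```rust
use std::collections::BTreeMap;

/// Toy tree model: path -> file contents. Not jj's real tree type.
type Tree = BTreeMap<String, String>;

/// Hypothetical modes for folding a subprocess's changes to B back in.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UpdateMode {
    /// Rebase B' onto A': re-apply the diff A -> B' on top of A'.
    Rebase,
    /// Reparent B' onto A': keep the tree of B' as-is, only change its parent.
    Reparent,
    /// Ignore the subprocess's changes: B stays as it was.
    Ignore,
}

/// Applies the diff `old_base -> child` on top of `new_base`.
/// Conflicts are not modeled: the child's version of a path simply wins.
fn rebase_tree(old_base: &Tree, child: &Tree, new_base: &Tree) -> Tree {
    let mut result = new_base.clone();
    for (path, content) in child {
        // Path added or modified relative to the old base.
        if old_base.get(path) != Some(content) {
            result.insert(path.clone(), content.clone());
        }
    }
    for path in old_base.keys() {
        // Path deleted relative to the old base.
        if !child.contains_key(path) {
            result.remove(path);
        }
    }
    result
}

/// Returns the final tree of B given A, A' (`a_new`), B, and B' (`b_new`).
fn integrate_child(mode: UpdateMode, a: &Tree, a_new: &Tree, b: &Tree, b_new: &Tree) -> Tree {
    match mode {
        UpdateMode::Rebase => rebase_tree(a, b_new, a_new),
        UpdateMode::Reparent => b_new.clone(),
        UpdateMode::Ignore => b.clone(),
    }
}

/// Small helper to build a `Tree` from string pairs.
fn tree(entries: &[(&str, &str)]) -> Tree {
    entries.iter().map(|&(path, content)| (path.to_string(), content.to_string())).collect()
}

fn main() {
    // The run on A formatted a.rs; the run on B only touched b.rs.
    let a = tree(&[("a.rs", "fn a(){}")]);
    let a_new = tree(&[("a.rs", "fn a() {}")]);
    let b = tree(&[("a.rs", "fn a(){}"), ("b.rs", "fn b(){}")]);
    let b_new = tree(&[("a.rs", "fn a(){}"), ("b.rs", "fn b() {}")]);
    for mode in [UpdateMode::Rebase, UpdateMode::Reparent, UpdateMode::Ignore] {
        println!("{:?}: {:?}", mode, integrate_child(mode, &a, &a_new, &b, &b_new));
    }
}
```

With these sample trees, the rebase mode yields a B with both files formatted, while the reparent mode keeps the unformatted `a.rs` from B'.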
+
+### Modifying the Repo
+
+Once we give the subprocess access to a fork of the repo via a separate
+[OpHeadsStore], it will be able to create new operations in its fork.
+If the user runs `jj run -r foo` and the subprocess checks out another commit,
+it's not clear what that should do. We should probably just verify that the 
+working-copy commit's parents are unchanged after the subprocess returns. Any
+operations created by the subprocess will be ignored. 
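A sketch of that check, using a hypothetical `CommitId` newtype in place of jj-lib's real type and a hypothetical `verify_parents_unchanged` function. It only compares the working-copy commit's parent lists recorded before and after the subprocess runs; operations the subprocess created in its fork are not imported either way.

```rust
/// Hypothetical stand-in for jj-lib's commit ID type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CommitId(String);

/// Returns an error if the subprocess moved its working copy to different
/// parents (e.g. by checking out another commit). Operations the subprocess
/// created in its forked op log are not imported in either case.
fn verify_parents_unchanged(before: &[CommitId], after: &[CommitId]) -> Result<(), String> {
    if before == after {
        Ok(())
    } else {
        Err(format!(
            "working-copy parents changed from {:?} to {:?}; not importing the result",
            before, after
        ))
    }
}

fn main() {
    let foo = vec![CommitId("foo".to_string())];
    let other = vec![CommitId("other".to_string())];
    println!("{:?}", verify_parents_unchanged(&foo, &foo));
    println!("{:?}", verify_parents_unchanged(&foo, &other));
}
```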
+
+### Rewriting the revisions 
+
+We should handle public and private revisions differently. We choose to operate
+on an immutable history by default.
+
+### Public revisions
+
+For published revisions, we will not allow `jj run` to modify them: if the
+command changes a published revision, we immediately error out, as published
+history should be immutable. We may want to support a `--force` flag as an
+override, but it won't be available in the first iteration of the command.
+
+### Private/Draft revisions
+
+For private/draft revisions, we just amend the changes, as Jujutsu usually does. 
+We also expose the actual behavior as a command option.
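The rule from these two subsections can be summarized as a small decision function. `Visibility`, `Rewrite`, and `plan_rewrite` are hypothetical names; how a revision gets classified as published is left open by this design, and the sketch deliberately has no `--force` override.

```rust
/// Hypothetical classification of a revision; how "published" is
/// determined is not specified by this design.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Visibility {
    Public,
    Draft,
}

/// What `jj run` does with a revision after the command ran on it.
#[derive(Debug, PartialEq)]
enum Rewrite {
    /// The command did not change the tree; nothing to rewrite.
    Unchanged,
    /// Draft revision: amend the changes into it, as jj usually does.
    Amend,
}

/// Published history is immutable, so a change to a public revision is an
/// immediate error. There is intentionally no `--force` override yet.
fn plan_rewrite(commit: &str, visibility: Visibility, tree_changed: bool) -> Result<Rewrite, String> {
    match (visibility, tree_changed) {
        (_, false) => Ok(Rewrite::Unchanged),
        (Visibility::Draft, true) => Ok(Rewrite::Amend),
        (Visibility::Public, true) => Err(format!("refusing to rewrite published revision {}", commit)),
    }
}

fn main() {
    let cases = [(Visibility::Draft, true), (Visibility::Public, false), (Visibility::Public, true)];
    for (visibility, changed) in cases {
        println!("{:?}, changed={}: {:?}", visibility, changed, plan_rewrite("abc123", visibility, changed));
    }
}
```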
+
+## Execution order/parallelism
+
+It may be useful to execute commands in topological order, for example for
+commands whose cost is proportional to the incremental changes, like build
+systems. There may also be other relevant heuristics, but topological order is
+an easy and effective way to start.
+
+When executing commands on different commits in parallel, the scheduler may
+still assign commits so as to reduce the incremental changes in the working copy
+used by each execution slot ("thread"). However, running the command on all
+commits concurrently should be possible if desired.
+
+Executing commands in topological order allows for more meaningful use of any
+potential features that stop execution "at the first failure". For example,
+when running tests on a chain of commits, it might be useful to proceed in
+topological/chronological order and stop on the first failure, because that
+failure might imply that the remaining executions would also fail.
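A sketch of topological scheduling with a stop-at-first-failure mode. It uses Kahn's algorithm over a hypothetical map from each commit to its parents; parents outside the selected revset are ignored, and ties between siblings are broken by commit name only because the map is a `BTreeMap`. The command itself is abstracted as a closure returning success or failure, and `topo_order` and `run_until_first_failure` are illustrative names, not jj APIs.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders commits so every parent precedes its children (Kahn's algorithm).
/// `parents` maps each selected commit to its parents; parents that are not
/// themselves keys of the map (i.e. outside the revset) are ignored.
fn topo_order<'a>(parents: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    let mut remaining: BTreeMap<&str, usize> = BTreeMap::new(); // unprocessed in-set parents
    let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&commit, ps) in parents {
        let in_set: Vec<&str> = ps.iter().copied().filter(|p| parents.contains_key(p)).collect();
        remaining.insert(commit, in_set.len());
        for p in in_set {
            children.entry(p).or_default().push(commit);
        }
    }
    // Start with commits that have no parents inside the revset.
    let mut ready: VecDeque<&str> = remaining.iter().filter(|&(_, &n)| n == 0).map(|(&c, _)| c).collect();
    let mut order = Vec::new();
    while let Some(commit) = ready.pop_front() {
        order.push(commit);
        for &child in children.get(&commit).into_iter().flatten() {
            let n = remaining.get_mut(&child).expect("child is in the revset");
            *n -= 1;
            if *n == 0 {
                ready.push_back(child);
            }
        }
    }
    order
}

/// Runs `cmd` on each commit in order and stops at the first failure.
/// Returns the commits that passed and the one that failed, if any.
fn run_until_first_failure<'a>(order: &[&'a str], mut cmd: impl FnMut(&str) -> bool) -> (Vec<&'a str>, Option<&'a str>) {
    let mut passed = Vec::new();
    for &commit in order {
        if !cmd(commit) {
            return (passed, Some(commit));
        }
        passed.push(commit);
    }
    (passed, None)
}

fn main() {
    let mut parents = BTreeMap::new();
    parents.insert("A", vec!["root"]); // `root` is outside the revset
    parents.insert("B", vec!["A"]);
    parents.insert("C", vec!["B"]);
    parents.insert("D", vec!["A"]);
    let order = topo_order(&parents);
    println!("order: {:?}", order);
    let (passed, failed) = run_until_first_failure(&order, |commit| commit != "B");
    println!("passed: {:?}, failed: {:?}", passed, failed);
}
```

For the sample graph (A is the parent of B and D, B is the parent of C), the order is A, B, D, C, and a command that fails on B stops before D and C are run.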
+
+## Dealing with failure
+
+It will be useful to have multiple strategies to deal with failures on a single
+revision or on multiple revisions. These strategies allow customized failure
+handling, and can then be exposed in the UI with a matching command option.
+
+**Continue:** If any subprocess fails, we will continue the work on child 
+revisions. Notify the user on exit about the failed revisions. 
+
+**Stop:** Signal a fatal failure and cancel any scheduled work that has not
+yet started running, but let any already started subprocess finish. Notify the
+user about the failed command and display the generated error from the 
+subprocess. 
+
+**Fatal:** Signal a fatal failure and immediately stop processing and kill any 
+running processes. Notify the user that we failed to apply the command to the 
+specific revision. 
+
+If any subprocess fails, we will leave the affected commit in its current state.
+This allows us to provide a better user experience, as leaving revisions in an
+undesirable state, e.g. partially formatted, may confuse users.
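The three strategies differ only in what happens to pending and running work once a subprocess fails. A sketch, assuming a hypothetical `ErrorStrategy` type parsed from the `--error-strategy` value listed under Command Options, and a hypothetical `OnFailure` struct describing the scheduler's reaction:

```rust
use std::str::FromStr;

/// The three strategies from this section (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ErrorStrategy {
    Continue,
    Stop,
    Fatal,
}

impl FromStr for ErrorStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "continue" => Ok(ErrorStrategy::Continue),
            "stop" => Ok(ErrorStrategy::Stop),
            "fatal" => Ok(ErrorStrategy::Fatal),
            other => Err(format!("unknown error strategy: {}", other)),
        }
    }
}

/// What the scheduler does once a subprocess has failed. In every case the
/// failed revision itself is left unchanged.
#[derive(Debug, PartialEq)]
struct OnFailure {
    /// Drop scheduled work that has not started yet.
    cancel_pending: bool,
    /// Kill subprocesses that are already running.
    kill_running: bool,
}

impl ErrorStrategy {
    fn on_failure(self) -> OnFailure {
        match self {
            // Keep going with the remaining revisions; report failures on exit.
            ErrorStrategy::Continue => OnFailure { cancel_pending: false, kill_running: false },
            // Cancel queued work, but let already started subprocesses finish.
            ErrorStrategy::Stop => OnFailure { cancel_pending: true, kill_running: false },
            // Stop immediately and kill everything still running.
            ErrorStrategy::Fatal => OnFailure { cancel_pending: true, kill_running: true },
        }
    }
}

fn main() {
    for arg in ["continue", "stop", "fatal", "bogus"] {
        match arg.parse::<ErrorStrategy>() {
            Ok(strategy) => println!("{}: {:?}", arg, strategy.on_failure()),
            Err(err) => println!("{}", err),
        }
    }
}
```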
+
+## Resource constraints
+
+It will be useful to constrain the execution to prevent resource exhaustion. 
+Relevant resources could include:
+
+- CPU and memory available on the machine running the commands. `jj run` can
+provide some simple mitigations like limiting parallelism to "number of CPUs" 
+by default, and limiting parallelism by dividing "available memory" by some 
+estimate or measurement of per-invocation memory use of the commands.
+- External resources that are not immediately known to jj. For example, 
+commands run in parallel may wish to limit the total number of connections
+to a server. We might choose to defer any handling of this to the 
+implementation of the command being invoked, instead of trying to 
+communicate that information to jj.
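A sketch of the simple mitigations from the first bullet: cap the default job count at the number of CPUs and, when a per-invocation memory estimate is available, at `available_mem / per_job_mem`. `std::thread::available_parallelism` is a real standard-library function (Rust 1.59+); `default_jobs` is a hypothetical helper, and how available memory and per-invocation usage would be measured is left as an assumption, so they are plain parameters here.

```rust
use std::thread;

/// Picks a default `--jobs` value: at most one job per CPU and, if a
/// per-invocation memory estimate is known, no more jobs than fit in memory.
/// Always returns at least 1.
fn default_jobs(cpus: usize, available_mem: u64, per_job_mem: Option<u64>) -> usize {
    let by_memory = match per_job_mem {
        Some(per_job) if per_job > 0 => (available_mem / per_job) as usize,
        _ => usize::MAX, // no estimate: memory does not limit parallelism
    };
    cpus.min(by_memory).max(1)
}

fn main() {
    // Fall back to a single job if the CPU count cannot be determined.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    const GIB: u64 = 1 << 30;
    println!("jobs = {}", default_jobs(cpus, 16 * GIB, Some(3 * GIB)));
}
```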
+
+
+## Command Options
+
+The options common to all jj commands should be usable. By default, `jj run`
+works on `@`, the current working-copy commit.
+
+* `--command`: explicit name of the first argument
+* `-x`: for Git compatibility (may alias another command)
+* `-j`, `--jobs`: the number of parallel jobs to use
+* `-k`, `--keep-going`: continue on failure (may alias another command)
+* `--show`: display the diff for an affected revision
+* `--dry-run`: don't execute the command; only log all intended files and
+arguments
+* `--rebase`: rebase descendants onto the resulting commit (may alias another
+command)
+* `--reparent`: change the parent of an affected revision to the new change
+(may alias another command)
+* `--clean`: remove the existing temporary working copies and their ignored files
+* `--readonly`: ignore changes across multiple run invocations
+* `--error-strategy=continue|stop|fatal`: see [Dealing with failure](#dealing-with-failure)
+
+### Integrating with other commands
+
+* `jj log`: No special handling needed
+* `jj diff`: No special handling needed
+* `jj st`: For now, reprint the final output of `jj run`
+* `jj op log`: No special handling needed, but awaits further discussion in
+[#963][issue]
+* `jj undo`/`jj op undo`: No special handling needed
+
+
+## Open Points
+
+Should the command be backend-specific?  
+How do we manage the processes which the command will spawn?  
+Should configuration options be user-wide, repository-wide, or both?
+
+## Future possibilities
+
+- We could rewrite files in memory, which would be a neat optimization.  
+- Exposing some internal state, to allow more precise resource constraints  
+- Integration options for virtual filesystems, which allow them to cache the
+needed working copies.  
+- A Jujutsu-wide concept for a cached working copy, as working copies could be
+expensive to materialize.  
+- Customized failure messages: this may be useful for bots, and could be similar
+to Bazel's `select(..., no_match_error = "arch not supported for $project")`.
+- Make `jj run` asynchronous by spawning a `main` process, returning to the
+user immediately, and incrementally updating the output of `jj st`.
+
+
+
+[git-branchless]: https://github.com/arxanas/git-branchless
+[issue]: https://github.com/martinvonz/jj/issues/963 
+[fix-src]: https://repo.mercurial-scm.org/hg/file/tip/hgext/fix.py
+[hooks]: https://discord.com/channels/968932220549103686/969829516539228222/1047958933161119795
+[OpHeadsStore]: https://github.com/martinvonz/jj/blob/main/lib/src/op_heads_store.rs
+[pre-commit]: https://github.com/martinvonz/jj/issues/405
+[Treestate]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/working_copy.rs#L117
+[Workspace]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/workspace.rs#L54
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -76,6 +76,7 @@ nav:
 - 'Design docs': 
    - 'git-submodules': 'design/git-submodules.md'
    - 'git-submodule-storage': 'design/git-submodule-storage.md'
+    - 'JJ run': 'design/run.md'
    - 'Tracking branches': 'design/tracking-branches.md'