Now we have a separate map for "git" tracking remote, we can always preserve
the last imported/exported git_refs. The option to restore git-tracking refs
has been removed. Perhaps, --what can be reorganized as --local and --remote
<NAME>.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
Before this patch, when updating to a commit that has a file that's
currently an ignored file on disk, jj would crash. After this patch,
we instead leave the conflicting files or directories on disk. We
print a helpful message about how to inspect the differences between
the intended working copy and the actual working copy, and how to
discard the unintended changes.
Closes#976.
On my Debian laptop, openssl_init() takes ~30ms to load the default CA
certificates serialized in PEM format, and the cost is added to each jj
invocation. This change saves 20s (of 50s) on my machine.
% wc -l /usr/lib/ssl/cert.pem
3517 /usr/lib/ssl/cert.pem
It's about time we make the working copy a pluggable backend like we
have for the other storage. We will use it at Google for at least two
reasons:
* To support our virtual file system. That will be a completely
separate working copy backend, which will interact with the virtual
file system to update and snapshot the working copy.
* On local disk, we need to tell our build system where to find the
paths that are not in the sparse patterns. We plan to do that by
wrapping the standard local working copy backend (the one moved in
this commit), writing a symlink that points to the mainline commit
where the "background" files can be read from.
Let's start by renaming the exising implementation to
`local_working_copy`.
I've added a boolean flag to the store to ensure that the migration never runs
more than once after the view gets "op restore"-d. I'll probably reorganize the
branches structure to support non-tracking branches later, but updating the
storage format in a single commit would be too involved.
If jj is downgraded, these "git" remote refs would be exported to the Git repo.
Users might have to remove them manually.
I have used the tree-level conflict format for several weeks without
problem (after the fix in 51b5d168ae). Now - right after the 0.10.0
release - seems like a good time to enable the config by default.
I enabled the config in our default configs in the CLI crate to reduce
impact on tests (compared to changing the default in `settings.rs`).
As we can set HEAD to an arbitrary ref by using .reference_symbolic(), we don't
have to manage a ref that can also be valid as a branch name.
Fixes#1495
I'll add a workaround for the root parent issue #1495 there. We can pass in
the wc parent id instead of the wc_commit object, but we might want to use
wc_commit.id() to generate a unique placeholder ref name.
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even if the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling to the backend implementation.
Apparently, it gets too verbose if the remote history is actively rewritten.
Let's summarize the output for now. The plan is to show the list of moved refs
instead of the full list of abandoned commits.
The codespell GitHub action fails because of the typo. I don't know
why it started failing now. The comment is 8 months old and the
codespell action hasn't been updated in 5 months.
The problem is that the first non-working-copy commit moves the unborn current
branch to that commit, but jj doesn't "export" the moved branch. Therefore,
the next jj invocation notices the "external" ref change, which was actually
made by jj.
I'm not sure why we play nice by setting the "current" HEAD, but I *think* it's
okay to set the "new" HEAD and reset to the same commit to clear Git index.
This will probably help to understand why you've got conflicts after fetching.
Maybe we can also report changed local refs.
I think the stats should be redirected to stderr, but we have many other similar
messages printed to stdout. I'll probably fix them all at once later.
I think most users who change the set of immutable heads away from
`trunk() | tags()` are going to also want to change the default log
revset to include the newly mutable commit and to exclude the newly
immutable commits. So let's update the default log revset to use
`immutable_heads()` instead.
`test_templater` changed because we have overridden the set of
immutable commits there so `jj log` now includes the remote branch.
`jj split` with no arguments operates interactively, but I am nonetheless constantly running `jj split -i` because I expect an `--interactive` flag to exist for consistency.
However, `jj split <paths>` before this commit always operates non-interactively, so this commit has the nice practical effect that you can restrict your interactive splitting to a certain set of paths.
All non-test callers already have a `Merge` object, so let's pass that
instead. We thereby simplify the callers a little, and we enforce the
"adds.len() == removes.len() + 1" constraint in the type.
When there's a single parent, we can determine if a commit is empty by
just comparing the tree ids. Also, when using tree-level conflicts, we
don't need to read the trees to determine if there's a conflict. This
patch adds both of those fast paths, speeding up `jj log -r ::main`
from 317 ms to 227 ms (-28.4%). It has much larger impact with our
cloud-based backend at Google (~5x faster).
I made the same fix in the revset engine and the Git push code (thanks
to Yuya for the suggestion).
This adds a new `revset-aliases.immutable_heads()s` config for
defining the set of immutable commits. The set is defined as the
configured revset, as well as its ancestors, and the root commit
commit (even if the configured set is empty).
This patch also adds enforcement of the config where we already had
checks preventing rewrite of the root commit. The working-copy commit
is implicitly assumed to be writable in most cases. Specifically, we
won't prevent amending the working copy even if the user includes it
in the config but we do prevent `jj edit @` in that case. That seems
good enough to me. Maybe we should emit a warning when the working
copy is in the set of immutable commits.
Maybe we should add support for something more like [Mercurial's
phases](https://wiki.mercurial-scm.org/Phases), which is propagated on
push and pull. There's already some affordance for that in the view
object's `public_heads` field. However, this is simpler, especially
since we can't propagate the phase to Git remotes, and seems like a
good start. Also, it lets you say that commits authored by other users
are immutable, for example.
For now, the functionality is in the CLI library. I'm not sure if we
want to move it into the library crate. I'm leaning towards letting
library users do whatever they want without being restricted by
immutable commits. I do think we should move the functionality into a
future `ui-lib` or `ui-util` crate. That crate would have most of the
functionality in the current `cli_util` module (but in a
non-CLI-specific form).
I'm going to make this function check against a configurable revset
indicating immutable commits. It's more efficient to do that by
evaluating the revset only once.
We may want to have a version of the function where we pass in an
unevaluated revset expression. That would allow us to error out if the
user accidentally tries to rebase a large set of commits, without
having to evaluate the whole set first.
Once we add support for immutable commits, `jj duplicate` should be
allowed to create duplicate of them. The reason it can't duplicate the
root commit is that it would mean there would be multiple root
commits, which would break the invariant that the single root commit
is the only root commit (and the backends refuse to write a commit
without parents). So let's have `jj duplicate` check specifically that
the user doesn't try to duplicate the root commit instead.
For `jj split --interactive`, the user will want to select changes from a subset of files. This means that we need to pass the `Matcher` object when materializing the list of changed files. I also updated the parameter lists so that the matcher always immediately follows the tree objects.
https://github.com/cargo-bins/cargo-binstall
Note that `jj` will only become installable once the next release
is published to crates.io. For this reason, I am planning to
wait until then before documenting the fact that `jj` can be
installed this way.
At that point, `cargo binstall jj-cli` should be sufficient.
Before then, it's possible to test that this will work by doing
```
cargo binstall jj-cli --force --strategies crate-meta-data --log-level debug --dry-run --manifest-path cli/Cargo.toml
```
Without --dry-run, this should install the 0.9 release if run
on `cli/Cargo.toml` form this commit.
Among other things, this prevented `jj` from working with
`cargo binstall`.
The trick is taken from
https://github.com/rust-lang/cargo/issues/2911#issuecomment-1483256987.
We could now also remove `--bin jj` from the installation commands
in `install-and-setup.md`, but I'm not sure we should. That argument
makes it clear that the binary is `jj`, not `jj-cli`.
Fixes#216.
I don't think there's any reason to use the local backend in tests
instead of using the stricter test backend.
I think we should generally use the test backend in tests and only use
the local backend or git backend when there's a particular reason to
do so (such as in `test_bad_locking` where the on-disk directory
structure matters). But this patch only deals with the simpler cases
where we were only testing with the local backend.
This appears to be broken at db0d14569b "cli: wrap repo in a struct to
prepare for adding cached data." Testing this isn't easy since the operation
id recorded here will be overwritten immediately by snapshot_working_copy(),
and the snapshotting should work fine so long as the tree id matches.
The rest of the functions in this file are defined before they are used, so it confused me when trying to track down this function in the static call graph.
It makes the call sites clearer if we pass the `TestRepoBackend` enum
instead of the boolean `use_git` value. It's also more extensible (I
plan to add another backend for tests).
I think the feature is requested by enough users that we should
include it by default, also for people who install from source (we
include it in the `packaging` feature already).
It increases the size of the binary from 16.5 MiB to 17.8 MiB. I
suspect we'd see some of that increase in size soon anyway, as I'm
probably going to use Tokio for making async backend requests.
There should be no reason to include `criterion` without the `bench`
feature. Adding the `dep:` prefix to any dependency makes cargo not
expose the implicit `criterion` feature.
Since we have overloaded operator symbols, we need to deduplicate them
upfront. Legacy and compat operators are also removed from the suggestion.
It's a bit ugly to mutate the error struct before calling Error::renamed_rule(),
but I think it's still better than reimplementing message formatting function.
As we discussed in #1928, it seems better to print information about
abandoned commits in the context of the pre-abandon state. For
example, that means that we'll include any branches that pointed to
the now-abandoned commits.
This is a naive implementation, which cannot deal with multiple children
or parents stemming from merges.
Note: I gave each command separate a separate argument struct
for extensibility.
Fixes#878
Many failure to export refs to Git are not about conflicts between a
branch named `foo` and a branch named `foo/bar`, so don't give that
hint in most cases.
Suppose "x::y" is the operator that defaults to "root()::visible_heads()"
respectively, "::" is identical to "all()". Since we've just changed the
behavior of "..y", ".." is now "root()..visible_heads()" meaning "~root()".
In `LockedWorkingCopy::drop()`, we panic if the caller had not called
`finish()`. IIRC, the idea was both to find bugs where we forgot to
call `finish()` and to prevent continuing with a modified
`WorkingCopy` instance. I don't think the former has been a problem in
practice. It has been a problem in practice to call `discard()` to
avoid the panic, though. To address that, we can make the `Drop`
implementation discard the changes (forcing a reload of the state if
the working copy is accessed again).
I also converted the error from `InternalError` to `UserError`. So far
I've intented to use `InternalError` only to indicate bugs or corrupt
repos. I'm not sure that's a good idea, and we can revisit it later.
If the path is too long to fit on the screen, this patch makes it so
we elide the first part of it. It goes a bit further and trims it down
to ~70% of the screen, giving some room for the stat. This seems
somewhat similar to what Git does.
`insta` ignores leading indentation (as long as it's consistent within
the snapshot), but when a test case fails and you let `cargo insta`
update it, it's going to use a specific indentation. There were a few
tests that didn't match that indentation, which could lead to
surprising diffs if the tests fail at some point.
This shows that there's too much padding because we pad based on
number of bytes.
I had to reduce the path names for the file names not to get too long
for my file system.
We can still crash on terminals that are less than 4 characters wide
(maybe it doesn't matter if we do because the user can't tell the
crash report from a diffstat in such a terminal?). This patch fixes
the crash.
We would run into a panic due to "attempt to subtract with overflow"
if the path was long. This patch fixes that and adds tests showing the
current behavior when there are long paths and/or large diffs.
When we start writing tree-level conflicts in an existing repo, we
don't want commits that change the format to be non-empty if they
don't change any content. This patch updates `MergeTreeId::eq()` to
consider two resolved trees equal even if only their `MergedTreeId`
variant is different (one is path-level and one is tree-level).
I think I've gone through all places we compare tree ids and checked
that it's safe to compare them this way. One consequence is that
rebasing a commit without changing the parents (typically
auto-rebasing after `jj describe`) will not lead to the tree id
getting upgraded, due to an optimization we have for that case. I
don't think that's serious enough to handle specially; we'll have to
support the old format for existing repos for a while regardless of a
few commits not getting upgraded right away.
The number of failing tests with the config option enabled drop from
108 to 11 with this patch.
We're finally ready to start writing trees using the new format where
we represent conflicts by having multiple trees in the commit instead
of having a single tree with multiple entries at a path. This patch
adds a config option for that. It's not ready to be used yet, so I
haven't updated the release notes or other documentation.
I added only a simple CLI test for testing what happens when the
config is enabled in an existing repo. 108 tests currently fail if we
flip the default.
With the idea that less severe placeholders (like description) could
(and should) explicitly "opt out".
(Both email and name placeholders will be red with this change.)
I made it simply fail on explicit fetch/import, and ignored on implicit import.
Since the error mode is predictable and less likely to occur. I don't think it
makes sense to implement warning propagation just for this.
Closes#1690.
This switches the whole `diff_util` module to working with
`MergedTree`, `Merge<Option<TreeValue>>` etc., so it can support
tree-level conflicts.
Since we want to avoid using `ConflictId`s, I switched the hash we use
for conflicts in `--git` style diffs to use an all-'0' id instead of
using the conflict id.
This patch also extracts format_detailed_signature() function to deduplicate
the "show" template bits.
The added placeholder templates aren't labeled as "empty". If needed, I think
the whole template can be labeled as "empty" (or "empty_commit") just like
"working_copy".
Closes#2112
An alternative name for it would be `arity()`, but `num_sides()`
probably more clearly says that it's not about the number of removes
or the total number of terms.
I think this is less surprising than falling back to the default length.
i64-to-usize conversion can also overflow on 32 bit environment, but I'm not
bothered to handle overflow scenario.
To support tree-level conflicts, we're going to need to update the
working copy from one `MergedTree` to another. We're going need to
store multiple tree ids in the `tree_state` file. This patch gets us
closer to that by getting the diff from `MergedTree`s`, even though we
assume that they are legacy trees for now, so we can write to the
single-tree `tree_state` file.
This allows negative numbers, which also means functions which took numbers can now take negative numbers
Luckily, they all already handled this exactly as expected.
Many of the `TreeBuilder` users have an `Option<TreeValue>` and call
either `set()` or `remove()` or the builder depending on whether the
value is present. Let's centralize this logic in a new
`TreeBuilder::set_or_remove()`.
The way `jj git push` without arguments chooses branches pointing to
either `@` or `@-` is unusual and difficult to explain. Now that we
have `-r`, we could instead default it to `-r '@-::@'`. However, I
think it seems likely that users will want to push all local branches
leading up to `@` from the closest remote branch. That's typically
what I want. This patch changes the default to do that.
If there are branches in the revset that don't need to be pushed
because they already match the destination, we currently just print
`Nothing changed.` It seems consistent with that to also treat it as
success if there are no branches in the specified set to start
with. This patch makes the command print a warning in that case
instead.
The VS Code "Better TOML" plugin (which I think most of our VS Code developers use?) doesn't support the `x.y = z` syntax at the top level, even though it's valid TOML.
This is also useful if we ever want to add additional properties in different sub-crates (although unlikely for the near future).
`revset::parse()` already has a `RevsetWorkspaceContext` argument, so
I think it makes sense to put that and the other context arguments
into a larger `RevsetParseContext` object.
We resolve file paths into repo-relative paths while parsing the
revset expression, so I think it's consistent to also resolve which
workspace "@" refers to while parsing it. That means we won't need the
workspace context both while parsing and while resolving symbols.
In order to break things like `author("martinvonz@")` (thanks to @yuja
for catching this), I also changed the parsing of working-copy
expressions so they are not allowed to be
quoted. `author(martinvonz@)` will therefore be an error now. That
seems like a small improvement anyway, since we have recently talked
about making `root` and `[workspace]@` not parsed as other symbols.
Custom binaries will often want to provide e.g. additional command
aliases, additional revset aliases, custom colors, etc. This adds a
mechanism for them to do that.
This commit replaces the functions `UserSettings::user_name_placeholder()`` and
`UserSettings::user_email_placeholder()` with `const` `&str`s to emphasize that
the placeholder strings must not be changed to support commits without
names or email addresses made before this change.
Bright green really pops on my screen, and I don't think there is a reason
for the root commit to be attention-grabbing.
This follows up on https://github.com/martinvonz/jj/pull/2084.
The syntax is slightly different from Mercurial. In Mercurial, a pattern must
be quoted like "<kind>:<needle>". In JJ, <kind> is a separate parsing node, and
it must not appear in a quoted string. This allows us to report unknown prefix
as an error.
There's another subtle behavior difference. In Mercurial, branch(unknown) is
an error, whereas our branches(literal:unknown) is resolved to an empty set.
I think erroring out doesn't make sense for JJ since branches() by default
performs substring matching, so its behavior is more like a filter.
The parser abuses DAG range syntax for now. It can be rewritten once we remove
the deprecated x:y range syntax.
One use case for `jj split` is when creating a new commit from some of
the changes in the working copy. If there's no description on the
working-copy commit in that case, it seems better to not ask the user
to provide one when they're splitting the commit either.
I've extracted the `builtin_log_root` template for users to customize the
default templates without fully overriding them, for example I would remove
the change_id/commit_id for myself - and we discussed in Discord that leaving
those makes sense for the user to be reminded/teached that the root commit has
a change id made from z's.
Similar to other boolean flags, such as "working_copy" or "empty".
We could test something like
`"0000000000000000000000000000000000000000".contains(commit_id)`
like I did for myself, but first of all this is ugly, and secondly the root
commit id is not guaranteed to be 40 zeroes as custom backend implementations
could have some other root.
This basically means that heads in a filtered graph appear in reverse
chronological order. Before, "jj log -r 'tags()'" in linux-stable repo would
look randomly sorted once you ran "jj debug reindex" in it.
With this change, indexing is more like breadth-first search, and BFS is
known to be bad at rendering nice graph (because branches run in parallel.)
However, we have a post process to group topological branches, so we don't
have this problem. For serialization formats like Mercurial's revlog iirc,
BFS leads to bad compression ratio, but our index isn't that kind of data.
Reindexing gets slightly slower, but I think this is negligible.
(in Git repository)
% hyperfine --warmup 3 --runs 10 "jj debug reindex --ignore-working-copy"
(original)
Time (mean ± σ): 1.521 s ± 0.027 s [User: 1.307 s, System: 0.211 s]
Range (min … max): 1.486 s … 1.573 s 10 runs
(new)
Time (mean ± σ): 1.568 s ± 0.027 s [User: 1.368 s, System: 0.197 s]
Range (min … max): 1.531 s … 1.625 s 10 runs
Another idea is to sort heads chronologically and run DFS-based topological
sorting. It's ad-hoc, but worked surprisingly well for my local repositories.
For repositories with lots of long-running branches, this commit will provide
more predictable result than DFS-based one.
`jj chmod` won't operate on conflicts involving non-files on the
positive sides. However, the error message says "None of the sides of
the conflict are files", which is not correct.
That is, jj will use ui.default_description as a starting point when
user is about to describe an empty change.
I think it might be confusing to do this with -m / --stdin (violates
WYSIWYG), so I'm only doing this when jj invokes an editor.
Also, this could evolve into a proper template in the future instead of
just plain text, to allow inheriting from parent change(s), for example.
Partially addresses #1354.
We anyway trim the newlines eventually and this just does that eagerly
so we output the "correct" description back to stdout (on describe for
example, we'd now print the first non empty line).
Empty files can be confusing in diff output. For example:
```
Added regular file file1:
Added regular file file2:
1: foo
```
This commit adds an "(empty)" placeholder instead. Since it's not
colored, and doesn't have line numbers, it will hopefully not be
mistaken for a file with the contents "(empty)".
AFAIK, we can't make HEAD detached in an empty Git repository, so we need
to temporarily switch to the new default branch before checking out.
Fixes#2047