This will allows us to parse "file(..)" arguments as fileset expression by
transforming AST for example. I'm not sure if that's good or bad, but we'll
probably want to embed fileset expressions without quoting.
parse_expression_rule() is split to the first str->ExpressionNode stage and
the second ExpressionNode->RevsetExpression stage. The latter is called
"resolve_*()" in fileset, but we have another "symbol" resolution stage in
revset. So I choose "lower_*()" instead.
Otherwise, newly created default branch would be re-imported as a new Git HEAD.
This could be addressed by cmd_git_init(), but the same situation can be
crafted by using "git checkout -b".
See #2651 and a935a4f70c for more
background.
This speeds up `jj log` in a large repo with watchman enabled by around
9%:
```
$ hyperfine --sort command --warmup 3 --runs 20 -L bin \
jj-before,jj-after "target/release/{bin} -R ~/chromiumjj/src log"
Benchmark 1: target/release/jj-before -R ~/chromiumjj/src log
Time (mean ± σ): 788.3 ms ± 3.4 ms [User: 618.6 ms, System: 168.8 ms]
Range (min … max): 783.1 ms … 793.3 ms 20 runs
Benchmark 2: target/release/jj-after -R ~/chromiumjj/src log
Time (mean ± σ): 713.4 ms ± 5.2 ms [User: 536.1 ms, System: 176.2 ms]
Range (min … max): 706.6 ms … 724.7 ms 20 runs
Relative speed comparison
1.11 ± 0.01 target/release/jj-before -R ~/chromiumjj/src log
1.00 target/release/jj-after -R ~/chromiumjj/src log
```
Some backends, like the one we have at Google, can restrict access to
certain files. For such files, if they return a regular
`BackendError::ReadObject`, then that will terminate iteration in many
cases (e.g. when diffing or listing files). This patch adds a new
error variant for them to return instead, plus handling of such errors
in diff output and in the working copy.
In order to test the feature, I added a new commit backend that
returns the new `ReadAccessDenied` error when the caller tries to read
certain objects.
Merge commits are very similar to non-merge commits in jj. An empty
merge commit with no description is not really different from an empty
non-merge commit with no description. As we discussed on
https://github.com/martinvonz/jj/issues/2859, we should not treat
merge commits differently when updating away from them. This patch
adds test for the current behavior (which is to leave the merge commit
in place).
I recently needed to test something on top of a two branches at the
same time, so I created a new commit on top of both of them (i.e. a
merge commit). I then ran tests and made some adjustments to the
code. These adjustments belonged in one of the parent branches, so I
used `jj squash --into` to squash it in there. Unfortunately, that
meant that my working copy became a single-parent commit based on one
of the branches only. We already had #2859 for tracking this issue.
This patch changes the behavior so we create a new working-copy commit
with all of the previous parents.
I used .map_err(|_: Vec<_>|) to clarify that the original data is returned as
an error (so it can't be .unwrap()-ed.) However, it can be said that the error
detail isn't important and .map_err() is too verbose.
This accounts for commit message-only changes which do not change the
tree ID, as well as checkouts between commits with identical tree
contents.
See #3748 for previous work on avoiding resetting HEAD when the
*commits* are identical.
This shares some common code to both the root and the non-root case, and
provides easier access to `mut_repo` in the function - as
`set_git_head_target` mutably borrows `mut_repo`.
This will be used for a cleaner implementation of #3767.
This should be a no-op, though that is not necessarily obvious in corner
cases.
Note that libgit2 already performs the push negotiation even when
pushing without force (without `+` in the refspec).
As explained in the commit, our logic is a bit more complicated than
that of `git push --force-with-lease`. This is to match the behavior of
`jj git fetch` and branch conflict resolution rules.
We want to move UI-independent logic that's currently in `jj-cli` into
`jj-lib`. `WorkspaceCommandHelper` is perhaps the most important part
to start moving. As I was looking into what to move from
`WorkspaceCommandHelper`, the first thing I saw there was the
`cwd`. It might seem like a good candidate to start moving. However,
when running a server, you might be running operations on repos stored
in database, so `cwd` and the workspace root don't make sense then
(because the repo is not stored at a particular path).
So, instead, this patch starts abstracting out our uses of those two
paths by adding an enum for converting between `RepoPath` and paths as
they are presented in the UI. I added a variant for repos stored in a
file system, and made `WorkspaceCommandHelper` use that to show that
it works. We'll probably add a server variant later.
I put the new type in `repo_path.rs` because at least the
file-system-based implementation is closely related to
`RepoPath::parse_fs_path()`.
Before this patch, `MergedTreeBuilder::write_tree()` would create a
new legacy tree if the base tree was a legacy tree. This patch makes
it always write multiple root trees instead (if there are conflicts).
We still support reading legacy conflicts if the
`format.tree-level-conflicts` config is set.
It's been about six months since we started using tree-level conflicts
by default. I can't imagine we would switch back. So let's continue
the migration by always using tree-level conflicts when merging trees,
even if all inputs were legacy trees.
In chromium/src.git, this gives an approximate ~0.83s speedup for
commands if the Git index is empty, and a ~0.14s slowdown if the Git
index is non-empty.
As most users using jj will likely be using jj to write to a colocated
repo - and therefore avoid modifying the Git index - this should be a
general speedup for most colocated checkouts.
The original expand_node() body is migrated as follows:
- Identifier -> fold_identifier()
- FunctionCall -> fold_function_call()
expand_defn() now manages states stack by itself, which simplifies lifetime
parameters.
The templater implementation of FoldableExpression is a stripped-down version
of expand_node(). It's visitor-like because I'm going to write generic alias
substitution rules over abstract expression types (template, revset, fileset.)
Naming comes from rustc.
https://rust-unofficial.github.io/patterns/patterns/creational/fold.html
This is basically the same as the previous patch, but for error types. Some
of these functions could be encoded as "E: From<AliasExpandError<'i>>", but
alias substitution logic is recursive, so it would have to convert E back and
force.
I'm going to extract generic alias substitution functions, and these AST types
will be accessed there. Revset parsing will also be migrated to the generic
functions.
This will help extract common FunctionCallNode<'i, T> type. We don't need
freedom of arbitrary error type choices, but implementing From<_> is the
easiest option I can think of. Another option is to constrain error type by
the expression type T through "T::ParseError: ArgumentsParseError" or
something, but it seemed a bit weird that we have to use trait just for that.
This revset correctly implements "reachability" from a set of source commits following both parent and child edges as far as they can go within a domain set. This type of 'bfs' query is currently impossible to express with existing revset functions.
This follows up on https://github.com/martinvonz/jj/pull/3459 and adds a
label to the closing delimeter of each conflict, e.g. "Conflict 1 of 3
ends".
I didn't initially put any label at the ending delimeter since the
starting delimeter is already marked with "Conflict 1 of 3". However,
I'm now realizing that when I resolve conflicts, I usually go from top
to bottom. The first thing I do is delete the starting conflict
delimeter. It is when I get to the *end* of the conflict that I wonder
whether there are any more conflicts left in the file.
When I addded the workaround in 256988de65, I missed the comment
just below explaining that heads removed by the other side were
already handled. Since that's not handled when using non-default
indexes now, we need to handle it in an `else` block.
The function now returns an iterator over `Result`s, matching
`Operation::parents()`.
I updated callers to also propagate the error where it was trivial.
We do a 3-way merge of repo views in a few different
scenarios. Perhaps the most obvious one is when merging concurrent
operations. Another case is when undoing an operation.
One step of the 3-way repo view merge is to find which added commits
are rewrites of which removed commits (by comparing change IDs). With
the high commit rate in the Google repo (combined with storing the
commits on a server), this step can get extremely slow.
This patch adds a hack to disable the slow step when using a
non-standard `Index` implementation. The Google index is the only
non-standard implementation I'm aware of, so I don't think this will
affect anyone else. The patch is also small enough that I don't think
it will cause much maintenance overhead for the project.
parse_function_argument_to_file/string_pattern() functions aren't moved.
They are more like functions that transform parsed tree to intermediate
representation.
I'm planning to rewrite the parser in a similar way to fileset/template parsing,
and the revset_parser module will host functions and types for the first stage.
I haven't started the rewrite, but it seems good to split the revset module
even if we reject the idea.
This is more consistent with the other method and it makes some extension operations easier, by giving access to the OpStore and other relevant context for custom index extensions.
`RewriteType::Rewritten` must have exactly one replacement. I think
it's better to encode that in the type by attaching the value to the
enum variant. I also renamed the type to just `Rewrite` since it now
has attached data and `Type` sounds like a traditional data-free enum
to me.
Looks like I forgot this in some recent refactoring.
I don't really see any harm in making the type public later. I might
want to make `rebase_descendants()` not clear `parent_mapping` and
instead provide a way of accessing it afterwards (removing the need
for the `_return_map()` flavors). We'll see if that ends up
happening. For now it can be private anyway.
For example,
```
<<<<<<< Conflict 1 of 3
+++++++ Contents of side #1
left 3.1
left 3.2
left 3.3
%%%%%%% Changes from base to side #2
-line 3
+right 3.1
>>>>>>>
```
or
```
<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-line 3
+right 3.1
+++++++ Contents of side #2
left 3.1
left 3.2
left 3.3
>>>>>>>
```
Currently, there is no way to disable these, this is TODO for a future
PR. Other TODOs for future PRs: make these labels configurable. After
that, we could support a `diff3/git`-like conflict format as well, in
principle.
Counting conflicts helps with knowing whether you fixed all the
conflicts while you are in the editor.
While labeling "side #1", etc, does not tell you the commit id or
description as requested in #1176, I still think it's an improvement.
Most importantly, I hope this will make `jj`'s conflict format less
scary-looking for new users.
I've used this for a bit, and I like it. Without the labels, I would see
that the two conflicts have a different order of conflict markers, but I
wouldn't be able to remember what that means. For longer diffs, it can
be tricky for me to quickly tell that it's a diff as opposed to one of
the sides. This also creates some hope of being able to navigate a
conflict with more than 2 sides.
Another not-so-secret goal for this is explained in
https://github.com/martinvonz/jj/pull/3109#issuecomment-2014140627. The
idea is a little weird, but I *think* it could be helpful, and I'd like
to experiment with it.
The format is 7 characters of the separator followed by a space and arbitrary
text, followed by a newline. Separator followed by a newline is also allowed.
E.g.:
<<<<<<< Random text
%%%%%%% Random text
line 2
-line 3
+left
line 4
+++++++ Random text
right
%%%%%%% Random text
line 2
+forward
line 3
line 4
>>>>>>> Random text
This commit only allows reading such conflicts.
I considered allowing longer separators (`<<<<<<<<<<<<<< Random text`), but we
wouldn't currently write them, so let's be strict for now.
7 characters if they are followed by a space and arbitrary text
We already have two uses for this function and I think we're soon
going to have more.
The function record the old commit as abandoned with the new parents,
which is typically what you want. We could record it as abandoned with
the old parents instead but then we'd have to do an extra iteration to
find the parents when rebasing any children. It would also be
confusing if
`rewriter.set_parents(new_parents).record_abandoned_commit()` didn't
respect the new parents.
Since fileset/revset/template expressions are specified as command-line
arguments, it's sometimes convenient to use single quotes instead of double
quotes. Various scripting languages parse single-quoted strings in various ways,
but I choose the TOML rule because it's simple and practically useful. TOML is
our config language, so copying the TOML syntax would be less surprising than
borrowing it from another language.
https://github.com/toml-lang/toml/issues/188
While I like strict parsing, it's not uncommon that we have to deal with file
names containing spaces, and doubly-quoted strings such as '"Foo Bar"' look
ugly. So, this patch adds an exception that accepts top-level bare strings.
This parsing rule is specific to command arguments, and won't be enabled when
loading fileset aliases.
When you use e.g. `git switch` to check out a conflicted commit,
you're going to end up with the `.jjconflicts-*` directories in your
working copy. It's probably not obvious what those mean. This patch
adds a README file to the root tree to try to explain to users what's
going on and how to recover.
The authoritative information about conflicts is stored in the
`jj:trees` commit header. The contents of conflicted commits is only
used for preventing GC. We can therefore add contents to the tree
without much consequence.
This addresses the test instability. The underlying problem still exists, but
it's unlikely to trigger user-facing issues because of that. A repo instance
won't be reused after gc() call.
Fixes#3537
Apparently, these gc() invocations rely on that the previous "git gc" packed
all refs so there are no loose refs to compare mtimes. If there were new (or
remaining) loose refs, mtime comparison could fail. I also added +1sec to
effectively turn off the keep_newer option, which isn't important in these
tests.
CommitIds are often manipulated by reference, so this makes the API more
flexible for cases where the caller doesn't already have a Vec or array of
owned CommitIds.
In many cases `rewrite_parents()` does not even need to clone the input
CommitIds. This refactor allows the clone to be avoided if it's unnecessary.
There might be other APIs that would benefit from a similar change. In general,
it seems like there are a lot of places where we're writing
`&[commit_x.id().clone, commit_y.id().clone()]` and similiar.
- [Rust API Guidelines](https://rust-lang.github.io/api-guidelines/flexibility.html#functions-minimize-assumptions-about-parameters-by-using-generics-c-generic)
## Feature Description
If enabled in the user or repository settings, the local branches pointing to the
parents of the revision targeted by `jj commit` will be advanced to the newly
created commit. Support for `jj new` will be added in a future change.
This behavior can be enabled by default for all branches by setting
the following in the config.toml:
```
[experimental-advance-branches]
enabled-branches = ["glob:*"]
```
Specific branches can also be disabled:
```
[experimental-advance-branches]
enabled-branches = ["glob:*"]
disabled-branches = ["main"]
```
Branches that match a disabled pattern will not be advanced, even if they also
match an enabled pattern.
This implements feature request #2338.
It's reasonable for a `WorkingCopy` implementation to want to return
an error. `LocalWorkingCopyFactory` doesn't because it loads all data
lazily. The VFS-based one at Google wants to be able to return an
error, however.
Previously, this command would work:
jj --config-toml='snapshot.max-new-file-size="1"' st
And is equivalent to this:
jj --config-toml='snapshot.max-new-file-size="1B"' st
But this would not work, despite looking like it should:
jj --config-toml='snapshot.max-new-file-size=1' st
This is extremely confusing for users.
This config value is deserialized via serde; and while the `HumanByteSize`
struct allegedly implemented Serde's `visit_u64` method, it was not called by
the deserialize visitor. Strangely, adding an `visit_i64` method *did* work, but
then requires handling of overflow, etc. This is likely because TOML integers
are naturally specified in `i64`.
Instead, just don't bother with any of that; implement a `TryFrom<String>`
instance for `HumanByteSize` that uses `u64::from_str` to try parsing the string
immediately; *then* fall back to `parse_human_byte_size` if that doesn't work.
This not only fixes the behavior but, IMO, is much simpler to reason about; we
get our `Deserialize` instance for free from the `TryFrom` instance.
Finally, this adjusts the test for `max-new-file-size` to now use a raw integer
literal, to ensure it doesn't regress. (There are already in-crate tests for
parsing the human readable strings.)
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I8dafa2358d039ad1c07e9a512c1d10fed5845738
`jj parallelize` was a good example of a command that can be
simplified by the new API, so I decided to rewrite it as an example.
The rewritten version is more flexible and doesn't actually need the
restrictions from the old version (such as checking that the commits
are connected). I still left the check for now to keep this patch
somewhat small. A subsequent commit will remove the restrictions.
There are several existing commands that would benefit from an API
that makes it easier to rewrite a whole graph of commits while
transforming them in some way.
`jj squash` is one example. When squashing into an ancestor, that
command currently rewrites the ancestor, then rebases descendants, and
then rewrites the rewritten source commit. It would be better to
rewrite the source commit (and any descendants) only once.
Another example is the future `jj fix`. That command will want to
rewrite a graph while updating the trees. There's currently no good
API for that; you have to manually iterate over descendants and
rewrite them.
This patch adds a new `MutableRepo::transform_descendants()` method
that takes a callback which gets a `CommitRewriter` passed to it. The
callback can then decide to change the parents, the tree, etc. The
callback is also free to leave the commit in place or to abandon it.
I updated the regular `rebase_descendants()` to use the new function
in order to exercise it. I hope we can replace all of the
`rebase_descendant_*()` flavors later.
I added a `replace_parent()` method that was a bit useful for the test
case. It could easily be hard-coded in the test case instead, but I
think the method will be useful for `jj git sync` and similar in the
future.
`CommitRewriter` wraps 3 of the arguments, so I think it makes sense
to pass it instead. More importantly, I hope to continue refactoring
so many of the callers already have a `CommitRewriter`.
The new `rebase()` method is meant to be called after deciding on the
new parents (typically by leaving them unchanged). It returns a
`CommitBuilder` for setting any additional values.
There will probably be a `reparent()` method in the future.
This patch adds a struct that's meant to help when rewriting
commits. It contains the old commits and the new parents. I hope to
move most of the logic from `rebase_commit_with_options()` onto it in
coming patches. Then this type can be passed in a callback to make it
easier to do custom rewriting of commits that is currently hard to do
because `rebase_descendants()` does not give the caller any control
over the process.
The helper is similar to `CommmitBuilder`, but it is a bit different
by also embedding information about the source commit, so I don't
think the API would be as convenient if we just used `CommitBuilder`
directly.
Mercurial appears to resolve cwd-relative path first, so "glob:*.c" could be
parsed as "**/*.c" if cwd was literally "**". It wouldn't practically matter,
but isn't correct. Instead, jj's parser first splits glob into literal part
and pattern. That's mainly because we want to parse the user input texts into
type-safe objects, and (RepoPathBuf, glob::Pattern) pairs are the simplest
ones. The current parser can't handle patterns like "foo/*/.." (= "foo" ?),
and errors out. I believe this restriction is acceptable.
Unlike literal paths, the 'glob:' pattern anchors to the whole file path. I
don't think "prefix"-matching glob is useful, and making it the default would
be rather confusing.
Patterns are specified as (dir, pattern) pairs because we need to handle
parse errors prior to constructing a matcher, and it's convenient to split
literal directory paths there.
It's cheap to look up commits again from the cache in `Store` but it
can be expensive to look up commits we didn't end up needing. This
will make it easier to refactor further and be able to cheaply set
preliminary parents for a rewritten commits and then let the caller
update them.
I'm going to add a helper struct to help with rewriting commits. I
want to make that struct own the old commit and the new parents to
simplify lifetimes. This patch prepares for that by passing the
commits by value to `rebase_commit()`.
Running `cargo publish` from a non-colocated repo (such as my usual
repo) is currently quite scary because it uploads all non-hidden
files, even if they're ignored by `.gitignore`
(https://github.com/rust-lang/cargo/issues/2063). I noticed this a
while ago and have always run the command from a fresh clone since
then. To avoid the need for that, let's use the workaround mentioned
on the bug, which is to explicitly list patterns we want to publish.
This prepares for adding glob matcher, which will be backed by
RepoPathTree<Vec<glob::Pattern>>.
FilesNodeKind/PrefixNodeKind are basically boolean types, but implemented as
enums for better code readability.
The is_dir flag will be removed soon. Since FilesMatcher doesn't set is_dir
flag explicitly, is_dir is equivalent to !entries.is_empty(). OTOH,
PrefixMatcher always sets is_dir, so all tree nodes are directories.
Perhaps, I didn't do that because it's important to initialize is_dir/file to
false. Since I'm going to extract a generic map-like API, and is_dir/file will
be an enum, this won't be a problem.
I'm going to extract generic map from RepoPathTree, and .get_visit_sets()
will be inlined into FilesMatcher/PrefixMatcher. These removed tests should
be covered by the corresponding matcher tests.
The functions now depend only on `MutableRepo`, so I think they belong
on that type. This gets us closer to being able to make
`parent_mapping` private again.
I think the recent refactorings (especially 9c382fd8c6) make it
pretty clear that `DescendantRebaser` will not attempt to rebase the
same commit twice, so I think we can remove the assertions. This
removes some of the places where `DescendantRebaser` reaches into
`MutableRepo`'s internals.
The Pijul maintainer has opinions that I don't understand about how we
mention Pijul (they consider the current mentions offensive as
"bashing Pijul"). Let's just remove the references so we don't have to
deal with it. I think the references to Darcs we already had in most
of these places are sufficient.
This implements the other workaround described in 57167cefda "git: on
import_refs(), don't abandon ancestors of newly fetched refs":
> I think there are two ways to fix the problem:
> a. pin non-tracking remote branches just like local refs
> b. pin newly fetched refs in addition to local refs
> This patch implements (b) because it's simpler and more obvious that the
> fetched commits would never be abandoned immediately.
The idea of (a) is that untracked remote branches are independent read-only
refs, and read-only branches shouldn't be rewritten implicitly. Once the
branch gets rewritten or abandoned by user, these remote refs will be hidden,
and won't be pinned anymore.
Since (a) effectively supersedes (b), this patch also removes the original
workaround.
Fixes#3495
If we ever implement some sort of ABI for dynamic extension loading, we'll need these underlying APIs to support multiple extensions, so we might as well do that first.
If this doesn't work out, maybe we can try one of these:
a. fall back to bare file name if expression doesn't contain any operator-like
characters (e.g. "f(x" is an error, but "f x" can be parsed as bare string)
b. introduce command-line flag to opt in (e.g. -e FILESET)
c. introduce pattern prefix to opt in (e.g. set:FILESET)
Closes#3239, #2915, #2286
This command checks not only whether Watchman works, but also whether
it's enabled in the config. Also, the output is easier to understand
than that of the other `jj debug watchman` commands.
It would be nice if `jj debug watchman` called `jj debug watchman
status`, but it's not trivial in `clap` to have a default subcommand.
The primary use case is to warn unmatched paths. I originally thought paths in
negated expressions shouldn't be checked, but doing that seems rather
inconsistent than useful. For example, "~x" in "jj split '~x'" should match at
least one file to split to non-empty revisions.
Since fileset is primarily used in CLI, it's better to avoid inner quoting if
possible. For example, ".." would have to be quoted in the original grammar
derived from the revset.
This patch also adds a stricter version of an identifier rule. If we add a
symbol alias, it will follow the "strict_identifier" rule.
The fileset grammar is basically a stripped-down version of the revset grammar,
with a few adjustments:
* extract function call to "function" rule (like templater)
* inline "symbol" rule (because "identifier" and "string" should be treated
differently at the early parsing stage.)
The parser will have a separate name resolution stage. This will help to do
alias substitution properly. I'll probably rewrite the revset parser in the
same way. It will also help if we want to embed fileset expression in file()
revset.
There are no more callers of parse_function_argument_to_string(), so it's
removed. This function was a thin wrapper of literal parser, and can be
easily reintroduced if needed.
FilesetExpression is similar to RevsetExpression, but there are two major
differences:
- Union is represented as N-ary operator,
- Expression node isn't Rc-ed.
The former is because of the nature of the runtime Matcher objects. It's easier
to construct a Matcher from flattened union expressions than from a binary tree.
The latter choice comes from UnionAll(Vec<FilesetExpression>), which doesn't
have to be Vec<Rc<FilesetExpression>>, and Rc<[FilesetExpression]> can't be
constructed from [Rc<_>, ..]. Anyway, the internal representation may change as
needed.
Another design decision I made is Vec<Pattern(RepoPathBuf)> vs
Pattern(Vec<RepoPathBuf>). I chose the former because it will be more closer
to the parsed tree of the fileset language.
The nightly compiler has several clippy fix-its that, if applied, break the
build. There are various bugs about this, but there isn't enough space in the
margins to detail it all.
Just ignore these on a per-function basis; about 70% of them are just multiple
instances happening inside a single function.
This makes `cargo clippy --workspace --all-targets` run clean, even with the
nightly compiler.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ic26a025d3c62b12fbf096171308b56e38f7d1bb9
This will be needed to concatenate patterns of different types (such as
"prefix/dir" exact:"file/path".)
The implementation is basically a copy of IntersectionMatcher, with some
logical adjustments. In Mercurial, unionmatcher supports list of matchers
as input, but I think binary version is good enough.
In order to implement a fileset, we'll need owned variants of these matchers.
We can of course let callers move Box<dyn Matcher> into these adapters, but
we might need to somehow clone Box<dyn Matcher>. So, I simply made adapters
generic.
This function doesn't actually need commits, it only needs their IDs. In some
contexts we may only have commit IDs, so there's no need to require an iterator
of Commits.
This commit also adds a `CommitIteratorExt` that makes it easy to convert an
iterator of `&Commit` to an iterator of `&CommitId`.
Templater doesn't have the one yet, but I think it belongs to the same
category.
For clap::Error, we could use clap's own mechanism to render suggestions as
"tip: ...", but I feel "Hint: ..." looks better because our error/hint message
is capitalized.
I'm going to add RevsetParseError constructor for InvalidFunctionArguments,
with/without a source error, and I don't want to duplicate code for all
combinations. The templater change is just for consistency.
I couldn't find a good naming convention for the builder-like API, so it's
called .with_source(mut self, _). Another option was .source_set(source).
Apparently, it's not uncommon to name consuming constructor as
with_<something>().
Now that we no longer bother to keep the set of heads to add and
remove updated while we rewrite descendants, we can simplify how we
find the set of heads to remove - it's simply all commits that have
been marked rewritten, divergent, or abandoned, i.e. the keys in
`parent_mapping`.
I don't think we have any transactions that mark commit as abandoned
and then later mark it as rewritten or divergent. But if we ever do, I
think it should be considered just rewritten/divergent. So let's
enforce that invariant by removing the old value from the set of
abandoned commits.
This commit moves the parse_string_pattern helper function into the
str_util module in jj lib and adds tests for it.
I'd like to reuse this code in a function defined by `UserSettings`, which is
part of the jj lib crate and cannot use functions from the cli crate.
This makes the summary line more informative. Even though it just duplicates
the message printed later, I think it's easier to follow.
This patch also adjusts some RevsetParseError messages because it seemed
redundant to repeat "revset function", "argument", etc.
Because the CLI error handler now prints error sources in multi-line format,
it doesn't make much sense to render Revset/TemplateParseError differently.
This patch also fixes the source() of the SyntaxError kind. It should be
self.pest_error.source() (= None), not self.pest_error.
I'm going to make TemplateParseError hold RevsetParseError as Box<dyn _>, but
Box<dyn std::error::Error ..> doesn't implement Eq. I could remove Eq from
ErrorKind enums, but it's handly if these enums remain as value types.
This change will also simplify fmt::Display and error::Error impls.
It's common to normalize an empty directory path as ".". This change unblocks
the use of from_relative_path() in edit_sparse().
There are a couple of callers who do to_fs_path(Path::new("")), but they all
translate non-directory paths, which should never be empty.
We currently include the commits in `parent_mapping` and `abandoned`
in the set of commits to visit when rebasing descendants. The reason
was that we used to update branches and working copies when we visited
these commits. Since we started updating refs after rebasing all
commits, there's no need to even visit these commits.
We only use `new_commits` in `update_heads()`, so let's calculate it
there. It should also be more correct in case other commits were
created after we initialized `DescendantRebaser`.
Now that we only call `update_references()` in one place, there's no
reason to have it also update `heads_to_add` and `heads_to_remove`. By
moving it out of the function, we can consolidate the logic in one
place.
When `rebase_commit_with_options()` decides to abandons a commit, it
records the new parents in the `MutableRepo`, but it's currently the
caller's responsibility to remember to mark it as abandoned. Let's
move that logic into the function to reduce the risk of future bugs.
By adding the abandoned commit's parents to `parent_mapping`, we can
remove a bit more of the special handling of abandoned commitsin
`DescendantRebaser`.
In the normal case when we don't abandon a commit because it became
empty, then `CommitBuilder::write()` will have recorded the new commit
as a rewrite of the old commit. We don't need to do that again in
`rebase_one()`.
A subset of the state in `DescendantRebaser` now matches exactly what
`MutableRepo` already stores, so we can avoid copying that state and
have `DescendantRebaser` use it directly instead. Having a single
source of truth for the state will enable further simplifications and
improvements.
I'm going to make `DescendantRebaser` share the state about rewritten
commits with `MutableRepo` next. That means that the call to
`rebase_commit_with_options()` will update that state, which would
make this assertion fail. So let's move it a little earlier to avoid
that.
This is just to match `DescendantRebaser`, to make the next commit a
bit simpler. I think `MutableRepo` still has few enough fields that
just `abandoned` is clear enough. Maybe we'll move the three
rewrite-related fields into a new struct at some point.
With this patch, `MutableRepo` has the same tracking of rewritten
commits as `DescendantRebaser`, so we can simply pass that state into
`DescendantRebaser` when we create it. The next step is to remove the
state from `DescendantRebaser`.
I don't think we have any callers left that call
`record_rewritten_commit()` multiple times within a transaction and
expect it to result in divergence. I think we should consider it a bug
to do that.
When rebasing descendants, we generally move branches, child commits,
the working copy to the rewritten commit(s). However, we don't move
the working copy to the new rewritten commit (s) if the old commit had
been abandoned, and we don't move child commits if the rewriten was
divergent.
This patch aims to make it clearer that there's only one mapping from
old to new parents, and that is in `parent_mapping`. It does so by
merging the current `divergent` map into it, and makes the `divergent`
just a set instead. When finding the new parents for a child, we leave
the existing parent if it's in the set.
My longer-term goal is to move `parent_mapping`, `abandoned`, and
`divergent` into `MutableRepo` (maybe in a nested struct), so we can
do some transformations on descendants as we rebase them. By having
the state in a single place (not moving it from `MutableRepo` to
`DescendantRebaser` as we currently do), I hope it will be easier to
write a `MutableRepo::transform_descendants(callback)`, where the
callback gets a `CommitBuilder` and can change parents of the commit,
for example.
Apart from (IMO) looking nicer, this will also sidestep the potential problem
that if the file contains actual jj conflict markers (`>>>>>>>` in the beginning
of a line, for example), jj would currently have trouble materializing and
subsequently parsing conflicts in the file if it actually became conflicted.
I'll demo this bug in either this or a subsequent PR. It's the kind of bug that
sounds serious in theory but might never cause a problem in practice.
After this PR, only `docs/tutorial.md` has a conflict marker that's not indented.
There's only one there, so hopefully it won't be too much of a pain to deal with.
I also indented other strings in `test_conflicts.rs`. IMO, this looks nice and
more consistent with the `insta::assert_snapshot` output. I didn't spend the
time to do the same for `test_resolve_command`.
Suppose we have an alias 'immutable()' = '::immutable_heads()', user can
express (visible) mutable set as '~immutable()'. 'immutable_heads()..' can
terminate early, but a generic difference 'all() & ~immutable()' can't.
Suppose the generation value is usually small, it should be faster to do
bounded range look up first 'y-', then walk ancestors with the unwanted set
'y-..x'.
When an operation is missing and we recover the workspace, we create a
new working-copy commit on top of the desired working-copy commit (per
the available head operation). We then reset the working copy to an
empty tree because it shouldn't really matter much which commit we
reset to. However, when the workspace is sparse, it does matter, as
the test case from the previous patch shows. This patch fixes it by
replacing the `reset_to_empty()` method by a new `recover(&Commit)`,
which effectively resets to the empty tree and then resets to the
commit. That way, any subsequent snapshotting will result keep the
paths from that tree for paths outside the sparse patterns.
Prepares for removing &CompositeIndex from the RevsetGraphIterator struct.
The input iterator will also be changed to position-based.
I've turned self.look_ahead.get().unwrap() into assertion, but it's not super
important here. It's just for sanity that we've mapped missing edges properly.
FWIW, we could say RevsetGraphIterator is an example of iterating *and* testing
membership of the input revset (though the yielded entries are discarded.)
For the same reason as the previous commit. Since self.inner.positions()
basically clones the underlying evaluation tree, there is no reason to stick
to &self lifetime. Perhaps, some of the CLI utility can be changed to not
collect() the iterator.
Migrating iter_graph() requires non-trivial changes, so it will be done
separately.
This allows callers to cache the returned function at 'index lifetime. It's
important in templater. It also means the returned function could be 'static
if the index were Arc<_> and we had a trait interface to achieve that.
Option<Box<dyn ..>> is removed since RevWalk is fused.
This makes the whole evaluation tree 'static, and we can freely move it without
keeping the root RevsetImpl object alive.
Perhaps, "Self: 'a" can be replaced with 'static, but let's leave it for now.
It's not technically wrong to store lifetimed object in InternalRevset.
Prepares for dropping &self lifetime from to_predicate_fn(). All predicate
functions could be wrapped as Box::new(PurePredicateFn(Rc::new(f))) instead, but
I don't think the .clone() cost matters.
This is the step towards removing &CompositeIndex references from the revset
evaluation tree. The filter input is changed from &IndexEntry to IndexPosition
to simplify the lifetime thingy. We might want to pass around CommitId or
Commit object once it's loaded, but that can be implemented later. I don't
see significant performance difference in revset benches.
This serves the same role as templater::Literal. I'm going to add basic
RevWalk adapters so that the revset evaluation tree can be constructed without
capturing the index. EagerRevWalk will help to write tests for these adapters.
Now as default and elided node symbols come from the config, the next logical
step is to use them directly bypassing GraphLog. Note that commands like `jj op
log` and `jj obslog` do not use the elided node symbol at all.
Just a minor code cleanup. We still need Index for &CompositeIndex because the
type is unsized, and unsized type cannot be converted to another dyn reference.
This helps to eliminate higher-ranked trait bounds from RevWalkRevset and
RevWalk combinators to be added. Since &CompositeIndex is now a real reference,
it can be passed to functions as index: &T.
This helps to migrate CompositeIndex<'_> wrapper to &CompositeIndex. If
the wrapped reference had a lifetimed field, it couldn't be represented as
a trivial reference type.
We haven't used custom Git commit headers for two main reasons:
1. I don't want commits created by jj to be different from any other
commits. I don't want Git projects to get annoyed by such commit
and reject them.
2. I've been concerned that tools don't know how to handle such
headers, perhaps even resulting in crashes.
The first argument doesn't apply to commits with conflicts because
such commits would never be accepted by a project whether or not they
use custom commit headers. The second argument is less relevant for
conflicted commits because most tools will be confused by such commits
anyway.
Storing conflict information in commit headers means that we can
transfer them via the regular Git wire protocol. We already include
the tree objects nested inside the root-level tree, so they will also
be transferred.
So, let's start by writing the information redundantly to the commit
header and to the existing storage. That way we can roll it back if we
realize there's a problem with using commit headers.
Initially we were thinking to have `Revset` return something like
`CachedRevset`:
```
pub trait CachedRevset {
fn iter(&self) -> Box<dyn Iterator<Item = Commit>>;
fn contains(&self, &CommitId) -> bool;
}
```
But we weren't sure what use case for `iter` would be, so we dropped the `iter`
method. `CachedRevset` with single `contains` method needed a better name. We
weren't able to come up with one, so we decided instead to have a method on
`Revset` that returns a closure to check if a commit is in a revset.
"for<'index> RevWalk<CompositeIndex<'index>, .." works as of now, but it won't
be composed well. So I'll turn CompositeIndex<'_> into &CompositeIndex in the
next batch, and remove "for<'index>".
This eliminates lifetimed fields from RevWalk objects, and the RevWalk object
will be embedded directly in RevWalkRevset.
This patch adds two separate iterator adapters. They are identical at this
point, but I'm going to add detach/reattach methods only to the borrowed
version. I'm also planning to change CompositeIndex<'_> to &CompositeIndex
to get around higher-ranked trait bound restrictions.
This simplifies the RevWalkIndex API. It would probably add fractional msecs of
overhead per next() call, but I don't see significant difference in revset
benches.
I'm going to make CompositeIndex<'_> detachable from the RevWalk, and
"F: Fn(CompositeIndex) -> Box<dyn Iterator<..>>" of RevWalkRevset<F> will
be replaced with "W: RevWalk<CompositeIndex>". This will simplify the code
structure, but also means that we can no longer apply .take_while() here and
convert it back to RevWalk. Fortunately, ancestors_until_roots() is the only
function I need to reimplement.
It doesn't make sense to build BinaryHeap with intermediate type, and I'm
going to reimplement take_until_roots() in a way that the queue drops
uninteresting items.
The current RevWalk constructors insert intermediate items to BinaryHeap
and convert them as needed. This is redundant, and I'm going to add another
parameter that should be applied to the queue first. That's why I decided
to factor out a builder type. I considered adding a few set of factory
functions that receive all parameters, but they looked messy because most of
the parameters are of [IndexPosition] type.
This patch also adds must_use to the builder and its return types, which are
all iterator-like.
Although watchman client appears to fail at decoding non-UTF-8 path (somewhere
in serde), jj shouldn't panic if watchman could deal with that.
The outer error message "path not in the repo" would sounds odd, but I think
that's okay because 1. it's unlikely that a user input is not UTF-8, and 2.
it's technically correct that a non-UTF-8 path is not contained in the repo.
This should address both use cases:
1. If from_relative_path() is directly called, the error says ".." shouldn't
be included in the (normalized) relative path.
2. If parse_fs_path() is used, the error message contains paths relative to
cwd. #3216
Some of the RevWalk methods could be generalized, but I decided to not try that
for now. I'll probably need to do more cleanup to (hopefully) remove 'index
lifetime from these types.
This requires a code tweak to avoid clippy failures, as `whoami` 1.5.0 has
deprecated the default `hostname()` function.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
This will also provide a better error indication. If write() failed, the child
process would presumably have exited with non-zero status and error message to
stderr.
`cargo doc` complains that two URLs aren't actually links:
```
warning: this URL is not a hyperlink
--> lib/src/fsmonitor.rs:66:6
|
66 | /// (https://facebook.github.io/watchman/). Requires `watchman` to already be
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://facebook.github.io/watchman/>`
|
= note: bare URLs are not automatically turned into clickable links
= note: `#[warn(rustdoc::bare_urls)]` on by default
warning: `jj-lib` (lib doc) generated 1 warning (run `cargo fix --lib -p jj-lib` to apply 1 suggestion)
Documenting jj-cli v0.14.0 (/Users/emesterhazy/oss/github.com/martinvonz/jj/cli)
Documenting testutils v0.14.0 (/Users/emesterhazy/oss/github.com/martinvonz/jj/lib/testutils)
warning: this URL is not a hyperlink
--> cli/src/cli_util.rs:2077:41
|
2077 | /// To get started, see the tutorial at https://github.com/martinvonz/jj/blob/main/docs/tutorial.md.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://github.com/martinvonz/jj/blob/main/docs/tutorial.md.>`
|
= note: bare URLs are not automatically turned into clickable links
= note: `#[warn(rustdoc::bare_urls)]` on by default
warning: `jj-cli` (lib doc) generated 1 warning (run `cargo fix --lib -p jj-cli` to apply 1 suggestion)
```
This commit fixes the warnings by making the watchman URL a hyperlink and by
disabling the lint for the jj-cli error. Disabling the link is the right thing
to do because the comment is captured by clap and printed when `jj --help`
runs and any markdown formatting like `<>` is passed through.
The only thing we need from the `RepoLoader` is the `OpHeadsStore`, so we can
extract it in UnpublishedOperation::new instead of keeping the entire
`RepoLoader` around.
`NewRepoData` is just a container that holds data used to construct a
`ReadonlyRepo`. The `ReaonlyRepo` is always constructed before the
`UnpublishedOperation` is dropped, so we can simply construct the
`ReadonlyRepo` upfront and delete the `NewRepoData` type.
The custom Drop impl prevents us from moving members of UnpublishedOperation,
and is the reason why `NewRepoData` is wrapped in an `Option`. We don't use
custom Drop functions like this for debugging elsewhere in the codebase, and in
some ways #[must_use] provides better protection since it will typically cause
a compiler error if the UnpublishedOperation isn't used.
MutableRepo and CommitBuilder both define public (now crate-public) functions
which should only be called by each other. This commit adds documentation and
restricts visibility of these functions to the jj_lib crate. It might be even
better to move CommitBuilder to the same module as MutableRepo so that these
codependent functions can be private to the module to avoid misuse.
These comments are intended to make it easier for new developers to get up to
speed with the project. This is just a starting point... there are other types
and functions that could benefit from documentation.
There's no need to have a block of code at the beginning of the function to
cache the rewrite source id. We can simply check the necessary condition before
calling record_rewritten_commit.
This tweak makes the function a little easier to read since we don't check the
condition until we're ready to do the work.
This reverts dc074363d1 "no-op: Move external git repo canonicalization into
Workspace::init_git_external." As I said in the PR comment, appending ".git"
is normalization of the user input, which is IMHO more appropriate to be done
in the CLI layer.
This allows us to define documentation comments for types implemented using the
id_type! macro. Comments defined above the type inside the macro will be
captured and visible in generated docs.
Example:
```
id_type!(
/// Stable identifier for a [`Commit`]. Unlike the `CommitId`, the `ChangeId`
/// follows the commit and is not updated when the commit is rewritten.
pub ChangeId
);
```
This commit also adds documentation for the `CommitId` and `ChangeId` types
defined using the `id_type!` macro.
Follows up 7552f939c6 "tests: disable most gpg integration tests on Windows."
I couldn't find this test failing in a few samples before, but it does now.
This removes the special handling of the working-copy commit. By
recording when an empty/emptied commit was abanoned, we rebase
descendants correctly and create a new empty working-copy commit on
top.
This partially reverts changes in a9f489ccdf "Switch to ignore crate for
gitignore handling." Since child ignore object no longer needs to access the
root to resolve the prefix path, it's simpler to store a matcher per node.
With the current implementation, the file3 pattern is set to the prefix
"foo/foo/bar". I don't know if (unrooted) "baz" prefixed with "foo/foo/bar"
should match "foo/bar/baz", but apparently it is. Anyway, that wouldn't be
the case in practice because adjacent .gitignore files shouldn't be loaded.
We do clone Operation object in several places, and I'm going to add one more
.clone() in the templater. Since the underlying metadata has many fields, I
think it's better to wrap it with Arc just like a Commit object.
The default immutable_heads() includes tags(), which makes sense, but computing
heads(tags()) can be expensive because the tags() set is usually sparse. For
example, "jj bench revset 'heads(tags())'" took 157ms in my linux stable
mirror. We can of course optimize the heads evaluation by using bit set or
segmented index, but the query includes many historical heads if the repository
has per-release branches, which are uninteresting anyway. So, this patch
replaces heads(immutable_heads()) with trunk().
The reason we include heads(immutable_heads()) is to mitigate the following
problem. Suppose trunk() is the branch to be based off, I think using trunk()
here is pretty good.
```
A B
*---*----* trunk() ⊆ immutable_heads()
\
* C
```
https://github.com/martinvonz/jj/pull/2247#discussion_r1335078879
In my linux stable mirror, this makes the default log revset evaluation super
fast. immutable_heads(), if configured properly, includes many historical
branch heads which are also the visible heads.
revsets/immutable_heads()..
---------------------------
0 12.27 117.1±0.77m
3 1.00 9.5±0.08m
I'm going to add pre-filtering to the 'roots..heads' evaluation path, and
difference_by() will be used there to calculate 'heads ~ roots'.
Union and intersection iterators are slightly changed so that all iterators
prioritize iter1's item.
This adds a guard to the gpg signing tests which will skip the test if
`gpg` is not installed on the system.
This is done in order to avoid requiring all collaborators to have setup
all the tools on their local machines that are required to test commit
signing.
When doing things like testing snapshot performance differences,
this allows you to turn off the monitor, no matter what the enabled
user or repository configuration has, e.g.
jj st --config-toml='core.fsmonitor="none"'
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Consider this code:
```
struct NoContentHash {}
#[derive(ContentHash)]
enum Hashable {
NoCanHash(NoContentHash),
Empty,
}
```
Before this commit, it generates an error like this:
```
error[E0277]: the trait bound `NoContentHash: ContentHash` is not satisfied
--> lib/src/content_hash.rs:150:10
|
150 | #[derive(ContentHash)]
| ^^^^^^^^^^^ the trait `ContentHash` is not implemented for `NoContentHash`
151 | enum Hashable {
152 | NoCanHash(NoContentHash),
| --------- required by a bound introduced by this call
|
= help: the following other types implement trait `ContentHash`:
bool
i32
i64
u8
u32
u64
std::collections::HashMap<K, V>
BTreeMap<K, V>
and 35 others
For more information about this error, try `rustc --explain E0277`.
```
After this commit, it generates a better error message:
```
error[E0277]: the trait bound `NoContentHash: ContentHash` is not satisfied
--> lib/src/content_hash.rs:152:15
|
152 | NoCanHash(NoContentHash),
| ^^^^^^^^^^^^^ the trait `ContentHash` is not implemented for `NoContentHash`
|
= help: the following other types implement trait `ContentHash`:
bool
i32
i64
u8
u32
u64
std::collections::HashMap<K, V>
BTreeMap<K, V>
and 35 others
For more information about this error, try `rustc --explain E0277`.
error: could not compile `jj-lib` (lib) due to 1 previous error
```
It also works for enum variants with named fields:
```
error[E0277]: the trait bound `NoContentHash: ContentHash` is not satisfied
--> lib/src/content_hash.rs:152:23
|
152 | NoCanHash { named: NoContentHash },
| ^^^^^^^^^^^^^ the trait `ContentHash` is not implemented for `NoContentHash`
|
= help: the following other types implement trait `ContentHash`:
bool
i32
i64
u8
u32
u64
std::collections::HashMap<K, V>
BTreeMap<K, V>
and 35 others
For more information about this error, try `rustc --explain E0277`.
```
This is a no-op in terms of function, but provides a nicer way to derive the
ContentHash trait for structs using the `#[derive(ContentHash)]` syntax used
for other traits such as `Debug`.
This commit only adds the macro. A subsequent commit will replace uses of
`content_hash!{}` with `#[derive(ContentHash)]`.
The new macro generates nice error messages, just like the old macro:
```
error[E0277]: the trait bound `NotImplemented: content_hash::ContentHash` is not satisfied
--> lib/src/content_hash.rs:265:16
|
265 | z: NotImplemented,
| ^^^^^^^^^^^^^^ the trait `content_hash::ContentHash` is not implemented for `NotImplemented`
|
= help: the following other types implement trait `content_hash::ContentHash`:
bool
i32
i64
u8
u32
u64
std::collections::HashMap<K, V>
BTreeMap<K, V>
and 38 others
```
This commit does two things to make proc macros re-exported by jj_lib useable
by deps:
1. jj_lib needs to be able refer to itself as `jj_lib` which it does
by adding an `extern crate self as jj_lib` declaration.
2. jj_lib::content_hash needs to re-export the `digest::Update` type so that
users of jj_lib can use the `#[derive(ContentHash)]` proc macro without
directly depending on the digest crate. This is done by re-exporting it
as `DigestUpdate`.
#3054
It should be useful at least in the presentation layer to know which
operations correspond to working-copy snapshots. They might be
rendered differently in the graph, for example. Or maybe an undo
command wants to warn if you just undid a snapshot operation. This
patch just introduces a field in the metadata to store the
information.
I think the conclusion from #2600 is that at least auto-rebasing
should not simplify merge commits that merge a commit with its
ancestor. Let's start by adding an option for that in the library.
The shortest change id prefix will become a few digits longer, but I think
that's acceptable. Entries included in the "revsets.short-prefixes" set are
unaffected.
The reachable set is calculated eagerly, but this is still faster as we no
longer need to sort the reachable entries by change id. The lazy version will
save another ~100ms in mid-size repos.
"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 353.6 ms ± 11.9 ms [User: 266.7 ms, System: 87.0 ms]
Range (min … max): 329.0 ms … 365.6 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 271.3 ms ± 9.9 ms [User: 183.8 ms, System: 87.7 ms]
Range (min … max): 250.5 ms … 282.7 ms 20 runs
Relative speed comparison
1.99 ± 0.16 target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
1.53 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```
"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 396.6 ms ± 10.1 ms [User: 300.7 ms, System: 94.0 ms]
Range (min … max): 373.6 ms … 408.0 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 318.6 ms ± 12.6 ms [User: 219.1 ms, System: 94.1 ms]
Range (min … max): 294.2 ms … 333.0 ms 20 runs
Relative speed comparison
1.85 ± 0.14 target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
1.48 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
These methods are basically the same as the commit_id versions, but
resolve_change_id_prefix() is a bit more involved as we need to gather matches
from multiple segments.
In resolve_change_id_prefix(), I've implemented two different ways of
collecting the overflow items. I don't think they impact the performance,
but we can switch to the alternative method as needed.
This basically means that the change ids are interned. We'll implement binary
search over the sorted change ids table. The table could be sorted differently
for better cache locality, but it is in lexicographical order for simplicity.
With my testing, the cost of the id lookup isn't dominant.
Unlike the parent entries, the size of the per-id overflow items isn't saved.
That's s because the number of the same-change-id commits is either 1 or many.
It doesn't make sense to allocate 8 bytes for each change id. Instead, we'll
pay extra indirection cost to determine the size.
I'm going to add change id overflow table whose elements are of LocalPosition
type. Let's make sure that the serialization code would break if we changed
the underlying data type.
Apparently, gix has 100ms timeout. Since this test tries to create contended
situation, it's possible that the ref lock can't be acquired. I've added
upper bound to the retry loop at b37293fa68 "tests: add upper bound to
test_concurrent_read_write_commit() loop", so ignoring arbitrary errors
should be okay.
The problem can be reproduced on my Linux machine by inserting 10ms sleep() to
gix and increasing the concurrency.
Fixes#3069
This is for completeness and to avoid accidents such as someone calling
`ContentHash::hash(1234u32.to_le_bytes())` and expecting it to hash properly as
a u32 instead of a 4 byte slice, which produces a different hash due to hashing
the length of the slice before its contents.
The `ContentHash` documentation specifies that implementations for enums should
hash the ordinal number of the variant contained in the enum as a 32-bit
little-endian number and then hash the contents of the variant, if any.
The current implementations for `std::Option`, `MergedTreeId`, and
`RemoteRefState` are non-conformant since they hash the ordinal number as a u8
with platform specific endianness.
Fixes#3051
Similar to the previous commit, these functions will be reused by the change id
lookup methods. The return value isn't cloned because resolve_id_prefix() will
return (key, value) pair, and the current caller doesn't need a cloned value.
This removes redundant case from resolve_neighbor_commit_ids(). The returned
position should never be lower than the prefix id.
The implementation is basically a copy of slice::binary_search_by(). We still
use (low + high) / 2 as the size wouldn't exceed 2^31.
https://github.com/rust-lang/rust/blob/1.76.0/library/core/src/slice/mod.rs#L2825
Since IdIndex sorts the entries by using .sort_unstable_by_key(), the order of
the same-key elements is undefined. Perhaps, it's stable for short arrays, and
the test passes because of that.
This saves 4 more bytes per entry, and more importantly, most commit parents
can be resolved with no indirection to the overflow table.
IIRC, Git always inlines the first parent, but that wouldn't be useful in jj
since jj diffs merge commit against the auto-merge parent. The first merge
parent is nothing special.
I'll use a similar encoding in change id sstable, where only one position
will be inlined (to optimize for imported commits.)
Benchmark number measuring the cost of change id index building:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 342.9 ms ± 14.5 ms [User: 202.4 ms, System: 140.6 ms]
Range (min … max): 326.6 ms … 360.6 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 325.0 ms ± 13.6 ms [User: 196.2 ms, System: 128.8 ms]
Range (min … max): 311.6 ms … 343.2 ms 20 runs
Relative speed comparison
1.06 ± 0.06 target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
```
I'm going to change the index format to store local positions in the lookup
table. That's not super important, but I think it makes sense because the
lookup table should never contain inter-segment links.
The mutable segment now stores local positions in its lookup map. The readonly
segment will be updated later.
This unblocks removal of 'is_legacy: bool' fields.
Note that all legacy dag range expressions can't be accepted by the new grammar.
For example, 'x:y()' is parsed as ('x:y', error) because 'x:y' is a valid string
pattern expression, and '(' isn't an infix operator. The old compat_dag_range_op
is NOT removed as it can still translate 'x():y' or 'x:(y)' to a better error,
and we might make the string pattern syntax stricter #2101.
The legacy parsing rules are turned into compatibility errors. The x:y rule
is temporarily enabled when parsing string patterns. It's weird, but we can't
isolate the parsing function because a string pattern may be defined in an
alias.
It's not ideal to print the error there, but using stderr should be slightly
better. It could be a tracing message, but tracing won't be displayed by
default.
I'm going to introduce breaking changes in index format. Some of them will
affect the file size, so version number or signature won't be needed. However,
I think it's safer to detect the format change as early as possible.
I have no idea if embedded version number is the best way. Because segment
files are looked up through the operation links, the version number could be
stored there and/or the "segments" directory could be versioned. If we want to
support multiple format versions and clients, it might be better to split the
tables into data chunks (e.g. graph entries, commit id table, change id table),
and add per-chunk version/type tag. I choose the per-file version just because
it's simple and would be non-controversial.
As I'm going to introduce format change pretty soon, this patch doesn't
implement data migration. The existing index files will be deleted and new
files will be created from scratch.
Planned index format changes include:
1. remove unused "flags" field
2. inline commit parents up to two
3. add sorted change ids table
Since the operation log has a root operation, we don't need to create
the repo-initialization operation in order to create a valid
`ReadonlyRepo` instance. I think it's conceptually simpler to create
the instance at the root operation id and then add the initial
operation using the usual `Transaction` API. That's what this patch
does.
Doing that also brought two issues to light:
1. The empty view object doesn't have the root commit as head.
2. The initialized `OpHeadsStore` doesn't have the root operation as
head.
Both of those seem somewhat reasonable, but maybe we should change
them. For now, I just made the initial repo (before the initial
operation) have a single op head (to compensate for (2)). It might be
worth addressing both issues so the repo is in a better state before
we create the initial operation. Until we do, we probably shouldn't
drop the initial operation.
This is #3002 with tests rerun to account for changes
to `strsim`, as @thoughtpolice noticed in
https://github.com/martinvonz/jj/pull/3002#issuecomment-1936763101
The string similarity changes include an example that
seems better and one that seems worse. Decreasing
the threshold definitely makes things worse.
I was a bit surprised to learn (or be reminded?) that checking out
symlinks on Windows leads to a panic. This patch fixes the crash by
materializing symlinks from the repo as regular files. It also updates
the snapshotting code so we preserve the symlink-ness of a path. The
user can update the symlink in the repo by updating the regular file
in the working copy. This seems to match Git's behavior on Windows
when symlinks are disabled.
The `write_path_to_store()` has almost no overlapping code between the
handling of symlinks and regular files, which suggests that we should
move out the handling of symlinks to the caller (there's only one).
If the operation corresponding to a workspace is missing for some reason
(the specific situation in the test in this commit is that an operation
was abandoned and garbage-collected from another workspace), currently,
jj fails with a 255 error code. Teach jj a way to recover from this
situation.
When jj detects such a situation, it prints a message and stops
operation, similar to when a workspace is stale. The message tells the
user what command to run.
When that command is run, jj loads the repo at the @ operation (instead
of the operation of the workspace), creates a new commit on the @
commit with an empty tree, and then proceeds as usual - in particular,
including the auto-snapshotting of the working tree, which creates
another commit that obsoletes the newly created commit.
There are several design points I considered.
1) Whether the recovery should be automatic, or (as in this commit)
manual in that the user should be prompted to run a command. The user
might prefer to recover in another way (e.g. by simply deleting the
workspace) and this situation is (hopefully) rare enough that I think
it's better to prompt the user.
2) Which command the user should be prompted to run (and thus, which
command should be taught to perform the recovery). I chose "workspace
update-stale" because the circumstances are very similar to it: it's
symptom is that the regular jj operation is blocked somewhere at the
beginning, and "workspace update-stale" already does some special work
before the blockage (this commit adds more of such special work). But it
might be better for something more explicitly named, or even a sequence
of commands (e.g. "create a new operation that becomes @ that no
workspace points to", "low-level command that makes a workspace point to
the operation @") but I can see how this can be unnecessarily confusing
for the user.
3) How we recover. I can think of several ways:
a) Always create a commit, and allow the automatic snapshotting to
create another commit that obsoletes this commit.
b) Create a commit but somehow teach the automatic snapshotting to
replace the created commit in-place (so it has no predecessor, as viewed
in "obslog").
c) Do either a) or b), with the added improvement that if there is no
diff between the newly created commit and the former @, to behave as if
no new commit was created (@ remains as the former @).
I chose a) since it was the simplest and most easily reasoned about,
which I think is the best way to go when recovering from a rare
situation.
this greatly speeds up the time to run all tests, at the cost of slightly larger recompile times for individual tests.
this unfortunately adds the requirement that all tests are listed in `runner.rs` for the crate.
to avoid forgetting, i've added a new test that ensures the directory is in sync with the file.
## benchmarks
before this change, recompiling all tests took 32-50 seconds and running a single test took 3.5 seconds:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test test_working_copy'
Time (mean ± σ): 3.543 s ± 0.168 s [User: 2.597 s, System: 1.262 s]
Range (min … max): 3.400 s … 3.847 s 10 runs
```
after this change, recompiling all tests take 4 seconds:
```
; hyperfine 'touch lib/src/lib.rs ; cargo t --test runner --no-run'
Time (mean ± σ): 4.055 s ± 0.123 s [User: 3.591 s, System: 1.593 s]
Range (min … max): 3.804 s … 4.159 s 10 runs
```
and running a single test takes about the same:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test runner -- test_working_copy'
Time (mean ± σ): 4.129 s ± 0.120 s [User: 3.636 s, System: 1.593 s]
Range (min … max): 3.933 s … 4.346 s 10 runs
```
about 1.4 seconds of that is the time for the runner, of which .4 is the time for the linker. so
there may be room for further improving the times.
Mostly, I was a bit confused that some of these functions return a
`TrackingRefPair` but don't seem to take into account whether the remote
branch is being tracked or not.
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.
The error output gets more verbose because all gix error sources are printed.
Maybe we'll need a better formatting, but changing to multi-line output doesn't
look nice either.
These error types are special because the message is embedded in ASCII art. I
think it would be a source of bugs if some error types had ": {source}" but
others don't. So I'm going to remove all ": {source}"s, and let the callers
concatenate them when needed.
This mostly reverts https://github.com/martinvonz/jj/pull/2901 as well as its
fixup https://github.com/martinvonz/jj/pull/2903. The related bug is reopened,
see https://github.com/martinvonz/jj/issues/2869#issuecomment-1920367932.
The problem is that while the fix did fix#2869 in most cases, it did
reintroduce the more severe bug https://github.com/martinvonz/jj/issues/2760
in one case, if the working copy is the commit being rebased.
For example, suppose you have the tree
```
root -> A -> B -> @ (empty) -> C
```
### Before this commit
#### Case 1
`jj rebase -s B -d root --skip-empty` would work perfectly before this
commit, resulting in
```
root -> A
\-------B -> C
\- @ (new, empty)
```
#### Case 2
Unfortunately, if you run `jj rebase -s @ -d A --skip-empty`, you'd have the
following result (before this commit), which shows the reintroduction of #2760:
```
root -> A @ -> C
\-- B
```
with the working copy at `A`. The reason for this is explained in
https://github.com/martinvonz/jj/pull/2901#issuecomment-1920043560.
### After this commit
After this commit, both case 1 and case 2 will be wrong in the sense of #2869,
but it will no longer exhibit the worse bug #2760 in the second case.
Case 1 would result in:
```
root -> A
\-------B -> @ (empty) -> C
```
Case 2 would result in:
```
root -> A -> @ -> C
\-- B
```
with the working copy remaining a descendant of A
`rebase_next()` returns an `Option<RebasedDescendant>`, but the only
way we use it is to decide whether to terminate the loop over
`to_visit`. Let's simplify by making the caller iterate over
`to_visit` instead.
The `edit` argument seems to be true if and only if the
old commit was *not* abandoned. So, I flipped its value
and renamed it to `abandoned_old_commit`.