The text pattern is applied prior to comparison as we do in Mercurial. This
might affect hunk selection, but is much faster than computing diff of full
file contents. For example, the following hunk wouldn't be caught by
diff_contains("a") because the line "b\n" is filtered out:
- a
b
+ a
Closes#2933
Since fileset and revset languages are syntactically close, we can reparse
revset expression as a fileset. This might sound a bit scary, but helps
eliminate nested quoting like file("~glob:'*.rs'"). One oddity exists in alias
substitution, though. Another possible problem is that we'll need to add fake
operator parsing rules if we introduce incompatibility in fileset, or want to
embed revset expressions in a fileset.
Since "file(x, y)" is equivalent to "file(x|y)", the former will be deprecated.
I'll probably add a mechanism to collect warnings during parsing.
For the same reason as the previous patch. I'm going to make DiffHunk leverage
BStr wrapper instead of custom Debug impl.
b"" literals in tests are changed to &str to get around type incompatibility
between &[u8; N].
This helps migrate internal [u8] variables to BStr.
b"" literals in tests are changed to &str to get around potential type
incompatibility between &[u8; N].
Adds support for revset functions `tracked_remote_branches()` and
`untracked_remote_branches()`. I think this would be especially useful
for configuring `immutable_heads()` because rewriting untracked remote
branches usually wouldn't be desirable (since it wouldn't update the
remote branch). It also makes it easy to hide branches that you don't
care about from the log, since you could hide untracked branches and
then only track branches that you care about.
As suggested by @crackcomm on discord, use eprintln!() to print warnings
to avoid messing up template output, e.g.:
jj --no-pager --ignore-working-copy show --tool true -T change_id -r rv...
rv...ignoring git submodule at "some/submodule"
Signed-off-by: Tim Janik <timj@gnu.org>
The return type T doesn't have to be a literal, and I'm going to use this
function to reparse fileset expression. We might also want to add another
expect_literal_with() helper that parses enum-like string value.
Maybe it'll also be good to keep RevsetExpression::Union(_) flattened, but
that's not needed to get around stack overflow. The constructed expression
tree is balanced.
test_expand_symbol_alias() is slightly adjusted since there are more than
one representation for "a|b|c" now.
Fixes#4031
Partially resolve a 1.5‐year‐old TODO comment.
Add opt‐in syntax for case‐insensitive matching, suffixing the
pattern kind with `-i`. Not every context supports case‐insensitive
patterns (e.g. Git branch fetch settings). It may make sense to make
this the default in at least some contexts (e.g. the commit signature
and description revsets), but it would require some thought to avoid
more confusing context‐sensitivity.
Make `mine()` match case‐insensitively unconditionally, since email
addresses are conventionally case‐insensitive and it doesn’t take
a pattern anyway.
This currently only handles ASCII case folding, due to the complexities
of case‐insensitive Unicode comparison and the `glob` crate’s lack
of support for it. This is unlikely to matter for email addresses,
which very rarely contain non‐ASCII characters, but is unfortunate
for names and descriptions. However, the current matching behaviour is
already seriously deficient for non‐ASCII text due to the lack of any
normalization, so this hopefully shouldn’t be a blocker to adding the
interface. An expository comment has been left in the code for anyone
who wants to try and address this (perhaps a future version of myself).
I don't think there's a possibility that uncommon_shared_words can become
non-empty by trimming the same amount of lines from both sides. Well, there's
an edge case regarding max_occurrences, but that shouldn't matter in practice.
Patience diff starts by lining up unique elements (e.g. lines) to find
matching segments of the inputs. After that, it refines the
non-matching segments by repeating the process. Histogram expands on
that by not just considering unique elements but by continuing with
elements of count 2, then 3, etc.
Before this commit, when diffing "a b a b b" against "a b a b a b", we
would match the two "a"s in the first input against the first two "a"s
in the second input. After this patch, we ignore the "a"s because
their counts differ, so we try to align the "b"s instead.
I have had this commit lying around since I wrote the histogram diff
implementation in 1e657c5331. I vaguely remember thinking that the
way I had implemented it (without this commit) was a bit weird, but I
wasn't sure if this commit would be an improvement or not. The bug
report from @chooglen today of a case where we behave differently from
Git is enough to make me think that we make this change after all.
#761
This is adapted from Breezy/Python patiencediff. AFAICT, Git implementation is
slightly different (and maybe more efficient?), but it's not super easy to
integrate with our diff logic. I'm not sure which one is better overall, but I
think the result is good so long as "uncommon LCS" matching is attempted first.
a9a3e4edc3/patiencediff/_patiencediff_py.py (L108)
This patch prevents some weird test changes that would otherwise be introduced
by the next patch.
Forgetting a workspace removes its working-copy commit, so it makes
sense for it to be abandoned if it is discardable just like editing a
new commit will cause the old commit to be abandoned if it is
discardable.
Currently, if two workspaces are editing the same discardable commit and
one of them switches to editing a different commit, it is abandoned even
though the other workspace is still editing it. This commit treats
workspaces as referencing their working-copy commits so that they won't
be abandoned.
At work, a user encountered a panic upon attempting to create a dir at
the line in the diff below, but it turned out to be difficult to debug
because I didn't know what the path was. There already is a mechanism to
add path context in the lib crate; make it available in the cli crate as
well, and use the mechanism to add path context to "workspace add".
Fixes#2476.
Previously, if there was a change id match within the short prefix
lookup set, `jj` would not look for commits with that same change id
outside the short prefix set. So, it wouldn't find the conflicted
commits for a commit with a divergent (AKA conflicted) change id.
It's common to create empty working-copy commits while using jj, and
currently the author timestamp for a commit is only set when it is first
created. If you create an empty commit, then don't work on a repo for a
few days, and then start working on a new feature without abandoning the
working-copy commit, the author timestamp will remain as the time the
commit was created rather than being updated to the time that work began
or finished.
This commit changes the behavior so that discardable commits (empty
commits with no description) by the current user have their author
timestamps reset when they are rewritten, meaning that the author
timestamp will become finalized whenever a commit is given a description
or becomes non-empty.
Since we've split (local, remotes) branches to (locals, remotes { branches }),
.has_branch() API no longer makes much sense. Callers often need to check if
a remote branch is tracked.
It was convenient that expression nodes can be compared in tests, but no
equivalence property is needed at runtime. Let's remove Eq/PartialEq to
simplify the extension support.
Most of the tests are migrated to insta::assert_debug_snapshot!(). Some of them
could use assert_matches!(), but the resulting code would look ugly because of
nested RC<_>s.
We use `heads_ok()` for finding the head operations when there are
multiple current op heads. The current DFS-based algortihm needs to
always walk all the way to the root. That can be expensive when the
operations are slow to retrieve. In the common case where there are
two operations close to each other in the graph, we should be able to
terminate the search once we've reached the common ancestor. This
patch replaces the DFS by a BFS and adds the early termination.
This allows users to easily filter a commit range by conflicts, which will be needed for `next/prev`
further down in the next commit. Users which benefit from it were also migrated.
The error message that says something like 'Workspace "default"
doesn't have a working copy' confused me when I saw it. The problem
it's describing is that the repo view doesn't have a working-copy
commit for the given workspace id. Saying "working-copy commit"
instead of "working copy" hopefully clarifies it a bit.
The function has no callers outside the module anymore (probably since
`MaterializeTreeValue` but I haven't checked). Inlining it will help
keep error handling simple in the next commit. Otherwise we'd need to
have it return an error type wrapping both `BackendError` and
`io::Error`.
I don't think the message adds anything over what the `BackendError`
itself provides, so let's use `transparent` instead.
I also dropped the `Internal` prefix from the variant because that
also didn't seem to add anything.
- make an internal set of watchman extensions until the client api gets
updates with triggers
- add a config option to enable using triggers in watchman
Co-authored-by: Waleed Khan <me@waleedkhan.name>
Still alias function shadows builtin function (of any arity) by name. This
allows to detect argument error as such, but might be a bit inconvenient if
user wants to overload heads() for example. If needed, maybe we can add some
config/revset syntax to import builtin function to alias namespace.
The functions table is keyed by name, not by (name, arity) pair. That's mainly
because std collections require keys to be Borrow, and a pair of borrowed
values is incompatible with owned pair. Another reason is it makes easy to look
up overloads by name.
Alias overloading could also be achieved by adding default parameters, but that
will complicate the implementation a bit more, and can't prevent shadowing of
0-ary immutable_heads().
Closes#2966
I'm going to add arity-based alias overloading, and we'll need function
(name, arity) pair to identify it in alias expansion stack. The exact parameter
names aren't necessary, but they can be embedded in error messages.
I've wanted to make the Git support optional for a long time. However,
since everyone uses the Git backend (and we want to support it even in
the custom binary at Google), there hasn't been much practical reason
to make Git support optional.
Since we now use jj-lib on the server at Google, it does make sense to
have the server not include Git support. In addition to making the
server binary smaller, it would make it easier for us (jj team at
Googlle) to prove that our server is not affected by some libgit2 or
Gitoxide vulnerability. But to be honest, neither of those problems
have come up, so it's more of an excuse to make the Git support
optional at this point.
It turned out to be much simpler than I expected to make Git support
in the lib crate optional. We have done a pretty good job of keeping
Git-related logic separated there.
If we make Git support optional in the lib crate, it's going to make
it a bit harder to move logic from the CLI crate into the lib crate
(as we have planned to do). Maybe that's good, though, since it helps
remind us to keep Git-related logic separated.
It's been more than 6 months since we added support for dynamically
selecting the working copy implementation. This patch drops support
for selecting the default implementation of that and other stores.
This will hopefully help remove PartialEq from RevsetExpression. Some assertions
are duplicated so that AST->RevsetExpression transformation is covered by tests.
As we discovered in the `jj fix` tests,
`MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts,
not even trivial ones. This patch fixes that.
This changes the documented behavior of `resolve()` since it was
previously documented to not change the arity unless all conflicts
were resolved.
I plan to use `resolve()` from `MergedTreeBuilder::write_tree()`.
The current implementation of `resolve()` on legacy trees just
pretended any conflicts were regular files. It's better to error
out. The function is only used in tests so far.
The goal is to remove special case from parsing functions and provide slightly
better error message. I don't know if we'd want to use "all:" in aliases, but
there are no strong reasons to disable it.
This simplifies the interface of helper functions. While revset doesn't have
top-level string pattern or integer literal, these parsing helpers could be
used to parse array subscript or n-th parent operator if any.
This patch copies function rule processing from templater and fileset. Since
function and identifier rules are quite different, it's better to not rely on
the subtle difference between identifier and function_name tokens.
This will allows us to parse "file(..)" arguments as fileset expression by
transforming AST for example. I'm not sure if that's good or bad, but we'll
probably want to embed fileset expressions without quoting.
parse_expression_rule() is split to the first str->ExpressionNode stage and
the second ExpressionNode->RevsetExpression stage. The latter is called
"resolve_*()" in fileset, but we have another "symbol" resolution stage in
revset. So I choose "lower_*()" instead.
Otherwise, newly created default branch would be re-imported as a new Git HEAD.
This could be addressed by cmd_git_init(), but the same situation can be
crafted by using "git checkout -b".
See #2651 and a935a4f70c for more
background.
This speeds up `jj log` in a large repo with watchman enabled by around
9%:
```
$ hyperfine --sort command --warmup 3 --runs 20 -L bin \
jj-before,jj-after "target/release/{bin} -R ~/chromiumjj/src log"
Benchmark 1: target/release/jj-before -R ~/chromiumjj/src log
Time (mean ± σ): 788.3 ms ± 3.4 ms [User: 618.6 ms, System: 168.8 ms]
Range (min … max): 783.1 ms … 793.3 ms 20 runs
Benchmark 2: target/release/jj-after -R ~/chromiumjj/src log
Time (mean ± σ): 713.4 ms ± 5.2 ms [User: 536.1 ms, System: 176.2 ms]
Range (min … max): 706.6 ms … 724.7 ms 20 runs
Relative speed comparison
1.11 ± 0.01 target/release/jj-before -R ~/chromiumjj/src log
1.00 target/release/jj-after -R ~/chromiumjj/src log
```
Some backends, like the one we have at Google, can restrict access to
certain files. For such files, if they return a regular
`BackendError::ReadObject`, then that will terminate iteration in many
cases (e.g. when diffing or listing files). This patch adds a new
error variant for them to return instead, plus handling of such errors
in diff output and in the working copy.
In order to test the feature, I added a new commit backend that
returns the new `ReadAccessDenied` error when the caller tries to read
certain objects.
Merge commits are very similar to non-merge commits in jj. An empty
merge commit with no description is not really different from an empty
non-merge commit with no description. As we discussed on
https://github.com/martinvonz/jj/issues/2859, we should not treat
merge commits differently when updating away from them. This patch
adds test for the current behavior (which is to leave the merge commit
in place).
I recently needed to test something on top of a two branches at the
same time, so I created a new commit on top of both of them (i.e. a
merge commit). I then ran tests and made some adjustments to the
code. These adjustments belonged in one of the parent branches, so I
used `jj squash --into` to squash it in there. Unfortunately, that
meant that my working copy became a single-parent commit based on one
of the branches only. We already had #2859 for tracking this issue.
This patch changes the behavior so we create a new working-copy commit
with all of the previous parents.
I used .map_err(|_: Vec<_>|) to clarify that the original data is returned as
an error (so it can't be .unwrap()-ed.) However, it can be said that the error
detail isn't important and .map_err() is too verbose.
This accounts for commit message-only changes which do not change the
tree ID, as well as checkouts between commits with identical tree
contents.
See #3748 for previous work on avoiding resetting HEAD when the
*commits* are identical.
This shares some common code to both the root and the non-root case, and
provides easier access to `mut_repo` in the function - as
`set_git_head_target` mutably borrows `mut_repo`.
This will be used for a cleaner implementation of #3767.
This should be a no-op, though that is not necessarily obvious in corner
cases.
Note that libgit2 already performs the push negotiation even when
pushing without force (without `+` in the refspec).
As explained in the commit, our logic is a bit more complicated than
that of `git push --force-with-lease`. This is to match the behavior of
`jj git fetch` and branch conflict resolution rules.
We want to move UI-independent logic that's currently in `jj-cli` into
`jj-lib`. `WorkspaceCommandHelper` is perhaps the most important part
to start moving. As I was looking into what to move from
`WorkspaceCommandHelper`, the first thing I saw there was the
`cwd`. It might seem like a good candidate to start moving. However,
when running a server, you might be running operations on repos stored
in database, so `cwd` and the workspace root don't make sense then
(because the repo is not stored at a particular path).
So, instead, this patch starts abstracting out our uses of those two
paths by adding an enum for converting between `RepoPath` and paths as
they are presented in the UI. I added a variant for repos stored in a
file system, and made `WorkspaceCommandHelper` use that to show that
it works. We'll probably add a server variant later.
I put the new type in `repo_path.rs` because at least the
file-system-based implementation is closely related to
`RepoPath::parse_fs_path()`.
Before this patch, `MergedTreeBuilder::write_tree()` would create a
new legacy tree if the base tree was a legacy tree. This patch makes
it always write multiple root trees instead (if there are conflicts).
We still support reading legacy conflicts if the
`format.tree-level-conflicts` config is set.
It's been about six months since we started using tree-level conflicts
by default. I can't imagine we would switch back. So let's continue
the migration by always using tree-level conflicts when merging trees,
even if all inputs were legacy trees.
In chromium/src.git, this gives an approximate ~0.83s speedup for
commands if the Git index is empty, and a ~0.14s slowdown if the Git
index is non-empty.
As most users using jj will likely be using jj to write to a colocated
repo - and therefore avoid modifying the Git index - this should be a
general speedup for most colocated checkouts.
The original expand_node() body is migrated as follows:
- Identifier -> fold_identifier()
- FunctionCall -> fold_function_call()
expand_defn() now manages states stack by itself, which simplifies lifetime
parameters.
The templater implementation of FoldableExpression is a stripped-down version
of expand_node(). It's visitor-like because I'm going to write generic alias
substitution rules over abstract expression types (template, revset, fileset.)
Naming comes from rustc.
https://rust-unofficial.github.io/patterns/patterns/creational/fold.html
This is basically the same as the previous patch, but for error types. Some
of these functions could be encoded as "E: From<AliasExpandError<'i>>", but
alias substitution logic is recursive, so it would have to convert E back and
force.
I'm going to extract generic alias substitution functions, and these AST types
will be accessed there. Revset parsing will also be migrated to the generic
functions.
This will help extract common FunctionCallNode<'i, T> type. We don't need
freedom of arbitrary error type choices, but implementing From<_> is the
easiest option I can think of. Another option is to constrain error type by
the expression type T through "T::ParseError: ArgumentsParseError" or
something, but it seemed a bit weird that we have to use trait just for that.
This revset correctly implements "reachability" from a set of source commits following both parent and child edges as far as they can go within a domain set. This type of 'bfs' query is currently impossible to express with existing revset functions.
This follows up on https://github.com/martinvonz/jj/pull/3459 and adds a
label to the closing delimeter of each conflict, e.g. "Conflict 1 of 3
ends".
I didn't initially put any label at the ending delimeter since the
starting delimeter is already marked with "Conflict 1 of 3". However,
I'm now realizing that when I resolve conflicts, I usually go from top
to bottom. The first thing I do is delete the starting conflict
delimeter. It is when I get to the *end* of the conflict that I wonder
whether there are any more conflicts left in the file.
When I addded the workaround in 256988de65, I missed the comment
just below explaining that heads removed by the other side were
already handled. Since that's not handled when using non-default
indexes now, we need to handle it in an `else` block.
The function now returns an iterator over `Result`s, matching
`Operation::parents()`.
I updated callers to also propagate the error where it was trivial.
We do a 3-way merge of repo views in a few different
scenarios. Perhaps the most obvious one is when merging concurrent
operations. Another case is when undoing an operation.
One step of the 3-way repo view merge is to find which added commits
are rewrites of which removed commits (by comparing change IDs). With
the high commit rate in the Google repo (combined with storing the
commits on a server), this step can get extremely slow.
This patch adds a hack to disable the slow step when using a
non-standard `Index` implementation. The Google index is the only
non-standard implementation I'm aware of, so I don't think this will
affect anyone else. The patch is also small enough that I don't think
it will cause much maintenance overhead for the project.
parse_function_argument_to_file/string_pattern() functions aren't moved.
They are more like functions that transform parsed tree to intermediate
representation.
I'm planning to rewrite the parser in a similar way to fileset/template parsing,
and the revset_parser module will host functions and types for the first stage.
I haven't started the rewrite, but it seems good to split the revset module
even if we reject the idea.
This is more consistent with the other method and it makes some extension operations easier, by giving access to the OpStore and other relevant context for custom index extensions.
`RewriteType::Rewritten` must have exactly one replacement. I think
it's better to encode that in the type by attaching the value to the
enum variant. I also renamed the type to just `Rewrite` since it now
has attached data and `Type` sounds like a traditional data-free enum
to me.
Looks like I forgot this in some recent refactoring.
I don't really see any harm in making the type public later. I might
want to make `rebase_descendants()` not clear `parent_mapping` and
instead provide a way of accessing it afterwards (removing the need
for the `_return_map()` flavors). We'll see if that ends up
happening. For now it can be private anyway.
For example,
```
<<<<<<< Conflict 1 of 3
+++++++ Contents of side #1
left 3.1
left 3.2
left 3.3
%%%%%%% Changes from base to side #2
-line 3
+right 3.1
>>>>>>>
```
or
```
<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-line 3
+right 3.1
+++++++ Contents of side #2
left 3.1
left 3.2
left 3.3
>>>>>>>
```
Currently, there is no way to disable these, this is TODO for a future
PR. Other TODOs for future PRs: make these labels configurable. After
that, we could support a `diff3/git`-like conflict format as well, in
principle.
Counting conflicts helps with knowing whether you fixed all the
conflicts while you are in the editor.
While labeling "side #1", etc, does not tell you the commit id or
description as requested in #1176, I still think it's an improvement.
Most importantly, I hope this will make `jj`'s conflict format less
scary-looking for new users.
I've used this for a bit, and I like it. Without the labels, I would see
that the two conflicts have a different order of conflict markers, but I
wouldn't be able to remember what that means. For longer diffs, it can
be tricky for me to quickly tell that it's a diff as opposed to one of
the sides. This also creates some hope of being able to navigate a
conflict with more than 2 sides.
Another not-so-secret goal for this is explained in
https://github.com/martinvonz/jj/pull/3109#issuecomment-2014140627. The
idea is a little weird, but I *think* it could be helpful, and I'd like
to experiment with it.
The format is 7 characters of the separator followed by a space and arbitrary
text, followed by a newline. Separator followed by a newline is also allowed.
E.g.:
<<<<<<< Random text
%%%%%%% Random text
line 2
-line 3
+left
line 4
+++++++ Random text
right
%%%%%%% Random text
line 2
+forward
line 3
line 4
>>>>>>> Random text
This commit only allows reading such conflicts.
I considered allowing longer separators (`<<<<<<<<<<<<<< Random text`), but we
wouldn't currently write them, so let's be strict for now.
7 characters if they are followed by a space and arbitrary text
We already have two uses for this function and I think we're soon
going to have more.
The function record the old commit as abandoned with the new parents,
which is typically what you want. We could record it as abandoned with
the old parents instead but then we'd have to do an extra iteration to
find the parents when rebasing any children. It would also be
confusing if
`rewriter.set_parents(new_parents).record_abandoned_commit()` didn't
respect the new parents.
Since fileset/revset/template expressions are specified as command-line
arguments, it's sometimes convenient to use single quotes instead of double
quotes. Various scripting languages parse single-quoted strings in various ways,
but I choose the TOML rule because it's simple and practically useful. TOML is
our config language, so copying the TOML syntax would be less surprising than
borrowing it from another language.
https://github.com/toml-lang/toml/issues/188
While I like strict parsing, it's not uncommon that we have to deal with file
names containing spaces, and doubly-quoted strings such as '"Foo Bar"' look
ugly. So, this patch adds an exception that accepts top-level bare strings.
This parsing rule is specific to command arguments, and won't be enabled when
loading fileset aliases.
When you use e.g. `git switch` to check out a conflicted commit,
you're going to end up with the `.jjconflicts-*` directories in your
working copy. It's probably not obvious what those mean. This patch
adds a README file to the root tree to try to explain to users what's
going on and how to recover.
The authoritative information about conflicts is stored in the
`jj:trees` commit header. The contents of conflicted commits is only
used for preventing GC. We can therefore add contents to the tree
without much consequence.