Commit graph

1035 commits

Author SHA1 Message Date
Martin von Zweigbergk
10c90a5099 merged_tree: propagate errors from conflict iterator 2024-11-23 13:53:04 -08:00
Scott Taylor
26f5d6150c conflicts: add "git" conflict marker style
Adds a new "git" conflict marker style option. This option matches Git's
"diff3" conflict style, allowing these conflicts to be parsed by some
external tools that don't support JJ-style conflicts. If a conflict has
more than 2 sides, then it falls back to the similar "snapshot" conflict
marker style.

The conflict parsing code now supports parsing Git-style conflict
markers in addition to the normal JJ-style conflict markers, regardless
of the conflict marker style setting. This has the benefit of allowing
the user to switch the conflict marker style while they already have
conflicts checked out, and their old conflicts will still be parsed
correctly.

Example of "git" conflict markers:

```
<<<<<<< Side  (Conflict 1 of 1)
fn example(word: String) {
    println!("word is {word}");
||||||| Base
fn example(w: String) {
    println!("word is {w}");
=======
fn example(w: &str) {
    println!("word is {w}");
>>>>>>> Side  (Conflict 1 of 1 ends)
}
```
2024-11-23 08:28:47 -06:00
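The fallback rule described in 26f5d6150c (use "git" markers only for two-sided conflicts, otherwise fall back to "snapshot") can be sketched roughly as below. This is a minimal illustration; the enum and the `effective_style` function are hypothetical names, not jj's actual API.

```rust
/// The three marker styles named in these commits (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConflictMarkerStyle {
    Diff,
    Snapshot,
    Git,
}

/// Picks the style actually used to materialize one conflict.
/// "git" markers can only express two sides plus a base, so conflicts
/// with more than 2 sides fall back to the "snapshot" style.
fn effective_style(configured: ConflictMarkerStyle, num_sides: usize) -> ConflictMarkerStyle {
    match configured {
        ConflictMarkerStyle::Git if num_sides > 2 => ConflictMarkerStyle::Snapshot,
        other => other,
    }
}
```

Under these assumptions, `effective_style(ConflictMarkerStyle::Git, 3)` yields `Snapshot`, while the "diff" and "snapshot" settings are returned unchanged for any number of sides.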
Scott Taylor
d2b06b9cf9 conflicts: add "snapshot" conflict marker style
Adds a new "snapshot" conflict marker style which returns a series of
snapshots, similar to Git's "diff3" conflict style. The "snapshot"
option uses a subset of the conflict hunk headers of the "diff" option
(it just doesn't use "%%%%%%%"), meaning that the two options are
trivially compatible with each other (i.e. a file materialized with
"snapshot" can be parsed with "diff" and vice versa).

Example of "snapshot" conflict markers:

```
<<<<<<< Conflict 1 of 1
+++++++ Contents of side 
fn example(word: String) {
    println!("word is {word}");
------- Contents of base
fn example(w: String) {
    println!("word is {w}");
+++++++ Contents of side 
fn example(w: &str) {
    println!("word is {w}");
>>>>>>> Conflict 1 of 1 ends
}
```
2024-11-23 08:28:47 -06:00
Scott Taylor
e5cb9f94f6 conflicts: add "ui.conflict-marker-style" config
Adds a new "ui.conflict-marker-style" config option. The "diff" option
is the default jj-style conflict markers with a snapshot and a series of
diffs to apply to the snapshot. New conflict marker style options will
be added in later commits.

The majority of the changes in this commit are from passing the config
option down to the code that materializes the conflicts.

Example of "diff" conflict markers:

```
<<<<<<< Conflict 1 of 1
+++++++ Contents of side 
fn example(word: String) {
    println!("word is {word}");
%%%%%%% Changes from base to side 
-fn example(w: String) {
+fn example(w: &str) {
     println!("word is {w}");
>>>>>>> Conflict 1 of 1 ends
}
```
2024-11-23 08:28:47 -06:00
Scott Taylor
6e959fa12c conflicts: allow stripped trailing whitespace in diffs
Some editors strip trailing whitespace on save, which breaks any diffs
which have context lines, since the parsing function expects them to
start with a space. There's no visual difference between " \n" and "\n",
so it seems reasonable to accept both.
2024-11-22 18:00:05 -06:00
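The tolerance described in 6e959fa12c can be illustrated with a small line classifier. This is a sketch under assumptions: `DiffLine` and `parse_diff_line` are hypothetical names, and the real parser in jj may be structured differently. The only point shown is that a bare "\n" is accepted as an empty context line, just like " \n".

```rust
/// One parsed line of a diff hunk (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
enum DiffLine<'a> {
    Context(&'a str),
    Removed(&'a str),
    Added(&'a str),
}

/// Classifies one line of a diff hunk, line ending included.
/// Returns None if the line is not a valid diff line.
fn parse_diff_line(line: &str) -> Option<DiffLine<'_>> {
    if let Some(rest) = line.strip_prefix(' ') {
        Some(DiffLine::Context(rest))
    } else if let Some(rest) = line.strip_prefix('-') {
        Some(DiffLine::Removed(rest))
    } else if let Some(rest) = line.strip_prefix('+') {
        Some(DiffLine::Added(rest))
    } else if line == "\n" {
        // An editor stripped the lone leading space of an empty context
        // line; treat it the same as " \n".
        Some(DiffLine::Context(line))
    } else {
        None
    }
}
```

With this rule, both `" \n"` and `"\n"` parse to `DiffLine::Context("\n")`, so a save that strips trailing whitespace no longer invalidates the hunk.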
Scott Taylor
efacbcbd45 conflicts: demo failed parse of diff with empty line 2024-11-22 18:00:05 -06:00
Scott Taylor
9674852dc7 conflicts: allow CRLF line endings on conflict markers
Currently, conflict markers ending in CRLF line endings aren't allowed.
I don't see any reason why we should reject them, since some
editors/tools might produce CRLF automatically on Windows when saving
files, which would break the conflicts otherwise.
2024-11-22 18:00:05 -06:00
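A rough sketch of what accepting CRLF on marker lines (9674852dc7) involves: strip either line ending before inspecting the marker. The helper names are hypothetical, and the 7-character marker length and the "followed by a space or end of line" rule are assumptions taken from the examples on this page rather than from jj's source.

```rust
/// Removes one trailing "\r\n" or "\n" from a line, if present.
fn trim_line_ending(line: &str) -> &str {
    line.strip_suffix("\r\n")
        .or_else(|| line.strip_suffix('\n'))
        .unwrap_or(line)
}

/// Returns true if `line` is exactly seven `marker` characters,
/// optionally followed by a space and a description.
fn is_marker_line(line: &str, marker: char) -> bool {
    let body = trim_line_ending(line);
    let run = body.chars().take_while(|&c| c == marker).count();
    if run != 7 {
        return false;
    }
    let rest = &body[run * marker.len_utf8()..];
    rest.is_empty() || rest.starts_with(' ')
}
```

If only "\n" were stripped, a line such as `"=======\r\n"` would leave a stray `"\r"` after the marker and be rejected, which is the failure the preceding demo commit records.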
Scott Taylor
ee7f829d4c conflicts: demo failed parse of markers with CRLF 2024-11-22 18:00:05 -06:00
Yuya Nishihara
59a79fdcc0 conflicts: extract materialize_merge_result_to_bytes() helper
We have many callers of materialize_merge_result() that just want an in-memory
buffer.
2024-11-21 10:50:37 +09:00
Yuya Nishihara
5cc0bd0950 rewrite: fix duplicated commits to be rebased onto destination
I believe this was an oversight. "jj duplicate" should duplicate commits (=
patches), not trees.

This patch adds a separate test file because test_rewrite.rs is pretty big, and
we'll probably want to migrate CLI tests to jj-lib.
2024-11-21 10:49:51 +09:00
Luke Randall
068fa0f37e revset: allow tags() to take a pattern for an argument
This makes it more consistent with `bookmarks()`.

Co-authored-by: Austin Seipp <aseipp@pobox.com>
2024-11-20 00:47:23 +00:00
Benjamin Tan
4db4f413a7 revset: add fork_point function
This can be used to find the fork point (best common ancestors) of a
revset with an arbitrary number of commits, which cannot be expressed
currently in the revset language.
2024-11-16 04:08:01 +08:00
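To make "best common ancestors" in 4db4f413a7 concrete, here is a toy computation over an in-memory graph. It is only a sketch: the `Graph` alias and both functions are hypothetical, commits are plain string IDs, and jj's revset engine works on its index rather than on a map like this.

```rust
use std::collections::{BTreeSet, HashMap};

/// Maps each commit ID to its parent IDs (toy representation).
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// All ancestors of `start`, including `start` itself.
fn ancestors<'a>(graph: &Graph<'a>, start: &'a str) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(id) = stack.pop() {
        if seen.insert(id) {
            if let Some(parents) = graph.get(id) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}

/// Common ancestors of all `commits` that have no descendant which is
/// also a common ancestor of all `commits`.
fn fork_point<'a>(graph: &Graph<'a>, commits: &[&'a str]) -> BTreeSet<&'a str> {
    let mut sets = commits.iter().map(|&c| ancestors(graph, c));
    let first = match sets.next() {
        Some(set) => set,
        None => return BTreeSet::new(),
    };
    // Intersect the ancestor sets of every input commit.
    let common: BTreeSet<&'a str> =
        sets.fold(first, |acc, set| acc.intersection(&set).copied().collect());
    // Keep only the heads of the common-ancestor set.
    let heads: BTreeSet<&'a str> = common
        .iter()
        .copied()
        .filter(|&c| {
            !common
                .iter()
                .any(|&other| other != c && ancestors(graph, other).contains(c))
        })
        .collect();
    heads
}
```

The result is a set because a criss-cross history can have more than one best common ancestor. The quadratic head filtering is fine for a toy graph but is not how a real index would do it.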
Martin von Zweigbergk
de6da1a088 transaction: propagate errors from commit() 2024-11-13 23:05:24 -08:00
Yuya Nishihara
7be4904982 tests: fix flakiness in shallow Git repo test
This test reliably failed if I dropped tv_nsec part from statx().

Since we reload the repo now, several assertions get "fixed". I've added
index().has_id() test to clarify that it's still broken.
2024-11-12 20:27:51 +09:00
Yuya Nishihara
062a1bceb9 local_working_copy: on check out, skip entries conflicting with untracked dirs
This seems more consistent because file->directory conflicts are skipped.
2024-11-12 16:12:12 +09:00
Yuya Nishihara
f3a75c5c46 local_working_copy: on check out, ignore diff of Git submodule ids
This is different from skipped paths because the file state has to remain as
FileType::GitSubmodule in order to ignore the submodule directory when
snapshotting.

Fixes .
2024-11-12 16:12:12 +09:00
Yuya Nishihara
4983db563f local_working_copy: migrate Git submodule test to MergedTreeBuilder
I also removed tx.commit() because the test doesn't rely on the committed
operation.
2024-11-12 16:12:12 +09:00
Benjamin Tan
1aad724798 repo: remove MutableRepo::rebase_descendants_return_map
This function is merely a simple wrapper around
`MutableRepo::rebase_descendants_with_options_return_map`.
2024-11-12 14:00:00 +08:00
Yuya Nishihara
077bac8be1 annotate: add low-level function to specify starting file content
In "jj absorb", we'll need to calculate annotation from the parent tree. It's
usually identical to the tree of the parent commit, but this is not true for a
merge commit. Since I'm not sure how we'll process conflict trees in general,
this patch adds a minimal API to specify a single file content, not a
MergedTree.
2024-11-12 08:26:42 +09:00
Yuya Nishihara
85e0a8b068 annotate: add option to not search all ancestors of starting commit
The primary use case is to exclude immutable commits when calculating line
ranges to absorb. For example, "jj absorb" will build annotation of @ revision
with domain = mutable().
2024-11-12 08:26:42 +09:00
dploch
41631bc0e6 test_git: fix some clippy ref errors 2024-11-08 13:59:37 -05:00
Yuya Nishihara
62e4943c04 revset: reorganize expression resolution/evaluation methods
Both user and programmatic expressions use the same .evaluate() function now.
optimize() is applied globally after symbol resolution. The order shouldn't
matter, but it might be nicer because union of commit refs could be rewritten
to a single Commits(Vec<CommitId>) node.
2024-11-08 10:34:02 +09:00
Yuya Nishihara
e6ea88aac0 revset: add visitor-like tree rewriting function, reimplement symbol resolution
I'm going to add RevsetExpression<State> type parameter, but the existing tree
transformer can't rewrite nodes to different state because the input and the
output must be of the same type. (If they were of different types, we couldn't
reuse the input subtree by Rc::clone().) The added visitor API will handle
state transitions by mapping RevsetExpression::<St1>::<Kind> to
RevsetExpression::<St2>::<Kind>.

CommitRef and AtOperation nodes are processed by specialized methods because
these nodes will depend on the State type. OTOH, Present node won't be
State-dependent, so it's inspected by the common fold_expression() method.

An input expression is not taken as an &Rc<RevsetExpression> but a &_ because
we can't reuse the allocation behind the Rc.
2024-11-08 09:56:33 +09:00
Yuya Nishihara
ba76299818 tests: use platform path separator in symlink content
Appears that this was the reason why we got the error "The filename, directory
name, or volume label syntax is incorrect" on Windows CI.
2024-11-07 13:38:04 +09:00
Yuya Nishihara
adef815d1d tests: try both DOS and hashed NT short file names
For some unknown reason, a hashed 8.3 file name is chosen for ".jj" on GitHub
CI. The hashed ".git" short name is also added for consistency.
2024-11-07 13:38:04 +09:00
Yuya Nishihara
ded48ff6e7 local_working_copy: do not create file or write in directory named .jj or .git
I originally considered adding deny-list-based implementation, but the Windows
compatibility rules are super confusing and I don't have a machine to find out
possible aliases. This patch instead adds directory equivalence tests.

In order to test file entity equivalence, we first need to create a file or
directory of the requested name. It's harmless to create an empty .jj or .git
directory, but materializing a .git file or symlink can temporarily set up an RCE
situation. That's why a new empty file is created to test the path validity. We
might want to add some optimization for safe names (e.g. ASCII, not contain
"git" or "jj", not contain "~", etc.)

That being said, I'm not quite sure whether .git/.jj in a subdirectory must be
checked. It's not safe to cd into the directory and run "jj", but the same
thing can be said to other tools such as "cargo". Perhaps, our minimum
requirement is to protect our metadata (= the root .jj and .git) directories.

Despite the crate name (and internal use of std::fs::File),
same_file::is_same_file() can test equivalence of directories. This is
documented and tested, so I've removed my custom implementation, which was
slightly simpler but lacks Windows support.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
f10c5db739 local_working_copy: skip existing symlinks consistently
If a new file would overwrite an existing regular file, the file path is skipped.
It makes sense to apply the same rule to existing symlinks. Without this patch,
check out would fail if an existing path was a dead symlink or a symlink to
a directory.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
24ccfda781 local_working_copy: do not try to remove old file traversing symlinks
I'm not sure if this was attackable before, but it should be better not to
try to remove files across symlinks.

The disk_path is now returned from create_parent_dirs() to clarify that the
path is identical.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
8540536ea2 local_working_copy: detect error of file removal earlier
This should be safer than relying on file open error. It's scary to continue
processing if the file was a symlink.

I'll add a few more sanity checks to remove_old_file(), so it's extracted as a
function.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
1c30f3b3e8 repo_path: reject invalid path components by to_fs_path/name()
This addresses a simple path traversal attack.

I don't have a Windows machine, so the added Windows tests aren't checked
locally.
2024-11-06 15:03:41 -08:00
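The kind of check 1c30f3b3e8 refers to can be sketched as a conversion that refuses suspicious components instead of joining them blindly. This is an illustration only: `to_fs_path_checked` here is a hypothetical free function with a `String` error, and the exact set of components jj rejects is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Joins a "/"-separated repo path onto `base`, rejecting components
/// that could escape `base` or alias another directory.
fn to_fs_path_checked(base: &Path, repo_path: &str) -> Result<PathBuf, String> {
    let mut result = base.to_path_buf();
    if repo_path.is_empty() {
        // The empty repo path denotes the root directory itself.
        return Ok(result);
    }
    for component in repo_path.split('/') {
        let invalid = component.is_empty()
            || component == "."
            || component == ".."
            // Catches e.g. '\\' on Windows, which is not split above.
            || component.contains(std::path::is_separator);
        if invalid {
            return Err(format!(
                "invalid component {component:?} in repo path {repo_path:?}"
            ));
        }
        result.push(component);
    }
    Ok(result)
}
```

A path such as `"../etc/passwd"` is rejected rather than resolved outside the working copy. The sketch does not cover every platform quirk (Windows drive prefixes, for instance), so it should not be read as a complete defence.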
Yuya Nishihara
739bf8decf repo_path: add stub for checked to_fs_path(), rename unchecked functions
I'm going to add a "checked" version of to_fs_path(), but not all callers can be
migrated to it. For example, an error message should be produced even if the
path is malformed.

This patch also adds error variants to propagate InvalidRepoPathError. They
don't use ::Other { .. } so the errors can be distinguished in tests.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
e819cec305 revset: inline resolve/evaluate_programmatic() in tests
I'm going to replace the current .evaluate_programmatic() which does minimal
commit-ref resolution. The new .evaluate_programmatic() will be implemented on
a "resolved" expression.
2024-11-06 09:45:09 +09:00
Yuya Nishihara
e38f7b0594 revset: add RevsetExpression::present() as there's an external caller 2024-11-04 09:20:46 +09:00
Yuya Nishihara
7b5df93fe4 testutils: move default_store_factories() to TestEnvironment
It will capture the TestBackendData mapping.
2024-11-02 08:39:02 +09:00
Yuya Nishihara
d4786a3256 testutils: move load_repo_at_head() to TestEnvironment
It will depend on the TestBackendData mapping.
2024-11-02 08:39:02 +09:00
Yuya Nishihara
ab10b7c0a0 annotate: do not collect result lines into Vec, return Iterator instead
We might want to calculate (commit_id, range) pairs of consecutive lines in
order to "absorb" changes, for example.

This should also be cheaper since Vec<u8> doesn't have to be allocated per line.
2024-10-29 23:33:46 +09:00
Yuya Nishihara
bd1024547d annotate: use sorted Vec<(usize, usize)> to propagate lines to ancestors
This isn't so complicated compared to the HashMap version, and we can handle
multiple (cur, orig1), (cur, orig2) pairs. It's also cheaper to access.
2024-10-29 14:57:57 +09:00
Yuya Nishihara
b485881d50 tests: add basic tests for annotation function 2024-10-27 22:51:54 +09:00
Yuya Nishihara
a493913000 revset: propagate evaluation errors from other Revset methods
is_empty() could also return Result<bool, _>, but I think the current definition
is also good. If an error occurred, revset.iter() would return at least one
item, so it's not empty.
2024-10-22 09:03:53 +09:00
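The reasoning in a493913000 about is_empty() can be shown with a fallible iterator: an evaluation error is still an item, so the iterator is not empty. The sketch below uses a hypothetical free function over any `Iterator<Item = Result<T, E>>`, not jj's `Revset` trait.

```rust
/// Returns true only if the iterator yields nothing at all.
/// An `Err` item counts as "not empty", so no `Result` is needed here.
fn is_empty<T, E>(mut iter: impl Iterator<Item = Result<T, E>>) -> bool {
    iter.next().is_none()
}
```

A caller that needs to see the error still gets it when it actually iterates. This check only answers whether there is anything to iterate over.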
Martin von Zweigbergk
9d4a97381f rewrite: don't resolve intermediate parent tree when rebasing
Let's say we're updating one parent of a merge:


```
  E            E'
 /|\          /|\
B C D   ->   B C D'
 \|/          \|/
  A            A
```

When rebasing `E` to create `E'` there, we do that by merging the
changes compared to the auto-merged parents. The auto-merged parent
tree is `B+(C-A)+(D-A)` before and `B+(C-A)+(D'-A)` after. Then we
rebase the diff, which gives us `E' = B+(C-A)+(D'-A) + (E -
(B+(C-A)+(D-A))) = D' + (E - D)`.

However, we currently don't do quite that simplification because we
first resolve conflicts when possible in the two auto-merged parent
trees (before and after). That rarely makes a difference to the
result, but it's wasteful to do it. It does make a difference in some
cases where our merge algorithm is lossy, which currently is only the
"A+(A-B)=A" case. I added a test case showing where it does make a
difference. It's a non-obvious case, but I think the new behavior is
more correct (the old behavior was a conflict).
2024-10-21 10:58:47 -07:00
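The simplification in 9d4a97381f can be written out step by step. This assumes the trees are treated as formal sums in which identical positive and negative terms cancel, which is how the commit message uses the notation:

```latex
\begin{align*}
E' &= \bigl(B + (C - A) + (D' - A)\bigr) + \bigl(E - (B + (C - A) + (D - A))\bigr) \\
   &= (B - B) + \bigl((C - A) - (C - A)\bigr) + (D' - A) - (D - A) + E \\
   &= D' - D + E \\
   &= D' + (E - D)
\end{align*}
```

Resolving conflicts in either auto-merged parent tree before this cancellation can discard terms early, which is where a lossy rule such as `A+(A-B)=A` changes the outcome.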
Yuya Nishihara
3d31928dac revset: drop support for HEAD@git symbol resolution
This was added at f5f61f6bfe "revset: resolve 'HEAD@git' just like other
pseudo @git branches." As I said in this patch, there was no practical use case
of the HEAD@git symbol.

If we implement colocated workspaces/worktrees, there may be multiple
Git HEAD revisions. This means HEAD can no longer be abstracted as a symbol of
the "git" remote.
2024-10-21 09:21:34 +09:00
dploch
49e9003c4e revset: allow iterators to return evaluation errors
Custom backends may rely on networking or other unreliable implementations to support revsets. This change allows them to return errors cleanly instead of panicking.

For simplicity, only the public-facing Revset and RevsetGraph types are changed in this commit; the internal revset engine remains mostly unchanged and error-free since it cannot generally produce errors.
2024-10-18 17:09:35 -04:00
Benjamin Tan
8e817bc24b revset: add coalesce(revsets...)
The `coalesce` function takes a list of revsets and returns the commits in the
first revset in the list which evaluates to a non-empty set of commits.

It can be used to display fallbacks if a certain commit cannot be found,
e.g. `coalesce(present(user_configured_trunk), builtin_trunk)`.
2024-10-16 10:36:27 +08:00
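The semantics described for `coalesce` in 8e817bc24b amount to "first non-empty set wins". A minimal sketch over already-evaluated sets follows. The function is hypothetical and eager, whereas a real revset engine would evaluate its arguments as revset expressions.

```rust
use std::collections::BTreeSet;

/// Returns the first non-empty set in `revsets`, or an empty set if
/// every candidate is empty.
fn coalesce<T>(revsets: impl IntoIterator<Item = BTreeSet<T>>) -> BTreeSet<T> {
    revsets
        .into_iter()
        .find(|set| !set.is_empty())
        .unwrap_or_default()
}
```

This mirrors the `coalesce(present(user_configured_trunk), builtin_trunk)` example: if the first argument resolves to nothing, the second is used as the fallback.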
Yuya Nishihara
ad4b940daa object_id: implement Display on ObjectId types
It's convenient if id can be inlined in error messages.
2024-10-16 09:12:16 +09:00
Yuya Nishihara
59c635bfd0 object_id: add ChangeId::reverse_hex() for convenience
Borrowed from .
2024-10-16 09:12:16 +09:00
Lukas Wirth
9f16419202 git clone: Add depth argument 2024-10-14 20:01:08 +02:00
Lukas Wirth
802e3db27e git_backend: Support shallow git repositories 2024-10-14 20:01:08 +02:00
Yuya Nishihara
f166fd0726 revset: add at_operation(op, expression)
This can be used to refer to an old working-copy commit, for example. If
we find it's useful, maybe we can add an infix syntax later.

Closes 
2024-10-12 07:57:55 +09:00
Yuya Nishihara
09d91efea5 id_prefix: propagate error from disambiguation index
The id.shortest() template prints a warning and falls back to repo-global
resolution. This seems better than erroring out. There are a few edge cases
in which the short-prefixes resolution can fail unexpectedly. For example, the
trunk() revision might not exist in operations before "jj git clone".
2024-10-09 14:07:48 +09:00
Yuya Nishihara
3ff1f985f3 revset: pass separate repo to disambiguation index
The idea is that the disambiguation index can be loaded from a repo which is
different from the symbol resolution context.

Suppose we add at_operation(op, expr) revset, a symbol inside at_operation()
expression will have to be resolved within that operation, whereas the
disambiguation index is cached globally by WorkspaceCommandHelper. We could
build temporary disambiguation index for each at-op repo, but that would be
complicated implementation-wise, and wouldn't be useful. For example, a query
"x | at_operation(@-, x)" might be resolved to "xy | at_operation(@-, xz)"
if disambiguation index were reloaded for the @- operation. Instead, the
short change ID "x" can be disambiguated to "xy", then resolved to the
corresponding commit IDs at each operation.
2024-10-09 14:07:48 +09:00