ok/jj
1
0
Fork 0
forked from mirrors/jj
Commit graph

118 commits

Author SHA1 Message Date
Anton Bulakh
e3a1e5b80e sign: Implement storage for digital commit signatures
Recognize signature metadata from git commit objects, implement
a basic version of that for the native backend.
Extract the signed data (a commit binary repr without the signature) to
be verified later.
2023-11-12 03:37:13 +02:00
Yuya Nishihara
b42a69db6d git_backend: configure committer (and author) of gix::Repository
Otherwise, ref updates would fail if we port git::export_refs() to gitoxide.
This change isn't strictly needed for the backend itself, but we'll reuse the
gix::Repository instance created by the backend when importing and exporting
Git refs.
2023-11-11 22:35:54 +09:00
Yuya Nishihara
ea32c0cb9e git_backend: pass UserSettings to GitBackend constructors 2023-11-11 22:35:54 +09:00
Yuya Nishihara
e0c35684af merge: rename Merge::new() to Merge::from_removes_adds()
Since (removes, adds) pair is no longer the canonical representation of Merge,
the name Merge::new() seems too generic. Let's give more verbose name.
2023-11-07 17:10:12 +09:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Yuya Nishihara
d9fbf21794 merge: have Merge::adds()/removes() return iterator
The Merge type will be changed to store interleaved values internally.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
b12c688ea0 merge: add method for indexed adds/removes access
The current adds()/removes() will be changed to return iterators.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
602b44258e workspace: add function that initializes colocated git repository
One less git2 API use in CLI.

The function name GitBackend::init_colocated() is a bit odd, but we need to
specify the work-tree path, not the ".git" repo path. So we can't eliminate
the notion of the working copy path anyway.
2023-11-05 08:48:35 +09:00
Yuya Nishihara
ce46c10c96 git_backend: extract inner function that initializes backend with open git repo 2023-11-05 08:48:35 +09:00
Martin von Zweigbergk
24b706641f async: switch to pollster's block_on()
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
2023-11-03 08:15:10 -07:00
Yuya Nishihara
162dcd49b4 cli: rewrite base GitIgnoreFile lookup to use gitoxide instead of libgit2
Since gix::Repository::config_snapshot() borrows the repo instance, it has to
be allocated in caller's stack. That's why GitBackend::git_config() is removed.
2023-11-02 19:33:06 +09:00
Yuya Nishihara
c88e69ad6f git_backend: replace git2::Repository with gix::Repository
My gut feeling is that gitoxide aims to be more transparent than libgit2. We'll
need to know more about the underlying Git data model.

Random comments on gix API:

 * gix::Repository provides API similar to git2::Repository, but has less
   "convenient" functions. For example, we need to use .find_object() +
   .try_to/into_<kind>() instead of .find_<kind>().
 * gix::Object, Blob, etc. own raw data as bytes. gix::object and gix::objs
   types provide high-level views on such data.
 * Tree building is pretty low-level compared to git2.
 * gix leverages bstr (i.e. bytes) extensively.

It's probably not difficult to migrate git::import/export_refs(). It might
help eliminate the startup overhead of libssl initialization. The gix-based
GitBackend appears to be a bit faster, but that wouldn't practically matter.

#2316
2023-11-02 19:33:06 +09:00
Yuya Nishihara
f5a61dc2b7 git_backend: open just-initialized repo with canonicalized path
Otherwise, the initialized repo could have a different work-dir path than the
load()-ed one. libgit2 appears to do some normalization somewhere, but gix
won't.
2023-11-02 19:33:06 +09:00
Yuya Nishihara
fd187d266f git_backend: box GitBackendInit/LoadError up front
These error enums will wrap gix error types, and will become bigger enough for
clippy to complain.
2023-11-02 19:33:06 +09:00
Yuya Nishihara
1788b5014e git_backend: remove redundant copy back of author timestamp
Only the committer timestamp can be updated inside a loop.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
f5aa739c70 git_backend: use .strip_suffix() instead of manual slicing 2023-10-31 06:51:27 +09:00
Yuya Nishihara
9bd84c55e0 git_backend: use file mode extensively in read_tree()
Both filemode() and kind() are calculated from the same underlying data,
and kind() is libgit2-specific API.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
b3c9cab12d git_backend: handle read_tree() lookup/encoding errors gracefully 2023-10-31 06:51:27 +09:00
Yuya Nishihara
847adc832f git_backend: use lossy conversion to decode non-UTF-8 commit message
If message() returned None, it doesn't mean the commit message is empty. I
originally mapped it to an error, but that made import of linux repo fail.

https://docs.rs/git2/latest/git2/struct.Commit.html#method.message
2023-10-31 06:51:27 +09:00
Yuya Nishihara
06c254e742 git_backend: use non-owned str::from_utf8() to decode symlink target
Just for consistency with the other changes. str::Utf8Error is 2 words long,
so I removed the boxing.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
d1c71c05c9 git_backend: remove redundant error handling for invalid hash length
The only error that could be returned by libgit2 is invalid hash length, and
we check that explicitly. If we switch the backends to gitoxide, there will be
panicking constructor.

https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes
2023-10-31 06:51:27 +09:00
Martin von Zweigbergk
cfcdd71865 backend: make read_conflict synchronous again
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
2023-10-28 16:45:40 -07:00
Martin von Zweigbergk
578d61ec2e git: use forward slashes in relative path to backing git repo
Closes #2403.
2023-10-20 22:51:14 -05:00
Martin von Zweigbergk
f541f9f3a6 cleanup: import futures::exectutor::block_on() instead of qualifying
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
2023-10-20 07:38:34 -07:00
Martin von Zweigbergk
f8be0b2030 backends: deduplicate definition of backend names
I copied the example set by `DefaultSubmoduleStore`.
2023-10-14 06:38:35 -07:00
Martin von Zweigbergk
5174489959 backend: make read functions async
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:

 * Make the backend methods `async`
 * Use many threads for reading from the backend
 * Add backend methods for batch reading

I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.

I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).

I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
bd5eef9c5e git_backend: rename some store variables backend in tests
This is to avoid confusion with instances of the `Store` type.
2023-10-08 23:36:49 -07:00
Yuya Nishihara
f0ad1f53ea git_backend: on read_commit(), fall back to importing extras as needed
One problematic scenario is that we have commits imported by old jj, and all
of their descendant commits are created by jj. Therefore import_head_commits()
wouldn't reach the old ancestor commits.

This change might bury a real bug, but I don't have a better alternative. Maybe
we can remove this hack after a couple of jj releases, and add a debug command
that imports all reachable Git commits from all historical heads.

Closes #2343
2023-10-07 02:08:36 +09:00
Yuya Nishihara
7c96cead34 git_backend: rename git_repo_clone() as it isn't just cloning, propagate error
Since git2::Repository::open() will access to the filesystem, it can technically
fail.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
837dc4f47c git_backend: rewrite remaining git_repo() callers, make it private
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even if the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling to the backend implementation.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
0d63223dad git_backend: proxy git2::Repository methods to manage lock scope internally
Since git_repo() acquires Mutex, it's super easy to create a deadlock.
2023-10-04 00:04:24 +09:00
Martin von Zweigbergk
d575aaeca8 backend: move constant functions first
`root_commit_id()`, `root_change_id()`, and `empty_tree_id()` were
strangely ordered between `write_symlink()` and `read_tree().
2023-09-19 05:24:51 -07:00
Yuya Nishihara
d05aaf30b4 git: promote imported commits to tree-conflict format if configured 2023-09-08 21:54:43 +09:00
Yuya Nishihara
f744e5920b git: create no-gc ref by GitBackend
Since we now have an explicit method to import heads, it makes more sense to
manage no-gc refs by the backend.
2023-09-08 21:54:43 +09:00
Yuya Nishihara
17f502c83a git: do not import unknown commits by read_commit(), use explicit method call
The main goal of this change is to enable tree-level conflict format, but it
also allows us to bulk-import commits on clone/init. I think a separate method
will help if we want to provide progress information, enable check for
.jjconflict entries under certain condition, etc.

Since git::import_refs() now depends on GitBackend type, it might be better to
remove git_repo from the function arguments.
2023-09-08 21:54:43 +09:00
Martin von Zweigbergk
7e6930b56f backend: remove last few instances of MergedTreeId::as_legacy_tree_id() 2023-08-30 06:17:21 -07:00
Martin von Zweigbergk
e4ba6a42fc backends: store tree id conflicts as list with alternating signs
Now that we have `Merge::iter()` and friends, it's simpler to store
the tree ids in a single list.
2023-08-26 07:02:04 -07:00
Martin von Zweigbergk
fd4146d485 backend: use new enum for Commit::root_tree
We currently represent the root tree id in a commit by `Merge<TreeId>`
plus a boolean `uses_tree_conflict_format`. It's better to use an enum
for that. That makes it harder to forget to check which type of tree
it is, and it makes it impossible to store a legacy tree with multiple
ids (as we could with `uses_tree_conflict_format=false`,
`root_tree=Merge::new(...)`).

Maybe more importantly, we're also going to want to pass around this
information in most places where we currently pass a single `TreeId`,
and passing two separate values would be annoying.
2023-08-26 07:02:04 -07:00
Martin von Zweigbergk
589e0db3c5 git_backend: remove unused proto field for resolved tree id
We store resolved tree ids in the regular Git commit, so we we never
ended up using the `resolved` variant in the `root_tree`.
2023-08-26 07:02:04 -07:00
Waleed Khan
134d85e635 backend: reduce BackendError size somewhat
One of the error types that I later created embedded `BackendError`, but `clippy` complained that the size of the type was too large. This helps address that.
2023-08-23 21:11:15 -07:00
Emily Fox
2c88da02b4 git: teach backend to handle empty name and email strings 2023-08-18 17:22:59 -05:00
Martin von Zweigbergk
ef5f97f8d7 conflicts: move Merge<T> to merge module
The `merge` module now seems like the obvious place for this type.
2023-08-06 22:08:09 +00:00
Martin von Zweigbergk
ecc030848d conflicts: rename Conflict<T> to Merge<T>
Since `Conflict<T>` can also represent a non-conflict state (a single
term), `Merge<T>` seems like better name.

Thanks to @ilyagr for the suggestion in
https://github.com/martinvonz/jj/pull/1774#discussion_r1257547709

Sorry about the churn. It would have been better if I thought of this
name before I introduced `Conflict<T>`.
2023-08-06 22:08:09 +00:00
Martin von Zweigbergk
006c764694 backend: learn to store tree-level conflicts
Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.

When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.

I also added new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
2023-07-19 22:04:16 -07:00
Waleed Khan
54dba51a08 docs: warn about missing docs for jj-lib crate 2023-07-10 18:28:59 +03:00
Yuya Nishihara
5346bd734f git_backend: translate io::Error of read_conflict() to ReadObject error
This is the last place in Git backend where io::Error is magically converted
to BackendError::Other.
2023-07-06 20:48:46 +09:00
Yuya Nishihara
4e4ca46998 git_backend: wrap TableStoreError to preserve source error object 2023-07-06 20:48:46 +09:00
Yuya Nishihara
cf8a0466c4 backend: introduce error types specific to init/load phases
Errors that may occur while loading backend would vary per backends, and
it's unlikely that these errors could be mapped to BackendError variants
other than BackendError::Other. So let's extract Other(_) of that kind as
a separate type to clarify there would be no other error variants.

Perhaps, Backend/Error will be renamed to CommitBackend/Error or
CommitStore/Error?, whereas I think BackendInit/LoadError can be shared
among store factories.
2023-07-06 20:48:46 +09:00
Yuya Nishihara
e1e75daa8e backend: make BackendError::Other preserve source error object 2023-07-06 20:48:46 +09:00
Yuya Nishihara
5b78fe75b1 git_backend: propagate load() error to caller
#1794
2023-07-06 12:43:49 +09:00