This doc describes what we need to consider in a submodule storage solution, some possible solutions and what criteria we should use to decide on a future direction. This is still a WIP: - The solutions are still underdescribed - The actual evaluation of solutions is missing Suggestions for the above are welcome :)
5.4 KiB
Git submodules
This is an aspirational document that describes how jj will support Git submodules. Readers are assumed to have some familiarity with Git and Git submodules.
This document is a work in progress; submodules are a big feature, and relevant details will be filled in incrementally.
Objective
This proposal aims to replicate the workflows users are used to with Git submodules, e.g.:
- Cloning submodules
- Making new submodule commits and updating the superproject
- Fetching and pushing updates to the submodule's remote
- Viewing submodule history
When it is convenient, this proposal will also aim to make submodules easier to use than Git's implementation.
Non-goals
- Non-Git 'submodules' (e.g. native jj submodules, other VCSes)
- Non-Git backends (e.g. Google internal backend)
- Changing how Git submodules are implemented in Git
Background
We mainly want to support Git submodules for feature parity, since Git submodules are a standard feature in Git and are popular enough that we have received user requests for them. Secondarily (and distantly so), Git submodules are notoriously difficult to use, so there is an opportunity to improve the UX over Git's implementation.
Intro to Git Submodules
Git submodules are a feature of Git that allow a repository (submodule) to be embedded inside another repository (the superproject). Notably, a submodule is a full repository, complete with its own index, object store and ref store. It can be interacted with like any other repository, regardless of the superproject.
In a superproject commit, submodule information is captured in two places:
-
A
gitlink
entry in the commit's tree, where the value of thegitlink
entry is the submodule commit id. This tells Git what to populate in the working tree. -
A top level
.gitmodules
file. This file is in Git's config syntax and entries take the formsubmodule.<submodule-name>.*
. These include many settings about the submodules, but most importantly:-
submodule<submodule-name>.path
contains the path from the root of the tree to thegitlink
being described. -
submodule<submodule-name>.url
contains the url to clone the submodule from.
-
In the working tree, Git notices the presence of a submodule by the .git
entry
(signifying the root of a Git repository working tree). This is either the
submodule's actual Git directory (an "old-form" submodule), or a .git
file
pointing to <superproject-git-directory>/modules/<submodule-name>
. The latter
is sometimes called the "absorbed form", and is Git's preferred mode of
operation.
Roadmap
Git submodules should be implemented in an order that supports an increasing set of workflows, with the goal of getting feedback early and often. When support is incomplete, jj should not crash, but instead provide fallback behavior and warn the user where needed.
The goal is to land good support for pure Jujutsu repositories, while colocated repositories will be supported when convenient.
This section should be treated as a set of guidelines, not a strict order of work.
Phase 1: Readonly submodules
This includes work that inspects submodule contents but does not create new objects in the submodule. This requires a way to store submodules in a jj repository that supports readonly operations.
Outcomes
- Submodules can be cloned anew
- New submodule commits can be fetched
- Submodule history and branches can be viewed
- Submodule contents are populated in the working copy
- Superproject gitlink can be updated to an existing submodule commit
- Conflicts in the superproject gitlink can be resolved to an existing submodule commit
Phase 2: Snapshotting new changes
This allows a user to write new contents to a submodule and its remote.
Outcomes
- Changes in the working copy can be recorded in a submodule commit
- Submodule branches can be modified
- Submodules and their branches can be pushed to their remote
Phase 3: Merging/rebasing/conflicts
This allows merging and rebasing of superproject commits in a content-aware way (in contrast to Git, where only the gitlink commit ids are compared), as well as workflows that make resolving conflicts easy and sensible.
This can be done in tandem with Phase 2, but will likely require a significant amount of design work on its own.
Outcomes
- Merged/rebased submodules result in merged/rebased working copy content
- Merged/rebased working copy content can be committed, possibly by creating sensible merged/rebased submodule commits
- Merge/rebase between submodule and non-submodule gives a sensible result
- Merge/rebase between submodule A and submodule B gives a sensible result
Phase ?: An ideal world
I.e. outcomes we would like to see if there were no constraints whatsoever.
- Rewriting submodule commits rewrites descendants correctly and updates superproject gitlinks.
- Submodule conflicts automatically resolve to the 'correct' submodule commits, e.g. a merge between superproject commits creating a merge of the submodule commits.
- Nested submodules are as easy to work with as non-nested submodules.
- The operation log captures changes in the submodule.
Design
Guiding principles
TODO
Storing submodules
Possible approaches under discussion. See ./git-submodule-storage.md.
Snapshotting new submodule changes
TODO
Merging/rebasing with submodules
TODO