loro/docs/JsonSchema.md
Leon Zhao 2df2a52b05
feat: Stable JSON representation for history (#368)
---------

Co-authored-by: Zixuan Chen <remch183@outlook.com>
2024-06-07 13:18:30 +08:00


JSON Schema for Loro's OpLog

Introduction

Loro supports multiple data structures and introduces many new concepts. With only binary export formats, it would be hard for developers to understand what happens under the hood. Better transparency leads to a better developer experience: a human-readable JSON representation helps users understand and manipulate documents and build related tools.

To better understand this document, you may first need to understand how Loro stores historical editing data:

Note that, given its intended use cases, the JSON schema supports only backward compatibility, not forward compatibility.

Specification

Root object

The root object contains all Changes and Ops, plus key metadata such as the start version and the schema version.

We also extract every 64-bit PeerID into a list at the beginning of the document and replace each occurrence elsewhere with its index in that list (0, 1, 2, 3, ...). This significantly reduces document size and improves readability.

{
    "schema_version": number,
    "start_version": Map<string, number>,
    "peers": string[],
    "changes": Change[],
}
  • schema_version: the version of the schema that the document is encoded with. It's 1 for the current specification.
  • start_version: the start Frontiers version of the document. It is represented as a map from the decimal string representation of each PeerID to a Counter.
  • peers: the list of peers in the document. All PeerIDs are represented as decimal strings because 64-bit values exceed JavaScript's safe integer range (Number.MAX_SAFE_INTEGER, 2^53 - 1).
  • changes: the list of changes in the document.
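To make the peer-table idea concrete, the sketch below builds a `peers` list from raw 64-bit PeerIDs and maps each PeerID to its index. It is a minimal illustration, not Loro's encoder: `buildPeerTable` and `peerAt` are hypothetical helpers, and PeerIDs are assumed to be held as `bigint` on the JavaScript side.

```typescript
// Hypothetical helper (not a Loro API): deduplicates PeerIDs into a `peers`
// table of decimal strings and returns a lookup from PeerID to its index.
function buildPeerTable(rawPeers: bigint[]): { peers: string[]; indexOf: (p: bigint) => number } {
  const peers: string[] = [];
  const index = new Map<bigint, number>();
  for (const p of rawPeers) {
    if (!index.has(p)) {
      index.set(p, peers.length);
      // A decimal string keeps all 64 bits; a JS number would not.
      peers.push(p.toString(10));
    }
  }
  return {
    peers,
    indexOf: (p: bigint) => {
      const i = index.get(p);
      if (i === undefined) throw new Error(`unknown peer ${p}`);
      return i;
    },
  };
}

// Resolve a peer index back to the original 64-bit PeerID.
function peerAt(peers: string[], i: number): bigint {
  return BigInt(peers[i]);
}

const a = 12345678901234567890n; // larger than Number.MAX_SAFE_INTEGER
const b = 42n;
const table = buildPeerTable([a, b, a]);
console.log(table.peers);         // two entries: a, then b
console.log(table.indexOf(a));    // 0
// Converting to number loses precision: neighbouring IDs collapse.
console.log(Number(a) === Number(a + 1n)); // true
```

Ops and Changes then refer to peers only through these small indices.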

Changes

Changes are central to the OpLog. A REG (replay event graph) is a directed acyclic graph in which each node is a Change and each edge is a causal dependency between two Changes. The metadata of the Changes lets us reconstruct this graph.

You can also attach a commit message to a Change, just as you would with a Git commit.

{
    "id": string,
    "timestamp": number,
    "deps": OpID[],
    "lamport": number,
    "msg": string,
    "ops": Op[]
}

type OpID = `${number}@${PeerID}`;
  • id: the unique ID of the Change as a string of the form {Counter}@{PeerID}, i.e. the Counter and the PeerID joined by the @ character. Note that this PeerID is an index into the root object's peers list, not the original 64-bit PeerID.
  • timestamp: the Unix timestamp at which the Change was committed. Timestamps are not recorded by default.
  • deps: the list of causal dependencies of this Change, each given as an OpID string.
  • lamport: the Lamport timestamp of the Change.
  • msg: the commit message.
  • ops: all Ops in the Change.
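Because `id` and `deps` are plain strings, a tool reading this JSON has to split them back into their parts. The sketch below does that and resolves the peer index through the root `peers` list. `parseOpId` and `resolvePeer` are hypothetical helpers, not Loro APIs.

```typescript
// Hypothetical helpers (not Loro APIs) for OpID strings like "7@1".
type ParsedId = { counter: number; peer: number };

function parseOpId(id: string): ParsedId {
  const at = id.indexOf("@");
  // Both the counter and the peer part must be non-empty.
  if (at <= 0 || at === id.length - 1) throw new Error(`malformed OpID: ${id}`);
  const counter = Number(id.slice(0, at));
  const peer = Number(id.slice(at + 1));
  if (!Number.isInteger(counter) || !Number.isInteger(peer)) {
    throw new Error(`malformed OpID: ${id}`);
  }
  return { counter, peer };
}

// The PeerID inside an OpID is an index into the root `peers` list;
// look it up to recover the real 64-bit PeerID (as a decimal string).
function resolvePeer(id: string, peers: string[]): { counter: number; peerId: string } {
  const { counter, peer } = parseOpId(id);
  const peerId = peers[peer];
  if (peerId === undefined) throw new Error(`peer index out of range: ${peer}`);
  return { counter, peerId };
}

const peers = ["12345678901234567890", "42"];
console.log(resolvePeer("7@1", peers)); // { counter: 7, peerId: '42' }
```

Applying `parseOpId` to every entry of `deps` yields the edges of the graph described above.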

Operations

Operations (abbreviated as Ops) are the most complex part of the document. Loro currently supports several containers: List, Map, RichText, MovableList, and movable Tree. Each container has its own set of Ops.

In general, each Op consists of the ContainerID of the container it operates on, a counter, and the content of the Op.

type Op = {
    "container": ContainerID,
    "counter": number,
    "content": OpContent // Its detailed definition is elaborated below, with different types for different Containers.
};

type OpContent = ListOp | TextOp | MapOp | TreeOp | MovableListOp | UnknownOp;
type ContainerID =
  | `cid:root-${string}:${ContainerType}`
  | `cid:${number}@${PeerID}:${ContainerType}`;
  • container: the ContainerID of the container this Op operates on, as a string starting with cid:.
  • counter: the counter part of the OpID.
  • content: the semantic content of the Op; its shape depends on the container type.
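The two ContainerID forms above can be told apart with a small parser. This is a sketch under two assumptions: the container type is the text after the last `:`, and it is kept as an unvalidated string because this document does not enumerate the ContainerType values. `parseContainerId` is a hypothetical helper.

```typescript
// Hypothetical parser (not a Loro API) for the two ContainerID string forms.
type ParsedContainerId =
  | { kind: "root"; name: string; containerType: string }
  | { kind: "normal"; counter: number; peer: number; containerType: string };

function parseContainerId(cid: string): ParsedContainerId {
  if (!cid.startsWith("cid:")) throw new Error(`not a ContainerID: ${cid}`);
  const body = cid.slice("cid:".length);
  // Assumption: the container type is the segment after the last ':'.
  const sep = body.lastIndexOf(":");
  if (sep < 0) throw new Error(`missing container type: ${cid}`);
  const head = body.slice(0, sep);
  const containerType = body.slice(sep + 1);
  if (head.startsWith("root-")) {
    // cid:root-{name}:{ContainerType}
    return { kind: "root", name: head.slice("root-".length), containerType };
  }
  // cid:{counter}@{peer}:{ContainerType}, where peer is a peer index
  const [counter, peer] = head.split("@").map(Number);
  if (!Number.isInteger(counter) || !Number.isInteger(peer)) {
    throw new Error(`malformed ContainerID: ${cid}`);
  }
  return { kind: "normal", counter, peer, containerType };
}

console.log(parseContainerId("cid:root-text:Text"));
console.log(parseContainerId("cid:3@1:Map"));
```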

The content of each container's Ops is described below.

List

type ListOp = ListInsertOp | ListDeleteOp;

Insert

type ListInsertOp = {
    "type": "insert",
    "pos": number,
    "value": LoroValue
}
  • type: insert.
  • pos: the index at which the content is inserted.
  • value: the inserted content, a list of LoroValue.

Delete

type ListDeleteOp = {
    "type": "delete",
    "pos": number,
    "len": number,
    "start_id": OpID
}
  • type: delete.
  • pos: the start index of the deletion.
  • len: the number of deleted elements.
  • start_id: the OpID string of the first deleted element.
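A tool that replays List Ops in order against a plain array can treat them as splices. The sketch below is simplified: it ignores `start_id`, models only the JSON-friendly subset of LoroValue, and `applyListOp` is a hypothetical helper rather than part of Loro.

```typescript
// Simplified subset of LoroValue (no binary, no container references).
type LoroValue = null | boolean | number | string | LoroValue[] | { [key: string]: LoroValue };
type ListInsertOp = { type: "insert"; pos: number; value: LoroValue };
type ListDeleteOp = { type: "delete"; pos: number; len: number; start_id: string };
type ListOp = ListInsertOp | ListDeleteOp;

// Hypothetical helper: applies one ListOp to a local array by position.
function applyListOp(list: LoroValue[], op: ListOp): void {
  switch (op.type) {
    case "insert": {
      // The value is described as a list of LoroValue; a non-list value
      // is treated as a single element here (an assumption).
      const items = Array.isArray(op.value) ? op.value : [op.value];
      list.splice(op.pos, 0, ...items);
      break;
    }
    case "delete":
      // start_id is ignored in this position-based sketch.
      list.splice(op.pos, op.len);
      break;
  }
}

const doc: LoroValue[] = ["a", "d"];
applyListOp(doc, { type: "insert", pos: 1, value: ["b", "c"] });
applyListOp(doc, { type: "delete", pos: 0, len: 1, start_id: "0@0" });
console.log(doc); // [ 'b', 'c', 'd' ]
```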

MovableList

type MovableListOp = ListInsertOp | ListDeleteOp | MovableListMoveOp | MovableListSetOp;

Insert

type ListInsertOp = {
    "type": "insert",
    "pos": number,
    "value": LoroValue
}
  • type: insert.
  • pos: the index at which the content is inserted.
  • value: the inserted content, a list of LoroValue.

Delete

type ListDeleteOp = {
    "type": "delete",
    "pos": number,
    "len": number,
    "start_id": OpID
}
  • type: delete.
  • pos: the start index of the deletion.
  • len: the number of deleted elements.
  • start_id: the OpID string of the first deleted element.

Move

type MovableListMoveOp = {
    "type": "move",
    "from": number,
    "to": number,
    "elem_id": ElemID
}

type ElemID = `L${number}@${PeerID}`
  • type: move.
  • from: the index of the element before it is moved.
  • to: the index the element is moved to, counted after the element has been removed from its original position.
  • elem_id: the ID of the moved element, in the form L{Lamport}@{PeerID}.
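Since `to` is counted after the element has been taken out, a move is "remove at `from`, then insert at `to`". The sketch below shows only that index arithmetic on a local array; it ignores `elem_id`, and `applyMove` is a hypothetical helper.

```typescript
// Hypothetical helper: applies the from/to part of a MovableListMoveOp.
function applyMove<T>(list: T[], from: number, to: number): void {
  if (from < 0 || from >= list.length) throw new RangeError(`invalid from: ${from}`);
  // After removal the list is one shorter, so valid targets are 0..length-1.
  if (to < 0 || to > list.length - 1) throw new RangeError(`invalid to: ${to}`);
  const [elem] = list.splice(from, 1); // take the element out first
  list.splice(to, 0, elem);            // then insert at the post-removal index
}

const xs = ["a", "b", "c", "d"];
applyMove(xs, 0, 2);
console.log(xs); // [ 'b', 'c', 'a', 'd' ]
```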

Set

type MovableListSetOp = {
    "type": "set",
    "elem_id": ElemID,
    "value": LoroValue
}

type ElemID = `L${number}@${PeerID}`
  • type: set.
  • elem_id: the ID (in the form L{Lamport}@{PeerID}) of the element whose value is replaced.
  • value: the value set.
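Unlike `pos`, an ElemID names an element independently of its current index. The sketch below parses the `L{Lamport}@{PeerID}` form and applies a set by looking the element up by its ID. The local model, in which each element carries its ElemID string, and the helpers `parseElemId` and `applySet` are hypothetical.

```typescript
// Hypothetical helper: splits "L{lamport}@{peer}" into numbers.
function parseElemId(id: string): { lamport: number; peer: number } {
  const m = /^L(\d+)@(\d+)$/.exec(id);
  if (m === null) throw new Error(`malformed ElemID: ${id}`);
  return { lamport: Number(m[1]), peer: Number(m[2]) };
}

// Hypothetical local model: each element is tagged with its ElemID.
type Entry<T> = { elemId: string; value: T };

// Replaces the value of the element with the given ElemID.
// Returns false if no such element is present.
function applySet<T>(list: Entry<T>[], elemId: string, value: T): boolean {
  const entry = list.find((e) => e.elemId === elemId);
  if (entry === undefined) return false;
  entry.value = value;
  return true;
}

const items: Entry<string>[] = [
  { elemId: "L0@0", value: "x" },
  { elemId: "L1@1", value: "y" },
];
applySet(items, "L1@1", "z");
console.log(items.map((e) => e.value)); // [ 'x', 'z' ]
```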

Map

type MapOp = MapInsertOp | MapDeleteOp;

Insert

type MapInsertOp = {
    "type": "insert",
    "key": string,
    "value": LoroValue
}
  • type: insert.
  • key: the key of the insertion.
  • value: the value of the insertion.

Delete

type MapDeleteOp = {
    "type": "delete",
    "key": string
}
  • type: delete.
  • key: the key to delete.
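When Map Ops are replayed one after another on a single replica, they behave like `set` and `delete` on a JavaScript `Map`. The sketch below covers only that sequential case; it does not model how Loro resolves concurrent writes to the same key, and `applyMapOp` is a hypothetical helper.

```typescript
type MapInsertOp = { type: "insert"; key: string; value: unknown };
type MapDeleteOp = { type: "delete"; key: string };
type MapOp = MapInsertOp | MapDeleteOp;

// Hypothetical helper: sequential replay of MapOps on a local Map.
function applyMapOp(state: Map<string, unknown>, op: MapOp): void {
  if (op.type === "insert") {
    state.set(op.key, op.value); // a later insert overwrites the earlier value locally
  } else {
    state.delete(op.key);
  }
}

const m = new Map<string, unknown>();
applyMapOp(m, { type: "insert", key: "title", value: "draft" });
applyMapOp(m, { type: "insert", key: "title", value: "final" });
applyMapOp(m, { type: "insert", key: "tmp", value: 1 });
applyMapOp(m, { type: "delete", key: "tmp" });
console.log([...m]); // [ [ 'title', 'final' ] ]
```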

Text

type TextOp = TextInsertOp | TextDeleteOp | TextMarkOp | TextMarkEndOp;

Insert

type TextInsertOp = {
    "type": "insert",
    "pos": number,
    "text": string
}

  • type: insert.
  • pos: the index of the insert operation. The position is based on the Unicode code point length.
  • text: the inserted string.

Delete

type TextDeleteOp = {
    "type": "delete",
    "pos": number,
    "len": number,
    "start_id": OpID
}

  • type: delete.
  • pos: the index of the deletion. The position is based on the Unicode code point length.
  • len: the length of the deleted text.
  • start_id: the OpID string of the first deleted element.
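Because `pos` and `len` count Unicode code points while JavaScript strings are indexed by UTF-16 code units, the two differ whenever the text contains characters outside the Basic Multilingual Plane, such as most emoji. The sketch below converts between them before slicing. `utf16Offset`, `applyTextInsert`, and `applyTextDelete` are hypothetical helpers.

```typescript
// Convert a code-point position into a UTF-16 offset for String.prototype.slice.
function utf16Offset(s: string, codePointPos: number): number {
  let offset = 0;
  let seen = 0;
  for (const ch of s) {          // for...of iterates by code point
    if (seen === codePointPos) return offset;
    offset += ch.length;         // 1 or 2 UTF-16 code units
    seen++;
  }
  if (seen === codePointPos) return offset; // position at the very end
  throw new RangeError(`position ${codePointPos} out of range`);
}

function applyTextInsert(s: string, pos: number, text: string): string {
  const i = utf16Offset(s, pos);
  return s.slice(0, i) + text + s.slice(i);
}

function applyTextDelete(s: string, pos: number, len: number): string {
  const i = utf16Offset(s, pos);
  const j = utf16Offset(s, pos + len);
  return s.slice(0, i) + s.slice(j);
}

const s = "a😀b";                        // 3 code points, 4 UTF-16 units
console.log(applyTextInsert(s, 2, "X")); // a😀Xb
console.log(applyTextDelete(s, 1, 1));   // ab
```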

Mark

type TextMarkOp = {
    "type": "mark",
    "start": number,
    "end": number,
    "style_key": string,
    "style_value": LoroValue,
    "info": number
}

  • type: mark.
  • start: the start index of the text to mark. The position is based on the Unicode code point length.
  • end: the end index of the text to mark. The position is based on the Unicode code point length.
  • style_key: the key of the style; it is user-defined.
  • style_value: the value of the style; it is user-defined.
  • info: the configuration of the style, such as whether the style expands when new text is inserted next to it.

MarkEnd

type TextMarkEndOp = {
    "type": "mark_end"
}

  • type: mark_end.

Tree

type TreeOp = TreeCreateOp | TreeMoveOp | TreeDeleteOp;

Create

type TreeCreateOp = {
    "type": "create",
    "target": TreeID,
    "parent": TreeID | null,
    "fractional_index": string
}

type TreeID = `${number}@${PeerID}`
  • type: create.
  • target: the TreeID (as a string) of the node being created.
  • parent: the TreeID (as a string) of the parent node, or null. If it is null, the target node becomes a root node.
  • fractional_index: the fractional index of the target node, as a hex string.

Move

type TreeMoveOp = {
    "type": "move",
    "target": TreeID,
    "parent": TreeID | null,
    "fractional_index": string
}

type TreeID = `${number}@${PeerID}`
  • type: move.
  • target: the TreeID (as a string) of the node being moved.
  • parent: the TreeID (as a string) of the new parent node, or null. If it is null, the target node becomes a root node.
  • fractional_index: the fractional index of the target node, as a hex string.

Delete

type TreeDeleteOp = {
    "type": "delete",
    "target": TreeID
}

type TreeID = `${number}@${PeerID}`
  • type: delete.
  • target: the TreeID (as a string) of the node being deleted.
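A simple way to inspect Tree Ops is to replay them into a map from each node to its parent and fractional index. The sketch below does this under stated simplifications: `delete` just drops the node (descendants are left alone), and siblings are ordered by plain string comparison of the hex fractional indexes, which is an assumption about how Loro orders them. `applyTreeOp` and `childrenOf` are hypothetical helpers.

```typescript
type TreeID = string; // "{counter}@{peerIndex}"
type TreeOp =
  | { type: "create"; target: TreeID; parent: TreeID | null; fractional_index: string }
  | { type: "move"; target: TreeID; parent: TreeID | null; fractional_index: string }
  | { type: "delete"; target: TreeID };

// Hypothetical local model: node -> its parent and fractional index.
type NodeInfo = { parent: TreeID | null; index: string };

function applyTreeOp(nodes: Map<TreeID, NodeInfo>, op: TreeOp): void {
  switch (op.type) {
    case "create":
    case "move":
      // Both ops (re)place the target under `parent` at `fractional_index`.
      nodes.set(op.target, { parent: op.parent, index: op.fractional_index });
      break;
    case "delete":
      nodes.delete(op.target); // simplification: descendants are not touched
      break;
  }
}

// Children of `parent` (null = roots), ordered by fractional index string.
// Assumption: hex fractional indexes compare correctly as plain strings.
function childrenOf(nodes: Map<TreeID, NodeInfo>, parent: TreeID | null): TreeID[] {
  return [...nodes]
    .filter(([, info]) => info.parent === parent)
    .sort(([, a], [, b]) => (a.index < b.index ? -1 : a.index > b.index ? 1 : 0))
    .map(([id]) => id);
}

const tree = new Map<TreeID, NodeInfo>();
applyTreeOp(tree, { type: "create", target: "1@0", parent: null, fractional_index: "80" });
applyTreeOp(tree, { type: "create", target: "2@0", parent: "1@0", fractional_index: "80" });
applyTreeOp(tree, { type: "create", target: "3@0", parent: "1@0", fractional_index: "40" });
console.log(childrenOf(tree, "1@0")); // [ '3@0', '2@0' ]
```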

Unknown

To support forward compatibility, there is an unknown Op type. When an older version of Loro decodes an Op on a container type introduced in a newer version, it keeps the Op as an unknown Op in a more general form, such as binary or string. When a newer version of Loro decodes such an Op, it recognizes the Op's real type and decodes it correctly.

type UnknownOp = {
    "type": "unknown",
    "prop": number,
    "value_type": string,
    "value": `${EncodedValue}`
}
  • type: always "unknown".
  • prop: a numeric property of the encoded Op.
  • value_type: the type of the EncodedValue.
  • value: the EncodedValue, serialized as a JSON string.

Value

This section introduces the two kinds of values in Loro. The first is LoroValue, an enum of the data types supported by Loro, such as the values inserted into a List or Map.

The second is EncodedValue, which is used only by the encoding module for unknown Ops.

LoroValue

These are the data types supported by Loro and their JSON representations:

  • null: null
  • Bool: true or false
  • F64: number(float)
  • I64: number or bigint (signed)
  • Binary: Uint8Array
  • String: string
  • List: Array<LoroValue>
  • Map: Map<string, LoroValue>
  • Container: the id of container. 🦜:cid:{Counter}@{PeerID}:{ContainerType} or 🦜:cid:root-{Name}:{ContainerType}

Note: unlike the plain string form of a ContainerID, its JSON form carries the prefix 🦜:. This prevents a user-stored string that happens to look like a ContainerID from being misinterpreted as a ContainerID during decoding.
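A reader of the JSON can use the 🦜: prefix to decide whether a string inside a LoroValue is a container reference or ordinary text. The sketch below only checks and strips the prefix; `classifyString` is a hypothetical helper, and any further parsing of the ContainerID is left out.

```typescript
const CONTAINER_PREFIX = "🦜:";

type Classified =
  | { kind: "container"; id: string }
  | { kind: "string"; value: string };

// Hypothetical helper: a JSON string with the 🦜: prefix is a ContainerID;
// anything else is treated as ordinary string data.
function classifyString(v: string): Classified {
  if (v.startsWith(CONTAINER_PREFIX)) {
    // .length accounts for the emoji taking two UTF-16 code units.
    return { kind: "container", id: v.slice(CONTAINER_PREFIX.length) };
  }
  return { kind: "string", value: v };
}

console.log(classifyString("🦜:cid:root-text:Text")); // container reference
console.log(classifyString("cid:root-text:Text"));    // plain string, not a container
```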

EncodedValue

EncodedValue is an internal type that Loro uses during encoding; users do not need to understand its details. It is designed specifically to handle schema mismatches that arise from forward and backward compatibility. In the JSON encoding schema, an EncodedValue is encoded as an object.