gitformat-commit-graph last updated in 2.47.0

NAME

gitformat-commit-graph - Git commit-graph format

SYNOPSIS

$GIT_DIR/objects/info/commit-graph
$GIT_DIR/objects/info/commit-graphs/*

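DESCRIPTION

The Git commit-graph stores a list of commit OIDs and some associated metadata, including:

The generation number of the commit.
The root tree OID.
The commit date.
The parents of the commit, stored using positional references within the graph file.
The Bloom filter of the commit, carrying the paths that were changed between the commit and its first parent, if requested.

These positional references are stored as unsigned 32-bit integers corresponding to the array position within the list of commit OIDs. Due to some special constants we use to track parents, we can store at most (1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits.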
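As a quick check of the arithmetic above, the limit can be computed directly. This is only a sketch: the constant names are illustrative, and relating the limit to the 0x70000000 "no parent" value used in the commit data records is an interpretation of the text rather than something the format states outright.

```python
# Illustrative names; only the numeric values come from the format description.
PARENT_NONE = 0x70000000  # value stored when a parent slot is empty

# Maximum number of commits, exactly as written in the text.
MAX_COMMITS = (1 << 30) + (1 << 29) + (1 << 28) - 1


def fits_in_graph(num_commits: int) -> bool:
    """True if a graph with num_commits commits stays within the stated limit."""
    return 0 <= num_commits <= MAX_COMMITS


print(MAX_COMMITS)       # 1879048191
print(hex(MAX_COMMITS))  # 0x6fffffff
```

The limit is one less than 0x70000000, which is consistent with positions never colliding with the "no parent" marker.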

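Commit-graph files have the following format:

In order to allow extensions that add extra data to the graph, we organize the body into "chunks" and provide a binary lookup table at the beginning of the body. The header includes certain values, such as the number of chunks and the hash type.

All multi-byte numbers are in network byte order.

HEADER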

4-byte signature:
    The signature is: {'C', 'G', 'P', 'H'}

1-byte version number:
    Currently, the only valid version is 1.

1-byte Hash Version
    We infer the hash length (H) from this value:
      1 => SHA-1
      2 => SHA-256
    If the hash type does not match the repository's hash algorithm, the
    commit-graph file should be ignored with a warning presented to the
    user.

1-byte number (C) of "chunks"

1-byte number (B) of base commit-graphs
    We infer the length (H*B) of the Base Graphs chunk
    from this value.
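A minimal sketch of decoding the 8-byte header in Python, assuming the file contents are already in memory. The function name and the returned dictionary keys are hypothetical; the 20- and 32-byte lengths are the standard digest sizes of SHA-1 and SHA-256.

```python
import struct

# Hash version -> hash length H in bytes (SHA-1 and SHA-256 digest sizes).
HASH_LENGTHS = {1: 20, 2: 32}


def parse_header(data: bytes) -> dict:
    """Decode signature, version, hash version, C and B from the first 8 bytes."""
    if len(data) < 8:
        raise ValueError("commit-graph header is 8 bytes long")
    # ">" selects network (big-endian) byte order with no padding.
    signature, version, hash_version, num_chunks, num_base = struct.unpack(
        ">4sBBBB", data[:8]
    )
    if signature != b"CGPH":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    if hash_version not in HASH_LENGTHS:
        raise ValueError(f"unknown hash version {hash_version}")
    return {
        "hash_len": HASH_LENGTHS[hash_version],  # H
        "num_chunks": num_chunks,                # C
        "num_base_graphs": num_base,             # B
    }
```

For example, `parse_header(b"CGPH\x01\x01\x04\x00")` describes a SHA-1 graph with four chunks and no base graphs.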

CHUNK LOOKUP

(C + 1) * 12 bytes listing the table of contents for the chunks:
    First 4 bytes describe the chunk id. Value 0 is a terminating label.
    Other 8 bytes provide the byte-offset in current file for chunk to
    start. (Chunks are ordered contiguously in the file, so you can infer
    the length using the next chunk position if necessary.) Each chunk
    ID appears at most once.

The CHUNK LOOKUP matches the table of contents from
the chunk-based file format, see gitformat-chunk[5]
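The table of contents can be read with a short loop. The sketch below assumes the table starts right after the 8-byte header and that the offset stored in the terminating entry marks the end of the last chunk; the second assumption comes from the chunk-based file format rather than from this page. The function name is hypothetical.

```python
import struct


def parse_chunk_table(data: bytes, num_chunks: int, table_start: int = 8) -> dict:
    """Return {chunk_id: (offset, length)} from the (C + 1) * 12 byte table."""
    entries = []
    for i in range(num_chunks + 1):
        # 4-byte chunk id followed by an 8-byte big-endian file offset.
        chunk_id, offset = struct.unpack_from(">4sQ", data, table_start + 12 * i)
        entries.append((chunk_id, offset))
    if entries[-1][0] != b"\x00\x00\x00\x00":
        raise ValueError("table of contents lacks the terminating label")
    chunks = {}
    # Chunks are contiguous, so each length is the distance to the next offset.
    for (chunk_id, offset), (_, next_offset) in zip(entries, entries[1:]):
        if chunk_id in chunks:
            raise ValueError("chunk id appears more than once")
        chunks[chunk_id] = (offset, next_offset - offset)
    return chunks
```

A reader would typically look up required IDs such as `b"OIDF"` in the returned dictionary and treat a missing key as a malformed file.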

The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.

CHUNK DATA

OID Fanout (ID: {O, I, D, F}) (256 * 4 bytes)

The ith entry, F[i], stores the number of OIDs with first
byte at most i. Thus F[255] stores the total
number of commits (N).

OID Lookup (ID: {O, I, D, L}) (N * H bytes)

The OIDs for all commits in the graph, sorted in ascending order.
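Taken together, the two chunks above support a bounded binary search: the fanout entry for the first byte of an OID narrows the range, and the sorted OID list is searched within it. The following is a sketch under the assumption that both chunk bodies are available as byte strings; `find_commit` is a hypothetical name.

```python
import struct


def find_commit(fanout: bytes, oid_lookup: bytes, oid: bytes):
    """Return the position of oid in the OID Lookup chunk, or None if absent."""
    hash_len = len(oid)
    first = oid[0]
    # F[first] counts OIDs whose first byte is <= first; F[first - 1] gives the start.
    hi = struct.unpack_from(">I", fanout, 4 * first)[0]
    lo = struct.unpack_from(">I", fanout, 4 * (first - 1))[0] if first else 0
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = oid_lookup[mid * hash_len:(mid + 1) * hash_len]
        if candidate < oid:      # bytes compare lexicographically, unsigned
            lo = mid + 1
        elif candidate > oid:
            hi = mid
        else:
            return mid
    return None
```

The returned index is the array position that the other per-commit chunks use for the same commit.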

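Commit Data (ID: {C, D, A, T }) (N * (H + 16) bytes)

The first H bytes are for the OID of the root tree.
The next 8 bytes are for the positions of the first two parents of the ith commit. The value 0x70000000 is stored if there is no parent in that position. If there are more than two parents, the second value has its most-significant bit on and the other bits store an array position into the Extra Edge List chunk.
The next 8 bytes store the topological level (generation number v1) of the commit and the commit time in seconds since EPOCH. The generation number uses the higher 30 bits of the first 4 bytes, while the commit time uses the 32 bits of the second 4 bytes, along with the lowest 2 bits of the first 4 bytes, which store the 33rd and 34th bits of the commit time.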
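The bit layout of one commit data record can be unpacked as below. This is a sketch, not Git's own code; the function name, constant names, and dictionary keys are hypothetical, and `record` is assumed to hold exactly one (H + 16)-byte entry.

```python
import struct

PARENT_NONE = 0x70000000      # no parent in this slot
EXTRA_EDGE_FLAG = 0x80000000  # most-significant bit of the second parent field


def parse_commit_data(record: bytes, hash_len: int) -> dict:
    """Decode one (H + 16)-byte record of the Commit Data chunk."""
    tree_oid = record[:hash_len]
    parent1, parent2, gen_word, date_low = struct.unpack_from(">IIII", record, hash_len)
    extra_edge_start = None
    if parent2 & EXTRA_EDGE_FLAG:
        # More than two parents: the low 31 bits index the Extra Edge List.
        extra_edge_start = parent2 & 0x7FFFFFFF
        parent2 = None
    elif parent2 == PARENT_NONE:
        parent2 = None
    return {
        "tree_oid": tree_oid,
        "parent1": None if parent1 == PARENT_NONE else parent1,
        "parent2": parent2,
        "extra_edge_start": extra_edge_start,
        # Upper 30 bits hold the topological level.
        "topo_level": gen_word >> 2,
        # Lowest 2 bits are bits 33 and 34 of the commit time.
        "commit_time": ((gen_word & 0x3) << 32) | date_low,
    }
```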

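Generation Data (ID: {G, D, A, 2 }) (N * 4 bytes) [Optional]

This list of 4-byte values stores corrected commit date offsets for the commits, arranged in the same order as the Commit Data chunk.
If the corrected commit date offset cannot be stored within 31 bits, the value has its most-significant bit on and the other bits store the position of the corrected commit date in the Generation Data Overflow chunk.
The Generation Data chunk is present only when the commit-graph file is written by compatible versions of Git and, in the case of split commit-graph chains, the topmost layer also has a Generation Data chunk.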

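Generation Data Overflow (ID: {G, D, O, 2 }) [Optional]

This list of 8-byte values stores the corrected commit date offsets for commits whose offsets cannot be stored within 31 bits.
The Generation Data Overflow chunk is present only when the Generation Data chunk is present and at least one corrected commit date offset cannot be stored within 31 bits.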
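The interplay between the two generation chunks can be sketched as a single lookup. The function name is hypothetical, and both chunk bodies are assumed to be available as byte strings (pass an empty byte string when no overflow chunk exists).

```python
import struct


def corrected_date_offset(gda2: bytes, gdo2: bytes, pos: int) -> int:
    """Return the corrected commit date offset for the commit at position pos."""
    value = struct.unpack_from(">I", gda2, 4 * pos)[0]
    if value & 0x80000000:
        # Offset did not fit in 31 bits: the low bits index the overflow chunk.
        overflow_pos = value & 0x7FFFFFFF
        return struct.unpack_from(">Q", gdo2, 8 * overflow_pos)[0]
    return value
```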

Extra Edge List (ID: {E, D, G, E}) [Optional]

This list of 4-byte values stores the second through nth parents for
all octopus merges. The second parent value in the commit data stores
an array position within this list along with the most-significant bit
on. Starting at that array position, iterate through this list of commit
positions for the parents until reaching a value with the most-significant
bit on. The other bits correspond to the position of the last parent.
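The iteration described above might look like this in Python. It is a sketch with a hypothetical function name; `start` is the array position taken from the second parent field of the commit data, with its most-significant bit already cleared.

```python
import struct


def extra_parents(edge_chunk: bytes, start: int) -> list:
    """Collect parent positions from the Extra Edge List, beginning at start."""
    parents = []
    pos = start
    while True:
        # unpack_from raises struct.error if the list ends without a terminator.
        value = struct.unpack_from(">I", edge_chunk, 4 * pos)[0]
        parents.append(value & 0x7FFFFFFF)
        if value & 0x80000000:  # most-significant bit marks the last parent
            return parents
        pos += 1
```

For an octopus merge, the full parent list is the first parent from the commit data followed by the positions returned here.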

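Bloom Filter Index (ID: {B, I, D, X}) (N * 4 bytes) [Optional]

The ith entry, BIDX[i], stores the number of bytes in all Bloom filters from commit 0 to commit i (inclusive) in lexicographic order. The Bloom filter for the ith commit spans from BIDX[i-1] to BIDX[i] (plus header length), where BIDX[-1] is 0.
The BIDX chunk is ignored if the BDAT chunk is not present.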
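Because the index stores cumulative byte counts, the span of one filter falls out of two neighbouring entries. The sketch below uses a hypothetical function name and takes the header length to be 12 bytes, that is, the three unsigned 32-bit integers that open the Bloom Filter Data chunk.

```python
import struct

BDAT_HEADER_LEN = 12  # three unsigned 32-bit integers


def bloom_filter_span(bidx: bytes, i: int) -> tuple:
    """Return (start, end) byte offsets of commit i's filter inside the BDAT chunk."""
    end = struct.unpack_from(">I", bidx, 4 * i)[0]
    # BIDX[-1] is defined as 0.
    start = struct.unpack_from(">I", bidx, 4 * (i - 1))[0] if i > 0 else 0
    return start + BDAT_HEADER_LEN, end + BDAT_HEADER_LEN
```

Slicing the BDAT chunk body with the returned pair yields the raw filter bytes; equal start and end values mean the commit has an empty filter.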

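Bloom Filter Data (ID: {B, D, A, T}) [Optional]

It starts with a header consisting of three unsigned 32-bit integers:
- Version of the hash algorithm being used. We currently support value 2, which corresponds to the 32-bit version of the murmur3 hash implemented exactly as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double hashing technique using seed values 0x293ae76f and 0x7e646e2 as described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters in Probabilistic Verification". Version 1 Bloom filters have a bug that appears when char is signed and the repository has path names that contain characters >= 0x80; Git supports reading and writing them, but this ability will be removed in a future version of Git.
- The number of times a path is hashed and hence the number of bit positions that cumulatively determine whether a file is present in the commit.
- The minimum number of bits b per entry in the Bloom filter. If the filter contains n entries, then the filter size is the minimum number of 64-bit words that contain n*b bits.
The rest of the chunk is the concatenation of all the computed Bloom filters for the commits in lexicographic order.
Note: Commits with no changes or more than 512 changes have Bloom filters of length one, with either all bits set to zero or one respectively.
The BDAT chunk is present if and only if the BIDX chunk is present.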
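Reading the three header fields is a single unpack. The sketch below uses hypothetical names and accepts hash versions 1 and 2, since the text says Git can read both; it does not attempt to reproduce the hashing itself.

```python
import struct


def parse_bdat_header(bdat: bytes) -> dict:
    """Decode the three unsigned 32-bit integers that open the BDAT chunk."""
    if len(bdat) < 12:
        raise ValueError("BDAT chunk is shorter than its 12-byte header")
    hash_version, num_hashes, bits_per_entry = struct.unpack_from(">III", bdat, 0)
    if hash_version not in (1, 2):
        raise ValueError(f"unknown Bloom hash version {hash_version}")
    return {
        "hash_version": hash_version,
        "num_hashes": num_hashes,          # times each path is hashed
        "bits_per_entry": bits_per_entry,  # minimum bits b per entry
    }
```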

Base Graphs List (ID: {B, A, S, E}) [Optional]

This list of H-byte hashes describes a set of B commit-graph files that
form a commit-graph chain. The graph position for the ith commit in this
file's OID Lookup chunk is equal to i plus the number of commits in all
base graphs. If B is non-zero, this chunk must exist.
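The position arithmetic for chained graphs is simple enough to state in code. Both helper names below are hypothetical, and the commit counts of the base graphs are assumed to have been read from those files beforehand.

```python
def split_base_graphs(chunk: bytes, hash_len: int, num_base: int) -> list:
    """Split the Base Graphs List chunk into B hashes of H bytes each."""
    if len(chunk) != hash_len * num_base:
        raise ValueError("Base Graphs List must be exactly H * B bytes")
    return [chunk[i * hash_len:(i + 1) * hash_len] for i in range(num_base)]


def graph_position(local_index: int, base_commit_counts: list) -> int:
    """Position of this file's ith commit within the whole chain."""
    return local_index + sum(base_commit_counts)
```

With base graphs holding 100 and 25 commits, the commit at index 3 of the top file has graph position 128.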

TRAILER

H-byte HASH-checksum of all of the above.
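A checksum check over a whole file held in memory could be sketched as follows. The function name is hypothetical, and the mapping from hash version to algorithm follows the header description (1 for SHA-1, 2 for SHA-256).

```python
import hashlib


def verify_trailer(data: bytes, hash_version: int) -> bool:
    """Check that the last H bytes are the hash of everything before them."""
    algorithm = {1: "sha1", 2: "sha256"}[hash_version]
    hasher = hashlib.new(algorithm)
    hash_len = hasher.digest_size  # H
    if len(data) < hash_len:
        return False
    hasher.update(data[:-hash_len])
    return hasher.digest() == data[-hash_len:]
```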

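HISTORICAL NOTES

The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have the number *2* in their chunk IDs because a previous version of Git wrote possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By changing the IDs, newer versions of Git will silently ignore those older chunks and write the new information without trusting the incorrect data.

Part of the git[1] suite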
