Skip to content

1. Introduction 引言

NOTE

(August 2023) Wyag is now complete!

备注

(2023 年 8 月)Wyag 现已完成!

This article is an attempt at explaining the Git version control system from the bottom up, that is, starting at the most fundamental level moving up from there. This does not sound too easy, and has been attempted multiple times with questionable success. But there’s an easy way: all it takes to understand Git internals is to reimplement Git from scratch.

本文旨在从基础开始,深入解释 Git 版本控制系统。这听起来并不简单,过去的尝试往往效果不佳。但有一个简单的方法:要理解 Git 的内部机制,只需从头实现一个 Git。

No, don’t run.

不,别跑。

It’s not a joke, and it’s really not complicated: if you read this article top to bottom and write the code (or just download it as a ZIP — but you should write the code yourself, really), you’ll end up with a program, called wyag, that will implement all the fundamental features of git: init, add, rm, status, commit, log… in a way that is perfectly compatible with git itself — compatible enough that the commit finally adding the section on commits was created by wyag itself, not git. And all that in exactly 988 lines of very simple Python code.

这不是开玩笑,也并不复杂:如果你仔细阅读这篇文章并编写代码(或者直接 下载代码 压缩包——但我强烈建议你自己动手写代码),你将得到一个名为 wyag 的程序,它实现了 Git 的基本功能:initaddrmstatuscommitlog……而且与 Git 本身兼容,甚至可以说最后添加关于提交部分的记录是由 wyag 本身而不是 Git 创建的 (链接)。所有这一切仅需 988 行简单的 Python 代码。

But isn’t Git too complex for that? That Git is complex is, in my opinion, a misconception. Git is a large program, with a lot of features, that’s true. But the core of that program is actually extremely simple, and its apparent complexity stems first from the fact it’s often deeply counterintuitive (and Git is a burrito blog posts probably don’t help). But maybe what makes Git the most confusing is the extreme simplicity and power of its core model. The combination of core simplicity and powerful applications often makes thing really hard to grasp, because of the mental jump required to derive the variety of applications from the essential simplicity of the fundamental abstraction (monads, anyone?)

那么,Git 真的有那么复杂吗?我认为复杂性是个误解。确实,Git 是一个功能丰富的大型程序,但其核心其实非常简单,表面上的复杂性往往源于其深奥之处(而且 Git 被比作墨西哥卷饼 的讨论可能也没有帮助)。实际上,让 Git 令人困惑的,正是它核心模型的极简与强大。核心的简单性与丰富的应用之间的结合,常常让人难以理解,因为需要一定的思维跳跃才能从基本的简单性中推导出各种应用。

Implementing Git will expose its fundamentals in all their naked glory.

通过实现 Git,我们将能更清晰地认识其基本原理。

What to expect? This article will implement and explain in great details (if something is not clear, please report it!) a very simplified version of Git core commands. I will keep the code simple and to the point, so wyag won’t come anywhere near the power of the real git command-line — but what’s missing will be obvious, and trivial to implement by anyone who wants to give it a try. “Upgrading wyag to a full-featured git library and CLI is left as an exercise to the reader”, as they say.

期待什么? 本文将详细实现和解释一个简化版本的 Git 核心命令(如果有不清楚的地方,请随时 反馈。我会保持代码简单明了,因此 wyag 的功能远不能与真正的 Git 命令行相提并论,但缺失的部分将显而易见,任何想要尝试的人都能轻松添加这些功能。“将 wyag 升级为一个功能齐全的 Git 库和 CLI 是留给读者的练习”,正如人们所说的那样。

More precisely, we’ll implement:

更具体地说,我们将实现:

You’re not going to need to know much to follow this article: just some basic Git (obviously), some basic Python, some basic shell.

你无需掌握太多知识即可跟上这篇文章:只需了解一些基本的 Git(显然)、一些基本的 Python 和一些基本的 shell 知识。

  • First, I’m only going to assume some level of familiarity with the most basic git commands — nothing like an expert level, but if you’ve never used init, add, rm, commit or checkout, you will be lost.
  • 首先,我假设你对最基本的 git 命令 有一定了解——不需要达到专家水平,但如果你从未使用过 initaddrmcommitcheckout,你可能会感到困惑。
  • Language-wise, wyag will be implemented in Python. Again, I won’t use anything too fancy, and Python looks like pseudo-code anyways, so it will be easy to follow (ironically, the most complicated part will be the command-line arguments parsing logic, and you don’t really need to understand that). Yet, if you know programming but have never done any Python, I suggest you find a crash course somewhere in the internet just to get acquainted with the language.
  • 在编程语言方面,wyag 将使用 Python 实现。代码将保持简单易懂,因此对于初学者来说,Python 看起来像伪代码,容易上手(讽刺的是,最复杂的部分将是命令行参数解析逻辑,但你不需要深入理解这个)。如果你会编程但从未接触过 Python,建议找个速成课程熟悉一下这门语言。
  • wyag and git are terminal programs. I assume you know your way inside a Unix terminal. Again, you don’t need to be a l77t h4x0r, but cd, ls, rm, tree and their friends should be in your toolbox.
  • wyaggit 都是终端程序。我假设你对 Unix 终端操作非常熟悉。再强调一遍,你不需要是个黑客,但 cdlsrmtree 等命令应该是你工具箱里的基本工具。

WARNING

Note for Windows users

wyag should run on any Unix-like system with a Python interpreter, but I have absolutely no idea how it will behave on Windows. The test suite absolutely requires a bash-compatible shell, which I assume the WSL can provide. Also, if you are using WSL, make sure your wyag file uses Unix-style line endings (See this StackOverflow solution if you use VS Code). Feedback from Windows users would be appreciated!

警告

对 Windows 用户的说明

wyag 应该能够在任何带有 Python 解释器的类 Unix 系统上运行,但我不确定它在 Windows 上的表现。测试套件绝对需要一个兼容 bash 的 shell,我相信 WSL 可以满足这一需求。此外,如果你使用 WSL,请确保你的 wyag 文件采用 Unix 风格的行结束符(请参见这个 StackOverflow 解决方案,适用于 VS Code)。欢迎 Windows 用户提供反馈!

Note

Acknowledgments

This article benefited from significant contributions from multiple people, and I’m grateful to them all. Special thanks to:

  • Github user tammoippen, who first drafted the tag_create function I had simply… forgotten to write (that was #9).
  • Github user hjlarry fixed multiple issues in #22.
  • GitHub user cutebbb implemented the first version of ls-files in #27, and by doing so finally brought wyag to the wonders of the staging area!

备注

致谢

本文得益于多位贡献者的重要帮助,我对此深表感谢。特别感谢:

  • GitHub 用户 tammoippen,他草拟了我一度遗忘的 tag_create 函数(这是 #9)。
  • GitHub 用户 hjlarry#22 中修复了多个问题。
  • GitHub 用户 cutebbb#27 中实现了 ls-files 的第一个版本,从而让 wyag 实现了暂存区!