This year’s Git Merge conference was held on February 1 in Brussels, with all the talks given in one long but very interesting day. As usual, all the ticket proceeds were donated to the Software Freedom Conservancy, which helps promote, improve, develop, and defend free and open-source software projects.
Here we summarize how the day unfolded.
The Future of Free Software by Deb Nicholson
Deb Nicholson, Director of Community Operations at Software Freedom Conservancy, opened Git Merge 2019 with a keynote that was more social than tech-focused.
In addition to her managerial position, Nicholson is a developer. She shared her first experience of using Git, during which she had to deal with unpleasant behavior from others, and reminded us that we were all newbies once.
The future of software, particularly in the open-source community, depends on what we are doing now, how we behave, and how we interact with others. Are we always open to new developers, even if they are inexperienced? To build the future of software we have to share our knowledge with newcomers and help them grow.
Pro tip from her: No matter how good you are at what you do, be kind to everyone.
Tales in Scalability: How Google Has Seen Users Break Git by Ivan Frade and Minh Thai
Software Engineers Ivan Frade and Minh Thai work in the Git server team at Google. Their goal is to serve Git data fast, everywhere.
Google is hosting hundreds of thousands of repositories, including Chromium and Android, whose repos are very active—in fact, their numbers are pretty much jaw-dropping.
| | Linux kernel | Go | Chromium | Internal Android |
|------------|--------------|------------|-------------|------------------|
| References | >5000 | >100k | >1 million | >2 million |
| Commits | ~1 million | ~400k | >5 million | ~5 million |
| Objects | >7 million | >1 million | >20 million | >12 million |
| Pack size | 1.5GiB | ~1GiB | 23GiB | ~8GiB |
Numbers extracted from one slide of the presentation
The problem with these big repos is that everything is really slow. Each time a developer wants to push code, Git returns the entire repo index. This can take up to 20 seconds, and while that may sound like nothing, waiting 20 seconds on every push makes the repo very hard to work with.
To solve this issue, the Google team modified the Git instances on the servers to return only part of the index each time someone pushes code. Their solution was to calculate a bitmap table (a graph) to find out what the client’s needs are.
This first hypothesis was based on the fact that a client doesn’t really need the whole repository to work. Obviously, this was made by a Google engineer for a Google repository, so it’s a specific use case. When it comes to another project and another team, it might not work.
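The idea of using bitmaps to work out what one side is missing can be sketched roughly as follows. This is a minimal illustration, not Google's implementation: the tiny commit graph, the bit numbering, and the function names are all made up for the example. Each commit gets a bit position, a commit's bitmap marks everything reachable from it, and the commits to send are those reachable from what is wanted but not from what is already there.

```python
# Hypothetical toy history: each commit maps to its parent commits.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
}

# Give every commit a bit position so a set of commits fits in one integer.
BIT = {name: 1 << i for i, name in enumerate(sorted(PARENTS))}


def reachable_bitmap(commit):
    """Return a bitmap with one bit set per commit reachable from `commit`."""
    bitmap = 0
    stack = [commit]
    while stack:
        current = stack.pop()
        if bitmap & BIT[current]:
            continue  # already visited
        bitmap |= BIT[current]
        stack.extend(PARENTS[current])
    return bitmap


def missing_commits(wants, haves):
    """Commits reachable from `wants` but not from `haves`."""
    want_bits = 0
    for commit in wants:
        want_bits |= reachable_bitmap(commit)
    have_bits = 0
    for commit in haves:
        have_bits |= reachable_bitmap(commit)
    needed = want_bits & ~have_bits
    return sorted(name for name, bit in BIT.items() if needed & bit)


print(missing_commits(wants=["C"], haves=["B"]))  # only C is missing
```

Because the set difference is a single bitwise operation, the cost of answering "what is missing?" no longer depends on walking the whole history each time, which is the appeal of precomputing the bitmaps.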
The What, How, and Why of Scaling Repositories by Johan Abildskov
Johan Abildskov is a consultant at the DevOps firm Praqma. He brings a more rational approach to the way we choose between mono- and many repositories, explaining that he adapts the repo choice to the project he’s working on and tries not to be biased by his personal preferences.
He stated that, in many situations, it is useful to use many repos. He dismissed the argument that many of his clients use (“Even Google and Microsoft use monorepo!”), pointing out that even Google and Microsoft can fail.
Using monorepo makes it hard to continuously deliver new features. For example, Microsoft has to rebuild Windows entirely, even when a developer updates an icon on the desktop manager options. Every update takes a lot of time and server resources, as it’s not possible for it to be easily handled by a regular continuous delivery (CD) server. Therefore, the low-cost solution is to use many repos.
Abildskov also warned of the extreme opposite of monorepo: Breaking down every single part of your system into a different repository is counterproductive. He showed a screenshot of someone waiting for 24 pull requests on 24 different repositories to fix just one Jira ticket. Some units of one system should stay together. Abildskov’s advice? Be smart about how you break down your app.
Bridging the Gap: Transitioning Git to SHA-256 by Brian M. Carlson
A Git Ecosystem Engineer at GitHub, Brian M. Carlson gave an interesting talk about how to transition Git to SHA-256. Like every transition, it needs to be done in several steps to make sure you don’t break anything (details on why Git needs to transition to SHA-256 can be found in this StackOverflow answer).
This transition is a breaking change, and it is complicated by the fact that Git is a distributed system and not everyone will upgrade their Git at the same time. To function, the client and the server need to be using the same version. The four-step transition that solves this includes a step where both algorithms are supported. The transition itself will not affect the way developers use Git, as SHA-1 and SHA-256 are able to interoperate.
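To make the difference between the two algorithms concrete, here is a small sketch that names the same blob content both ways. It assumes the object header stays `blob <size>\0` and only the hash function changes; the function name is ours, and the example illustrates the identifiers rather than the transition plan itself.

```python
import hashlib


def git_blob_id(content: bytes, algorithm: str = "sha1") -> str:
    """Hash `content` the way Git names a blob: a short header, then the data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.new(algorithm, header + content).hexdigest()


data = b"hello\n"
print(git_blob_id(data, "sha1"))    # 40 hexadecimal characters
print(git_blob_id(data, "sha256"))  # 64 hexadecimal characters
```

The same file therefore gets two unrelated names depending on the algorithm, which is why a repository needs a period in which both are understood.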
Git for Games: Current Problems and Solutions by John Austin
John Austin is a video-game developer. Most files used in this field are images for videos—that is, binary files, not text files—which often leads to conflicts for developers.
The problem is that, because they aren’t text files, Git doesn’t understand how to solve the conflicts: The changes on binary files don’t get spotted. For example, when you change the background color of an image, it changes the entire image, not just a line in the file. If two developers are working on the same image, Git can’t automatically merge the two versions of the image; instead, it asks the developers which version they’d like to keep. But by choosing one version, the work done on the other version is erased.
Austin discussed how he therefore had to find a workflow that involves as little conflict as possible with images when several developers are working on the same part of a project. His idea was to warn developers when their commits contain a binary file that has had a more recent modification in another Git branch. Thus, using a pre-commit hook, the Git Global Graph (an open-source project written in Rust that you can contribute to) checks for binary modifications and warns the developers before they commit.
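The kind of check such a hook performs can be sketched as below. This is not the actual tool's code: the extension list, function name, and sample data are hypothetical, and for simplicity the sketch flags any change on another branch rather than only more recent ones. In a real hook, the staged paths could come from `git diff --cached --name-only`.

```python
# Hypothetical list of extensions treated as binary assets.
BINARY_EXTENSIONS = (".png", ".psd", ".fbx", ".wav")


def binary_conflict_warnings(staged_files, changes_by_branch):
    """Return warnings for staged binary files also changed on other branches.

    staged_files: paths about to be committed.
    changes_by_branch: mapping of branch name -> set of paths modified there.
    """
    warnings = []
    for path in staged_files:
        if not path.lower().endswith(BINARY_EXTENSIONS):
            continue  # text files can usually be merged, so skip them
        for branch, changed in sorted(changes_by_branch.items()):
            if path in changed:
                warnings.append(f"{path} was also modified on branch '{branch}'")
    return warnings


staged = ["art/hero.png", "src/main.rs"]
others = {"feature/new-ui": {"art/hero.png"}, "bugfix": {"src/main.rs"}}
for line in binary_conflict_warnings(staged, others):
    print("warning:", line)
```

Only the image triggers a warning here; the source file is left to Git's normal merge machinery.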
The Art of Patience: Why You Should Bother Teaching Git to Designers by Belén Barros Pena
Interaction designer Belén Barros Pena shared her experience of what happened when she had to work directly with developers who were using Git, a system she didn’t know how to use. Unlike teams she had previously worked with, this one decided to show her how to use Git! It’s not an easy system to teach someone who isn’t a developer: Barros Pena needed to learn all the jargon surrounding it and a host of unfamiliar concepts, such as trees, branches, and commits. But, after two years, thanks to the patience of her developer coworkers, who simplified the explanations and avoided going into concepts that had nothing to do with her work, she did it.
As her experience shows, it’s very important for developers to teach designers how to use Git, if they are to be more involved in the conception and review process. Afterwards, Barros Pena was able to push code and reject some pull requests, meaning she was able to be active in the design and creation process.
She also had tips for aspiring mentors:
- Always do things with the mentees, never do things for them—it helps to create muscle memory.
- Be sure the mentees take notes.
- Teach using the command line interface.
- Be sure the mentees use the command line interface, too—it creates a good mental model based on how Git works and not how a graphical user interface (GUI) works.
Two quotes from her that we took away with us were: “The wall between developers and designers is not made of bricks, but of attitudes,” and “If we want more designers participating in FOSS (free and open-source software), we need to help them to do so.”
Version Control for Visual Learners by Veronica Hanus
Self-taught web developer Veronica Hanus is a former geology researcher who worked on the Mars Curiosity rover. She used her talk to explain how she had previously never been able to tell the visual changes between two commits because Git is not visual at all.
Commit descriptions help to some extent, but can you tell from a message what actually changed on a website when you altered the font in the logo image? Typing “change font on logo” is not enough: How would you know which font was used in each of the 10 commits that carry that same message? Adding the name of the font helps if it is one you use every day, but for the vast majority of fonts this doesn’t work.
Hanus presented the different solutions that she attempted. First, she took a screenshot of the project at each commit and put it in a folder that was committed with the rest of the code. This proved to be a torturous and unsatisfactory process.
She then tried to automate the work using Puppeteer or Selenium WebDriver. At every commit, a pre-commit Git hook prompted the tool to take a screenshot automatically and put it in the screenshot folder. This was better.
She then found that tools such as Zeplin, Abstract, and Sketch, which can be plugged into the version control system, could help, and this is what she uses now. These tools are mainly used by big companies, though, and are not very well known by freelance developers or those who work in small companies.
Git & Version Control in the Enterprise: A Panel Conversation with Atlassian, GitHub, and GitLab
This talk was a little bit different because of its format. Instead of one person addressing the audience, it was a panel discussion between three representatives from the three main forges that use Git, led by CB Bailey, a Software Developer at Bloomberg.
The representatives were:
- Erik van Zijst, Principal Engineer at Atlassian, who is primarily focused on Bitbucket development.
- James Ramsay, a Product Manager at GitLab
- Briana Swift, a Trainer at GitHub
The theme of the talk was how company teams are using Git. For example, do they mainly use monorepo or not? If you really want to know, Ramsay said that a fair amount of users were asking for monorepo, whereas van Zijst from Atlassian said there were very few.
When Bailey asked for clarification of the significant differences between open-source software and enterprise workflow, the entire panel agreed that these included access-control rules, permissions, and restrictions. Companies want to have a wider range of flexibility and capability when it comes to restricting a developer or a group of developers on a project. Open-source projects don’t have this problem because they use forking workflows, which give no rights to developers and allow them only to fork the project and make a pull request between the two projects.
Bailey’s last question was: “What’s next for Git?” As a trainer, Swift thinks the way Git is taught to newcomers could always be improved upon, because, “Git is accessible but it’s not easy.” Ramsay talked about the tools that can be used to build on top of Git. He thinks we should, “get better in our visual representation, so that new users can feel confident and powerful using Git.”
Technical Contributions Toward Scaling for Windows by John Briggs
Having previously spoken at companies such as Google, Facebook, and GitHub, Microsoft’s John Briggs was there to talk about the need to scale Git because of the huge repositories his company hosts.
Using the word “huge” to describe the Windows repository doesn’t quite cover it. There is only one branch for all the Windows engineers and more than 3 million commits are held on that branch. Since it was too large an issue to solve, Microsoft decided to contribute to Git so that it could support a large repository.
Briggs showed how Microsoft contributes to Git and outlined the Git features built by Microsoft engineers. The main one, the multi-pack-index feature, which has been available since Git 2.20, reduces the time it takes to find an object in the Git index.
His team also worked on the Git commit-graph feature. They first changed some Git algorithms to make Visual Studio Team Services (VSTS) more responsive, for example by showing the branch graph faster than the git log --graph command does. Once everything was working, Microsoft decided to share this feature with the whole Git community by submitting their code for review by the Git core team.
How a Git-Based Education Cultivates More Resilient Developers by Ben Greenberg
Ben Greenberg is a developer advocate and a former rabbi. He feels that learning software development using Git is better than just learning software development.
He talked about his own experience and about those he previously helped as a technical coach at the Flatiron School in New York. Learners who use Git are forced to write small pieces of code as they have to break down the problem they want to solve into small parts. This is only one example of how using Git helps newbies learn how to think.
He showed us his first commits from when he learnt to code so we could see how he improved in his commits for technical issues, and how he now thinks about these issues.
In all, it was a really great day. Listening and talking to these speakers was very inspiring.
Git Protocols: Still Tinkering After All These Years? by Brandon Williams
Software Engineer Brandon Williams works for the Git team at Facebook, having previously worked in the same position at Google.
The Git wire protocol is used to push and pull from a remote repository and, until last May, it had been years since there had been any updates to it. During his talk, Williams showed the audience a test of version 2, which he worked on and which is 4 times faster than the previous version.
By default, all Git clients are set on version 0, meaning you have to change to version 2 manually, which you can do on chosen repositories if you don’t want it as a global configuration.
To enable the new protocol, use
git config protocol.version 2
from inside a repository, or
git config --global protocol.version 2
to set it globally.
The improvements that version 2 brings include:
- Having a mechanism to switch protocols.
- Server-side filtering of references.
- Being more extensible, thus allowing future improvements, such as content delivery network (CDN) offloading and resumable clones, rebase on push, and remote grep/log for partial clones.
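Server-side filtering of references can be pictured with the short sketch below. In version 2 the client can tell the server which ref name prefixes it cares about, so the server does not have to advertise everything. The function name and the sample refs are made up for illustration, and this is a simplified model of the idea rather than the wire format.

```python
def advertise_refs(all_refs, prefixes=()):
    """Return only the refs whose names start with one of `prefixes`.

    With no prefixes, every ref is returned, as in the older protocol.
    """
    if not prefixes:
        return dict(all_refs)
    return {
        name: oid
        for name, oid in all_refs.items()
        if any(name.startswith(prefix) for prefix in prefixes)
    }


# Hypothetical server state: ref name -> abbreviated object ID.
SERVER_REFS = {
    "refs/heads/main": "1a2b3c",
    "refs/heads/feature": "4d5e6f",
    "refs/tags/v1.0": "7a8b9c",
    "refs/changes/01/1/1": "0d1e2f",
}

print(advertise_refs(SERVER_REFS, prefixes=["refs/heads/main"]))
```

On a repository with millions of references, sending one entry instead of all of them is where much of the speed-up comes from.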
Native Git Support for Large Objects by Terry Parker
As leader of the Git Core and Git Server teams at Google, developer Terry Parker knows how having a large binary file in a Git project can be a huge pain, even for Google. So, for his talk, Parker presented a new feature of Git called “partial clone,” which was designed to help when you’re working with a large repository in Git. When cloning the repo, you can filter which objects (tree, commit, blob) you want to retrieve up front, and Git will retrieve the missing ones automatically when they are needed.
Large objects aren’t necessarily large files, so this feature can be used as an alternative to Git LFS, or they can be used together. The main difference between the two is that, with LFS, even though you have all the objects, some blob objects are linked to a file on the LFS server, whereas with partial clone you don’t have the blob objects at all.
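As a rough illustration, a partial clone is requested with the `--filter` option of `git clone`. The sketch below only builds and prints the command; the helper names and the URL are placeholders, and actually running it assumes a Git version and a server that both support partial clone.

```python
import subprocess


def partial_clone_command(url, directory, filter_spec="blob:none"):
    """Build the `git clone` arguments for a partial clone."""
    return ["git", "clone", f"--filter={filter_spec}", url, directory]


def partial_clone(url, directory, filter_spec="blob:none"):
    """Run the clone; needs Git and a server that supports partial clone."""
    subprocess.run(partial_clone_command(url, directory, filter_spec), check=True)


# Placeholder URL, for illustration only.
print(" ".join(partial_clone_command("https://example.com/big-repo.git", "big-repo")))
```

With `blob:none`, file contents are left out of the initial clone; a size-based filter such as `blob:limit=1m` would instead skip only blobs above that size.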
It’s a great feature. The only bad aspect about it is that it could encourage monorepo… Just kidding.
Version Control for Law: Posey Rule in the US Congress by Ari Hershowitz
Former neuroscientist Ari Hershowitz is a US lawyer who learnt how to code. Working for Xcential, a firm seeking to drive modernization, efficiency, and transparency in public institutions, he creates document-comparison software.
In his post on the question “What are the nontechnical barriers to adopting a version control system for use in writing bills and new laws?”, Hershowitz addressed the question of why we should use Git or another version control system. One of the main problems is how laws are written: We keep adding text to laws but never erase or delete existing text. It’s not exactly “versioning” and Git doesn’t work that way.
Therefore, his team decided to create a new open standard XML specifically designed for law, called USLM (the United States Legislative Markup), and they are converting the entire US code of law using this markup standard. Using USLM, computers are able to understand changes in the law and will ultimately be able to show these changes at any point in time.
Git, The Annotated Notepad by Aniket Subhash Kadam
Aniket Subhash Kadam found he was having a problem with context reloading at the start of a working day or even when he came back to his desk from a coffee break. He could never remember what he had been doing before his break or what he had been about to commit. Thus, he came up with the idea of using Git as a “notepad,” so that he could always easily refer back to what he’d been doing before his break/the day before.
With Git’s ability to show you the diff of your current work and most recent commits, you can easily read only what you’ve just worked on. In addition, you can make a WIP commit using the commit description as an annotation on the work in progress. So, if the last commit is not enough to get your brain to reload context, you only need to navigate through your commit history, reading the diff line by line and the descriptions. Git thus becomes an annotated notepad.
Subhash Kadam also shared his best practices for increasing the usefulness of his method:
- Utilising atomic commits.
- Reading diffs line by line before committing (that is, doing a code review of his own code).
- Writing descriptive commit descriptions.
- Refactoring the best he can before committing.
As a quick aside, Subhash Kadam loves GUIs. He was the only one of the speakers to show us a GUI to explain his thoughts.
Gitbase, SQL Interface to Git Repositories by Javier Fontan
Javier Fontan is a Senior Software Developer at source[d], where he is leading an open-source project that facilitates the retrieval of information on a Git repository using a language similar to SQL, treating the repository as though it were a database.
This project is very useful for retrieving data easily without having to learn all about Git’s API. Fontan explained the problems he ran into at first. For example, if his software had to search the Git repository in real time, it would take a very long time, so he programmed it to scan the repo and store the data in a real database.
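The general idea of scanning a repository once and then answering questions with SQL can be sketched with Python's built-in `sqlite3` module as a stand-in. This is not Gitbase itself: the `commits` table, its columns, and the sample rows are hypothetical and chosen only to show the shape of such a query.

```python
import sqlite3

# Hypothetical commit data, as a scanner might extract it from a repository.
COMMITS = [
    ("a1", "alice", "Add parser"),
    ("b2", "bob", "Fix typo"),
    ("c3", "alice", "Refactor parser"),
]


def build_index(commits):
    """Load scanned commits into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE commits (hash TEXT, author TEXT, message TEXT)")
    db.executemany("INSERT INTO commits VALUES (?, ?, ?)", commits)
    return db


def commits_per_author(db):
    """Answer a repository question with plain SQL."""
    rows = db.execute(
        "SELECT author, COUNT(*) FROM commits GROUP BY author ORDER BY author"
    )
    return rows.fetchall()


print(commits_per_author(build_index(COMMITS)))
```

Once the data sits in tables, questions such as "who commits most?" become one-line queries instead of custom traversals of the repository.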
Here are some of the aspects we liked:
- It was a very inclusive conference.
- Several of the speakers had not started out as developers, and it was interesting to learn about their experiences.
- There was a lot of delicious fruit to snack on, rather than junk food.
We learnt about the Git ecosystem and the developer community that uses Git. This included:
- Why Git should be used by designers, too.
- The issues game developers face when they use Git.
- Information about all the domains where Git could be useful, including law.
This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!
Illustrations by WTTJ