Microsoft Open-Sources Scalar to Speed Up Work in Giant Git Repositories

Git is a distributed version control system, and by default each Git repository holds a complete copy of the entire history. Even a medium-sized development team generates thousands of commits, adding hundreds of megabytes of data to the repository each month. As the repository grows, Git has a harder time managing all that data, and commands become slower and slower.

As a result, developers waste time waiting for commands to finish, such as git status to list modified files or git fetch to download the latest changes. When the wait is too long, developers are likely to switch to another task and come back once the command has finished. This constant context switching reduces developer productivity.

Microsoft has plenty of experience with giant Git repositories. The code for the Windows operating system is managed in Git, and to overcome the problems above, Microsoft developed VFS for Git (formerly known as GVFS), a project that uses a virtual file system to work around many repository size limits, so that Windows developers can use Git even on a project that large.

While developing VFS for Git, Microsoft identified performance bottlenecks using custom tracing systems and user feedback. During this time, Microsoft also contributed code to the Git client, including the commit-graph feature and improvements to git push and sparse checkout. Building on these contributions and many other recent improvements to Git, Microsoft started a project to support giant Git repositories without a virtual file system. That is how Scalar was born.

Scalar is a .NET Core application written in C# and currently runs only on Windows and macOS. Scalar maximizes the performance of Git commands by setting recommended configuration values and running background maintenance. Whatever service developers use to host their repositories, Scalar can speed up Git commands. Microsoft says that developers who register their largest repositories with Scalar will immediately see how fast Git can be.
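
To make "setting recommended configuration values" more concrete, here is a minimal Python sketch that applies a handful of performance-related settings to a repository through git config. The keys are real Git configuration options, but the selection is only an illustration and is not the list that Scalar applies; the helper names build_config_commands and apply_settings and the example path are hypothetical.

```python
import subprocess
from typing import Dict, List

# Illustrative settings only. These are real Git config keys, but this is
# NOT the set of values Scalar writes; see the Scalar documentation for that.
ILLUSTRATIVE_SETTINGS: Dict[str, str] = {
    "core.preloadIndex": "true",       # refresh index entries in parallel
    "core.untrackedCache": "true",     # cache untracked-file scans for git status
    "fetch.writeCommitGraph": "true",  # update the commit-graph after each fetch
    "gc.auto": "0",                    # disable automatic garbage collection
}


def build_config_commands(repo_path: str, settings: Dict[str, str]) -> List[List[str]]:
    """Return one `git config` invocation per setting, scoped to repo_path."""
    return [
        ["git", "-C", repo_path, "config", key, value]
        for key, value in settings.items()
    ]


def apply_settings(repo_path: str, settings: Dict[str, str], dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them against the repository."""
    commands = build_config_commands(repo_path, settings)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if git reports an error
    return commands


if __name__ == "__main__":
    # Hypothetical repository path; a dry run only prints the commands.
    apply_settings("/path/to/big-repo", ILLUSTRATIVE_SETTINGS, dry_run=True)
```

The dry-run default makes it possible to review the commands before anything is written to the repository's configuration.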

As for Scalar's future, Microsoft wants to contribute its work back to Git. Microsoft plans to move the techniques Scalar uses to accelerate Git directly into the Git project, so that developers eventually get these performance improvements from the Git client alone, without Scalar. But that goal is still a long way off. Microsoft notes that sparse checkout is currently how Scalar copes with repository scale, and although Git recently updated the sparse checkout feature to make it easier to use, the feature is still some distance from being complete.
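
The idea behind sparse checkout is that only part of the repository is populated in the working directory. The Python sketch below is a simplified model of the "cone mode" rule used by the git sparse-checkout command: every file under a selected directory is kept, along with the files that sit directly in that directory's ancestors, including the repository root. The function name cone_includes and the sample paths are hypothetical, and the model leaves out details of Git's real pattern matching.

```python
from pathlib import PurePosixPath
from typing import Iterable

ROOT = PurePosixPath(".")


def cone_includes(path: str, cone_dirs: Iterable[str]) -> bool:
    """Simplified cone-mode check: would `path` appear in the working directory?"""
    parent = PurePosixPath(path).parent  # directory that directly contains the file
    if parent == ROOT:
        return True  # files at the repository root are always kept
    for d in cone_dirs:
        cone = PurePosixPath(d)
        # Rule 1: everything under a selected directory, recursively.
        if parent == cone or cone in parent.parents:
            return True
        # Rule 2: files directly inside an ancestor of a selected directory.
        if parent in cone.parents:
            return True
    return False


if __name__ == "__main__":
    files = [
        "README.md",
        "src/README",
        "src/app/main.py",
        "src/app/ui/window.py",
        "src/lib/util.py",
        "docs/guide.md",
    ]
    # Keep only what a developer working on src/app would need.
    for f in files:
        print(f, cone_includes(f, ["src/app"]))
```

In this model, selecting src/app keeps the root files, src/README, and everything under src/app, while src/lib and docs stay out of the working directory, which is what lets commands such as git status touch far fewer files.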

Because Scalar currently uses sparse checkout instead of a virtual file system, some Git commands hit bottlenecks. In particular, git checkout is not as fast as it is with VFS for Git, and Microsoft is working on a parallel version of git checkout to improve performance. Microsoft said that to truly scale Git services to thousands of engineers and build machines interacting with a central server, Git needs a concept similar to the GVFS cache servers. The company also said it plans to raise the idea on the Git mailing list soon.

In addition, the Git client currently keeps repositories performing well by regularly running garbage collection in the foreground, but Microsoft notes that this approach is not viable for giant repositories. Microsoft therefore plans to build some form of background maintenance into the Git client, along the lines of a git maintenance start command that is as easy to use as scalar register.

For detailed instructions, please see