A monorepo is a version-controlled code repository that houses multiple projects. While these projects may be related, they are often logically independent and run by different teams.
Some companies host all their code in a single repository shared among everyone, and monorepos can reach colossal sizes. Google, for example, is theorised to have the largest code repository ever, seeing tens of thousands of commits per day and weighing in at over 80 TB.
In the quest to use DeepSource’s best-in-class tooling to maintain code quality, teams building on monorepos often had to resort to a clunky workaround: onboarding their monorepos as regular git-based repositories, and then sifting through the noise of every issue reported for the repository to find the ones directly affecting the files they owned.
Early last year, we went down the rabbit hole of adding monorepo support to DeepSource (you can learn more in the launch blog post), and the first question we faced was: how do you define a monorepo?
<aside> 🥳 Introducing support for monorepos ↗
</aside>
TL;DR: How to onboard a monorepo from the regular activation flow for repositories
Before we dive into the trenches, here’s a short overview of how DeepSource works -
<aside>
💡 DeepSource connects to your git-based code hosting provider - the usual trio of GitHub, GitLab, and Bitbucket, alongside Azure DevOps Services and Google’s Cloud Source Repositories.
After connecting your provider (and/or installing the DeepSource app on your provider), you activate your repository by adding a .deepsource.toml file at the root of the repository, which contains instructions for the DeepSource backend on which static analyzers to run on the code. Once the DeepSource app is installed and your repository is activated, we automatically scan all new commits and pull requests and leave a nice summary of what your code quality looks like.
</aside>
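For context, here’s roughly what a minimal .deepsource.toml looks like for a repository running the Python and JavaScript analyzers. The exact analyzer names and options depend on your stack, so treat this as an illustrative sketch rather than a canonical config:

```toml
# .deepsource.toml, placed at the root of the repository
version = 1

# Glob patterns for test files and paths to exclude from analysis
test_patterns = ["tests/**"]
exclude_patterns = ["vendor/**", "**/migrations/**"]

# One [[analyzers]] block per language/tool you want DeepSource to run
[[analyzers]]
name = "python"
enabled = true

  [analyzers.meta]
  runtime_version = "3.x.x"

[[analyzers]]
name = "javascript"
enabled = true
```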
A big chunk of this project came without much prior literature or reference material to draw on. We actively talked to our customers to arrive at the definitions of components inside a monorepo and formed hypotheses on how we wanted this feature to function. The existing workaround that customers used to analyze their monorepos with DeepSource was clunky. Our primary competitor in the space, SonarQube, had a flow similar to ours: it defines entities inside a monorepo as projects and also marks repositories as monorepos while onboarding them to the platform.
We got on calls with quite a few customers who were looking for a solution to their monorepo woes, and we worked directly with them to fine-tune our hypotheses and arrive at an elegant solution their teams loved using. We tested our flows with both new-age and legacy companies to ensure that the new process wouldn’t be unintuitive to either. We discarded a few approaches to defining a monorepo and its sub-repositories, the most notable being using the .deepsource.toml file itself to define sub-repositories and their corresponding analyzers (a sketch of that idea follows below). While this seemed like an elegant solution at first, we realized that most of our customers prefer not to write the configuration file by hand, and instead use the configuration editor that comes with the DeepSource dashboard to craft it. A mistake in the configuration file could also throw the whole monorepo analysis into disarray.
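For illustration, here is a rough sketch of what that discarded approach might have looked like. None of this syntax ever shipped; the sub_repositories table and all its fields are invented purely to show the idea of describing each sub-repository and its analyzers inside the root .deepsource.toml:

```toml
# HYPOTHETICAL sketch of the discarded approach; this syntax never shipped.
version = 1

# Each sub-repository would have declared its own path and analyzers
[[sub_repositories]]
name = "payments-service"
path = "services/payments"

  [[sub_repositories.analyzers]]
  name = "python"
  enabled = true

[[sub_repositories]]
name = "dashboard"
path = "apps/dashboard"

  [[sub_repositories.analyzers]]
  name = "javascript"
  enabled = true
```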
In the absence of a visual editor to define the sub-repositories, it made more sense to build a GUI for enabling monorepos and sub-repositories than to make the configuration editor do the heavy lifting. Our customers found the GUI-driven approach more intuitive and less prone to mistakes, and that’s what we ended up shipping.
This post started with the formal definition of a monorepo, but logical insulation is subjective for different teams, and we had to sift through a myriad of use cases to arrive at the most plausible flow. But before we got there, we had to answer two key questions -