Support for Monorepos

A monorepo is a version-controlled code repository that houses multiple projects. While these projects may be related, they are often logically independent and run by different teams.

Some companies host all their code in a single repository shared by everyone, and monorepos can reach colossal sizes. Google, for example, is reported to have one of the largest code repositories in existence: it receives tens of thousands of commits per day and is over 80 TB in size.

Teams building on monorepos who wanted to use DeepSource’s best-in-class tooling to maintain code quality often had to resort to a clunky workaround: onboarding the monorepo as a regular git-based repository. Engineers then had to sift through every issue reported for the entire repository to find the ones that affected the files they owned.

Early last year, we went down the rabbit hole of adding monorepo support to DeepSource (you can learn more in the launch blog post). The first question we faced was: how do you define a monorepo?

<aside> 🥳 Introducing support for monorepos ↗

</aside>

TL;DR: How to onboard a monorepo from the regular repository activation flow

Before we dive into the trenches, here’s a short overview of how DeepSource works:

<aside> 💡 DeepSource connects to your git-based code hosting provider: the usual trio of GitHub, GitLab, and Bitbucket, alongside Azure DevOps Services and Google’s Cloud Source Repositories. After connecting your provider (and/or installing the DeepSource app on it), you activate a repository by adding a .deepsource.toml file at its root. This file tells the DeepSource backend which static analyzers to run on the code. Once the DeepSource app is installed and your repository is activated, we automatically scan all new commits and pull requests and leave a summary of what your code quality looks like.

</aside>

Research

User assumption: We hypothesised two fundamental approaches to defining monorepos:

Use the existing .deepsource.toml file to define monorepos and sub-repositories.
Give users a separate GUI for maintaining a logical separation within DeepSource, without requiring any changes to their underlying code structure.

Much of this project had little prior literature or reference material we could look back on for inspiration. We actively talked to our customers to arrive at definitions for the components inside a monorepo and formed hypotheses about how we wanted this feature to work. The existing workaround customers used to analyze their monorepos with DeepSource was clunky. Our primary competitor in the space had a flow similar to ours: it defined entities inside a monorepo as projects and had users mark repositories as monorepos while onboarding them to the platform.

User research: I interviewed a number of customers who were looking for a solution to their monorepo woes, and we worked directly with them to fine-tune our hypotheses and arrive at a solution their teams loved using. We tested our flows with both new-age and legacy companies to make sure the new process would feel intuitive to both.

We discarded the first approach: using the .deepsource.toml file itself to define sub-repositories and their corresponding analyzers. While this seemed like an elegant solution at first, we realized that most of our customers preferred not to write the configuration file by hand, and instead used the configuration editor in the DeepSource dashboard to craft it.