🐉 Taming the beast - How I built a centralized AI toolkit and skills library

One of the challenges with agentic coding is skill sprawl.

At Superbalist we have a lot of repositories, and the team had started using different AI coding agents to work in them.

Agentic sprawl is real

This gave rise to a few problems:

Scope. Skills are useful, but a skill that lives on one developer's machine or is committed to only one repository helps only that one place. I wanted a single place to write skills that might be used across multiple repositories and have every repository read from it. I also wanted improvements to reach every repository without opening a pull request against each one by hand.
Duplication. If every repository has its own copy of a skill, then the same skill lives in many places. This makes it hard to keep them in sync, and it makes it hard to know where to edit when you want to improve a skill.
Not tool agnostic. Different people use different tools, and each tool expects its resources in a different shape. If I write a skill for one tool, I have to rewrite it for every other tool.

So I started with the idea of a shared toolkit: a single source of truth for skills, commands, and agent instructions. Every repository would read from it, and every update would propagate to every repository. It would also serve as an org-wide bank of proven skills that people could pull from when they needed to solve a problem, and a place to share improvements when they found them.

This post describes how the toolkit ended up structured, and the two designs I built before the one I kept.

The first idea: a local server

My first design was a local stdio server that spoke the Model Context Protocol. It would run on each developer’s machine and serve skills to whatever agent asked for them. The appeal was that it was live. Change a skill, and every session would see the change immediately, with no sync step.

I wrote part of the design before I stopped. The question I could not answer was what the server did that the filesystem did not. Agents already read files. A skill is a markdown file. The server would add a background process, a protocol, and a per-session lifecycle, all to deliver text that a file on disk already delivers. The live updates were real, but they solved a problem I did not have, and they added cost I would pay continuously.

I like the framing that a skill is a runbook and an MCP server is a live socket. Instructions are runbook material, so I dropped the idea of serving them through a server. I decided to use the simplest mechanism that worked, and to revisit that only if a real need appeared.

The submodule and the sync script

The second design was a git submodule. Each repository carried the toolkit in a subdirectory. Two bash scripts did the wiring: one to set a repository up, and one to copy content into place. This is a standard use of submodules.

It worked for a couple of weeks, but several problems showed up:

Cloning a repository needed the --recursive flag, or the toolkit directory came up empty.
The .gitmodules file changed often and appeared in diffs that had nothing to do with the actual change.
Each repository carried a full copy of the toolkit, so the content was duplicated everywhere.
Updating a repository required a commit to bump the submodule. Most repositories fell behind, because that commit rarely happened.
CI containers handled submodule initialization inconsistently.

These looked like five separate problems. They had one cause. The toolkit was copied into each repository, and every problem was a symptom of that. Once I saw this, the fix was to stop copying it in.

One clone per machine

The design I kept inverts the arrangement. There is one clone of the toolkit per machine, in a fixed location in the home directory. Every repository on the machine shares that clone. The repositories do not carry the toolkit. They carry a small manifest file that lists which parts of the toolkit they use.

This removed all five problems at once. There is no submodule, so there is no --recursive flag to remember, no .gitmodules churn in unrelated diffs, and nothing for CI to initialize inconsistently. There is no duplication, because there is one copy per machine. There are no version pins and no bump commits, because the clone tracks the latest upstream content, and one pull moves every repository on the machine forward together.

I also replaced the bash scripts. They had grown special cases and assumptions about the shell, and I no longer trusted them. I rewrote the toolkit as a small Python package with a CLI called supai. The common commands are short:

supai init      # once per repository
supai sync      # after editing a skill
supai update    # to pull the latest toolkit content

The clone is installed in editable mode, so it is the source of truth. A git pull in the clone takes effect immediately, with no reinstall step. The copy I develop against and the copy the repositories use are the same copy. The install step only clones and links. The rest of the logic lives in the toolkit, so it can change and every machine picks up the change on the next pull.

The CLI declares its own dependencies and uses uv to resolve them on first run, then caches them. There is no separate install step and no virtualenv to manage.

Supporting more than one tool

The next problem was that people use different AI tools. One person uses Claude Code, another uses Gemini CLI, another wants to try OpenCode. Each tool expects its configuration in a different shape: different filenames, different frontmatter, different directory layouts.

The obvious approach is to keep a copy of each skill for each tool. That is the duplication problem again, one level higher. I had dealt with duplication once already, so I avoided it here.

Each skill, command, and agent is written once, in a tool-agnostic format with a fixed frontmatter schema. When you sync, the toolkit translates that single source into the shape each tool expects. A small adapter per tool does the translation. Authors never write tool-specific configuration.

I did not trust this design until I tested it. The first two tools used formats close to Claude's, which is an easy case. So I added an adapter for OpenCode, which uses a different configuration format, including a different way of expressing an agent's permissions. The skills did not change. Only a new adapter was added. After that I was confident the abstraction held. Adding a tool is now a matter of writing an adapter, not rewriting the content.
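Permissions are where the formats diverge most, so here is a hedged sketch of how one tool-agnostic permission declaration could be rendered two ways. The capability names, the capability-to-tool-name table, and both output shapes (an allow-list of tool names, and a per-action allow/deny map) are simplified assumptions for illustration, not copied from either tool's documentation.

```python
# Hypothetical tool-agnostic permission model: capability -> allowed?
AgentPermissions = dict[str, bool]

# Hypothetical table mapping each capability to one tool's tool names.
CAPABILITY_TOOLS = {
    "read": ["Read", "Grep", "Glob"],
    "edit": ["Edit", "Write"],
    "bash": ["Bash"],
}


def to_allow_list(perms: AgentPermissions) -> str:
    """Shape A: a comma-separated allow-list of the tool names an agent may use."""
    names: list[str] = []
    for capability, allowed in perms.items():
        if allowed:
            names.extend(CAPABILITY_TOOLS[capability])
    return ", ".join(names)


def to_action_map(perms: AgentPermissions) -> dict[str, str]:
    """Shape B: a per-action map whose values are 'allow' or 'deny'."""
    return {cap: ("allow" if ok else "deny") for cap, ok in perms.items()}
```

The agent source states intent once; only the adapters know how each tool spells it, which is why supporting a new format needs a new adapter and no change to the agents themselves.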

Tracked files and generated files

Setting up a repository produces two kinds of files. The line between them is the most important part of the design.

Some files are tracked. They are committed and shared with anyone who clones the repository. These define what the repository’s AI tooling is: the manifest that lists which packs the repository uses, and any skills written for that specific repository.

The rest are gitignored. These are the generated tool configurations, the entrypoint files, and the shared rules each agent reads. They are derived from the source and regenerated on every machine. Committing them would add noise that goes out of date as soon as the source changes.

With that line drawn, most questions about whether a file belongs in git answer themselves. A file belongs in git if a person wrote it on purpose. Everything else is generated.
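One way to keep that line enforced is to let the toolkit own a marked block in `.gitignore` that lists only generated paths, so hand-written lines, the manifest, and repository-specific skills stay tracked. This is a minimal sketch; the marker strings and the example paths are hypothetical.

```python
from pathlib import Path

# Hypothetical markers delimiting the block the toolkit is allowed to rewrite.
BEGIN = "# >>> supai generated >>>"
END = "# <<< supai generated <<<"


def update_gitignore(gitignore: Path, generated: list[str]) -> None:
    """Replace the toolkit-owned block, leaving every hand-written line alone."""
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if BEGIN in lines and END in lines:
        start, end = lines.index(BEGIN), lines.index(END)
        lines = lines[:start] + lines[end + 1:]  # drop the previous block
    block = [BEGIN, *sorted(generated), END]
    gitignore.write_text("\n".join(lines + block) + "\n")
```

Running it twice with the same inputs leaves the file unchanged, so it is safe to call on every sync.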

Making sure people run the commands

A setup like this depends on people running the commands, and people forget. I added two git hooks to handle that.

A pre-commit hook checks that the generated files match their source. If they have drifted, it blocks the commit and tells you to run supai sync. A pre-push hook checks whether your toolkit clone is behind the upstream repository. It only warns and never blocks the push, and it touches the network at most once an hour.
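The drift check can be sketched as "render in memory, compare with disk". In this minimal sketch, `expected` stands in for the output of running the adapters without writing files (a hypothetical step); a git hook blocks the commit when it exits non-zero, which is what `pre_commit` returns on drift.

```python
import sys
from pathlib import Path


def find_drift(repo_root: Path, expected: dict[Path, str]) -> list[Path]:
    """Return generated files that are missing or differ from a fresh render."""
    drifted = []
    for rel_path, content in expected.items():
        path = repo_root / rel_path
        if not path.exists() or path.read_text() != content:
            drifted.append(rel_path)
    return drifted


def pre_commit(repo_root: Path, expected: dict[Path, str]) -> int:
    """Hook exit code: 1 blocks the commit, 0 lets it through."""
    drifted = find_drift(repo_root, expected)
    for rel_path in drifted:
        print(f"out of date: {rel_path}", file=sys.stderr)
    if drifted:
        print("Generated files have drifted; run `supai sync`.", file=sys.stderr)
        return 1
    return 0
```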

The difference is deliberate. Drifted generated files are an error worth blocking on, because they will confuse the next person who opens the repository. Being a day behind on toolkit content is not an error, so it is only a warning. A toolkit problem should not stop anyone from shipping their own code.
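The pre-push side can be sketched as a warn-only check throttled by a timestamp file. It uses the standard `git fetch` and `git rev-list --count HEAD..@{upstream}` commands; the stamp-file location and the warning text are assumptions. Every path returns 0, so an offline machine or a broken clone never blocks a push.

```python
import subprocess
import sys
import time
from pathlib import Path

CHECK_INTERVAL = 3600  # seconds: touch the network at most once an hour


def due_for_check(stamp: Path, now: float, interval: float = CHECK_INTERVAL) -> bool:
    """True if the last recorded check is missing, unreadable, or too old."""
    try:
        last = float(stamp.read_text())
    except (FileNotFoundError, ValueError):
        return True
    return now - last >= interval


def commits_behind(clone: Path) -> int:
    """Fetch, then count upstream commits the local clone does not have yet."""
    git = ["git", "-C", str(clone)]
    subprocess.run([*git, "fetch", "--quiet"], check=True, capture_output=True)
    out = subprocess.run(
        [*git, "rev-list", "--count", "HEAD..@{upstream}"],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())


def pre_push(clone: Path, stamp: Path) -> int:
    """Warn-only hook: always returns 0 so toolkit state never blocks a push."""
    now = time.time()
    if not due_for_check(stamp, now):
        return 0
    stamp.write_text(str(now))  # record the attempt, even if it fails below
    try:
        behind = commits_behind(clone)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return 0  # offline, git missing, or no upstream: stay quiet
    if behind:
        print(f"warning: toolkit is {behind} commit(s) behind; run `supai update`.",
              file=sys.stderr)
    return 0
```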

What’s in the library

The plumbing only matters because of what flows through it. So the next addition to the toolkit was a library.

The library is a collection of skills, commands, and agent instructions that anyone can use. It is also the heart of the project, because it is where the value lives.

To keep the organization clear, the library is divided into packs. People can download the whole library or pick only the packs they want. The packs are organized around product domains and tech stacks, so you can pull in the ones relevant to your work without being overwhelmed by the rest.

Meta skills

Another useful feature of the library is that it contains skills that operate on skills. These are meta skills, and they are a powerful way to help people write meaningful skills, add them to the library, and keep the library up to date.

For example, there is a meta skill that takes a skill written in the tool-agnostic format and promotes it into the shared library. It polishes the skill, registers it in the catalog, and stages it on a branch in a development clone for me to review and turn into a pull request. Once that merges, every repository picks it up on the next update.

Another meta skill watches for repeated corrections. If I correct the same point twice, it offers to capture the correction as a skill. If I accept, it writes a skill into the current repository. This covers the original goal: a skill written once, in the repository where I needed it. The promotion skill takes it from there and makes it available to everyone if needed.

Taken together, the library's content is layered. A small set of assets is always on, a core pack is included in every repository, and the rest is organized into opt-in packs for specific product domains and tech stacks. Each repository's manifest file records which packs it uses.
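The layering can be sketched as a small resolution step: always-on assets first, then the core pack, then whatever the manifest opts into, validated against the catalog. The layer names (`base-rules`, `core`) and the catalog contents in the test are hypothetical; the post names the layers but not their identifiers.

```python
# Hypothetical identifiers for the two layers every repository gets.
ALWAYS_ON = ["base-rules"]
CORE_PACK = "core"


def resolve_layers(manifest_packs: list[str], catalog: set[str]) -> list[str]:
    """Order layers: always-on assets, then core, then the repo's opt-in packs."""
    unknown = [p for p in manifest_packs if p not in catalog]
    if unknown:
        raise ValueError(f"unknown packs in manifest: {unknown}")
    layers = [*ALWAYS_ON, CORE_PACK, *manifest_packs]
    return list(dict.fromkeys(layers))  # drop repeats, keep first occurrence
```

Failing loudly on an unknown pack name catches typos in the manifest at sync time, rather than leaving a repository silently without the pack it asked for.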

Where it landed

The design comes down to a few points. Write each piece once. Keep the source files and the generated files separate. Use a small tool to translate between them. Most of the useful properties followed from those points, and most of the dead ends came from ignoring them.
