A structured, open-source taxonomy for classifying open source software projects.
The open source ecosystem contains millions of projects across dozens of software ecosystems, but there's no standardized way to classify and discover them beyond simple keyword searches. Traditional registries organize software by language or ecosystem, but lack semantic understanding of what projects do, who they're for, or where they fit in a technology stack.
This taxonomy solves that problem by providing a multi-dimensional classification system that captures the full context of open source software. Whether you're building a project discovery platform, analyzing ecosystem trends, or helping developers find the right tools, this taxonomy gives you a common vocabulary.
By establishing a shared classification framework, this taxonomy gives the whole open source community a consistent vocabulary: project maintainers documenting their work, developers searching for solutions, researchers analyzing ecosystem health, and funders identifying gaps in critical infrastructure. When everyone uses the same terms, the ecosystem becomes easier to discover, understand, and navigate.
This project defines a flexible and extensible classification system across multiple dimensions (called facets). Each facet describes a different way of grouping or tagging software: by domain, role, technology, audience, layer, or function.
Instead of forcing software into a single category, the taxonomy allows projects to be tagged with multiple terms per facet and across facets. For example, a web framework project might be classified as:
- Domain: web-development, api-development
- Role: framework, library
- Technology: python, docker
- Audience: developer, enterprise
- Layer: backend, full-stack
- Function: api-development, authentication, database-management
This multi-faceted approach creates a rich, queryable classification that enables powerful discovery, analysis, and recommendation features.
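As a rough illustration of what "queryable" can mean here, the sketch below stores each project's classification as a mapping from facet to a set of terms and filters on it. The `Classification` alias, the `matches` and `search` helpers, and both sample project names are hypothetical and not part of this repository.

```python
from typing import Dict, List, Set

# Hypothetical in-memory shape: facet name -> set of term names.
Classification = Dict[str, Set[str]]

projects: Dict[str, Classification] = {
    # Mirrors the web framework example above.
    "example-web-framework": {
        "domain": {"web-development", "api-development"},
        "role": {"framework", "library"},
        "technology": {"python", "docker"},
        "audience": {"developer", "enterprise"},
        "layer": {"backend", "full-stack"},
        "function": {"api-development", "authentication", "database-management"},
    },
    # Invented second project, included only to show filtering.
    "example-cli": {
        "role": {"cli-tool"},
        "technology": {"python"},
        "audience": {"developer"},
    },
}


def matches(classification: Classification, query: Classification) -> bool:
    """Return True if the project carries every queried term in every queried facet."""
    return all(
        terms <= classification.get(facet, set())
        for facet, terms in query.items()
    )


def search(query: Classification) -> List[str]:
    """Return the sorted names of all projects that match the query."""
    return sorted(name for name, c in projects.items() if matches(c, query))
```

With this data, `search({"technology": {"python"}})` returns both projects, while `search({"role": {"framework"}, "layer": {"backend"}})` returns only `example-web-framework`. Sets keep the subset check short; a real discovery service would more likely push the same filter into a database or search index.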
Each folder in this repository is a facet, containing YAML files that define individual terms.
- Domain: The industry or field
  blockchain, cloud-computing, content-management, data-science, database, desktop-development, devops, embedded-systems, game-development, machine-learning, ...
- Role: The role the software plays
  application, build-tool, cli-tool, compiler, database-system, editor, framework, library, orchestrator, package-manager, ...
- Technology: Technologies used or supported
  angular, ansible, aws, azure, bootstrap, c, cpp, csharp, css, dart, ...
- Audience: Who the software is for
  content-creator, data-scientist, designer, developer, educator, end-user, enterprise, hobbyist, researcher, student, ...
- Layer: Where it sits in the stack
  backend, data-layer, frontend, full-stack, hardware, infrastructure, middleware, network-layer, operating-system, platform
- Function: What the software does
  api-development, authentication, automation, caching, ci-cd, containerization, database-management, deployment, documentation, logging, ...
You can use this taxonomy to:
- Classify and tag open-source projects
- Power search and filtering interfaces
- Analyze ecosystem coverage and trends
- Find related and alternative projects
- Generate suggestions for best practices and tools
- Create visualizations of the open-source landscape
- Build recommendation engines
A combined JSON file is automatically generated (combined-taxonomy.json) for easy loading into apps. This file is always up-to-date and reflects the latest taxonomy changes. You can find it in the root of the repository.
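A minimal sketch of loading the combined file, using only the Python standard library. The internal layout of combined-taxonomy.json is not described in this README, so the hypothetical `summarize` helper only reports the top-level shape instead of assuming particular keys; `load_taxonomy` is likewise an illustrative name.

```python
import json
from pathlib import Path


def load_taxonomy(path="combined-taxonomy.json"):
    """Parse the combined taxonomy file and return the decoded JSON value."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data):
    """Describe the top-level shape without assuming a particular layout."""
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"items": len(data)}
    return {"value": type(data).__name__}
```

Running `summarize(load_taxonomy())` from the repository root is a quick way to inspect the file before writing code against it.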
The taxonomy structure is formally defined in schema.json, a JSON Schema file that can be used to validate the combined taxonomy data or integrate it into your tools and editors.
CodeMeta is a standard for software metadata that extends schema.org. If you're maintaining a codemeta.json file for your project, you can use this taxonomy to provide rich, structured classification.
Analysis of 1,631 open source projects with CodeMeta files reveals:
- No standard vocabulary for applicationCategory (229 unique values for 496 uses, 61% used only once)
- Fragmented keywords with no consistent structure or semantic meaning
- No multi-faceted classification to capture what software does, who it's for, and where it fits
This taxonomy fills that gap by providing a shared vocabulary with multi-dimensional classification.
Use colon-prefixed keywords to preserve the faceted structure of this taxonomy within CodeMeta's existing fields. This approach:
- ✓ Works with existing CodeMeta/schema.org standards (no changes required)
- ✓ Is machine-parsable with simple regex patterns
- ✓ Maintains backward compatibility (still searchable as plain text)
- ✓ Preserves semantic meaning across facets
{
  "@context": "https://w3id.org/codemeta/3.0",
  "@type": "SoftwareSourceCode",
  "name": "django",
  "description": "A high-level Python web framework",
  "applicationCategory": "framework",
  "keywords": [
    "domain:web-development",
    "domain:api-development",
    "role:framework",
    "role:library",
    "technology:python",
    "audience:developer",
    "audience:enterprise",
    "layer:backend",
    "layer:full-stack",
    "function:api-development",
    "function:authentication",
    "function:database-management"
  ]
}
- Use the facet name as the namespace prefix: domain:, role:, technology:, audience:, layer:, function:
- Use kebab-case term names: Match the exact term names from this taxonomy (e.g., web-development, not Web Development)
- Include multiple terms per facet: Software often fits multiple categories within each dimension
- Set applicationCategory to the primary role: Use the most specific role term (e.g., framework, library, cli-tool)
- Mix with regular keywords: You can include both namespaced and non-namespaced keywords
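The guidelines above can be sketched as a small parser for a CodeMeta keywords list. This is an illustration, not an official tool: `split_keywords`, `to_keywords`, and the exact regular expression are assumptions, including the assumption that every term name consists of lowercase letters, digits, and hyphens.

```python
import re
from collections import defaultdict

FACETS = ("domain", "role", "technology", "audience", "layer", "function")

# A facet prefix, a colon, then a kebab-case term (assumed character set).
NAMESPACED = re.compile(
    r"^(%s):([a-z0-9]+(?:-[a-z0-9]+)*)$" % "|".join(FACETS)
)


def split_keywords(keywords):
    """Separate facet-prefixed keywords from plain ones."""
    faceted = defaultdict(list)
    plain = []
    for keyword in keywords:
        match = NAMESPACED.match(keyword)
        if match:
            faceted[match.group(1)].append(match.group(2))
        else:
            # Anything that is not 'facet:term' stays a regular keyword.
            plain.append(keyword)
    return dict(faceted), plain


def to_keywords(classification):
    """Inverse direction: flatten {facet: [terms]} into 'facet:term' strings."""
    return [
        f"{facet}:{term}"
        for facet in FACETS
        for term in classification.get(facet, [])
    ]
```

For example, `split_keywords(["role:framework", "web"])` returns `({"role": ["framework"]}, ["web"])`. Keywords that fail the pattern are kept as plain keywords rather than rejected, which matches the guideline that namespaced and regular keywords can be mixed.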
We are working with the CodeMeta community to potentially formalize this approach or introduce native structured taxonomy support in future versions. Feedback and real-world usage will inform these discussions.
Each .yml file in a facet folder defines a single term. Here's the recommended structure:
name: web-development
description: Software for building websites, web apps, and APIs.
examples:
  - react
  - nextjs
  - rails
related:
  - frontend
  - backend
aliases:
  - webdev
  - webdevelopment
ecosystems:
  - npm
  - rubygems
tags:
  - html
  - css
  - javascript
- name (required): A unique identifier for the term in kebab-case (e.g., web-development, cli-tool). Must be unique within its facet.
- description (required): A clear, concise human-readable explanation of what the term represents. Should be 1-2 sentences.
- examples (optional): A list of well-known software projects, tools, or libraries that exemplify this term. Include 2-5 recognizable examples to help users understand the term's scope.
- related (optional): Other taxonomy terms (from any facet) that are conceptually connected or commonly used together. Helps build relationships across the taxonomy.
- aliases (optional): Alternative names, synonyms, or common variations of the term. Useful for search and discovery (e.g., js for javascript, k8s for kubernetes).
- ecosystems (optional): Package managers or software ecosystems where this term is commonly found. Must use valid ecosystem identifiers from packages.ecosyste.ms. See below for the complete list.
- tags (optional): Freeform keywords for enhanced searching, filtering, and categorization. Can include related concepts, use cases, or characteristics.
These ecosystem identifiers come from packages.ecosyste.ms and align with Package URL (purl) types:
npm, go, docker, nuget, pypi, maven, packagist, cargo, rubygems, cocoapods, pub, bower, cpan, alpine, actions, cran, clojars, conda, hex, hackage, julia, swiftpm, spack, homebrew, puppet, openvsx, deno, elm, racket, vcpkg, bioconductor, carthage, postmarketos, elpa, adelie
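The field rules and ecosystem list above could be checked with a small validator like the one below. It is a sketch under stated assumptions: `validate_term` is a hypothetical helper, not the repository's own tooling, and it expects a term already parsed into a Python dict (reading the .yml file itself would need a YAML library, which is left out to keep the example standard-library only).

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
LIST_FIELDS = ("examples", "related", "aliases", "ecosystems", "tags")

# Ecosystem identifiers copied from the list above.
VALID_ECOSYSTEMS = frozenset("""
    npm go docker nuget pypi maven packagist cargo rubygems cocoapods pub bower
    cpan alpine actions cran clojars conda hex hackage julia swiftpm spack
    homebrew puppet openvsx deno elm racket vcpkg bioconductor carthage
    postmarketos elpa adelie
""".split())


def validate_term(term):
    """Return a list of problems found in a parsed term (empty if none)."""
    problems = []

    name = term.get("name")
    if not isinstance(name, str) or not KEBAB_CASE.match(name):
        problems.append("name must be a kebab-case string")

    description = term.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description is required")

    # Optional fields, when present, are expected to be lists.
    for field in LIST_FIELDS:
        if not isinstance(term.get(field, []), list):
            problems.append(f"{field} must be a list")

    ecosystems = term.get("ecosystems", [])
    if isinstance(ecosystems, list):
        for ecosystem in ecosystems:
            if not isinstance(ecosystem, str) or ecosystem not in VALID_ECOSYSTEMS:
                problems.append(f"unknown ecosystem: {ecosystem}")

    return problems
```

Returning a list of problems instead of raising on the first one lets a contributor fix everything in a single pass. Uniqueness of name within a facet is not checked here, since that needs the whole facet folder rather than one term.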
We welcome contributions! To add or improve a term:
- Pick the appropriate facet folder.
- Add or edit a .yml file for the term.
- Include a description, examples, and optionally related terms.
- Run generate_combined_taxonomy.rb to update the combined file.
- Open a pull request.
CC0 1.0 Universal Public Domain Dedication. You can use this taxonomy freely without restrictions. See LICENSE for details.