OpenAlex organizes scholarly works into roughly 4,500 research topics, arranged in a four-level hierarchy: 4 domains, 26 fields, 254 subfields, and the topics themselves. Every work with enough metadata is automatically assigned to one or more topics.
This system was developed in collaboration with CWTS at Leiden University, extending their Open Leiden Rankings approach.
The four-step methodology
Here's how topic assignment works:
1. Cluster the citation network
We start with works that have incoming and outgoing citations, and cluster them based on citation relationships. Works that cite each other frequently end up in the same cluster. These clusters naturally correspond to research communities — groups of scholars working on related problems.
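To make the clustering idea concrete, here is a minimal sketch. It is not the production algorithm, which this page does not name. It swaps in a simple stand-in: keep only citation links whose two works share at least one other neighbor, then treat each connected component as a cluster. The function name `cluster_citations`, the `min_shared` parameter, and the work IDs are all hypothetical.

```python
from collections import defaultdict


def cluster_citations(citations, min_shared=1):
    """Group works into clusters from (citing, cited) pairs (toy stand-in)."""
    # Treat each citation as an undirected link between two works.
    neighbors = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)

    # Keep only "strong" links: the two works share at least
    # min_shared other neighbors in the citation network.
    strong = defaultdict(set)
    for a in neighbors:
        for b in neighbors[a]:
            shared = (neighbors[a] & neighbors[b]) - {a, b}
            if len(shared) >= min_shared:
                strong[a].add(b)
                strong[b].add(a)

    # Each connected component over the strong links is one cluster.
    clusters, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(strong[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Two tightly linked groups joined by a single bridging citation (C cites D).
example = [("A", "B"), ("B", "C"), ("C", "A"),
           ("D", "E"), ("E", "F"), ("F", "D"),
           ("C", "D")]
print(cluster_citations(example))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

The bridging citation is dropped because the two works it connects share no other neighbors, so the two groups stay separate. A work with no strong links ends up in a cluster of its own.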
2. Label the clusters with an LLM
Once we have meaningful clusters, we use a large language model to generate topic names and descriptions for each one. This gives us human-readable labels that capture what each research community is about.
3. Train a deep-learning classifier
We then train a deep-learning model to assign topics to any work based on its title, abstract, citations, and journal name. This is the model that does the heavy lifting in production. Importantly, it handles missing data gracefully — it can classify new works that don't have incoming citations yet, using just the title, abstract, and source.
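One way to picture "handling missing data" is at the input-assembly stage. The sketch below is an assumption about how such inputs could be prepared, not a description of the production model: it fills absent fields with empty values and records which inputs are actually present. The function name `build_classifier_input` and the record keys (`title`, `abstract`, `journal_name`, `citations`) are hypothetical.

```python
def build_classifier_input(work):
    """Assemble model inputs from a work record, tolerating missing fields.

    The key names used here are hypothetical, not the production schema.
    """
    title = (work.get("title") or "").strip()
    abstract = (work.get("abstract") or "").strip()
    journal = (work.get("journal_name") or "").strip()
    citations = list(work.get("citations") or [])

    return {
        "title": title,
        "abstract": abstract,
        "journal_name": journal,
        "citations": citations,
        # Presence flags tell a downstream model which inputs it can rely on.
        "present": {
            "title": bool(title),
            "abstract": bool(abstract),
            "journal_name": bool(journal),
            "citations": bool(citations),
        },
    }


# A brand-new work: text and source are known, citations are not.
new_work = {"title": "A new paper", "abstract": "Some text.", "journal_name": "Some Journal"}
features = build_classifier_input(new_work)
print(features["present"])
```

With this shape, a newly published work still yields a complete input record; only the `citations` flag is false.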
4. Map topics to the hierarchy
Finally, each topic gets mapped to a subfield, field, and domain based on Scopus's ASJC categories. This gives every topic a place in the four-level hierarchy.
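The mapping can be thought of as three chained lookups, one per level. The sketch below is illustrative only: the table names and the function `hierarchy_for` are hypothetical, and the single entry reuses the "Artificial Intelligence in Medicine" example from this page rather than real mapping data.

```python
# Hypothetical lookup tables, one per step up the hierarchy.
TOPIC_TO_SUBFIELD = {"Artificial Intelligence in Medicine": "Health Informatics"}
SUBFIELD_TO_FIELD = {"Health Informatics": "Medicine"}
FIELD_TO_DOMAIN = {"Medicine": "Health Sciences"}


def hierarchy_for(topic):
    """Resolve a topic name to its subfield, field, and domain."""
    subfield = TOPIC_TO_SUBFIELD[topic]
    field = SUBFIELD_TO_FIELD[subfield]
    domain = FIELD_TO_DOMAIN[field]
    return {"domain": domain, "field": field, "subfield": subfield, "topic": topic}


print(hierarchy_for("Artificial Intelligence in Medicine"))
```

Because each topic maps to exactly one subfield, and so on upward, knowing the topic is enough to recover all four levels.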
How topics appear on works
The model scores each candidate topic for a given work. The highest-scoring topic becomes the work's primary_topic. Additional high-scoring topics appear in the topics array. Each topic carries its place in the hierarchy, for example:
- Domain: "Health Sciences"
- Field: "Medicine"
- Subfield: "Health Informatics"
- Topic: "Artificial Intelligence in Medicine"
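The selection rule described above (highest score becomes primary_topic, other high scorers go in topics) can be sketched as follows. Several details are assumptions made for illustration: the function name `assign_topics`, the `max_topics` cap of 3, the choice to include the primary topic as the first entry of the topics list, and the scores and second topic name in the example.

```python
def assign_topics(scores, max_topics=3):
    """Pick a primary topic and a ranked topics list from {topic_name: score}."""
    # Rank candidates from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    topics = [
        {"display_name": name, "score": score}
        for name, score in ranked[:max_topics]
    ]
    # The top-ranked topic is the primary one; None if there were no candidates.
    primary_topic = topics[0] if topics else None
    return {"primary_topic": primary_topic, "topics": topics}


# Illustrative scores only; these are not real model outputs.
candidate_scores = {
    "Hypothetical Topic B": 0.61,
    "Artificial Intelligence in Medicine": 0.97,
    "Hypothetical Topic C": 0.12,
}
result = assign_topics(candidate_scores, max_topics=2)
print(result["primary_topic"]["display_name"])  # Artificial Intelligence in Medicine
```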
Some works don't have enough data to classify — if there's no title, no abstract, and no citations, the model doesn't have enough to work with, and those works won't have topics assigned.
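Code that consumes work records should therefore expect the topic to be missing. The sketch below assumes a work record whose primary_topic is an object with a `display_name` and nested `domain`, `field`, and `subfield` objects, and that it is absent or null for unclassified works. Treat that shape, and the helper name `describe_primary_topic`, as assumptions to check against the API docs.

```python
def describe_primary_topic(work):
    """Return 'Domain > Field > Subfield > Topic', or None if no topic is assigned."""
    topic = work.get("primary_topic")
    if not topic:
        return None  # unclassified work: nothing to describe
    levels = [
        (topic.get(level) or {}).get("display_name")
        for level in ("domain", "field", "subfield")
    ]
    levels.append(topic.get("display_name"))
    # Use "?" as a placeholder for any level missing from the record.
    return " > ".join(name or "?" for name in levels)


classified = {
    "primary_topic": {
        "display_name": "Artificial Intelligence in Medicine",
        "subfield": {"display_name": "Health Informatics"},
        "field": {"display_name": "Medicine"},
        "domain": {"display_name": "Health Sciences"},
    }
}
print(describe_primary_topic(classified))
print(describe_primary_topic({"primary_topic": None}))  # None
```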
Topics coverage
Most works are assigned topics — as well as domains, fields, subfields, and keywords — using the methods above. Some works, however, don't have enough associated data to be classified. The following table from the methods paper shows how many works were classified and how many were excluded:
Learn more
- Methods paper: OpenAlex: End-to-End Process for Topic Classification
- Code and model: openalex-topic-classification on GitHub
- API docs: Topics, Domains, Fields, Subfields