Works in OpenAlex are tagged with Topics using an automated system that takes into account the available information about the work, including title, abstract, source (journal) name, and citations.
There are around 4,500 Topics. Topics are grouped into subfields, which are grouped into fields, which are grouped into top-level domains. This is shown in the diagram below, along with the counts for each.
Works are assigned topics by a model that scores every topic for a given work. The highest-scoring topic is that work's "primary topic". Each topic belongs to exactly one subfield, one field, and one domain, so any of these levels may also be used to classify the work, depending on the level of granularity you want. For example:
Example Topic: "Artificial Intelligence in Medicine"
- Domain: "Health Sciences"
- Field: "Medicine"
- Subfield: "Health Informatics"
- Topic: "Artificial Intelligence in Medicine"
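The selection of a primary topic and the roll-up to coarser levels can be sketched in a few lines of Python. This is only an illustration, not the OpenAlex implementation: the `TopicPath` class, the `TOPICS` table, the second placeholder topic, and the score values are all assumptions made up for the example. Only the "Artificial Intelligence in Medicine" hierarchy comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicPath:
    """One topic together with its single subfield, field, and domain."""
    topic: str
    subfield: str
    field: str
    domain: str


# Illustrative lookup table. The first entry is the example from the text;
# the second is a made-up placeholder so there is something to compare with.
TOPICS = {
    "Artificial Intelligence in Medicine": TopicPath(
        topic="Artificial Intelligence in Medicine",
        subfield="Health Informatics",
        field="Medicine",
        domain="Health Sciences",
    ),
    "Placeholder Topic B": TopicPath(
        topic="Placeholder Topic B",
        subfield="Placeholder Subfield",
        field="Placeholder Field",
        domain="Placeholder Domain",
    ),
}

LEVELS = ("topic", "subfield", "field", "domain")


def primary_topic(scores):
    """Return the name of the highest-scoring topic for one work."""
    if not scores:
        raise ValueError("no topic scores available for this work")
    return max(scores, key=scores.get)


def classify(scores, level="topic"):
    """Classify a work at the requested level of granularity."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    path = TOPICS[primary_topic(scores)]
    return getattr(path, level)


# Made-up scores for a single hypothetical work.
example_scores = {
    "Artificial Intelligence in Medicine": 0.91,
    "Placeholder Topic B": 0.42,
}

print(classify(example_scores))              # finest level: the topic itself
print(classify(example_scores, "subfield"))  # one level up
print(classify(example_scores, "domain"))    # coarsest level
```

Because each topic has exactly one parent at every level, classifying at a coarser level is just a lookup on the primary topic; no separate scoring is needed in this sketch.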
We developed the method for classifying our works in collaboration with CWTS at Leiden University, extending the methods they used in their Open Leiden rankings, which they explain in this article. Here is an outline of the overall method:
- Cluster the citation network for works that have incoming and outgoing citations. This provides meaningful clusters of works that strongly correspond to research communities focused on different topics.
- Use a Large Language Model (LLM) to get labels and descriptions for these clusters.
- Use this labeled data to train a deep-learning model that can assign topics using titles, abstracts, citations, and journal name.
  - This model can handle cases with missing data, so we can use it to classify most of our works, including new works that don't have any incoming citations.
- Assign each topic to subfields, fields, and domains, which are based on Scopus's ASJC categories.
For a detailed description of the methods, see our paper: "OpenAlex: End-to-End Process for Topic Classification". The code and model are available at https://github.com/ourresearch/openalex-topic-classification.
Topics coverage
Most works are assigned Topics, as well as domains, fields, subfields, and keywords, using the methods above. Some works, however, don't have enough associated data for Topics to be assigned. The following table, taken from the methods paper (linked above), shows how many works were classified with at least one Topic and how many works were excluded from Topic classification for various reasons:
Technical documentation
You can find more information about how OpenAlex Topics are included in the API and snapshot data in our technical documentation Topics pages.
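As a rough illustration of working with Topics in API or snapshot data, the sketch below pulls the topic hierarchy out of a single work record. It assumes the record is a dictionary with a `primary_topic` object that has a `display_name` and nested `subfield`, `field`, and `domain` objects; treat those key names as assumptions and confirm them against the technical documentation. The `topic_hierarchy` helper and the sample record are hypothetical and written by hand for this example.

```python
from typing import Any, Optional


def topic_hierarchy(work: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return topic, subfield, field, and domain names for one work record.

    Returns None when the work has no primary topic, which can happen for
    works that lacked enough data to be classified.
    Key names are assumed, not taken from the article.
    """
    primary = work.get("primary_topic")
    if not primary:
        return None
    return {
        "topic": primary["display_name"],
        "subfield": primary["subfield"]["display_name"],
        "field": primary["field"]["display_name"],
        "domain": primary["domain"]["display_name"],
    }


# Hand-written sample record (not real API output), using the example
# hierarchy from this article.
sample_work = {
    "primary_topic": {
        "display_name": "Artificial Intelligence in Medicine",
        "subfield": {"display_name": "Health Informatics"},
        "field": {"display_name": "Medicine"},
        "domain": {"display_name": "Health Sciences"},
    }
}

print(topic_hierarchy(sample_work))
print(topic_hierarchy({"primary_topic": None}))  # unclassified work
```

Returning `None` for unclassified works, rather than raising, keeps the helper convenient when looping over many records where some have no Topic.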