OpenAlex is a map of the world's research ecosystem, linking components (like papers, institutions, journals, topics, SDGs, authors, etc.) to one another.
Research outputs are the main artery of the system. When a researcher publishes an article, book, dataset, etc., information about that output is registered with registry agencies like Crossref and DataCite, or with institutional and national repositories like HAL. We pull information on those outputs from these sources and others* and then try to match that information to known entities in PID (persistent identifier) providers. For instance, we match affiliation text to known institutions in ROR, authors to ORCIDs, and journal titles to ISSNs. This forms the foundation of the knowledge graph. We also link research outputs to other outputs (by extracting reference metadata) and run text classifiers on title and abstract text to understand what the research is about, linking it to known topics, subjects, and even SDGs (more on that).
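To make the idea of matching free-text affiliations to known institutions concrete, here is a minimal sketch using fuzzy string comparison from the Python standard library. It is an illustration only and does not describe how OpenAlex actually performs matching. The registry contents, the placeholder identifiers, the function names, and the 0.8 cutoff are all assumptions made for this example.

```python
import difflib
import re

# Hypothetical registry for illustration; these identifiers are placeholders,
# not real ROR IDs, and the institution names are made up.
KNOWN_INSTITUTIONS = {
    "example-id-1": "University of Example",
    "example-id-2": "Sample Institute of Technology",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_affiliation(raw, registry=KNOWN_INSTITUTIONS, cutoff=0.8):
    """Return the ID of the best-matching institution, or None.

    The raw affiliation is split on commas, and each segment is compared
    with each registry name. The highest similarity ratio wins, provided
    it reaches the (assumed) cutoff.
    """
    segments = [normalize(part) for part in raw.split(",")]
    segments = [seg for seg in segments if seg]
    best_id, best_score = None, 0.0
    for inst_id, name in registry.items():
        target = normalize(name)
        for seg in segments:
            score = difflib.SequenceMatcher(None, target, seg).ratio()
            if score > best_score:
                best_id, best_score = inst_id, score
    return best_id if best_score >= cutoff else None
```

With this toy registry, an affiliation string that contains "University of Example" as one comma-separated segment resolves to the first placeholder ID, while unrelated text returns None. Real-world matching has to cope with abbreviations, translations, and ambiguous names, which this sketch does not attempt.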
With information on nodes (papers, people, etc.) and the connections between them established and stored, OpenAlex becomes a map of the research ecosystem. As new works are published (or new records of old works are minted), they get added in and the database continues to evolve hourly. Users can then download the full dataset (info on that) or explore and find specific information using our free and open REST API or User Interface. Because it's a connected network map, users can search for papers on specific research topics, browse authors at a specific institution, investigate trends in collaboration among institutions on a topic over time, and much more.
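As a rough illustration of exploring the data through the REST API, the sketch below builds a query URL for the works endpoint and fetches the matching records. The helper names (build_works_url, fetch_works) are invented for this example. It assumes the public endpoint at https://api.openalex.org/works with its search, filter, and per-page query parameters, and a JSON response that carries the records under a "results" key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public OpenAlex REST API.
BASE_URL = "https://api.openalex.org"


def build_works_url(search=None, filters=None, per_page=5):
    """Build a /works query URL from a search string and a dict of filters."""
    params = {}
    if search:
        params["search"] = search
    if filters:
        # Filters are joined as comma-separated key:value pairs.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    params["per-page"] = per_page
    # Keep ":" and "," unescaped so the filter string stays readable.
    return f"{BASE_URL}/works?{urlencode(params, safe=':,')}"


def fetch_works(url):
    """Fetch a URL and return the list of work records (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["results"]
```

For example, build_works_url(search="open science", filters={"publication_year": 2023}) produces a URL that can be passed to fetch_works, or pasted into a browser, to retrieve up to five matching works.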
*Currently (as of January 2025), the core sources we pull this type of information from include Crossref, DataCite, PubMed, HAL, DOAJ, ORCID, MAG, arXiv, Dergipark, OSTI, RePEc, UNC Carolina, University of Michigan Deep Blue, Zenodo, and other institutional repositories (full list). We also parse 60M open access PDFs and some journal landing pages, pull information directly from some publishers, and take input from our users through community curation requests. We're working on a significant rewrite of the OpenAlex guts code that will enable us to add many new core sources starting in March 2025.