Introduction
Patent literature is a significant source of information in chemistry. Asche (1) showed that patents constitute a substantial repository of information across the vast majority of chemistry-related disciplines. The proportion of chemical compounds for which a patent is the primary source of publication has been increasing over time. Patents are also the most common source of information on the use (or application) of chemical compounds. When information is first disclosed in a patent, the patent frequently remains the sole source of knowledge on that subject. Asche further found that 56.6% of chemical compounds, counted as Registry Numbers (RN), appear exclusively in patents and are absent from other sources, categorized as non-patent literature (NPL).
Therefore, it is essential to conduct a patent search before starting a new research project, to ascertain whether a particular compound has already been discovered and whether its use would infringe the intellectual property rights of third parties. Furthermore, if the compound results in a patentable invention, a prior art search can speed up the patent application process, facilitate the formulation of more precise claims, and reduce the number of Office Actions between an applicant (or its representative) and a patent examiner, thereby lowering the overall cost of obtaining a patent.
This brief report focuses on retrieving patent information on a specific compound, serinol pyrrole.
Patent databases
In chemical patents, the chemical structures disclosed in the description or claimed are often the most salient information (2). Patent documents present chemical structures as a Markush structure, as a specific formula, or as a list of specific compounds identified by chemical nomenclature.
Patent information is disseminated by national (or regional) patent offices or by independent producers through databases accessible on the Internet.
These databases can be categorized as either bibliographic or full text, with some offering free access and others requiring a subscription (3).
A non-exhaustive list of patent databases is available on the WIPO website (4). The list can be searched according to various parameters, including chemical structure. The result of this search is shown in Figure 1.
Only two patent databases are retrieved (Patentscope and PSS). However, the list is not exhaustive: SureChEMBL, one of the most widely used databases, is not present, and PubChem is also missing.
Furthermore, the link to the Patent Search and Analysis System (PSS), a free tool developed by the China National Intellectual Property Administration (CNIPA), is currently nonfunctional.
Espacenet and Google Patents, widely regarded as the most accessible tools for conducting prior art searches, are not included in this list either. The exclusion is primarily attributable to the absence of a dedicated field for searching chemical formulae on these platforms.
Their text fields accept only the trade name or the IUPAC name of a chemical compound.
Alternatively, a search can be conducted using classification symbols, in particular C-Sets (combination sets), which have been created in a limited number of technical fields (5).
After determining which of the databases is the most suitable for chemical compound searches, the subsequent step is to plan how to use it.
In a recent article published in World Patent Information, Joerg Ohms (6) argued that all tested databases must be combined to obtain an almost complete search result.
The search was divided into three phases. In the first phase, SciWalker, SureChEMBL, and Patentscope were used. The first two databases yielded distinctive results. In the second phase, the FullPat algorithm was employed, identifying 104 new families that had not been found in the first phase (note that 10 patent families were not found in FullPat). In the final phase, CAPLUS was used, yielding nine unique patent families that were not found in any other database.
Because every patent database is inherently incomplete, it is advisable to search as many of the available databases as possible.
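The phase-by-phase comparison described above amounts to set arithmetic on patent families. The following Python sketch shows one way to compute what each database contributes uniquely and what the combined search covers. The family identifiers are placeholders invented for illustration, and the function name is hypothetical; none of this reproduces the data or the procedure of the cited study.

```python
from typing import Dict, Set


def unique_contributions(results: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each database, return the families found by no other database."""
    unique = {}
    for name, families in results.items():
        # Union of everything retrieved by the *other* databases.
        others = set().union(*(f for n, f in results.items() if n != name))
        unique[name] = families - others
    return unique


# Placeholder family identifiers (illustrative only, not real search results).
results = {
    "SciWalker": {"FAM-A", "FAM-B", "FAM-C"},
    "SureChEMBL": {"FAM-B", "FAM-D"},
    "Patentscope": {"FAM-B", "FAM-C"},
}

unique = unique_contributions(results)
combined = set().union(*results.values())

for name in results:
    print(name, "unique families:", sorted(unique[name]))
print("combined coverage:", len(combined), "families")
```

A database whose unique set is empty adds nothing for this particular compound, but that does not generalize to other compounds, which is why the combined approach is recommended.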
Example of finding patent information on a specific chemical compound: 2-(2,5-dimethyl-1H-pyrrol-1-yl)-1,3-propanediol (serinol pyrrole)
To illustrate a patent search for chemical compounds, serinol pyrrole is used as the target compound.
Serinol pyrrole [2-(2,5-dimethyl-1H-pyrrol-1-yl)-1,3-propanediol] is an aromatic heterocyclic compound with a five-membered ring in which the heteroatom is nitrogen, prepared from a glycerol derivative, serinol, through the Paal-Knorr reaction with 2,5-hexanedione (7). This reaction is carried out in the absence of solvents and catalysts.
1. Search on PubChem
A search of PubChem using the keyword “serinol pyrrole” returned six literature results and one patent. A subsequent PubChem search using the IUPAC name, 2-(2,5-dimethyl-1H-pyrrol-1-yl)-1,3-propanediol, returned seven patents and three scientific papers.
2. Search on Espacenet
As noted above, a search on Espacenet can be conducted using either keywords or classification symbols.
The results obtained with Espacenet are summarized in Table 1.
As Table 1 shows, the different names assigned to the same compound yield different results, which makes keyword searching ineffective for comprehensive patent information retrieval.
Moreover, the best classification symbols retrieved (C07D 207/325 and C08K 5/3415; see Figures 2 and 3) are broadly defined and encompass variously substituted heterocycles.
Therefore, it can be concluded that Espacenet is not the most suitable database for this type of searching.
3. Search on Patentscope
The identifiers listed at https://pubchem.ncbi.nlm.nih.gov/compound/12339287 can be used in Patentscope and SureChEMBL for a comprehensive search of patent documents (see Table 2).
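The identifiers shown on the PubChem compound page can also be requested through PubChem's PUG REST interface. The sketch below builds the request URL for CID 12339287 and parses a response of the usual PUG REST shape. No network call is made: the sample payload is hand-written for illustration, and pairing this CID with the InChIKey used in the Patentscope query is an assumption inferred from the text. The helper names are hypothetical.

```python
import json
from typing import List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cid: int, properties: List[str]) -> str:
    """Build a PUG REST URL requesting the given properties for one CID."""
    return f"{PUG_REST}/compound/cid/{cid}/property/{','.join(properties)}/JSON"


def parse_properties(payload: str) -> dict:
    """Return the first property record of a PUG REST JSON response."""
    return json.loads(payload)["PropertyTable"]["Properties"][0]


url = property_url(12339287, ["InChIKey", "IUPACName"])
print(url)

# Hand-written sample in the shape of a PUG REST response (not fetched live).
sample = (
    '{"PropertyTable": {"Properties": '
    '[{"CID": 12339287, "InChIKey": "YWGOFJMQFYROKZ-UHFFFAOYSA-N"}]}}'
)
record = parse_properties(sample)
print(record["InChIKey"])
```

In practice the URL would be fetched with an HTTP client and the response body passed to `parse_properties`.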
A patent search in Patentscope yielded 28 results when utilizing the following query: CHEM:(YWGOFJMQFYROKZ-UHFFFAOYSA-N). The selection of the “Single Family Member” option yielded a total of 13 results.
The same results were obtained using the “Structure editor” (see Figure 4).
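Since the CHEM field query depends on an exact InChIKey, a mistyped key returns nothing without any warning. The following minimal sketch checks that a string has the layout of a standard InChIKey (blocks of 14, 10, and 1 uppercase letters separated by hyphens) before wrapping it in the query syntax shown above. The function name is hypothetical, and the check validates only the format, not whether the key corresponds to a real compound.

```python
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def chem_query(inchikey: str) -> str:
    """Return a Patentscope CHEM field query for a well-formed InChIKey."""
    key = inchikey.strip().upper()
    if not INCHIKEY_RE.fullmatch(key):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f"CHEM:({key})"


print(chem_query("YWGOFJMQFYROKZ-UHFFFAOYSA-N"))
```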
4. Search on SureChEMBL
SureChEMBL is a database that integrates patent data from five authorities (USPTO, WIPO, EPO, JPO and CNIPA). Two search methods are available in SureChEMBL. The first one involves the use of keywords or classification symbols (or a combination of them) and is supported by a query assistant. The second method involves the use of a structure search.
Drawing the serinol pyrrole structure (cf. Figure 5) and clicking the “Search” button generates a list of 41 patents. The results are not aggregated into patent families.
To access the complete list of patents, navigate to the section titled “Patents for Compounds” (see Figure 6).
Compound details and results are available at https://surechembl.org/chemical/17335758
A comparison of Patentscope and SureChEMBL reveals that they employ different compound extraction methodologies (8). Consequently, results obtained from these databases may exhibit variations.
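Because SureChEMBL lists individual publications while Patentscope can reduce its results to a single member per family, the raw counts from the two databases are comparable only after the publications have been grouped by family. The sketch below illustrates that grouping. The publication numbers and the publication-to-family mapping are placeholders invented for illustration; the source of such a mapping is not specified in this report and is assumed to be available.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_family(
    publications: Iterable[str], family_of: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group publication numbers by family ID.

    A publication missing from the mapping is treated as its own family.
    """
    families = defaultdict(list)
    for pub in publications:
        families[family_of.get(pub, pub)].append(pub)
    return dict(families)


# Placeholder data (illustrative only, not real search results).
publications = ["XX-0001-A1", "YY-0001-A1", "XX-0002-A1", "ZZ-0003-B1"]
family_of = {
    "XX-0001-A1": "FAM-1",
    "YY-0001-A1": "FAM-1",
    "XX-0002-A1": "FAM-2",
}

families = group_by_family(publications, family_of)
print(len(publications), "publications in", len(families), "families")
```

Once both result lists are expressed as sets of family IDs, the differences that remain reflect the extraction methodologies rather than the counting unit.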
Conclusions
Keyword searching proved to be an ineffective method for identifying prior art related to chemical compounds.
A thorough search can be conducted only with databases that allow searching by structure or by chemical identifiers.
References and Notes
1. Asche G. “80% of technical information found only in patents” - Is there proof of this? World Pat Inf. 2017; 48: 16-28.
2. Downs G M, Barnard J M. Chemical patent information systems. WIREs Comp Mol Sci. 2011; 1(5): 727-741.
3. Barbieri M. Hydrogen Peroxide Industrial Production: A Patent Landscape Study. Eng Proc. 2024; 67: 88.
4. WIPO INSPIRE – Database Reports. Available at: https://inspire.wipo.int/wipo-inspire (Accessed on 1st October 2025)
5. Masson PK. Searching with combination sets in CPC: An efficient way to retrieve relevant documents. World Pat Inf. 2018; 54: S93-S98.
6. Ohms J. Current methodologies for chemical compound searching in patents: A case study. World Pat Inf. 2021; 66: 102055.
7. Barbera V. et al. Domino Reaction for the Sustainable Functionalization of Few-Layer Graphene. Nanomaterials. 2019; 9(11): 1-23.
8. Ohms J. Validity of PubChem compounds supplied by Patentscope or SureChEMBL. World Pat Inf. 2022; 70: 102134.