Encoding Patent Bibliographic References

Encoding bibliographic data of patent documents

= Introduction: =

This Recommendation provides guidance for encoding the bibliographic data of patent documents in a structured manner according to the TEI recommendations.

Patent documents are one of the largest public data sources in the world. Over 70 million patent documents have been published to date [1]. They are also a unique source of information: according to estimates by the World Intellectual Property Organization (WIPO), about two-thirds of the technical information revealed in patents is never published elsewhere [1].

Patent documents are also of increasing technical and strategic importance: approximately 25% of all scientific or technical publications produced each year originate in patent offices around the world, and most of them can be searched in databases like any other kind of literature. Over the last 10 years, the number of patent filings has consistently exceeded the number of published scientific and technical journal articles [1]. This clearly indicates the importance of the patent literature as a source of documentation.



The importance of the patent literature has also increased tremendously because most full-text patent documents are now freely available to the general public through web-based applications such as esp@cenet, Google Patents, FreePatentsOnline or that of the U.S. Patent & Trademark Office, which allow users to search and retrieve patent documents easily.

Therefore, there is an urgent need to define an encoding mechanism within the framework of the TEI standards that can be applied to this important source of documents.

= Definitions: =

According to the terminology used in the patent literature [2][3][4], for the purposes of this Recommendation, the term “patent” includes such industrial property rights as patents for inventions, plant patents, design patents, inventors' certificates, utility certificates, utility models, patents of addition, inventors' certificates of addition and utility certificates of addition.

For the purposes of this Recommendation, the expression “patent application” or “application for a patent” is a request made to the patent office for the purpose of obtaining a patent. Within the patent application process, inventors must give a complete and technical description of the invention that they wish to patent.

The term "patent documents" means documents containing bibliographic data and other information with respect to such industrial property rights as patents for inventions, plant patents, design patents, inventors’ certificates, utility certificates, utility models, patents of addition, inventors’ certificates of addition, utility certificates of addition, and patent applications therefor.

The terms “publication” and “published” are used in the sense of making available a patent document to the public for inspection or supplying a copy on request; or of making available multiple copies of a patent document produced on, or by, any medium (e.g., paper, film, magnetic tape or disc, optical disc, online database, computer network, etc.).

The term “publication level” is defined as the level corresponding to a procedural stage at which a document is normally published under a given national industrial property law or under a regional or international industrial property convention or treaty.

= Patents as open access source of information: =

The publication and accessibility of new technical and scientific developments are crucial for scientific dialogue and play a key role in promoting innovation and the transfer of scientific knowledge. Many studies have indicated the importance of open access to the scientific literature as a key tool for bringing together people and ideas in a way that catalyses science and innovation [5][6][7]. Open access is defined as the practice of providing free-of-charge access to scientific literature for anyone via the internet.

From this point of view, the patent literature plays a central role, because the patenting process is also seen as an effective way of disseminating knowledge in open access mode [8].

A patent is essentially an agreement between a government and an inventor where the inventor agrees to publish the invention for the world to see (and thereby stimulate research and development) in return for a time-limited exclusive right on the invention.

Therefore, a key aspect of the patenting process is the requirement to make the invention known to the public, so that the technology can be improved upon. This is achieved by publishing the patent application and making it available to the general public. In most countries, the patent application is published 18 months after the filing of the request for the patent.

The public availability of the patent literature is guaranteed by the many free resources for searching and consulting granted patents and published applications, such as Espacenet of the European Patent Office, the search services of the United States Patent and Trademark Office (USPTO) and of the State Intellectual Property Office (SIPO) of the People's Republic of China, Patent Lens, the DNA Patent Database, Google Patents, PatentScope of the WIPO and the IPDL of the Japanese Patent Office.

= Specificities of the patent documents: =

When considering the encoding of patent documents, one should bear in mind that the information they contain, which is of a very detailed and applied technical nature, is written in a highly standardized format. As regards the bibliographic data, patent document citations also have their own peculiarities. For example, in contrast with other scientific documents such as articles or papers, the title of a patent is often descriptive rather than formal, so the title is not a defining element when encoding the bibliographic information associated with a patent document.

Therefore, most bibliographic references to patent documents include not only common bibliographic elements such as title or author, but also patent-specific elements such as the name of the issuing authority, an application number, a publication number or a kind code.

This Recommendation explains how to encode these additional patent-specific bibliographic elements in TEI.

= Life cycle of a patent: =

Although procedures vary amongst patent offices, the process of getting a patent can be illustrated as follows [9]:


 * Filing: An applicant chooses a filing route, i.e. national, regional or international, and files an application with a patent authority. The initial filing is considered the “priority filing”, from which further successive national, regional or international filings can be made within the “priority period” of one year under the Paris Convention for the Protection of Industrial Property.


 * Formal examination: The patent authority ensures that all administrative formalities have been complied with. If the patent application meets the minimum formal requirements, it is given a filing date and a serial number, called the application number, of which the applicant is notified. These data elements are essential to uniquely identify the patent application.


 * Prior art search: In many countries, but not all, the patent office carries out a search of the prior art, i.e., of all relevant technological information publicly known at the time of filing of the patent application. Using extensive databases and expert examiners in the specific technical field of the application, a “search report” is drafted, which compares the technical merits of the claimed invention with those of the known prior art. In most countries, the patent application is published 18 months after the priority date, i.e., after the first filing date.


 * Substantive examination: If a prior art search report is available, the examiner checks that the application satisfies the requirements of patentability, i.e., that the invention is novel, inventive and susceptible to industrial application, compared to the prior art as listed in the search report.


 * Grant/refusal: The examiner may grant the patent application without amendments, change the scope of the claims to reflect the known prior art, or refuse the application. Once the patent is granted, the granted patent is published.


 * Opposition: Within a specified period, many patent offices allow third parties to oppose the granted patent on the grounds that it does not in fact satisfy patentability requirements.


 * Appeal: Many offices provide the possibility of appeal after the substantive examination or after the opposition procedure.
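
The two time limits mentioned in the list above, the one-year priority period and publication 18 months after the priority date, amount to simple calendar arithmetic. The following Python sketch illustrates this. The helper name `add_months` and the sample priority date are assumptions made for illustration only, and real deadlines depend on the rules of each patent office (for example, when a limit falls on a non-working day).

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day allows, the day is
    clamped to the last day of that month (an assumption of this sketch).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Hypothetical priority (first filing) date, chosen only for illustration.
priority_date = date(2010, 8, 31)

# End of the one-year priority period under the Paris Convention.
priority_period_end = add_months(priority_date, 12)

# Typical publication date: 18 months after the priority date.
expected_publication = add_months(priority_date, 18)

print(priority_period_end.isoformat(), expected_publication.isoformat())
```

The clamping rule is a design choice of this sketch, not a statement about how any particular office computes its time limits.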



According to certain national industrial property laws or regulations, or regional or international industrial property conventions or treaties, the same patent application may be published at various procedural stages. Each of these published documents is called a "patent publication", and the contents of these publications can vary from stage to stage. Each patent publication is uniquely identified by a serial number and a specific code, the publication kind code, which indicates the stage of the procedure to which the publication corresponds. If, at a particular procedural stage, a copy of the document is first made available to the public for inspection or copying and is then, at the same procedural stage, made available in multiple copies produced on, or by, any medium, only a single publication is considered to have been produced.

If, on the other hand, multiple reproduction results from a new procedural stage, this reproduction is considered to be a further publication of the document, even if the texts at the two stages are identical. The following example shows a typical patent process in which two publications of the same patent application are issued at different stages of the procedure:



Therefore, when dealing with patent documents, two different kinds of bibliographic reference can be found [10]:
 * 1. Reference to a patent application: the patent application is identified by the application number, which is assigned to a patent application when it is filed. The application number is made up of two basic bibliographic elements:
   * country/organization code: the code of the Patent Authority where the national, international or regional application was filed. The country code is normally represented with two letters according to the standards fixed by the WIPO [11].
   * serial number: a number issued by the Patent Authority when the national, international or regional application was filed [4].
 * Further patent-specific bibliographic elements may be needed to fully identify a patent application:
   * application kind code: a code that identifies the kind of application [4]. Application kind codes are sometimes needed, for example, because some countries accept both regular and utility model applications and use serial numbers differentiated only by the application kind. Typical values of the application kind code are 'A' for patents, 'U' for utility models and 'P' for US provisional applications. The precise values, however, depend on the patent authority.
   * filing date: the date on which the patent application was filed with the patent authority. Calendar dates are normally represented according to the standards fixed by the WIPO [12].




 * 2. Reference to a patent publication: the patent publication is identified by the publication number, which is the bibliographic reference that identifies the various publications of the patent application at the different procedural stages. The publication number is made up of two basic bibliographic elements:
   * country/organization code: the code of the Patent Authority where the national, international or regional application was filed. The country code is normally represented with two letters according to the standards fixed by the WIPO [11].
   * serial number: a number issued by the Patent Authority publishing the patent document. All the publications of one patent application made by a Patent Authority at the different stages of the patenting process share the same serial number. This serial number is normally different from the serial number used for the patent application. The numbering system of published patent documents normally follows the standards fixed by the WIPO [13].
 * Further patent-specific bibliographic elements are normally needed to fully identify a specific patent publication:
   * publication kind code: a code that identifies the kind of publication and indicates the stage in the patent procedure at which the document was published. The publication kind code is normally represented with a letter, followed in some cases by a one-digit numerical code, according to the standards fixed by the WIPO [14]. Usually, kind codes beginning with the letter A refer to documents published prior to the examination of the application (first publication level), kind codes beginning with the letter B refer to granted versions (second publication level) and kind codes beginning with the letter C refer to versions of the patent corrected after re-examination (third publication level).
   * publication date: the date on which the document was published by the patent authority. Calendar dates are normally represented according to the standards fixed by the WIPO [12]. Historically, it was considered that a published patent document could be uniquely identified by the country/organization code, the publication number and the kind code. However, with the growing interest in the availability of corrected documents, including their publication on electronic media (CD-ROM, Internet, etc.), this no longer necessarily applies: the same combination has been used on both the corrected document and the original version of the published document [2]. Therefore, the publication date may also be needed to uniquely identify a specific publication.
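
The composition of a publication number described above (country/organization code, serial number, optional kind code) can be sketched in Python as follows. This is a minimal illustration and not a full implementation of the WIPO standards: the names `PublicationNumber`, `parse_publication_number` and `publication_level` are hypothetical, the regular expression assumes a two-letter code followed by a purely numeric serial number, and the level mapping encodes only the usual A/B/C convention mentioned above.

```python
import re
from typing import NamedTuple, Optional


class PublicationNumber(NamedTuple):
    authority: str        # two-letter country/organization code, e.g. "EP"
    serial: str           # serial number with separators removed
    kind: Optional[str]   # publication kind code, e.g. "B1", or None if absent


# Assumed layout: two capital letters, a digit string that may contain
# commas, dots or spaces, and an optional kind code (letter + optional digit).
_REFERENCE = re.compile(
    r"^\s*([A-Z]{2})\s*(\d(?:[\d,.\s]*\d)?)\s*([A-Z]\d?)?\s*$"
)


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a reference such as 'US 6,885,550 B1' into its components."""
    match = _REFERENCE.match(text)
    if match is None:
        raise ValueError(f"not a recognised publication number: {text!r}")
    authority, serial, kind = match.groups()
    # Keep only the digits of the serial number.
    return PublicationNumber(authority, re.sub(r"\D", "", serial), kind)


def publication_level(kind: Optional[str]) -> Optional[int]:
    """Map the first letter of a kind code to the usual publication level."""
    levels = {"A": 1, "B": 2, "C": 3}
    return levels.get(kind[0]) if kind else None


reference = parse_publication_number("US 6,885,550 B1")
print(reference, publication_level(reference.kind))
```

Kind codes starting with other letters return no level here, since their meaning depends on the patent authority.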



= TEI encoding of patent bibliographic data: =

Patent bibliographic references, whether to applications or to publications, typically contain at least two main elements: (1) the name of the patent authority, usually represented as a country or regional code, and (2) a document number, whose format depends on each authority. Additionally, (3) a date and (4) a kind code may also be present. These bibliographic elements are encoded under <biblStruct> as follows:

 <biblStruct>
   <orgName>[code of patent authority]</orgName>
   <idno type="docNumber">[document number]</idno>
   <classCode>[kind code]</classCode>
   <date>[date]</date>
 </biblStruct>

 * <biblStruct> contains the following attributes:
   * @type: specifies the type of patent document. Possible values are (non-closed list):
     * 'patent' for patents of invention
     * 'utilityModel' for utility models
     * 'designPatent' for industrial designs
     * 'plantPatent' for plant patents
   * @status: specifies whether the patent bibliographic reference is directed to an application or a publication. Possible values are (closed list):
     * 'application' for a patent application
     * 'publication' for a patent publication


 * <orgName> specifies the code of the patent authority. It contains the following attribute:
   * @type: specifies whether the patent authority is a national patent office or a supra-national patent organization. Possible values are (closed list):
     * 'national' for a national patent office, such as the patent office of Germany or France
     * 'regional' for a supra-national patent organization, such as the EPO or WIPO


 * <idno type="docNumber"> specifies the serial number assigned to the application or to the publication by the corresponding patent authority.


 * <classCode> specifies the kind code of the patent document. The attribute @scheme identifies the classification system or taxonomy in use.


 * <date> specifies the date of the patent document. If the patent document is a patent application, this date corresponds to the filing date of the application; if it is a patent publication, this date corresponds to the publication date. It contains the following attributes:
   * @type: specifies whether the date is a filing date or a publication date:
     * 'applicationDate' for the filing date of the patent application
     * 'publicationDate' for the publication date of the patent publication
   * @when: supplies the value of the date in a standard form

These patent-specific bibliographic elements can also be combined in the <biblStruct> with other bibliographic elements, such as the title of the document or the author (encoded as inventor for patent documents).
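
As a rough illustration of the encoding scheme described in this section, the following Python sketch assembles a <biblStruct> from its patent-specific parts with the standard `xml.etree.ElementTree` module. The function name `encode_patent_reference` and its parameters are hypothetical. The sketch uses only the elements and attributes named above; it leaves out the @scheme attribute of <classCode> and any further TEI structure, so its output is well-formed XML rather than a validated TEI fragment.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional


def encode_patent_reference(
    authority: str,
    doc_number: str,
    *,
    status: str = "publication",
    doc_type: str = "patent",
    authority_type: str = "national",
    kind_code: Optional[str] = None,
    when: Optional[date] = None,
) -> str:
    """Serialize one patent reference as a <biblStruct> string."""
    # @status and the @type of <orgName> are closed lists in this section.
    if status not in ("application", "publication"):
        raise ValueError(f"unknown status: {status!r}")
    if authority_type not in ("national", "regional"):
        raise ValueError(f"unknown authority type: {authority_type!r}")

    bibl = ET.Element("biblStruct", {"type": doc_type, "status": status})
    ET.SubElement(bibl, "orgName", {"type": authority_type}).text = authority
    ET.SubElement(bibl, "idno", {"type": "docNumber"}).text = doc_number
    if kind_code is not None:
        ET.SubElement(bibl, "classCode").text = kind_code
    if when is not None:
        # 'applicationDate' or 'publicationDate', depending on the status.
        ET.SubElement(
            bibl, "date", {"type": f"{status}Date", "when": when.isoformat()}
        )
    return ET.tostring(bibl, encoding="unicode")


xml_text = encode_patent_reference(
    "US", "6885550", kind_code="B1", when=date(2005, 4, 26)
)
print(xml_text)
```

Deriving the @type of <date> from the status keeps the two values consistent, which is one way of honouring the rule that an application carries its filing date and a publication its publication date.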

= Examples of encoding the patent citations: =

When patents are cited in a text (article, research paper, etc.), different formats are used, depending on the citation style. The following are the most common ways of citing patents, with the corresponding TEI encoding (all the examples refer to citations of the patent shown in figure 6):




 * Citation by simply referring to the patent publication:


 * United States patent US 6,885,550 B1, issued April 26, 2005

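 <biblStruct type="patent" status="publication">
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <classCode>B1</classCode>
   <date type="publicationDate" when="2005-04-26">April 26, 2005</date>
 </biblStruct>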


 * Citation by simply referring to the patent application:


 * Application No. 09/648,405, filed on Aug. 24, 2000

 <biblStruct type="patent" status="application">
   <orgName type="national">US</orgName>
   <idno type="docNumber">09/648405</idno>
   <date type="applicationDate" when="2000-08-24">Aug. 24, 2000</date>
 </biblStruct>


 * Citation according to the Chicago Manual of Style recommendations:


 * Williams, Dave. 2005. Screw less clip mounted computer drive. U.S. Patent 6,885,550, filed August 24, 2000, and issued April 26, 2005

 <biblStruct type="patent" status="publication">
   <title>Screw less clip mounted computer drive</title>
   <author>Williams, Dave</author>
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <date>2005</date>
   <date type="applicationDate" when="2000-08-24">August 24, 2000</date>
   <date type="publicationDate" when="2005-04-26">April 26, 2005</date>
 </biblStruct>


 * Citation according to the MLA recommendations:


 * Williams, Dave. "Screw less clip mounted computer drive." Patent 6,885,550. 26 April 2005

 <biblStruct type="patent" status="publication">
   <title>Screw less clip mounted computer drive</title>
   <author>Williams, Dave</author>
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <date type="publicationDate" when="2005-04-26">26 April 2005</date>
 </biblStruct>


 * Citation according to the IEEE recommendations:


 * D. Williams,"Screw Less Clip Mounted Computer Drive." U.S. Patent 6,885,550, issued April 26, 2005

 <biblStruct type="patent" status="publication">
   <title>Screw less clip mounted computer drive</title>
   <author>D. Williams</author>
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <date type="publicationDate" when="2005-04-26">April 26, 2005</date>
 </biblStruct>


 * Citation according to the Bluebook recommendations:


 * U.S. Patent No. 6,885,550 (issued Apr. 26, 2005)

 <biblStruct type="patent" status="publication">
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <date type="publicationDate" when="2005-04-26">Apr. 26, 2005</date>
 </biblStruct>


 * Citation according to the CSE recommendations:


 * Williams D, inventor; 2005 Apr. 26. Screw less clip mounted computer drive. United States patent US 6,885,550

 <biblStruct type="patent" status="publication">
   <title>Screw less clip mounted computer drive</title>
   <author>Williams D</author>
   <orgName type="national">US</orgName>
   <idno type="docNumber">6885550</idno>
   <date type="publicationDate" when="2005-04-26">2005 Apr. 26</date>
 </biblStruct>
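
The citation styles above write the same date in several ways ("April 26, 2005", "Aug. 24, 2000", "26 April 2005", "2005 Apr. 26"), while the @when attribute always carries one standard form. The following Python sketch shows one possible way of deriving the @when value from the displayed date. The function name `to_when` and the list of accepted formats are assumptions covering only the variants that appear in these examples, and the parsing relies on English month names (the default "C" locale).

```python
from datetime import datetime

# Date layouts found in the citation examples of this section.
_DATE_FORMATS = (
    "%B %d, %Y",   # April 26, 2005
    "%b. %d, %Y",  # Aug. 24, 2000
    "%d %B %Y",    # 26 April 2005
    "%Y %b. %d",   # 2005 Apr. 26
)


def to_when(display_date: str) -> str:
    """Return the ISO 8601 (YYYY-MM-DD) form of a displayed citation date."""
    text = display_date.strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognised date format: {display_date!r}")


for shown in ("April 26, 2005", "Aug. 24, 2000", "26 April 2005", "2005 Apr. 26"):
    print(shown, "->", to_when(shown))
```

Dates in any other layout raise an error rather than being guessed, so that an ambiguous date never ends up in @when silently.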

= Publications of the patent during the patenting procedure: =

The TEI provides a powerful mechanism for encoding documents that have been reprinted multiple times (see section 3.11.2.4 of the TEI P5). This mechanism can be used in a natural manner for patent documents republished at the different stages of the patenting procedure. For example, the exemplary patent provided in Figure 3 can be encoded as follows:

 <biblStruct type="patent" status="publication">
   <orgName type="regional">EP</orgName>
   <idno type="docNumber">1558513</idno>
   <classCode>A1</classCode>
   <date type="publicationDate" when="2005-08-03"/>
   <classCode>B1</classCode>
   <date type="publicationDate" when="2009-09-09"/>
 </biblStruct>

This example shows the different publications of the patent EP1558513 during the patenting procedure. The first publication, on 3 August 2005, has the kind code “A1”, indicating that it is a published patent application that includes the European search report issued after the search carried out at the European Patent Office. The second publication, on 9 September 2009, has the kind code “B1”, indicating that it was published after the patent was granted.

= References =


 * 1) Czajkowski, A.: The Importance and Role of Patent Information. PowerPoint presentation. Jerusalem (21 June 2010)
 * 2) World Intellectual Property Organization: Standard ST.1: Minimum data elements required to uniquely identify a patent document, WIPO (May 2001)
 * 3) World Intellectual Property Organization: Standard ST.9: Bibliographic data on and relating to patents and SPCs, WIPO (February 2008)
 * 4) World Intellectual Property Organization: Standard ST.13: Numbering of applications for IPRs, WIPO (February 2008)
 * 5) Bailey, Charles W., Jr.: Transforming Scholarly Publishing Through Open Access: A Bibliography. Houston, TX: Digital Scholarship, 176 p. (2010)
 * 6) Swan, Alma: Open Access and the Progress of Science. American Scientist, Volume 95, pp. 198-200 (May-June 2007)
 * 7) Alavi, M. and Leidner, D.E.: Knowledge Management and Knowledge Management Systems: Conceptual Foundations and Research Issues, MIS Quarterly, Vol. 25 No. 1, pp. 107-46 (2001)
 * 8) Commission Communication (2012), Towards better access to scientific information: Boosting the benefits of public investments in research, Commission Recommendation, (17 July 2012)
 * 9) World Intellectual Property Organization: Guide to Using Patent Information, WIPO (2012)
 * 10) Lopez, P.: Automatic Extraction and Resolution of Bibliographical References in Patent Documents. In: Advances in Multidisciplinary Retrieval, Lecture Notes in Computer Science, Vol. 6107, pp. 120-135, Springer, Heidelberg (2010)
 * 11) World Intellectual Property Organization: Standard ST.3: Two-letter codes for the representation of states, other entities and organizations, WIPO (November 2011)
 * 12) World Intellectual Property Organization: Standard ST.2: Manner for designating calendar dates, WIPO (May 1997)
 * 13) World Intellectual Property Organization: Standard ST.6: Numbering of published patent documents, WIPO (December 2002)
 * 14) World Intellectual Property Organization: Standard ST.16: Identification of different kinds of patent documents, WIPO (May 1997)