Database Download

The databases available for download here (version 3.0 and later) were created using the new database schema; the schema and database information are available here. Previous versions of the database are available here, and the previous database schema is described here.

Note that the file named "WHOLEDB" contains the entire database except the ENTITY table, which was introduced with the new database and is available only as a separate download. All tables are also provided individually, so users can download the entire database at once or only the tables they need. Each file name consists of four parts separated by underscores: the database name, the letter R or A, the table name, and the PubMed cutoff date (the date up to which PubMed citations were processed). R indicates that the database was generated with standard SemRep options; A indicates that it was generated with the anaphora resolution option.
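The four-part naming convention can be parsed mechanically. The Python sketch below is illustrative only: actual file names are not shown on this page, so the example names, the `.sql.gz` extension, and the date format are assumptions. Because table names such as GENERIC_CONCEPT and PREDICATION_AUX contain underscores themselves, the sketch treats only the first two and the last underscore as field separators.

```python
from typing import NamedTuple


class SemMedFileName(NamedTuple):
    database: str   # e.g. "semmedVER31"
    variant: str    # "R" (standard SemRep options) or "A" (anaphora resolution)
    table: str      # e.g. "PREDICATION", "GENERIC_CONCEPT", or "WHOLEDB"
    pubmed_to: str  # PubMed cutoff date, kept as an opaque string (format assumed)


def parse_file_name(file_name: str) -> SemMedFileName:
    """Split a download file name into its four underscore-separated parts.

    Table names may contain underscores, so the first two fields and the
    last field are peeled off and everything in between is the table name.
    """
    # Drop extensions such as ".sql.gz" (hypothetical; not specified on this page).
    stem = file_name.split(".", 1)[0]
    parts = stem.split("_")
    if len(parts) < 4:
        raise ValueError(f"unexpected file name: {file_name!r}")
    database, variant, *table_parts, pubmed_to = parts
    if variant not in ("R", "A"):
        raise ValueError(f"unknown variant {variant!r} in {file_name!r}")
    return SemMedFileName(database, variant, "_".join(table_parts), pubmed_to)
```

For instance, a hypothetical file named `semmedVER30_A_COREFERENCE_12312016.sql.gz` would parse to database `semmedVER30`, variant `A`, table `COREFERENCE`, and cutoff `12312016`.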

The new database schema differs from the previous one (versions 2.x) in the following ways:

  1. We simplified the schema significantly by removing CONCEPT, CONCEPT_SEMTYPE, PREDICATION_ARGUMENT, and SENTENCE_PREDICATION tables. The relevant contents of these tables can still be derived from PREDICATION if needed.
  2. A GENERIC_CONCEPT table has been added to the schema. This table lists the concepts that SemRep identifies as generic; concepts not in this table are considered novel.
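As a rough illustration of both points, the Python sketch below derives concept and semantic-type pairs (the kind of information CONCEPT and CONCEPT_SEMTYPE used to hold) from PREDICATION and flags each concept as novel when its CUI is absent from GENERIC_CONCEPT. It runs against a toy in-memory SQLite database. The column names (SUBJECT_CUI, SUBJECT_NAME, SUBJECT_SEMTYPE, the matching OBJECT_* columns, and GENERIC_CONCEPT.CUI) are assumptions to check against the published schema, and the rows are made-up placeholders, not SemMedDB data.

```python
import sqlite3

# Minimal stand-ins for PREDICATION and GENERIC_CONCEPT; column names are assumed.
SCHEMA = """
CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY,
    PREDICATE TEXT,
    SUBJECT_CUI TEXT, SUBJECT_NAME TEXT, SUBJECT_SEMTYPE TEXT,
    OBJECT_CUI TEXT, OBJECT_NAME TEXT, OBJECT_SEMTYPE TEXT
);
CREATE TABLE GENERIC_CONCEPT (CUI TEXT PRIMARY KEY, PREFERRED_NAME TEXT);
"""

# Every concept seen as a subject or object, with its semantic type and a
# novelty flag (1 when the CUI does not appear in GENERIC_CONCEPT).
CONCEPTS_QUERY = """
SELECT c.cui, c.name, c.semtype,
       NOT EXISTS (SELECT 1 FROM GENERIC_CONCEPT g WHERE g.CUI = c.cui) AS novel
FROM (
    SELECT SUBJECT_CUI AS cui, SUBJECT_NAME AS name, SUBJECT_SEMTYPE AS semtype
    FROM PREDICATION
    UNION  -- UNION (not UNION ALL) removes duplicate rows
    SELECT OBJECT_CUI, OBJECT_NAME, OBJECT_SEMTYPE FROM PREDICATION
) AS c
ORDER BY c.cui, c.semtype
"""


def derive_concepts(conn: sqlite3.Connection) -> list[tuple[str, str, str, bool]]:
    """Return (cui, name, semtype, is_novel) tuples derived from PREDICATION."""
    return [(cui, name, semtype, bool(novel))
            for cui, name, semtype, novel in conn.execute(CONCEPTS_QUERY)]


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder CUIs, not real UMLS identifiers.
    conn.executemany(
        "INSERT INTO PREDICATION (PREDICATE, SUBJECT_CUI, SUBJECT_NAME, SUBJECT_SEMTYPE,"
        " OBJECT_CUI, OBJECT_NAME, OBJECT_SEMTYPE) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("TREATS", "C_TOY_A", "Concept A", "phsu", "C_TOY_B", "Concept B", "dsyn"),
         ("INTERACTS_WITH", "C_TOY_A", "Concept A", "phsu", "C_TOY_C", "Concept C", "gngm")],
    )
    conn.execute("INSERT INTO GENERIC_CONCEPT VALUES ('C_TOY_C', 'Concept C')")
    return conn


if __name__ == "__main__":
    for row in derive_concepts(build_toy_db()):
        print(row)
```

Using NOT EXISTS rather than NOT IN keeps the novelty flag well defined even if the CUI column of GENERIC_CONCEPT were to contain NULLs.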

Starting with VER30, we plan an annual release of the database (VER30_A and so on) containing predications generated by SemRep with sortal anaphora resolution. This feature is expected to increase the number of predications slightly while also making some existing predications more specific. For details on sortal anaphora resolution in SemRep, see our BMC Bioinformatics paper.

For the latest version to date, semmedVER31_R, we regenerated the entire database to resolve an issue reported in the previous version, in which some predications had the wrong combination of subject name and subject semantic type, or of object name and object semantic type. semmedVER31_R also adds two new columns to the SENTENCE table, SECTION_HEADER and NORMALIZED_SECTION_HEADER, which store the section information of structured abstracts when the original citations provide it.
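The new SENTENCE columns make it possible to restrict an analysis to one part of a structured abstract. The Python sketch below is hypothetical throughout: it assumes SENTENCE has SENTENCE_ID, PMID, SENTENCE, SECTION_HEADER, and NORMALIZED_SECTION_HEADER columns, that citations without section information leave the header columns NULL or empty, and that normalization maps author-specific headers (here, a made-up FINDINGS header) onto a shared label such as RESULTS. The rows are toy placeholders.

```python
import sqlite3

# Hypothetical minimal SENTENCE table with only the columns used below.
SCHEMA = """
CREATE TABLE SENTENCE (
    SENTENCE_ID INTEGER PRIMARY KEY,
    PMID INTEGER,
    SENTENCE TEXT,
    SECTION_HEADER TEXT,
    NORMALIZED_SECTION_HEADER TEXT
);
"""


def sentences_in_section(conn: sqlite3.Connection,
                         normalized_header: str) -> list[tuple[int, str]]:
    """Return (PMID, sentence) pairs whose normalized section header matches."""
    return conn.execute(
        "SELECT PMID, SENTENCE FROM SENTENCE"
        " WHERE NORMALIZED_SECTION_HEADER = ?"
        " ORDER BY PMID, SENTENCE_ID",
        (normalized_header,),
    ).fetchall()


def section_counts(conn: sqlite3.Connection) -> dict:
    """Count sentences per normalized header; None collects sentences without one."""
    cursor = conn.execute(
        "SELECT NULLIF(NORMALIZED_SECTION_HEADER, ''), COUNT(*) FROM SENTENCE"
        " GROUP BY NULLIF(NORMALIZED_SECTION_HEADER, '')"
    )
    return dict(cursor.fetchall())


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory SENTENCE table holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO SENTENCE VALUES (?, ?, ?, ?, ?)",
        [
            (1, 100, "First toy sentence.", "BACKGROUND", "BACKGROUND"),
            (2, 100, "Second toy sentence.", "FINDINGS", "RESULTS"),
            (3, 300, "Third toy sentence.", "RESULTS", "RESULTS"),
            (4, 200, "Sentence from an unstructured abstract.", None, None),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    print(sentences_in_section(conn, "RESULTS"))
    print(section_counts(conn))
```

Filtering on the normalized column rather than SECTION_HEADER lets one query pick up sentences whose original headers were worded differently.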




Database name: semmedVER31_R (processed up to December 31, 2017)

SemRep version: regular SemRep 1.7
Number of citations processed: 27,851,419
Number of predications: 93,876,632
* This database was generated from SemRep output with the anaphora resolution feature turned off.

| Table name | Start date | End date | Size | Download link | sha1sum | md5sum |
|---|---|---|---|---|---|---|
| Entire Database | 1865 | Dec 31 2017 | 17.1G | download | download | download |
| CITATIONS | 1865 | Dec 31 2017 | 136M | download | download | download |
| ENTITY | 1865 | Dec 31 2017 | 37.1G | download | download | download |
| GENERIC_CONCEPT | N/A | N/A | 129K | download | download | download |
| METAINFO | N/A | N/A | 778 B | download | download | download |
| PREDICATION | 1865 | Dec 31 2017 | 2.41G | download | download | download |
| PREDICATION_AUX | 1865 | Dec 31 2017 | 3.05G | download | download | download |
| SENTENCE | 1865 | Dec 31 2017 | 11.5G | download | download | download |
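Each file is listed with sha1sum and md5sum links, and given the multi-gigabyte sizes above it is worth checking a download before loading it. The Python sketch below hashes a file in fixed-size chunks so it never holds the whole file in memory. It assumes the published checksum files follow the coreutils `sha1sum`/`md5sum` layout (hex digest first, then the file name) and compares only the first field; the function names are illustrative.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte downloads do not fill memory."""
    digest = hashlib.new(algorithm)  # "sha1" or "md5"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, checksum_text: str, algorithm: str = "sha1") -> bool:
    """Compare a file against the text of its checksum file.

    Assumes the coreutils layout "<hex digest>  <file name>"; only the
    first whitespace-separated field is used, compared case-insensitively.
    """
    expected = checksum_text.split()[0].lower()
    return file_digest(path, algorithm) == expected
```

Pass the path of a downloaded table and the contents of its sha1sum file (or its md5sum file with `algorithm="md5"`); a `False` result suggests a truncated or corrupted download.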




Database name: semmedVER30_R (processed up to June 30, 2017)

SemRep version: regular SemRep 1.7
Number of citations processed: 27,283,927
Number of predications: 91,567,597
* This database was generated from SemRep output with the anaphora resolution feature turned off.

| Table name | Start date | End date | Size | Download link | sha1sum | md5sum |
|---|---|---|---|---|---|---|
| Entire Database | 1865 | June 30 2017 | 16.3G | download | download | download |
| CITATIONS | 1865 | June 30 2017 | 129M | download | download | download |
| ENTITY | 1865 | June 30 2017 | 34.0G | download | download | download |
| GENERIC_CONCEPT | N/A | N/A | 129M | download | download | download |
| METAINFO | N/A | N/A | 764 B | download | download | download |
| PREDICATION | 1865 | June 30 2017 | 2.34G | download | download | download |
| PREDICATION_AUX | 1865 | June 30 2017 | 2.96G | download | download | download |
| SENTENCE | 1865 | June 30 2017 | 10.8G | download | download | download |




Database name: semmedVER30_R (processed up to December 31, 2016)

SemRep version: regular SemRep 1.7
Number of citations processed: 26,737,750
Number of predications: 89,230,566
* This database was generated from SemRep output with the anaphora resolution feature turned off.

| Table name | Start date | End date | Size | Download link | sha1sum | md5sum |
|---|---|---|---|---|---|---|
| Entire Database | 1865 | Dec 31 2016 | 15.8G | download | download | download |
| CITATIONS | 1865 | Dec 31 2016 | 129M | download | download | download |
| ENTITY | 1865 | Dec 31 2016 | 30.8G | download | download | download |
| GENERIC_CONCEPT | N/A | N/A | 129M | download | download | download |
| METAINFO | N/A | N/A | 764 B | download | download | download |
| PREDICATION | 1865 | Dec 31 2016 | 2.24G | download | download | download |
| PREDICATION_AUX | 1865 | Dec 31 2016 | 2.89G | download | download | download |
| SENTENCE | 1865 | Dec 31 2016 | 10.5G | download | download | download |




Database name: semmedVER30_A (processed up to December 31, 2016)

SemRep version: regular SemRep 1.7
Number of citations processed: 26,723,252
Number of predications: 89,173,359
* This database was generated from SemRep output with the anaphora resolution feature turned on.

| Table name | Start date | End date | Size | Download link | sha1sum | md5sum |
|---|---|---|---|---|---|---|
| Entire Database | 1865 | Dec 31 2016 | 16.2G | download | download | download |
| CITATIONS | 1865 | Dec 31 2016 | 129M | download | download | download |
| COREFERENCE | 1865 | Dec 31 2016 | 450M | download | download | download |
| ENTITY | 1865 | Dec 31 2016 | 30.8G | download | download | download |
| GENERIC_CONCEPT | 1865 | Dec 31 2016 | 129M | download | download | download |
| METAINFO | N/A | N/A | 764 B | download | download | download |
| PREDICATION | 1865 | Dec 31 2016 | 2.29G | download | download | download |
| PREDICATION_AUX | 1865 | Dec 31 2016 | 2.87G | download | download | download |
| SENTENCE | 1865 | Dec 31 2016 | 10.5G | download | download | download |