PhiloBiblon: From Siloed Databases to Linked Open Data via Wikibase: Proof of Concept

We are very pleased to announce a pilot project funded by the U.S. federal government’s National Endowment for the Humanities (NEH): “PhiloBiblon: From Siloed Databases to Linked Open Data via Wikibase: Proof of Concept.” It will begin June 1, 2021, and end May 30, 2022, and will be hosted by FactGrid.

The project is focused on PhiloBiblon, a forty-year-old database for the study of the medieval history and literatures of the Romance cultures of the Iberian Peninsula: Portuguese and Galician-Portuguese, Castilian, and Catalan. It contains four subsidiary databases:

  • BETA: Bibliografía Española de Textos Antiguos: Medieval texts in Spanish.
  • BIPA: Bibliografía de la Poesía Áurea: Golden Age (16th-17th c.) Poetry in Spanish.
  • BITAGAP: Bibliografia de Textos Antigos Galegos e Portugueses: Medieval texts in Galician, Galician-Portuguese, and Portuguese.
  • BITECA: Bibliografia de Textos Antics Catalans, Valencians i Balears: Medieval texts in Catalan.

The project is designed to solve one of the most vexing problems facing long-standing digital projects: maintenance of the software platform. PhiloBiblon started out in 1975 as an ancillary database of the Dictionary of the Old Spanish Language project, carried out at the University of Wisconsin, Madison, by Lloyd Kasten and his student, John Nitti. In Madison the database management system (DBMS) used was FAMULUS, created, ironically, in Berkeley in 1964, for the Pacific Southwest Forest and Range Experiment Station. Since then, technological transformation has been a constant: from the CD-ROM discs of ADMYTE (Archivo Digital de Manuscritos y Textos Españoles) to a first web version in 1997 to the current 2014 version. Each transformation has usually required multiple grant applications to agencies and foundations, especially to NEH.

PhiloBiblon currently runs under Windows on Revelation Technology’s MultiValue database, OpenInsight. PhiloBiblon’s 1987 implementation was designed on Revelation G, the ancestor of OpenInsight, by John May, a graduate student in History of Science who eventually left the academy for a business career. John has maintained and enhanced the PhiloBiblon DBMS for more than 35 years, building it out to encompass ten relational tables (texts, witnesses, primary source manuscripts and imprints, copies of imprints, persons, institutions, toponyms, and secondary references). Together these ten tables contain 1246 data elements (fields), 98 controlled vocabulary lists with more than 3000 properties, 110 search indexes, and 30 data entry screens.

PhiloBiblon and its relational DBMS existed before Tim Berners-Lee invented the World Wide Web at CERN in Geneva in 1989. It was evident from the advent of the first really useful commercial browser, Netscape, that the WWW offered a vastly superior vehicle for making information available to students and scholars, much better than print and CD-ROM. In PhiloBiblon’s first web version (1997), data were exported from the OpenInsight DBMS and uploaded to a web server at Berkeley in HTML format. Users could search only for authors, titles, or keywords, and the result of the search was just a list of manuscripts or editions, each of which had to be opened, one by one, to find the item of interest.

In the current version OpenInsight data is still exported—usually every two or three months—and uploaded to a web server at Berkeley (and to a mirror site at the Universitat Pompeu Fabra in Barcelona) in ten separate files, one for each table, in the more advanced XML format. On the server the eXtensible Text Framework (XTF) program, created by the California Digital Library of the University of California, parses each file into individual records, indexes them, and serves them to users as a result of a search request from the PhiloBiblon web site. The search mechanisms are much more powerful, offering not only keyword searching but also a set of search boxes tailored to each entity.
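To make the parse-and-index step concrete, here is a minimal sketch in Python. It is not XTF and does not use PhiloBiblon's actual export schema: the element names (`table`, `record`, `author`, `title`), the record ids, and the sample values are all hypothetical placeholders. It only illustrates the general idea of splitting one exported table file into individual records and building a keyword index over them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical export file: element names, ids, and values are placeholders,
# NOT PhiloBiblon's real XML schema or data.
SAMPLE_EXPORT = """\
<table name="works">
  <record id="rec-1">
    <author>Example Author A</author>
    <title>Example Title One</title>
  </record>
  <record id="rec-2">
    <author>Example Author B</author>
    <title>Example Title Two</title>
  </record>
</table>
"""


def parse_records(xml_text):
    """Split one exported table file into {record id: {field: value}}."""
    root = ET.fromstring(xml_text)
    records = {}
    for rec in root.findall("record"):
        fields = {child.tag: (child.text or "").strip() for child in rec}
        records[rec.get("id")] = fields
    return records


def build_index(records):
    """Build an inverted index: lowercase word -> set of record ids."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for value in fields.values():
            for word in value.lower().split():
                index[word].add(rec_id)
    return index


def search(index, keyword):
    """Return the sorted ids of records containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


records = parse_records(SAMPLE_EXPORT)
index = build_index(records)
print(search(index, "two"))      # ids of records containing "two"
print(search(index, "example"))  # ids of records containing "example"
```

A real full-text framework does far more (field-specific search boxes, ranking, faceting), but the export, parse, index, and serve sequence is the same in outline.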

Thus for texts (works): keyword (simple search), author, title, incipit, explicit, associated person, date and place of composition, subject.

For manuscripts and editions: city and library holding the item, shelfmark, date and place of production, printer and publisher, scribe and patron, previous owner or other associated person.

The time lag between data input and its appearance on the web is annoying. Moreover, the process required to export data and upload it to the web is neither elegant nor efficient. Aside from this and from the problem of maintaining and enhancing both the Windows DBMS and the web software, the most urgent issue facing PhiloBiblon is its status as an information silo, with no organic relationship with other information sources. The web 3.0, the semantic web, is designed to make use of Linked Open Data and the Resource Description Framework (LD/RDF) in order to make possible automatic links to other information resources, like the Virtual International Authority File (VIAF).

Since 2014 the PhiloBiblon research teams have prepared a series of unsuccessful grant proposals, separately as well as in collaboration with other projects, to NEH and to Spanish, Catalan, and European agencies and foundations with the goal of funding the transformation of PhiloBiblon into an LD/RDF resource.

Last year, instead of proposing the creation ex professo of a new web-based DBMS for PhiloBiblon, on the advice of our neighbors at Stanford University, the University of California, Davis, and the international library consortium OCLC, we decided to explore a radically different solution: to incorporate PhiloBiblon into the wiki world. PhiloBiblon already cites Wikipedia constantly, more than 1400 times in nine different languages. Of more interest than Wikipedia as a model, however, is the more structured but still open environment of Wikidata. Wikidata itself, though, is too open: PhiloBiblon requires more control over the individuals who can contribute to it.

This led us to FactGrid, which is ideal for our purposes. Open only to members, it offers a perfect sandbox for the PhiloBiblon staff to explore the relationship between PhiloBiblon’s highly structured data model and the elegant and infinitely extensible model of triplestores based on Q# entities and P# properties. We are enormously grateful to Olaf Simons not only for his generous offer to make it available for this purpose but also for agreeing to serve on the Advisory Board of the current NEH project.

When this project ends May 31, 2022, we hope to have shown that the Wikidata model is viable for PhiloBiblon over the long term and that we can make use of its standard input processes, modified as necessary, to map PhiloBiblon’s 421,000 records into the corresponding FactGrid entities and properties, creating new ones as necessary. This work will be carried out primarily by data analyst Adam Anderson, whose academic specialization is Assyriology and cuneiform studies. He will also study Wikibase’s LD access points to and from libraries and archives and test the Wikibase data export module for JSON-LD, RDF, and XML on PhiloBiblon data.

TABLE                        BETA BITAGAP  BITECA    BIPA   TOTAL
ANALYTIC (witnesses)        14692   52084   12239   89926
REFERENCES                   7270   21558    5976     472
PERSONS                      7423   32309    3473    3418
GEOGRAPHY                    1814    4759     840     211
INSTITUTIONS                  794    3297     585       4
LIBRARIES                     915     455     420     119
MANUSCRIPTS & IMPRINTS       5168    5886    1971    1572
COPIES OF PRINTED BOOKS      4157    1146    1473     137
SUBJECT HEADINGS              339      34     149     126
WORKS (texts)                6034   31962    6173   89913
TOTAL                       48606  153490   33299  185898  421293
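As a rough illustration of what mapping records into entities and properties involves, the following Python sketch turns one record into subject-property-value statements in the Q#/P# style. Everything specific in it is an assumption made for illustration: the field names, the property ids (`P1`, `P2`, `P3`), and the item ids are placeholders, not actual FactGrid identifiers or PhiloBiblon fields. Fields without a mapping are reported separately, since those are the cases where a new property might need to be created.

```python
from dataclasses import dataclass

# Hypothetical field-to-property mapping. The P-numbers are placeholders,
# NOT real FactGrid property ids.
FIELD_TO_PROPERTY = {
    "author": "P1",
    "place_of_composition": "P2",
    "date_of_composition": "P3",
}


@dataclass(frozen=True)
class Statement:
    """One subject-property-value statement in the Q#/P# style."""
    subject: str
    prop: str
    value: str


def record_to_statements(item_id, record, mapping=FIELD_TO_PROPERTY):
    """Map a flat record to statements; also return fields with no property yet."""
    statements = []
    unmapped = []
    for field_name, value in record.items():
        prop = mapping.get(field_name)
        if prop is None:
            unmapped.append(field_name)  # candidate for a new property
        else:
            statements.append(Statement(item_id, prop, value))
    return statements, unmapped


# Placeholder record: ids and values are invented for the example.
example_record = {
    "author": "Q101",
    "place_of_composition": "Q202",
    "shelfmark": "Example shelfmark",
}

statements, unmapped = record_to_statements("Q1000", example_record)
for s in statements:
    print(s.subject, s.prop, s.value)
print("Fields needing a new property:", unmapped)
```

In practice the mapping table is where most of the intellectual work lies: each of PhiloBiblon's data elements has to be matched to an existing property or justified as a new one.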

In addition, software engineer Josep María Formentí (Barcelona), after evaluating the Wikibase data entry module and report format, will create prototypes of more user-friendly query and data entry screens and report formats.

All of this work will be carried out in collaboration with the twenty members of the PhiloBiblon volunteer academic staff and, we hope, numerous volunteers from the Hispano-medievalist community.


Image: Rueland Frueauf the Elder (1440–1507) The Education of the Infant Christ (1506) Wikimedia Commons.
