On Friday the week before last, we received the news so many working groups had been eagerly awaiting: 4Memory, the consortium for the historical disciplines, will become part of the Nationale Forschungsdateninfrastruktur (NFDI), the German National Research Data Infrastructure.
This is exciting news for FactGrid, just weeks before its fifth birthday. We will be acting as an official repository for historical data in the upcoming NFDI structure. German projects can now make a good case that FactGrid is the optimal platform for their data.
Changing the rules of our present research data management
The German National Research Data Infrastructure aims to bring transparency and sustainability to all research fields, from microbiology to computational linguistics. Some researchers still collect data entirely for themselves in private Word documents and Excel spreadsheets; others work on digital platforms designed more or less like conventional books, to be read and looked at. Both will face new questions in their research grant applications: Do they produce data? Do they correct publicly available data? If so, how do they make sure that others can actually work with their data? The idea that new information ends up in the footnotes of books and articles will no longer convince the funding institutions. A CSV or JSON file sitting on a library server will not do either. Linked Open Data is the only data that is easily reusable; that is what Wikidata has made clear. New platforms are therefore needed, platforms approved by the National Research Data Infrastructure.
The DFG (Deutsche Forschungsgemeinschaft), which pushed the process, has acted wisely. The different research disciplines had to decide how they would respond to its call for action: they had to create or join umbrella organisations in order to submit proposals for further funding. NFDI4Culture was one of the first groups in the German humanities to receive funding; Text+, for all textual studies, was also among the early arrivals, in 2021. The historical disciplines founded the 4Memory consortium, which received the green light in the second round on Friday the 4th. Funding will start in March 2023. The Gotha Research Center is the 4Memory “participant” on behalf of the FactGrid community in this process.
An international resource as part of a national infrastructure?
When we were invited to participate in this process back in 2020, it took us a while to feel comfortable with the idea. At that time we had created a little more than 100,000 items with a handful of participants. Wikimedia Germany was our natural partner. The German National Library was the first major player to collaborate with us in a joint exploration of the Wikibase software. From the beginning, FactGrid had invited international collaboration, with projects from France, the United States, Spain, Hungary, and Switzerland. Could we risk a nationalisation of the platform?
The project partners on FactGrid were open to the idea: taking the step would benefit everyone. The process would open doors to important discussions; we could discuss data standards used worldwide and think about international alliances on this new stage.
Our asset? – Wikibase
Following the NFDI debates, we soon understood why we had been asked to join: we were using Wikibase, the software platform that all members of the nascent consortia were discussing behind the scenes as the very software that could build bridges between the working groups.
- Wikibase invites cooperation. Its data modelling is uniquely flexible.
- Versioning makes every editing step transparent to an unprecedented degree.
- Wikidata demonstrates that seemingly incompatible fields of knowledge can be managed together in a single graph database.
- Getting data out of a Wikibase platform is as easy as putting data into it.
- Wikibase instances can be federated: we can diversify the landscape without having to rely on one single Wikibase instance.
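To make the point about getting data out a little more concrete, here is a minimal sketch, using only the Python standard library, of how a script might request data from a Wikibase query service. It assumes Wikidata's public SPARQL endpoint (https://query.wikidata.org/sparql) and Wikidata's own identifiers (P31 "instance of", Q5 "human"). Other Wikibase instances, FactGrid among them, run their own query services with their own prefixes and property numbers, so the query would need adapting. The function names `build_query_url` and `run_query` are hypothetical helpers for illustration, not part of any Wikibase API.

```python
import json
import urllib.parse
import urllib.request

# Wikidata's public SPARQL endpoint; other Wikibases expose their own.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Five items that are instances of (P31) human (Q5), with English labels.
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by Wikidata's service.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the result rows (requires network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        # Wikimedia services expect a descriptive User-Agent header.
        headers={"User-Agent": "wikibase-sparql-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    # Standard SPARQL JSON results: one dict per row, keyed by variable name.
    return data["results"]["bindings"]
```

Calling `run_query(WIKIDATA_ENDPOINT, QUERY)` would return a list of rows, each a dictionary keyed by the variable names `item` and `itemLabel`. The same few lines work against any Wikibase query service once the endpoint and identifiers are swapped; dedicated libraries such as SPARQLWrapper wrap the same protocol.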
FactGrid was ahead of its time. We were running a functional Wikibase platform while other groups were simply proposing to evaluate the option.
And yet still at the beginning
Over the last two years the database has more than quadrupled, to 457,000 items. FactGrid is doubling in size almost every year, and there is no reason to believe that this will change in the near future. The projects currently preparing data uploads are on the scale of the entire current platform; with our upcoming projects we remain on a global trajectory: we are becoming more international, and the platform is learning new languages.
The NFDI process comes just in time because, despite all that growth, we are still right at the beginning, and in urgent need of technological development, which is where we put the focus in our 2020 and 2021 grant proposals. We are not alone in this situation. Wikidata, our elder sister, is still in its initial phase – a peculiar statement, given the fact that Wikidata is celebrating its 10th birthday these days with more than 100 million database objects.
Wikidata is massive. It has rocked the library world as a revolutionary development, yet it is still an unknown giant hiding somewhere behind the Wikipedia curtain. Nobody speaks of the Pentecost miracle of data technology that Wikidata actually is. Even the project's name has remained obscure: “Wikidata? You mean Wikipedia, don’t you?”
It is understandable that Wikidata has remained virtually unknown. There is no search tool that leads a wider public to Wikidata's information, and the information is hard to read once you have reached it. The SPARQL query service is a nightmare for normal users. Even if you know how to read computer code (which most of us do not), how do you find out what information the database can supply? Only by asking a first specific question that already presupposes knowledge of the content, the very knowledge you do not yet have. One day an internet-savvy user contacted us to report that our Query Service had crashed. The Query Service seemed fine, so I suggested a video call to see what he was seeing on his screen, and it turned out that he was looking at the regular query script. “Send it off, press that blue button!” He did, and received the requested data set. “Ah, I had seen this code stuff but thought it was an error message…”
Wikibase needs two enhancements: an attractive search interface as simple as the Google search box (with an advanced search engine and a SPARQL option on top), and browsing software that generates readable pages from a Wikibase or, better still, from several combined Wikibases. The present Wikibase query engine leads you straight to the item pages in the default Wikibase presentation mode, where you can correct or amplify information manually, but no one seriously enjoys the reading experience. Magnus Manske’s Reasonator, Markus Krötzsch’s SQID, Michael Ringgaard’s KnolBrowser, and Bruno Belhoste’s FactGrid Viewer have shown how Wikibase information can be presented: in pages that are concise, well structured, fast to access, and easy to exploit. So far, however, all four browsers have remained patchwork solutions. They do not amalgamate platform information in any depth, and (this is the larger issue) they are not yet coupled to intelligent search engines. What we still lack are independent new resources whose pages serve as Google landing points, that amalgamate information from various Wikibases such as Wikidata and FactGrid, and that keep their users on the platform: providing in-depth information on request, generating visualisations on the spot, and offering downloads of the information users have accumulated on their tour.
We will get attractive, multilingual Wikibase aggregates. They will integrate information from various resources and offer it in any language requested, identical across all cultural and political divides. The German NFDI will have to create prototypes of such instruments if it is actually to federate Wikibases in a new, broader, research-oriented structure, even if that structure starts out as a national one.
Opportunities and risks
The time for a broader research data infrastructure is ripe. Researchers are still handling “their” data on personal hard disks; they copy and paste dates from Wikipedia pages when they could have complete data sets ready to download. Data correction remains a matter of chance. Do you email the producers of an online catalogue you have been using and ask them to correct a mistake? Do you give the correct date in a footnote of your next article and expect librarians (and Wikipedians) to take note of your work? We need online resources that allow researchers to correct mistakes right on the screen, in real time, and these should be the same resources that researchers use to organise their work. Wikibase is the software that can help make this possible. How will we get there? Wikibases will have to become the scholarly resources people consult first; at that point they will turn into the workbench for the very projects that use their data.
The landscape of NFDI consortia comes with its own internal risks. We will need resources for highly specialised jobs: resources to store and mine texts, resources for the machine-readable information needed to make 3D reproductions of objects, and resources for historical statements. FactGrid is focusing on this last need. It cannot become the all-in-one service for historical research. We need the services of other consortia, and we should offer our particular services to the other consortia wherever they handle historical statements.
The much more delicate risk of fragmentation looms on the international stage: Will the German expert on French history be asked to store her data on a German platform because her funding is German, while the French colleagues with whom she shares her research objects deliver their data into a French database? We could, of course, harvest information from 150 national research data infrastructures, but that would not offer the same experience to those who generate the information. Working on FactGrid, you notice when a colleague in France or China adds to your data. You contact the colleague with a note of delight about the archival sources that had escaped your notice. Wikibases are joint platforms and should be used as such.
The question of a plurality of national research data infrastructures becomes even thornier as soon as we look beyond our privileged horizon. We need global platforms that provide equal access to research and to the debates surrounding it. Wikimedia created Wikidata with the explicit aim of building a software compound on which users from all over the world can work together, accessing and expanding the same pool of global information. We, the international scientific community, the heirs of the international respublica litteraria, should not fall behind the Wikimedia project.
That FactGrid, an explicitly international resource, has entered the NFDI4Memory structure is an interesting development: a chance to get more than one national research data infrastructure on board.
- NFDI4Memory Homepage https://4memory.de/
- NFDI4Culture, Workshop Report, 1st Wikibase Workshop, 23 February 2021 https://nfdi4culture.de/news-events/news/wikibase-workshop.html
Header image source: Robert Charles Dudley (British, 1826–1909) Interior of One of the Tanks on Board the Great Eastern: The [Transatlantic Cable] Cable Passing Out 1865/66, Watercolor over graphite with touches of gouache (bodycolor) https://www.metmuseum.org/art/collection/search/383834