Are our Wikibase QueryServices about to mess up two millennia of historical dates?

It was in February 2019 at a conference dinner of medievalists in Jena when I was first confronted with the calendar problem which Wikibase had been posing ever since it had digested its first Julian calendar dates. I had given a Wikibase demonstration earlier that day and now I was sitting next to a medievalist who was ready to destroy me: “Wikibase”, he stated, “is a genuine disaster without anyone understanding it.”

I demanded to hear why that should be the case, and the man asked me to show him just one medieval date from Wikidata. I took out my phone and landed on a biography from c. 1500.

“See that small print?” he asked, “these dates are all noted as Gregorian before 1584.”

The qualifier was indeed peculiar. Why would they set a Gregorian date before 1582 and then mark it as such? “Well, you do know that the Gregorian calendar was only introduced in 1582, don’t you?!”

Of course I knew. I am an 18th-century person, and Britain had introduced this calendar as late as 1752, by which time the reform had to close a gap of 11 days. But I could also point out that Wikibase allowed a fast correction: “You can easily switch between the calendars, and the machine will actually understand the implications on any timeline.” I showed him my screen.

The man was in agony: “Too late. Wikidata is already in big shit”. I realised that I was lacking the full astronomical background and that I did not know the story of these peculiar Wikidata redactions.

Why we needed the Gregorian calendar in the first place

Both the Julian calendar of 46 BC and the superior Gregorian calendar, first introduced in 1582, are approximations. A solar year is one circle around the sun, whilst the globe is spinning at about 365.2422 revolutions per year; year after year our planet ends its tour with a different slice pointing towards the sun. We are, in fact, slowing down, thanks to the friction which the moon’s gravitation generates in the constant movement of ebb and flow, but that is another story. 365.2422 turns per year is our present spin, more or less exactly, but it is difficult to reproduce in a long-term procedural pattern of constant adaptations.

The Julian calendar, as it was introduced under Julius Caesar in 46 BC, added one day every four years, in the so-called leap years, a rule that boiled down to an additional quarter of a day per year. The approximation of 0.25 days against 0.2422 missed its mark by just 0.0078 days per year, less than a hundredth of a day. Negligible, one might think, but that fraction adds up to a full day in roughly 128 years. In a millennium the discrepancy piles up to 7.8 days, in two millennia to half a month, moving Christmas further and further away from the longest night until we can finally celebrate Christmas and Easter on the same day. That was why the Gregorian calendar was finally introduced in 1582 with its far more complex regime of leap years:

  • add one day every four years (as you did under the Julian calendar to create a year of 365.25 days)
  • omit every leap year that is divisible by 100 to get a lower number
  • let this leap year, however, happen if it is divisible by 400 in order to get a year of 365.2425 days.
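The three rules above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the sketch covers positive year numbers only (no BC years). Averaged over a full 400-year cycle, the two rules give the year lengths of 365.25 and 365.2425 days mentioned in the list.

```python
def is_julian_leap_year(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_gregorian_leap_year(year: int) -> bool:
    """Gregorian rule: every fourth year, but century years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def mean_year_length(is_leap, cycle_years: int = 400) -> float:
    """Average length of a calendar year in days over one full cycle of the rule."""
    leap_years = sum(1 for year in range(1, cycle_years + 1) if is_leap(year))
    return (365 * cycle_years + leap_years) / cycle_years


if __name__ == "__main__":
    # 1700, 1800 and 1900 are leap years under the Julian rule only.
    for year in (1600, 1700, 1800, 1900, 2000):
        print(year, is_julian_leap_year(year), is_gregorian_leap_year(year))
    print(mean_year_length(is_julian_leap_year))     # 365.25
    print(mean_year_length(is_gregorian_leap_year))  # 365.2425
```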

The Gregorian calendar reduced the aberration to a surplus of 0.0003 days per year; it will now take about 3333 years until the calendar has drifted by one full day against the solar year. The Vatican in Rome adopted the calendar in October 1582, jumping overnight from Thursday the 4th to Friday the 15th of that month. Christianity, however, was at that point no longer ready to obey a Papal decree. Eastern Orthodox churches stayed on the Julian calendar right into the 20th century; Protestant territories and realms would decide one by one. Prussia (with its complex ties to Catholic Poland) adopted the new calendar in 1612, whilst most of the other Protestant territories stayed Julian for the next 88 years. The United Kingdom took the step in 1752. Lithuania, Russia, and Greece were to switch as late as 1915, 1918 and 1923 respectively.
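The drift figures quoted above are easy to check. The following is a small sketch under the assumption that the solar year is exactly the 365.2422 days used in this post; the constant and function names are mine. It uses `Decimal` so that the small differences are not blurred by floating-point rounding.

```python
from decimal import Decimal

# Year lengths in days; the solar year is the approximation used in the text.
SOLAR_YEAR = Decimal("365.2422")
JULIAN_YEAR = Decimal("365.25")
GREGORIAN_YEAR = Decimal("365.2425")


def drift_per_year(calendar_year: Decimal) -> Decimal:
    """Days by which the calendar year overshoots the solar year."""
    return calendar_year - SOLAR_YEAR


def years_per_day_of_drift(calendar_year: Decimal) -> Decimal:
    """Number of years it takes until the surplus adds up to one full day."""
    return Decimal(1) / drift_per_year(calendar_year)


if __name__ == "__main__":
    for name, length in (("Julian", JULIAN_YEAR), ("Gregorian", GREGORIAN_YEAR)):
        print(name, drift_per_year(length), int(years_per_day_of_drift(length)))
```

The Julian surplus of 0.0078 days per year amounts to one day in about 128 years and 7.8 days per millennium; the Gregorian surplus of 0.0003 days needs about 3333 years to reach a full day.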

The following map is from reddit:

“When Europe switched from Julian to Gregorian calendar”, posted by u/coneyislandimgur in r/europe.

…and it is far from capturing the full complexity. The following list gives the growing FactGrid table:

Europe was fragmented. Travelling across Germany in 1699, you could date your letters switching back and forth at every customs house on your tour:

Map of the Holy Roman Empire 1648. Wikimedia Commons

How we solved the problem — and created an even bigger one

Wikibase is bright software. Its tools, the QueryService and QuickStatements, are (or were) not immediately that bright, and that caused the mess the medievalist had noted. QuickStatements, the tool for mass imports, simply did not offer a Julian calendar switch before February 2023. Instead it would mark all dates as Gregorian without asking, which, looking back, was not all that bad…

…why could we all live with the erroneous labelling of Julian dates as Gregorian on Wikidata? Because Wikidata was with this negligence basically doing what we all had been doing up to that point.

Johann Sebastian Bach was born on the 21st of March 1685. Germany’s central database, the GND, states this date to this day without the slightest remark on the calendar. The date is Julian because Eisenach’s church register kept its records in the Julian calendar for another 15 years. The composer himself will not have shifted his birthday to the 31st of March in 1700, the year of the great reset. We all ignore the shift and keep copying dates from documents without any interference. Calendar experts might be interested in the “real” day, and they can create Julian/Gregorian calendar matches in those rare cases in which they have to create an exact timeline of events with dates of both calendars.

The Wikidata community had been unaware of the problem. The Gregorian label on all the Julian days was foolish, but the input was actually stabilising our historical tradition as the QueryService will not do anything odd with dates that are entered as Gregorian.

I was far from seeing these advantages after my conversation of 2019, and that was why I warned the PhiloBiblon team in 2022 that QuickStatements would label all their Julian dates as Gregorian, against all better intentions, once they were imported to FactGrid. Charles Faulhaber immediately asked their programmer, Josep Maria Formentí, whether he could take a look into QuickStatements to solve that little problem. Weeks later Josep introduced the /J switch that is now available to mark any date as Julian in mass inputs:

+1751-06-16T00:00:00Z/11/J
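If you prepare a mass import in a script, a string of this shape can be assembled with a small helper. The following is a minimal sketch, not part of QuickStatements itself: the function name `quickstatements_time` is hypothetical, it handles day-precision dates with positive year numbers only, and it assumes that precision 11 stands for "day", as in the example above.

```python
DAY_PRECISION = 11  # Wikibase precision code for a date that is exact to the day


def quickstatements_time(year: int, month: int, day: int, julian: bool = False) -> str:
    """Build a day-precision time value for a QuickStatements batch.

    The trailing "/J" marks the date as Julian; without it the date is
    imported as Gregorian.
    """
    value = f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z/{DAY_PRECISION}"
    if julian:
        value += "/J"
    return value


if __name__ == "__main__":
    print(quickstatements_time(1751, 6, 16, julian=True))
    print(quickstatements_time(1751, 6, 16))
```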

You can now feed thousands of medieval or early-18th-century British dates into your Wikibase and your machine will present all these dates in timelines in perfect synchrony with Gregorian dates. This is extremely nice if you are editing a correspondence whose partners were signing their letters under various calendars. Your machine will give you the exchange of letters in their true course.

So why the alarm?

Wikibase is intelligent software; it brings objectivity into your statements. Feed a Julian date into your Wikibase and that day will be noted as Julian on the Wikibase itself.

Things get messy wherever we retrieve Julian dates from the QueryService, since this is where the production of funny (and eventually erroneous) dates will begin. The QueryService will convert all Julian dates into mathematically correct Gregorian dates.

Martin Luther is known to have died on the 18th of February 1546. That this is a Julian date need not be stated, and our Wikibase gives the date without any calendar stamp on it. But ask the QueryService for Luther’s date of death and it will tell you that the church reformer actually died on the 28th of February, 10 days later: a Gregorian calendar date, given without any indication of the calendar (no big issue, you might think, now that you know).
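To make the conversion tangible, here is a sketch of how a Julian date can be turned into its Gregorian equivalent. It goes through the Julian Day Number, a continuous count of days, using the widely published integer formulas. I do not know how the QueryService computes its values internally, so this only illustrates the arithmetic; the function names are mine.

```python
def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    """Gregorian calendar date (year, month, day) of a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097          # completed 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461            # completed years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153             # month index, counted from March
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Same day, relabelled from the Julian to the Gregorian calendar."""
    return jdn_to_gregorian(julian_calendar_to_jdn(year, month, day))


if __name__ == "__main__":
    print(julian_to_gregorian(1546, 2, 18))  # Luther's death: (1546, 2, 28)
    print(julian_to_gregorian(1685, 3, 21))  # Bach's birth: (1685, 3, 31)
```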

And now think of the masses of data which we will be moving between Wikibases in the brave new world of “federated Wikibases”. If there are “Julian” dates among them, then these will get secret additional days wherever they are extracted with the help of a regular SPARQL query on the QueryService.

This is what will happen to Luther’s date of death now that it can no longer be copied safely. We will see it in an increasing number of variants, namely as:

  • 18 February 1546 (Greg.) — mistaken QuickStatements input artefact
  • 18 February 1546 (Jul.) — the historically correct date
  • 28 February 1546 (Greg.) — unorthodox but correct Wikibase QueryService output
  • 28 February 1546 (Jul.) — Wikibase output mistakenly saved as Julian
  • 10 March 1546 — the previous converted to Gregorian
  • 20 March 1546 — the previous after the next im- and export

and so on and so on.
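The chain of variants above can be reproduced with a short simulation. It is a sketch of the failure mode only: the helper `careless_round_trips` is hypothetical and models an export that converts correctly to Gregorian, followed by an import that stores the result as Julian again. The conversion goes through the Julian Day Number, using the widely published integer formulas.

```python
def julian_to_gregorian(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Same day, relabelled from the Julian to the Gregorian calendar."""
    # Julian calendar date -> Julian Day Number
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Julian Day Number -> Gregorian calendar date
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    return (100 * b + d - 4800 + m // 10,
            m + 3 - 12 * (m // 10),
            e - (153 * m + 2) // 5 + 1)


def careless_round_trips(julian_date: tuple[int, int, int],
                         trips: int) -> list[tuple[int, int, int]]:
    """Export as Gregorian, re-import as Julian, and repeat `trips` times."""
    history = [julian_date]
    current = julian_date
    for _ in range(trips):
        current = julian_to_gregorian(*current)  # the export itself is correct
        history.append(current)                  # the import mislabels it as Julian
    return history


if __name__ == "__main__":
    for year, month, day in careless_round_trips((1546, 2, 18), 3):
        print(f"{year:04d}-{month:02d}-{day:02d}")
```

Each careless round trip adds another ten days to a 16th-century date; for dates from the 18th century it would be eleven.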

Can we stop the wave of uncontrolled additions of days on Julian calendar dates?

I am not quite sure how. We need a QueryService that will never ever offer a historical date without the corresponding calendar statement (now that we have a machine that does both calendars).

But the QueryService is not the only problem here. Our Wikibases should have a third option, because our documentary evidence usually lacks calendar information. Eisenach’s church register of 1685 uses the Julian calendar (without further notice); that is something we can determine. But we cannot say what calendar the author of a typical letter was using in 1685 if that date comes without a localisation. Our documents do not tend to have calendar statements on them.

What we need here is a third option: “calendar format unknown”. It’s complicated, I am afraid.
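What such a three-way model could look like is sketched below. All of it is an assumption for the sake of illustration: Wikibase does not offer an "unknown" calendar today, the class and function names are mine, and the sketch presumes that a time value carries a `time` string and a `calendarmodel` URI pointing to the Wikidata items Q1985727 (Gregorian) and Q1985786 (Julian). The point is merely that a date is never printed without its calendar.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Calendar(Enum):
    GREGORIAN = "Greg."
    JULIAN = "Jul."
    UNKNOWN = "calendar unknown"  # hypothetical third option


# Assumed calendar model URIs as they appear in Wikibase time values.
CALENDAR_MODELS = {
    "http://www.wikidata.org/entity/Q1985727": Calendar.GREGORIAN,
    "http://www.wikidata.org/entity/Q1985786": Calendar.JULIAN,
}


@dataclass(frozen=True)
class HistoricalDate:
    year: int
    month: int
    day: int
    calendar: Calendar = Calendar.UNKNOWN

    def __str__(self) -> str:
        # The calendar is always part of the output, never silently dropped.
        return f"{self.year:04d}-{self.month:02d}-{self.day:02d} ({self.calendar.value})"


def from_time_value(value: dict) -> HistoricalDate:
    """Read a day-precision time value; a missing calendar model becomes UNKNOWN."""
    match = re.match(r"([+-]?\d+)-(\d{2})-(\d{2})T", value["time"])
    if match is None:
        raise ValueError(f"unexpected time string: {value['time']!r}")
    year, month, day = (int(part) for part in match.groups())
    calendar = CALENDAR_MODELS.get(value.get("calendarmodel"), Calendar.UNKNOWN)
    return HistoricalDate(year, month, day, calendar)


if __name__ == "__main__":
    luther = {"time": "+1546-02-18T00:00:00Z",
              "calendarmodel": "http://www.wikidata.org/entity/Q1985786"}
    letter = {"time": "+1685-03-21T00:00:00Z"}
    print(from_time_value(luther))
    print(from_time_value(letter))
```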

Links and more

  • Header Image from Ολυμπία δώματα, or, An almanack for the year of our Lord God 1752 (London: Printed by T. Parker, for the Company of Stationers, 1752), from the digitisation at Archive.org
  • English Wikipedia List of adoption dates of the Gregorian calendar by country https://en.wikipedia.org/
  • See also: Maniphest T207705, Implement the Extended Date/Time Format Specification, https://phabricator.wikimedia.org/T207705
  • Lydia Pintscher, calendar model screwup, 30 Jun 2015. https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/thread/Y7OEHUYV66DHRVZ6JCSODWAYZ25SLUHM/
  • Julian and Gregorian dates from Wikidata, question asked on https://opendata.stackexchange.com/, Apr 18, 2018 at 0:33. https://opendata.stackexchange.com/questions/12723/julian-and-gregorian-dates-from-wikidata

One Reply to “Are our Wikibase QueryServices about to mess up two millennia of historical dates?”

  1. Thank you for this writeup. As a Wikidata contributor, I’ve encountered the calendar option but haven’t had much opportunity to use it in the main Wikidata UI yet. There really should be a more straightforward way to convert timestamps to an alternative calendar within a SPARQL query, though I don’t know that the built-in XML Schema data types should really be involved in automatic conversions. The Julian–Gregorian conversion is straightforward compared to some other calendars, some of which aren’t even naturally represented by a lexicographically ordered format mimicking ISO 8601. This seems like a job for custom SPARQL functions, or something more sophisticated involving the items that Wikidatans are creating about individual days of the year in various calendars (https://www.wikidata.org/wiki/Wikidata:WikiProject_Hijri_Calendar ).

    The “stabilizer” property in https://blog.factgrid.de/archives/3541 reminds me of the solution that the OpenHistoricalMap project (which I also contribute to) has adopted for soft-introducing EDTF into the software stack. EDTF parsing is still an unsolved problem for OHM, so we’ve created a parallel set of properties like https://wiki.openstreetmap.org/wiki/Key:start_date:edtf that can annotate the “resolved” YYYY-MM-DD dates with a more flexible EDTF date.

    Speaking of OpenHistoricalMap, I wonder if it could help to mitigate the uncertainty around whether a given source used Julian or Gregorian dates. To the extent that these decisions were based on an event’s location or a source’s publication place, one could consider the historical boundaries in OHM, which are linked to Wikidata items, thence to FactGrid items, which can have statements about a jurisdiction’s adoption of the Gregorian calendar. It’s a lot of indirection but possibly more scalable than relying on each individual date to be properly annotated.
