Using

From ottgaz.org
Latest revision as of 16:42, 13 July 2023

There are three primary uses for Ottgaz at present.

Reference and disambiguation

Ottgaz provides persistent unique identifiers for Ottoman places. This is especially helpful when placenames are ambiguous, for instance when the same name is used for several places. For example, there are dozens of Ottoman and Turkish places named Yenice. Although we possess various lists of these places (English Wikipedia, Turkish Wikipedia), we have no consistent system for disambiguating them. By adding an Ottgaz QID to a written text, you can specify exactly which place you are discussing. For example: "He was sent to Yenice, south of Bursa (Q7472)."
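
As a rough illustration of how such inline identifiers could be resolved by a script, the sketch below pulls QIDs out of a sentence and turns each into an entity address. It assumes that entity IRIs are formed by appending the QID to the "site_iri" given in the manifest further down this page; the function name is made up for this example.

```python
import re

# Assumption: entity IRIs follow the "site_iri" value in the OpenRefine manifest.
ENTITY_BASE = "https://ottgaz.org/entity/"

# A QID is the letter Q followed by digits, e.g. Q7472.
QID_PATTERN = re.compile(r"\bQ[1-9]\d*\b")


def qids_to_iris(text: str) -> dict:
    """Map every QID mentioned in `text` to its (assumed) Ottgaz entity IRI."""
    return {qid: ENTITY_BASE + qid for qid in QID_PATTERN.findall(text)}


sentence = "He was sent to Yenice, south of Bursa (Q7472)."
print(qids_to_iris(sentence))
```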

Quantitative analysis

Something

How to

OpenRefine reconciliation

I used OpenRefine to prepare most of the data I've added to Ottgaz. I will not describe parsing lists (such as Sezen) into OpenRefine here, though I have limited documentation of this process elsewhere. This section assumes that you know how to use OpenRefine and Docker, and describes the tools and process for reconciling and uploading data.

Run a local reconciliation interface

Download the OpenRefine Wikibase package, which you can run on Docker. The documentation provided with the package is useful.

Add Wikibase instance

Use this manifest to add Ottgaz to OpenRefine.

{
  "version": "2.0",
  "mediawiki": {
    "name": "Ottgaz",
    "root": "https://ottgaz.org/wiki/",
    "main_page": "https://ottgaz.org/wiki/Main_Page",
    "api": "https://ottgaz.org/w/api.php"
  },
  "wikibase": {
    "site_iri": "https://ottgaz.org/entity/",
    "maxlag": 5,
    "max_edits_per_minute": 60,
    "tag": "openrefine-${version}",
    "properties": {
      "instance_of": "P1",
      "subclass_of": "P2"
    },
    "constraints": {
      "property_constraint_pid": "P2302",
      "exception_to_constraint_pid": "P2303",
      "constraint_status_pid": "P2316",
      "mandatory_constraint_qid": "Q21502408",
      "suggestion_constraint_qid": "Q62026391",
      "distinct_values_constraint_qid": "Q21502410"
    }
  },
  "oauth": {
    "registration_page": "https://ottgaz.org/wiki/Special:OAuthConsumerRegistration/propose"
  },
  "entity_types": {
    "item": {
      "site_iri": "https://ottgaz.org/entity/",
      "reconciliation_endpoint": "http://localhost:8000/${lang}/api",
      "mediawiki_api": "https://ottgaz.org/w/api.php"
    },
    "property": {
      "site_iri": "https://ottgaz.org/entity/",
      "mediawiki_api": "https://ottgaz.org/w/api.php"
    },
    "mediainfo": {
      "site_iri": "https://ottgaz.org/entity/",
      "reconciliation_endpoint": "http://localhost:8000/${lang}/api"
    }
  },
  "editgroups": {
    "url_schema": "([[:toollabs:editgroups-commons/b/OR/${batch_id}|details]])"
  }
}
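
Before pasting the manifest into OpenRefine, it can help to confirm that it parses as JSON and still carries the fields this page relies on. The sketch below does that on a trimmed copy of the manifest. The choice of fields to check is my own assumption, not a list of what OpenRefine formally requires, and check_manifest is a made-up helper name.

```python
import json

# Trimmed copy of the manifest above; only the fields checked here are kept.
MANIFEST_TEXT = """
{
  "version": "2.0",
  "mediawiki": {"api": "https://ottgaz.org/w/api.php"},
  "wikibase": {"site_iri": "https://ottgaz.org/entity/"},
  "entity_types": {
    "item": {
      "site_iri": "https://ottgaz.org/entity/",
      "reconciliation_endpoint": "http://localhost:8000/${lang}/api"
    }
  }
}
"""

# Assumption: these are the fields worth checking, chosen for this sketch.
EXPECTED_PATHS = (
    ("mediawiki", "api"),
    ("wikibase", "site_iri"),
    ("entity_types", "item", "reconciliation_endpoint"),
)


def check_manifest(text: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    manifest = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    problems = []
    for path in EXPECTED_PATHS:
        node = manifest
        for key in path:
            if not isinstance(node, dict) or key not in node:
                problems.append("missing: " + ".".join(path))
                break
            node = node[key]
    endpoint = (
        manifest.get("entity_types", {})
        .get("item", {})
        .get("reconciliation_endpoint", "")
    )
    # The manifest uses ${lang} so OpenRefine can substitute a language code.
    if "${lang}" not in endpoint:
        problems.append("reconciliation_endpoint has no ${lang} placeholder")
    return problems


print(check_manifest(MANIFEST_TEXT))
```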

Begin reconciling

I used the head placenames in Sezen as the key index to create new items in Ottgaz. (You will have to "add standard service" using the http://localhost:8000/en/api URL.)
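
For readers who want to test the local service outside OpenRefine, here is a minimal sketch of a batch query body in the shape used by the Reconciliation Service API (a form field named "queries" holding a JSON object). The placenames and the limit are illustrative, and the network call is left commented out because it only works while the Docker service is running.

```python
import json
from urllib.parse import parse_qs, urlencode

# Endpoint from the manifest, with ${lang} filled in as "en".
ENDPOINT = "http://localhost:8000/en/api"


def build_reconciliation_request(names, limit=3):
    """Build the form-encoded body for a batch reconciliation query."""
    queries = {
        f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


body = build_reconciliation_request(["Yenice", "Bursa"])

# Decode the body again to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded)

# To send it (requires the local reconciliation service to be running):
# import urllib.request
# with urllib.request.urlopen(ENDPOINT, data=body.encode("utf-8")) as resp:
#     print(json.load(resp))
```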

Organize schema

This is the most complicated step. In order to proceed, I first had to create an item for every status that Sezen used, and an item for every hierarchy that Sezen used. I did this without clustering or removing redundant entries; I will merge these later from within Wikibase, once the whole dataset is in Wikibase.

I needed to do a lot of parsing as I moved items into Wikibase. The biggest distinction was between seat and region. Mostly, this distinction was obvious. In many cases, however, Sezen's classification is ambiguous, and this probably reflects the ambiguity of Ottoman space itself.

Dealing with dates

Wikidata has a rich set of date formats, many of which are described at https://www.wikidata.org/wiki/Help:Dates. Ottgaz poses certain challenges even to this diverse body of representations, however, despite its aspiration to universality. It's critically important to include as many dates as possible, because the presence of dates enables much richer querying. Therefore I've attempted to include as much information as possible in date format, qualifying it using plain text notes where necessary.

Wikidata date formats specify precision. For the most part, Ottgaz dates are precise to the year. Sometimes, however, Sezen uses century precision. To encode "16. yy." (sixteenth century), use 1600C.
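
A small helper can apply this convention in bulk before the column reaches the schema. The sketch below is only an assumption-laden illustration: it generalizes the single "16. yy." example to "N. yy." becoming N times 100 followed by C, passes plain years through unchanged, and returns None for anything else so that it can be handled by a plain text note. The function name and patterns are mine, not part of OpenRefine.

```python
import re
from typing import Optional

# Assumption: Sezen's century notation looks like "16. yy." (yy. = yüzyıl, "century").
CENTURY = re.compile(r"^\s*(\d{1,2})\.\s*yy\.?\s*$", re.IGNORECASE)
# Assumption: a bare three- or four-digit number is a year.
YEAR = re.compile(r"^\s*(\d{3,4})\s*$")


def to_openrefine_date(raw: str) -> Optional[str]:
    """Convert a Sezen-style date string to the form used on this page.

    Returns None when the string fits neither pattern.
    """
    century = CENTURY.match(raw)
    if century:
        # Generalizes the one documented example: "16. yy." -> "1600C".
        return f"{int(century.group(1)) * 100}C"
    year = YEAR.match(raw)
    if year:
        return year.group(1)
    return None


for sample in ["16. yy.", "1523", "ca. 1500"]:
    print(sample, "->", to_openrefine_date(sample))
```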

Part of the problem is that Sezen is not clear about the source(s) of his dating, and himself uses multiple formats. In an earlier version of Ottgaz, I attempted to map dates onto the reigns of Sultans, using the Periodo tool (https://perio.do/). I describe that effort on the page "Periodizing the Ottoman Gazetteer".

Wikidata distinguishes Julian and Gregorian dates, and defaults to Julian for earlier periods. It's not clear which calendar Sezen used. My uploading of data has not been absolutely consistent. This is an area for future revision. Other calendars could and should be used in order to convey the information in its native format.
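
To make the calendar question concrete, the sketch below builds the JSON time value that the Wikibase data model uses, once with the Julian and once with the Gregorian calendar model. The two calendar IRIs are the Wikidata items for the proleptic calendars; that Ottgaz's Wikibase accepts the same IRIs is an assumption, and year_value is a made-up helper.

```python
# Calendar models as identified in Wikidata's time values.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar
JULIAN = "http://www.wikidata.org/entity/Q1985786"     # proleptic Julian calendar

# Precision codes from the Wikibase time data model.
PRECISION_CENTURY = 7
PRECISION_YEAR = 9


def year_value(year: int, calendar: str = GREGORIAN,
               precision: int = PRECISION_YEAR) -> dict:
    """Build a Wikibase time value for a whole year (month and day left as 00)."""
    sign = "+" if year >= 0 else "-"
    return {
        "time": f"{sign}{abs(year):04d}-00-00T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": precision,
        "calendarmodel": calendar,
    }


# The same year string, recorded under each calendar model.
print(year_value(1523, JULIAN))
print(year_value(1523, GREGORIAN))
# A century-precision value, as in the "16. yy." case.
print(year_value(1600, precision=PRECISION_CENTURY))
```

The two year values differ only in "calendarmodel", which is why inconsistent uploads are easy to miss: the displayed year looks the same either way.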

Sequence of property/value pairs

QuickStatements does not handle successive qualifiers well. I describe the problem at https://www.wikidata.org/wiki/Wikidata_talk:Tools/OpenRefine#Adding_a_sequence_of_identical_property/value_pairs_with_different_qualifiers. The solution is to configure statements in the schema dialogue thus: Editing mode: Add or merge; Matching strategy: Property, value and qualifiers. Then, "upload statements to Wikibase" rather than export to QuickStatements.
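
The effect of the matching strategy can be pictured with a toy model. The sketch below is not OpenRefine's implementation; it only imitates the idea that statements sharing a matching key are merged into one. The identifiers P10, Q50 and P20 are placeholders, not real Ottgaz properties or items.

```python
def merge_statements(incoming, match_on):
    """Merge statements that share a matching key, pooling their qualifiers."""
    merged = {}
    for statement in incoming:
        key = match_on(statement)
        if key in merged:
            # Same key: fold the new qualifiers into the existing statement.
            for qualifier in statement["qualifiers"]:
                if qualifier not in merged[key]["qualifiers"]:
                    merged[key]["qualifiers"].append(qualifier)
        else:
            merged[key] = {**statement, "qualifiers": list(statement["qualifiers"])}
    return list(merged.values())


def by_property_value(statement):
    return (statement["property"], statement["value"])


def by_property_value_qualifiers(statement):
    return (statement["property"], statement["value"], tuple(statement["qualifiers"]))


# Hypothetical rows: the same property/value pair, qualified by two different dates.
rows = [
    {"property": "P10", "value": "Q50", "qualifiers": [("P20", "1520")]},
    {"property": "P10", "value": "Q50", "qualifiers": [("P20", "1574")]},
]

loose = merge_statements(rows, by_property_value)
strict = merge_statements(rows, by_property_value_qualifiers)
print(len(loose), "statement(s) when matching on property and value")
print(len(strict), "statement(s) when matching on property, value and qualifiers")
```

In this model, matching on property and value alone collapses the two rows into a single statement carrying both dates, while including the qualifiers in the key keeps them as two separate statements.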