Thursday 21 May 2015

Semantic Web for the Working Ontologist: chapter 7

I am now much, much further into a technical book than I've ever been before in my life. That feels like an achievement: even if I don't come away from this with any more than a fragmentary grasp of all things RDF and beyond…

So – this week's chapter is concerned with RDF Schema. It is all about sets which, somehow, feels more comfortable and intuitive than all the RDF graphs, even though I've forgotten all the mathematical set-related signs (a very long time ago I did a pure maths with stats A level very badly). It explores the following questions. "Which individuals are related to one another and how? How are the properties we use to define our individuals related to other sets of individuals and, indeed to one another?" And it answers the question of how to express those relationships in a way that allows inferred triples to be constructed from asserted triples.

And "schema"? The term schema was, according to Oxford Dictionaries, coined in the 18th century. It's derived from the Greek "skhēma … [meaning] …'form, figure'", and the definitions the dictionary provides are
1. technical A representation of a plan or theory in the form of an outline or model:
2. Logic A syllogistic figure.
3. (In Kantian philosophy) a conception of what is common to all members of a class; a general or essential type or form. 

Or, in RDFS, according to Allemang and Hendler, "The schema is information about the data." It is information about information. "The key idea of the schema in RDF", they continue, "is that it should help provide some sense of meaning to the data."

It does this "by specifying semantics using inference patterns". Which means that it expresses relationships in triples, using defined terms. So "the basic construct for specifying a set in RDFS is called an rdfs:Class". For a subject, say :FloweringPlant, you'd have the predicate rdf:type and the object rdfs:Class. There's also rdfs:subClassOf, and, importantly – because it covers the verbs (properties) as well as the nouns (classes) – rdfs:subPropertyOf.
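
To make the "inference pattern" idea concrete, here is a minimal sketch in plain Python, with triples held as tuples rather than in a real RDF library. Only :FloweringPlant comes from the text above; :Rose and :myRose are hypothetical names made up for illustration, and the function applies just the one subclass rule, not everything RDFS defines.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
asserted = {
    (":FloweringPlant", "rdf:type", "rdfs:Class"),
    (":Rose", "rdf:type", "rdfs:Class"),              # hypothetical class
    (":Rose", "rdfs:subClassOf", ":FloweringPlant"),
    (":myRose", "rdf:type", ":Rose"),                 # hypothetical individual
}


def infer_types(triples):
    """Apply the subclass rule until nothing new appears:
    if (A rdfs:subClassOf B) and (x rdf:type A) then (x rdf:type B)."""
    inferred = set(triples)
    while True:
        new = {
            (x, "rdf:type", b)
            for (a, p, b) in inferred if p == "rdfs:subClassOf"
            for (x, q, a2) in inferred if q == "rdf:type" and a2 == a
        }
        if new <= inferred:   # nothing new was produced, so we are done
            return inferred
        inferred |= new


result = infer_types(asserted)
# The membership of :myRose in :FloweringPlant was never asserted, only inferred.
print((":myRose", "rdf:type", ":FloweringPlant") in result)
```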

"In general, rdfs:subPropertyOf allows a modeler to describe a hierarchy of related properties". The more specific a property, the lower down the hierarchy it sits; the more general, the higher up. So, "whenever any property in the tree holds between two entities, so does every property above it". In other words, if entity 1 is the subproperty of entity 2 above it, it must also be a subproperty of entities 3 - n above entity 2. And we only need to assert the first relationship in a triple for the rest to be inferred.

"RDFS," say the authors, "'extends' RDF by introducing a set of distinguished resources into the language." I'm assuming that they're using "distinguished" to mean "distinct", rather than either "Very successful, authoritative, and commanding great respect" or "Dignified and noble in appearance or manner" (Oxford Dictionaries again). But what do I know?



Apparently this is an image of "Distinguished Gentleman's Ride London". Glossing over any issues with the word "gentleman" for the purposes of this caption, it's possible there may be something I don't understand about the word "distinguished".

Meanwhile, back in the book I'm supposed to be getting my head round, the authors are introducing the other key concepts for this chapter: rdfs:domain and rdfs:range. These describe a property "that determines class membership of individuals related by that property".

In the margin of p130 (for this, dear reader, is how far we have come), I've scrawled in big pencilled caps REREAD THIS, as I evidently failed to understand it first time round. This time it seems fairly clear. Essentially the use of the terms "domain" and "range" is inspired by their use in maths, where "the domain of a function is the set of values for which it is defined, and the range is the set of values it can take." In RDFS, "A property P can have an rdfs:domain and/or an rdfs:range." And the two terms provide information on how a "P" is to be used: "domain refers to the subject of any triple that uses P as its predicate, and range refers to the object of any such triple".
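
Here is a small sketch of what that means for inference, again in plain Python with triples as tuples. The property :wrote, the classes :Author and :Book, and the two individuals are hypothetical names chosen for illustration. The point it tries to show is that domain and range do not check the data; they let class membership be inferred for the subject and object.

```python
asserted = {
    (":wrote", "rdfs:domain", ":Author"),   # subjects of :wrote are Authors
    (":wrote", "rdfs:range", ":Book"),      # objects of :wrote are Books
    (":allemang", ":wrote", ":workingOntologist"),
}


def infer_domain_range(triples):
    """One pass of the two rules:
    if (P rdfs:domain D) and (x P y) then (x rdf:type D)
    if (P rdfs:range R)  and (x P y) then (y rdf:type R)"""
    inferred = set(triples)
    for (p, kind, cls) in triples:
        if kind not in ("rdfs:domain", "rdfs:range"):
            continue
        for (s, p2, o) in triples:
            if p2 != p:
                continue
            # Domain types the subject; range types the object.
            member = s if kind == "rdfs:domain" else o
            inferred.add((member, "rdf:type", cls))
    return inferred


typed = infer_domain_range(asserted)
print((":allemang", "rdf:type", ":Author") in typed)
print((":workingOntologist", "rdf:type", ":Book") in typed)
```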

EDIT 28/05/2015. After reading this blog, Paul Rissen provided the following, much improved explanation for domain and range




It's also significant that there's no way in RDF to say that something isn't a member of a particular class: this means that "there is no notion of an incorrect or inconsistent inference". So modelers have to be careful in defining set relationships.

Most of the rest of this chapter is concerned with applying these concepts and terms. It shows, for instance, how you can relate sets by using triples to define them as subsets of one another: doing this both ways ensures that an item defined as a member of one will automatically be considered a member of the other. Or, if the sets are related but not hierarchically, you can create another set that sits higher up the hierarchy and that the original two sets are both subsets of.
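
Both patterns can be tried out with the subclass rule alone. This is a sketch under assumptions: the class names :Analyst, :Researcher and :Investigator and the individuals :wenger and :kim are hypothetical, and the helper only implements the one rdfs:subClassOf rule.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def infer_types(triples):
    """If (A rdfs:subClassOf B) and (x rdf:type A) then (x rdf:type B),
    repeated until no new triples appear."""
    inferred = set(triples)
    while True:
        new = {
            (x, TYPE, b)
            for (a, p, b) in inferred if p == SUBCLASS
            for (x, q, a2) in inferred if q == TYPE and a2 == a
        }
        if new <= inferred:
            return inferred
        inferred |= new


members = {(":wenger", TYPE, ":Analyst"), (":kim", TYPE, ":Researcher")}

# Pattern 1: each set is declared a subset of the other,
# so a member of either one becomes a member of both.
both_ways = infer_types(members | {
    (":Analyst", SUBCLASS, ":Researcher"),
    (":Researcher", SUBCLASS, ":Analyst"),
})

# Pattern 2: the sets are related but not hierarchically,
# so both are placed under a new, more general set.
common_parent = infer_types(members | {
    (":Analyst", SUBCLASS, ":Investigator"),
    (":Researcher", SUBCLASS, ":Investigator"),
})

print((":wenger", TYPE, ":Researcher") in both_ways)
print((":wenger", TYPE, ":Researcher") in common_parent)
```

In the second pattern :wenger and :kim both end up in :Investigator, but neither is pulled into the other's original set.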

Conceptually it's comparatively easy – applying it to IRL situations, I imagine, requires a lot of careful thought.

Next week… RDFS-Plus. So I guess that'll be like this week, only more so…



