Thursday 23 April 2015

Semantic Web for the Working Ontologist: chapter 3

Now I'm onto "RDF—The Basis of the Semantic Web"

It's when it starts getting down to this sort of nitty-gritty stuff and I have to internalise new meanings for familiar (or semi-familiar) words that I feel in need of intellectual flotation aids. But I'm gonna dive straight in and hope to avoid a belly flop and every other variety of lousy extended metaphor.

Chapter 3 starts with an explanation of what RDF – the Resource Description Framework – is for, which is "managing distributed data", so that anyone is able not only "to make a statement about any entity", but also to "specify any property of an entity". An "entity" is known, in this context, as a resource.

RDF is designed to enable that to happen by setting rules for how statements about an entity are expressed. Each statement has three parts, together called a "triple", and follows the form subject – predicate – object, where the predicate is the property linking subject and object. This framework allows for numerous ways to describe a resource/entity.
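To make the subject – predicate – object shape concrete, here is a minimal Python sketch. It is only an illustration: the plain strings stand in for real identifiers, and the names Triple and describe are made up for this example rather than taken from the book.

```python
from collections import namedtuple

# A triple is just three slots: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Illustrative statements (plain strings here, not real URIs).
statements = [
    Triple("Shakespeare", "wrote", "King Lear"),
    Triple("Shakespeare", "livedIn", "Stratford"),
    Triple("Anne Hathaway", "married", "Shakespeare"),
]

def describe(resource, triples):
    """Return every (predicate, object) pair stated about a resource."""
    return [(t.predicate, t.object) for t in triples if t.subject == resource]

about_shakespeare = describe("Shakespeare", statements)
```

Because every statement has the same three-part shape, anyone can add another triple about the same resource without touching the ones already there.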

But this practice is only effective if entities have an agreed Uniform Resource Identifier (URI): "(which specifies things like server name, protocol, port number, file name etc) to locate a file (or a location in a file) on the Web". This "provides a global identification for a resource that is common across the web".

Ok - think I'm still hanging on in here. URIs tend to be long, so conventions have evolved for abbreviating them in print. The one the authors are using is called qnames.

Qnames have two parts - a "namespace" and an "identifier" (written Namespace:Identifier). The namespace is shorthand for the long first part of the URI, so related identifiers tend to share one; the identifier specifies the entity's, well, identity. Each part of each triple has its own qname, so that the different properties of a resource can be combined in different ways. So, in the RDF table the authors provide titled "Geographical Information as Qnames", one row is

geo:Scotland     geo:PartOf   geo:UK

And, in the following table, titled "Triples Referring to URIs with a Variety of Namespaces", we find entries including

lit:Shakespeare          lit:wrote          lit:KingLear
bio:AnneHathaway   bio:married     lit:Shakespeare

and

lit:Shakespeare         bio:livedIn     geo:Stratford

For these to work, there needs to be a range of standardised namespaces, and W3C has specified some of them. Incidentally, it's a complete coincidence that all these references to Shakespeare are appearing in a blog published on his birthday.

The authors discuss one – rdf:type – at some length. "rdf" indicates that "type" (which translates into human English as "is an instance of") is an identifier used in RDF (resource description framework), rather than in RDFS (the RDF Schema language) or OWL (the Web Ontology Language).

I managed to follow that. But now, we're on to "higher-order relationships", where, for instance, we want to say that someone says something about something. The example the authors give is "Wikipedia says Shakespeare wrote Hamlet".

While we can, as the authors show, express the statement "Shakespeare wrote Hamlet in 1601" in three triples:

bio:n1    bio:author    lit:Shakespeare .
bio:n1    bio:title    "Hamlet" .
bio:n1    bio:publicationDate    1601 .

saying that Wikipedia says Shakespeare wrote Hamlet requires a whole other layer of information, rendered here as

q:n1    rdf:subject    lit:Shakespeare ;
        rdf:predicate    lit:wrote ;
        rdf:object    lit:Hamlet .

web:Wikipedia    m:says    q:n1 .

The authors suggest readers notice that the "reification triple" doesn't necessarily mean that Shakespeare did write Hamlet, just that Wikipedia says he did.
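The difference between describing a statement and asserting it can be sketched in a few lines of Python. This is a simplification under stated assumptions: real RDF stores the rdf:subject, rdf:predicate and rdf:object lines as ordinary triples too, whereas here they sit in a separate dictionary (reified) purely to make the distinction visible. The function name claimed_by is hypothetical.

```python
# Triples the dataset itself asserts.
asserted = set()

# A reified statement: the node q:n1 describes a triple without asserting it.
reified = {
    "q:n1": {
        "rdf:subject": "lit:Shakespeare",
        "rdf:predicate": "lit:wrote",
        "rdf:object": "lit:Hamlet",
    }
}

# What we do assert: Wikipedia says the statement labelled q:n1.
asserted.add(("web:Wikipedia", "m:says", "q:n1"))

def claimed_by(source, triples, statements):
    """Return the (s, p, o) triples a source is recorded as saying."""
    claims = []
    for s, p, o in sorted(triples):
        if s == source and p == "m:says" and o in statements:
            node = statements[o]
            claims.append(
                (node["rdf:subject"], node["rdf:predicate"], node["rdf:object"])
            )
    return claims

wikipedia_claims = claimed_by("web:Wikipedia", asserted, reified)

# The claim is on record, but the dataset never asserts it directly.
shakespeare_wrote_hamlet = ("lit:Shakespeare", "lit:wrote", "lit:Hamlet") in asserted
```

Here wikipedia_claims holds the Shakespeare-wrote-Hamlet triple while shakespeare_wrote_hamlet stays False, which is the point the authors ask readers to notice.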

What I noticed is that I am being gently introduced to some coding conventions. Things like q:n1, that are dropped in without explanation. And the space before the ".". This is explained a page or so later, during the discussion of "alternatives for serialization", including N-Triples – which refer "to resources using their fully unabbreviated URIs", making them difficult to print on paper – and Turtle: the method used in the rest of the book.

Turtle uses qnames, so before using it to express triples, each (local) qname needs to be linked to its (global) URI, using the form

"@prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#"

Although I'm slightly confused, as an earlier illustration in the book gives this example:

"@prefix mfg: <http://www.WorkingOntologist.com/Examples/Chapter3/Manufacturing#>" (the link doesn't work, btw).

So I'm not clear whether we need the < & > as in HTML coding, or not… Anyone care to enlighten me?
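For what it's worth, standard Turtle writes a prefix declaration as @prefix, with the URI inside angle brackets and a closing full stop. The Python sketch below shows what that declaration buys you: a lookup table for turning a short qname back into its full URI. The names PREFIXES, expand and prefix_line are made up for this illustration, and the code is not a Turtle parser.

```python
# Prefix table: local prefix -> the global URI it stands for.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "mfg": "http://www.WorkingOntologist.com/Examples/Chapter3/Manufacturing#",
}

def expand(qname, prefixes=PREFIXES):
    """Turn 'prefix:identifier' into a full URI using the prefix table."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no prefix declared for {qname!r}")
    return prefixes[prefix] + local

def prefix_line(prefix, uri):
    """Format a declaration the way standard Turtle writes it."""
    return f"@prefix {prefix}: <{uri}> ."

full_type = expand("rdf:type")
rdf_declaration = prefix_line("rdf", PREFIXES["rdf"])
```

So rdf:type is just a short way of writing the rdf namespace URI with "type" stuck on the end.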

Nearly at the end of the chapter - just a few more things I need to remember before I attempt chapter 4: "Semantic Web application architecture" (urk!).

1. Turtle uses contractions/abbreviations so that
a. when several triples share both subject and predicate, they can be represented economically. For example
lit:Shakespeare b:hasChild b:Susanna .
lit:Shakespeare b:hasChild b:Judith .
lit:Shakespeare b:hasChild b:Hamnet .
can be boiled down to
 lit:Shakespeare b:hasChild b:Susanna, b:Judith, b:Hamnet .
Or, to represent an ordered list (birth order, in this case)
 lit:Shakespeare b:hasChild (b:Susanna b:Judith b:Hamnet) . 
b. rdf:type is usually abbreviated to "a", so
lit:Shakespeare a lit:Playwright .
rather than
lit:Shakespeare  rdf:type  lit:Playwright .
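The two abbreviations above are pure shorthand, which a small Python sketch can show by expanding them back out. It assumes very simple input (one subject, one predicate, objects that are qnames with no spaces or commas inside them) and is nowhere near a full Turtle parser; the function names are hypothetical.

```python
def expand_object_list(line):
    """Expand 'S P O1, O2, O3 .' into one (S, P, O) triple per object."""
    body = line.strip().rstrip(".").strip()
    subject, predicate, objects = body.split(None, 2)
    return [(subject, predicate, o.strip()) for o in objects.split(",")]

def expand_a(predicate):
    """Turtle's 'a' is shorthand for rdf:type; anything else is unchanged."""
    return "rdf:type" if predicate == "a" else predicate

children = expand_object_list(
    "lit:Shakespeare b:hasChild b:Susanna, b:Judith, b:Hamnet ."
)
```

The result is the same three separate triples we started with, one per child.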

The authors then quickly refer to RDF/XML, a way of serialising RDF as XML for the web, and also to "blank nodes", which allow for the representation of resources with no Web identity. They give the example of Shakespeare's mistress, the inspiration for sonnet 78 (at which point I got distracted and went off to read about poetry, as that sonnet was inspired by a young man – most likely Henry Wriothesley or William Herbert – not a woman). I will probably regret that by halfway through chapter 4, but heigh-ho.

Anyway, "if we don't want to have an identifier for the mistress … RDF allows for a 'blank node' or bnode for short", which "is indicated by putting all the triples of which it is a subject between square brackets", as in

[ rdf:type bio:Woman ;
  bio:livedIn geo:England ]

Or, as it should be

[ rdf:type bio:Man ;
  bio:livedIn geo:England ]
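
One way to picture a blank node is as a throwaway label that only means something inside one dataset. The Python sketch below mints such labels using the _:name form that N-Triples and Turtle use for labelled blank nodes; the counter and the function names are assumptions made for this illustration.

```python
import itertools

# Counter used to mint labels; the labels carry no meaning outside this dataset.
_counter = itertools.count()

def new_bnode():
    """Mint a fresh blank node label in the '_:name' form."""
    return f"_:b{next(_counter)}"

def bnode_triples(properties):
    """Attach (predicate, object) pairs to one fresh blank node."""
    node = new_bnode()
    return node, [(node, p, o) for p, o in properties]

node, triples = bnode_triples(
    [("rdf:type", "bio:Woman"), ("bio:livedIn", "geo:England")]
)
```

Both triples share the same subject, so we can say things about the resource without ever giving it a Web identity.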

And that's it for chapter 3. If I hadn't made a commitment to blog my way through this book, I'd probably have given up already…


Wish me luck for chapter 4…
