Thursday, 7 May 2015

Semantic Web for the Working Ontologist: chapter 5

Remarkably, I have managed to read my way through all 51 (okay 50-and-a-half) pages of chapter 5. I had to do it in very short bursts, as my ability to retain – or make sense of – even fairly straightforward information of this particular kind is extremely limited. I have no idea whether this is down to lack of experience and/or focused application, or something to do with the way my brain is made and functions.

Anyone care to enlighten me?

Chapter 5 is all about SPARQL, an acronym for SPARQL Protocol And RDF Query Language. Someone, somewhere made the brain-knotting decision to make the first letter of the acronym the first letter of the acronym for reasons that can only be to make it a homophone of sparkle. Because otherwise it would be called PARQL. Or xPARQL, WHERE x= the first letter of something that sounds less Escheric than SPARQL but makes more sense.

There follow 50 pages of basic SPARQL explanation and instruction, setting the scene with descriptions and illustrations of increasingly sophisticated and flexible tell-and-ask systems: from spreadsheets, via relational databases and into RDF and SPARQL, starting from the simple statement that "the basic idea behind SPARQL [is] that you can write a question that looks a lot like the data, with a question word standing in for the thing you want to know."

Most of the chapter is taken up with introducing the basic vocabulary and syntax of writing queries in SPARQL. So, readers gain an understanding of
  • SELECT queries: these have two parts, "a set of question words, and a question pattern." In SPARQL, any word can be a question word, as long as it has a ? directly before it. In order for the question word to do its job, it needs to be defined with a triple in the question pattern. Essentially, "question words" = variables.
  • WHERE "indicates the selection pattern" and is written in what the authors call braces and I call curly brackets, because in my brain braces are what my daughter has on her teeth (I am, by-the-by, hugely impressed by the dexterity of orthodontists).
  • DISTINCT filters out duplicate results, and appears after SELECT.
  • FILTER which is used to define which query results will be retained, and which rejected "is a Boolean test, not a graph pattern". So, its operation is written in parentheses, rather than curly brackets. Also "you cannot reference a variable in a FILTER that hasn't already been referenced in the graph pattern" – ie the SELECT/WHERE part of the query.
  • OPTIONAL according to the W3C "tries to match a graph pattern, but doesn't fail the whole query if the optional match fails" (http://www.w3.org/2009/Talks/0615-qbe/). So, as in the example Allemang and Hendler give, if you want to find out the names of actors in a film, and when they died, the query won't exclude anyone who's still alive.
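  • UNSAID enables you to exclude some data from the results: so, for example, if you wanted details of only the actors who were in a film and are still alive, you could rule out anyone with a recorded date of death. (The finished SPARQL 1.1 standard writes this kind of negation as FILTER NOT EXISTS or MINUS rather than UNSAID.)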
  • ASK appears at the beginning of a query, instead of SELECT and is used in instances where a yes/no answer is required.
  • CONSTRUCT – which also appears instead of SELECT – "introduces a graph pattern to be used as a template in constructing a new graph". In other words, it creates relationships between data items that might not have been previously linked.
  • ORDER BY comes at the end of the graph pattern and does what you'd expect: ie allows you to specify how you would like query results ordered.
  • DESC if this appears after ORDER BY, it organises data in descending order (ascending is default)
  • COUNT, MIN, MAX, AVG and SUM enable data to be aggregated. They appear in parentheses, follow SELECT and are followed by the word AS "followed by a new variable, which will be bound to the aggregated value."
  • GROUP BY allows data to be grouped by a specified variable: this variable "must already have been bound in the graph pattern" - ie: must have been defined with a triple.
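  • HAVING allows you to isolate specific results from the grouped query results: it keeps only the groups whose aggregated value passes a test, so it does for GROUP BY roughly what FILTER does for the graph pattern.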
  • UNION "combines two graph patterns, resulting in the set union of all bindings made by each pattern".
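  • SERVICE "followed by a URL for the SPARQL endpoint before a graph pattern" specifies where the graph pattern is to be run: the query engine sends that part of the query to the named endpoint and folds the answers into its own results.
  • GRAPH works in much the same sort of way, but points the graph pattern at a named graph rather than at a remote endpoint.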
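
For anyone who, like me, finds a list of keywords easier to follow with something to poke at, here is a minimal sketch in plain Python of what SELECT, WHERE, DISTINCT, OPTIONAL and FILTER do to a handful of triples. It is not how a real SPARQL engine works and it is not from the book: the function names (is_var, match, where, optional, select), the predicate names (playedIn, diedIn) and the toy data are all made up for illustration, the data is deliberately incomplete, and the SPARQL in the comments leaves out prefixes.

```python
# Toy graph: each fact is a (subject, predicate, object) triple.
# Hypothetical data, deliberately incomplete: only one death year is recorded.
TRIPLES = [
    ("JamesDean", "playedIn", "Giant"),
    ("ElizabethTaylor", "playedIn", "Giant"),
    ("JamesDean", "playedIn", "EastOfEden"),
    ("JamesDean", "diedIn", 1955),
]


def is_var(term):
    """A question word is any string with a ? directly before it."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` fits `triple`, or return None if it cannot."""
    extended = dict(binding)
    for want, have in zip(pattern, triple):
        if is_var(want):
            if extended.setdefault(want, have) != have:
                return None  # variable already bound to something else
        elif want != have:
            return None  # fixed term does not match the data
    return extended


def where(patterns, triples=TRIPLES):
    """Match each triple pattern in turn, like the curly brackets after WHERE."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [m for b in bindings for t in triples
                    if (m := match(pattern, t, b)) is not None]
    return bindings


def optional(bindings, pattern, triples=TRIPLES):
    """Like OPTIONAL: add extra bindings where possible, keep the row regardless."""
    result = []
    for b in bindings:
        matches = [m for t in triples if (m := match(pattern, t, b)) is not None]
        result.extend(matches or [b])
    return result


def select(variables, bindings, distinct=False):
    """Project the chosen question words; optionally drop duplicate rows."""
    rows = [tuple(b.get(v) for v in variables) for b in bindings]
    return list(dict.fromkeys(rows)) if distinct else rows


# SELECT ?actor WHERE { ?actor playedIn Giant }
in_giant = where([("?actor", "playedIn", "Giant")])
print(select(["?actor"], in_giant))

# SELECT DISTINCT ?actor WHERE { ?actor playedIn ?film }
print(select(["?actor"], where([("?actor", "playedIn", "?film")]), distinct=True))

# SELECT ?actor ?year WHERE { ?actor playedIn Giant . OPTIONAL { ?actor diedIn ?year } }
with_years = optional(in_giant, ("?actor", "diedIn", "?year"))
print(select(["?actor", "?year"], with_years))

# FILTER is a Boolean test applied to variables that are already bound.
print([b["?actor"] for b in with_years
       if b.get("?year") is not None and b["?year"] < 1960])
```

The thing to notice is that the pattern handed to where() looks just like a row of the data with a ?word swapped in, and that optional() keeps a row even when it finds nothing extra to add, which is why an actor with no recorded death year still shows up (with None in the ?year column).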

Other important things
  • The order in which triples appear in a SPARQL query has no impact on the results, but can affect how quickly the results are delivered, as it changes the amount of data that needs to be processed. So "order triples in a query so that the fewest number of new variables are introduced in each new triple".
  • SPARQL was developed for publishing to the web. "A server for the SPARQL protocol is called a SPARQL Endpoint." This "accepts queries and returns results" and "is the most web-friendly way to provide access to RDF data". It is also "identified with a URL".
  • "The namespace dc stands for 'Dublin Core' a metadata standard used by many libraries worldwide"
  • SPARQL has less need for subqueries than most query languages, because graph patterns "can include arbitrary connections between variables and resource identifiers". But they can still sometimes come in handy.
  • Assignments – which are "expressed as part of the SELECT clause" and aren't supported under SPARQL 1.0, but apparently will be (or are) under 1.1 – enable a query to write "the value of a variable through some computation". In other words, it assigns "a value to that variable"
  • Queries can be federated: ie, an individual query can query multiple data sources.
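
The point about triple order is easier to believe with numbers, so here is a rough sketch in plain Python. It assumes a deliberately naive matcher that tries every triple against every pattern, which real SPARQL engines are free to do much more cleverly; the data (actor_0 to actor_99, film_0 to film_19 and so on) and the names run, connected and disconnected are invented for the illustration.

```python
# Hypothetical graph: 100 actors, 20 films, 20 directors, one recorded death year.
TRIPLES = [(f"actor_{i}", "playedIn", f"film_{i % 20}") for i in range(100)]
TRIPLES += [(f"film_{j}", "directedBy", f"director_{j}") for j in range(20)]
TRIPLES += [("actor_7", "diedIn", 1955)]


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` fits `triple`, or return None if it cannot."""
    extended = dict(binding)
    for want, have in zip(pattern, triple):
        if isinstance(want, str) and want.startswith("?"):
            if extended.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return extended


def run(patterns, triples=TRIPLES):
    """Naive evaluation: return (solutions, number of triple comparisons made)."""
    bindings, attempts = [{}], 0
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                attempts += 1
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings, attempts


# Each new triple introduces as few new variables as possible (2, then 1, then 1).
connected = [("?a", "diedIn", "?y"), ("?a", "playedIn", "?f"), ("?f", "directedBy", "?d")]
# Same triples, but the second one introduces two unrelated variables (2, 2, 0).
disconnected = [("?a", "diedIn", "?y"), ("?f", "directedBy", "?d"), ("?a", "playedIn", "?f")]

solutions_a, attempts_a = run(connected)
solutions_b, attempts_b = run(disconnected)
print(solutions_a == solutions_b)  # same answers either way
print(attempts_a, attempts_b)      # but a different amount of work
```

Both orderings find the same single answer, but the second one has to carry all 20 film-and-director pairs forward before it discovers that only one of them matters, so this naive matcher does several times as many comparisons.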

So, having read that, could I now write a query in SPARQL? Er, no. But I might, slowly, hesitantly, and with repeated references back to this book or to the W3C SPARQL standard, be just about able to read an easy one… Which is progress.

And here are pictures of James Dean and Elizabeth Taylor who star in this chapter. No, really.


