Making HayleyWorld: a new form of biography

Thursday, 20 August 2015

Semantic Web for the Working Ontologist: Chapter 15

This week: "Expert modeling in OWL". In this – the penultimate chapter – Allemang and Hendler provide a brief outline of four subsets of OWL 2, each of which is tailored to particular modeling requirements. They also describe, tantalisingly, how the OWL 2 standard "is rich in modeling constructs that go beyond the scope of this book".

OWL 2 is backward compatible with OWL 1, which means that all the modeling techniques taught in this book remain valid, and that the additional constructs are additions to/refinements of, rather than replacements for, OWL 1 constructs and practices.

Each of the four OWL 2 subsets uses the "same set of modeling constructs", ie the same properties and classes, the same syntax. They differ in that each is tailored to serve a different purpose.

OWL 2 DL – D is for decidability
For projects where decidability is key. A system "is decidable if there exists an effective method such that for every formula in the system the method is capable of deciding whether the formula is valid (is a theorem) in the system or not." In other words, it's designed for applications where precise and discrete definitions of things/entities are crucial.

OWL DL is designed to enable modelers to create algorithms that can "determine which classes [in a given model] are equivalent to other classes, which classes are subclasses of other classes, and which individuals are members of which classes."

OWL 2 EL – E is for executable
For projects that are mostly about federating data from a variety of sources in order to provide an "integrated picture of some sort of domain". In this type of modeling – used, for instance, by search engines that need to provide good rather than perfect answers (partly because they need to take into account the fact that humans ask good rather than perfect questions) – "the model describes how information can be transformed into a uniform structure".

So OWL 2 EL is designed to "improve computational complexity".

OWL 2 QL – Q is for Query
This subset of OWL 2 is designed for working with/leveraging relational databases that require "fast responses to queries" applied to huge, specified data sets.

"Queries against an OWL 2 QL ontology and corresponding data can be rewritten faithfully into SQL".
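To get my head round what "rewritten faithfully into SQL" might mean in practice, here's a minimal sketch in Python using the standard sqlite3 module. Everything in it is my own assumption rather than anything from the book: the tables (cow, horse), the toy SUBCLASSES mapping and the rewrite_members_query helper are all hypothetical. The idea it illustrates is that a question about a general class can be expanded, using the subclass statements in the model, into a UNION over the tables that actually hold data.

```python
import sqlite3

# Hypothetical data: one table per class that actually stores individuals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cow (id TEXT PRIMARY KEY);
    CREATE TABLE horse (id TEXT PRIMARY KEY);
    INSERT INTO cow VALUES ('Elsie'), ('Lulu');
    INSERT INTO horse VALUES ('Dobbin');
""")

# Hypothetical ontology: each class mapped to its direct subclasses.
SUBCLASSES = {"animal": ["cow", "horse"], "cow": [], "horse": []}
# Classes that have a table behind them ("animal" has none).
TABLES = {"cow", "horse"}


def rewrite_members_query(cls: str) -> str:
    """Rewrite 'all members of cls' as a UNION over cls and its subclasses."""
    todo, seen, selects = [cls], set(), []
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TABLES:
            # Table names come from the fixed set above, never from user input.
            selects.append(f"SELECT id FROM {current}")
        todo.extend(SUBCLASSES.get(current, []))
    if not selects:
        raise ValueError(f"no tables hold data for class {cls!r}")
    return " UNION ".join(sorted(selects)) + " ORDER BY id"


sql = rewrite_members_query("animal")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The database never needs to know about "animal": the model's subclass statements are folded into the query before it runs, which is (as I understand it) the point of keeping this subset so restricted.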

OWL 2 RL – R is for Rules
This subset of OWL 2 is restricted to enable compatibility with rules-based processing. This is particularly useful for multipart properties – where properties relate to each other in ways other than the hierarchical. For instance, the concept of "aunt". I can only be an aunt if I am the sister of someone who is a parent. "Aunt" is thus a multipart property because it is made up of more than one property: parent and sister in that – daisy-chained – order.

Interestingly, multipart predicates were left out of OWL 1 "because they were thought to cause undecidability". But more recent work has demonstrated that "under certain conditions" this need not be the case.

The authors illustrate multipart predicates with an example model of "A child should have the same species as its parent":

":Elsie :hasParent :Lulu
:Lulu :hasSpecies :Cow"

So "we can infer that

:Elsie :hasSpecies :Cow"
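The daisy-chaining can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how an OWL 2 RL engine is actually built: triples are plain tuples, apply_chain is a hypothetical helper, and the aunt example uses made-up names (:Ann, :Bea, :Cal) and made-up properties (:sisterOf, :parentOf, :auntOf).

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_chain(triples: Set[Triple], first: str, second: str,
                inferred: str) -> Set[Triple]:
    """Add (a, inferred, c) wherever (a, first, b) and (b, second, c) hold.

    Repeats until nothing new appears, so longer chains are followed too.
    """
    result = set(triples)
    while True:
        new = {
            (a, inferred, c)
            for (a, p1, b) in result if p1 == first
            for (b2, p2, c) in result if p2 == second and b2 == b
        }
        if new <= result:
            return result
        result |= new


# The example above: hasParent followed by hasSpecies implies hasSpecies.
facts = {
    (":Elsie", ":hasParent", ":Lulu"),
    (":Lulu", ":hasSpecies", ":Cow"),
}
with_species = apply_chain(facts, ":hasParent", ":hasSpecies", ":hasSpecies")
print((":Elsie", ":hasSpecies", ":Cow") in with_species)

# The aunt example: sisterOf followed by parentOf implies auntOf.
family = {
    (":Ann", ":sisterOf", ":Bea"),
    (":Bea", ":parentOf", ":Cal"),
}
with_aunts = apply_chain(family, ":sisterOf", ":parentOf", ":auntOf")
print((":Ann", ":auntOf", ":Cal") in with_aunts)
```

Both checks print True. The loop matters for the species rule: a calf of Elsie's would inherit :Cow via Elsie, who in turn inherited it from Lulu.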


Elsie and Lulu. Or two other cows.

Incidentally, the authors also briefly discuss metamodeling ("using a model to describe another model"), recommending the use of the Class-Individual Mirror pattern for this.

Next week - um, not sure yet. Might write about anxiety, connectedness and correspondence. Or something else. May be suffering from undecidability…

Thursday, 6 August 2015

Semantic Web for the Working Ontologist: Chapter 14

In this week's chapter, Dean Allemang and Jim Hendler cover "Good and bad modeling practices". To be clear, some of the "bad" modeling practices they include are bad only in the context of the semantic web: they are standard in object systems and only become problematic when ported into an open environment where Anyone can say Anything about Anything, as opposed to a closed data system.

The chapter opens by outlining three ways to start model-building:

"find models on the Web that suit your needs" - that way you don't end up wasting time and other resources redoing work that somebody's already done
"leverage information assets that already have value for your organization": the information you're working with is likely to already be "vetted"
start from scratch, using "standard engineering practices … including the development of requirements definitions and test cases".

Whichever route you choose, there are questions that must be answered: is this model useful? What do we need this model to do? "This poses two issues for the modeler: How do I express my intended purpose for a model? How do I determine whether a model satisfies some purpose?" One way to do this is to frame "competency questions" – ie, questions the model will need to answer – before developing it.

The AAA assumption adds a massive element of complexity to the entire process, because

"On the Semantic Web, it is expected that a model will be merged with other information, often from unanticipated sources. This means that the design of a semantic model must not only respond to known requirements … but also express a range of variation that anticipates to some extent the organization of the information with which it might be merged."

All a bit mind-boggling, really.

The advice the authors give for dealing with this involves quoting the March Hare from Alice in Wonderland: "say what you mean and mean what you say".

"Say what you mean and mean what you say"

Which translates into ensuring that

the names you use for entities are meaningful
you follow simple conventions (such as starting class and individual names with uppercase letters, property names with lowercase letters and naming classes with singular, rather than plural, nouns)
you plan carefully in order to distinguish classes from individuals (this can be tricky)

Once assembled, your shiny new model can be tested by ensuring it answers the competency questions framed beforehand. Analysing "the inferences that the model entails" can determine "whether it maintains consistent answers to possible competency questions from multiple sources".
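One way I can picture competency questions working is as a tiny test suite run against the model's inferences. The sketch below is my own guess at the shape of that, not the authors' method: the triples, the infer_types helper and the questions themselves are all hypothetical, and the only inference it knows about is passing class membership up through rdfs:subClassOf.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# A hypothetical, very small model plus data.
model: Set[Triple] = {
    (":Elsie", "rdf:type", ":Cow"),
    (":Cow", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
}


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Propagate rdf:type up the rdfs:subClassOf hierarchy until stable."""
    result = set(triples)
    while True:
        new = {
            (s, "rdf:type", sup)
            for (s, p, c) in result if p == "rdf:type"
            for (sub, p2, sup) in result
            if p2 == "rdfs:subClassOf" and sub == c
        }
        if new <= result:
            return result
        result |= new


# Competency questions, framed before building: (triple to look for, expected answer).
competency_questions = {
    "Is Elsie a mammal?": ((":Elsie", "rdf:type", ":Mammal"), True),
    "Is Elsie an animal?": ((":Elsie", "rdf:type", ":Animal"), True),
    "Is Elsie a fruit?": ((":Elsie", "rdf:type", ":Fruit"), False),
}

entailed = infer_types(model)
results = {
    question: ((triple in entailed) == expected)
    for question, (triple, expected) in competency_questions.items()
}
print(results)
```

A caveat on the last question: under the Open World assumption, not finding a triple doesn't make it false, so a "no" here only means "not entailed by what we've been told so far".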

The remainder of the chapter is taken up with analysis of four common modeling errors:

Rampant classism – where everything is defined as a class, even if it should be an individual
Exclusivity – the flawed assumption that "the only candidates for membership in a subclass are those things that are already known to be members of a superclass".
Objectification – where a system is built for the web that "has the same meaning and behaviour as an object system", which doesn't take into account "AAA, Open World and Nonunique Naming"
Creeping conceptualization – when good modelers go bad (oh, ok, just get carried away) and "the idea of 'design for reuse' gets confused with 'say everything you can'" as modelers try to anticipate every conceivable use for their model and model all conceivable uses.

Ultimately, the authors say the way of telling if you've built a model that is useful and conforms to the assumptions inherent in the Semantic Web is "by making sure that the inferences it supports are useful and meaningful". Which seems slightly tautological, but hey, what do I know?

Next week I'm on holiday, and, in a shock break with tradition, am staying somewhere with no wifi. So I won't be blogging. Or – probably – coping with the lack of connectivity. Back in a fortnight…

Thursday, 30 July 2015

Move along now: the reader journey gains momentum

If you've stumbled onto this blog about my digital biography/"zoeography" of William Hayley for the first time, you may want to read my initial post about the project – as well as the two I've linked to above – before going any further with this one, so it makes sense.

A few weeks since my last post on the HayleyWorld reader journey, and some of the questions I posed have been quietly answering themselves while I've been getting on a) with writing and b) other stuff. Lots of other stuff.

So, here's the latest version…

Reader journey design: phase 3

As you can see from comparing phase 3 (above) to phase 2 (below)

Reader journey design: phase 2

… I've moved on both conceptually and creatively, and also started using OmniGraffle - which is much more suitable (and fun) than Word for this sort of thing.

What's also interesting is the way a slight delay in technical implementation (down to my brilliant development partner Contentment's commercial commitments) has facilitated this surge forwards. Had we been further forward with the technical and visual development of the HayleyWorld app, I would have written more commentary on the edited extracts from Hayley's Memoirs that sit at the centre of the zoeography. But my thinking about how it all fits together, and the different ways in which the app will be personalised and function to create the illusion of a relationship between Hayley and his readers, wouldn't be anywhere near as advanced.

Next week - Semantic Web for the Working Ontologist: chapter 14 (you can read my take on chapter 13 here).

In the meantime, hurrah for obstacles.

Thursday, 23 July 2015

Semantic Web for the Working Ontologist: Chapter 13

After all last week's, er, 'excitement', it's back to my attempt to grasp the theory of data modelling, RDF and its various iterations, OWL and associated concepts.

First off, I need to say there's something in chapter 13 – title: Ontologies on the Web—putting it all together – that I really don't get, and that's the use, in the section on Dimension checking in QUDT (the acronym for a specific ontology: Quantities, Units, Dimensions, Types) of vectors as a way of representing signatures for compound quantities.

When I say I don't get it – I understand the principle, that "QUDT defines eight basic quantities", including "length, time and mass", and that the process of dimensional analysis requires that units used to measure compound quantities require a signature showing how the … Oh…

Hang on … what I was going to say here was that I didn't understand how the vectors providing the signature for how these compound quantities are comprised from the eight basic quantities/units are calculated. Why, say, if "we write our vectors in the order [length, mass, time] then the vector for velocity is [1, 0, -1]", even though the authors explain how "the magnitude of the vector in that component being the exponent of the base quantity in the formula for the compound quantity".

I'd read that page at least four times before giving up. But it's just this second clicked. The vector for velocity is [1, 0, -1] because velocity = length/time. The zero's there because mass isn't a factor. Duh. Remarkably, I did pass maths A level. But only just, and with a lot of tutoring. And it was a loooooong time ago now…
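Now that it's clicked, here's the arithmetic as a small Python sketch. It's my own illustration, not QUDT's actual representation: I've used only three of the eight base quantities, in the book's [length, mass, time] order, and the multiply and divide helpers are hypothetical names. Multiplying quantities adds the exponents; dividing subtracts them.

```python
from typing import Tuple

# Exponents of the base quantities, in the order [length, mass, time].
Dim = Tuple[int, int, int]

LENGTH: Dim = (1, 0, 0)
MASS: Dim = (0, 1, 0)
TIME: Dim = (0, 0, 1)


def multiply(a: Dim, b: Dim) -> Dim:
    """Multiplying two quantities adds their exponents, component by component."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def divide(a: Dim, b: Dim) -> Dim:
    """Dividing two quantities subtracts the exponents of the divisor."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


velocity = divide(LENGTH, TIME)        # length / time
acceleration = divide(velocity, TIME)  # length / time squared
force = multiply(MASS, acceleration)   # mass x acceleration

print(velocity)      # (1, 0, -1)
print(acceleration)  # (1, 0, -2)
print(force)         # (1, 1, -2)
```

Dimension checking then comes down to comparing vectors: two quantities can only be added or compared if their signatures are identical.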

Chapter 13 introduces and outlines the uses of three online ontologies: QUDT, Good Relations and ChEBI – Chemical Entities of Biological Interest – which "is published as part of the Open Biological and Biomedical Ontologies Foundry (OBO)".

Developed by NASA, "the goals of QUDT are to provide

A standardized consistent vocabulary, focused on terminology used in science and engineering.
A set of consistent coded identifiers, for human and machine use.
… and machines, avoiding problems with uncertainty and misinterpretation.
A collection of foundational vocabularies that can serve a variety of applications.
A framework designed for extensibility and evolution, but model-based (instead of just a typical dictionary) and governed."

Outside of engineering, it's widely used for online currency conversion.

The Good Relations ontology is used for commerce. It provides search engines with rich information about products and services, enabling them to assess the relevance of the product or service to the "location, time, identity, profile, and preferences of the person behind the query" (from GoodRelations). The authors provide an example of how it can be used by a nail bar that also offers massages, both to publicise its services and to enable browsers to work out which service, provided by comparable beauty salons, offers the best value massage per minute.

And ChEBI is

"a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms." (from the ChEBI website)

Incidentally the ChEBI website features an "Entity of the Month". At the time of writing it's butyl anthranilate, a naturally occurring substance that scientists are currently testing as a safe insect repellent to protect fruit – and, potentially, people – from being eaten/bitten by flying insects…


ChEBI's Entity of the Month for July 2015. And some blueberries.

The authors also introduce the term owl:imports which allows one ontology to refer explicitly to another, and enables inferences to be drawn from the imported ontology within the ontology into which it has been imported.

That's it for this week. And, just so you know, there are three, short chapters to go…

Saturday, 18 July 2015

Don't have a heart attack in Westfield

"I will call an ambulance on my mobile.… [pause]. It isn't going through. You try too"

We – me, my 15 year old daughter and a Westfield security guard who claims to be a first aider – are outside some toilets in the shopping centre after my daughter has collapsed, dizzy, nauseous and with pains in her head, neck, arms and legs.

There is no signal on the mobile phone, and, for several minutes the guard keeps pressing redial while refusing to ask someone to call an ambulance on a landline, because…er, I'm not entirely sure why, but it's something to do with the fact that she's still breathing and just about conscious. He's suggesting we catch a taxi to the Hammersmith Hospital, fails to do anything about my request for a wheelchair, but responds to my agitation by suggesting that if I want to complain about him, I should go to the concierge and do so.

If you're tempted to shop 'til you drop, don't.

For some strange reason – oh yes, I'm sitting with a sick, scared and distressed teenage girl collapsed across my lap – I choose not to do that, but instead shout at him until he calls his manager. Eventually he does, and Nubia, who arrives shortly afterwards, is calm, decisive and effective. A wheelchair is brought, and we race my semi-conscious child to the taxi-rank, where a taxi driver takes us, reluctantly – "you should have called an ambulance" – "YES I KNOW THAT" – to the Hammersmith Hospital a few minutes away.

The Hammersmith. Highly recommend the Urgent Care Centre there

Staff at the hospital, especially the lovely and on-the-ball healthcare assistant Ifrah – and at St Mary's where they have to send us later because she's under 16 – are wonderful. As are Jonathan, who helps us at Westfield and leaves his mobile number so I can let him know that my daughter's ok, the mixed-race boy who dashes into the hospital to bring us a wheelchair and the young Muslim woman and her two older male companions who are walking past the Hammersmith, and help us into the hospital, providing encouragement as well as physical support, after the taxi driver drops us across the street and my daughter falls on the pavement and can't get up.

I didn't get their names, but am hugely grateful. This kindness of strangers was deeply touching.

***

Luckily, there was nothing seriously wrong: my daughter had had a bad reaction to the beta blockers a specialist had prescribed a couple of days previously as migraine prophylaxis (a standard approach).

In this instance, Westfield's poor emergency response caused no long term harm. But – now my daughter has recovered, and I've caught up with everything I need to do (okay, apart from the housework) – I will complain.

Because if I don't, one day someone might have a stroke or a cardiac arrest in the centre. And the outcome is unlikely to be so happy.

Next week I'll be back with chapter 13 of Semantic Web for the Working Ontologist.

Thursday, 9 July 2015

Semantic Web for the Working Ontologist: Chapter 12

Data modeling gets intricate

This week, it's "Counting and sets in OWL". Intricate stuff for a newbie data modeler, and takes me back to A level maths (I scraped a pass. With tutoring).

As the title suggests there's a lot of set theory here. The chapter kicks off with a reminder about restrictions and how they can be used

"to define notions like Vegetarian"
"to sift information from a table"

and

"to manage groups of people"

Intersections, unions and other class relationships

The rest of the chapter then covers how combining restrictions with the set theory language provided in OWL enables complex and precise relationships and identities to be defined. OWL set theory language includes

intersections: an intersection of two or more classes = a new class owl:intersectionOf
unions: all the members of all the classes combined owl:unionOf
complements: "the complement of a set is the set of all things not in that set". owl:complementOf. This needs to be used with care as the complement of the set of, say, everyone in the Iranian women's football squad includes not only everyone in Iran who isn't in the national football squad, but everything else in the known (and unknown) universe: animate, inanimate and other*.
disjoints - two sets with no members in common. eg: ":Meat owl:disjointWith :Fruit"

and is complemented (but not in the set theory sense) by OWL's use of cardinalities: these refer to "the number of distinct values for a particular property some individual has".
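Python's built-in sets make a handy (if imperfect) way of trying out the set language listed above. This is a rough sketch under my own assumptions: the members are made up, and the complement is taken against a small explicit universe, which is exactly what OWL does not do, since owl:complementOf ranges over everything.

```python
# A deliberately tiny, hypothetical "universe" of things.
universe = {"beef", "lamb", "apple", "pear", "tofu"}

meat = {"beef", "lamb"}
fruit = {"apple", "pear"}
red_things = {"beef", "apple"}

# owl:intersectionOf: members of both classes at once.
red_fruit = fruit & red_things

# owl:unionOf: all the members of all the classes combined.
meat_or_fruit = meat | fruit

# owl:complementOf: everything that is NOT meat. Here that is only what is
# left in our tiny universe; in OWL it would be everything else that exists.
not_meat = universe - meat

# owl:disjointWith: the two classes have no members in common.
meat_and_fruit_are_disjoint = meat.isdisjoint(fruit)

print(sorted(red_fruit))
print(sorted(meat_or_fruit))
print(sorted(not_meat))
print(meat_and_fruit_are_disjoint)
```

Note that "tofu" turns up in the complement of meat without being fruit, which is the footnoted point in miniature: the complement of a class is not the same thing as whichever other class you happened to have in mind.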

Anyone can (still) Say Anything about Any topic

Given the Open World Assumption – and the fact that Anyone can say Anything about Any topic – this can be a tricky value to establish in many instances. However, there are some numeric values that do stay stable for long enough for, at least, upper and lower limits to be articulated. For example, the number of people on each team at the start of a league football match will be n=11 (owl:cardinality 11). The number of people in the squad from which the team is picked will be >11 (owl:minCardinality 12).

"Cardinality refers to the number of distinct values a property has".

n=11

There is also, naturally, owl:maxCardinality.
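Here's a minimal sketch of the counting involved, using the football example. The satisfies_cardinality function and the player names are hypothetical, and it assumes two things an OWL reasoner would not: that differently named players are different people, and that the list of values is complete. So it's a closed-world check, useful only for seeing how exact, minimum and maximum limits relate.

```python
from typing import Iterable, Optional


def satisfies_cardinality(values: Iterable[str],
                          exact: Optional[int] = None,
                          minimum: Optional[int] = None,
                          maximum: Optional[int] = None) -> bool:
    """Check the number of distinct values against the given limits."""
    n = len(set(values))  # duplicates of the same name count once
    if exact is not None and n != exact:
        return False
    if minimum is not None and n < minimum:
        return False
    if maximum is not None and n > maximum:
        return False
    return True


team = [f"player{i}" for i in range(1, 12)]   # 11 players on the pitch
squad = [f"player{i}" for i in range(1, 24)]  # a 23-strong squad

print(satisfies_cardinality(team, exact=11))     # owl:cardinality 11
print(satisfies_cardinality(squad, minimum=12))  # owl:minCardinality 12
print(satisfies_cardinality(team, minimum=12))   # a team alone is too small
print(satisfies_cardinality(team, maximum=11))   # owl:maxCardinality 11
```

The four checks print True, True, False and True respectively.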

Using owl:oneOf closes a set, but, given the Open World assumption, it should only be applied judiciously, "in situations in which the definition of a class is not likely to change – or at least not change very often".

For an example the authors take us back to the planets in the solar system:

ss:SolarPlanet rdf:type owl:Class ;
owl:oneOf (ss:Mercury ss:Venus ss:Earth ss:Mars
ss:Jupiter ss:Saturn ss:Uranus ss:Neptune) .

"When combined with owl:someValuesFrom… [this] … provides a generalization of owl:hasValue. Whereas owl:hasValue specifies a single value that a property can take, owl:someValuesFrom combined with owl:oneOf specifies a distinct set of values that a property can take."

This enables an inference to be drawn that tells us that "some triple from a small set holds, but we don't know which one". owl:oneOf is also commonly used with owl:AllDifferent to specify that the members of a set are, as one might expect, all different from each other. This is essential, because we can't assume that things in the same set aren't called by different names.
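To see why the "all different" statement matters, here's a small sketch that counts how many distinct things a list of names refers to once we're told that some of the names are aliases. It's my own illustration: count_distinct is a hypothetical helper, ss:Terra is a made-up second name for Earth, and the same_as pairs stand in for owl:sameAs statements.

```python
from typing import Dict, Iterable, List, Tuple


def count_distinct(names: List[str],
                   same_as: Iterable[Tuple[str, str]] = ()) -> int:
    """Count distinct individuals among names, merging any declared aliases."""
    parent: Dict[str, str] = {n: n for n in names}

    def find(n: str) -> str:
        # Follow links until we reach the representative name for the group.
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in same_as:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)  # merge the two groups

    return len({find(n) for n in names})


names = ["ss:Earth", "ss:Terra", "ss:Mars"]

print(count_distinct(names))                              # 3 names
print(count_distinct(names, [("ss:Earth", "ss:Terra")]))  # but only 2 planets
```

Three names, but possibly only two planets. Without a statement that the members are all different, a reasoner has to allow for that possibility, which is why counting members of a closed set only works once owl:AllDifferent has been asserted.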

Can't get no satisfaction

The remainder of the chapter (Yep. It was another long one this week.) is devoted to contradictions, unsatisfiable classes, inferring class relationships and reasoning with individuals and with classes.

Contradictions exist when data modeling produces results that are "logically inconsistent": when a model reflects something impossible in the real world. Because "a model is a description of the world" it "can be mistaken".
An unsatisfiable class is one that can have no members.
Inferring class relationships involves the application of OWL's set language and other restrictions to make inferences about classes.
Reasoning with individuals and classes. Where reasoning with individuals "draws specific conclusions about individuals in a data stream", class reasoning "determines how data are related in general". The combination of the two, in the authors' words means "we have a powerful system that smoothly integrates general reasoning with specific data transformations".

Next time "Ontologies on the Web–putting it all together". That'll be in a fortnight. Not sure, as yet, what's coming next week…

*In other words, ex:AnimateThings may not be owl:complementOf ex:InanimateThings. There might be a third class, say, ex:SchrodingersThings…

Thursday, 2 July 2015

Next steps on the reader journey

When I last blogged about working out how readers will journey through my HayleyWorld zoeography, I was pondering how to:

personalise a reader's journey while still retaining an element of authorial control
make sure that readers can take different paths through the narrative and still read/experience something that feels like a coherent story
enable readers to feel like they're getting to know William Hayley in a way that mimics encountering him in real life.

A few months on, I'm still thinking about those issues – and suspect I'll continue to do so until the work's complete, I've post-mortemed myself almost to death over it*, and am on to the next big project†.

I'm also thinking about my methodology of designing the reader journey in a step-by-step linear way.
Given that I'm trying to make the reading experience feel less linear, is designing it in a linear manner, well, wrong? One aspect that concerns me is that it increases the likelihood that I'll end up losing something – a key aspect of William Hayley's story – en route. Or that I'll miss a vital left turn, one that would allow me and the reader to explore aspects of the tale that my let's-start-at-the-very-beginning approach obscures.

This is where I'm at:

Will need to do page set up: A3 if I want to travel any further…

As for content - I'm halfway through editing/writing the personalised letters Hayley will send to his readers, and am thinking through what and how he'll "ask" readers to provide the information required for Form 2: as I write I'm wondering if this might also be a good point to ask readers for their email address…

My suspicion is that, for now, step-by-step is the most practical way to proceed. It won't be for much longer, though. My guess is that when I'm one or two stages further into the journey, and these stages have been successfully (I hope) implanted into the next iteration of the HayleyWorld app, I'll need to sit down with several huge pieces of paper and map out the various relationships between the people, places, themes and chronologies of the story, and interrelate these to each other. Which will probably make my head pop open like this, but will also make me feel considerably less smiley…

And which is where everything I'm learning by blogging my way through Semantic Web for the Working Ontologist is likely to come in handy…

Very interested to hear others' thoughts on reader journey design…

* Yeah, I know. Am invoking the paradox defence.
† on the backburner are

an anarchist musical
a radio drama
a crowd-sourced online project exploring the limits of kindness.