Categorical Datum
A categorical value is a selection from a flat list or hierarchical tree structure containing a finite number of predetermined choices, such as organisms in a taxonomy, diseases, countries of the world, or levels of an experimental variable. Here we provide for choices whose values are either ontology terms or a controlled set of xsd:string literals.
Note that synonyms, which are normally described using annotations such as hasSynonym or hasExactSynonym, can be applied in software to enhance categorical selection options. For example, synonyms such as “sienna, sepia, umber, terra cotta” can be mapped to the ontology-driven categorical brown.
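As a minimal sketch of this kind of synonym mapping in software (Python is used purely for illustration; the `SYNONYMS` table and the `resolve_choice` helper are hypothetical, and in practice the synonyms would be read from the ontology's annotations rather than hard-coded):

```python
# Hypothetical synonym table; in practice this would be built from
# hasSynonym / hasExactSynonym annotations in the ontology.
SYNONYMS = {
    "brown": ["sienna", "sepia", "umber", "terra cotta"],
}

# Invert the table so that every synonym (and the label itself)
# points to its categorical label.
LOOKUP = {}
for label, synonyms in SYNONYMS.items():
    LOOKUP[label] = label
    for synonym in synonyms:
        LOOKUP[synonym] = label


def resolve_choice(text):
    """Return the categorical label for user text, or None if unknown."""
    key = " ".join(text.lower().split())  # normalize case and whitespace
    return LOOKUP.get(key)
```

A data entry form could call `resolve_choice` on typed input and fall back to showing the full list of choices when it returns `None`.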
Establishing equivalences between categorical values from different ontologies is a more complicated matter.
Categorical ontology choice
The aim here is to provide a way to point to ontology class or instance identifiers within existing ontologies as selections for a categorical variable.
The has quality relation can capture this directly by pointing straight to the phenotypic quality. For example, male is a subclass of phenotypic sex, and one can express that an instance of type Homo sapiens (representing John) has quality another instance of type male.
One can detail which assay was used to make this assessment:
A categorical value specification can point to the possible choices (which will vary depending on experimental protocol).
This provides a way of further constraining any kind of class-level ‘has quality’ constraint established at a more abstract level.
The complete contextual view:
Missing values
For an instance value described by a categorical value specification (CVS) class, if that value matches the target of the CVS class’s specifies value of axiom, then no choice has been made and no information is carried.
Categorical value specification choices as instances
In a different approach, an OBI example using categorical value specification focuses on describing a tumor grading standard, histologic grade according to AJCC 7th edition. Here the value specification class has individuals, each of which is interpreted as a grade, and which could potentially be augmented with data properties that detail their assessment differentiae. This approach is suited to cases where the selections are not already established (and would not be in the future) as ontology classes situated within their own hierarchical context.
Complications - punning
A left/right/ambidextrous handedness example shows some complications one can run into vis-à-vis classes and instances (individuals).
Class: 'handedness value specification'
subClassOf 'categorical value specification'
subClassOf 'specifies value of' only handedness
Following this pattern, an instance of handedness value specification can have a specifies value of axiom pointing to a handedness class instance. This involves some extra setup because each handedness selection can’t be referenced directly as a class; it needs to be “punned”. In other words, an individual needs to be created to mirror each categorical choice. For example, the classes for left handedness, right handedness and ambidextrous handedness all need mirrored individuals, and in this case these are not native to the PATO ontology that the classes originate from.
Punning is accomplished manually in Protege by copying an existing class URI into the “Create a new Named individual” form, with the “new entity options …” set to expect a user-supplied name. This preserves the same identifier for both class and individual. In addition, when Protege opens a file and encounters an object property with an instance reference at one end and a class reference at the other, it will automatically create an instance for the class and give it the same ontology URI identifier. This eliminates reasoning errors that would otherwise arise, but also means you may end up with namedIndividual instances you didn’t manually create. (This characterization of Protege’s behavior still needs to be confirmed.)
The target could be expressed simply as “has specified value only xsd:anyURI”, thus allowing values like xsd:anyURI right-handedness, but this then requires some validation mechanism external to an OWL reasoner for limiting categorical values.
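One possible shape for such an external check is sketched below in Python. The URIs and the `is_valid_choice` helper are hypothetical placeholders; real values would be the ontology URIs of the handedness choices.

```python
# Hypothetical identifiers standing in for the handedness choices;
# real values would be the ontology URIs of the corresponding terms.
ALLOWED_HANDEDNESS = {
    "http://example.org/left-handedness",
    "http://example.org/right-handedness",
    "http://example.org/ambidextrous-handedness",
}


def is_valid_choice(value, allowed=ALLOWED_HANDEDNESS):
    """Check an xsd:anyURI value against the permitted categorical choices."""
    return value in allowed
```

Because this check runs outside the reasoner, the set of permitted URIs has to be kept in step with the ontology by some other means, for example by regenerating it from the subclasses of handedness.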
Categorical string choice
If a string must conform to a small set of choices, and nothing more needs to be axiomatized about each choice, then this can be accomplished with a value specification that is both string and categorical. The value specification has a ‘has specified value’ component which uses a regular expression to enumerate the permitted strings. Note that in this approach one cannot easily provide other information (label, description) about each choice in a user interface.
For example, an “E-coli K antigen value specification” can be represented as:
Class: 'E-coli K antigen value specification'
subClassOf 'categorical value specification'
subClassOf 'specifies value of' only 'K antigen'
subClassOf 'has specified value' only xsd:string[pattern "K(1|2a|2ac|3|4|5|6|7|8|9|10|11|12|13|14|15|16|18a|18ab|19|20|22|23|24|26|27|28|29|30|31|34|37|39|40|41|42|43|44|45|46|47|49|50|51|52|53|54|56|96|55|74|82|84|85ab|85ac|87|92|93|95|97|98|100|101|102|103|X104|X105|X106)"]
This allows a reasoner to raise an alarm when an instance of E-coli K antigen value specification has specified value ‘K17a’, which is not among the permitted strings.
One can potentially leave the class’s has specified value axiom out, in which case validation would need to be enforced outside the OWL reasoning context. (This is especially relevant if the computational load of validation by a reasoner is too high.)
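A minimal sketch of such validation outside the reasoner, assuming Python's standard `re` module and reusing the alternation from the pattern facet above (the names `K_ANTIGEN_PATTERN` and `is_valid_k_antigen` are illustrative only):

```python
import re

# Same alternation as the pattern facet above. XSD patterns must match the
# whole string, so fullmatch is used rather than search.
K_ANTIGEN_PATTERN = re.compile(
    r"K(1|2a|2ac|3|4|5|6|7|8|9|10|11|12|13|14|15|16|18a|18ab|19|20|22|23|24"
    r"|26|27|28|29|30|31|34|37|39|40|41|42|43|44|45|46|47|49|50|51|52|53|54"
    r"|56|96|55|74|82|84|85ab|85ac|87|92|93|95|97|98|100|101|102|103"
    r"|X104|X105|X106)"
)


def is_valid_k_antigen(value):
    """Return True if value is one of the enumerated K antigen strings."""
    return K_ANTIGEN_PATTERN.fullmatch(value) is not None
```

With this helper, ‘K1’ and ‘K2ac’ are accepted while ‘K17a’ is rejected, mirroring the check the reasoner would otherwise perform.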
Note that in the past OBI tried using categorical measurement datum for enumerating categorical choices, with a has category label object property (see OBI’s existing handedness value specification example), but this class and relation are now discouraged.
Ordinal Variables
OBI does not currently have a recommendation about how to define an ordered categorical variable. A ranking data property for each choice could be used; or potentially previous/next relations could be established between choices.
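To illustrate the ranking idea only (this is not an OBI recommendation), the Python sketch below attaches a hypothetical rank to each of three made-up choices and derives ordering and a "next" relation from it. The choice labels, `RANK` table and helper names are all assumptions.

```python
# Hypothetical ordered choices; each rank stands in for the value of a
# ranking data property attached to the choice.
RANK = {"low": 1, "medium": 2, "high": 3}


def compare_choices(a, b):
    """Return -1, 0 or 1 if a ranks below, equal to or above b."""
    return (RANK[a] > RANK[b]) - (RANK[a] < RANK[b])


def next_choice(choice):
    """Derive the 'next' relation from the ranks; None for the top choice."""
    higher = [c for c in RANK if RANK[c] > RANK[choice]]
    return min(higher, key=RANK.get) if higher else None
```

Storing a rank per choice makes sorting and comparison direct, whereas explicit previous/next relations would have to be traversed to answer the same questions.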
“Other” values
A qualitative survey question may canvass users for an open-ended response if the given selections are inadequate or should be elaborated on. OBI would require a separate has string representation to capture this input, as it would need to be evaluated for its categorical or numeric potential.
Other approaches
Other ontologies might use their own object properties, which OBI avoids (see reasons).
Here has phenotypic sex would be an object property, a subproperty of has quality, existing between a BFO independent continuant entity (the bearer) and a specifically dependent continuant that is about an organism’s sex. The quality is represented as a categorical value. The range of has phenotypic sex can be constrained to PATO phenotypic sex.
Other ontologies may allow a string value (or number code) via a data property, as shown below. One could add a regular expression to validate that a string matches the possible values of a categorical variable, as in the E-coli K antigen example above.
Here has phenotypic sex is a data property existing between a BFO independent continuant entity (a physical organism) and a string literal code representing its sex. For axioms to work reliably with these values, the literals must be normalized to categorical values of sex.
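The normalization step might look something like the Python sketch below. The raw codes in `SEX_CODES` and the `normalize_sex` helper are hypothetical; actual datasets will have their own coding conventions.

```python
# Hypothetical raw codes; actual datasets will use their own conventions.
SEX_CODES = {
    "m": "male",
    "male": "male",
    "f": "female",
    "female": "female",
}


def normalize_sex(literal):
    """Map a raw string literal to a categorical label, or None if unrecognized."""
    return SEX_CODES.get(literal.strip().lower())
```

Returning `None` for unrecognized literals lets the caller decide whether to reject the record or route it for manual review.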