Following blog posts from Duncan and Allyson, and in preparation for the 2nd OBO Foundry workshop, here is my take on the OBO Foundry principles. The list provided by the OBO Foundry is formatted similarly to the one on their website, bold blue with emphasis on some words in green, and my comments follow after each point.
Disclaimer: I am probably biased in favor of the Foundry principles, having worked on the Ontology for Biomedical Investigations (OBI) and the Information Artifact Ontology (IAO) quite a bit :)
1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
I would like to see a clear licensing policy from each of the ontologies that are part of the Foundry. If I want to reuse part of any of them, having a straightforward way to know what is and is not allowed would be great. If the Foundry wishes to constrain licensing to specific degrees (e.g. keep restrictions to a minimum), a list of acceptable licenses could be proposed.
I understand that principle to mean that Foundry ontologies should be freely available for anybody to use, as long as proper credit is given and information is not distorted. I would think a CC-by or CC-by-sa license would fit the requirements.
See also post from Allyson on Choosing a license for your ontology
2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
I like the goal of that one, and I think it is fundamental to have a common language that can be used by all. If we intend to build a community we need to be able to exchange between members and use common tools.
However in practice this proves difficult:
1/ there are several versions of the tools currently available, and they won't all behave in the same way
2/ the evolution of the languages implies updating the converters
If I try to open the automatically generated obi.obo file, it won't be possible using OBO-Edit 1.101, or even 2.0 beta 50. This file requires OBO-Edit 2.0 beta 54 to be opened. This is just an example, but similar issues exist using Protege 3 or 4 and different versions of OWL. This is not a criticism of the languages or the tools; on the contrary, it is a good thing that they keep evolving, but maintaining several syntaxes has a non-negligible cost.
3. The ontology possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
I think this principle is pretty explicit, and well followed. It could probably be extended to have a common format for URIs in general: we have been working in OBI to establish a clear ID policy and decide which format to adopt. In short, we chose to rely on our own domain name and use slashes as separators, effectively creating URIs like http://purl.obofoundry.org/obo/OBI_0000225. (Dereferenceable URIs by Alan Ruttenberg)
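To make the prefix idea concrete, here is a minimal sketch of how a tool could recover the ontology prefix from such a URI. The function name split_obo_uri is made up for illustration, and it assumes the identifier is always the last path segment in the PREFIX_NUMBER form shown above.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_obo_uri(uri: str) -> Tuple[str, str]:
    """Return (prefix, local_id) for a URI whose last segment is PREFIX_LOCALID.

    Hypothetical helper: it assumes the slash-separated URI layout
    described in the post, nothing more.
    """
    # The last path segment carries the identifier, e.g. "OBI_0000225".
    last_segment = urlparse(uri).path.rsplit("/", 1)[-1]
    prefix, sep, local_id = last_segment.partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"no PREFIX_ID identifier found in {uri!r}")
    return prefix, local_id


prefix, local_id = split_obo_uri("http://purl.obofoundry.org/obo/OBI_0000225")
print(prefix, local_id)  # prints: OBI 0000225
```

With a unique prefix per ontology, the first element is enough to tell which ontology a term comes from.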
4. The ontology provider has procedures for identifying distinct successive versions.
This is actually my personal favorite one :)
I think it is very important when accessing a resource to be able to evaluate its provenance (cf principle above) and quality - I want to know where it comes from (and choose to trust or not the source) and what is the current version of the resource.
We chose to mint URIs for each released version of the OBI ontology. For example, http://purl.obofoundry.org/obo/2008-03-05/obi.owl: users can choose to import specific versions of the ontology to preserve stability if they need to, or they can instead always use the latest released version of OBI, always available at http://purl.obofoundry.org/obo/obi.owl.
The OWL 2 version URI mechanism seems like a good way of achieving this.
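As a small illustration of the dated-release scheme, the sketch below builds either the latest or a dated OBI URI. It only mirrors the two URL shapes quoted above; the helper name obi_release_uri is mine, and I am assuming every release follows the same YYYY-MM-DD pattern.

```python
from datetime import date
from typing import Optional

# Base taken from the two URLs quoted in the post.
BASE = "http://purl.obofoundry.org/obo"


def obi_release_uri(release: Optional[date] = None) -> str:
    """Build the URI of a dated OBI release, or of the latest one.

    Hypothetical helper: assumes all releases live under BASE/<date>/obi.owl.
    """
    if release is None:
        # No date given: point at the always-current file.
        return f"{BASE}/obi.owl"
    # isoformat() yields YYYY-MM-DD, matching the dated folder name.
    return f"{BASE}/{release.isoformat()}/obi.owl"


print(obi_release_uri())                    # http://purl.obofoundry.org/obo/obi.owl
print(obi_release_uri(date(2008, 3, 5)))    # http://purl.obofoundry.org/obo/2008-03-05/obi.owl
```

Pinning an import to the dated form trades freshness for stability, which is exactly the choice we wanted to leave to users.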
While this is a first step in the right direction, I would in fact go even further. It is currently very hard to get an idea of the status of a resource: even using a different URI for each release, nothing tells me if the ontology is to be considered for production or is still too unstable. What would be helpful for me would be to have an idea of the state of development of the resource - similarly to what is done with software development, where releases are labeled alpha, beta etc.
Initially we could rely on self-assessment from the ontology developers, and at a later stage maybe consider some milestones, like deprecation policy implemented, regular releases, etc.
5. The ontology has a clearly specified and clearly delineated content.
The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
I think this is in principle an important guideline. In order to improve lightweight import of ontological resources we even developed MIREOT, the Minimum Information to Reference an External Ontology Term, with the aim of avoiding duplication of terms within the Foundry. (<commercialbreak> come and see us at the International Conference on Biomedical Ontology! </commercialbreak>) In short, instead of defining for example a cell in OBI, we re-use the one from the Cell Type Ontology.
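Roughly speaking, the minimum information boils down to three URIs: the source ontology, the term in that ontology, and the class under which the term is placed in the target ontology. The sketch below is only an illustration of that idea, not our actual implementation: the class name, the example.org URIs and the "importedFrom" property are all placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
# Placeholder annotation property, invented for this example.
IMPORTED_FROM = "http://example.org/importedFrom"


@dataclass(frozen=True)
class ExternalTermReference:
    """Minimum set of URIs needed to reference one external term."""
    source_ontology_uri: str    # where the term is defined
    source_term_uri: str        # the term itself
    target_superclass_uri: str  # where it sits in the importing ontology

    def as_triples(self) -> List[Tuple[str, str, str]]:
        # Two statements are enough: position in the target hierarchy,
        # and a pointer back to the ontology the term came from.
        return [
            (self.source_term_uri, RDFS_SUBCLASS_OF, self.target_superclass_uri),
            (self.source_term_uri, IMPORTED_FROM, self.source_ontology_uri),
        ]


# Placeholder URIs standing in for a "cell" term reused from another ontology.
cell = ExternalTermReference(
    source_ontology_uri="http://example.org/cell-ontology.owl",
    source_term_uri="http://example.org/cell-ontology#cell",
    target_superclass_uri="http://example.org/target#material_entity",
)
for triple in cell.as_triples():
    print(triple)
```

The point of keeping the record this small is that the term keeps its original identifier, so nothing is duplicated in the importing ontology.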
I can see two issues with that idea however:
1. as mentioned by Ally, how do we check uniqueness at the level of the terms, and who is in charge of this?
2. this may cause issues when an ontology has been developed by one group, and another group would like to improve it, or maybe even has already developed a parallel version. Ideally, the two groups would collaborate and produce a merged version of the ontology, but in reality this is a difficult process.
6. The ontology includes definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
Necessary, and too often missing. Most people will assume that a label will be enough, and/or that they will have time to update the definition later. In my own experience, definitions are more often needed than not, and it is better to do as much as possible w.r.t. the curation status of a term when adding it than to trust oneself to go back to it later.
7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
Hm - I work with Foundry ontologies, and I am confused by that one :) I *guess* the idea is that by using definitions similar to the ones in the Relation Ontology it will make it easier if required to add said relations into RO?
I quite like having one central Relation Ontology, it makes things much easier when trying to share between resources.
8. The ontology is well documented.
Well, that is indeed very important. It is essential to do it while developing the ontology, in order to 1/ keep track of exchanges and not waste time just trying to remember what the issue was 2/ give a chance to potential new contributors to understand what has been done and what the current status is. Unfortunately, based on my experience, documentation of ontologies is more often than not quite sparse.
I also find it difficult to know how to structure such documentation. We do try to add information in the editor notes annotation property, but the notes are often not dated and not updated when new development happens, and it is hard to keep track of global ontology evolution by adding notes on a per term basis. Websites and wikis are great, but it is sometimes cumbersome to find your way in their maze.
9. The ontology has a plurality of independent users.
Hm - are those real users or are those "declared future users"? Are developers of the ontology (who in general are interested in developing it because they indeed plan to use it) considered users?
To be honest I am not sure about that one. What if I build a great ontology, on a very specific theme? Somebody else may use it if they wish to do so, but most probably very few people would actually have a use for it. Does that preclude my ontology from being a Foundry one?
10. The ontology will be developed collaboratively with other OBO Foundry members.
I think this should be extended to non-OBO Foundry members. I actually thought that there was a principle stating that anyone wishing to contribute to a project should be allowed to. Also, I assume that when a call for participation has been made and nobody volunteered, this shouldn't prevent the effort from joining the Foundry.
Now, I know that the above may sound negative - actually it is far from that (see disclaimer above). It is easier to find room for improvement in an existing structure than to actually start from scratch. I think the Foundry participants really did a great job of putting together the foundations. I am really convinced of the interest of the OBO Foundry, and I would love to see it grow. I honestly believe that having a set of guidelines to help develop and ensure some degree of quality is very important. Unifying principles, considering the current disparity of the resources, is also an ambitious endeavor, and I understand that the principles have deliberately been written so as not to be too constraining.
I do think, however, that we are reaching a critical mass in the number of Foundry participants, and an interesting level of technical development with the formalization of policies such as the naming conventions, and the proposals of others, such as the set of annotation properties, MIREOT or the ID policy (and I know that others are being worked upon as well). It would make sense to start tightening things up, and maybe formulate more precisely the recommendations and guidelines to become an OBO Foundry member.
Here are some of my suggestions:
1. Ideally I would like the OBO Foundry to use a set of common policies, for example the ID policy or the deprecation policy we developed within the context of OBI (and I am sure other efforts also have internal guidelines that would be beneficial for the community at large if shared).
2. I would also propose to use identical annotation properties - for example the OBI/IAO ones, which we purposely ship in a separate OWL file.
3. I didn't find any mention of using an upper-ontology such as the Basic Formal Ontology (BFO), or how the review itself is conducted. Is there a "public" audit of the resource, or on the contrary is the decision process confidential between OBO coordinators?
4. Finally, I think that expanding the current http://www.obofoundry.org website to actually host the documentation of Foundry ontologies would make sense. Why not add a wiki that ontology developers could edit directly, but which would have a pre-defined core organization? They would be free to add extra pages, but some common sections could be pre-provided, like "developers", "meetings", "policies", "documentation"... It would make it very easy for users to find their way in such a wiki, and it would also provide some guidance for developers and avoid uncontrolled wiki explosion.
Hm - it appears I have been more prolific than expected :) Well, if you made it until here without being bored to death, I would be happy to hear your thoughts and comments on the matter. If you didn't.. well no chance you are reading this now, is there....? ;)
Michel Dumontier, assistant Professor of Bioinformatics at Carleton University, was visiting Vancouver just before the Canadian Semantic Web Working Symposium and kindly offered to present his work during our third Vancouver Semantic Web Meetup.
During his talk, he used several examples illustrating some of the issues he currently faces when trying to unambiguously reference a specific chemical molecule, whether generic or modified, and how we currently lack an efficient way to represent one molecule in different conformations or different states (e.g. phosphorylated).
He then showed how OWL DL can be used to describe functional groups, and how those groups can be modeled using a Chemical Ontology, thus allowing reasoning and classification.
His group worked on integrating PubChem, DrugBank and DBPedia, and exploited this integration to answer some queries.
For example, the DLQuery: isQualityOf some (Molecule and pubchemcompoundid value 3911) will return the set of descriptors for leuprolide.
The query DLQuery: Alcohol and BiotechDrug and eliminationHalfLife value "Hour" will leverage the 3 resources and fetch chemicals that are biotech drugs (DrugBank), have an alcohol moiety (PubChem) and are eliminated within an hour (DBpedia).
Finally he described what he himself calls a "crazy idea": using OWL to describe the molecule, and then using that to generate its identifier. His group started implementing that idea and developed the Biological Identifier Service, which given information like sequence, position of the modifications and species will generate a unique ID.
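I did not note how the Biological Identifier Service actually computes its IDs, so the following is only my own toy illustration of the general idea of deriving an identifier from a description: serialize the description in a canonical way, then hash it. The function name, the input fields and the choice of SHA-256 are all my assumptions, not Michel's method.

```python
import hashlib
import json
from typing import Dict


def molecule_id(sequence: str, modifications: Dict[int, str], species: str) -> str:
    """Derive a deterministic ID from a molecule description (toy example)."""
    # Normalize each field so equivalent descriptions serialize identically.
    canonical = json.dumps(
        {
            "sequence": sequence.upper(),
            # Sort by position so the order of entry does not matter.
            "modifications": sorted(modifications.items()),
            "species": species.strip().lower(),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    # Hash the canonical text; keep a short prefix for readability.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


plain = molecule_id("MKTAYIAK", {}, "Homo sapiens")
phospho = molecule_id("MKTAYIAK", {5: "phosphorylation"}, "Homo sapiens")
print(plain != phospho)  # True: the modified form gets its own identifier
```

The appeal is that two people describing the same modified molecule would end up with the same identifier without consulting a central registry.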
It was great seeing Michel, and as usual he delivered a great talk. Only drawback? Realizing that my biochemistry knowledge fades with years... :)
Resources:
- Chemical Knowledge for the Semantic Web. Data Integration in the Life Sciences (DILS2008). Evry, France. Lecture Notes in Computer Science. 2008. Springer Berlin / Heidelberg. ISBN:978-3-540-69827-2. [PDF]
- Increasingly Accurate Representation of Biochemistry (v2) Slideshare of the talk (above pictures were created by Michel as part of his talk)
- SemanticScience several resources developed by Michel and his collaborators
- DumontierLab Michel's lab webpage
- Michel in action - picture taken by Jim Pick during the meetup.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot - just let me know!
Deep Dhillon, CTO of Evri, was in Vancouver yesterday to talk at the Vancouver Semantic Web group second meetup.
Evri aims at exposing entities on the web, entities being anything that can be expressed using natural language.
They create their knowledge base from various resources (e.g. Freebase and Wikipedia) and some manual curation, and then use this to parse, index and analyze web documents. Deep spent some time showing us various demos and queries that can be performed via the Evri API.
You can use the Evri Search Query System with for example a query like company > acquire > company (figure on the right).
At the bottom of the page you will find the Evri generated extra information. Above left shows that the article is associated with the entities Barack Obama, Supreme Court, Congress etc. You can then browse further any of those by simply clicking on it; in the above right figure I chose to follow information related to Supreme Court, which in turn pulls out links towards other entities, like Senate or Judiciary Committee.
More widgets can be found in the Widget Gallery. An Evri toolbar is also available, and the Evri blog is a good source of information. Finally, being able to play with EvriVerse may be my next official excuse to get an iPhone :)
Update [May 18th 2009]: The slides used during the presentation are posted at http://is.gd/B2FH
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot - just let me know!
It is a joint adventure with Mark, and we hope you will join us (living close to Vancouver may help ;) )
Update [April 27th 2009]: Jim Pick joined us on the organizers team.
[CC picture by tany_kaly]
The Minimum Information to Reference an External Ontology Term
"While the Web Ontology Language (OWL) provides a mechanism to import ontologies, this mechanism is not always suitable. [...] In this paper we propose a set of guidelines for importing required terms from an external resource into a target ontology. We describe the guidelines, their implementation, present some examples of application, and outline future work and extensions."
I am very happy that the MIREOT paper (written with the usual suspects: Ally, Frank, James *, Daniel, Ryan and Alan) has been accepted for presentation at the International Conference on Biomedical Ontology (ICBO) later this year.
See you all there :)
Update [August 12th 2009]: the paper and slides presented during the ICBO conference are available on the Nature Precedings ICBO collection. Special thanks to Alan for his help with the slides and to James who is to blame for the "OBI scared" slide :)
"The Ontology for Biomedical Investigations (OBI), written in OWL DL, brings together a large consortium seeking to provide a cross-domain, shared framework for representing investigations in the biological and biomedical sciences. In this paper we report our experiences and describe our development process as it pertains to the implementation in OWL."
We finally submitted a paper summarizing some of our development techniques, our release process, and some of the issues we faced in the OBI project. Of course, none of this would have happened without my great team of co-authors: Allyson, Frank, James *, Daniel, Ryan, Bill and Alan. I won't be describing the content of the paper here: you'll have to wait for potential future publication - I know, the suspense is unbearable...or is that relief? I just had a bit of fun using Wordle (thanks Ally for the pointer to the cloud of BioSysBio 09 Tweets; other visualization tools are at http://many-eyes.com/).
I really liked the result and thought you may enjoy it too.
I had a few comments about the pictures I put on that blog until now (well...several actually), mostly about the fact that they may not look beautiful to anybody else but me. Eh well. At least, now, I am certain everybody will have to agree, the OWL of Biomedical Investigations *is* beautiful (or just don't tell me, ok? ;) )
Last week-end, I had the chance to attend a FreeBase meeting here in Vancouver. The event took place at the Irish Heather Gastropub, which was quiet on a Sunday noon. Nice food, and great people :)
To be honest I didn't know much about FreeBase. I knew that it is somehow similar to DBPedia, and thought I would take this chance to meet enthusiastic FreeBasers. What better way to learn?
Kirrily was there, and explained briefly the main differences:
- Wikipedia contains information, but only about "important" things. For example, there is no "Melanie Courtot" page (snif :( )
- DBPedia extracts this information and stores it in a structured way (for example by extracting data from the wikipedia infoboxes)
- FreeBase does the same, but also allows contributors to create ANY kind of information (quick, my page!)
FreeBase uses types to structure the information, and they have their own Metaweb Query Language (MQL) to access the information programmatically. Though I didn't try it myself, feedback from other attendees who used the system was "fast and performant".
The part I was interested in is that they are apparently willing to take on some of the biological resources. During the discussion, we used as an example the NCBI taxonomy, though I also did find a thread regarding ChemSpider afterwards.
The two main drawbacks for me in that case were:
1. anybody can edit FreeBase - how can I use it reliably? If somebody updates the NCBI taxonomy and adds that dinosaurs are mammals, will that lead to problems for me?
FreeBase relies on the community to keep information up-to-date and accurate. Chances are that somebody would flag erroneous information and remove it. More interestingly, you can actually query FreeBase for a specifically dated version, by using as_of_time when executing your MQL query. This is something we also implemented for the OBI project, i.e. we are releasing date tagged versions of the ontology: you can either get a stable version or get the newest one. As an end-user I really like the flexibility this provides. http://purl.obofoundry.org/obo/obi.owl always shows you the latest revision, but you can link to a specific previous revision too.
2. I am not really looking forward to using yet another query language, or another API - I would rather use a normal SPARQL endpoint and query there.
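For the dated queries mentioned under the first point, here is a rough sketch of what such a request body might look like. I have not run this against FreeBase: the envelope layout (a "query" key next to "as_of_time") is my understanding of MQL, and the topic id below is a made-up placeholder.

```python
import json


def dated_mql_envelope(query: dict, as_of: str) -> str:
    """Wrap an MQL query so it is answered as of a given date.

    Assumption: as_of_time sits in the envelope next to the query,
    rather than inside the query itself.
    """
    return json.dumps({"query": query, "as_of_time": as_of})


# None becomes JSON null, which in MQL means "fill this value in".
# "/en/some_taxon" is a hypothetical topic id.
envelope = dated_mql_envelope({"id": "/en/some_taxon", "name": None}, "2009-01-01")
print(envelope)
```

If the dinosaurs become mammals tomorrow, a query pinned to yesterday's date would still return the old state.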
The news that prompted me to write this post is the announcement yesterday by Kingsley that they added FreeBase to the LOD server, and it should therefore be available via their endpoint.
In summary, I had a good time during this meeting, and I learned a bit more about something new.
Last thing to do, update OBI's information :)
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot - just let me know!
I'm currently trying to find out more about RDF triplestores, querying, browsing...
My experience in this area is pretty basic: a few SPARQL queries against the Neurocommons Virtuoso endpoint, and some poking around to find out more about other systems, like Owlgres or Jena (SDB, TDB, or JenaMulgara), that I haven't tried out yet.
My early (sigh) Sunday morning browsing took me to a blog post reading "Getting started with Sesame - surprisingly easy".
I'm usually cautious when I see this kind of article: surprisingly easy for an expert may not be actually easy for others. I sometimes start following a tutorial only to realize I am missing thousands of dependencies, or get stopped by a cryptic step. A quick read through this article (until the end, just to make sure ;) ) looked not too bad: download Sesame, install Tomcat, test.
Instructions are clear, and I loved the "working out of the box" promise. No configuration issues (except setting the Tomcat manager password - but Tomcat even has a nice error message indicating precisely which file to fetch and how to modify - Marco, this one is for you, I used our good old Tomcat password ;) )
And here it is, a simple SELECT DISTINCT ?p WHERE {?s ?p ?o}, followed by a click on the IAO_0000111 hyperlink.
Beautiful, isn't it?
I am definitely impressed by how easy it all was. I did take easy options on the way, like in-memory storage, but Sesame proposes other options in the list of choices upon creation of the repository, like MySQL RDF Store or PostgreSQL RDF Store. I also liked the separation between the openrdf-sesame server, to which you send RESTful queries, and the workbench, which provides an easy browsing interface (see the menu on the left of the picture, create repositories, explore... pretty straightforward)
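Since the server takes RESTful queries, the same SELECT I ran in the workbench can be expressed as a plain URL. Below is a small sketch that only builds the URL, without sending anything; the host, port and repository id ("obi-test") are placeholders for my local setup, and the query/queryLn parameter names are how I understand the Sesame HTTP protocol.

```python
from urllib.parse import urlencode


def sesame_query_url(server: str, repository_id: str, sparql: str) -> str:
    """Build the GET URL for a SPARQL query against a Sesame repository."""
    # urlencode takes care of escaping '?', '{', '}' and spaces.
    params = urlencode({"query": sparql, "queryLn": "sparql"})
    return f"{server.rstrip('/')}/repositories/{repository_id}?{params}"


url = sesame_query_url(
    "http://localhost:8080/openrdf-sesame",  # placeholder: default local Tomcat
    "obi-test",                              # placeholder repository id
    "SELECT DISTINCT ?p WHERE {?s ?p ?o}",
)
print(url)
```

Any HTTP client can then fetch that URL, which is what makes the server/workbench split so convenient.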
And yes, that was indeed surprisingly easy, even for me :)
Amazingly, I just received an email from my super-hyper-wonderful-extraordinary (flattery never hurts ;) ) friend Catherine in Cambridge, UK. And guess what? She did create her own blog today. I'm looking forward to following her adventures in the remote country of the Yorkshire pudding (some things I will never understand...)
Catherine, I love you. Even though you didn't put the great pic of me diving in Alonissos.