Wednesday, March 16, 2016

Playing with DBpedia (Semantic Web Wikipedia)

Web 3.0, the Semantic Web, DBpedia and IoT are all buzzwords that deal with the computer's ability to understand the data it has. "Understanding" is the key idea of the semantic web: a concept and a set of technologies and standards that have been "almost there" for more than 20 years, and we are still not there.

Additional information about the semantic web is very easy to find.

I know that a lot of effort is being invested in it, and Google is already quite semantic. The schema.org project is an effort to create one uniform ontology for the web. But still, we are not there yet.

What I do like to do is play with DBpedia, the semantic representation of Wikipedia. You can query DBpedia using SPARQL (the SQL of the semantic web).

This is the link to the SPARQL endpoint - http://dbpedia.org/sparql

And an example of a cool query:

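 PREFIX dbo: <http://dbpedia.org/ontology/>
 SELECT DISTINCT ?p ?ed ?u ?t {
   ?u a dbo:University .            # ?u is a university
   ?p ?ed ?u .                      # ?p is linked to ?u by any property ?ed
   ?p dbo:knownFor ?t .             # ?p is known for ?t
   ?t dbo:programmingLanguage ?j    # ?t has some programming language ?j
 } LIMIT 5000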

We are looking for people who have any connection (?p ?ed ?u) to things that are universities, and who are "known for" something that has a programming language. The results include universities and people who have made some kind of contribution to the technologies we use.
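
If you would rather run the query from a script than from the web form, here is a minimal Python sketch. It assumes the endpoint follows the standard SPARQL protocol (a `query` URL parameter, with JSON results requested through the `Accept` header). The function names `build_request`, `bindings_to_rows` and `run_query` are my own and not part of any library.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?p ?ed ?u ?t {
  ?u a dbo:University .
  ?p ?ed ?u .
  ?p dbo:knownFor ?t .
  ?t dbo:programmingLanguage ?j
} LIMIT 5000
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(payload):
    """Flatten the standard SPARQL JSON result format into plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and return a list of row dicts."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(QUERY)` should give one dict per result row, keyed by the variable names `p`, `ed`, `u` and `t`. Nothing is sent over the network until that call is made.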

See how easy it is to generalize or narrow the relations that we are looking for.
Of course, tuning the query and understanding the ontology and SPARQL better might bring much better results.
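
To make the "generalize or narrow" point concrete, here is a small Python sketch that builds the query text either with the open relation `?ed` or with one fixed property. The helper `build_query` is hypothetical, and `dbo:almaMater` is just a property I picked for illustration; it is not what the post's query uses.

```python
PREFIX = "PREFIX dbo: <http://dbpedia.org/ontology/>"


def build_query(relation=None, limit=5000):
    """Return the query text with the person-university link open or fixed.

    relation=None keeps the link as the variable ?ed (any property matches);
    a prefixed name such as "dbo:almaMater" restricts it to that one property.
    """
    link = relation if relation is not None else "?ed"
    # When the link is fixed there is no ?ed variable left to select.
    selected = "?p ?u ?t" if relation is not None else "?p ?ed ?u ?t"
    return "\n".join([
        PREFIX,
        f"SELECT DISTINCT {selected} {{",
        "  ?u a dbo:University .",
        f"  ?p {link} ?u .",
        "  ?p dbo:knownFor ?t .",
        "  ?t dbo:programmingLanguage ?j",
        f"}} LIMIT {limit}",
    ])
```

`build_query()` gives the general version from the post, while `build_query("dbo:almaMater")` only keeps people linked to the university through that single property.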

Run it yourself to see the results.

The results in a graph (I used Plotly for that):
If a university has 2 different "knownFor" values, it counts as 2.
And the winner is Massachusetts_Institute_of_Technology!
(Berkeley is #2)
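
The counting rule can be sketched in a few lines of Python. This is my reading of the rule (one count per distinct university and "known for" pair), and the rows below are made-up placeholders, not real query results.

```python
from collections import Counter


def count_by_university(rows):
    """Count distinct (university, knownFor) pairs per university.

    Each row is a dict with at least the keys "u" and "t", as in the query.
    """
    pairs = {(row["u"], row["t"]) for row in rows}
    return Counter(university for university, _ in pairs)


# Made-up placeholder rows, only to show the counting rule.
rows = [
    {"p": "Person_1", "u": "Univ_A", "t": "Thing_X"},
    {"p": "Person_2", "u": "Univ_A", "t": "Thing_X"},  # same pair, counted once
    {"p": "Person_1", "u": "Univ_A", "t": "Thing_Y"},
    {"p": "Person_3", "u": "Univ_B", "t": "Thing_X"},
]
counts = count_by_university(rows)
```

Here `counts.most_common()` ranks the universities, which is the ordering a bar chart would show.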
