DBpediaQueries

Notes on specific DBpedia queries and related observations.

§Objective
DBpedia adds a new dimension to the data in Wikipedia by letting it be queried as a database. Infoboxes hold structured data that becomes one component of DBpedia. A challenge is that the data in Wikipedia infoboxes is not always consistent, so queries may not extract all of the applicable data. My objective here is to investigate why, and whether some of it can be fixed in an automated fashion.

§Summary
From the observations below, here are some general results:
§General
One way to organize searching for synonyms is to start with the list of Wikipedia infobox templates (for example, here's a list of the Geography infobox templates).

DBpedia does not linkify internal Wikipedia references (text enclosed in [[ ]]).

A simple property finder lists the distinct properties of all subjects (or resources) within a given category. This should simplify finding aliases and semantic overlaps.
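PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Distinct infobox properties used by subjects of a given class.
# Replace <CATEGORY_URI> with the class URI of interest.
SELECT DISTINCT ?prop WHERE {
  ?subj rdf:type <CATEGORY_URI> .
  ?subj ?prop [] .
  # Assumption: dbpedia2: denotes the namespace http://dbpedia.org/property/
  FILTER regex(str(?prop), "^http://dbpedia.org/property/")
}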
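Since the property finder is reused for several classes on this page, it can help to fill it in from a template. The following is a minimal Python sketch, not the way the queries here were actually run. The function names, the endpoint URL, and the default filter pattern (which assumes dbpedia2 refers to http://dbpedia.org/property/) are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia endpoint; change this for a local mirror.
ENDPOINT = "https://dbpedia.org/sparql"

# Doubled braces are literal braces in the SPARQL text.
QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?prop WHERE {{
  ?subj rdf:type <{category_uri}> .
  ?subj ?prop [] .
  FILTER regex(str(?prop), "{pattern}")
}}
"""


def property_finder_query(category_uri, pattern="^http://dbpedia.org/property/"):
    """Fill the property-finder template for one category (class) URI."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in category_uri for ch in '<>" \n'):
        raise ValueError("category_uri does not look like a plain IRI")
    return QUERY_TEMPLATE.format(category_uri=category_uri, pattern=pattern)


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)
```

Calling `request_url(property_finder_query(class_uri))` with the class URI of interest yields a URL that can be fetched with any HTTP client; the sketch deliberately stops short of making the network call.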
Here's an example query for South American Countries.

§Buildings
The first set of observations here is for buildings like the Empire State Building. Here are the properties of all objects that are yago:skyscrapers. It would be nice to relate properties that tend (for some definition of "tend") to have the same datatype.

The dbpedia2:location of all yago:skyscrapers shows a wide variety of values. This suggests the field is ambiguous as far as DBpedia is concerned (but not as far as Wikipedia is concerned), and it would be difficult for DBpedia to parse reliably.

§ Date Ranges
Buildings and Periods: this query correctly dereferences single date links to DBpedia, but many of the dates are ranges; in those cases, just the markup is shown (see the General comment above). (Here's Buildings and Periods just for yago:skyscrapers.)

§ Number of floors vs. stories
Building height by floors: floors, floor_count, and height_stories are all used. Most values use the correct integer annotation, but others add annotations to the base value, like "40 and 2 basements" for http://dbpedia.org/page/Meritus_Mandarin_Singapore, and thus are not scanned correctly. In several places, extra information is glommed into the infobox field, as in this example. If we assert that we expect an integer (based on "most" of the other results), parsing the first integer out of the string fixes many of the problems, although we still need to capture the annotation. For some results of this query, like "70 (North Tower); 63 (South Tower)s" for http://dbpedia.org/page/The_Sail_%40_Marina_Bay, parsing still kind of works.

§ Units
A variety of mechanisms are used to signify building height by distance measure: dbpedia2:height, dbpedia2:height_feet, and dbpedia2:height_meters. Here is the Building height units query that shows the combinations. The most appropriate approach seems to be to specify units in the object (e.g., "259"^^dbpedia:units/Meter).
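Returning to the floor counts, the "parse the first integer and keep the annotation" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described; the function name is hypothetical, and it assumes the first run of digits is the floor count (so a value written with a thousands separator would be misread).

```python
import re

_FIRST_INT = re.compile(r"\d+")


def parse_floor_count(raw):
    """Return (first integer, leftover annotation), or (None, text) if no digits."""
    text = raw.strip()
    match = _FIRST_INT.search(text)
    if match is None:
        return None, text
    count = int(match.group())
    # Keep whatever surrounds the number so the annotation is not lost.
    annotation = (text[:match.start()] + text[match.end():]).strip()
    return count, annotation


print(parse_floor_count("40 and 2 basements"))
# (40, 'and 2 basements')
print(parse_floor_count("70 (North Tower); 63 (South Tower)s"))
# (70, '(North Tower); 63 (South Tower)s')
```

The second example shows why parsing only "kind of" works there: the first tower's count is recovered, while the second tower's count survives only inside the annotation.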
A suggestion for consolidating different units is to look for a suffix on a property that names a length (or other measurement) unit: _feet, _meters, etc. This rule assumes that these suffixes occur for no other reason.

For area, at least the following parameters are identified: dbpedia2:area, dbpedia2:area_ft, dbpedia2:area_land, dbpedia2:area_magnitude, and dbpedia2:area_total. There is also dbpedia2:area_water alongside dbpedia2:area_land (see the following query).

§Locations
Even within DBpedia, there are multiple ways buildings (and related entities) are associated with places. A location such as Manhattan shows building associations as either "city" or "location" properties. Also, the "headquarters" property for an organization (e.g., United Nations) should be related to a building.

§People
§Birth and Death
Locations such as Manhattan or, better yet, the entire U.S. show a variety of ways to refer to birth and death locations for people:

Birth: Origin, birth_place, birthplace, placebirth, place_birth, Birthplace, place_of_birth, PLACE_OF_BIRTH, placeofbirth

This list of the properties of novelists shows multiple ways to refer to first name (First, Given, Given1(/Given2), foaf:givenname), last name (Last, Surname, Surname1(/Surname2), foaf:surname), and full name (Name, NAME, foaf:name).

Ideas to fix this in the general case:
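As a rough illustration of the kind of automated normalization discussed above, the Python sketch below handles two of the naming patterns mechanically: unit suffixes (_feet, _ft, _meters) and property names that differ only by case and underscores. The function names and unit labels are hypothetical, and the suffix table assumes, as the suffix rule itself does, that these suffixes never occur for any other reason.

```python
import re
from collections import defaultdict

# Assumption: these suffixes only ever mark a unit of length.
UNIT_SUFFIXES = {"_feet": "foot", "_ft": "foot", "_meters": "metre"}


def split_unit_suffix(prop):
    """Split 'height_feet' into ('height', 'foot'); unit is None if no suffix."""
    for suffix, unit in UNIT_SUFFIXES.items():
        if prop.endswith(suffix):
            return prop[:-len(suffix)], unit
    return prop, None


def alias_key(prop):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", prop.lower())


def group_aliases(props):
    """Group property names that share the same alias key."""
    groups = defaultdict(list)
    for prop in props:
        groups[alias_key(prop)].append(prop)
    return dict(groups)


birth = ["Origin", "birth_place", "birthplace", "placebirth", "place_birth",
         "Birthplace", "place_of_birth", "PLACE_OF_BIRTH", "placeofbirth"]
for key, names in group_aliases(birth).items():
    print(key, names)
```

On the birth properties listed above this collapses nine spellings into four groups (origin, birthplace, placebirth, placeofbirth). Merging those remaining groups is a semantic decision that spelling alone cannot make, so it would still need a hand-maintained synonym table.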
§Sports Cars
Properties used, based on the yago class sports_car and a query on mid-engined vehicles. Here are all yago:cars and their properties. The body styles of cars (the first 200 at least) are complex infoboxes. How should DBpedia interpret this?

§Locomotives
Here is a list of subjects that use the locomotive infobox template (Template:infobox_locomotive). The resulting properties of this collection of locomotives show few aliased/synonymous properties ("released" and "release_date" are the only pair I could find). Alternatively, the list of yago:locomotives and the resulting properties show a wider variety of subjects and attributes.

§Aircraft
§Boeing 747
Observations in comparing dbpedia test, wikipedia infobox