1 00:00:06,081 --> 00:00:09,419 (woman) Hello everyone. Thank you for being here this afternoon. 2 00:00:09,419 --> 00:00:10,538 We are first going to hear-- 3 00:00:10,538 --> 00:00:13,018 I'm just going to jump straight in to give him plenty of time-- 4 00:00:13,018 --> 00:00:15,864 so we're first going to hear from Peter Patel-Schneider 5 00:00:15,864 --> 00:00:19,925 about barriers to using Wikidata as a knowledge base. 6 00:00:19,925 --> 00:00:21,237 (Peter) Thank you. 7 00:00:22,937 --> 00:00:26,281 I'll skip over the abstract because you've already seen it all. 8 00:00:26,281 --> 00:00:29,205 And I should say a little bit about myself. 9 00:00:31,705 --> 00:00:37,050 I'm much more of a user of Wikidata than an actual editor of Wikidata, 10 00:00:37,050 --> 00:00:41,526 and much more of a user of Wikidata than somebody who contributes to Wikidata, 11 00:00:41,526 --> 00:00:44,891 but I very much believe in the aims of Wikidata. 12 00:00:44,891 --> 00:00:47,473 In particular, it aligns with my research areas 13 00:00:47,473 --> 00:00:50,683 which is knowledge representation, at least in a certain sense. 14 00:00:50,683 --> 00:00:54,990 I worked in description logics for a long time, worked with W3C. 15 00:00:54,990 --> 00:00:58,591 I've worked in Silicon Valley for a while, 16 00:00:58,591 --> 00:01:02,111 largely building what might be called knowledge graphs, 17 00:01:02,111 --> 00:01:03,849 but I don't like the term knowledge graphs-- 18 00:01:03,849 --> 00:01:04,968 I don't like what they mean, 19 00:01:04,968 --> 00:01:07,831 I want to do something better than knowledge graphs. 20 00:01:07,831 --> 00:01:10,426 And I want to put this together from various sources. 21 00:01:10,426 --> 00:01:13,268 So Wikidata is a very, very good one, 22 00:01:13,268 --> 00:01:15,837 but DBpedia is not so good. 23 00:01:15,837 --> 00:01:18,610 Freebase is dead. 24 00:01:18,610 --> 00:01:21,548 Open Street Map, Open Movie Database, things like that. 
25 00:01:21,548 --> 00:01:24,532 And then I want to use this store of knowledge 26 00:01:24,532 --> 00:01:26,499 to do something. 27 00:01:26,499 --> 00:01:31,889 And I want to use it as the source of knowledge to do something, 28 00:01:31,889 --> 00:01:36,108 and not only just facts but also organizing my knowledge. 29 00:01:36,108 --> 00:01:39,137 And currently, working where I am, 30 00:01:39,137 --> 00:01:43,245 we're interested in supporting conversational agents. 31 00:01:43,245 --> 00:01:48,609 Not just things that let you play *Avatar,* 32 00:01:48,609 --> 00:01:52,850 but lets you play the movie that's directed by the wife 33 00:01:52,850 --> 00:01:55,571 of the director of *Avatar.* 34 00:01:55,571 --> 00:02:00,454 So how can we build a conversational agent that will do something like that? 35 00:02:00,454 --> 00:02:04,342 Well, you need to know all the facts that go behind it, 36 00:02:04,342 --> 00:02:07,440 but you also need to know that the fact that there are movies-- 37 00:02:07,440 --> 00:02:10,407 not just, we have *Avatar*, but that we have movies-- 38 00:02:10,407 --> 00:02:12,158 we need to know things about movies, 39 00:02:12,158 --> 00:02:15,040 we need to know things about directorships. 40 00:02:15,040 --> 00:02:18,476 We need to know things about humans-- that they're married to each other. 41 00:02:18,476 --> 00:02:21,233 We need to know that there are men and women in the world, 42 00:02:21,233 --> 00:02:25,450 and somehow be able to use this knowledge of what we're saying 43 00:02:25,450 --> 00:02:28,357 to come up with the actual reference to these things, 44 00:02:28,357 --> 00:02:31,609 and then actually do what we were asked to do. 
45 00:02:31,609 --> 00:02:34,423 So, though it's one end, 46 00:02:34,423 --> 00:02:36,634 the other thing that we want to be able to do 47 00:02:36,634 --> 00:02:41,114 is if you think of systems like Siri, there are hundreds or thousands-- 48 00:02:41,114 --> 00:02:43,474 actually, maybe Siri's not the best example. 49 00:02:43,474 --> 00:02:49,790 The Amazon system has hundreds or thousands of little programs 50 00:02:49,790 --> 00:02:51,595 that will do something for you. 51 00:02:51,595 --> 00:02:53,495 And the problem that we're interested in 52 00:02:53,495 --> 00:02:56,070 is how do you pick which one can do something. 53 00:02:56,070 --> 00:03:00,997 So for example, which back-end can find me train trips 54 00:03:00,997 --> 00:03:04,754 between San Francisco and Palo Alto. 55 00:03:04,754 --> 00:03:09,227 There may be many systems that will try and sell me train tickets, 56 00:03:09,227 --> 00:03:13,800 but only one or perhaps two of them will sell me that particular train ticket. 57 00:03:13,800 --> 00:03:18,697 And how do I get the system to do that without having to be able to tell it 58 00:03:18,697 --> 00:03:21,428 that I want a Caltrain ticket. 59 00:03:23,128 --> 00:03:29,121 So, what happens is I want to use Wikidata as the source of a lot of this stuff, 60 00:03:29,121 --> 00:03:31,569 and I regularly run into problems. 61 00:03:31,569 --> 00:03:35,861 And from those problems, I have a bunch of suggestions. 62 00:03:37,061 --> 00:03:40,956 You may agree with my suggestions or disagree with them. 63 00:03:40,956 --> 00:03:44,539 Some of them are kind of on their way to being implemented in Wikidata, 64 00:03:44,539 --> 00:03:46,598 some of them aren't. 65 00:03:46,598 --> 00:03:50,220 So, I'm going to do this talk from the back forward. 66 00:03:50,220 --> 00:03:53,949 I'm going to give you the summary, and then an expansion of the summary, 67 00:03:53,949 --> 00:03:58,022 and then some rationale for my suggestions. 
68 00:03:58,022 --> 00:04:01,099 And the reason I'm going to do that is if I started with all of the rationale, 69 00:04:01,099 --> 00:04:03,892 I might never get to the end, and the end is the important thing, 70 00:04:03,892 --> 00:04:06,176 at least in my viewpoint. 71 00:04:06,176 --> 00:04:12,117 So, my biggest suggestion, I guess, on the community side is, 72 00:04:12,117 --> 00:04:16,564 gee, guys, speak with a single voice. 73 00:04:16,564 --> 00:04:18,864 (chuckles) 74 00:04:20,064 --> 00:04:23,616 And speak with a voice where I can find it. 75 00:04:23,616 --> 00:04:25,675 So, it turns out that one of my suggestions 76 00:04:25,675 --> 00:04:30,570 is actually implemented, but I only found out about it today, 77 00:04:30,570 --> 00:04:34,748 because it's not used very much at all, and it's hard to find it. 78 00:04:34,748 --> 00:04:39,810 So, I really want you guys-- and me too, in some sense-- 79 00:04:39,810 --> 00:04:44,075 to spend some effort at the beginning when you're creating these classes 80 00:04:44,075 --> 00:04:46,252 and other things that are important, 81 00:04:46,252 --> 00:04:48,502 so that a poor user like me, 82 00:04:48,502 --> 00:04:52,746 who can't afford to go through five years of impassioned discussion 83 00:04:52,746 --> 00:04:57,059 to find out what male actually is, 84 00:04:57,059 --> 00:05:00,860 can actually use it in our system-- in my system. 85 00:05:00,860 --> 00:05:03,282 So that's sort of on the community side. 86 00:05:03,282 --> 00:05:04,845 I'm a formalist. 87 00:05:04,845 --> 00:05:09,164 I really want to-- and my programs are dumb. 88 00:05:09,164 --> 00:05:11,756 I don't write smart programs, I write dumb programs. 89 00:05:11,756 --> 00:05:14,816 Now, they tend to be very fancy dumb programs, 90 00:05:14,816 --> 00:05:20,495 but these dumb programs can't really handle all of the shades 91 00:05:20,495 --> 00:05:25,935 of everything that you have with start time, end time, inception. 
92 00:05:25,935 --> 00:05:29,241 I want to have some simple formal mechanism 93 00:05:29,241 --> 00:05:33,358 that will tell my program what's true now, 94 00:05:33,358 --> 00:05:36,262 or what's true in 1987, 95 00:05:36,262 --> 00:05:38,559 without having to search through a bunch of things, 96 00:05:38,559 --> 00:05:41,579 and make a bunch of guesses, and use a lot of heuristics, 97 00:05:41,579 --> 00:05:45,330 or have a machine-learning program that's done for this particular task. 98 00:05:45,330 --> 00:05:50,224 I just want you to tell me this stuff somehow, and have a take. 99 00:05:50,224 --> 00:05:54,090 So, I want to be able to look at something which says 100 00:05:54,090 --> 00:05:57,850 what the things I see in Wikidata actually mean. 101 00:05:57,850 --> 00:06:01,170 And I don't find that these days. 102 00:06:01,170 --> 00:06:02,736 And then, of course, once we have that, 103 00:06:02,736 --> 00:06:06,636 I want somebody-- I'm willing to do some of this work-- 104 00:06:06,636 --> 00:06:11,131 build tools that actually use that formal description and say, 105 00:06:11,131 --> 00:06:15,781 tell me, for example, if I'm an instance 106 00:06:15,781 --> 00:06:21,984 of architectural structure, like the Eiffel Tower, 107 00:06:21,984 --> 00:06:24,135 am I a geographic location? 108 00:06:26,435 --> 00:06:27,474 I don't know. 109 00:06:27,474 --> 00:06:31,352 I mean, Wikidata doesn't tell me whether this is true or not. 110 00:06:31,352 --> 00:06:34,461 I can find nowhere in Wikidata that will do that, 111 00:06:34,461 --> 00:06:35,820 because there's no formal thing. 112 00:06:35,820 --> 00:06:38,468 But once you give me a formal thing then I'm going to write a tool, 113 00:06:38,468 --> 00:06:41,376 which essentially gives the implications of what the formal things are. 114 00:06:42,776 --> 00:06:47,569 The fourth suggestion is about bots. 115 00:06:47,569 --> 00:06:49,533 Bots are great. 
116 00:06:49,533 --> 00:06:55,641 Bots have ultimate power and as has been said, 117 00:06:55,641 --> 00:06:58,259 with ultimate power comes ultimate responsibility. 118 00:06:58,259 --> 00:07:02,380 And I don't believe that bots get very much responsibility 119 00:07:02,380 --> 00:07:05,409 for the things that they do, and they need to have. 120 00:07:05,409 --> 00:07:09,121 We need to be able to control the bots and figure out what they've done wrong, 121 00:07:09,121 --> 00:07:11,884 and essentially, once a bot makes a thousand mistakes, 122 00:07:11,884 --> 00:07:13,911 we want to undo that once, 123 00:07:13,911 --> 00:07:17,344 as opposed to undoing that a thousand times. 124 00:07:17,344 --> 00:07:19,188 Of course, as I said, these are my suggestions. 125 00:07:19,188 --> 00:07:20,912 Other people may have different suggestions. 126 00:07:20,912 --> 00:07:23,980 I'm coming at it from a user viewpoint. 127 00:07:23,980 --> 00:07:25,923 I suppose I could say something like, 128 00:07:25,923 --> 00:07:28,450 I'm coming at it from a binary viewpoint. 129 00:07:28,450 --> 00:07:32,900 I mean, this is a program that really wants yes or no answers. 130 00:07:32,900 --> 00:07:36,137 It doesn't understand much in shades of gray. 131 00:07:36,137 --> 00:07:41,628 So, I would really like you to tell me what's true and what's not true. 132 00:07:42,428 --> 00:07:49,416 So, that's the end of the talk, right? (laughs) 133 00:07:51,730 --> 00:07:55,394 And I sort of expanded on some things 134 00:07:55,394 --> 00:07:58,194 but let me-- oops, where are we, here, yes. 135 00:07:58,194 --> 00:08:00,746 So, here let me expand upon the things that I said. 136 00:08:00,746 --> 00:08:05,662 So formally, I really want a logic for Wikidata 137 00:08:05,662 --> 00:08:09,895 because that lets me know what Wikidata means to me.
138 00:08:09,895 --> 00:08:11,712 I don't want to have data structure 139 00:08:11,712 --> 00:08:16,364 with some sort of English description somewhere that tells me something. 140 00:08:16,364 --> 00:08:19,146 I want a formal statement of what this is. 141 00:08:19,146 --> 00:08:23,750 And maybe it produces the wrong answers, in which case we fix it, 142 00:08:23,750 --> 00:08:26,039 but at least we know what the answers are supposed to be, 143 00:08:26,039 --> 00:08:31,594 as opposed to having to go through five or ten different pages 144 00:08:31,594 --> 00:08:33,253 of people arguing with each other 145 00:08:33,253 --> 00:08:36,131 what this particular part of Wikidata means. 146 00:08:36,131 --> 00:08:40,908 So, in particular, I want to have things that I think are useful, 147 00:08:40,908 --> 00:08:42,445 like disjointness. 148 00:08:42,445 --> 00:08:48,406 I want Wikidata to say that rocks aren't humans, 149 00:08:48,406 --> 00:08:50,877 to pick an example. 150 00:08:50,877 --> 00:08:54,250 Now, there's lots of that stuff in Wikidata at the moment. 151 00:08:54,250 --> 00:08:57,300 There's lots of this *opposite from* things, 152 00:08:57,300 --> 00:08:59,539 but what does it mean? 153 00:08:59,539 --> 00:09:01,328 Somebody who's an opposite-- 154 00:09:01,328 --> 00:09:06,221 there was something this morning about transgender man 155 00:09:06,221 --> 00:09:09,185 is the opposite of transgender woman. 156 00:09:11,985 --> 00:09:15,888 Yes, in some sense, but in what sense are they opposites? 157 00:09:15,888 --> 00:09:19,277 It's not a logical sense, it's something else. 
158 00:09:19,277 --> 00:09:23,324 I want to give definitions of classes and to give an example, 159 00:09:23,324 --> 00:09:27,248 I would very much like Wikidata to say 160 00:09:27,248 --> 00:09:31,948 that "woman" is adult, female, human, 161 00:09:31,948 --> 00:09:36,564 because if I query Wikidata-- this is going to the end-- 162 00:09:36,564 --> 00:09:38,864 and I ask how many women are in Wikidata, 163 00:09:38,864 --> 00:09:42,277 I get... any guesses? 164 00:09:43,021 --> 00:09:44,538 (woman) Less than men. 165 00:09:44,538 --> 00:09:46,320 Thirty-seven. 166 00:09:46,320 --> 00:09:47,374 Less than men. 167 00:09:47,374 --> 00:09:48,459 Thirty-seven. 168 00:09:48,459 --> 00:09:53,093 Instances of "woman" in Wikidata-- 37. 169 00:09:53,904 --> 00:09:55,331 That's obviously wrong. 170 00:09:55,331 --> 00:09:56,972 Obviously, obviously wrong. 171 00:09:56,972 --> 00:09:59,310 I know it, you know it, 172 00:09:59,310 --> 00:10:01,876 but my program doesn't know it. 173 00:10:01,876 --> 00:10:05,098 My program says 37-- well, it's not zero. 174 00:10:05,098 --> 00:10:07,354 So it might be right. 175 00:10:09,054 --> 00:10:12,738 I would much prefer there to be something on "woman" 176 00:10:12,738 --> 00:10:16,203 that says, "Hey, if you're trying to figure out the women in Wikidata, 177 00:10:16,203 --> 00:10:20,388 don't look at the things that are stated to be instances of 'woman,' 178 00:10:20,388 --> 00:10:24,323 look at things, well, a SPARQL query or something like that, 179 00:10:24,323 --> 00:10:27,312 find all the humans, find the female one, 180 00:10:27,312 --> 00:10:32,584 the ones with sex or gender which is female or female-ish. 181 00:10:32,584 --> 00:10:34,537 That's kind of difficult there, 182 00:10:34,537 --> 00:10:36,823 and then the ones that are adult-- whatever adult means-- 183 00:10:36,823 --> 00:10:37,986 at least that's a definition. 184 00:10:37,986 --> 00:10:40,120 We can argue whether it's the right definition or not. 
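[Editor's illustration] The definition sketched in the talk (human, then female, then adult) can be written out as a small filter and compared with counting stated instances of "woman". This is only a sketch over hypothetical toy records: the field names, the sample items, and the age threshold of 18 are assumptions made for illustration, not Wikidata's data model or any agreed definition.

```python
from datetime import date

# Hypothetical toy records; the field names are illustrative only
# and do not mirror Wikidata's actual data model.
ITEMS = [
    {"label": "Ada", "instance_of": {"human"}, "sex_or_gender": "female", "born": date(1815, 12, 10)},
    {"label": "Bo", "instance_of": {"human"}, "sex_or_gender": "male", "born": date(1980, 5, 4)},
    {"label": "Cy", "instance_of": {"human"}, "sex_or_gender": "female", "born": date(2015, 6, 1)},
    {"label": "Di", "instance_of": {"woman"}, "sex_or_gender": None, "born": None},
    {"label": "Eve", "instance_of": {"human"}, "sex_or_gender": "female", "born": date(1970, 3, 2)},
]

ADULT_AGE = 18  # assumption: "whatever adult means" is pinned to 18 here


def age_on(born, today):
    """Whole years elapsed between a birth date and a reference date."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def stated_women(items):
    """Items explicitly stated to be instances of 'woman'."""
    return [i for i in items if "woman" in i["instance_of"]]


def derived_women(items, today):
    """Items satisfying the definition: human, female, and adult."""
    return [
        i for i in items
        if "human" in i["instance_of"]
        and i["sex_or_gender"] == "female"
        and i["born"] is not None
        and age_on(i["born"], today) >= ADULT_AGE
    ]


if __name__ == "__main__":
    today = date(2019, 10, 25)  # fixed reference date so the result is repeatable
    print("stated: ", [i["label"] for i in stated_women(ITEMS)])
    print("derived:", [i["label"] for i in derived_women(ITEMS, today)])
```

The two functions disagree on purpose: counting stated instances finds only the one item tagged that way, while the definition finds the items that actually meet the three conditions. Against the real data, the same three conditions would more likely be expressed as a query than as a loop.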
185 00:10:40,120 --> 00:10:45,381 But we get a number which is not 37, much better than 37. 186 00:10:45,381 --> 00:10:48,113 So, I want this so that we can actually come up with answers 187 00:10:48,113 --> 00:10:50,196 to some of these questions. 188 00:10:50,196 --> 00:10:53,585 So, and again, tools-- I would really like to have tools 189 00:10:53,585 --> 00:10:55,277 that show implications of claims. 190 00:10:55,277 --> 00:10:59,429 So, that shows that the Eiffel Tower is a location. 191 00:10:59,429 --> 00:11:04,137 Whether it is or not in the real world, is somehow kind of irrelevant. 192 00:11:04,137 --> 00:11:10,015 We can argue whether the Eiffel Tower is a location or has a location. 193 00:11:10,015 --> 00:11:12,780 Philosophers probably have argued for decades 194 00:11:12,780 --> 00:11:14,776 over whether this is the case or not. 195 00:11:14,776 --> 00:11:15,914 I don't care. 196 00:11:15,914 --> 00:11:19,829 Just come up with an answer that makes at least a little bit of sense, 197 00:11:19,829 --> 00:11:23,302 and I'll be happy. 198 00:11:23,302 --> 00:11:24,758 So, I want a tool that'll do that. 199 00:11:24,758 --> 00:11:26,646 I want, essentially, a tool that will tell me 200 00:11:26,646 --> 00:11:28,786 what's true at a particular time. 201 00:11:28,786 --> 00:11:32,893 So, how big is the Aral Sea? 202 00:11:34,533 --> 00:11:38,558 It's certainly not 22,000 square miles. 203 00:11:38,558 --> 00:11:41,531 It's much, much smaller than that, 204 00:11:41,531 --> 00:11:46,855 but the claims on the Aral Sea are historical claims. 205 00:11:46,855 --> 00:11:49,238 What's true now? 206 00:11:49,238 --> 00:11:51,607 I think, 3,000 square miles. 207 00:11:51,607 --> 00:11:56,997 Anyway, it's a mere puddle of its former self, you might say. 208 00:11:56,997 --> 00:12:00,215 I would also like tools that help in cleaning the data. 209 00:12:00,215 --> 00:12:02,296 So, what are inconsistencies? 
210 00:12:02,296 --> 00:12:05,383 Is there something that's both a rock and a human. 211 00:12:05,383 --> 00:12:09,343 Well, right now, is that a problem in Wikidata? 212 00:12:09,343 --> 00:12:11,703 Well, there are these constraint mechanisms, 213 00:12:11,703 --> 00:12:13,042 but they're kind of weak, 214 00:12:13,042 --> 00:12:15,835 and they're not used very well in many places. 215 00:12:15,835 --> 00:12:21,556 So, I would really like to have some tool which essentially says, "No! 216 00:12:21,556 --> 00:12:23,793 You can't have a rock and a human! 217 00:12:23,793 --> 00:12:28,541 You can have, perhaps, a human and a Klingon, 218 00:12:28,541 --> 00:12:31,778 but rocks and humans, just, no." 219 00:12:35,978 --> 00:12:39,408 There's an old science fiction story called *The God Makers* 220 00:12:39,408 --> 00:12:42,522 where they take a rock [inaudible], make it into a God, 221 00:12:42,522 --> 00:12:45,084 so maybe a rock could be a person in that sense. 222 00:12:45,084 --> 00:12:47,039 But human, no. 223 00:12:47,883 --> 00:12:49,048 Hm? 224 00:12:52,325 --> 00:12:58,055 (man) Are you asking for exhaustive disjunction? 225 00:12:59,298 --> 00:13:02,052 [inaudible] 226 00:13:02,052 --> 00:13:06,024 (Peter) No, I'm not asking for exhaustive decompositions. 227 00:13:06,024 --> 00:13:07,389 Just junctions. 228 00:13:07,389 --> 00:13:09,400 I mean, in some sense-- 229 00:13:09,400 --> 00:13:10,490 In what? 230 00:13:10,490 --> 00:13:11,623 (woman) That's undecidable. 231 00:13:11,623 --> 00:13:15,357 (Peter) What? No, well, you mean not logically. 232 00:13:15,357 --> 00:13:18,961 So, the question is whether we can actually, 233 00:13:18,961 --> 00:13:22,061 can have exhaustive definition, 234 00:13:22,061 --> 00:13:23,744 exhaustive disjunctions? 235 00:13:23,746 --> 00:13:24,774 Well... 236 00:13:24,774 --> 00:13:28,474 (man) That's pricey, right? To find out that bots are... yeah. 
237 00:13:29,874 --> 00:13:32,696 (man 2) To say that rocks are disjoint from humans is easy, 238 00:13:32,696 --> 00:13:36,038 but to do that in all the cases you're going to want it, is-- 239 00:13:36,038 --> 00:13:37,205 (Peter) It's computation. 240 00:13:37,205 --> 00:13:39,963 Yes, now we have a problem with computational costs, right? 241 00:13:39,963 --> 00:13:41,457 Yeah. 242 00:13:42,057 --> 00:13:49,056 The computational cost of deciding it for Wikidata as it exists right now, 243 00:13:49,056 --> 00:13:55,420 is not impossible, it's just computationally non-trivial. 244 00:13:55,420 --> 00:13:58,735 So given that the query service is running out of [inaudible], 245 00:13:58,735 --> 00:14:04,022 so to do this right, requires tools that actually think a little bit. 246 00:14:04,022 --> 00:14:06,769 And that's going to require computation. 247 00:14:06,769 --> 00:14:08,029 How much computation? 248 00:14:08,029 --> 00:14:10,824 Well, it's not the heat death of the universe, 249 00:14:10,824 --> 00:14:14,211 it's tomorrow, perhaps, or two seconds from now. 250 00:14:14,211 --> 00:14:18,416 But two seconds times how many million things are in Wikidata 251 00:14:18,416 --> 00:14:21,672 is getting to be a reasonably big number. 252 00:14:21,672 --> 00:14:22,797 One of the things you can do 253 00:14:22,797 --> 00:14:25,801 is this thing doesn't have to be completely run in one thing. 254 00:14:25,801 --> 00:14:31,206 You can farm these out into other systems. 255 00:14:31,206 --> 00:14:36,416 We don't have to have everything all in one computer. 256 00:14:36,416 --> 00:14:38,419 And, of course, Google just gave us the answer. 257 00:14:38,419 --> 00:14:40,593 We can just put it on this new Google quantum computer, 258 00:14:40,593 --> 00:14:42,105 and it'll do everything forever. 259 00:14:42,105 --> 00:14:44,911 (woman) But it sounds like you're asking for OWL, and-- 260 00:14:44,911 --> 00:14:46,552 (Peter) No, I'm asking for part of OWL. 
261 00:14:46,552 --> 00:14:48,868 (woman) You've been asking for a lot of things about OWL, 262 00:14:48,868 --> 00:14:50,359 and that just is not possible. 263 00:14:50,359 --> 00:14:53,554 That's why Wikidata works, is because it's not OWL. 264 00:14:53,554 --> 00:14:55,636 There are actually things that you can compute with. 265 00:14:55,636 --> 00:15:00,518 (Peter) So, I am asking for a bigger part of OWL, 266 00:15:00,518 --> 00:15:02,663 not all of it, yeah? 267 00:15:02,663 --> 00:15:07,311 Well, I mean, so the question is, 268 00:15:07,311 --> 00:15:09,211 is Wikidata going to spend the effort 269 00:15:09,211 --> 00:15:13,776 to buy another, perhaps, ten computers to crunch away on this permanently, 270 00:15:13,776 --> 00:15:18,316 or is it going to spend the effort of having a whole bunch of people 271 00:15:18,316 --> 00:15:20,724 argue about it, or whatever. 272 00:15:20,724 --> 00:15:25,231 And my view is computers are dirt cheap. 273 00:15:25,231 --> 00:15:30,769 I mean, I'm willing to pony up some of my very own money 274 00:15:30,769 --> 00:15:34,457 to buy Wikidata another computer to do this stuff, 275 00:15:34,457 --> 00:15:36,462 because I think it's important. 276 00:15:36,462 --> 00:15:38,062 (man) [inaudible] 277 00:15:38,062 --> 00:15:39,763 Yes. (laughs) 278 00:15:39,763 --> 00:15:42,863 I didn't say I would give it to Wikimedia Foundation. 279 00:15:45,063 --> 00:15:49,340 But I'm not asking for things that are trivial. 280 00:15:49,340 --> 00:15:52,308 I'm asking for things that require compute power, 281 00:15:52,308 --> 00:15:57,041 that require intellectual power, that require the community to do things. 282 00:15:57,041 --> 00:15:58,800 The community is doing some of these things. 283 00:15:58,800 --> 00:16:02,957 I found out that there is this property which essentially says, 284 00:16:02,957 --> 00:16:06,716 "Hey, here's how you're supposed to use this thing." 285 00:16:06,716 --> 00:16:09,222 I forget the exact name of it. 
286 00:16:09,222 --> 00:16:12,976 User instructions, I thought it was three words. 287 00:16:12,976 --> 00:16:18,235 Whatever, anyway, it essentially says-- and it's on male. 288 00:16:18,235 --> 00:16:19,851 And there was a big argument about it. 289 00:16:19,851 --> 00:16:21,716 The trouble is it's not supported at all. 290 00:16:21,716 --> 00:16:24,597 There was this plan to have this property and have it supported, 291 00:16:24,597 --> 00:16:25,895 to have it show up everywhere, 292 00:16:25,895 --> 00:16:30,339 so that people would realize that human-- in other words, 293 00:16:30,339 --> 00:16:34,069 you don't use person for humans, right now it's stuck on the description. 294 00:16:34,069 --> 00:16:35,942 And it's stuck on a very short description. 295 00:16:35,942 --> 00:16:38,697 And it's very hard to figure out what it really means, 296 00:16:38,697 --> 00:16:41,713 and only a few classes have these things. 297 00:16:41,713 --> 00:16:45,123 So, we go up in the class hierarchy to these more general things, 298 00:16:45,123 --> 00:16:47,372 it's very hard to figure out what belongs to them, 299 00:16:47,372 --> 00:16:48,801 and what doesn't belong to them. 300 00:16:48,801 --> 00:16:51,734 So it's no surprise that people use them the wrong way. 301 00:16:51,734 --> 00:16:56,417 Because the people in this room-- or metaphorically in this room-- 302 00:16:56,417 --> 00:17:00,684 may understand that geographic location is used for a particular purpose, 303 00:17:00,684 --> 00:17:03,215 but even me-- 304 00:17:03,215 --> 00:17:06,413 I think I have a fairly good background in representing things-- 305 00:17:06,413 --> 00:17:11,066 don't know the answer to that, or at least, it requires me to spend 306 00:17:11,066 --> 00:17:13,466 at least an hour of effort to get a good answer to that. 307 00:17:13,466 --> 00:17:16,703 And that's really not scalable.
308 00:17:16,703 --> 00:17:18,350 So, I'm not asking for nothing, 309 00:17:18,350 --> 00:17:20,472 I'm asking for lots of things, 310 00:17:20,472 --> 00:17:24,155 but the trouble is, I mean, I think-- 311 00:17:24,155 --> 00:17:27,369 well, I think I'm important but anyway, you can ignore me. 312 00:17:27,369 --> 00:17:30,993 I think that I'm a pretty good use case for Wikidata. 313 00:17:30,993 --> 00:17:34,328 I really want, not just a bit of Wikidata, 314 00:17:34,328 --> 00:17:36,333 I want a lot of it. 315 00:17:36,333 --> 00:17:42,498 And I work for a very big company but the part of that company 316 00:17:42,498 --> 00:17:47,906 that needs, or wants, or cares about Wikidata is quite small. 317 00:17:47,906 --> 00:17:52,929 So, if I worked for a company that really cared about data, 318 00:17:52,929 --> 00:17:55,560 and was willing to put hundreds of millions of dollars 319 00:17:55,560 --> 00:17:59,985 into curating Wikidata, and put it into their own knowledge graph, 320 00:17:59,985 --> 00:18:02,649 using Wikidata would be no problem. 321 00:18:02,649 --> 00:18:07,614 My company, perhaps, has a million dollars to take Wikidata 322 00:18:07,614 --> 00:18:09,431 and put it into a knowledge graph. 323 00:18:09,431 --> 00:18:13,097 A million dollars doesn't go very far these days. 324 00:18:13,097 --> 00:18:17,475 So, the problem-- and let me say something 325 00:18:17,475 --> 00:18:22,102 that actually isn't in the slides, but which I really firmly believe in. 326 00:18:22,102 --> 00:18:24,393 The problem with Wikidata not-- 327 00:18:24,393 --> 00:18:27,743 Wikidata's great, 328 00:18:27,743 --> 00:18:33,314 but to really use it, you have to spend a lot of effort. 329 00:18:33,314 --> 00:18:39,777 And most companies, and most individuals, and most groups 330 00:18:39,777 --> 00:18:46,073 can't expend that amount of effort to really use it well. 
331 00:18:46,073 --> 00:18:51,710 I think that on the Wikidata side, they should try to be greater 332 00:18:51,710 --> 00:18:54,947 so that more people could really use it. 333 00:18:54,947 --> 00:18:59,458 And that's really, I think, the guts of this presentation 334 00:18:59,458 --> 00:19:04,002 is that if the Wikidata community improved Wikidata 335 00:19:04,002 --> 00:19:08,364 so it would be more clear as to what's going on, 336 00:19:08,364 --> 00:19:10,742 then more people could put information into it 337 00:19:10,742 --> 00:19:12,090 without making mistakes, 338 00:19:12,090 --> 00:19:15,395 and more people could use it without having to spend a lot of time 339 00:19:15,395 --> 00:19:17,190 to curate it. 340 00:19:18,090 --> 00:19:23,686 Alright, so, we've gone through lots of this stuff. 341 00:19:25,286 --> 00:19:27,515 Let me just say a few things. 342 00:19:27,515 --> 00:19:33,402 So, I've looked at a fair bit of Wikidata, 343 00:19:33,402 --> 00:19:37,341 and every time I look, I find a problem. 344 00:19:37,341 --> 00:19:39,658 That's bad. 345 00:19:40,858 --> 00:19:42,575 I haven't done a quantitative study, 346 00:19:42,575 --> 00:19:44,258 and somebody should do a quantitative study 347 00:19:44,258 --> 00:19:46,851 of some of these things, it would require a lot of work to do it, 348 00:19:46,851 --> 00:19:50,029 but essentially, I look at something and I find a problem, 349 00:19:50,029 --> 00:19:51,143 and that's not great. 350 00:19:51,143 --> 00:19:52,429 I find missing information. 351 00:19:52,429 --> 00:19:57,568 But I don't have anything to say about adding in missing information. 352 00:19:57,568 --> 00:19:59,456 Yes, Dan? 353 00:20:00,056 --> 00:20:02,358 (Dan) With respect, you always find problems. 354 00:20:02,358 --> 00:20:03,367 (Peter) Yes. 355 00:20:03,367 --> 00:20:04,706 (audience laughs) 356 00:20:04,706 --> 00:20:07,547 I am very good at finding problems.
357 00:20:07,547 --> 00:20:13,305 Actually, so one of the problems that I have, the problem with "woman"-- 358 00:20:13,305 --> 00:20:15,105 (laughter) 359 00:20:15,105 --> 00:20:17,733 The problem with-- I didn't find the problem with "woman". 360 00:20:17,733 --> 00:20:19,933 (chuckles) 361 00:20:19,933 --> 00:20:22,630 Turns out that a co-worker, I showed her a page, 362 00:20:22,630 --> 00:20:25,306 where I had found a different problem and she looked at it 363 00:20:25,306 --> 00:20:27,162 and said, "Oh, 'woman'." 364 00:20:27,162 --> 00:20:28,801 And so she found that problem 365 00:20:28,801 --> 00:20:32,394 on a display that I already found the problem. 366 00:20:32,394 --> 00:20:35,874 So, missing information-- 367 00:20:35,874 --> 00:20:38,230 there just should be more information in Wikidata. 368 00:20:38,230 --> 00:20:39,899 There's factual errors in Wikidata, 369 00:20:39,899 --> 00:20:41,429 but everybody's got factual errors. 370 00:20:41,429 --> 00:20:43,081 Bots make it a little bit worse. 371 00:20:43,081 --> 00:20:45,608 There's problems with the ontology, 372 00:20:45,608 --> 00:20:48,537 which I think is a place that-- 373 00:20:48,537 --> 00:20:53,049 you can expend effort there and really improve quite a lot of things. 374 00:20:53,049 --> 00:20:55,548 And then there's also the problems with qualifiers, 375 00:20:55,548 --> 00:20:57,214 and really temporal qualifiers. 376 00:20:57,214 --> 00:21:00,898 It's very hard to figure out what's true at a particular time 377 00:21:00,898 --> 00:21:03,567 because there's a whole bunch of temporal qualifiers 378 00:21:03,567 --> 00:21:05,688 that could be relevant. 379 00:21:05,688 --> 00:21:09,023 Which ones count and which ones get used, 380 00:21:09,023 --> 00:21:10,550 and are they going to stay the same? 381 00:21:10,550 --> 00:21:12,396 Are we going to add a new one tomorrow? 382 00:21:12,396 --> 00:21:15,474 So then I have to change every one of my programs. 
383 00:21:15,474 --> 00:21:18,870 I really think all this kind of stuff, it would be better to hide that 384 00:21:18,870 --> 00:21:21,845 from the consumer so that Wikidata would just say, 385 00:21:21,845 --> 00:21:24,497 "Okay, you want to know what's true at time X? 386 00:21:24,497 --> 00:21:27,648 Here's an interface that tells you what's true at time X," 387 00:21:27,648 --> 00:21:31,386 instead of having me to write all of this stuff. 388 00:21:35,666 --> 00:21:37,350 It's on, I think it's on. 389 00:21:37,350 --> 00:21:39,087 Yeah. 390 00:21:40,287 --> 00:21:47,287 (man) I think you like the idea of what is possible with Wikidata, 391 00:21:47,287 --> 00:21:53,086 but you say that it's not used like your idea. 392 00:21:54,186 --> 00:22:00,558 So if, from my perspective, Wikidata is a collection of statements 393 00:22:00,558 --> 00:22:05,423 from persons and from machines, and so on, and some might be true, 394 00:22:05,423 --> 00:22:09,409 some might be discussable. 395 00:22:09,409 --> 00:22:12,641 What you could do would be, from my perspective, 396 00:22:12,641 --> 00:22:16,298 you could use a computational intelligence 397 00:22:16,298 --> 00:22:21,542 to score the statements if they are... 398 00:22:21,542 --> 00:22:24,042 (speaking German) 399 00:22:24,042 --> 00:22:25,342 ...contradictory, 400 00:22:25,342 --> 00:22:27,669 or if they are common sense. 401 00:22:27,669 --> 00:22:31,726 So you could score them, and then you can filter on the score, 402 00:22:31,726 --> 00:22:34,443 and then you have what you wanted. 403 00:22:34,443 --> 00:22:39,426 (Peter) Possibly, except without a notion of what things mean in Wikidata, 404 00:22:39,426 --> 00:22:42,452 I can't even figure out whether two things are contradictory. 405 00:22:42,452 --> 00:22:44,585 I mean, there's constraints and that helps, 406 00:22:44,585 --> 00:22:48,443 but I don't think that's a full solution.
407 00:22:48,443 --> 00:22:53,418 And common sense-- I don't have much common sense 408 00:22:53,418 --> 00:22:56,674 and my programs have a lot less than I do. 409 00:22:56,674 --> 00:23:01,454 We could write a lot of stuff which tries to say some things 410 00:23:01,454 --> 00:23:05,210 about common sense, but, again, I think that requires an understanding 411 00:23:05,210 --> 00:23:06,653 of what's going on. 412 00:23:06,653 --> 00:23:11,121 And yes, so Wikidata has references which are supposed to be some notion 413 00:23:11,121 --> 00:23:13,923 of what's really supported, 414 00:23:13,923 --> 00:23:20,604 except, here's a problem, and it's very hard to see this. 415 00:23:20,604 --> 00:23:23,289 Here's a problem with Wikidata from a while ago. 416 00:23:23,289 --> 00:23:25,818 This is a movie that's got three directors listed-- 417 00:23:25,818 --> 00:23:29,725 the *Corpse Bride*-- and it's got Mike Johnson, twice. 418 00:23:29,725 --> 00:23:32,723 Different Mike Johnsons. 419 00:23:32,723 --> 00:23:36,367 And they both have a lot of references. 420 00:23:36,367 --> 00:23:40,042 So there's a lot of things that say that *Corpse Bride* 421 00:23:40,042 --> 00:23:44,902 has got two different Mike Johnsons as directors. 422 00:23:44,902 --> 00:23:48,489 And there they are, one is a director, one is a singer. 423 00:23:48,489 --> 00:23:52,269 What happened, some bot went through and accidentally did a bad thing 424 00:23:52,269 --> 00:23:55,462 in Italian Wikipedia-- got the wrong thing in there-- 425 00:23:55,462 --> 00:23:57,803 and then a bunch of other bots piled on 426 00:23:57,803 --> 00:24:00,648 and essentially created false references. 427 00:24:00,648 --> 00:24:02,254 So, this is a real problem. 428 00:24:02,254 --> 00:24:06,139 So, seven references! 429 00:24:06,139 --> 00:24:07,839 That's really good. 430 00:24:07,839 --> 00:24:10,015 And they're not crap references. 431 00:24:10,015 --> 00:24:16,358 They're some movie databases-- real things. 
432 00:24:16,358 --> 00:24:18,756 So, that's one of the things. 433 00:24:18,756 --> 00:24:21,363 Here's another one-- there's the Aral Sea. 434 00:24:23,063 --> 00:24:28,337 These are the biggest-- by volume-- lakes in the world. 435 00:24:28,337 --> 00:24:33,219 There's the Aral Sea. That comes from Wikidata, by the way. 436 00:24:33,219 --> 00:24:36,433 There's Lake Michigan-Huron. 437 00:24:36,433 --> 00:24:38,547 I didn't realize there was a Lake Michigan-Huron, 438 00:24:38,547 --> 00:24:40,933 and I live on one of them. 439 00:24:41,866 --> 00:24:43,582 So, here we have two problems. 440 00:24:43,582 --> 00:24:47,167 This is an ontological problem-- what's a lake? 441 00:24:47,167 --> 00:24:50,432 And so is Lake Michigan-Huron a lake? 442 00:24:50,432 --> 00:24:52,813 Well, don't know. 443 00:24:52,813 --> 00:24:56,651 This one here is a temporal qualifier problem-- 444 00:24:56,651 --> 00:24:59,716 how big is the Aral Sea now? 445 00:24:59,716 --> 00:25:02,183 Not 22,000 square miles. 446 00:25:02,183 --> 00:25:04,645 Not 11,000 square miles. 447 00:25:04,645 --> 00:25:10,280 So, what is it? Sorry, 26,000 square miles. 448 00:25:10,280 --> 00:25:13,613 Although this is something from Google, of course, 449 00:25:13,613 --> 00:25:15,609 but that's in there. 450 00:25:15,609 --> 00:25:20,988 So anyway, I got a bunch of other things along these lines, 451 00:25:20,988 --> 00:25:23,264 which you can see if you care, 452 00:25:23,264 --> 00:25:26,636 but I've given you my suggestions already, 453 00:25:26,636 --> 00:25:29,843 you can either like my suggestions or not, 454 00:25:29,843 --> 00:25:32,494 but I've-- whoa-- (chuckles) 455 00:25:33,429 --> 00:25:35,859 I think I've sort of supported some things. 456 00:25:35,859 --> 00:25:37,995 So, anyway, I had questions in the middle, 457 00:25:37,995 --> 00:25:40,452 and we are done, are we having a question or not? 458 00:25:40,452 --> 00:25:41,897 - (woman) We're done. - (Peter) Okay. 
459 00:25:41,897 --> 00:25:44,402 - (woman) Sorry, that's it. - (Peter) (laughs) 460 00:25:44,402 --> 00:25:47,397 (audience applause)