A talk by George Anadiotis, Ashleigh Faith, Katariina Kari, Paco Nathan, Tara Raafat, Panos Alexopoulos, James Phare, Ivo Velitchkov and Andrea Volpini

 

About this talk

 

 

#CDW21 is kicking off with a live discussion among our top-notch contributors and Program Committee members, and you're all invited!

 

Join us as we have a sneak peek through the CDW21 program, and discuss the Connected Data landscape.

The CDW21 Program Committee members will go through the 50+ sessions and 70+ speakers, and talk about:

 

  • The Connected Data landscape
    • Knowledge Graphs
    • Graph Databases
    • Graph Analytics
    • Graph Data Science &
    • Semantic Technology
  • Topics, speakers and talks that piqued our interest
  • Our own work in the domain and how it cross-cuts #CDW21
  • Community chat and Ask Me Anything

 

  • More in-depth topics as time permits:
    • Hiring a team for building knowledge graphs: required roles and skills, what can be taught? 
    • I want a knowledge graph! What next? The process of starting to build a knowledge graph for an organisation: assessment of need, use cases, support needed from management etc.
    • Triple Store vs Labelled Property Graphs: It's not either-or, it's both and more!

 

With an all-star Program Committee and lineup, this will be a tour de force in Connected Data.

 


Transcript:

 

[00:00:00]Welcome to the Connected Data London podcast, brought to you by the Connected Data London team. Connected Data London is the leading conference for those who use the relationships, meaning, and context in data to achieve great things. We have been connecting data, people and ideas since 2016. We focus on knowledge graphs, linked data and semantic technology, graph databases, AI and machine learning, technology use cases and educational resources. Connected Data World 2021 is kicking off with a live discussion among our top-notch contributors and program committee members as they have a sneak peek through the CDW21 program and discuss the Connected Data landscape.

 

[00:00:44]The CDW21 program committee members will go through the 50-plus sessions and 70-plus speakers and talk about the Connected Data landscape, knowledge graphs, graph databases, graph analytics, graph data science and semantic technology.

 

[00:01:00]George Anadiotis: Right. Since people have already started, unofficially, let's say, coming in without the usual knocking sound, I guess we may as well unofficially start as well. And since I'm the host, I guess it's my job to welcome you. First of all, welcome to all our panelists and esteemed program committee members, and big thanks for being here today and, of course, for being members of the program committee and for contributing to this event in various ways. And a big welcome to the people joining us today as well. The idea here today is basically that we're going to have a light discussion.

 

[00:01:44]George Anadiotis: So we're going to go through the topics of the event, Connected Data World 2021, and we're going to pick and choose our favorites, let's say, and then we're going to just discuss and see what we most look forward to. Let us start with introductions. So I'll go first. I'm George Anadiotis. I'm the managing director of Connected Data World. That's actually one of the hats I wear; I wear a number of hats. I also write for a few publications, including VentureBeat, and I'm an analyst working with GigaOm.

 

[00:02:26]Speaker 2: Actually, just recently we published a report on graph databases, which may be interesting for some of you. I work as a consultant and do a few other things, but let's keep it nice and short, and I'll pass the baton to the next person. So let me randomly choose, as we have the gallery here, the one to my left. So Andrea Volpini, you go next.

 

[00:02:53]Speaker 3: Yes. Hi, everyone. My name is Andrea. I am co-founder and CEO of WordLift. We work on creating knowledge graphs for automating search engine optimization. So we are in between marketing and the graph world.

[00:03:12]Speaker 2: Great. Thanks, Andrea. Let's keep passing to the next person to the left. Then it's you, Katariina.

[00:03:22]Speaker 6: Hello, everyone. My name is Katariina Kari. I'm a lead ontologist at Inter IKEA Systems, which you maybe don't know. But IKEA is a franchise, and the warehouses or the stores are run by different companies, which are the franchisees. So the franchisor is basically responsible for the entire concept and also for licensing it. And now IKEA is making big leaps in terms of digitalization, even though the web store has been around for 20 years. And I'm part of that story. There are a lot of people doing that, but I'm part of that story in terms of building a knowledge graph there.

[00:04:06]Speaker 6: Previously I worked at Zalando, Europe's biggest e-commerce platform for fashion and lifestyle, and before that I was in classical music, doing various digitalization projects for that art area. And before that I was a project researcher in semantic research at the Helsinki University of Technology. So I've been almost 14 years in the domain now. That's me. And I guess the next one, if I then go down and right, is Ashleigh.

[00:04:42]Speaker 2: Yeah, it is. Thank you.

[00:04:45]Speaker 5: So hello. My name is Ashleigh Faith. I have a day job at EBSCO Information Services, which is one of the largest aggregators of scholarly research, and I work primarily on the knowledge graph there, behind search, recommendations and sense-making. And then I have a thing I do in my spare time, which is a YouTube channel where I teach people about knowledge graphs and machine learning and other fun things in the data science world. I think it's Panos next.

[00:05:21]Speaker 4: Yes. Hello. My name is Panos Alexopoulos. I also have around 15 years of experience in the semantic technology field, and currently I work as Head of Ontology at Textkernel. Textkernel is a company that builds semantic software for matching people and jobs in the recruitment and labor market domain, and my team is responsible for building a large knowledge graph for this domain.

[00:05:54]Speaker 2: Great. Thank you, Panos. Ivo?

[00:06:00]Speaker 7: Hello, everyone. My name is Ivo. I've been in IT in various roles for about 25 years, and my involvement with knowledge graphs is in the last ten years. I'm an independent consultant. I also do trainings and data strategy, and guide implementation projects with knowledge graphs.

[00:06:22]Speaker 2: Great. Thank you very much, Ivo. All right. So I guess the next step in setting the stage, trivial as it may seem to some of us at least: I thought it would be good to start by defining what we mean by connected data because, well, it's a kind of large-ish and definitely heterogeneous area. And in our definition there are a few basic pillars, let's say, that define it. So those are graph analytics and graph data science, semantic technology and the Semantic Web, knowledge graphs, graph databases, and as of late graph machine learning and graph AI.

[00:07:11]Speaker 2: So I was also having this discussion of how to set the stage and how to explain what connected data is with a few co-panelists the other day, because not everyone is familiar with that. One of them had a very good idea. He said, well, how about approaching that in a timeline kind of fashion? So I think if we are to do that, then probably we'll have to go back to the 17th, 18th century, when graph theory was founded: the famous Seven Bridges of Königsberg problem, which led to the development of graph theory by Leonhard Euler.

[00:07:53]Speaker 2: And so in a way that was the beginning of graph analytics and graph algorithms. And I would define that category, let's say that subcategory of connected data, as the use of algorithms that help people derive useful analytics. We have different classes of algorithms, like shortest path or proximity. And depending on the class of algorithms, this can be very useful in use cases such as anti-fraud or recommendations and so on and so forth. So historically that's the first area, and just to give the opportunity to someone else to say something about the different subdomains.

[00:08:44]Speaker 2: I think if we keep the historical timeline, then probably next in line would be Semantic Technology and Semantic Web. So whoever wants to pick it up and say a few words about that, I know you can all do that.

[00:08:59]Speaker 5: Well, before we move on to that, I will have to say that ontologies actually date farther back than that. The Greek philosophers were talking about ontology as a theoretical way of thinking about things. So I just wanted to point that out, because I think that the theory behind it, and connecting things in a class-like structure, is not incredibly new, even from your perspective, George.

[00:09:27]Speaker 2: That's true. It depends on where you choose to draw the line. Basically, if you want to go all that way back, yes, you definitely can. Okay. So who wants to pick up semantic technologies and the Semantic Web?

[00:09:49]Speaker 6: Well, I would also like to just introduce one more before that, which is kind of everything around description logic, which is already in the 70s. Those kinds of ideas for reasoning existed there already. I have to honestly say that during my technology studies and research, there wasn't a lot of that taught at universities, which is interesting. So when I arrived at the Semantic Web, it was all about the Semantic Web. But that history, what happened in the seventies, those deterministic ideas of calculating things and getting more truth out of one truth,

[00:10:37]Speaker 6: what is now known as reasoning for us in the semantic community, that pre-work, I think, should also go into that timeline.

[00:10:47]Speaker 2: Yeah. That's a good point.

[00:10:49]Speaker 5: The beginning of the Semantic Web, I think, is right after that. If you're looking at how things are connected together, most people talk about the Semantic Web as if Tim Berners-Lee was the first, but he was actually building on a lot of things that DARPA and ALOHAnet and some of those had developed in the late 60s and early 70s, when the Internet wasn't a thing yet other than at MIT, Stanford and whatever that triangle of universities was. But that's where it all began. A lot of the things that Tim Berners-Lee outlined, with hyperlinks and cool URLs and URIs, all of that stems from the ability to actually connect other systems together.

[00:11:39]Speaker 2: Yeah. And to add to Katariina's point, I also remember back in the early 90s, and I'm showing my age now, when we had this AI class at the university, it was kind of a fringe class, actually. Back then, not many people were interested. Not many people took that class, and it was kind of weird with all these symbols and reasoning and all of that stuff. It was to me vaguely, vaguely interesting, but just to add to Katariina's point about description logics and reasoning and all of that.

[00:12:13]Speaker 2: So yes, it goes back even before the Semantic Web.

[00:12:18]Speaker 7: Two parallel streams. One was that of expert systems, symbolic reasoning, this kind of work. Then, in parallel with that, there was the idea of interoperability. So we found a way to represent and share content; then it was time to do the same with data. And this is how RDF appears.

[00:12:45]Speaker 4: A comment on what George said earlier about, as you said, the early 90s. It was just the beginning of a new decade, and in the decade before we had already passed an AI winter, especially with respect to expert systems, which were a big thing in the 80s. And then they proved not to be scalable, et cetera. I guess they thought, okay, now that we have connected things and we have the Internet, let's try to apply the idea of the semantic network, which is actually also a very old concept, at a higher scale.

[00:13:23]Speaker 4: What I think the Semantic Web community may have miscalculated a bit was the challenge of scaling semantics and scaling meaning. Because indeed, RDF came, and it's a decent standard and format. In my view, it was treated a bit as a holy grail, in the sense that it's enough to represent things in RDF and semantic interoperability will come. So, let's say, the effort on defining RDF and building the first version of RDF and the effort on the ontological aspect, I don't think they were equivalent, or at least enough to ensure that,

[00:14:16]Speaker 4: okay, we can use RDF in the correct way so that we have semantics. And in my view, much of the linked data that is out there is not as usable as the Semantic Web community proposes it is. It's really important to have, especially for research purposes, et cetera. But at least in my experience, I have had very little incentive to either reuse or, even better, connect to these data sets. And I think that's something that the community needs to take as feedback, consider, and have some introspection about.

[00:15:11]Speaker 2: Quick question. So you refer to the usability, let's say, of Semantic Web standards and data sets. Actually, that's my question: are you talking about the vocabularies, or are you talking about the data sets, or both, maybe?

[00:15:30]Speaker 4: The content that the developers of linked data put in the linked data. So let's say, for example, RDF, right. RDF defines some very nice predefined elements, like the subclass taxonomy relation or the labeling relation for lexicalizing entities. There are these things. And actually this is meant to be, let's say, a nice feature that other technologies don't have. So a property graph like Neo4j doesn't have a schema, doesn't have predefined elements. And what RDF is selling is that, look, by predefining these elements and everyone agreeing that they have this particular meaning, which then we can use, as Katariina said, for reasoning, we can have nice behavior.

[00:16:26]Speaker 4: The thing is that when the input to this is off, when the subclass relation in RDF doesn't really reflect the subclass relation concept-wise, or the labels that we have for entities are not really synonymous with each other, as in theory they should be, we end up with problematic reasoning. And this is expected to happen, because conceptual modeling, semantic modeling, is really hard, right? It's about agreement. Ontology is about agreement, all of us together. For example, in a given scope, we have to agree on the meaning of things, and agreeing on the meaning of things...

[00:17:07]Speaker 2: It's famously hard.

[00:17:09]Speaker 4: Exactly.
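
To make the point about predefined elements and problematic reasoning concrete, here is a minimal Python sketch, not taken from the discussion. It assumes a toy set of triples with hypothetical `ex:` names and implements only the transitive reading of `rdfs:subClassOf`. One subclass statement is deliberately questionable (a chair leg is a part of a chair, not a kind of chair), and the reasoner faithfully propagates the mistake.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical toy data; every ex: name is an assumption for illustration.
triples = [
    ("ex:Armchair", SUBCLASS_OF, "ex:Chair"),
    ("ex:Chair", SUBCLASS_OF, "ex:Furniture"),
    # Questionable modeling: "part of" expressed as "subclass of".
    ("ex:ChairLeg", SUBCLASS_OF, "ex:Chair"),
    ("ex:item1", TYPE, "ex:Armchair"),
    ("ex:item2", TYPE, "ex:ChairLeg"),
]


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance, triples):
    """Asserted types of an instance plus all of their superclasses."""
    result = set()
    for s, p, o in triples:
        if s == instance and p == TYPE:
            result.add(o)
            result |= superclasses(o, triples)
    return result


# The well-modeled armchair is correctly inferred to be furniture...
print(sorted(inferred_types("ex:item1", triples)))
# ...but the chair leg is now also inferred to be a chair and furniture.
print(sorted(inferred_types("ex:item2", triples)))
```

The reasoning step itself is correct in both cases; the second result is wrong only because the input did not respect the agreed meaning of the subclass relation.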

[00:17:10]Speaker 6: If we kind of take this, what you're saying, Panos, about the kind of difficulties when you have a lot of things out there. So we're at the point in the timeline where we discussed what's before 2001, when that famous Tim Berners-Lee, Jim Hendler and Ora Lassila Semantic Web article came out, and maybe what's next in that timeline. Then there's RDF, description logics, getting those standards up; that will follow, as you said. But then we probably hit the first milestone, which had exactly these difficulties, as you described, which is linked open data.

[00:17:55]Speaker 6: So the promise, I remember, that was when I was a researcher in this community. The EU was putting a lot of money into it. We had so much research into it, and we had, like, linked open data, the linked open data cloud, and applications on top of it. And there was a big hype. And I remember retreating to the arts because I felt that the bubble was about to burst. So I retreated for five years to do something slightly different. I felt that was around 2007, 2008, that time, around 2010.

[00:18:35]Speaker 4: Let me give an example here, right? Not even with open data, even with closed data. So let's say that IKEA, where Katariina works, has its knowledge graph, right. And tomorrow it acquires another warehouse company that has its own knowledge graph. And let's assume also that both these graphs are implemented in RDF, right. And we also assume that they have the best ontologies available in each company. Still, after the acquisition, if the two teams that have the two knowledge graphs decide to say, okay, let's map them together,

[00:19:17]Speaker 4: let's merge them, my bet is that it will be at least a one-year project. It will be expensive work. It will be worthwhile, but it will be expensive. It will not be a simple thing like, let's do it, it will be done in a month or so.

[00:19:44]Speaker 7: But if you compare them: if they did not have RDF and had to do the same task, then I think we are talking about a completely different magnitude of...

[00:19:57]Speaker 4: I completely agree. So I'm not saying RDF is useless in that sense. It's really a step forward. I'm just saying that we are not yet at the stage of being able to have large-scale interoperability, especially on the web. RDF is one foundation, but we need to work further, and one direction of that further work should be making the semantics even clearer and working on the content, in a similar way to the machine learning community where, after building all these fancy algorithms, they're now coming back to say our data are problematic.

[00:20:48]Speaker 2: Right. I was just going to say we have a question from the audience. So someone is asking if all knowledge graphs are based on an ontology model.

[00:21:05]Speaker 5: So I answered in the chat a little bit. But no, not all knowledge graphs are built off of an ontology. Most graphs have to have some kind of model, because you have to relate two things together and you have to know what they are. But ontologies normally are reserved for triple stores, which are usually based on RDF or versions of RDF. With a labeled property graph, you have some kind of model behind the scenes. It's not usually an ontology, though.

[00:21:37]Speaker 2: That's a perfect segue. You wanted to say something, but I was just going to say that since we have already mentioned RDF and property graphs a couple of times, it's a great segue to go to the next subdomain: graph databases, because those are the two key models. But I'll let you finish what you wanted to say, and then we can talk about that as well.

[00:22:04]Speaker 4: I wanted to say that this is something that I don't agree with: you cannot tie a term like ontology to RDF or to a particular implementation, right? An ontology, if we look at the definition, is a conceptual model, a conceptualization. If we take the very first definition, by Gruber. So for me, that terminology around ontologies is not correct. They are different things.

[00:22:37]Speaker 5: So I think that, on the terminology, when you say ontology, you're right. It's a model. And that's why I said that most of the time there's some kind of model behind both. But that's how people talk about ontologies, and I would love to hear if people have heard it related to something other than RDF, because I really don't hear that very often.

[00:22:57]Speaker 6: Yeah, I would agree as well. It's definitely the class and property definition part of RDF, and then, by some stricter definitions, it even excludes instance data and the data graph.

[00:23:13]Speaker 5: But obviously, how is that any different from a node and an edge? And that's what a labeled property graph uses. It's almost the same thing. It's just the different levels of abstraction that we're talking about.

[00:23:23]Speaker 4: Yeah. It's about having some predefined constructs, some predefined elements. So the node and the edge are similar, in RDF, to the entity and the relation, the predicate, let's say, in subject-predicate-object. It's just that RDF comes and says, okay, we also do a separation between classes and instances, an explicit one, in addition to...

[00:23:54]Speaker 5: Yeah.

[00:23:55]Speaker 2: I was just going to say that actually, we spent quite some time on the Semantic Web part, which is interesting. It always seems to be controversial in a way, and to spark conversation. But just to move the discussion, and the timeline as well, to the next stop, let's say. So, fine. We have all these graphs, whether they are RDF graphs or property graphs, and you need to store them somewhere, obviously. And this is the part where graph databases come into play. And I would say that historically, in my view at least, the birth of graph databases pretty much coincides with the birth of the Semantic Web, because the first graph databases were triple stores, RDF stores, and then eventually property graphs came along as well.

[00:24:42]Speaker 2: And so yes. Now we have this big tent of graph databases with a very big number of vendors, actually remarkably big for one technology. And yeah, we have those two camps, let's say, RDF and property graph. So you already started touching upon those two representations. So I'll just let you continue. I just wanted to introduce the next stop in the timeline.

[00:25:12]Speaker 3: Basically, I also share a chart. I don't know if you see that.

[00:25:19]Speaker 2: Yeah. But if you show it, you can share it again because I think it's off screen now.

[00:25:23]Speaker 3: Yeah. So yesterday we published the latest version of the Web Almanac chapter on structured data. And so this is probably the most comprehensive analysis of websites for structured data. And it's surprising to see that, after so many years, the most popular format remains RDFa. This kind of gives us an idea also of how this evolution went, from the Semantic Web, or even before, the expert systems, and then the Semantic Web, and then the linked open data movement, and now the knowledge graph. And it's kind of leaving traces on the public web.

[00:26:10]Speaker 3: And we're creating a very large open graph that is accessible to anyone. And here it's a list of 13 million websites. But it's interesting to see how far we got with this idea of moving from web pages to the web of data. So that's one of the charts that I have for you, and I have another one that we are about to publish tomorrow, but we can use it later.

[00:26:40]Speaker 2: It is interesting to see that, after all this time, RDFa still is the most widespread.

[00:26:43]Speaker 3: It has to do with the kind of legacy CMSs using it, the Drupals and the templating. When we think about these technologies applied at web scale, then, of course, there is a digital footprint that is left behind by technologies that don't get updated or get updated slowly. And this is quite evident in this report, because we can see constructs like Friend of a Friend (FOAF), which were being used in the early days, that are still present because they are included in so many static templates.

[00:27:21]Speaker 3: And it's really fun to see how we're going. But it's interesting, because whereas, of course, this is a representation of the public data and the structured data that is primarily driven by web search engines, we can start to see more and more publishers using this data way beyond SEO. So there is the schema.org-specific markup in automotive, where a search engine would only grasp a fraction of the amount of data that gets published.

[00:28:03]Speaker 7: Andrea, I shared a link in the chat, which is from last year's Common Crawl corpus, and it gives a different ranking of the available data on the web. RDFa is not top, and not even second, right?

[00:28:23]Speaker 3: So the way in which the data has been sampled is different in these two data sets. The HTTP Archive is basically analyzing the home page, which of course brings limitations in terms of understanding the formats. So it's a widely different type of data set. I was also surprised to see that difference, but then, thinking about the way in which the data is sampled, which is looking at the home page, and there are a lot of these legacy systems there, it would make sense to see RDFa as the predominant format.

[00:29:14]Speaker 7: Now, apart from this Common Crawl corpus, whenever I crawl something from the web, I find much more JSON-LD.

[00:29:26]Speaker 3: Yeah, of course. It is growing very fast. In this sample, JSON-LD goes to 33% of the entire data set, which is pretty huge compared to what it was last year. And also, in terms of usage and interoperability, everyone is looking at JSON-LD and no one is probably looking at RDFa. Also because RDFa is used for, as I said, these very static descriptive properties, whereas in the case of JSON-LD, we can finally start to see graphs, or small graphs, emerging. Let me share that too.

[00:30:08]Speaker 7: JSON-LD is great. But in many applications you would need to intertwine data with HTML. Well, JSON-LD is basically disconnected from the HTML, and that is in some cases problematic. There was such a case with an RDFa text editor; trying to do that with JSON-LD is really difficult, borderline impossible.
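
As an illustration of both remarks (small graphs emerging from JSON-LD, and JSON-LD living apart from the visible HTML), here is a minimal Python sketch using only the standard library. The page content, the product and the `edges` helper are hypothetical; the helper only follows nested objects and is not a JSON-LD processor.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the structured data sits in its own script block,
# separate from the visible markup in <body>.
HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Product",
 "name": "Example chair",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "EUR"}}
</script>
</head>
<body><h1>Example chair</h1><p>Only 49 euros.</p></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def edges(node):
    """Flatten nested objects into (subject type, property, value) edges."""
    subject = node.get("@type", "Thing")
    for key, value in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @type
        if isinstance(value, dict):
            yield (subject, key, value.get("@type", "Thing"))
            yield from edges(value)
        else:
            yield (subject, key, value)


extractor = JsonLdExtractor()
extractor.feed(HTML)
graph_edges = list(edges(extractor.documents[0]))
for edge in graph_edges:
    print(edge)
```

Nothing in the extracted edges points back to the `<h1>` or `<p>` elements that display the same facts, which is the disconnect mentioned above; RDFa, by contrast, annotates those elements directly.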

[00:30:35]Speaker 2: Yeah, just to add that, since we're theoretically at least talking about graph databases, I've also seen some graph databases starting to support JSON-LD out of the box lately.

[00:30:51]Speaker 3: Yeah, it's becoming more common. Of course, for us, doing SEO, it's kind of a native language. So here it's interesting, because we can see the chains of entities and the relationships among these entities and how they are distributed. And from this chart we can see the different sectors behind this data. So there is generic publishing, with generic types such as WebPage, and then BlogPosting and Article, and then, of course, the e-commerce side, so the Product classes and the Offer. And then, as we go down, we can start to see the local businesses, and then the automotive sector and the software industry.

[00:31:44]Speaker 3: So it's becoming quite a large amount of data from different verticals, which, in my opinion, is the Semantic Web as it was originally designed, and it's happening. We currently have search engines that are using it the most, and people are looking at it from the search engine perspective in the first place. But there is so much more that can be done on top of this data; we're just at the beginning of it.

[00:32:17]Speaker 7: Just one comment on what George said. Indeed, it's happening with JSON-LD, but it is also important to know the difference between RDF and LPG. It's not only in the format, in the way data is structured; they are different species altogether, because RDF is a conceptual graph, while LPG is particular to graph databases. In that sense, you can have RDF with a document database, in a graph DB, on SQL, or in triple stores and quad stores, and the same with JSON-LD. Maybe there's a way now to handle it in a graph database, but there are plenty of implementations where it's just managed by MongoDB, Elasticsearch, or just a simple file server.

[00:33:13]Speaker 2: Sure, indeed.

[00:33:17]Speaker 5: And let's not forget, you don't have to pick either-or. You can use a labeled property graph and a triple store. They do work together. It's not something where you have to pick a camp.

[00:33:29]Speaker 6: Yeah, I remember when I did... Actually, I'm really not in a camp, because I just use Neo4j for visualization sometimes, because it's just really quick, and I have my own script to kind of just pull it out, and I like it best. And I had it in a blog post; the actual work I did was in RDF, but I was just using Neo4j for visualization, and I remember getting hate mail of sorts, like, why wasn't I using the W3C stack, which is the right stack.

[00:34:01]Speaker 6: And I was like, come on, it is not either-or, it's both and more. There are some amazing graph analytics things that you cannot even do with a triple store. We have these great graph scientists, or data scientists, for whom the length of the path was really important, either as an input or to then get as an output from any kind of query, and that's just not supported by SPARQL. And that's why you cannot do everything with either-or: use both for whatever, and work with them interchangeably.
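
The path-length remark can be illustrated with a small sketch. SPARQL 1.1 property paths can test whether two nodes are connected, but they do not return how long the connecting path is, whereas graph analytics tooling typically exposes it directly. The following Python example, with hypothetical toy data, computes that number with a plain breadth-first search.

```python
from collections import deque

# Hypothetical undirected edges between concepts.
concept_edges = [
    ("chair", "seating"),
    ("stool", "seating"),
    ("seating", "furniture"),
    ("furniture", "product"),
]


def shortest_path_length(edges, start, goal):
    """Return the number of hops on a shortest path, or None if unreachable."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


print(shortest_path_length(concept_edges, "chair", "stool"))    # 2
print(shortest_path_length(concept_edges, "chair", "product"))  # 3
```

The same function can also serve the input case mentioned above, for example by keeping only pairs whose path length is below a chosen threshold.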

[00:34:47]Speaker 6: I think a knowledge graph in itself is not an RDF text file. A knowledge graph is so much more. Somebody in the chat was already saying that it's basically instantiating from the ontology, like taking the ontology and then adding the muscle and the fat around the skeleton. But then the knowledge graph can be even more, and very many different kinds of systems. And I think that's what we're going to discuss also with the enterprise knowledge graph. I'm always saying it's probably 10% writing RDF, and then the work to create a knowledge graph is so much more, because there's also that thing that we were talking about, which I think you were also referring to, which is the shared conceptualization.

[00:35:43]Speaker 6: And whereas we have graph databases and we have our standards, we don't have as many tools as you would expect for creating the shared understanding between humans, supporting the dialogue, and building a graph in a community way. That tooling kind of doesn't exist, or it exists to some extent, like enabling commenting on triples or something like that, in just little bits and pieces. And then if you're building an enterprise knowledge graph, most of these enterprises are not dictatorships where everything is centralized; they are kind of scattered around.

[00:36:34]Speaker 6: Tools are scattered around, and how to then create that shared conceptualization in those, that's, I think, the even bigger question.

[00:36:43]Speaker 5: So one thing I have been involved in: Carnegie Mellon University has something that they are working on to actually facilitate some of that. It's going to be similar to the human-in-the-loop machine learning kind of review of training sets, but more on the triple and graph side. So I've actually been working with them on what the interface looks like for non-graph people to be able to interact with these things, to help refine what we all have in our knowledge graphs. So I would encourage you all to go and check out what they're working on, because it's pretty cool.

[00:37:18]Speaker 6: Nice.

[00:37:21]Speaker 4: If I may offer a perspective here, because the knowledge graph that we have been building since I joined the company is completely a labeled property graph. It's Neo4j, even though when I came to the company, if I had had the chance, I would probably have done it in RDF. I found a situation already there, some existing initiatives, which made us create property graphs. So what I realized from this experience is the following. When it comes to building knowledge graphs, there is the perspective of the modeler, the conceptual modeler, who thinks in terms of concepts and entities without caring about how the things under the hood work.

[00:38:10]Speaker 4: And there is the perspective of the engineer, who needs to run the queries, and needs the queries to run fast, right, or the algorithms on top to be able to run. These are two perspectives. And for me, one part of the debate between triple stores and property graphs, these kinds of databases, is related to the engineering aspect. So which queries run faster, which don't, what can you run, what are the pros and cons. Conceptually-wise, the additional difficulty that I had in my team was that I didn't have what RDF already provides.

[00:38:59]Speaker 4: I didn't have the subclass things. I didn't have class versus individual. I didn't have a label. I didn't have all these predefined elements, so I had to define them from scratch, in an ad hoc way. So for example, in Neo4j, when I want to relate an entity to its labels, I don't have rdfs:label. I have just an edge which I call label, and this relates to another node, which I call lexical term. The problem, the limitation of that, is that if I want to publish my knowledge graph to the web so that others use it, they will really have to follow my API.

[00:39:42]Speaker 4: It's not as straightforward as it could be with RDF. But in our case, we don't care to do that, because we provide our knowledge graph through the API and use it within products. This is not a requirement, so in that sense, for the scope of the company, an LPG works fine. So it has a lot to do with the scope of interoperability that you want. And this is a factor where RDF is better than the property graph, conceptually. Performance-wise, I don't have an opinion because I haven't really run any benchmarks, but I guess there are other people who are more expert in that and have compared, for example, how fast a query runs in a triple store and how fast it runs in a graph database.
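
A minimal sketch of the modeling contrast Speaker 4 describes, using plain Python structures rather than a real triple store or Neo4j. The `rdfs:label` IRI is the standard one from RDF Schema; the example subject IRI, the `label` edge name and the `LexicalTerm` node kind are illustrative assumptions, not the speaker's actual schema.

```python
# Standard IRI for rdfs:label, defined by the RDF Schema vocabulary.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# RDF style: one triple, using a predicate any RDF-aware consumer recognises.
# The subject IRI is a made-up example.
rdf_triples = [
    ("http://example.org/skill/python", RDFS_LABEL, "Python"),
]

# LPG style: nodes and edges with home-grown names (hypothetical schema).
# A consumer has to be told what "label" and "LexicalTerm" mean.
lpg_nodes = {
    1: {"kind": "Entity", "props": {"id": "skill/python"}},
    2: {"kind": "LexicalTerm", "props": {"text": "Python"}},
}
lpg_edges = [(1, "label", 2)]


def rdf_labels(triples, subject):
    """Labels of a subject, found via the shared rdfs:label predicate."""
    return [o for s, p, o in triples if s == subject and p == RDFS_LABEL]


def lpg_labels(nodes, edges, node_id):
    """Labels of a node, found via the custom 'label' edge convention."""
    return [
        nodes[dst]["props"]["text"]
        for src, rel, dst in edges
        if src == node_id and rel == "label"
    ]
```

Both lookups return the same label; the difference is that the first relies on a published vocabulary, while the second relies on a convention that only the graph's own API documents.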

[00:40:39]Speaker 6: I have a rule of thumb here that I use for when to pick a triple store and when to pick an LPG, and I would like to run it by you. What do you think about this rule of thumb? So the idea is: if I need to define an ontology with classes and properties, with a few hundred concepts, maybe I will also have a few taxonomies to populate it in a controlled way, or a controlled vocabulary to share in the organization.

[00:41:16]Speaker 6: I would do that in RDF and I would store it in a triple store. Actually, I would store it in a repository from where I would push it to my triple store, so that it's also readable as a file. But then if I have a data graph of sorts, let's say, as I work in e-commerce, all the products that are sold, where you have, in the IKEA context, many kinds of chairs and things to sit on, many kinds of tables, et cetera. And they're all repeating, they're variations of each other, like color and size. I would push those individuals, each product, into an LPG, because then if I need to calculate graph similarity, I think an LPG would be much more applicable for that.

[00:42:07]Speaker 6: I think those calculations and those different algorithms would work better in an LPG.
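
To make the "variations of each other" point concrete, here is a small sketch of a similarity calculation over product variants. It uses Jaccard overlap of property values as a simple stand-in for the graph similarity algorithms mentioned; the product names and properties are invented for illustration, and no particular graph database is assumed.

```python
def jaccard(a, b):
    """Share of (property, value) pairs two products have in common."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


# Hypothetical product individuals that vary only in a few properties.
products = {
    "chair-red-s": {"type": "chair", "color": "red", "size": "S"},
    "chair-red-l": {"type": "chair", "color": "red", "size": "L"},
    "table-oak-l": {"type": "table", "color": "oak", "size": "L"},
}


def most_similar(product_id):
    """Other products ranked by overlap with the given one, best first."""
    target = products[product_id]
    scored = [
        (other_id, jaccard(target, props))
        for other_id, props in products.items()
        if other_id != product_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `most_similar("chair-red-s")` ranks the other red chair first, since it shares two of the four distinct property values, while the table shares none.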

[00:42:16]Speaker 5: I use a very similar approach. So it's so funny to hear you say that, because it validates what I do, which is great. I usually use a similar rule of thumb, though mine is not an enterprise use. It's a product. It's a search engine for customers that I'm working on. So my source of truth, where all of the nitty-gritty data and all that modeling and inferencing happens, that is what I use my RDF for. But then there is the case when I need to push it up into a data visualization where I just need to show how two concepts are related to one another in a user-friendly way.

[00:42:55]Speaker 5: That's where I use my labeled property graph, because it can do those quick calculations without having to go to the root source of all the other data I have in my triple store.

[00:43:06]Speaker 2: Just to add to that, because that's a point that Katariina also made previously. Actually, that's kind of changing. And it's very recent to me because, like I said, I just finished doing this research on the capabilities of different databases. RDF stores are nowadays also adding graph algorithms, basically. So depending on the algorithm you're interested in, your RDF store may be able to give that to you out of the box.

[00:43:34]Speaker 3: Yeah, that's also my experience, which is somewhat limited. But we always see this graph data in RDF as the foundation. So we agree on that. But then if you need to compute on it, you can create vectors out of that and work with these vectors, or even getting the data straight into a search index could be a way of making the data in the back end accessible without going through property graphs. But that's our limited experience, because of course we are mostly focused on publishing the data so that search engines can see it, rather than on using the data.

[00:44:22]Speaker 3: But when we have to use the data, creating vectors out of it makes it very practical to calculate similarity or distance, or to recommend one product from another.
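
As a rough sketch of why vectors make similarity and recommendation practical: once each product has a vector, recommending reduces to ranking by cosine similarity. The three-dimensional vectors and product names below are hand-picked assumptions; in practice they would come from a model trained on the graph.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical embeddings; real ones would be learned, not written by hand.
vectors = {
    "armchair": [0.9, 0.1, 0.3],
    "stool": [0.8, 0.2, 0.1],
    "desk": [0.1, 0.9, 0.4],
}


def recommend(item, k=1):
    """The k items whose vectors are closest to the given item's vector."""
    others = [
        (name, cosine(vectors[item], vec))
        for name, vec in vectors.items()
        if name != item
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]
```

With these vectors, `recommend("armchair")` returns the stool ahead of the desk, because the armchair and stool vectors point in nearly the same direction.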

[00:44:36]Speaker 5: That's another piece that looking at performance. The performance between triple store and a property graph used to be.

[00:44:48]Right.

[00:44:49]Speaker 5: But that was the olden days. Now, doing testing, I did one for Neo4j, and I did one for GraphDB and Stardog. GraphDB and Stardog basically have almost the same performance, which is great. You don't necessarily have to pick because of performance anymore. You can pick for your use case.

[00:45:14]Speaker 3: Yeah. Also, these systems now include support for GraphQL, which makes it very easy to consume the data on the front end without going through too many steps, which I also find practical. I believe that the distance between the two platforms is kind of narrowing down, and as AI models are being trained on top of the graph, it's becoming kind of the source of truth when you need to compute, rather than the property graph. But.

[00:45:54]Speaker 2: Great. So actually, we already kind of tiptoed, let's say, around knowledge graphs. And what is a knowledge graph? Well, somebody gave a definition of sorts. First of all, there are tons of definitions going around, some of them very elaborate, very good. But my personal favorite, let's say, is the simple one that I think somebody also offered already: a populated ontology. But again, I think we've kind of talked about what a knowledge graph is and so on. Andrea, you also touched on the last, let's say, subdomain of connected data.

[00:46:30]Speaker 2: So graph machine learning and graph AI, let's say. And I think you're probably definitely compared to me the most qualified person to talk about that subdomain here.

[00:46:45]Speaker 3: Yeah. So let me share again, my screen, because the only way that I can talk about stuff is by going through examples. I'm just going to share a little bit of another practical use case that we were working on in this context. So let me go grab the screen here. So here we have a three dimensional representation of an RDF graph that is behind our blog. So something very simple. And here we can see, for instance, the entity for connected data. Because, of course, we blog about connected data.

[00:47:46]Speaker 3: We can see that I can quickly compute the closest entities in this graph. And this is made possible because I have used TensorFlow to train a model that uses the statements in the RDF graph to learn what is true and what is not, and also to provide this multidimensional representation of an entity like connected data. And here, of course, we can see the entities that are related to the event within the context of our editorial plan and team. So this is one very basic example of how we can get the data into deep learning and then consume it by making inferences that don't even go through a query in the traditional sense.

[00:48:52]Speaker 3: Instead, they use machine learning for predicting new facts or discovering new stuff. And this is also what we're doing with the multimodal search. Again, it's getting the data from the graph into vectors and then consuming the vectors for whatever application you have.
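
The talk does not say which model was trained with TensorFlow, so the following is only an illustration of the general idea of scoring statements as more or less plausible. It uses a TransE-style score (a named, swapped-in technique) with hand-picked two-dimensional vectors instead of learned ones; all entity and relation names are assumptions.

```python
import math

# Hand-picked 2-D vectors; a trained model would learn these from the
# statements in the graph.
entity = {
    "connected_data": [1.0, 0.0],
    "knowledge_graph": [1.0, 1.0],
    "cooking": [-2.0, 3.0],
}
relation = {"related_to": [0.0, 1.0]}


def score(head, rel, tail):
    """TransE-style plausibility: negative distance from head + relation to tail.

    Scores closer to zero mean the statement fits the embedding space better.
    """
    h, r, t = entity[head], relation[rel], entity[tail]
    distance = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return -distance


def rank_tails(head, rel):
    """Candidate tail entities for (head, rel, ?), most plausible first."""
    candidates = [name for name in entity if name != head]
    return sorted(candidates, key=lambda name: score(head, rel, name), reverse=True)
```

Ranking candidate tails this way is how such a model can suggest new facts without a traditional query: `rank_tails("connected_data", "related_to")` places `knowledge_graph` ahead of `cooking` for these vectors.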

[00:49:13]Speaker 2: Right. That's one way. One additional fact, let's say, to point out about graph machine learning is, well, precisely the fact that traditional machine learning, let's say, mostly works with tables, with matrices, and graph machine learning also makes use of the connections. So that's exactly why it's also called geometric machine learning sometimes. So it's kind of 3D, let's say, if you can imagine it in such a way. So I've also had some people make the claim that because of that, you're basically able to model more information and more features.

[00:49:56]Speaker 2: And they say that, well, pretty soon the best machine learning models are going to be graph-based ones.

[00:50:02]Speaker 3: Well, in this example that I showed you, we are training the model with the edges, so the relationships are playing a role, and that allows me to make a prediction based on the relationships, which would otherwise be impossible if I didn't have the graph structure in the beginning. That's one of the advantages. The other advantage, of course, is the possibility of explaining why the model is reacting in a certain way, because I can go back to the data and understand the semantics of the data, which, of course, would be missing from a flat table.

[00:50:40]Speaker 3: And this is also particularly helpful when we start to use this data set, for instance, for generating text, because I can go back to the semantics of the graph and verify the validity of the text, because I do have the semantics which otherwise would be missing. So there are a lot of advantages in kind of moving into deep learning with the graph in the back end.

[00:51:08]Right.

[00:51:09]Speaker 6: And how well would you say the model is learning the difference between the semantic nuances of the different kinds of connections? Because normally I put connections into two categories. You have structural connections, which is all about subclass-of, instance-of, part-whole relationships, those kinds of things, you know, they kind of belong together. And then you have the associations. And sometimes an association can have a negative connotation, and sometimes an association can be strengthening. How is that picked up?

[00:51:49]Speaker 3: So depending on the type of data, we want to make sure that the model is capable of predicting true facts. And the way in which we do the validation, of course, is that we start to create toxic statements to validate whether the model is getting it right or not. And when doing so, we tend to filter for proper relationships between two different entities rather than the property links between the attributes of the same entity, which are, of course, less interesting in the use case that we're looking at here, where we want to recommend similar content or find topical hubs for a website to create pages.

[00:52:40]Speaker 3: One important aspect is focusing on a set of relationships that bring the meaning that is helpful for the use case.

[00:52:49]Speaker 5: Yeah, I take that same approach. Andrea. I'm also working on recommendations and finding things within full text that is helpful for the user. And normally we're focusing on the positives or things that are going to answer a question. So we derive questions from our search logs, and then we find through full text mining what is going to answer that? And those are relationships in and of themselves, which our users find very helpful.

[00:53:19]Speaker 2: Okay, cool. So I think we've more than covered, actually, what connected data is about. And we even expanded in many different directions, and it's been very interesting. But wearing my moderator hat, I think it's time to move on to the next part of the conversation, which is what we are actually going to see in Connected Data World this year. So to have, like, a shared program, let's say, with everyone, I just typed in the chat box the schedule for the event, of which this is actually the first part.

[00:53:58]Speaker 2: So we start today with this conversation, and people attending can get a pretty good idea of what our workshops look like, because that's the main concept for a workshop. We have a group of experts discussing a topic and taking questions from the audience. Somebody's moderating. It's pretty familiar, actually, it's something we all know, I guess. Tomorrow, however, we have the second part of our pre-events, which is that we're opening our virtual venue. So this platform, or other platforms like this, work pretty well for that type of setting.

[00:54:39]Speaker 2: However, you don't get the serendipity that you get in real-life events through that. You need someone to operate it. And then, best case, you get some breakout rooms. And again, somebody needs to take you and put you in that breakout room, and then somebody needs to... it's kind of clunky, let's say. So the best way we found to deal with that clunkiness is to have something that replicates the real world as closely as possible. We're using a platform called Gathertown, and our virtual center is called Connected Data World Center.

[00:55:16]Speaker 2: Not very original, but quite on brand, let's say. And we're opening it up tomorrow. And so, as some of you may recall, every year we have an event, but last year's was also virtual. So we've taken all the recordings of the presentations from last year, and we're doing replays in the virtual center tomorrow. So it's going to be open. The program is going to be the same as it was last year. So we open at 08:45 a.m. GMT, and you have the regular routine: welcome address, a couple of keynotes and different tracks.

[00:55:57]Speaker 2: So you're going to be able to watch all the talks from last year. And we're also going to have some live sessions. One of them actually may be interesting to you, because it's going to be about an effort one of our team members is making to build a labor knowledge graph. So it started out as a kind of project, let's say. And pretty soon he ran into a number of issues, like with vocabularies and how to align them and how to populate them, and some knowledge graph specific issues, and some other issues that are project related.

[00:56:40]Speaker 2: Let's say how to manage contributions, where to go with that. So we thought, okay, let's have a discussion about that. So that's planned for 01:00 p.m. GMT tomorrow. That's going to be a live discussion in the virtual venue. And we also have another live part, which is at 08:00 p.m. GMT. So for that, we're going to show how things are going to work in the actual event. So besides the prerecorded talks that you're going to be able to watch all day in the actual event on December 3, on presentations day,

[00:57:16]Speaker 2: we're also going to have the presenters in live attendance. That means that, sure, we're going to show the presentations. But of course, after that, people will be there for Q&A. And you're also going to be able to do what's one of our favorite things to do in real-life events: chase down speakers to the lobby and hang around with other people and do all of that nice stuff. So for a preview of that, join the grand opening of our virtual venue tomorrow. So with that said, we can move to what the program for the actual event looks like.

[00:57:55]Speaker 2: So it's December 1, December 2 and December 3. On December 1, we have a number of workshops. So some of you are involved in workshops. So I'm looking at, I guess, Katariina and Ashleigh and Ivo, and you all have workshops that you are involved in. So the natural thing to do would be, I guess, to ask you to talk about what you know best, which is the workshops that you're involved in. So who wants to go first?

[00:58:32]Speaker 5: I suppose I can kick off. So me and Katariina and at least one other person are going to be talking about how to create community with folks that are in your organization or even outside of your organization, anybody that feels like they might be on an island. I think there's a lot of us, especially in graphs, since it's kind of like the oddball thing that people work on sometimes at an organization. So we're going to be talking about some of the things that different groups struggle with and how to find others that are like-minded, so that you can feel like you belong in a community of graph professionals.

[00:59:19]Speaker 2: Cool. Let me just be the pedantic guy here and say that, well, yes, you are going to do it, and it's going to be great, and I definitely want to attend. But actually, that's going to be on December 3, so it's going to be hosted in the virtual venue. On December 1, the setting will be like the one we have here: live webinar platform and panel and all of that stuff. So maybe I'm going to ask Katariina to talk about her workshop with Michael Uschold and a number of other people.

[00:59:55]Speaker 6: Yes. And actually, I'm also really burning for the December 3 one as well. But we're also with Ashleigh on the December 1 one, so probably that's the cause of the confusion. And the topic is knowledge graphs in the enterprise: what you need to know. And the idea is really, what does it take to build an enterprise knowledge graph? What kind of roles does it entail? The common question I get a lot is, what kind of team do you build? Who should I hire? I want a knowledge graph, what next?

[01:00:36]Speaker 6: And it's drawing upon a lot of experience. We have here great practitioners, also colleagues from Semantic Arts joining us, which is a company that consults on starting an enterprise knowledge graph. The experience with consultancy companies is that they've seen a lot of cases. So I'm really looking forward to concrete examples, concrete problems and things that we couldn't think of when we were just learning about RDF and knowledge graphs, but actually what it takes, the sweat and the tears it takes to build an enterprise knowledge graph.

[01:01:22]Speaker 2: Great. Yeah, I'm really looking forward to that. And as you said, the two of you are going to be sitting in that one, as is Michael Uschold from Semantic Arts. And you're also going to be joined by Michael Atkin, who was one of the founders of the Enterprise Knowledge Graph Foundation. And now he's pursuing some other interests with a new consulting company that he has. But he also has huge experience in helping enterprises build knowledge graphs.

[01:01:52]Speaker 6: So it's always a pleasure talking with Mike as well, because he's also found this way to structure the aspect of enterprise knowledge management, especially. I find that he kind of finds the wording for exactly what he's working with. And I think that's going to be one topic as well: if you want to build an enterprise knowledge graph, you have to get management on board, because this is not going to be a new fancy application that immediately gets you a lot of money. It's going to be an investment in infrastructure.

[01:02:27]Speaker 6: And that nature then gives you certain challenges.

[01:02:36]Speaker 2: As previously mentioned in the semantic interoperability part, some of those things are really intangible, but they're actually at least as important, if not more important, than the technological aspect. So going from enterprise knowledge graphs to something a bit more unusual, and it's actually one of my favorite topics: we are sitting together in a workshop with Ivo, and this one, as opposed to enterprise knowledge graphs, is about personal knowledge graphs. And it's a concept that perhaps not as many people are familiar with as enterprise knowledge graphs, but we're starting to see that happening as well. So far, maybe not in such an organized or widespread way, but at least I'm an enthusiast, and I know Ivo is one as well.

[01:03:30]Speaker 2: So we're starting to see the technology being adopted by not just companies and organizations, but also individuals. And that's a kind of natural consequence, let's say, coming from the fact that the amount of information that we all have to deal with now has multiplied to a level where, if you don't apply some kind of method and some kind of technological aid, let's say, you can very soon lose control. But I'll just let you say a few words about that, because I know that you are one of the leaders in this area, let's say, and you're doing some very advanced things.

[01:04:16]Speaker 7: Yeah. Well, basically my favorite topic is actually enterprise knowledge graphs, and I'll be very much looking forward to the event on that. And I could say that there has been much talk about enterprise knowledge graphs in the last few years. There's a lot. But previously there was almost none. And I'm happy that this is happening more and more. That is not the case with personal knowledge graphs, which is another topic very dear to me, and a practice too. I've been doing that for quite a while, and I think it's following up.

[01:04:55]Speaker 7: Then there was a workshop at the Knowledge Graph Conference. There was another workshop, an academic one, organized by Perimeter and a few other people. She will also be in that panel on 1 December. And, well, things are coming from different directions: the attention to personal knowledge from the regulatory perspective, from the work on Solid and similar decentralized initiatives. And somehow the whole corona situation accelerated this very much in terms of tools for productivity and for research that manage data as personal knowledge. There was an eruption of those tools.

[01:05:45]Speaker 7: I think there were none or few just three or four years ago, and now there are over 50, and new ones are popping up all the time, and they're also evolving from really personal knowledge graphs to collaborative knowledge graphs. So that's a very interesting area. I personally believe also that in enterprises, there are so many people who, when they need to make a decision on something related to enterprise knowledge graphs as an approach to integrating different things, cannot do that because they have no daily experience with knowledge graphs.

[01:06:26]Speaker 7: And if they start managing their personal information as knowledge graphs, I think they would also look differently at enterprise knowledge graphs when they see what they can do with the different things that they have in their world: documents, email, bookmarks, highlights and everything. So we are going to have this talk with Jerry Michalski, Morrison and Jeff Tang on 1 December. There are actually no talks on this apart from this panel in this conference, and I hope in the next editions it will start to grow and be as visible as enterprise knowledge graphs.

[01:07:09]Speaker 2: Yeah. It's an emerging area, and one that I would say in some ways has its roots, as you also said, in personal productivity tools or note-taking tools and so on. So for a long time, people have been using the traditional, let's say, the legacy systems in that area, and many of them, at least, are starting to realize that they're kind of restrictive in some way. So as you said, there are many new tools that are popping up, and some of them, to some extent at least, are trying to support the graph element behind that, and semantics, at least to some extent, and so on.

[01:07:52]Speaker 2: It's an emerging area. And yeah, I think we have some of the best people around to highlight it. And just before we go on, there's actually a question from someone again in the chat who's asking, what's the difference between a personal knowledge graph and an enterprise knowledge graph? And I'll venture my own answer, and feel free to jump in as you see fit as well. I would say that you actually kind of touched upon it in what you said previously. So a personal knowledge graph is something that you organize for your own needs.

[01:08:30]Speaker 2: Basically, you may or may not have a specific vocabulary. You choose the tools. There's not much of an interoperability issue for that reason. For enterprise knowledge graphs, there are different ways to arrive at them, and I think Katariina and Ashleigh also kind of touched upon it previously. So usually it's not like a dictatorship, top-down thing. You have to reconcile the needs of many departments, many people and many use cases as well. So actually, starting from personal knowledge graphs and trying to bring them all together may be one way to arrive at enterprise knowledge graphs.

[01:09:12]Speaker 7: Yeah, well, enterprise knowledge graphs. They appear, and Ashleigh, correct me if I'm wrong, in two situations. One is that you would like to integrate data from heterogeneous sources, and they could be SQL databases, text, XML and JSON files, and you have a sort of semantic layer that is the enterprise knowledge graph. And the second is that in enterprise settings, you have a task for which other data approaches are not so good. That's why you go and do it from scratch with an enterprise knowledge graph. I think these are the two main big use cases.

[01:09:54]Speaker 7: They're actually packages of use cases. Not just that: doing this would also change the paradigm of how we do IT. Normally, in IT you have requirements, historical requirements, which your solution answers. Well, knowledge graphs are open to the unknown case, to something that might change, and their flexibility is what drives their adoption.

[01:10:25]Speaker 2: Right. So again, quick reply to a quick question from someone with Obsidian. Yes, absolutely. It is one of those tools.

[01:10:38]Speaker 7: The interesting thing about tools like Obsidian is that they are in a way also data centric, because in fact you can have your data. You can just start using something, and you can continue with Logseq or Dendron or other similar tools that work with the same kind of thing. And I think that's an important trend, to decouple the data from the application, which is particularly visible in the personal space, where personal data is personal and you don't really want to expose it.

[01:11:19]Speaker 6: If I may be a little bit philosophical also, on this question about enterprise and personal knowledge graphs: if you look at how work life is changing, it is changing slowly. It's not the Taylorist hierarchical society anymore, or it all depends on the company, but at least in knowledge-intensive work it's already looser. And there's even, like, talk about flat hierarchy. There's everything in between in what it actually is. But the future of work is a network of most likely self-employed freelancers exchanging their work with companies. That's already in any HR talk

[01:12:03]Speaker 6: you're sitting in at the enterprise. They're talking about this future, that it's coming, that people don't really want work contracts anymore. They rather want to work from their own company for another, bigger company, and that kind of exchange. So if we look into the future, what is happening is that an enterprise's know-how might actually get scattered across personal knowledge graphs. That network of personal knowledge graphs might actually be forming the enterprise knowledge graph. Why am I saying this?

[01:12:38]Speaker 6: Even when I say it, it kind of feels weird. But I think that's where it's going, because every time I build an enterprise knowledge graph, HR comes knocking and says they want something similar. They want to capture the skills and know-how. People in the company need to know who they need to contact and who to talk to. So that need for a network of personal knowledge graphs already exists in an enterprise. And I think that trend is just going to become stronger.

[01:13:08]Speaker 5: Yeah, NASA has one. David Meza is somebody whose stuff you have probably all come across. He's very well known in the knowledge graph space. His whole job at NASA is creating that, and he uses Neo4j for it. But that doesn't mean that's the personal knowledge graph tool of choice. But yeah, I think that even apart from the things that Katariina was talking about, if you are working in an organization where you have a lot of development teams and you need to bring a new team together to develop something,

[01:13:43]Speaker 5: having a personal knowledge graph sort of like this, where you can draw on skills that you wouldn't realize people have from their job title, is incredibly important, so that you can use the people you have and not always have to go out and contract for people.

[01:13:58]Speaker 3: I love this idea. So what is the personal knowledge graph tool that I need to start using, Ivo or George? I like this kind of vision that Katariina has projected, of this blend between the enterprise and the personal. The evolution of the workforce is kind of demanding this blend to happen. So the first question is very practical: what tools should I use? And then the second one is not philosophical but, again, technological: where does Solid stand in between? Is there a connection with Solid when we talk about personal knowledge graphs?

[01:14:48]Speaker 3: Because I'm very interested.

[01:14:51]Speaker 5: I will say one thing on Solid, because it bothers me. I have a beef with it. It is not easy for anyone that doesn't do graph to understand it, which I think is a huge blocker. So if anyone knows people working on it, please help them with that.

[01:15:05]Speaker 3: We do know some people.

[01:15:09]Speaker 2: I was going to say something similar as well. So yeah, in theory, Solid is very relevant to this topic. In practice, however, it's not very easy to use. I mean, I confess, for me it's not very easy to use. I don't know, maybe it is for others, but not for me. So I'll give you my personal take on what I use, basically. I'm an Obsidian user and fan. I like the fact that it has what I call the least common denominator: it uses Markdown. However, there's a downside to that.

[01:15:43]Speaker 2: So with Markdown, you don't have semantics, and that's a big pain. You also don't have interoperability with other tools, because while other tools are using Markdown as well, each has its own flavor. So you can't just easily switch.

[01:16:01]Speaker 3: Can I export the data into my WordLift RDF anyway? How do I do this thing that Katariina is telling me to do, that I want to bring the data of the people into the.

[01:16:18]Speaker 2: With Obsidian, I'm not sure if or how you can do it.

[01:16:21]Speaker 3: There are other tools. I watched a video of Ivo's a while ago where he was kind of going through that. What was that, Ivo?

[01:16:32]Speaker 7: Well, first, let me step back to mention something important that the personal tools are bringing from the technological perspective. Because we talk technology, we said that we have RDF and LPG. Of over 50 tools for personal knowledge graphs, or just having something to do with them, I know only one that is using RDF in some way and only one that is using LPG in some way. The rest are doing graphs in different, novel ways, which I find exciting. So this LPG and RDF divide is breaking slowly. Then in terms of usage, it really depends on what you would like to achieve.

[01:17:27]Speaker 7: I personally use several tools. Currently, I can explain why I use several tools. I'm an early adopter of Roam Research, which is quite powerful.

[01:17:42]Speaker 3: I love the approach and the interface of that. That was one of my options. Yes.

[01:17:50]Speaker 7: So what is nice about it is that, apart from the standard query language, there is also this Datalog possibility for querying, which is extremely powerful. And then there are a lot of really nice features and a lot of extensions with which you can do wonderful things. But there are a lot of things that are disturbing. One of them is security. The other is scalability. Big graphs become a bit slow because everything goes into client memory, and maybe because of the way it's managed. And the third dimension, maybe, is that publishing is not so easy.

[01:18:43]Speaker 7: There are additional things to do just to publish from your graph, which I find important. And maybe the last drawback is that it's not really data centric. I would prefer to be able to read my graph with any other tool, not to be locked into one application. And for that, for all these other use cases, I use other tools like Logseq. And there is another tool, again written in Clojure, Athens, which promises to bring serious collaborative features. And it also promises to bring semantics and some kind of RDF-based features. They started partnering,

[01:19:27]Speaker 7: I think, with Fluree, and that would also bring internal scalability. They promise interesting things, because they would like to spread the load between client and server when things scale up.

[01:19:42]Speaker 3: So if I understand correctly, we don't have collaborative support yet.

[01:19:53]Speaker 2: This is the direction Athens Research wants to take, and actually their specific take is from personal knowledge graphs to collaborative knowledge graphs.

[01:20:03]Okay.

[01:20:04]Speaker 3: Yeah. Because that's a pretty important aspect. And then the other one is that everyone agrees that we need Solid in between. But everyone also agrees that it's not usable at the moment, because that's exactly a use case for Solid. Moving the data back and forth from a kind of private cloud into a shared cloud would be a perfect scenario for that. Tim Berners-Lee, George.

[01:20:34]Speaker 7: Now, to be fair, Roam Research has already been collaborative for over a year, but not in a way that can really support various cases of cooperation.

[01:20:48]Speaker 2: Okay. So yeah. Apparently personal knowledge graphs, at least for this group. Okay. So before we move to the master classes, in which I'm going to ask Panos to talk a little bit about his master class, Andrea, would you like to say a few words about your workshop? Because you also have one.

[01:21:09]Speaker 3: Right. So our workshop is going to be about building a multimodal search for an ecommerce site. The use case is as simple as a product knowledge graph from a small fashion ecommerce site, and how this graph data can be fed into a multimodal search, which basically means a search engine that is capable of dealing with both images and text at the same time, in various forms. And we are going to work with Jina, an open source framework for neural search that we've been partnering with.

[01:21:59]Speaker 3: And so we're going to kind of dive deep. It's a hands-on, very practical approach to see what can be done, what the challenges are, and how we're trying to get it to the end users. We are targeting next year to be ready for WooCommerce ecommerce stores. So we're going to start with the long tail, but of course we have other use cases. So that's going to be the workshop.

[01:22:30]Speaker 2: Okay. Cool. Let me just quickly mention the other workshops that we have on 1 December. We have a few, actually. We have one on graph machine learning, which we briefly mentioned earlier. We have some of the top experts on that, actually, so that's going to be a good one. We also mentioned reasoning. So we also have a workshop on reasoning, the momentum behind semantic reasoning, and we have one of our other panel members, Tara Raafat, moderating that one. We also have one on the connection between connected data and sustainability, and that one is organized and moderated by my co-founder in Connected Data World, James Phare, who works in sustainability and also uses connected data technology.

[01:23:17]Speaker 2: So he's the right person to talk about that. And he also has some people with deep expertise and experience in that. Something a bit more exotic: we also have a workshop on hardware, actually, novel AI hardware architectures for graph processing. In the last few years, we have been seeing a number of innovative hardware designs popping up, and many of them have one thing in common: they're actually built, in a way, around the notion of processing graphs. So we have a few of those innovators gathered together, and one of my colleagues at ZDNet moderating that, Tiernan Ray, who's actually one of the experts in AI hardware.

[01:24:09]Speaker 2: So that's going to be one to keep an eye on. Andrea already spoke about his workshop, and we also talked about the workshop on enterprise knowledge graphs. We have one with a different angle: investing in connected data. That's something that I don't think I've seen many people talk about. This whole technological area is kind of exploding. There are many reports stating that the market is growing, and I guess this is something that intuitively all of us can attest to: there are more job openings, more use and more investment.

[01:24:47]Speaker 2: So we have a few people in that workshop with a venture capital background, some people who work in and manage venture capital firms, and they're going to talk about the opportunities that they see, some interesting startups to keep an eye on, some criteria, and how to define the market. And we're also going to have someone who's on the other side of things: Bob Fanlaud, who's the founder of a startup that actually just got funding recently, and he also uses those technologies. So that's going to be an interesting discussion from both ends of the spectrum.

[01:25:24]Speaker 2: We also have a few master classes on day one. But before we talk about those, let's talk about your master class, and just to give you an intro about the idea of master classes: as opposed to workshops, where you have a number of people and we have a discussion pretty much like we're doing here, master classes are strictly hands-on, basically. So you usually have one instructor, sometimes two, but typically one. And the idea in a master class is that this person gets to teach the attendees hands-on skills that they can use in their day-to-day job.

[01:26:04]Speaker 2: So Panos, let's hear what you are going to teach people.

[01:26:13]Speaker 4: My master class will be about using natural language processing techniques in order to build knowledge graphs, in order to mine knowledge for a knowledge graph from unstructured sources, typically from text. It's quite a challenging task, but it's really important because we really want to scale the construction of all these graphs. We cannot rely merely on manual work. On the other hand, it's not enough to say, yeah, let's use some information extraction tools or train any kind of machine learning, and it will work magically. So it needs a proper methodology, and attention to particular pitfalls that might occur in the process.

[01:27:05]Speaker 4: So the master class will consist of two parts of 2 hours each. The first part will start from how we prepare for mining knowledge graphs, and by preparation I mean defining a very basic schema that will guide us in how we're going to do the mining, selecting our sources, from where we're going to get the data, selecting our tools and things like that. And we will move forward to deeper things, like how we extract entities from text that will populate our classes and our concepts. We extract synonyms and hyponyms in order to be able to create taxonomies and to group strings into things, as Google likes to say, and then go on to more specific relations, associative relations, depending on the domain.
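
The hyponym extraction step mentioned above can be illustrated with a small sketch. It is not the material of the master class; it assumes one classic lexical pattern ("X such as A, B and C") and plain regular expressions instead of a trained model, and the function name `extract_hyponyms` and the sample sentence are hypothetical.

```python
import re

# Assumption: a single "such as" pattern, where the hypernym is the one
# word right before "such as" and the list runs to the end of the clause.
SUCH_AS = re.compile(r"(?P<hypernym>\w+) such as (?P<items>[^.;]+)")

# Splits "A, B and C" or "A, B, and C" into its items.
LIST_SEPARATOR = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hypernym").lower()
        for item in LIST_SEPARATOR.split(match.group("items")):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "We tested graph databases such as Neo4j, Stardog and GraphDB."
    for hyponym, hypernym in extract_hyponyms(sentence):
        print(f"{hyponym} --is a kind of--> {hypernym}")
```

A pattern like this is brittle: it picks up only the last word of a multi-word hypernym and treats anything up to the end of the clause as part of the list, which is the kind of pitfall that calls for a proper methodology rather than a tool applied blindly.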

[01:28:04]Speaker 4: We're going to be using open source tools. spaCy will be one of them, and other frameworks I have in mind, like TensorFlow or some basic. We're going to be using a lot of pretrained state-of-the-art language models like BERT and.

[01:28:30]Speaker 2: Nice.

[01:28:34]Speaker 3: Wonderful. That part I want to follow.

[01:28:38]Speaker 4: Of course. You're welcome.

[01:28:42]Speaker 7: Sounds great.

[01:28:45]Speaker 2: So yeah. Actually, I think that is one of the more advanced, let's say, master classes that we have. We also have some others that are more entry level in a way. So we have one from Heather Hedden, who's one of the very well known taxonomy and knowledge graph experts. She's going to be teaching, actually, precisely about that: taxonomy as a foundation for building a knowledge graph. Her master class is also in two parts, so it's going to be quite extended.

[01:29:22]Speaker 2: We have again another one, which is kind of entry level. It's about a technology that I think was only briefly mentioned once so far in our discussion: GraphQL. We have some people from Apollo GraphQL, which is one of the leading open source vendors in this space. So he's going to be talking about how to use GraphQL schema and how to model things that way. It's an interesting variation from the kind of things we usually see in this space. Let's see what else. We also have a workshop about graph-powered machine learning.

[01:30:04]Speaker 2: So that's going to be given by Jörg Schad from ArangoDB, who is again quite the expert on that. We have another one, again on graph machine learning, from Brucen, who is going to be talking about how to use graph MLOps, as she called it. So again, an interesting one. We have a returning guest. Last year, one of our keynote speakers was Kirell Benzi, who talked about data art. That was a pretty impressive keynote, and one of the favorites from last year. So he's returning this year. This time, instead of giving a keynote, he's going to be delivering a master class, and he's going to be sharing the secrets of his trade, basically.

[01:30:55]Speaker 2: So he's going to show how you can use different open source libraries to create data art. So that's going to be a really good one. We also talked about Panos's master class. We have another one from DBpedia. I know that the people from DBpedia do many tutorials and master classes around the year, but we've been in touch with them, and this one is going to be a bit different, because most of the tutorials that I've seen so far are kind of more advanced.

[01:31:34]Speaker 2: Let's say. So we thought, let's make a more beginner-friendly one. So they're going to start really from the basics, like: okay, what is DBpedia? How can you use it? How do you get started with it? And then they're going to be progressing to somewhat more advanced topics. And I think that's going to be a very good one, because I know many people who have heard of DBpedia and would like to use it, but don't know how to get started. So this is the right master class for them.

[01:32:08]Speaker 2: So, yeah, lots of variety to choose from. And let's not forget the last one, which is a kind of more specialized topic. So if you're into drug discovery, basically, there's a master class for you as well. There's a library called... some people have to sign off. So, Ivo, thanks for.

[01:32:33]Speaker 1: We hope you enjoyed the podcast. To get more of our own material and to keep up with the latest industry and research news from our domain, we invite you to connect with us. Connected Data London has an omnichannel presence. Besides all major podcast platforms, YouTube and SlideShare, you can find us on Twitter, LinkedIn, Facebook, and Instagram. You can join our meetups, or you can keep up with our news and special offers by joining our mailing list.

 

CDW Podcast - S02 E10 | Connected Data World 2021 Program Roundtable

29 November 2021
Connecting Data, People and Ideas since 2016.