Connecting Data, People and Ideas since 2016.
01 March 2021

Transforming the financial media industry: How Dow Jones is reimagining the news as a Knowledge Graph


AI is transforming the financial media industry – impacting everything from content creation to consumption trends. Clancy Childs, former general manager of Dow Jones’ knowledge enablement unit, shares insights into how Dow Jones is reimagining what the news looks like.

 

Organizations like Dow Jones have loads of data. The challenge is how to leverage that data. How to find all the different pieces of the puzzle to build data and analytics applications.

When Childs started working on some of the projects he was involved in, the goal never was to create a Knowledge Graph. The goal was actually to solve some very important problems and challenges for Dow Jones’ customers.

The bottom line is, you don’t need to reinvent the wheel all the time. You don’t need to make new copies of data in new systems and exacerbate the problem. Building on top of the flexibility provided by knowledge graphs is how to make this work. Then, you can extrapolate from technology decisions to some of the other decisions organizations need to make.

 
 

In the latest episode of the Connected Data Podcast, Childs shares how Dow Jones’ knowledge graph platform, powered by Stardog, enables the company to unify structured and unstructured data from a vast range of news sources and deliver cutting-edge insights for customers and partners globally.

 

Childs led Dow Jones’ professional information product suite as the Chief Data Officer, Chief Product and Technology Officer. Most recently, Childs served as General Manager of an innovation business unit, created to further push the applications of natural language processing and knowledge graph technology within the company.

 

Building on the wealth of leading data and product experience in senior roles at both Dow Jones and Google, Childs recently joined Signal AI, one of the leading companies transforming how business leaders make decisions, as Chief Product Officer.

 

Childs now leads the development of Signal AI’s product suite vision as the company expands into new markets and applies its pioneering technology to further customer use-cases as part of its mission to transform business decision making through trainable AI.

 

If you are more of a visual type, you can also watch the presentation that Clancy Childs gave together with Mike Grove, Stardog co-founder and VP of Engineering. Transcript below.

 

 

 

— Slides available here —

 

— Talk Transcript —

 

Mike Grove

 

OK, is everyone ready to start? We are a data unification platform. We use knowledge graphs. We spun out of the University of Maryland’s computer science lab longer ago than I care to admit. And we’re based in the D.C. area, a small, venture-funded start-up trying to tackle the world’s data problems. There you go. It’s a great quote from an excellent review article, something that we hear a lot: “AI/ML will save the world.”

 

But kind of hidden under the covers there, I think the key point is that it is really the data: all the outcomes that we could get out of using those technologies are really driven by the data. We’ve got lots of data and we’re creating more of it every day. I read recently that we’re creating two megabytes of data per person per second, every day. So that’s awesome. It’s a data race and we’ve got more data than we could ever want.

 

The problem is that it’s stored like this: stored all over the place, in every technology, in every format, moving at every velocity. So this is the forgotten V of big data: variety. Right. So the challenge there is how do we leverage that data? How do we find all the different pieces of the puzzle to build the data and analytics application? The Customer 360 is hidden in here somewhere. So it’s about putting that together to create that 360-degree view. And this is hard. Requires you to work you do and you change the title of the talk.

 

This is where I was going to say it’s really hard to imagine. You have to reimagine what your infrastructure looks like. We think it’s built around three principles: flexibility, reusability and independence. Flexibility is the ability to handle that landscape and all the different places and ways organizations store data. Reusability means you stop making copies of data and don’t proliferate that scenario; you build on the first use case to help bootstrap the second and the third and the fourth, and continue to move forward.

 

Don’t reinvent the wheel all the time. Don’t make new copies of data in new systems and exacerbate the problem. And then independence: we build on top of the graph. You hear talk about the power of graphs, and we use the flexibility that’s provided by the graph, but we use it as an abstraction. Graph isn’t necessarily the point; it is how you make all of this work. We want to abstract away from all these different technology decisions, because some of those are very good systems for very specific use cases.

 

Go ahead and use them, but free yourself: make yourself independent from those choices so you can bubble up that information to the people who need it most. And that is how we do this in Stardog. The graph here is what the enterprise sees. The users of the applications, the application developers, the data scientists, they see a nice, coherent picture of the world, a nice clean graph that’s comprised of data from all over the organization.

 

None of this actually matters: how it’s stored, what it looks like, how fast it’s moving. It doesn’t matter to the data scientists and the analysts who are just trying to answer a business question. Business questions don’t care about that mess. They just need an answer. So we think the way we approach building a knowledge graph to do data unification in the enterprise really helps push forward modern data-driven organizations. With that, I am going to turn it over to Clancy to talk about how they are using this technology at Dow Jones to reimagine news.

 

Clancy Childs

Thanks a lot for having me. So I’ll start by saying that standing up here I feel a bit of imposter syndrome, fraudulent in the sense that I know a lot of you know a lot more about knowledge graphs than I’ve probably even learned so far. And that’s because at Dow Jones, when I started working on some of the stuff that I’ll go through, our goal was never to create a knowledge graph. It was actually to solve some very important problems and challenges for our customers.

 

And, no surprise to you all, a lot of those problems and a lot of the opportunities we saw can certainly be enabled by reimagining our news and our data in much more of a graph-connected way. So, a brief overview of Dow Jones. For most people, when you mention Dow Jones, what comes to mind is the Dow Jones Industrial Average. We have absolutely nothing to do with the Dow Jones Industrial Average. We actually sold that business several years ago.

But our core business is actually The Wall Street Journal; we are its publishers. And Dow Jones has been around for about one hundred thirty years. Before The Wall Street Journal, we were actually working on Dow Jones Newswires. This is when we were actually delivering ticker tapes on Wall Street, to the banks and the traders, with what news was happening at that moment. We also have other publications: Barron’s, and here in London we also have Financial News.

MarketWatch is, I believe, the most trafficked financial news website in the world, and it is one of our properties. And then we also have a very, very successful data business, structured data. One of those products is Factiva. Does anybody here use Factiva? Factiva is a news archive that covers up to 50 years’ worth of unstructured news text from thirty thousand different premium news sources. So all the news that is generally behind paywalls and in print publications, we have in the Factiva database.

 

We also make that available through the Factiva search engine, but also through a platform called DNA: Data, News and Analytics. It’s sort of the same news, the same content, but made available in a way that works better for data scientists, people who want to work with large archive extractions and streams of all of this content. We also have our structured data products, like our risk and compliance product: basically tables of people you probably shouldn’t be doing business with, and other sorts of regulatory things, for instance.

So one of the things, and I think this is probably a common thread that many of you have dealt with, either yourselves or through your clients, is that we have all these products, but traditionally they live in silos. We have a lot of great knowledge, but it’s sort of packed away in different products and retesting results. So we had a bit of a look at what the modern challenges are that we have to consider.

 

What are the things that our customers are really trying to do with the knowledge that we provide? One of those, as I mentioned, is that all of this information is very valuable but is dispersed among these multiple products. So we will have, for instance, an article in The Wall Street Journal talking about, let’s say, WeWork, and then we will also have a record of WeWork in one of our other data sets. And then we will probably have another article in another publication, or in Factiva, about WeWork.

 

If you want to get the whole view of what Dow Jones knows collectively about that company, you actually have to jump in and out of these multiple products. I think it’s a very common problem that many companies are dealing with. We also have a very challenging set of expectations from our customers, both the consumers who read our publications and the professionals in businesses who consume our news content and our data. One of the primary challenges is this expectation of news being more contextual: people these days don’t really expect to go and read the entire Wall Street Journal cover to cover and then figure out what they can actually do with the information in there.

 

We’re generally creating more and more content, as Mike mentioned, across the board, but the ability to consume that content is not increasing by any means. And so customers are increasingly saying, how do I actually get more of a personalized view of what actually matters to me in this massive flow of information and knowledge? And then increasingly, and this is definitely no surprise to anybody in this room, our news is not just being consumed directly by humans, but also by machines, by the algorithms that are either taking some sort of action on it or helping the human at the end of that chain to actually make better decisions.

 

So making sure that our news and our content are better able to work for both the humans who rely on them and the workflow systems and tools is incredibly important. So when we’re looking at some of these challenges, and these are just three of the many that we’ve identified and were focusing on in different respects, we started working through what some of the ways are that we need to structure and deliver the news in the future for our customers.

 

So I’m going to go into one specific pain point. There are a bunch of these, but in this case it’s a pain point that we’ve heard over and over again.

 

I don’t think it’s particularly unique, but it’s: tell me what I need to know so that I can take appropriate action. And here are some examples of actual customer cases that we’ve worked on. There’s an energy company that wants to know what regulations surfacing in the news might affect specific customers, so that they can actually perform some sort of outreach to them to say, hey, this particular energy regulation may require you to take some sort of action that may involve a renegotiation of your energy contract, for instance.

Or a sales team, and this is the case I’m going to go a little bit deeper into: they want to know, for instance, which of their customers are planning a specific product launch so that they can actually get ready to work with them on scheduling the media. And one project that we are working on as well is around bankruptcy and restructuring. So, knowing when a company is in distress, bankruptcy professionals will want to know who the relevant parties are in that bankruptcy so that they can identify how they might get involved in that as well.

 

So these are generally the points that we hear from customers over and over again: I need to figure out what I’m supposed to take action on. But traditionally, at a company like Dow Jones, which comes from a very long history of journalism, our answer is, well, let’s create more unstructured text. Let’s create more articles. And that’s where we run into a bit of a problem, because our customers say, I don’t want to read more articles.

 

Actually, this is a quote from a customer. At one point we were pitching a customer on a newsletter feature, and we said, well, every time one of your customers shows up in the news, we can deliver that to you in the newsletter. We can do some digesting, do some human curation. And they said, please don’t do that. Are our competitors not literate? They just don’t want to read more news.

 

It’s just not part of what anybody is expecting from us these days. So the challenge that we’ve been tapping into is around this idea of how we can actually share relevant knowledge with our customers without just producing more news. And that’s where, as we’ve been investigating a lot of these ideas, the developments that we’ve been doing with Stardog and building our Knowledge Graph have become incredibly useful. So I’ll go into that a little bit, and I’m not going to go as deeply into some of the stuff that we are working on.

 

But it is kind of obvious with every news story. So this one is a little bit out of date, but: the Trump administration imposed sanctions Friday, blah, blah, blah. There’s all this information going on. It’s actually kind of hard to parse, even though it’s not that long. It’s a lot of text that is explaining some really interesting facts about Oleg Deripaska and about these sanctions that are happening. Now, this is our traditional way of delivering knowledge and breaking news: as these blobs of text.

 

But really, for the ontologists in the room, there’s a way that we can take all of this information and actually structure it as a graph. And one of the points you can sort of see here, and this is a mock-up, this is not from our actual graph, is that we have the idea of, well, there’s a news event. And that news event then involves certain people. It has certain things like timestamps.

 

Also, not shown here, we actually have articles that refer to that news event. So this is a big paradigm shift for us, moving away from just doing general metadata tagging on documents to actually saying this document is about this news event that has occurred, and here’s all the structured information about it. So everything that was in that article that we just looked at can be graphed and tied to this specific event that occurred, which was a U.S. sanctions event on specific allies of Vladimir Putin.
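As a rough illustration of that shift from tagging documents to modelling events, the mock-up can be sketched as a handful of subject-predicate-object triples in plain Python. This is only a sketch under stated assumptions: every identifier and predicate name below is a hypothetical placeholder, not the Dow Jones ontology or the Stardog API.

```python
from collections import defaultdict

# All identifiers and predicate names are hypothetical placeholders.
TRIPLES = [
    ("event:sanctions-1", "type", "SanctionsEvent"),
    ("event:sanctions-1", "occurredAt", "timestamp-placeholder"),
    ("event:sanctions-1", "involves", "person:sanctioned-individual"),
    ("article:1", "refersTo", "event:sanctions-1"),
    ("article:2", "refersTo", "event:sanctions-1"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so lookups by target are cheap."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].append(subject)
    return index


def articles_about(event_id, triples=TRIPLES):
    """Return the articles that refer to the given news event, sorted."""
    return sorted(build_index(triples)[("refersTo", event_id)])
```

The point of the design is that the event, not the document, is the node everything else hangs off: several articles can refer to one event, and facts about the event live on the graph whether or not any single article repeats them.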

 

Now, once we have this data, when we put this data into a sort of graph ontology of the event, there are some really useful things we can then do for our customers. So, for instance, say you care about this En+ Group, whether they are a supplier of yours, or a customer of yours, or they’re somehow in your portfolio, or you have some sort of stake in En+; for whatever reason, you care about that company.

 

Well, with this particular news event, unless you read that article and saw En+ Group mentioned in it, you might not actually know that this event somehow may have an impact on you. And actually, that is assuming the article even mentions En+, because, in fact, many times there are editorial decisions made in the journalism process, in the news process, where certain facts just don’t make it into the article at all. But actually that news event is important to you.

So if we have that news event actually graphed in this way, what we’re able to do is infer that if you care about En+ Group, this news event affects you, because it affects the oligarch Oleg Deripaska, and Deripaska is actually part owner of En+. So this allows us to start doing some really interesting things around inferring the impact of news on our customers, and it actually provides a new sort of layer of personalization, effectively for free, that otherwise might not be possible.
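The inference described above, where an event touching a part owner also matters to anyone tracking the company that person owns, can be sketched as a walk over ownership edges. This is a minimal plain-Python stand-in, assuming hypothetical entity identifiers and a simple "owns" relation; a graph database would more likely express this with reasoning rules or path queries.

```python
# Hypothetical facts: (owner, owned) pairs and the entities each event touches.
OWNS = {
    ("person:oligarch", "company:holding-group"),
    ("company:holding-group", "company:metals-producer"),
}
EVENT_AFFECTS = {"event:sanctions-1": {"person:oligarch"}}


def affected_entities(event_id, event_affects=EVENT_AFFECTS, owns=OWNS):
    """Expand the directly affected entities to everything they own, transitively."""
    affected = set(event_affects.get(event_id, ()))
    frontier = list(affected)
    while frontier:
        current = frontier.pop()
        for owner, owned in owns:
            if owner == current and owned not in affected:
                affected.add(owned)
                frontier.append(owned)
    return affected


def is_relevant(event_id, watchlist):
    """True if the event reaches any entity on the customer's watchlist."""
    return bool(affected_entities(event_id) & set(watchlist))
```

A customer who only lists the holding group would still be flagged, even if no article about the event names that company.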

 

That is hard to do if we don’t have this sort of structure. It also allows us to do things like taking that previous article and discovering new facts that might not otherwise have been known. So in this case, it mentions that En+ Group has a stake in Rusal. Now, persisting that fact on our graph is valuable for later research among our research teams and our journalists, but also for our customers. So there are two angles that we’re approaching here: determining the impact of a news event and how it affects customers, but also extracting facts and persisting them or making them available for other needs.

 

So we’re doing this right now; we’ve kicked this off. This is a very big project. We’re focusing on specific use cases as we build out the events that we’re modeling and the news that we’re putting on the graph. And I want to quickly explain what we’re doing. I previously was the Chief Product and Technology Officer for our professional information business, the business that oversees Factiva, Newswires and all of our structured data; effectively, everything that is not our consumer publications.

 

Now, I’m working on this internal project that we refer to as pulser. We haven’t had the marketing people come up with a real name yet, so bear with us. The idea is that we have built a Knowledge Graph platform based on Stardog that allows us to ingest facts from all the news that we have access to, as well as connect them to information that our customers are able to provide us. So it is a sort of virtual graph environment where a customer can provide us information like details in their CRM, details in their ERP, details in their portfolios, contracts or contacts, so that we can identify those items on our graph and then identify what news events may directly or indirectly affect them.

Those features are what we are going to be surfacing, and they’re going to be surfaced in certain products that we have across the Dow Jones portfolio. So this is an example of something that we’re actively doing, which is offering our news signals as a service, and I think this will give a little bit more context on how it works. Say a customer is using Salesforce, or a marketing automation tool like Marketo or Eloqua. Part of what they say is, well, we have all these companies out there that we care about.

 

Say they’re an energy company. They want to know if any of these companies are doing things that may impact their interest in acquiring a new energy contract, for instance. So they’ve given us a whole bunch of details from their CRM and they basically say, here are either leads or existing customers and here are our CRM identifiers, and we can just add that into our company mapping API. We take that information and we basically map it and say, OK, there’s a company called BlackRock.

 

Here’s an address. We wish everybody used DUNS numbers or LEIs or other identifiers, but we have to design for the case up here: some names and poorly formatted addresses. We identify those companies on our graph and then we are able to connect the customer’s CRM identifier. So this is a major change for us, because traditionally we put the onus on our customers to map their companies and their data to our identifiers, whereas now we’re able to say, actually, that should be our job.
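To make the mapping step concrete, here is a minimal sketch of matching messy CRM company names to graph entities by normalising the names first. It is an illustration only: the function names, the suffix list and the sample records are assumptions, and the talk does not describe how the actual company mapping API works. A real matcher would also need to weigh addresses and handle ambiguous names.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "llc", "ltd", "limited", "plc", "co"}

# Hypothetical graph entities: graph identifier -> canonical company name.
GRAPH_COMPANIES = {
    "graph:company-001": "Example Energy Holdings, Inc.",
    "graph:company-002": "Sample Logistics Ltd",
}


def normalize_name(name):
    """Lowercase, strip punctuation and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def map_crm_records(crm_records, graph_companies=GRAPH_COMPANIES):
    """Map each CRM identifier to a graph identifier, or None if unmatched."""
    by_name = {normalize_name(n): gid for gid, n in graph_companies.items()}
    return {
        crm_id: by_name.get(normalize_name(name))
        for crm_id, name in crm_records.items()
    }
```

Keeping the customer's own CRM identifier as the key is the design choice that matters here: results can be handed back in the customer's vocabulary rather than the provider's.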

 

Let us talk in the same symbology that you use to identify your customers. And then, as these news events actually get ingested, we have an ontology for each type of event that our customers have asked us for and that we have designed and built with them. And this is an example of an article that we see. It comes through Factiva, it comes through DNA, and it mentions that BlackRock is putting a new innovation center in Atlanta.

 

We run it through a whole bunch of NLP tools that I barely understand. It’s OK. And they generate this event and populate that event on the graph. So we know that this is an event of type facility opening. There’s a city, and this is tied into a geographic graph, so we know that there’s this concept that Atlanta is in Georgia, which is in the southeastern United States, all that. And then we also know that it’s affecting this company.
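The geographic part of that event can be sketched as a containment chain that is walked upward from the city. This is a minimal sketch, assuming a hypothetical event record and a single-parent "located in" relation; the real geographic graph and event schema are not described in the talk.

```python
# Single-parent containment chain, following the example in the talk.
LOCATED_IN = {
    "Atlanta": "Georgia",
    "Georgia": "Southeastern United States",
    "Southeastern United States": "United States",
}

# Hypothetical event record produced by the extraction step.
EVENT = {"type": "FacilityOpening", "city": "Atlanta", "company": "company:example"}


def containing_regions(place, located_in=LOCATED_IN):
    """Return every region that contains the place, nearest first."""
    regions = []
    seen = {place}
    while place in located_in:
        place = located_in[place]
        if place in seen:  # guard against accidental cycles in the data
            break
        seen.add(place)
        regions.append(place)
    return regions


def event_in_region(event, region):
    """True if the event's city is the region or lies inside it."""
    return region == event["city"] or region in containing_regions(event["city"])
```

With this in place, a customer who only says "we sell in Georgia" can still be matched to an event that names Atlanta.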

 

And this company is a company that we have names for. We have DUNS numbers. We have ticker symbols. We have all the data that Dow Jones traditionally has around that company. But we also now know in this virtual graph that one of our customers refers to this company by this CRM identifier. And we now know that it’s linked as a company of interest to that customer. So it becomes very easy for us to take this fact and, in some cases we just deliver it as a spreadsheet, format it with their identifiers and push it into our customer’s database.

 

They ingest this information, and they’re able to do things with very little effort. First of all, they put it into things like Marketo, as part of their lead scoring system, their marketing automation tools. They’re able to say, well, wait a second, we know this first-party data about this lead, but now we know that this fact in the news is something that might be of interest, because they’re opening a new facility in one of our regions. That means that actually our marketing qualified lead score should go up.
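The lead-scoring step can be illustrated with a small sketch: a lead's base score, built from first-party data, goes up when a delivered news event falls in one of the seller's regions. The event types, weights and field names are hypothetical, and this does not use or represent any real marketing automation API.

```python
# Hypothetical score boosts per event type; real weights would be tuned.
EVENT_SCORE_BOOST = {"FacilityOpening": 20, "ProductLaunch": 15}


def updated_lead_score(base_score, events, sales_regions):
    """Add a boost for each news event that falls inside a sales region."""
    score = base_score
    for event in events:
        if event["region"] in sales_regions:
            score += EVENT_SCORE_BOOST.get(event["type"], 0)
    return score
```

For example, a lead scored at 50 with one facility opening in a covered region would move to 70 under these assumed weights, which could be enough to route it to a salesperson.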

 

We should get a salesperson to actually take action on this news. We also do this for the long term for some customers, on things like preventing customer churn and identifying potential opportunities for upsell. A lot of times, for instance, for energy companies, if you are moving, or if your company is launching a new product, especially if you’re an industrial company, that is a reason for them to pick up the phone and renegotiate some of your energy contracts.

 

So if we get this stuff out, it actually means that our customers are then able to say, OK, this is an action we can take, and push it into that workflow. That means all of this is actually within these workflows, and it removes this whole step that previously occurred, which was being expected to read all of the news, identify the events that are important to them, and figure out how they’re important and what action to take. We’ve basically removed a large part of that process so that their salespeople can focus on the important task, which is taking action.

 

So that’s basically a brief overview of some of the work that we’re doing, thanks to the technology provided by Stardog, and some of the work that we’re doing across all of our tools. Happy to take any questions.

 

— End of Talk transcript —

 

 

 
