27 May 2021

Graph-Based Data Science Masterclass

A talk by Paco Nathan

Managing Partner, Derwen, Inc.

Python has excellent libraries for working with graphs which provide: semantic technologies, graph queries, interactive visualizations, graph algorithms, probabilistic graph inference, as well as embedding and other integrations with deep learning.

However, almost none of these have integration paths other than writing lots of custom code, and most do not share common file formats. Moreover, few of these libraries integrate effectively with popular data science tools (e.g., pandas, scikit-learn, PyTorch, spaCy, etc.) or with popular infrastructure for scale-out (Apache Spark, Ray, RAPIDS, Apache Parquet, fsspec, etc.) on cloud computing.

This workshop uses kglab – an open source project that integrates RDFLib, OWL-RL, pySHACL, NetworkX, igraph, pslpython, node2vec, PyVis, and more – to show how to use a wide range of graph-based approaches that blend smoothly into data science workflows and work efficiently with popular data engineering practices.

The material emphasizes hands-on coding examples which you can reuse; best practices for integrating and leveraging other useful libraries; history and bibliography (e.g., links to primary sources); accessible, detailed API documentation; a detailed glossary of terminology; plus links to many helpful resources, such as online "playgrounds".

Throughout, we keep a practical focus on use cases.

Audience

Python developers who need to work with KGs
Data Scientists, Data Engineers, Machine Learning Engineers
Technical Leaders who want hands-on KG implementation experience
Executives working on data strategy who need to learn about KG capabilities
People interested in developing personal knowledge graphs

Key Takeaways

Hands-on experience with popular open source libraries in Python for building KGs, including rdflib, pyshacl, networkx, owlrl, pslpython, and more.
Coding examples that can be used as starting points for your own KG projects
How to blend different graph-based approaches within a data science workflow to complement each other's strengths: for data quality checks, inference, human-in-the-loop, etc.
Integrating with popular data science tools, such as pandas, scikit-learn, matplotlib, etc.
Graph-based practices that fit well with Big Data tools such as Spark, Parquet, Ray, RAPIDS, and so on

Prerequisites

Some coding experience in Python (you can read a 20-line program)
Interest in use cases that require knowledge graph representation

Additionally, having completed Algebra 2 in secondary school and having some business experience with data analytics can both come in handy.

Preparation

See the installation instructions at https://derwen.ai/docs/kgl/tutorial/#installation

Git clone https://github.com/DerwenAI/kglab
Install required libraries using pip or conda
Install JupyterLab

Outline

Sources for data and controlled vocabularies: using a progressive example based on a Kaggle dataset for food/recipes
KG Construction in rdflib and Serialization in TTL, JSON-LD, Parquet, etc.
Transformations between RDF graphs and algebraic objects
Interactive Visualization with PyVis
Querying with SPARQL, with results in pandas
Graph-based validation with SHACL constraint rules
Graph Algorithms in networkx and igraph
Inference based on semantic closures
Inference and data quality checks based on probabilistic soft logic
Embedding (deep learning) for data preparation and KG construction

Connected Data World 2021 All Rights Reserved.

Connected Data is a trading name of Neural Alpha LTD.

Edinburgh House - 170 Kennington Lane
Lambeth, London - SE11 5DP
