Lord of the Rings Character Interactions

Overview

This example takes a dataset that records when two characters interact in the Lord of the Rings trilogy and builds a graph of these interactions in Raphtory. It is a good dataset for testing different algorithms, including ones you have written yourself. You can run this example using either our Python or Scala client.

Pre-requisites

Follow our Installation guide: Install

Data

The data is a csv (comma-separated values) file pulled from our GitHub data repository. Each line contains two characters that appeared in the same sentence in the books, followed by the number of that sentence. For example, the first line of the file is Gandalf,Elrond,33, which tells us that Gandalf and Elrond appear together in sentence 33.
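To make the row format concrete, here is a minimal sketch that parses lines of this shape with Python's standard csv module. The Interaction tuple and the parse_interactions helper are illustrative names, not part of Raphtory, and the second sample row is made up.

```python
import csv
import io
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One co-occurrence: two character names and the sentence number."""
    source: str
    target: str
    sentence: int


def parse_interactions(text: str) -> List[Interaction]:
    """Parse 'source,target,sentence' rows, skipping blank lines."""
    rows = csv.reader(io.StringIO(text))
    return [Interaction(r[0], r[1], int(r[2])) for r in rows if r]


# The first row is the one quoted above; the second row is a made-up sample.
sample = "Gandalf,Elrond,33\nFrodo,Sam,40\n"
interactions = parse_interactions(sample)
print(interactions[0])
```

The sentence number is converted to an integer because it is used as the timestamp when the graph is built later on.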

Intro Graphic of LOTR slices

Lord Of The Rings Example 🧝🏻‍♀️🧙🏻‍♂️💍

We have detailed each step of our Scala and Python example code below. You can find the code on GitHub here: Scala and Python.

Setup environment 🌍

Import all necessary dependencies needed to build a graph from your data in Raphtory.

from pathlib import Path
from pyraphtory.context import PyRaphtory
from pyraphtory.vertex import Vertex
from pyraphtory.spouts import FileSpout
from pyraphtory.input import *
from pyvis.network import Network

Download csv data from GitHub 💾

!curl -o /tmp/lotr.csv https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv

Terminal Output

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52206  100 52206    0     0   147k      0 --:--:-- --:--:-- --:--:--  149k
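The !curl line only works inside a notebook. If you run the example as a plain script, the same download can be sketched with the standard library. The download helper and the constant names are illustrative; the URL and destination are the ones from the curl command above.

```python
import urllib.request
from pathlib import Path

# URL and destination taken from the curl command above.
LOTR_URL = "https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv"
LOTR_PATH = Path("/tmp/lotr.csv")


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest unless dest already exists; return dest."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


# Usage (needs network access):
# download(LOTR_URL, LOTR_PATH)
```

Skipping the download when the file already exists keeps repeated runs fast, at the cost of not noticing if the remote file changes.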

Create a new Raphtory graph 📊

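Initialise Raphtory by creating a local PyRaphtory context, then create a new graph from it. The rest of this example refers to the context as pr and to the graph as rg. The log output below shows what PyRaphtory does while it starts up.

pr = PyRaphtory.local()
rg = pr.new_graph()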

Terminal Output

11:10:18.664 [Thread-12] INFO  com.raphtory.internals.context.LocalContext$ - Creating Service for 'violent_rose_mastodon'
11:10:18.678 [io-compute-1] INFO  com.raphtory.internals.management.Prometheus$ - Prometheus started on port /0:0:0:0:0:0:0:0:9999
11:10:19.491 [io-compute-1] INFO  com.raphtory.internals.components.partition.PartitionOrchestrator$ - Creating '1' Partition Managers for 'violent_rose_mastodon'.
11:10:21.700 [io-compute-4] INFO  com.raphtory.internals.components.partition.PartitionManager - Partition 0: Starting partition manager for 'violent_rose_mastodon'.

Ingest the data into a graph 😋

Parse the csv file row by row and add each pair of characters to the graph as two vertices joined by an edge, using the sentence number as the timestamp.

import csv

# Path of the file downloaded above
filename = "/tmp/lotr.csv"

with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        # Each row is: source character, target character, sentence number
        source_node = row[0]
        src_id = rg.assign_id(source_node)
        target_node = row[1]
        tar_id = rg.assign_id(target_node)
        time_stamp = int(row[2])
        # Add both characters as vertices, then the interaction between them
        rg.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
        rg.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
        rg.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))
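The assign_id call turns a character name into a numeric vertex ID. The sketch below is a hypothetical stand-in, not Raphtory's implementation: it hashes the name with SHA-256 and keeps 64 bits as a signed integer. It only illustrates the idea that the same name always maps to the same ID, so repeated add_vertex calls for one character refer to one vertex. That would also be consistent with the large signed integers that show up as component labels later in this example.

```python
import hashlib


def stable_id(name: str) -> int:
    """Map a string to a signed 64-bit integer, deterministically."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# The same name always produces the same ID.
print(stable_id("Gandalf") == stable_id("Gandalf"))
print(stable_id("Gandalf"), stable_id("Elrond"))
```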

Collect simple metrics 📈

Select certain metrics to show in your output dataframe. Here we have selected vertex name, degree, out degree and in degree.

from pyraphtory.graph import Row
df = rg \
      .select(lambda vertex: Row(vertex.name(), vertex.degree(), vertex.out_degree(), vertex.in_degree())) \
      .write_to_dataframe(["name", "degree", "out_degree", "in_degree"])

Terminal output

11:10:42.583 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Source '0' is unblocking analysis for Graph 'violent_rose_mastodon' with 7947 messages sent.
11:10:42.585 [io-compute-8] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Starting query progress tracker.
11:10:42.590 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query '722153918_5338049307951797618' received, your job ID is '722153918_5338049307951797618'.
11:10:42.596 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.internals.components.partition.QueryExecutor - 722153918_5338049307951797618_0: Starting QueryExecutor.
11:10:43.395 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job '722153918_5338049307951797618': Perspective at Time '32674' took 790 ms to run. 
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job '722153918_5338049307951797618': Perspective '32674' finished in 810 ms.
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Running query, processed 1 perspectives.
11:10:43.397 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Query completed with 1 perspectives and finished in 812 ms.

Clean dataframe 🧹 and preview 👀

In Python, we clean the dataframe by dropping the window column and can then preview it. In Scala, we can preview the saved csv file in the /tmp directory, which we set in the .writeTo method, from a bash terminal.

df.drop(columns=['window'], inplace=True)
df

Sort by highest degree, top 10

df.sort_values(['degree'], ascending=False)[:10]

Top 10 Highest Degree Results

     timestamp  name     degree  out_degree  in_degree
55   32674      Frodo    51      37          22
54   32674      Gandalf  49      35          24
97   32674      Aragorn  45      5           45
63   32674      Merry    34      23          18
32   32674      Pippin   34      30          10
56   32674      Elrond   32      18          24
52   32674      Théoden  30      22          9
134  32674      Faramir  29      3           29
118  32674      Sam      28      20          17
129  32674      Gimli    25      22          11
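In the results above, degree is not the sum of out_degree and in_degree: Frodo has an out_degree of 37 and an in_degree of 22 but a degree of 51. That is consistent with degree counting distinct neighbours, so a character linked in both directions is counted once. The sketch below illustrates this reading on a made-up edge list in plain Python. It is an assumption about how the metric is defined, not Raphtory's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_metrics(edges: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Return {vertex: (degree, out_degree, in_degree)} for directed edges."""
    out_nbrs: Dict[str, Set[str]] = defaultdict(set)
    in_nbrs: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    vertices = set(out_nbrs) | set(in_nbrs)
    return {
        v: (len(out_nbrs[v] | in_nbrs[v]), len(out_nbrs[v]), len(in_nbrs[v]))
        for v in vertices
    }


# Made-up edges: A and B point at each other, A also points at C.
metrics = degree_metrics([("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")])
print(metrics["A"])  # (2, 2, 1): B is counted once in the degree
```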

Sort by highest in-degree, top 10

df.sort_values(['in_degree'], ascending=False)[:10]

Top 10 Highest In Degree Results

     timestamp  name       degree  out_degree  in_degree
97   32674      Aragorn    45      5           45
134  32674      Faramir    29      3           29
54   32674      Gandalf    49      35          24
56   32674      Elrond     32      18          24
55   32674      Frodo      51      37          22
63   32674      Merry      34      23          18
138  32674      Boromir    18      6           17
118  32674      Sam        28      20          17
3    32674      Galadriel  19      6           16
132  32674      Legolas    25      18          16

Sort by highest out-degree, top 10

df.sort_values(['out_degree'], ascending=False)[:10]

Top 10 Highest Out Degree Results

     timestamp  name     degree  out_degree  in_degree
55   32674      Frodo    51      37          22
54   32674      Gandalf  49      35          24
32   32674      Pippin   34      30          10
63   32674      Merry    34      23          18
52   32674      Théoden  30      22          9
129  32674      Gimli    25      22          11
118  32674      Sam      28      20          17
56   32674      Elrond   32      18          24
4    32674      Isildur  18      18          0
132  32674      Legolas  25      18          16

Run a PageRank algorithm 📑

Run your chosen algorithm on the graph. Here we run PageRank with transform and then use NodeList with execute to collect the results. The cols list names the algorithm output to write out, in this case prlabel, which becomes an additional column in the dataframe.

cols = ["prlabel"]

df_pagerank = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.centrality.PageRank())\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:10:57.397 [io-compute-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Starting query progress tracker.
11:10:57.414 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:NodeList_7431747532744364308' received, your job ID is 'PageRank:NodeList_7431747532744364308'.
11:10:57.416 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.partition.QueryExecutor - PageRank:NodeList_7431747532744364308_0: Starting QueryExecutor.
11:10:57.597 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:NodeList_7431747532744364308': Perspective at Time '32674' took 179 ms to run. 
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:NodeList_7431747532744364308': Perspective '32674' finished in 200 ms.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Running query, processed 1 perspectives.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Query completed with 1 perspectives and finished in 200 ms.
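The query above is built with at(32674) and past(), and the logs report a single perspective at time 32674, which appears to be the latest timestamp in the data. Assuming this means "look at the graph as of that time, including everything that happened up to then", the effect can be sketched on a plain list of timestamped edges. The edge list and the edges_up_to helper are illustrative, not Raphtory code.

```python
from typing import List, Tuple

Edge = Tuple[int, str, str]  # (timestamp, source, target)


def edges_up_to(edges: List[Edge], time: int) -> List[Edge]:
    """Keep only the edges whose timestamp is at or before `time`."""
    return [e for e in edges if e[0] <= time]


# The first edge comes from the data description; the others are made up.
edges = [(33, "Gandalf", "Elrond"), (40, "Frodo", "Sam"), (90, "Merry", "Pippin")]
print(edges_up_to(edges, 40))
```

Under this reading, choosing a smaller time would analyse the graph as it stood earlier in the books.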

Clean dataframe 🧹 and preview 👀

df_pagerank.drop(columns=['window'], inplace=True)
df_pagerank

Preview Dataframe

     timestamp  name       prlabel
0    32674      Hirgon     0.277968
1    32674      Hador      0.459710
2    32674      Horn       0.522389
3    32674      Galadriel  2.228852
4    32674      Isildur    0.277968
...  ...        ...        ...
134  32674      Faramir    8.551166
135  32674      Bain       0.396105
136  32674      Walda      0.817198
137  32674      Thranduil  0.761719
138  32674      Boromir    4.824014

The top ten highest Page Rank

df_pagerank.sort_values(['prlabel'], ascending=False)[:10]

Top Ten Highest Page Rank characters in LOTR

     timestamp  name     prlabel
97   32674      Aragorn  13.246457
134  32674      Faramir  8.551166
56   32674      Elrond   5.621548
138  32674      Boromir  4.824014
132  32674      Legolas  4.622590
110  32674      Imrahil  4.095600
65   32674      Éomer    3.473897
42   32674      Samwise  3.292762
118  32674      Sam      2.826140
55   32674      Frodo    2.806475
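For readers who want to see what a PageRank score measures, here is a minimal textbook version using power iteration on a made-up graph. It is not Raphtory's implementation: this variant normalises the scores so they sum to 1, so its numbers are not on the same scale as the prlabel values above. The damping factor of 0.85 and the iteration count are conventional defaults chosen for the sketch.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration; scores sum to 1."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in set(edges):  # ignore repeated edges
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread over everyone.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up graph: three vertices point at "hub", which points back at "a".
scores = pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
print(max(scores, key=scores.get))
```

A vertex scores highly when many vertices, or a few high-scoring ones, point at it, which fits Aragorn and Faramir leading the table with high in-degree despite low out-degree.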

Run a connected components algorithm

cols = ["cclabel"]
df_cc = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.ConnectedComponents)\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:14:59.742 [io-compute-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Starting query progress tracker.
11:14:59.744 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'ConnectedComponents:NodeList_5614237107038973005' received, your job ID is 'ConnectedComponents:NodeList_5614237107038973005'.
11:14:59.745 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.partition.QueryExecutor - ConnectedComponents:NodeList_5614237107038973005_0: Starting QueryExecutor.
11:14:59.772 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective at Time '32674' took 26 ms to run. 
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective '32674' finished in 30 ms.
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Running query, processed 1 perspectives.
11:14:59.773 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Query completed with 1 perspectives and finished in 31 ms.

Clean dataframe 🧹 and preview 👀

df_cc.drop(columns=['window'], inplace=True)
df_cc

Preview dataframe

     timestamp  name       cclabel
0    32674      Hirgon     -8637342647242242534
1    32674      Hador      -8637342647242242534
2    32674      Horn       -8637342647242242534
3    32674      Galadriel  -8637342647242242534
4    32674      Isildur    -8637342647242242534
...  ...        ...        ...
134  32674      Faramir    -8637342647242242534
135  32674      Bain       -6628080393138316116
136  32674      Walda      -8637342647242242534
137  32674      Thranduil  -8637342647242242534
138  32674      Boromir    -8637342647242242534

Number of distinct components

Extract the number of distinct components, which is 3 in this dataframe.

len(set(df_cc['cclabel']))

Terminal Output

Out[19]: 3

Size of components

Calculate the size of the 3 connected components.

df_cc.groupby(['cclabel']).count().reset_index().drop(columns=['timestamp'])

Size of the 3 distinct connected components (after count(), the name column holds the number of vertices in each component)

   cclabel               name
0  -8637342647242242534  134
1  -6628080393138316116  3
2  -5499479516525190226  2
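The three steps above (label each vertex, count the labels, measure each group) can be sketched without Raphtory on a made-up edge list. This sketch ignores edge direction and names each component after its smallest vertex. Both choices are assumptions made for illustration rather than a description of how the ConnectedComponents algorithm works internally.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Hashable, Iterable, Tuple


def component_labels(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Label every vertex with the smallest vertex in its component.

    Edge direction is ignored, i.e. these are weakly connected components.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    labels = {}
    for start in neighbours:
        if start in labels:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        label = min(component)
        for vertex in component:
            labels[vertex] = label
    return labels


# Made-up edges forming two components: {1, 2, 3} and {7, 8}.
labels = component_labels([(1, 2), (3, 2), (8, 7)])
sizes = Counter(labels.values())
print(len(sizes), dict(sizes))  # 2 {1: 3, 7: 2}
```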

Run chained algorithms at once

In this example, we chain PageRank, Connected Components and Degree algorithms, running them one after another on the graph. Specify all the columns in the output dataframe, including an output column for each algorithm in the chain.

cols = ["inDegree", "outDegree", "degree","prlabel","cclabel"]

df_chained = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.centrality.PageRank())\
                .transform(pr.algorithms.generic.ConnectedComponents)\
                .transform(pr.algorithms.generic.centrality.Degree())\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:15:08.397 [io-compute-7] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Starting query progress tracker.
11:15:08.401 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413' received, your job ID is 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413'.
11:15:08.402 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.partition.QueryExecutor - PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413_0: Starting QueryExecutor.
11:15:08.457 [spawner-akka.actor.default-dispatcher-10] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective at Time '32674' took 52 ms to run. 
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective '32674' finished in 60 ms.
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Running query, processed 1 perspectives.
11:15:08.458 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Query completed with 1 perspectives and finished in 61 ms.

Clean dataframe 🧹 and preview 👀

df_chained.drop(columns=['window'], inplace=True)
df_chained

Preview chained algorithm output

     timestamp  name       inDegree  outDegree  degree  prlabel   cclabel
0    32674      Hirgon     0         2          2       0.277968  -8637342647242242534
1    32674      Hador      2         1          3       0.459710  -8637342647242242534
2    32674      Horn       3         1          4       0.522389  -8637342647242242534
3    32674      Galadriel  16        6          19      2.228852  -8637342647242242534
4    32674      Isildur    0         18         18      0.277968  -8637342647242242534
...  ...        ...        ...       ...        ...     ...       ...
134  32674      Faramir    29        3          29      8.551166  -8637342647242242534
135  32674      Bain       1         1          2       0.396105  -6628080393138316116
136  32674      Walda      10        3          13      0.817198  -8637342647242242534
137  32674      Thranduil  2         0          2       0.761719  -8637342647242242534
138  32674      Boromir    17        6          18      4.824014  -8637342647242242534
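The chaining pattern can be pictured as a series of steps that each write their own column into shared per-vertex state, followed by a final step that selects the requested columns. The sketch below shows that pattern in plain Python on made-up edges. The helper names are hypothetical, and it is a conceptual model only, not how Raphtory executes a chain.

```python
from typing import Callable, Dict, List, Tuple

Edges = List[Tuple[str, str]]
State = Dict[str, Dict[str, object]]  # vertex name -> {column: value}
Step = Callable[[Edges, State], None]


def add_out_degree(edges: Edges, state: State) -> None:
    """Write the number of distinct out-neighbours into 'outDegree'."""
    for vertex in state:
        state[vertex]["outDegree"] = len({dst for src, dst in edges if src == vertex})


def add_in_degree(edges: Edges, state: State) -> None:
    """Write the number of distinct in-neighbours into 'inDegree'."""
    for vertex in state:
        state[vertex]["inDegree"] = len({src for src, dst in edges if dst == vertex})


def run_chain(edges: Edges, steps: List[Step], cols: List[str]) -> List[Tuple]:
    """Apply every step in order, then emit one row per vertex."""
    state: State = {v: {} for edge in edges for v in edge}
    for step in steps:
        step(edges, state)
    return [(v, *(state[v][c] for c in cols)) for v in sorted(state)]


# Made-up edges; the column names mirror two of those used above.
rows = run_chain([("A", "B"), ("A", "C"), ("B", "C")],
                 [add_out_degree, add_in_degree], ["inDegree", "outDegree"])
print(rows)  # [('A', 0, 2), ('B', 1, 1), ('C', 2, 0)]
```

In this sketch, asking for a column that no step has written raises a KeyError, which mirrors why the cols list needs an entry for each algorithm in the chain.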

Create visualisation by adding nodes 🔎

def visualise(rg, df_chained):
    # Create network object
    net = Network(notebook=True, height='750px', width='100%', bgcolor='#222222', font_color='white')
    # Set visualisation tool
    net.force_atlas_2based()
    # Get the node list 
    df_node_list = rg.at(32674) \
                .past() \
                .execute(pr.algorithms.generic.NodeList()) \
                .write_to_dataframe(['name'])
    
    nodes = df_node_list['name'].tolist()
    
    node_data = []
    ignore_items = ['timestamp', 'name', 'window']
    for node_name in nodes:
        for i, row in df_chained.iterrows():
            if row['name']==node_name:
                # Build the hover text from every remaining column
                data = ''
                for k,v in row.items():
                    if k not in ignore_items:
                        data = data+str(k)+': '+str(v)+'\n'
                node_data.append(data)
                # Stop at the first matching row
                break
    # Add the nodes
    net.add_nodes(nodes, title=node_data)
    # Get the edge list
    df_edge_list = rg.at(32674) \
            .past() \
            .execute(pr.algorithms.generic.EdgeList()) \
            .write_to_dataframe(['from', 'to'])
    edges = []
    for i, row in df_edge_list[['from', 'to']].iterrows():
        edges.append([row['from'], row['to']])
    # Add the edges
    net.add_edges(edges)
    # Toggle physics
    net.toggle_physics(True)
    return net

net = visualise(rg, df_chained)
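The hover text that visualise builds with a nested loop can also be produced in a single pass by indexing the rows by name first. The sketch below works on plain dictionaries so that it runs without Raphtory or pandas. The tooltips helper is an illustrative name, and the sample rows reuse values from the tables above.

```python
from typing import Dict, Iterable, List

IGNORED = {"timestamp", "name", "window"}


def tooltips(names: Iterable[str], rows: Iterable[Dict[str, object]]) -> List[str]:
    """Return one 'key: value' tooltip per name, in the order of `names`."""
    by_name = {row["name"]: row for row in rows}  # one pass to index the rows
    result = []
    for name in names:
        row = by_name.get(name, {})
        result.append("".join(f"{k}: {v}\n" for k, v in row.items() if k not in IGNORED))
    return result


# Rows shaped like records from the chained dataframe.
rows = [
    {"timestamp": 32674, "name": "Frodo", "degree": 51, "prlabel": 2.806475},
    {"timestamp": 32674, "name": "Sam", "degree": 28, "prlabel": 2.82614},
]
print(tooltips(["Sam", "Frodo"], rows)[0])
```

With pandas, rows of this shape can be obtained from df_chained.to_dict("records"). A name with no matching row gets an empty tooltip here, so the tooltip list always stays the same length as the node list.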

Show the html file of the visualisation

net.show('preview.html')

Shut down PyRaphtory 🛑

pr.shutdown()