Lord of the Rings Character Interactions

Overview

This example takes a dataset that records when two characters interact in the Lord of the Rings trilogy and builds a graph of these interactions in Raphtory. It is a good dataset for trying out Raphtory's built-in algorithms or algorithms you have written yourself. You can run this example using either our Python or Scala client.

Pre-requisites

Follow our Installation guide: Scala or Python (with Conda), Python (without Conda).

Data

The data is a CSV (comma-separated values) file pulled from our GitHub data repository. Each line contains two characters who appear in the same sentence of the book, followed by the number of that sentence. For example, the first line of the file is Gandalf,Elrond,33, which tells us that Gandalf and Elrond appear together in sentence 33. The sentence number is used as the timestamp of the interaction when the graph is built.
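
To see how a line of this file breaks down, the short sketch below parses lines in this format with Python's standard csv module. The parse_rows helper is hypothetical, and only the first sample line comes from the dataset; the second is made up.

```python
import csv
import io


def parse_rows(text):
    """Parse 'source,target,sentence' lines into (str, str, int) tuples."""
    rows = []
    for source, target, sentence in csv.reader(io.StringIO(text)):
        # The third field is the sentence number, later used as the timestamp
        rows.append((source, target, int(sentence)))
    return rows


# First line is from the dataset; the second line is a made-up example
sample = "Gandalf,Elrond,33\nFrodo,Sam,40\n"
print(parse_rows(sample))  # [('Gandalf', 'Elrond', 33), ('Frodo', 'Sam', 40)]
```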

Intro Graphic of LOTR slices

Lord Of The Rings Example 🧝🏻‍♀️🧙🏻‍♂️💍

Each step of our Scala and Python example code is described below. You can find the code on GitHub here: Scala and Python.

Setup environment

Import the dependencies needed to build a graph from your data in Raphtory.

# Standard-library CSV reader, used to parse the data file
import csv
from pathlib import Path
from pyraphtory.context import PyRaphtory
from pyraphtory.vertex import Vertex
from pyraphtory.spouts import FileSpout
from pyraphtory.builder import *
from pyvis.network import Network

Download CSV data from GitHub 💾

!curl -o /tmp/lotr.csv https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv

Terminal Output

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52206  100 52206    0     0   147k      0 --:--:-- --:--:-- --:--:--  149k

Create a new Raphtory graph 📊

Turn on logging to see what PyRaphtory is doing. Initialise Raphtory by creating a PyRaphtory object, then create a new graph.

pr = PyRaphtory(logging=True).open()
rg = pr.new_graph()

Terminal Output

11:10:18.664 [Thread-12] INFO  com.raphtory.internals.context.LocalContext$ - Creating Service for 'violent_rose_mastodon'
11:10:18.678 [io-compute-1] INFO  com.raphtory.internals.management.Prometheus$ - Prometheus started on port /0:0:0:0:0:0:0:0:9999
11:10:19.491 [io-compute-1] INFO  com.raphtory.internals.components.partition.PartitionOrchestrator$ - Creating '1' Partition Managers for 'violent_rose_mastodon'.
11:10:21.700 [io-compute-4] INFO  com.raphtory.internals.components.partition.PartitionManager - Partition 0: Starting partition manager for 'violent_rose_mastodon'.

Ingest the data into a graph 😋

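Parse the CSV file row by row. For each row, add a vertex for each of the two characters (storing the character's name as a property) and a directed edge from the first character to the second, all timestamped with the sentence number.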

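# Path the data was downloaded to in the previous step
filename = "/tmp/lotr.csv"

with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        # Each row is: source character, target character, sentence number
        source_node = row[0]
        src_id = rg.assign_id(source_node)  # numeric vertex ID derived from the name
        target_node = row[1]
        tar_id = rg.assign_id(target_node)
        time_stamp = int(row[2])
        # Add both characters as vertices, keeping their names as immutable properties
        rg.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
        rg.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
        # Add a directed edge from the first character to the second at this time
        rg.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))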
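
The same pair of characters often appears together in many sentences, so the loop above calls add_vertex and add_edge repeatedly for the same IDs. Because Raphtory is a temporal graph, these repeated calls are recorded as updates at different times rather than as duplicate vertices or edges, and each edge points from the first character in a row to the second. The plain-Python sketch below is a hypothetical model of that bookkeeping using dictionaries; it is not how Raphtory stores data, and the rows are made up.

```python
from collections import defaultdict


def build_temporal_graph(rows):
    """Toy model: rows are (source, target, sentence) tuples."""
    vertex_updates = defaultdict(list)  # name -> times the vertex was updated
    edge_updates = defaultdict(list)    # (source, target) -> co-occurrence times
    for source, target, time in rows:
        vertex_updates[source].append(time)
        vertex_updates[target].append(time)
        # Edges are directed: the first character in the row is the source
        edge_updates[(source, target)].append(time)
    return dict(vertex_updates), dict(edge_updates)


# Made-up rows: Gandalf and Elrond co-occur three times, in both orders
rows = [("Gandalf", "Elrond", 33), ("Gandalf", "Elrond", 90), ("Elrond", "Gandalf", 120)]
vertices, edges = build_temporal_graph(rows)
print(sorted(vertices))  # ['Elrond', 'Gandalf']
print(edges)             # {('Gandalf', 'Elrond'): [33, 90], ('Elrond', 'Gandalf'): [120]}
```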

Collect simple metrics 📈

Select the metrics to include in your output dataframe. Here we select each vertex's name, degree, out-degree and in-degree, giving one row per vertex.

from pyraphtory.graph import Row
df = rg \
      .select(lambda vertex: Row(vertex.name(), vertex.degree(), vertex.out_degree(), vertex.in_degree())) \
      .write_to_dataframe(["name", "degree", "out_degree", "in_degree"])

Terminal Output

11:10:42.583 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Source '0' is unblocking analysis for Graph 'violent_rose_mastodon' with 7947 messages sent.
11:10:42.585 [io-compute-8] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Starting query progress tracker.
11:10:42.590 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query '722153918_5338049307951797618' received, your job ID is '722153918_5338049307951797618'.
11:10:42.596 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.internals.components.partition.QueryExecutor - 722153918_5338049307951797618_0: Starting QueryExecutor.
11:10:43.395 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job '722153918_5338049307951797618': Perspective at Time '32674' took 790 ms to run. 
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job '722153918_5338049307951797618': Perspective '32674' finished in 810 ms.
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Running query, processed 1 perspectives.
11:10:43.397 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Query completed with 1 perspectives and finished in 812 ms.

Clean dataframe 🧹 and preview 👀

In Python, drop the window column, which is not used here, and preview the dataframe. In Scala, you can instead preview the CSV file saved in the /tmp directory (the location set in the .writeTo method) from a bash terminal.

df.drop(columns=['window'], inplace=True)
df

Sort by highest degree, top 10

df.sort_values(['degree'], ascending=False)[:10]
Top 10 Highest Degree Results

     timestamp     name  degree  out_degree  in_degree
55       32674    Frodo      51          37         22
54       32674  Gandalf      49          35         24
97       32674  Aragorn      45           5         45
63       32674    Merry      34          23         18
32       32674   Pippin      34          30         10
56       32674   Elrond      32          18         24
52       32674  Théoden      30          22          9
134      32674  Faramir      29           3         29
118      32674      Sam      28          20         17
129      32674    Gimli      25          22         11
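
Note that degree is not simply out_degree + in_degree: Frodo has 37 + 22 = 59 but a degree of 51. The table is consistent with degree counting distinct neighbours, so a character linked to the same neighbour in both directions counts that neighbour once (Aragorn's 5 out-neighbours, for example, must all be among his 45 in-neighbours for his degree to be 45). The sketch below computes the three measures that way on a made-up edge list; it illustrates this reading of the numbers and is not Raphtory's implementation.

```python
def degrees(edges):
    """Return {vertex: (degree, out_degree, in_degree)} for directed edges."""
    out_nbrs, in_nbrs = {}, {}
    for src, dst in set(edges):  # repeated updates of one edge count once
        out_nbrs.setdefault(src, set()).add(dst)
        in_nbrs.setdefault(dst, set()).add(src)
        # Make sure every vertex appears in both maps
        out_nbrs.setdefault(dst, set())
        in_nbrs.setdefault(src, set())
    return {
        # degree = size of the union of in- and out-neighbours
        v: (len(out_nbrs[v] | in_nbrs[v]), len(out_nbrs[v]), len(in_nbrs[v]))
        for v in out_nbrs
    }


# Made-up edges: A->B and B->A point both ways, so B counts once in A's degree
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]
print(degrees(edges)["A"])  # (2, 2, 1)
```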

Sort by highest in-degree, top 10

df.sort_values(['in_degree'], ascending=False)[:10]
Top 10 Highest In-Degree Results

     timestamp       name  degree  out_degree  in_degree
97       32674    Aragorn      45           5         45
134      32674    Faramir      29           3         29
54       32674    Gandalf      49          35         24
56       32674     Elrond      32          18         24
55       32674      Frodo      51          37         22
63       32674      Merry      34          23         18
138      32674    Boromir      18           6         17
118      32674        Sam      28          20         17
3        32674  Galadriel      19           6         16
132      32674    Legolas      25          18         16

Sort by highest out-degree, top 10

df.sort_values(['out_degree'], ascending=False)[:10]
Top 10 Highest Out-Degree Results

     timestamp     name  degree  out_degree  in_degree
55       32674    Frodo      51          37         22
54       32674  Gandalf      49          35         24
32       32674   Pippin      34          30         10
63       32674    Merry      34          23         18
52       32674  Théoden      30          22          9
129      32674    Gimli      25          22         11
118      32674      Sam      28          20         17
56       32674   Elrond      32          18         24
4        32674  Isildur      18          18          0
132      32674  Legolas      25          18         16

Run a PageRank algorithm 📑

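Run your chosen algorithm on the graph. Here we look at the graph at time 32674, the latest timestamp in the data (the perspective time used by the query above), including all of its history (past()). We apply PageRank as a transform and then run NodeList to output each vertex's name along with the properties listed in cols. PageRank stores its score in the prlabel property, which therefore appears as an extra column in the dataframe.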

cols = ["prlabel"]

df_pagerank = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.centrality.PageRank())\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:10:57.397 [io-compute-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Starting query progress tracker.
11:10:57.414 [spawner-akka.actor.default-dispatcher-11] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:NodeList_7431747532744364308' received, your job ID is 'PageRank:NodeList_7431747532744364308'.
11:10:57.416 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.partition.QueryExecutor - PageRank:NodeList_7431747532744364308_0: Starting QueryExecutor.
11:10:57.597 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:NodeList_7431747532744364308': Perspective at Time '32674' took 179 ms to run. 
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:NodeList_7431747532744364308': Perspective '32674' finished in 200 ms.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Running query, processed 1 perspectives.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Query completed with 1 perspectives and finished in 200 ms.
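
PageRank scores a vertex highly when it receives edges from other high-scoring vertices. The sketch below is the textbook power-iteration form in plain Python, assuming a damping factor of 0.85, a fixed 100 iterations, uniform redistribution of score from vertices with no out-edges, and scores normalised to sum to 1. Raphtory's scores in this example are on a different scale (they reach about 13, so they clearly do not sum to 1) and its parameters are not shown here, so the sketch will not reproduce the numbers that follow; it only shows the idea.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank on a directed edge list; scores sum to 1."""
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in set(edges):  # repeated updates of one edge count once
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Vertices with no out-edges spread their score evenly over all vertices
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each vertex shares its score equally among its out-neighbours
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up graph: A and B point at C, and C points back at A
edges = [("A", "C"), ("B", "C"), ("C", "A")]
scores = pagerank(edges)
print(max(scores, key=scores.get))     # C
print(round(sum(scores.values()), 6))  # 1.0
```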

Clean dataframe 🧹 and preview 👀

df_pagerank.drop(columns=['window'], inplace=True)
df_pagerank
Preview Dataframe

     timestamp       name   prlabel
0        32674     Hirgon  0.277968
1        32674      Hador  0.459710
2        32674       Horn  0.522389
3        32674  Galadriel  2.228852
4        32674    Isildur  0.277968
…            …          …         …
134      32674    Faramir  8.551166
135      32674       Bain  0.396105
136      32674      Walda  0.817198
137      32674  Thranduil  0.761719
138      32674    Boromir  4.824014

The top ten highest PageRank scores

df_pagerank.sort_values(['prlabel'], ascending=False)[:10]
Top ten characters in LOTR by PageRank

     timestamp     name    prlabel
97       32674  Aragorn  13.246457
134      32674  Faramir   8.551166
56       32674   Elrond   5.621548
138      32674  Boromir   4.824014
132      32674  Legolas   4.622590
110      32674  Imrahil   4.095600
65       32674    Éomer   3.473897
42       32674  Samwise   3.292762
118      32674      Sam   2.826140
55       32674    Frodo   2.806475

Run a connected components algorithm

cols = ["cclabel"]
df_cc = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.ConnectedComponents)\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:14:59.742 [io-compute-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Starting query progress tracker.
11:14:59.744 [spawner-akka.actor.default-dispatcher-9] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'ConnectedComponents:NodeList_5614237107038973005' received, your job ID is 'ConnectedComponents:NodeList_5614237107038973005'.
11:14:59.745 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.partition.QueryExecutor - ConnectedComponents:NodeList_5614237107038973005_0: Starting QueryExecutor.
11:14:59.772 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective at Time '32674' took 26 ms to run. 
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective '32674' finished in 30 ms.
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Running query, processed 1 perspectives.
11:14:59.773 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Query completed with 1 perspectives and finished in 31 ms.
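
Every character in the same component receives the same cclabel, and the labels look like vertex IDs. The sketch below assumes two things not stated in this example: edges are treated as undirected, and each component is labelled with the smallest vertex ID it contains. It uses union-find in plain Python with made-up integer IDs; union-find is swapped in for illustration, whereas Raphtory runs its own vertex-centric algorithm.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest ID in its (undirected) component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, halving the path as we go
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller, so roots stay minimal
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {v: find(v) for v in vertices}


# Made-up IDs: {5, 3, 9} are linked, {-7, 2} are linked, 4 is isolated
vertices = [5, 3, 9, -7, 2, 4]
edges = [(5, 3), (9, 5), (2, -7)]
print(connected_components(vertices, edges))
# {5: 3, 3: 3, 9: 3, -7: -7, 2: -7, 4: 4}
```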

Clean dataframe 🧹 and preview 👀

df_cc.drop(columns=['window'], inplace=True)
df_cc
Preview dataframe

     timestamp       name               cclabel
0        32674     Hirgon  -8637342647242242534
1        32674      Hador  -8637342647242242534
2        32674       Horn  -8637342647242242534
3        32674  Galadriel  -8637342647242242534
4        32674    Isildur  -8637342647242242534
…            …          …                     …
134      32674    Faramir  -8637342647242242534
135      32674       Bain  -6628080393138316116
136      32674      Walda  -8637342647242242534
137      32674  Thranduil  -8637342647242242534
138      32674    Boromir  -8637342647242242534

Number of distinct components

Count the number of distinct component labels; this dataframe has 3.

len(set(df_cc['cclabel']))

Terminal Output

Out[19]: 3

Size of components

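Calculate the size of each of the 3 connected components. After count(), the name column holds the number of characters in each component; the three sizes add up to 139, one per row of the dataframe.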

df_cc.groupby(['cclabel']).count().reset_index().drop(columns=['timestamp'])
Size of the 3 distinct connected components

                cclabel  name
0  -8637342647242242534   134
1  -6628080393138316116     3
2  -5499479516525190226     2
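
In pandas, df_cc['cclabel'].value_counts() gives the same sizes directly, largest first. Outside pandas, the tally is a one-liner with collections.Counter, as in this sketch on made-up labels (one label per vertex):

```python
from collections import Counter

# Made-up component labels, one per vertex, forming three components
labels = [3, 3, 3, -7, -7, 4]
sizes = Counter(labels)
print(len(sizes))           # 3 components
print(sizes.most_common())  # [(3, 3), (-7, 2), (4, 1)]
print(sum(sizes.values()))  # 6 vertices in total
```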

Run chained algorithms at once

In this example, we chain the PageRank, Connected Components and Degree algorithms, running them one after another on the graph. List every column you want in the output dataframe, including the output of each algorithm in the chain: prlabel from PageRank, cclabel from Connected Components, and inDegree, outDegree and degree from Degree.

cols = ["inDegree", "outDegree", "degree","prlabel","cclabel"]

df_chained = rg.at(32674) \
                .past() \
                .transform(pr.algorithms.generic.centrality.PageRank())\
                .transform(pr.algorithms.generic.ConnectedComponents)\
                .transform(pr.algorithms.generic.centrality.Degree())\
                .execute(pr.algorithms.generic.NodeList(*cols)) \
                .write_to_dataframe(["name"] + cols)

Terminal Output

11:15:08.397 [io-compute-7] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Starting query progress tracker.
11:15:08.401 [spawner-akka.actor.default-dispatcher-6] INFO  com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413' received, your job ID is 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413'.
11:15:08.402 [spawner-akka.actor.default-dispatcher-7] INFO  com.raphtory.internals.components.partition.QueryExecutor - PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413_0: Starting QueryExecutor.
11:15:08.457 [spawner-akka.actor.default-dispatcher-10] INFO  com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective at Time '32674' took 52 ms to run. 
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective '32674' finished in 60 ms.
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Running query, processed 1 perspectives.
11:15:08.458 [spawner-akka.actor.default-dispatcher-5] INFO  com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Query completed with 1 perspectives and finished in 61 ms.
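
Conceptually, each transform in the chain leaves its results as properties on the vertices, and the final NodeList step reads the properties named in cols, which is why one query can return all five columns. The toy model below mimics that flow in plain Python: the step functions and node_list are hypothetical stand-ins operating on a dictionary of per-vertex properties, not Raphtory's API or execution model.

```python
from functools import reduce


# Hypothetical steps: each reads the per-vertex state and adds one property
def add_out_degree(state, edges):
    for v in state:
        state[v]["outDegree"] = sum(1 for src, _ in edges if src == v)
    return state


def add_in_degree(state, edges):
    for v in state:
        state[v]["inDegree"] = sum(1 for _, dst in edges if dst == v)
    return state


def node_list(state, columns):
    """Final output step: one row per vertex with the chosen properties."""
    return [[v] + [props[c] for c in columns] for v, props in sorted(state.items())]


edges = [("A", "B"), ("A", "C"), ("B", "C")]
state = {v: {} for v in "ABC"}
steps = [add_out_degree, add_in_degree]
# Apply the steps one after another, like chained transforms
state = reduce(lambda s, step: step(s, edges), steps, state)
print(node_list(state, ["inDegree", "outDegree"]))
# [['A', 0, 2], ['B', 1, 1], ['C', 2, 0]]
```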

Clean dataframe 🧹 and preview 👀

df_chained.drop(columns=['window'], inplace=True)
df_chained
Preview chained algorithm output

     timestamp       name  inDegree  outDegree  degree   prlabel               cclabel
0        32674     Hirgon         0          2       2  0.277968  -8637342647242242534
1        32674      Hador         2          1       3  0.459710  -8637342647242242534
2        32674       Horn         3          1       4  0.522389  -8637342647242242534
3        32674  Galadriel        16          6      19  2.228852  -8637342647242242534
4        32674    Isildur         0         18      18  0.277968  -8637342647242242534
…            …          …         …          …       …         …                     …
134      32674    Faramir        29          3      29  8.551166  -8637342647242242534
135      32674       Bain         1          1       2  0.396105  -6628080393138316116
136      32674      Walda        10          3      13  0.817198  -8637342647242242534
137      32674  Thranduil         2          0       2  0.761719  -8637342647242242534
138      32674    Boromir        17          6      18  4.824014  -8637342647242242534

Create a visualisation by adding nodes and edges 🔎

def visualise(rg, df_chained):
    # Create a pyvis network object for display in a notebook
    net = Network(notebook=True, height='750px', width='100%', bgcolor='#222222', font_color='white')
    # Use the ForceAtlas2-based physics layout
    net.force_atlas_2based()
    # Get the node list
    df_node_list = rg.at(32674) \
                .past() \
                .execute(pr.algorithms.generic.NodeList()) \
                .write_to_dataframe(['name'])

    nodes = df_node_list['name'].tolist()

    # Build a hover tooltip for each node from its row of algorithm results
    node_data = []
    ignore_items = ['timestamp', 'name', 'window']
    for node_name in nodes:
        for i, row in df_chained.iterrows():
            if row['name'] == node_name:
                data = ''
                # Series.items() replaces iteritems(), which was removed in pandas 2.0
                for k, v in row.items():
                    if k not in ignore_items:
                        data = data + str(k) + ': ' + str(v) + '\n'
                node_data.append(data)
                # Names are unique, so stop scanning once the match is found
                break
    # Add the nodes with their tooltips
    net.add_nodes(nodes, title=node_data)
    # Get the edge list
    df_edge_list = rg.at(32674) \
            .past() \
            .execute(pr.algorithms.generic.EdgeList()) \
            .write_to_dataframe(['from', 'to'])
    edges = []
    for i, row in df_edge_list[['from', 'to']].iterrows():
        edges.append([row['from'], row['to']])
    # Add the edges
    net.add_edges(edges)
    # Turn on the physics simulation
    net.toggle_physics(True)
    return net

net = visualise(rg, df_chained)
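
The tooltip loop in visualise scans every row of df_chained for each node, so its cost grows with the square of the number of characters; that is fine for 139 characters but slow for larger graphs. Indexing the rows by name once avoids the repeated scan, and returning an empty tooltip for a node with no results keeps the tooltip list aligned with nodes when it is passed to net.add_nodes. The sketch below shows the idea in plain Python; build_tooltips is a hypothetical helper, and the rows are dictionaries shaped like rows of the chained preview (df_chained.to_dict('records') produces this shape).

```python
def build_tooltips(node_names, rows, ignore=("timestamp", "name", "window")):
    """Return one 'key: value' tooltip string per node, in node_names order."""
    # Index the rows by name once instead of scanning them for every node
    by_name = {row["name"]: row for row in rows}
    tooltips = []
    for name in node_names:
        row = by_name.get(name, {})  # nodes without results get an empty tooltip
        lines = [f"{k}: {v}" for k, v in row.items() if k not in ignore]
        tooltips.append("\n".join(lines))
    return tooltips


# Rows shaped like the chained preview (a subset of its columns)
rows = [
    {"timestamp": 32674, "name": "Hirgon", "inDegree": 0, "outDegree": 2, "degree": 2},
    {"timestamp": 32674, "name": "Hador", "inDegree": 2, "outDegree": 1, "degree": 3},
]
print(build_tooltips(["Hador", "Hirgon", "Bain"], rows))
# ['inDegree: 2\noutDegree: 1\ndegree: 3', 'inDegree: 0\noutDegree: 2\ndegree: 2', '']
```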

Save the visualisation to an HTML file and show it

net.show('preview.html')

Shut down PyRaphtory 🛑

pr.shutdown()