Lord of the Rings Character Interactions
Overview
This example takes a dataset of character interactions in the Lord of the Rings trilogy and builds a graph of these interactions in Raphtory. It is a great dataset for trying out the built-in algorithms or for testing algorithms you have written yourself. You can run this example using either our Python or Scala client.
Pre-requisites
Follow our Installation guide: Install
Data
The data is a csv file (comma-separated values) pulled from our Github data repository. Each line records two characters that appeared in the same sentence of the books, along with the sentence number. For example, the first line of the file is Gandalf,Elrond,33, which tells us that Gandalf and Elrond appear together in sentence 33.
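To get a feel for the raw data before building anything, you can print a few lines of the file and split one on commas. This is a minimal sketch, assuming the csv has already been downloaded to /tmp/lotr.csv (the path used in the download step below).
# Minimal sketch: peek at the first few interactions in the raw csv.
# Assumes the file has already been downloaded to /tmp/lotr.csv.
with open("/tmp/lotr.csv") as f:
    for line in list(f)[:3]:
        source, target, sentence = line.strip().split(",")
        print(f"{source} and {target} appear together in sentence {sentence}")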
Lord Of The Rings Example 🧝🏻‍♀️🧙🏻‍♂️💍
We have detailed each step of our Scala and Python example code below. You can find the code on Github here: Scala and Python.
Setup environment 🌍
Import all the dependencies needed to build a graph from your data in Raphtory.
import csv
from pathlib import Path
from pyraphtory.context import PyRaphtory
from pyraphtory.vertex import Vertex
from pyraphtory.spouts import FileSpout
from pyraphtory.input import *
from pyvis.network import Network
import com.raphtory.Raphtory
import com.raphtory.algorithms.generic.ConnectedComponents
import com.raphtory.algorithms.generic.NodeList
import com.raphtory.algorithms.generic.centrality.Degree
import com.raphtory.algorithms.generic.centrality.PageRank
import com.raphtory.api.input.Graph.assignID
import com.raphtory.api.input.ImmutableProperty
import com.raphtory.api.input.Properties
import com.raphtory.api.input.Type
import com.raphtory.sinks.FileSink
import com.raphtory.utils.FileUtils
import scala.language.postfixOps
Download csv data from Github 💾
!curl -o /tmp/lotr.csv https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv
val path = "/tmp/lotr.csv"
val url = "https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv"
FileUtils.curlFile(path, url)
Terminal Output
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 52206 100 52206 0 0 147k 0 --:--:-- --:--:-- --:--:-- 149k
Create a new Raphtory graph 📊
Turn on logs to see what is going on in PyRaphtory. Initialise Raphtory by creating a PyRaphtory context, here called pr, and use it to create your new graph, rg.
pr = PyRaphtory.local()
rg = pr.new_graph()
val graph = Raphtory.newGraph()
Terminal Output
11:10:18.664 [Thread-12] INFO com.raphtory.internals.context.LocalContext$ - Creating Service for 'violent_rose_mastodon'
11:10:18.678 [io-compute-1] INFO com.raphtory.internals.management.Prometheus$ - Prometheus started on port /0:0:0:0:0:0:0:0:9999
11:10:19.491 [io-compute-1] INFO com.raphtory.internals.components.partition.PartitionOrchestrator$ - Creating '1' Partition Managers for 'violent_rose_mastodon'.
11:10:21.700 [io-compute-4] INFO com.raphtory.internals.components.partition.PartitionManager - Partition 0: Starting partition manager for 'violent_rose_mastodon'.
Ingest the data into a graph 😋
Write a parsing routine that reads the csv file line by line and adds the vertices and edges to the graph.
filename = "/tmp/lotr.csv"

with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        source_node = row[0]
        src_id = rg.assign_id(source_node)
        target_node = row[1]
        tar_id = rg.assign_id(target_node)
        time_stamp = int(row[2])
        rg.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
        rg.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
        rg.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))
val file = scala.io.Source.fromFile(path)
file.getLines.foreach { line =>
  // split the csv line by comma
  val fileLine = line.split(",").map(_.trim)
  // assign the source and target node names, convert them to Long IDs and parse the timestamp
  val sourceNode = fileLine(0)
  val srcID      = assignID(sourceNode)
  val targetNode = fileLine(1)
  val tarID      = assignID(targetNode)
  val timeStamp  = fileLine(2).toLong
  // add the vertices and edge to the graph
  graph.addVertex(timeStamp, srcID, Properties(ImmutableProperty("name", sourceNode)), Type("Character"))
  graph.addVertex(timeStamp, tarID, Properties(ImmutableProperty("name", targetNode)), Type("Character"))
  graph.addEdge(timeStamp, srcID, tarID, Type("Character Co-occurence"))
}
Collect simple metrics 📈
Select the metrics to include in your output dataframe. Here we select the vertex name, degree, out-degree and in-degree.
from pyraphtory.graph import Row
df = rg \
.select(lambda vertex: Row(vertex.name(), vertex.degree(), vertex.out_degree(), vertex.in_degree())) \
.write_to_dataframe(["name", "degree", "out_degree", "in_degree"])
graph
  .execute(Degree())
  .writeTo(FileSink("/tmp/raphtory"))
  .waitForJob()
Terminal output
11:10:42.583 [spawner-akka.actor.default-dispatcher-6] INFO com.raphtory.internals.components.querymanager.QueryManager - Source '0' is unblocking analysis for Graph 'violent_rose_mastodon' with 7947 messages sent.
11:10:42.585 [io-compute-8] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Starting query progress tracker.
11:10:42.590 [spawner-akka.actor.default-dispatcher-6] INFO com.raphtory.internals.components.querymanager.QueryManager - Query '722153918_5338049307951797618' received, your job ID is '722153918_5338049307951797618'.
11:10:42.596 [spawner-akka.actor.default-dispatcher-9] INFO com.raphtory.internals.components.partition.QueryExecutor - 722153918_5338049307951797618_0: Starting QueryExecutor.
11:10:43.395 [spawner-akka.actor.default-dispatcher-3] INFO com.raphtory.internals.components.querymanager.QueryHandler - Job '722153918_5338049307951797618': Perspective at Time '32674' took 790 ms to run.
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job '722153918_5338049307951797618': Perspective '32674' finished in 810 ms.
11:10:43.395 [spawner-akka.actor.default-dispatcher-11] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Running query, processed 1 perspectives.
11:10:43.397 [spawner-akka.actor.default-dispatcher-3] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 722153918_5338049307951797618: Query completed with 1 perspectives and finished in 812 ms.
Clean dataframe 🧹 and preview 👀
In Python, we drop the window column to clean the dataframe and then preview it. In Scala, we can preview the saved csv file in the /tmp/raphtory directory set in the .writeTo method; this can be done in a bash terminal.
df.drop(columns=['window'], inplace=True)
df
cd /tmp/raphtory
cd Degree_JOBID
cat partition-0.csv
Sort by highest degree, top 10
df.sort_values(['degree'], ascending=False)[:10]
|     | timestamp | name    | degree | out_degree | in_degree |
| --- | --------- | ------- | ------ | ---------- | --------- |
| 55  | 32674     | Frodo   | 51     | 37         | 22        |
| 54  | 32674     | Gandalf | 49     | 35         | 24        |
| 97  | 32674     | Aragorn | 45     | 5          | 45        |
| 63  | 32674     | Merry   | 34     | 23         | 18        |
| 32  | 32674     | Pippin  | 34     | 30         | 10        |
| 56  | 32674     | Elrond  | 32     | 18         | 24        |
| 52  | 32674     | Théoden | 30     | 22         | 9         |
| 134 | 32674     | Faramir | 29     | 3          | 29        |
| 118 | 32674     | Sam     | 28     | 20         | 17        |
| 129 | 32674     | Gimli   | 25     | 22         | 11        |
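Notice that degree is not simply the sum of out_degree and in_degree (Frodo: 51 versus 37 + 22), which suggests degree counts distinct neighbours rather than edge directions. A quick sketch on the df we already built, using only pandas, makes the gap visible:
# Sketch: compare degree with in_degree + out_degree for the top characters.
# A sum larger than degree indicates neighbours connected in both directions.
top = df.sort_values('degree', ascending=False)[:10].copy()
top['in_plus_out'] = top['in_degree'] + top['out_degree']
print(top[['name', 'degree', 'in_plus_out']])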
Sort by highest in-degree, top 10
df.sort_values(['in_degree'], ascending=False)[:10]
|     | timestamp | name      | degree | out_degree | in_degree |
| --- | --------- | --------- | ------ | ---------- | --------- |
| 97  | 32674     | Aragorn   | 45     | 5          | 45        |
| 134 | 32674     | Faramir   | 29     | 3          | 29        |
| 54  | 32674     | Gandalf   | 49     | 35         | 24        |
| 56  | 32674     | Elrond    | 32     | 18         | 24        |
| 55  | 32674     | Frodo     | 51     | 37         | 22        |
| 63  | 32674     | Merry     | 34     | 23         | 18        |
| 138 | 32674     | Boromir   | 18     | 6          | 17        |
| 118 | 32674     | Sam       | 28     | 20         | 17        |
| 3   | 32674     | Galadriel | 19     | 6          | 16        |
| 132 | 32674     | Legolas   | 25     | 18         | 16        |
Sort by highest out-degree, top 10
df.sort_values(['out_degree'], ascending=False)[:10]
|     | timestamp | name    | degree | out_degree | in_degree |
| --- | --------- | ------- | ------ | ---------- | --------- |
| 55  | 32674     | Frodo   | 51     | 37         | 22        |
| 54  | 32674     | Gandalf | 49     | 35         | 24        |
| 32  | 32674     | Pippin  | 34     | 30         | 10        |
| 63  | 32674     | Merry   | 34     | 23         | 18        |
| 52  | 32674     | Théoden | 30     | 22         | 9         |
| 129 | 32674     | Gimli   | 25     | 22         | 11        |
| 118 | 32674     | Sam     | 28     | 20         | 17        |
| 56  | 32674     | Elrond  | 32     | 18         | 24        |
| 4   | 32674     | Isildur | 18     | 18         | 0         |
| 132 | 32674     | Legolas | 25     | 18         | 16        |
Run a PageRank algorithm 📑
Run your selected algorithm on the graph. Here we run PageRank followed by NodeList, which writes each vertex's PageRank score into an additional column (prlabel) of the output dataframe.
cols = ["prlabel"]
df_pagerank = rg.at(32674) \
.past() \
.transform(pr.algorithms.generic.centrality.PageRank())\
.execute(pr.algorithms.generic.NodeList(*cols)) \
.write_to_dataframe(["name"] + cols)
graph
  .at(32674)
  .past()
  .execute(PageRank())
  .writeTo(FileSink("/tmp/raphtory"))
  .waitForJob()
Terminal Output
11:10:57.397 [io-compute-9] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Starting query progress tracker.
11:10:57.414 [spawner-akka.actor.default-dispatcher-11] INFO com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:NodeList_7431747532744364308' received, your job ID is 'PageRank:NodeList_7431747532744364308'.
11:10:57.416 [spawner-akka.actor.default-dispatcher-6] INFO com.raphtory.internals.components.partition.QueryExecutor - PageRank:NodeList_7431747532744364308_0: Starting QueryExecutor.
11:10:57.597 [spawner-akka.actor.default-dispatcher-7] INFO com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:NodeList_7431747532744364308': Perspective at Time '32674' took 179 ms to run.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:NodeList_7431747532744364308': Perspective '32674' finished in 200 ms.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Running query, processed 1 perspectives.
11:10:57.597 [spawner-akka.actor.default-dispatcher-9] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:NodeList_7431747532744364308: Query completed with 1 perspectives and finished in 200 ms.
Clean dataframe 🧹 and preview 👀
df_pagerank.drop(columns=['window'], inplace=True)
df_pagerank
cd /tmp/raphtory
cd PageRank:NodeList_JOBID
cat partition-0.csv
|     | timestamp | name      | prlabel  |
| --- | --------- | --------- | -------- |
| 0   | 32674     | Hirgon    | 0.277968 |
| 1   | 32674     | Hador     | 0.459710 |
| 2   | 32674     | Horn      | 0.522389 |
| 3   | 32674     | Galadriel | 2.228852 |
| 4   | 32674     | Isildur   | 0.277968 |
| …   | …         | …         | …        |
| 134 | 32674     | Faramir   | 8.551166 |
| 135 | 32674     | Bain      | 0.396105 |
| 136 | 32674     | Walda     | 0.817198 |
| 137 | 32674     | Thranduil | 0.761719 |
| 138 | 32674     | Boromir   | 4.824014 |
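Before looking at the top-ranked characters, it can help to see how the scores are distributed. A small sketch using only the df_pagerank dataframe above:
# Sketch: summary statistics of the PageRank scores.
print(df_pagerank['prlabel'].describe())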
The top ten highest PageRank scores
df_pagerank.sort_values(['prlabel'], ascending=False)[:10]
|     | timestamp | name    | prlabel   |
| --- | --------- | ------- | --------- |
| 97  | 32674     | Aragorn | 13.246457 |
| 134 | 32674     | Faramir | 8.551166  |
| 56  | 32674     | Elrond  | 5.621548  |
| 138 | 32674     | Boromir | 4.824014  |
| 132 | 32674     | Legolas | 4.622590  |
| 110 | 32674     | Imrahil | 4.095600  |
| 65  | 32674     | Éomer   | 3.473897  |
| 42  | 32674     | Samwise | 3.292762  |
| 118 | 32674     | Sam     | 2.826140  |
| 55  | 32674     | Frodo   | 2.806475  |
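The PageRank ordering differs noticeably from the raw degree ordering earlier. As a quick check using the two dataframes we have already produced (df and df_pagerank), this sketch compares the two top-ten name sets:
# Sketch: compare the degree top ten with the PageRank top ten.
top_degree = set(df.sort_values('degree', ascending=False)[:10]['name'])
top_pagerank = set(df_pagerank.sort_values('prlabel', ascending=False)[:10]['name'])
print("In both top tens:", top_degree & top_pagerank)
print("High PageRank only:", top_pagerank - top_degree)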
Run a connected components algorithm
Here we run ConnectedComponents followed by NodeList, which writes each vertex's component label into the cclabel column of the output dataframe.
cols = ["cclabel"]
df_cc = rg.at(32674) \
.past() \
.transform(pr.algorithms.generic.ConnectedComponents)\
.execute(pr.algorithms.generic.NodeList(*cols)) \
.write_to_dataframe(["name"] + cols)
graph
  .at(32674)
  .past()
  .execute(ConnectedComponents)
  .writeTo(FileSink("/tmp/raphtory"))
  .waitForJob()
Terminal Output
11:14:59.742 [io-compute-3] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Starting query progress tracker.
11:14:59.744 [spawner-akka.actor.default-dispatcher-9] INFO com.raphtory.internals.components.querymanager.QueryManager - Query 'ConnectedComponents:NodeList_5614237107038973005' received, your job ID is 'ConnectedComponents:NodeList_5614237107038973005'.
11:14:59.745 [spawner-akka.actor.default-dispatcher-7] INFO com.raphtory.internals.components.partition.QueryExecutor - ConnectedComponents:NodeList_5614237107038973005_0: Starting QueryExecutor.
11:14:59.772 [spawner-akka.actor.default-dispatcher-7] INFO com.raphtory.internals.components.querymanager.QueryHandler - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective at Time '32674' took 26 ms to run.
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 'ConnectedComponents:NodeList_5614237107038973005': Perspective '32674' finished in 30 ms.
11:14:59.772 [spawner-akka.actor.default-dispatcher-3] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Running query, processed 1 perspectives.
11:14:59.773 [spawner-akka.actor.default-dispatcher-7] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job ConnectedComponents:NodeList_5614237107038973005: Query completed with 1 perspectives and finished in 31 ms.
Clean dataframe 🧹 and preview 👀
df_cc.drop(columns=['window'], inplace=True)
df_cc
cd /tmp/raphtory
cd ConnectedComponents_JOBID
cat partition-0.csv
|     | timestamp | name      | cclabel              |
| --- | --------- | --------- | -------------------- |
| 0   | 32674     | Hirgon    | -8637342647242242534 |
| 1   | 32674     | Hador     | -8637342647242242534 |
| 2   | 32674     | Horn      | -8637342647242242534 |
| 3   | 32674     | Galadriel | -8637342647242242534 |
| 4   | 32674     | Isildur   | -8637342647242242534 |
| …   | …         | …         | …                    |
| 134 | 32674     | Faramir   | -8637342647242242534 |
| 135 | 32674     | Bain      | -6628080393138316116 |
| 136 | 32674     | Walda     | -8637342647242242534 |
| 137 | 32674     | Thranduil | -8637342647242242534 |
| 138 | 32674     | Boromir   | -8637342647242242534 |
Number of distinct components
Extract the number of distinct components; in this dataframe there are 3.
len(set(df_cc['cclabel']))
Terminal Output
Out[19]: 3
Size of components
Calculate the size of the 3 connected components.
df_cc.groupby(['cclabel']).count().reset_index().drop(columns=['timestamp'])
|     | cclabel              | name |
| --- | -------------------- | ---- |
| 0   | -8637342647242242534 | 134  |
| 1   | -6628080393138316116 | 3    |
| 2   | -5499479516525190226 | 2    |
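Almost every character sits in a single giant component (134 of the 139 vertices in this run). A short follow-up sketch, using only the df_cc dataframe above, expresses each component as a share of the graph:
# Sketch: fraction of characters in each connected component.
sizes = df_cc.groupby('cclabel')['name'].count().sort_values(ascending=False)
print(sizes / sizes.sum())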
Run chained algorithms at once
In this example, we chain the PageRank, ConnectedComponents and Degree algorithms, running them one after another on the graph. Specify all the columns in the output dataframe, including an output column for each algorithm in the chain.
cols = ["inDegree", "outDegree", "degree","prlabel","cclabel"]
df_chained = rg.at(32674) \
.past() \
.transform(pr.algorithms.generic.centrality.PageRank())\
.transform(pr.algorithms.generic.ConnectedComponents)\
.transform(pr.algorithms.generic.centrality.Degree())\
.execute(pr.algorithms.generic.NodeList(*cols)) \
.write_to_dataframe(["name"] + cols)
graph
  .at(32674)
  .past()
  .transform(PageRank())
  .transform(ConnectedComponents)
  .transform(Degree())
  .execute(NodeList(Seq("prlabel", "cclabel", "inDegree", "outDegree", "degree")))
  .writeTo(FileSink("/tmp/raphtory"))
  .waitForJob()
Terminal Output
11:15:08.397 [io-compute-7] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Starting query progress tracker.
11:15:08.401 [spawner-akka.actor.default-dispatcher-6] INFO com.raphtory.internals.components.querymanager.QueryManager - Query 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413' received, your job ID is 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413'.
11:15:08.402 [spawner-akka.actor.default-dispatcher-7] INFO com.raphtory.internals.components.partition.QueryExecutor - PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413_0: Starting QueryExecutor.
11:15:08.457 [spawner-akka.actor.default-dispatcher-10] INFO com.raphtory.internals.components.querymanager.QueryHandler - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective at Time '32674' took 52 ms to run.
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job 'PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413': Perspective '32674' finished in 60 ms.
11:15:08.457 [spawner-akka.actor.default-dispatcher-5] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Running query, processed 1 perspectives.
11:15:08.458 [spawner-akka.actor.default-dispatcher-5] INFO com.raphtory.api.querytracker.QueryProgressTracker - Job PageRank:ConnectedComponents:Degree:NodeList_8956930356200985413: Query completed with 1 perspectives and finished in 61 ms.
Clean dataframe 🧹 and preview 👀
df_chained.drop(columns=['window'], inplace=True)
df_chained
cd /tmp/raphtory
cd PageRank:ConnectedComponents:Degree:NodeList_JOBID
cat partition-0.csv
|     | timestamp | name      | inDegree | outDegree | degree | prlabel  | cclabel              |
| --- | --------- | --------- | -------- | --------- | ------ | -------- | -------------------- |
| 0   | 32674     | Hirgon    | 0        | 2         | 2      | 0.277968 | -8637342647242242534 |
| 1   | 32674     | Hador     | 2        | 1         | 3      | 0.459710 | -8637342647242242534 |
| 2   | 32674     | Horn      | 3        | 1         | 4      | 0.522389 | -8637342647242242534 |
| 3   | 32674     | Galadriel | 16       | 6         | 19     | 2.228852 | -8637342647242242534 |
| 4   | 32674     | Isildur   | 0        | 18        | 18     | 0.277968 | -8637342647242242534 |
| …   | …         | …         | …        | …         | …      | …        | …                    |
| 134 | 32674     | Faramir   | 29       | 3         | 29     | 8.551166 | -8637342647242242534 |
| 135 | 32674     | Bain      | 1        | 1         | 2      | 0.396105 | -6628080393138316116 |
| 136 | 32674     | Walda     | 10       | 3         | 13     | 0.817198 | -8637342647242242534 |
| 137 | 32674     | Thranduil | 2        | 0         | 2      | 0.761719 | -8637342647242242534 |
| 138 | 32674     | Boromir   | 17       | 6         | 18     | 4.824014 | -8637342647242242534 |
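Because the chained run collects every metric in one dataframe, you can explore how the measures relate without re-running anything. For example, a sketch of the pairwise correlations between the degree columns and the PageRank score, using only pandas on df_chained:
# Sketch: pairwise correlation between the structural metrics and the PageRank score.
print(df_chained[['degree', 'inDegree', 'outDegree', 'prlabel']].corr())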
Create visualisation by adding nodes 🔎
def visualise(rg, df_chained):
    # Create network object
    net = Network(notebook=True, height='750px', width='100%', bgcolor='#222222', font_color='white')
    # Set visualisation tool
    net.force_atlas_2based()
    # Get the node list
    df_node_list = rg.at(32674) \
        .past() \
        .execute(pr.algorithms.generic.NodeList()) \
        .write_to_dataframe(['name'])
    nodes = df_node_list['name'].tolist()
    node_data = []
    ignore_items = ['timestamp', 'name', 'window']
    # Build a tooltip string of metrics for each node from the chained results
    for node_name in nodes:
        for i, row in df_chained.iterrows():
            if row['name'] == node_name:
                data = ''
                for k, v in row.iteritems():
                    if k not in ignore_items:
                        data = data + str(k) + ': ' + str(v) + '\n'
                node_data.append(data)
                break
    # Add the nodes
    net.add_nodes(nodes, title=node_data)
    # Get the edge list
    df_edge_list = rg.at(32674) \
        .past() \
        .execute(pr.algorithms.generic.EdgeList()) \
        .write_to_dataframe(['from', 'to'])
    edges = []
    for i, row in df_edge_list[['from', 'to']].iterrows():
        edges.append([row['from'], row['to']])
    # Add the edges
    net.add_edges(edges)
    # Toggle physics
    net.toggle_physics(True)
    return net

net = visualise(rg, df_chained)
Show the html file of the visualisation
net.show('preview.html')
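If you want the drawing itself to reflect the metrics we computed, pyvis also accepts per-node attributes such as size in add_nodes. The sketch below is a hypothetical variant of the node-adding step, built only from df_chained, that sizes each character by its degree; it is an illustration rather than part of the original example.
# Sketch: a separate network (edges omitted for brevity) whose node sizes
# follow the degree column of df_chained.
net_sized = Network(notebook=True, height='750px', width='100%', bgcolor='#222222', font_color='white')
names = df_chained['name'].tolist()
degrees = df_chained['degree'].tolist()
net_sized.add_nodes(
    names,
    title=[f"degree: {d}" for d in degrees],
    size=[10 + int(d) for d in degrees],  # offset so low-degree characters stay visible
)
net_sized.show('preview_sized.html')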
Shut down PyRaphtory 🛑
pr.shutdown()