PyRaphtory can be easily installed via pip. This pulls in all of the background dependencies for Raphtory and automatically sets up any system paths to point at the correct location. The only requirement is that you are running Python version 3.9.13.
pip install pyraphtory
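If the install misbehaves, the Python version is the first thing to check. The snippet below is a minimal sketch using only the standard library. `REQUIRED` and `matches_required` are names made up for this example, and treating 3.9.13 as an exact match (rather than a minimum) is an assumption based on the wording above.

```python
import sys

# Version named in the text above; treated as an exact requirement (assumption).
REQUIRED = (3, 9, 13)


def matches_required(version=None, required=REQUIRED):
    """Return True if the first three version components equal `required`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) == tuple(required)


current = ".".join(str(part) for part in sys.version_info[:3])
wanted = ".".join(str(part) for part in REQUIRED)
if matches_required():
    print(f"Python {current} matches the required version.")
else:
    print(f"Python {current} found, but this page asks for {wanted}.")
```

If the check fails, a virtual environment built on the matching interpreter is probably the least painful fix.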
Once installed, let’s set up the most bare bones PyRaphtory graph, test that we can add some data to it and run our first query. Once this is all working we can move on to some much more exciting examples in the next section!
Before we start, however, you may have noticed that this page looks oddly like an IPython notebook. That is because it is! If you click the open on GitHub link in the top right of the page you can follow along on your own machine. Right, back to the code!
First we need to import PyRaphtory. You may see some references to Java in the logs here. This is because, under the hood, Raphtory is written in Scala. You don’t have to worry about any of that though, as it’s all hidden away!
Creating your first graph
Once Raphtory is installed we can create our first graph! To do this we first need a context, which we can get from the PyRaphtory object.
Our two options here are local and remote. As we are just testing on our laptops we can use local, meaning the Raphtory code will run within your Python process. We will dig into remote contexts later, when you want to deploy in a separate process or scale your graph past what your laptop can handle.
Once we have our context we can call new_graph(), which gives us a graph we can add data to and run queries on.
context = pyraphtory.local()
graph = context.new_graph()
Adding data to your Graph
Once a graph is created, we need to add some data to it if we want to run anything interesting. There are loads of ways of doing this in Raphtory, which we will cover in the next section, but for simplicity let’s just add some vertices and edges without any properties.
As Raphtory is focused on dynamic and temporal analysis, all events in the graph’s history (adding, updating or deleting nodes/edges) must happen at a given time. This can all be at the same time (if, for example, you are working with snapshots) but we still need a time.
As such, when we add a vertex we have two arguments: the timestamp and the vertex ID. Similarly, when adding an edge, we have three arguments: the timestamp, the source vertex and the destination vertex.
Note: All graphs are directed by default in Raphtory, but can be projected into an undirected graph - we will go in-depth into graph projections later in the tutorial.
In the following code block we have five updates for our graph, adding three vertices (1, 2, 3) at time 1 and two edges (1->2, 1->3) at time 2.
graph.add_vertex(1, 1)
graph.add_vertex(1, 2)
graph.add_vertex(1, 3)
graph.add_edge(2, 1, 2)
graph.add_edge(2, 1, 3)
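To make the timestamp-first convention a bit more concrete, here is a minimal pure-Python sketch that records the same five updates as an event log and asks what the graph looks like at a given time. It does not use PyRaphtory at all: `ToyTemporalGraph`, `vertices_at` and `edges_at` are hypothetical names invented for illustration, and nothing here is meant to reflect how Raphtory stores or processes data internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemporalGraph:
    """A toy event log (hypothetical; not part of PyRaphtory)."""
    vertex_events: list = field(default_factory=list)  # (time, vertex_id)
    edge_events: list = field(default_factory=list)    # (time, src, dst)

    def add_vertex(self, time, vertex_id):
        # Timestamp comes first, mirroring the argument order used above.
        self.vertex_events.append((time, vertex_id))

    def add_edge(self, time, src, dst):
        # Edges are directed: src -> dst.
        self.edge_events.append((time, src, dst))

    def vertices_at(self, time):
        """Vertices explicitly added at or before `time`."""
        return {v for t, v in self.vertex_events if t <= time}

    def edges_at(self, time):
        """Directed edges (src, dst) added at or before `time`."""
        return {(s, d) for t, s, d in self.edge_events if t <= time}


toy = ToyTemporalGraph()
toy.add_vertex(1, 1)
toy.add_vertex(1, 2)
toy.add_vertex(1, 3)
toy.add_edge(2, 1, 2)
toy.add_edge(2, 1, 3)

print(sorted(toy.vertices_at(1)), sorted(toy.edges_at(1)))
print(sorted(toy.vertices_at(2)), sorted(toy.edges_at(2)))
```

In this toy model the view at time 1 has three vertices and no edges, while the view at time 2 also contains both edges. That is the sense in which every update needs a time.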
Running your first Query
Now that our data is loaded we can start interrogating it!
While we can write some very complicated algorithms in Raphtory, let’s start off with something simple: getting the outdegree of our nodes.
For this we call select on the graph, which takes the names of the properties we want to extract and runs on every vertex to obtain the respective values. In the code below, the two step calls first store each vertex’s out-degree and in-degree as vertex state under the names outdegree and indegree, so that select can pick them up. This will return a Table full of Rows which represent the result for each node. Note that providing no names is seen as the equivalent of select *, returning all properties for the vertices. Following a call to select we can either write our results to a Sink (file, database, etc.), which we will cover later in the tutorial, or convert them into a dataframe for further analysis.
In this example we have called to_df to get a dataframe.
If you have a look in the logs you can see that your query is given a Job ID and Raphtory will report how long it took for it to run.
df = graph \
    .step(lambda vertex: vertex.set_state("outdegree", vertex.out_degree())) \
    .step(lambda vertex: vertex.set_state("indegree", vertex.in_degree())) \
    .select("name", "outdegree", "indegree") \
    .to_df()
Checking out the output
Finally, once our query has run and we have got our dataframe, we can take a look at the results.
One notable aspect here is that we requested three variables, but we have four columns. This is because algorithms in Raphtory run at set points in time, meaning the values for each vertex must be associated with a timestamp (in this case the most recent one, 2).
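As a sanity check on what to expect, the degrees for this tiny graph can be worked out by hand. The sketch below does so in plain Python, without PyRaphtory, and attaches a timestamp to each row to mirror the extra column. `degree_rows` is a hypothetical helper written for this page, the vertex ID stands in for the name column (an assumption), and the rows it builds are a toy calculation, not actual Raphtory output.

```python
def degree_rows(vertices, edges, timestamp):
    """Build one row per vertex: the timestamp plus the three requested values."""
    rows = []
    for vertex in sorted(vertices):
        out_degree = sum(1 for src, _ in edges if src == vertex)
        in_degree = sum(1 for _, dst in edges if dst == vertex)
        rows.append({
            "timestamp": timestamp,  # the point in time the values refer to
            "name": vertex,          # vertex ID used as a stand-in for the name
            "outdegree": out_degree,
            "indegree": in_degree,
        })
    return rows


# The same vertices and directed edges as in the updates above.
vertices = [1, 2, 3]
edges = [(1, 2), (1, 3)]

rows = degree_rows(vertices, edges, timestamp=2)
for row in rows:
    print(row)
```

Each row has four entries: the three requested values plus the timestamp. Vertex 1 has two outgoing edges and no incoming ones, while vertices 2 and 3 each have a single incoming edge.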
As with every other cool feature I have hinted at, you will soon be an expert in queries, time-analysis and much more. All you have to do is continue on to the next page!