Quickstart Guide
In this section, we will see how we can work with the Raphtory Scala APIs to create and analyse graphs. The foundations that we have built in the PyRaphtory introduction are almost exactly the same as for Scala, so here we shall only introduce concepts which are unique to Scala.
Installing Dependencies
To work with a Scala project you would need Scala, Java and SBT as development dependencies. The dependencies to work with Scala Raphtory APIs are Scala version 2.13.7
, Java version 11.0.11.hs-adpt
and SBT version 1.8.0
.
Note: You can manage these dependencies using SDKMAN, if you like.
The Raphtory version that we will be discussing in this section is a “pre-release” version v0.2.0a7
.
Create Project
We can create a new project from Raphtory giter8 template like so:
$ sbt new Raphtory/raphtory.g8
Besides downloading some of the required dependencies and creating a new project from the template, there are a couple of project details (with some sensible defaults) presented for you to change:
name [raphtory-quickstart]:
version [0.1.0-SNAPSHOT]:
scala_version [2.13.7]:
sbt_version [1.8.0]:
organization [com.pometry]:
package [com.pometry]:
raphtory_version [0.2.0a7]:
Create/Retrieve Graph
To start working with a graph, we can either create one with a user-specified graph ID or get it to auto-generate an ID. We can also retrieve an existing graph from its graph id.
To create a new graph
def runWithNewGraph[T](graphID: String = createName, persist: Boolean = false)
Note: The default behaviour is to destroy any new graph upon exiting the scope of raphtory context unless chosen to persist explicitly.
To retrieve an existing graph
def runWithGraph[T](graphID: String, destroy: Boolean = false)
Note: User can choose to destroy the graph upon exiting the raphtory context by set the
destroy
flag totrue
.Destroy graphs
Should user choose to destroy any of the persisted graphs for a given graph id
def destroyGraph(graphID: String)
These APIs are available on RaphtoryContext
which could be summoned differently for local and remote deployments.
For Local Deployment
The main application needs to extend the
Raphtory.Local
abstract class. We also need to override therun
method that brings “local”RaphtoryContext
in scope, which in turn is used to create or retrieve graphs:object Runner extends RaphtoryApp.Local { override def run(args: Array[String], ctx: RaphtoryContext): Unit = ??? }
For Remote Deployment
The main application needs to extend the
Raphtory.Remote("127.0.0.1", 1736)
abstract class. Since the hosts and ports mentioned here are already configured as default values in theapplication.conf
file, we need not provide them explicitly. We also need to override therun
method that brings “remote”RaphtoryContext
in scope, which in turn is used to create or retrieve graphs:object RemoteRunner extends RaphtoryApp.Remote() { override def run(args: Array[String], ctx: RaphtoryContext): Unit = ??? }
Load Data
Should you choose to work with a newly created graph, you must load it with some data:
val path = "/tmp/lotr.csv" val url = "https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv" FileUtils.curlFile(path, url) val source = Source(FileSpout(path), new LotrGraphBuilder()) graph.load(source)
Here we download raw data to
/tmp
and then transform it to vertices and edges using graph builders.If you are already working with an existing graph, you may not need to load any new data and you can jump straight into analysis
Analyse Graph
Once you have created or retrieved an already existing graph against a graphId
, you can run analysis on it and choose to write output to a sink of your choice.
graph
.execute(Degree())
.writeTo(FileSink("/tmp/raphtory"))
.waitForJob()
Here, we run Degree
algorithm on the TemporalGraph
. In the case of a directed graph, the algorithm returns “in-degree” and “out-degree” of nodes i.e., the incoming or outgoing edges. In the case of an undirected graph, the algorithm returns the “degree” of nodes,x representing the total number of edges. Additionally, we chose to write the output to a directory /tmp/raphtory
using FileSink
.
Puting it all together
Working with Local Deployment
object Runner extends RaphtoryApp.Local { override def run(args: Array[String], ctx: RaphtoryContext): Unit = ctx.runWithNewGraph() { graph => val path = "/tmp/lotr.csv" val url = "https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv" FileUtils.curlFile(path, url) val source = Source(FileSpout(path), new LotrGraphBuilder()) graph.load(source) graph .execute(Degree()) .writeTo(FileSink("/tmp/raphtory")) .waitForJob() } }
Working with Remote Deployment
object RemoteRunner extends RaphtoryApp.Remote() { override def run(args: Array[String], ctx: RaphtoryContext): Unit = ctx.runWithNewGraph() { graph => val path = "/tmp/lotr.csv" val url = "https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv" FileUtils.curlFile(path, url) val source = Source(FileSpout(path), new LotrGraphBuilder()) graph.load(source) graph .execute(Degree()) .writeTo(FileSink("/tmp/raphtory")) .waitForJob() } }
Run
Local Deployment
To create graph locally and run analysis
$ sbt "runMain com.pometry.Runner"
Remote Deployment (Standalone)
Start the standalone server instance
$ sbt "runMain com.raphtory.service.Standalone"
To create graph on the remote standalone server instance and run analysis
$ sbt "runMain com.pometry.RemoteRunner"