Deploying Raphtory on distributed Ubuntu servers

Information

  • Raphtory will run on port 1736 which is the port the client will need to connect on. You will need the security group/firewall pointing to the cluster manager to allow tcp connections to this server on port 1736

  • Zookeeper will run on port 2181 and the servers need to talk to each other on a variety of ports. We allow all traffic between this group of machines to facilitate these communications.

Install Raphtory and dependencies

Get Raphtory from source code

mkdir ~/pometry && cd ~/pometry && git clone https://github.com/Raphtory/Raphtory.git && cd Raphtory
git checkout <branch>

Install sbt

cd /usr/local/bin && sudo apt update && sudo apt install python3-pip unzip make -y && sudo curl -L "https://github.com/sbt/sbt/releases/download/v1.6.2/sbt-1.6.2.zip" -o sbt-1.6.2.zip && sudo unzip sbt-1.6.2.zip && sudo mv sbt sbt-bin && sudo mv sbt-bin/bin/sbt .

Install pip

sudo pip install pip -U

Install java

wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo apt-key add -
echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt update
sudo apt install temurin-11-jdk -y

Build Raphtory

cd ~/pometry/Raphtory && pip install .
# Restart shell here to get the newly installed command line tools

Run zookeeper steps (Cluster Manager machines)

Install zookeeper

cd ~/pometry
curl https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz --output apache-zookeeper-3.8.0-bin.tar.gz
tar xvf apache-zookeeper-3.8.0-bin.tar.gz
cat <<EOF > apache-zookeeper-3.8.0-bin/conf/zoo.cfg
 tickTime=2000
 dataDir=~/pometry/zookeeper-data
 clientPort=2181
EOF
~/pometry/apache-zookeeper-3.8.0-bin/bin/zkServer.sh start

Set Raphtory variables (All machines)

export RAPHTORY_CLUSTER_MANAGER_IP_ADDR=<CLUSTER_MANAGER_IP_ADDR>
export RAPHTORY_ZOOKEEPER_ADDRESS=${RAPHTORY_CLUSTER_MANAGER_IP_ADDR}:2181
export RAPHTORY_PARTITIONS_SERVERCOUNT=1
export RAPHTORY_PARTITIONS_COUNTPERSERVER=1
export RAPHTORY_PARTITIONS_CHUNKSIZE=128
export RAPHTORY_CORE_LOG=DEBUG

Run Cluster Manager Service (Cluster Manager machines)

raphtory-clustermanager &

Run Query Service (Query machines)

raphtory-query &

Run Ingeestion Service (Injestion machines)

raphtory-ingestion &

Run Partition Service (Partition machine)

raphtory-partition &

Client machine

From your client, invoke python (or Jupiter) and when in python terminal, you can create a context to a remote source

from pyraphtory.context import PyRaphtory
ctx = PyRaphtory.remote("<CLUSTER_MANAGER_IP_ADDR>",1736)
graph = ctx.new_graph()