raphtory.graph_loader

Load and save Raphtory graphs from/to file(s)

Members

raphtory.graph_loader.lotr_graph

Load the Lord of the Rings dataset into a graph.

raphtory.graph_loader.neo4j_movie_graph

raphtory.graph_loader.reddit_hyperlink_graph

Load (a subset of) the Reddit hyperlinks dataset into a graph.

raphtory.graph_loader.stable_coin_graph

raphtory.graph_loader.lotr_graph(shards=Ellipsis)

Load the Lord of the Rings dataset into a graph. The dataset, available at https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv, is a list of interactions between characters in the Lord of the Rings books and movies. It is a CSV file with the following columns:

  • src_id: The ID of the source character

  • dst_id: The ID of the destination character

  • time: The time of the interaction, given as a page number

Dataset statistics:
  • Number of nodes (characters): 139

  • Number of edges (interactions between characters): 701
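The CSV layout and the statistics above can be illustrated with a small standard-library sketch; it is not Raphtory's loader. It assumes each row holds `src_id`, `dst_id`, `time` in that order, tolerates an optional header row, and counts each distinct directed (`src_id`, `dst_id`) pair as one edge. Whether the published edge count is computed that way is an assumption, and `read_interactions`, `graph_stats` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple

# One interaction: (source character, destination character, page number)
Interaction = Tuple[str, str, int]


def read_interactions(lines: Iterable[str]) -> List[Interaction]:
    """Parse rows laid out as src_id,dst_id,time (assumed column order)."""
    rows = []
    for record in csv.reader(lines):
        if not record or record[0] == "src_id":
            continue  # skip blank lines and an optional header row
        src, dst, time = record
        rows.append((src, dst, int(time)))
    return rows


def graph_stats(rows: List[Interaction]) -> Tuple[int, int]:
    """Return (node count, edge count), treating each distinct directed
    (src, dst) pair as one edge no matter how often it recurs."""
    nodes = {name for src, dst, _ in rows for name in (src, dst)}
    edges = {(src, dst) for src, dst, _ in rows}
    return len(nodes), len(edges)


# Hypothetical sample rows, not taken from the real dataset.
sample = io.StringIO("Gandalf,Elrond,33\nFrodo,Sam,40\nGandalf,Elrond,90\n")
interactions = read_interactions(sample)
print(graph_stats(interactions))  # (4, 2)
```

Keeping the repeated Gandalf/Elrond row shows why interaction rows and edges differ: the same pair at two different pages is two interactions but one directed edge under this counting rule.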

Parameters

shards – The number of shards to use for the graph

Returns

A Graph containing the LOTR dataset

raphtory.graph_loader.neo4j_movie_graph(uri, username, password, database=Ellipsis, shards=Ellipsis)

raphtory.graph_loader.reddit_hyperlink_graph

Load (a subset of) the Reddit hyperlinks dataset into a graph. The dataset is available at http://snap.stanford.edu/data/soc-redditHyperlinks-title.tsv. The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). Subreddit embeddings are also provided at the source. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017. Note: it may take a while to download the dataset.

Dataset statistics:
  • Number of nodes (subreddits): 35,776

  • Number of edges (hyperlinks between subreddits): 137,821

  • Timespan: Jan 2014 to April 2017

Source:
    S. Kumar, W.L. Hamilton, J. Leskovec, D. Jurafsky. Community Interaction and Conflict on the Web. World Wide Web Conference, 2018.

Properties:

  • SOURCE_SUBREDDIT: the subreddit where the link originates

  • TARGET_SUBREDDIT: the subreddit where the link ends

  • POST_ID: the post in the source subreddit that starts the link

  • TIMESTAMP: the time of the post

  • POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created using crowd-sourcing and training a text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.

  • POST_PROPERTIES: a vector representing the text properties of the source post, listed as comma-separated numbers. Details can be found on the source website.
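The properties above map naturally onto one typed record per hyperlink. The following minimal sketch parses a single row with the standard library; it is not how Raphtory ingests the file. It assumes the six fields are tab-separated in the order listed above and that TIMESTAMP uses the `YYYY-MM-DD HH:MM:SS` format; both are assumptions about the file, and `HyperlinkRow`, `parse_row` and the sample row are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class HyperlinkRow:
    source_subreddit: str
    target_subreddit: str
    post_id: str
    timestamp: datetime
    post_label: int  # -1: negative towards the target; 1: neutral or positive
    post_properties: List[float]

    @property
    def is_negative(self) -> bool:
        return self.post_label == -1


def parse_row(line: str) -> HyperlinkRow:
    """Split one tab-separated row whose six fields follow the order above."""
    src, dst, post_id, ts, label, props = line.rstrip("\n").split("\t")
    return HyperlinkRow(
        source_subreddit=src,
        target_subreddit=dst,
        post_id=post_id,
        # Assumed timestamp format; adjust if the file uses another one.
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        post_label=int(label),
        # POST_PROPERTIES is a comma-separated list of numbers.
        post_properties=[float(x) for x in props.split(",")],
    )


# Hypothetical row with a shortened properties vector.
row = parse_row("askreddit\tpics\tabc123\t2014-03-01 12:00:00\t-1\t0.5,12.0,3.0\n")
print(row.is_negative, len(row.post_properties))  # True 3
```

Each parsed row corresponds to one timestamped, directed edge from `source_subreddit` to `target_subreddit`, with the label and property vector as candidate edge properties.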

Parameters
  • shards – The number of shards to use for the graph

  • timeout_seconds – The number of seconds to wait for the dataset to download
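The `timeout_seconds` parameter bounds how long the loader waits for the download. Raphtory's exact timeout semantics are not described here, so the following is only a standard-library sketch of one plausible approach: fetch the file once, cache it locally, and pass the timeout to `urllib.request.urlopen`. `fetch_dataset` and the file names are hypothetical.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_dataset(url: str, dest: Path, timeout_seconds: float) -> Path:
    """Download `url` to `dest` unless a cached copy already exists.

    urlopen applies `timeout_seconds` to blocking socket operations such as
    the connection attempt; it is not a cap on the total download time.
    """
    if dest.exists():
        return dest  # reuse a previous download
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    tmp.replace(dest)  # expose the file only once it is complete
    return dest


# Usage with a local file:// URL so it runs offline (hypothetical file names).
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "sample.tsv"
    source.write_text("a\tb\n")
    cached = fetch_dataset(source.as_uri(), Path(workdir) / "cache" / "sample.tsv", timeout_seconds=5)
    print(cached.read_text() == "a\tb\n")  # True
```

Writing to a `.part` file first means an interrupted or timed-out download never leaves a truncated file that a later call would mistake for a valid cache.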

Returns

A Graph containing the Reddit hyperlinks dataset

raphtory.graph_loader.stable_coin_graph(path=Ellipsis, shards=Ellipsis)