Raphtory - Home

reddit_hyperlink_graph

reddit_hyperlink_graph(timeout_seconds=600)

Load (a subset of) the Reddit hyperlinks dataset into a graph. The dataset is available at http://snap.stanford.edu/data/soc-redditHyperlinks-title.tsv. The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017.

NOTE: It may take a while to download the dataset.

Dataset statistics:
  • Number of nodes (subreddits): 35,776

  • Number of edges (hyperlinks between subreddits): 137,821

  • Timespan: Jan 2014 - April 2017

Source:
  • S. Kumar, W.L. Hamilton, J. Leskovec, D. Jurafsky. Community Interaction and Conflict on the Web. World Wide Web Conference, 2018.

Properties:

  • SOURCE_SUBREDDIT: the subreddit where the link originates

  • TARGET_SUBREDDIT: the subreddit where the link ends

  • POST_ID: the post in the source subreddit that starts the link

  • TIMESTAMP: the time of the post

  • POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created using crowd-sourcing and training a text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.

  • POST_PROPERTIES: a vector representing the text properties of the source post, given as a list of comma-separated numbers. A description of the vector can be found on the source website.
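
To make the property list above concrete, here is a minimal sketch that parses one tab-separated record into those six fields using only the Python standard library. It is not part of Raphtory: the `HyperlinkRecord` class and `parse_record` helper are hypothetical names, the column order and the timestamp layout (`YYYY-MM-DD HH:MM:SS`) are assumptions about the raw file, and the sample row is made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class HyperlinkRecord:
    """Hypothetical container mirroring the properties listed above."""
    source_subreddit: str
    target_subreddit: str
    post_id: str
    timestamp: datetime
    post_label: int  # -1 = negative, 1 = neutral or positive
    post_properties: List[float]


def parse_record(line: str) -> HyperlinkRecord:
    """Parse one tab-separated record (assumed column order) into a HyperlinkRecord."""
    source, target, post_id, timestamp, label, properties = line.rstrip("\n").split("\t")
    return HyperlinkRecord(
        source_subreddit=source,
        target_subreddit=target,
        post_id=post_id,
        # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS"
        timestamp=datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
        post_label=int(label),
        # POST_PROPERTIES is a comma-separated list of numbers
        post_properties=[float(x) for x in properties.split(",")],
    )


# Made-up sample row for illustration only (not taken from the dataset).
sample = "examplesub_a\texamplesub_b\tabc123\t2014-01-01 12:00:00\t-1\t10.0,2.5,0.75"
record = parse_record(sample)
print(record.source_subreddit, "->", record.target_subreddit, record.post_label)
```

When loading through `reddit_hyperlink_graph` this parsing is handled for you; the sketch only illustrates how the fields relate to a raw record.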

Parameters:
  • timeout_seconds – The number of seconds to wait for the dataset to download (default: 600)

Returns:

A Graph containing the Reddit hyperlinks dataset

Return type:

Graph
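
A minimal usage sketch follows. The helper name `load_reddit_summary` is hypothetical, and the `count_nodes()` and `count_edges()` calls on the returned Graph are assumed to be available in your installed Raphtory version. The import is placed inside the function so that defining the helper does not itself require Raphtory or trigger a download.

```python
def load_reddit_summary(timeout_seconds: int = 600) -> dict:
    """Download the Reddit hyperlinks graph and return basic size statistics."""
    # Imported lazily: requires the raphtory package and network access when called.
    from raphtory import graph_loader

    # Signature taken from this page; the download may take a while.
    g = graph_loader.reddit_hyperlink_graph(timeout_seconds=timeout_seconds)

    # Assumed Graph methods; check the Graph reference for your version.
    return {"nodes": g.count_nodes(), "edges": g.count_edges()}
```

Calling `load_reddit_summary()` should return a dictionary whose counts can be compared against the dataset statistics above; raise `timeout_seconds` if the download is slow on your connection.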


© Copyright 2023, Pometry.
