PyTorch Geometric Signed Directed Data Generators and Data Loaders

Data Classes

class SignedData(x: Optional[torch.Tensor] = None, edge_index: Optional[torch.Tensor] = None, edge_attr: Optional[torch.Tensor] = None, edge_weight: Optional[torch.Tensor] = None, y: Optional[torch.Tensor] = None, pos: Optional[torch.Tensor] = None, A: Optional[Union[Tuple[scipy.sparse.base.spmatrix, scipy.sparse.base.spmatrix], scipy.sparse.base.spmatrix]] = None, init_data: Optional[torch_geometric.data.data.Data] = None, **kwargs)[source]

A data object describing a homogeneous signed graph.

Parameters
  • x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • edge_weight (Tensor, optional) – Edge weights with shape [num_edges,]. (default: None)

  • y (Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • A (sp.spmatrix or a tuple of sp.spmatrix, optional) – SciPy sparse adjacency matrix, or a tuple of the positive and negative parts. (default: None)

  • init_data (Data, optional) – Initial data object, whose attributes will be inherited. (default: None)

  • **kwargs (optional) – Additional attributes.

property is_directed: bool

Returns True if graph edges are directed.

Get train/val/test dataset for the link sign prediction task.

Arg types:
  • data (torch_geometric.data.Data or DirectedData object) - The input dataset.

  • prob_val (float, optional) - The proportion of edges selected for validation (Default: 0.05).

  • prob_test (float, optional) - The proportion of edges selected for testing (Default: 0.15).

  • splits (int, optional) - The number of splits to generate (Default: 10).

  • size (int, optional) - The size of the input graph. If none, the graph size is the maximum index of nodes plus 1 (Default: None).

  • task (str, optional) - The evaluation task: four_class_signed_digraph (four-class sign and direction prediction); five_class_signed_digraph (five-class sign, direction and existence prediction); sign (link sign prediction). (Default: ‘sign’)

  • seed (int, optional) - The random seed for positive edge selection (Default: 0). Negative edges are selected by PyTorch Geometric negative_sampling.

  • maintain_connect (bool, optional) - Whether to maintain connectivity when removing edges for validation and testing. Connectivity is maintained by first obtaining the edges of a minimum spanning tree/forest; these edges are never removed for validation or testing. (Default: False).

  • ratio (float, optional) - The maximum ratio of edges used for dataset generation. (Default: 1.0)

  • device (int, optional) - The device to hold the return value (Default: ‘cpu’).
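The maintain_connect option can be pictured with a small dependency-free sketch. It treats the edge list as undirected and uses a union-find structure to pick one spanning forest; edges outside that forest are the only candidates for removal. The helper name spanning_forest_edges is hypothetical and not part of the library, whose implementation may differ (for example in how edge weights are taken into account).

```python
def spanning_forest_edges(num_nodes, edges):
    """Return the edges of one spanning forest (hypothetical helper)."""
    parent = list(range(num_nodes))

    def find(u):
        # Walk up to the root, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    protected = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two components, so it belongs to the forest.
            parent[root_u] = root_v
            protected.append((u, v))
    return protected


edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
protected = spanning_forest_edges(5, edges)
# Only edges outside the forest may be moved to validation/testing.
removable = [e for e in edges if e not in protected]
```

In this toy graph the triangle 0-1-2 contributes one removable edge, while the bridge (3, 4) is always kept.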

Return types:
  • datasets - A dict that includes the training/validation/testing splits of edges and labels. For split index i:

    1. datasets[i][‘graph’] (torch.LongTensor): the observed edge list after removing edges for validation and testing.

    2. datasets[i][‘train’/‘val’/‘testing’][‘edges’] (List): the edge list for training/validation/testing.

    3. datasets[i][‘train’/‘val’/‘testing’][‘label’] (List): the labels of edges:

      • If task == “four_class_signed_digraph”: 0 (the positive directed edge exists in the graph), 1 (the negative directed edge exists in the graph), 2 (the positive edge of the reversed direction exists), 3 (the negative edge of the reversed direction exists). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “five_class_signed_digraph”: 0 (the positive directed edge exists in the graph), 1 (the negative directed edge exists in the graph), 2 (the positive edge of the reversed direction exists), 3 (the negative edge of the reversed direction exists), 4 (the edge does not exist in either direction). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “sign”: 0 (negative edge), 1 (positive edge).
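The label scheme above can be written down as a short lookup. The following is only an illustrative sketch: signed_link_label is a hypothetical helper, the graph is assumed to be a dict mapping directed edges to +1 or -1, and label 3 is assumed to denote a negative edge in the reversed direction.

```python
def signed_link_label(signs, u, v, task="sign"):
    """Label the ordered pair (u, v); `signs` maps directed edges to +1 or -1."""
    forward = signs.get((u, v))
    backward = signs.get((v, u))
    if task == "sign":
        # Assumption: the sign task only labels edges that exist.
        if forward is None:
            raise ValueError("the sign task only labels existing edges")
        return 1 if forward > 0 else 0
    if forward is not None:
        return 0 if forward > 0 else 1      # edge in the queried direction
    if backward is not None:
        return 2 if backward > 0 else 3     # edge in the reversed direction
    if task == "five_class_signed_digraph":
        return 4                            # no edge in either direction
    raise ValueError("four-class labels need an edge in one direction")


signs = {(0, 1): 1, (2, 1): -1}
four = "four_class_signed_digraph"
labels = [signed_link_label(signs, u, v, four) for u, v in [(0, 1), (2, 1), (1, 0), (1, 2)]]
```

Note that the binary sign task uses 1 for positive edges, whereas the multi-class tasks use 0 for the positive forward edge.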

node_split(train_size: Optional[Union[int, float]] = None, val_size: Optional[Union[int, float]] = None, test_size: Optional[Union[int, float]] = None, seed_size: Optional[Union[int, float]] = None, train_size_per_class: Optional[Union[int, float]] = None, val_size_per_class: Optional[Union[int, float]] = None, test_size_per_class: Optional[Union[int, float]] = None, seed_size_per_class: Optional[Union[int, float]] = None, seed: List[int] = [], data_split: int = 2)[source]

Train/Val/Test/Seed split for node classification tasks. Each size parameter can be an int or a float: an int is read as an absolute number of nodes, a float as a ratio. One of train_size or train_size_per_class is mandatory; train_size is applied regardless of class labels. Validation and seed masks are optional. Seed masks mark nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are used for testing.

Parameters
  • data (torch_geometric.data.Data or DirectedData, required) – The data object for data split.

  • train_size (int or float, optional) – The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) – The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) – The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) – The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) – The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) – The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) – The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) – The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (An empty list or a list with the length of data_split, optional) – The random seed list for each data split.

  • data_split (int, optional) – number of splits (Default : 2)
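How an int or float size might be turned into a node count can be sketched as follows. The helper resolve_size is hypothetical, and truncating the product with int() is an assumption; the library may round differently.

```python
def resolve_size(size, total):
    """Turn an int count or a float ratio into a node count (hypothetical helper)."""
    if size is None:
        return None                  # caller decides, e.g. "all remaining nodes"
    if isinstance(size, float):
        return int(size * total)     # float: a ratio of `total` (truncated)
    return int(size)                 # int: an absolute number of nodes


# A float ratio is applied to each class separately.
class_counts = {0: 40, 1: 10}
train_per_class = {c: resolve_size(0.5, n) for c, n in class_counts.items()}
```

With these toy class counts, a ratio of 0.5 selects 20 nodes from class 0 and 5 from class 1, whereas an int such as 5 would select exactly 5 nodes.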

set_signed_Laplacian_features(k: int = 2)[source]

Generate the graph features using eigenvectors of the signed Laplacian matrix.

Parameters

k (int) – The dimension of the features. Default is 2.

set_spectral_adjacency_reg_features(k: int = 2, normalization: Optional[int] = None, tau_p=None, tau_n=None, eigens=None, mi=None)[source]

Generate the graph features using eigenvectors of the regularised adjacency matrix.

Parameters
  • k (int) – The dimension of the features. Default is 2.

  • normalization (string) –

    How to normalise for cluster size:

    1. None: No normalization.

    2. "sym": Symmetric normalization \(\mathbf{A} \leftarrow \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\)

    3. "rw": Random-walk normalization \(\mathbf{A} \leftarrow \mathbf{D}^{-1} \mathbf{A}\)

    4. "sym_sep": Symmetric normalization for the positive and negative parts separately.

    5. "rw_sep": Random-walk normalization for the positive and negative parts separately.

  • tau_p (int) – Regularisation coefficient for positive adjacency matrix.

  • tau_n (int) – Regularisation coefficient for negative adjacency matrix.

  • eigens (int) – The number of eigenvectors to take. Defaults to k.

  • mi (int) – The maximum number of iterations for which to run eigenvalue solvers. Defaults to number of nodes.
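The "sym" and "rw" normalizations can be illustrated on a tiny dense matrix. This is a sketch only: normalize_adjacency is a hypothetical helper, the matrix is assumed to be non-negative with no isolated nodes, and the degree matrix D is taken to hold the row sums. The regularisation terms tau_p and tau_n and the separate treatment of positive and negative parts are not modelled here.

```python
import math


def normalize_adjacency(A, normalization=None):
    """Normalise a dense adjacency matrix given as nested lists (hypothetical helper)."""
    n = len(A)
    degrees = [sum(row) for row in A]   # diagonal of D; assumed strictly positive
    if normalization is None:
        return [row[:] for row in A]
    if normalization == "sym":
        # A <- D^{-1/2} A D^{-1/2}
        return [[A[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
                for i in range(n)]
    if normalization == "rw":
        # A <- D^{-1} A
        return [[A[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    raise ValueError(f"unsupported normalization: {normalization!r}")


A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
A_sym = normalize_adjacency(A, "sym")
A_rw = normalize_adjacency(A, "rw")
```

The random-walk form makes every row sum to one, while the symmetric form keeps the matrix symmetric, which is usually what an eigenvector-based embedding wants.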

class DirectedData(x: Optional[torch.Tensor] = None, edge_index: Optional[torch.Tensor] = None, edge_attr: Optional[torch.Tensor] = None, edge_weight: Optional[torch.Tensor] = None, y: Optional[torch.Tensor] = None, pos: Optional[torch.Tensor] = None, A: Optional[scipy.sparse.base.spmatrix] = None, init_data: Optional[torch_geometric.data.data.Data] = None, **kwargs)[source]

A data object describing a homogeneous directed graph.

Parameters
  • x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • edge_weight (Tensor, optional) – Edge weights with shape [num_edges,]. (default: None)

  • y (Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • A (sp.spmatrix, optional) – SciPy sparse adjacency matrix. (default: None)

  • init_data (Data, optional) – Initial data object, whose attributes will be inherited. (default: None)

  • **kwargs (optional) – Additional attributes.

property is_directed: bool

Returns True if graph edges are directed.

Get train/val/test dataset for the link prediction task.

Arg types:
  • prob_val (float, optional) - The proportion of edges selected for validation (Default: 0.05).

  • prob_test (float, optional) - The proportion of edges selected for testing (Default: 0.15).

  • splits (int, optional) - The number of splits to generate (Default: 2).

  • size (int, optional) - The size of the input graph. If none, the graph size is the maximum index of nodes plus 1 (Default: None).

  • task (str, optional) - The evaluation task: three_class_digraph (three-class link prediction); direction (direction prediction); existence (existence prediction). (Default: ‘direction’)

  • seed (int, optional) - The random seed for dataset generation (Default: 0).

  • ratio (float, optional) - The maximum ratio of edges used for dataset generation. (Default: 1.0)

  • maintain_connect (bool, optional) - Whether to maintain connectivity when removing edges for validation and testing. Connectivity is maintained by first obtaining the edges of a minimum spanning tree/forest; these edges are never removed for validation or testing (Default: True).

  • device (int, optional) - The device to hold the return value (Default: ‘cpu’).

Return types:
  • datasets - A dict that includes the training/validation/testing splits of edges and labels. For split index i:

    • datasets[i][‘graph’] (torch.LongTensor): the observed edge list after removing edges for validation and testing.

    • datasets[i][‘train’/‘val’/‘testing’][‘edges’] (List): the edge list for training/validation/testing.

    • datasets[i][‘train’/‘val’/‘testing’][‘label’] (List): the labels of edges:

      • If task == “existence”: 0 (the directed edge exists in the graph), 1 (the edge doesn’t exist). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “direction”: 0 (the directed edge exists in the graph), 1 (the edge of the reversed direction exists). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “three_class_digraph”: 0 (the directed edge exists in the graph), 1 (the edge of the reversed direction exists), 2 (the edge does not exist in either direction). The undirected edges in the directed input graph are removed to avoid ambiguity.
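The three directed label schemes differ only in how a reversed or missing edge is treated. The sketch below makes that explicit; directed_link_label is a hypothetical helper, and it assumes bidirectional pairs have already been removed, as the descriptions above state.

```python
def directed_link_label(edges, u, v, task="direction"):
    """Label the ordered pair (u, v) for a directed link task (hypothetical helper)."""
    edge_set = set(edges)
    forward = (u, v) in edge_set
    backward = (v, u) in edge_set
    if forward and backward:
        raise ValueError("bidirectional pairs are assumed to be removed beforehand")
    if task == "existence":
        return 0 if forward else 1       # only the queried direction counts
    if forward:
        return 0
    if backward:
        return 1
    if task == "three_class_digraph":
        return 2                         # no edge in either direction
    raise ValueError("the direction task needs an edge in one direction")


edges = [(0, 1), (2, 3)]
direction = [directed_link_label(edges, u, v, "direction") for u, v in [(0, 1), (1, 0)]]
existence = [directed_link_label(edges, u, v, "existence") for u, v in [(0, 1), (1, 0), (0, 2)]]
three = [directed_link_label(edges, u, v, "three_class_digraph") for u, v in [(0, 1), (3, 2), (0, 2)]]
```

Under the existence task the reversed pair (1, 0) counts as a non-edge, while under the direction task the same pair is the "reversed" class.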

node_split(train_size: Optional[Union[int, float]] = None, val_size: Optional[Union[int, float]] = None, test_size: Optional[Union[int, float]] = None, seed_size: Optional[Union[int, float]] = None, train_size_per_class: Optional[Union[int, float]] = None, val_size_per_class: Optional[Union[int, float]] = None, test_size_per_class: Optional[Union[int, float]] = None, seed_size_per_class: Optional[Union[int, float]] = None, seed: List[int] = [], data_split: int = 2)[source]

Train/Val/Test/Seed split for node classification tasks. Each size parameter can be an int or a float: an int is read as an absolute number of nodes, a float as a ratio. One of train_size or train_size_per_class is mandatory; train_size is applied regardless of class labels. Validation and seed masks are optional. Seed masks mark nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are used for testing.

Parameters
  • data (torch_geometric.data.Data or DirectedData, required) – The data object for data split.

  • train_size (int or float, optional) – The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) – The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) – The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) – The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) – The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) – The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) – The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) – The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (An empty list or a list with the length of data_split, optional) – The random seed list for each data split.

  • data_split (int, optional) – number of splits (Default : 2)
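The role of the seed list and of data_split can be pictured with a label-agnostic sketch: one random permutation per seed, cut into training, validation and testing index sets, with every leftover node going to the test set. The helper random_node_splits is hypothetical and uses Python's random module rather than whatever generator the library relies on.

```python
import random


def random_node_splits(num_nodes, train_size, val_size, seeds):
    """Build one (train, val, test) index triple per seed (hypothetical helper)."""
    splits = []
    for seed in seeds:
        rng = random.Random(seed)        # one seed per split, for reproducibility
        order = list(range(num_nodes))
        rng.shuffle(order)
        train = sorted(order[:train_size])
        val = sorted(order[train_size:train_size + val_size])
        test = sorted(order[train_size + val_size:])   # all remaining nodes
        splits.append((train, val, test))
    return splits


# Two splits, mirroring data_split=2 with a seed list of length 2.
splits = random_node_splits(10, train_size=6, val_size=2, seeds=[0, 1])
```

Re-running with the same seed list reproduces the same splits, which is the point of passing one seed per split.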

set_hermitian_features(k: int = 2)[source]

Create Hermitian features (random-walk normalized).

Parameters

k (int) – Half of the dimension of features. Default is 2.

Data Generators

SSBM(n: int, k: int, pin: float, etain: float, pout: Optional[float] = None, size_ratio: float = 2, etaout: Optional[float] = None, values: str = 'ones') → Tuple[Tuple[scipy.sparse.base.spmatrix, scipy.sparse.base.spmatrix], numpy.array][source]

A signed stochastic block model graph generator from the SSSNET: Semi-Supervised Signed Network Clustering paper.

Arg types:
  • n (int) - Number of nodes.

  • k (int) - Number of communities.

  • pin (float) - Sparsity value within communities.

  • etain (float) - Noise value within communities.

  • pout (float) - Sparsity value between communities.

  • etaout (float) - Noise value between communities.

  • size_ratio (float) - The communities have number of nodes multiples of each other, with the largest size_ratio times the number of nodes of the smallest.

  • values (string) - Edge weight distribution (within community and without sign flip; otherwise weight is negated):

    1. "ones": Weights are 1.

    2. "exp": Weights are exponentially distributed, with parameter 1.

    3. "uniform": Weights are uniformly distributed between 0 and 1.

Return types:
  • A_p (sp.spmatrix) - A sparse adjacency matrix for the positive part.

  • A_n (sp.spmatrix) - A sparse adjacency matrix for the negative part.

  • labels (np.array) - Labels.
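To make the roles of pin, pout, etain and etaout concrete, here is a simplified, dependency-free sampler. It is a sketch under stated assumptions rather than the library's generator: sample_signed_sbm is a hypothetical name, communities are assigned round-robin (so size_ratio is ignored), all weights are 1, and edge lists are returned instead of sparse matrices.

```python
import random


def sample_signed_sbm(n, k, pin, pout, etain, etaout, seed=0):
    """Sample positive/negative edge lists of a toy signed SBM (hypothetical helper)."""
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]   # assumption: near-equal community sizes
    positive, negative = [], []
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            if rng.random() >= (pin if same else pout):
                continue                 # no edge between i and j
            sign = 1 if same else -1     # noiseless sign: + inside, - across
            if rng.random() < (etain if same else etaout):
                sign = -sign             # noise flips the sign
            (positive if sign > 0 else negative).append((i, j))
    return positive, negative, labels


# Noiseless, fully dense case: signs follow community membership exactly.
positive, negative, labels = sample_signed_sbm(6, 2, pin=1.0, pout=1.0, etain=0.0, etaout=0.0)
```

Raising etain or etaout towards 0.5 makes signs progressively less informative about community membership.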

polarized_SSBM(total_n: int = 100, num_com: int = 3, N: int = 30, K: int = 2, p: float = 0.1, eta: float = 0.1, size_ratio: float = 1) → Tuple[Tuple[scipy.sparse.base.spmatrix, scipy.sparse.base.spmatrix], numpy.array, numpy.array][source]

A polarized signed stochastic block model graph generator from the SSSNET: Semi-Supervised Signed Network Clustering paper.

Arg types:
  • total_n (int) - Total number of nodes in the polarized network.

  • num_com (int) - Number of conflicting communities.

  • N (int) - Default size of an SSBM community.

  • K (int) - Number of blocks (clusters) within a conflicting community.

  • p (float) - Probability of existence of an edge.

  • eta (float) - Sign flip probability, 0 <= eta <= 0.5.

  • size_ratio (float) - The communities have number of nodes multiples of each other, with the largest size_ratio times the number of nodes of the smallest.

Return types:
  • A_p_new, A_n_new (sp.spmatrix) - Positive and negative parts of the polarized network.

  • labels_new (np.array) - Ordered labels of the nodes, with conflicting communities labeled together, cluster 0 is the ambient cluster.

  • conflict_groups (np.array) - An array indicating which conflicting group the node is in, 0 is ambient.

DSBM(N: int, K: int, p: float, F: numpy.array, size_ratio: float = 1) → Tuple[scipy.sparse.base.spmatrix, numpy.array][source]

A directed stochastic block model graph generator from the DIGRAC: Digraph Clustering Based on Flow Imbalance paper.

Arg types:
  • N (int) - Number of nodes.

  • K (int) - Number of clusters.

  • p (float) - Sparsity value, edge probability.

  • F (np.array) - The meta-graph adjacency matrix to generate edges.

  • size_ratio (float) - The communities have number of nodes multiples of each other, with the largest size_ratio times the number of nodes of the smallest. A geometric sequence is generated to denote the node size of each cluster based on the size_ratio.

Return types:
  • a (sp.csr_matrix) - a is a sparse N by N matrix of the edges.

  • c (np.array) - c is an array of cluster membership.
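The size_ratio description (a geometric sequence of cluster sizes, with the largest cluster size_ratio times the smallest) can be sketched as below. The helper geometric_cluster_sizes is hypothetical, and the rounding rule, which hands any remainder to the largest cluster, is an assumption.

```python
def geometric_cluster_sizes(total, k, size_ratio):
    """Split `total` nodes into k clusters with geometrically growing sizes."""
    if k == 1:
        return [total]
    # Relative sizes 1, r, r^2, ... with r chosen so the last is size_ratio times the first.
    r = size_ratio ** (1.0 / (k - 1))
    weights = [r ** i for i in range(k)]
    scale = total / sum(weights)
    sizes = [int(w * scale) for w in weights]
    sizes[-1] += total - sum(sizes)      # rounding remainder goes to the largest cluster
    return sizes


sizes = geometric_cluster_sizes(70, 3, 4)
```

With size_ratio equal to 1 the clusters come out as equal as integer rounding allows.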

SDSBM(N: int, K: int, p: float, F: numpy.array, size_ratio: float = 1, eta: float = 0.1) → Tuple[scipy.sparse.base.spmatrix, numpy.array][source]

A signed directed stochastic block model graph generator from the MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian paper.

Arg types:
  • N (int) - Number of nodes.

  • K (int) - Number of clusters.

  • p (float) - Sparsity value, edge probability.

  • F (np.array) - The meta-graph adjacency matrix to generate edges.

  • size_ratio (float) - The communities have number of nodes multiples of each other, with the largest size_ratio times the number of nodes of the smallest. A geometric sequence is generated to denote the node size of each cluster based on the size_ratio.

  • eta (float) - Sign flip probability.

Return types:
  • a (sp.csr_matrix) - a is a sparse N by N matrix of the edges.

  • c (np.array) - c is an array of cluster membership.

Data Loaders

load_directed_real_data(dataset: str = 'WebKB', root: str = './', name: str = 'Texas', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, train_size: Optional[Union[int, float]] = None, val_size: Optional[Union[int, float]] = None, test_size: Optional[Union[int, float]] = None, seed_size: Optional[Union[int, float]] = None, train_size_per_class: Optional[Union[int, float]] = None, val_size_per_class: Optional[Union[int, float]] = None, test_size_per_class: Optional[Union[int, float]] = None, seed_size_per_class: Optional[Union[int, float]] = None, seed: List[int] = [], data_split: int = 10) → torch_geometric_signed_directed.data.directed.DirectedData.DirectedData[source]

Downloads a real-world directed data set and converts it to a DirectedData object.

Arg types:
  • dataset (str, optional) - Data set name (default: ‘WebKB’).

  • root (str, optional) - The path to save the dataset (default: ‘./’).

  • name (str, optional) - The name of the subdataset (default: ‘Texas’).

  • transform (callable, optional) - A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) - A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • train_size (int or float, optional) - The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) - The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) - The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) - The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) - The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) - The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) - The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) - The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (An empty list or a list with the length of data_split, optional) - The random seed list for each data split.

  • data_split (int, optional) - number of splits (Default : 10)

Return types:
  • data (Data) - The required data object.

load_signed_real_data(dataset: str = 'epinions', root: str = './tmp_data/', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, train_size: Optional[Union[int, float]] = None, val_size: Optional[Union[int, float]] = None, test_size: Optional[Union[int, float]] = None, seed_size: Optional[Union[int, float]] = None, train_size_per_class: Optional[Union[int, float]] = None, val_size_per_class: Optional[Union[int, float]] = None, test_size_per_class: Optional[Union[int, float]] = None, seed_size_per_class: Optional[Union[int, float]] = None, seed: List[int] = [], data_split: int = 10, sparsify_level: float = 1) → torch_geometric_signed_directed.data.signed.SignedData.SignedData[source]

Downloads a real-world signed data set and converts it to a SignedData object.

Arg types:
  • dataset (str, optional) - data set name (default: ‘epinions’).

  • root (str, optional) - The path to save the dataset (default: ‘./tmp_data/’).

  • transform (callable, optional) - A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) - A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • train_size (int or float, optional) - The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) - The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) - The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) - The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) - The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) - The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) - The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) - The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (An empty list or a list with the length of data_split, optional) - The random seed list for each data split.

  • data_split (int, optional) - number of splits (Default : 10)

  • sparsify_level (float, optional) - the density of the graph, a value between 0 and 1, for MSGNN data only. Default: 1.

Return types:
  • data (Data) - The required data object.

class DIGRAC_real_data(name: str, root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Data loader for the data sets used in the DIGRAC: Digraph Clustering Based on Flow Imbalance paper.

Parameters
  • name (str) – Name of the data set, choices are: ‘blog’, ‘wikitalk’, ‘migration’, ‘lead_lag’+str(year) (year from 2001 to 2019).

  • root (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class SSSNET_real_data(name: str, root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Data loader for the data sets used in the SSSNET: Semi-Supervised Signed Network Clustering paper.

Parameters
  • name (str) – Name of the data set, choices are: ‘rainfall’, ‘PPI’, ‘wikirfa’, ‘sampson’, ‘SP1500’, ‘Fin_YNet’+str(year) (year from 2000 to 2020).

  • root (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class SDGNN_real_data(name: str, root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Signed directed graphs from the “SDGNN: Learning Node Representation for Signed Directed Networks” paper, consisting of five different datasets: Bitcoin-Alpha, Bitcoin-OTC, Wikirfa, Slashdot and Epinions from snap.stanford.edu.

Parameters
  • name (str) – Name of the dataset, choices are: ‘bitcoin_alpha’, ‘bitcoin_otc’, ‘wiki’, ‘epinions’, ‘slashdot’.

  • root (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names: str

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: str

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class MSGNN_real_data(name: str, root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, sparsify_level: float = 1)[source]

Data loader for the data sets used in the MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian paper.

Parameters
  • name (str) – Name of the data set, choices are: ‘FiLL-pvCLCL’+str(year), ‘FiLL-OPCL’+str(year) (year from 2000 to 2020).

  • root (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • sparsify_level (float, optional) – the density of the graph, a value between 0 and 1. Default: 1.

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class Telegram(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Data loader for the Telegram data set used in the MagNet: A Neural Network for Directed Graphs paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class WikiCS(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

This is a copy of torch_geometric.datasets.WikiCS (v1.6.3).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class Citeseer(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Data loader for the CiteSeer data set used in the MagNet: A Neural Network for Directed Graphs paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class Cora_ml(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Data loader for the Cora_ML data set used in the MagNet: A Neural Network for Directed Graphs paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

class WikipediaNetwork(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

This code is modified from torch_geometric.datasets.WikipediaNetwork (v1.6.3).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cornell", "Chameleon", "Squirrel").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.