torch_geometric_signed_directed.data.signed.SignedData

Classes

SignedData

A data object describing a homogeneous signed graph.

Functions

sqrtinvdiag(→ scipy.sparse.csc_matrix)

Inverts and square-roots a positive diagonal matrix.

Module Contents

sqrtinvdiag(M: scipy.sparse.spmatrix) → scipy.sparse.csc_matrix

Inverts and square-roots a positive diagonal matrix.

Parameters:

M (scipy sparse matrix) – Positive diagonal matrix to invert.

Returns:

SciPy sparse matrix holding the inverse square root of the diagonal.
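The operation described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name sqrt_inv_diag_sketch is hypothetical, and it assumes every diagonal entry is strictly positive.

```python
import numpy as np
import scipy.sparse as sp


def sqrt_inv_diag_sketch(M: sp.spmatrix) -> sp.csc_matrix:
    """Return D^{-1/2} for a positive diagonal sparse matrix D (hypothetical helper)."""
    d = np.asarray(M.diagonal(), dtype=float)  # diagonal entries, assumed > 0
    return sp.diags(1.0 / np.sqrt(d)).tocsc()  # rebuild as a sparse diagonal matrix


D = sp.diags([4.0, 9.0, 16.0]).tocsc()
D_inv_sqrt = sqrt_inv_diag_sketch(D)
print(D_inv_sqrt.diagonal())
```

Multiplying an adjacency matrix on both sides by such a matrix is the usual way a symmetric degree normalization is applied.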

class SignedData(x: torch_geometric.typing.OptTensor = None, edge_index: torch_geometric.typing.OptTensor = None, edge_attr: torch_geometric.typing.OptTensor = None, edge_weight: torch_geometric.typing.OptTensor = None, y: torch_geometric.typing.OptTensor = None, pos: torch_geometric.typing.OptTensor = None, A: torch_geometric.typing.Union[torch_geometric.typing.Tuple[scipy.sparse.spmatrix, scipy.sparse.spmatrix], scipy.sparse.spmatrix, None] = None, init_data: torch_geometric.data.Data | None = None, **kwargs)

Bases: torch_geometric.data.Data

A data object describing a homogeneous signed graph.

Parameters:
  • x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • edge_weight (Tensor, optional) – Edge weights with shape [num_edges,]. (default: None)

  • y (Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • A (sp.spmatrix or a tuple of sp.spmatrix, optional) – SciPy sparse adjacency matrix, or a tuple of the positive and negative parts. (default: None)

  • init_data (Data, optional) – Initial data object, whose attributes will be inherited. (default: None)

  • **kwargs (optional) – Additional attributes.

A
edge_weight
edge_index
num_nodes
separate_positive_negative()
clear_separate_attributes()
property is_signed: bool
property is_directed: bool
property is_weighted: bool
to_unweighted()
set_signed_Laplacian_features(k: int = 2)

Generate the graph features using eigenvectors of the signed Laplacian matrix.

Parameters:

k (int) – The dimension of the features. Default is 2.
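To make the idea concrete, here is a minimal dense sketch of spectral features from a signed Laplacian. It assumes the unnormalised definition L = D̄ − A, where D̄ holds absolute degrees, and takes the eigenvectors of the k smallest eigenvalues; the library may use a normalised variant or a sparse solver, and the function name below is hypothetical.

```python
import numpy as np


def signed_laplacian_features(A: np.ndarray, k: int = 2) -> np.ndarray:
    """Eigenvectors of L = D_bar - A for the k smallest eigenvalues (sketch)."""
    A = (A + A.T) / 2.0                     # symmetrise so that eigh applies
    D_bar = np.diag(np.abs(A).sum(axis=1))  # degrees from absolute edge weights
    L = D_bar - A                           # unnormalised signed Laplacian
    _, eigenvectors = np.linalg.eigh(L)     # eigenvalues come back in ascending order
    return eigenvectors[:, :k]              # one k-dimensional feature row per node


# Two clusters {0, 1} and {2, 3}: positive edges inside, negative edges across.
A = np.array([
    [0.0, 1.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, -1.0],
    [-1.0, 0.0, 0.0, 1.0],
    [0.0, -1.0, 1.0, 0.0],
])
X = signed_laplacian_features(A, k=2)
print(X.shape)
```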

set_spectral_adjacency_reg_features(k: int = 2, normalization: int | None = None, tau_p=None, tau_n=None, eigens=None, mi=None)

Generate the graph features using eigenvectors of the regularised adjacency matrix.

Parameters:
  • k (int) – The dimension of the features. Default is 2.

  • normalization (string) – How to normalise for cluster size:

    1. None: No normalization.

    2. "sym": Symmetric normalization \(\mathbf{A} \leftarrow \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\)

    3. "rw": Random-walk normalization \(\mathbf{A} \leftarrow \mathbf{D}^{-1} \mathbf{A}\)

    4. "sym_sep": Symmetric normalization for the positive and negative parts separately.

    5. "rw_sep": Random-walk normalization for the positive and negative parts separately.

  • tau_p (int) – Regularisation coefficient for positive adjacency matrix.

  • tau_n (int) – Regularisation coefficient for negative adjacency matrix.

  • eigens (int) – The number of eigenvectors to take. Defaults to k.

  • mi (int) – The maximum number of iterations for which to run eigenvalue solvers. Defaults to number of nodes.
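The "sym" and "rw" normalizations listed above can be sketched on a dense matrix as follows. This assumes D is the diagonal matrix of absolute degrees and that no node is isolated; the regularisation terms tau_p and tau_n and the separate "_sep" variants are not modelled, and the function name normalise is hypothetical.

```python
import numpy as np


def normalise(A: np.ndarray, mode: str) -> np.ndarray:
    """Apply 'sym' or 'rw' normalisation using absolute degrees (sketch)."""
    deg = np.abs(A).sum(axis=1)             # assumes every node has at least one edge
    if mode == "sym":
        d = 1.0 / np.sqrt(deg)
        return d[:, None] * A * d[None, :]  # D^{-1/2} A D^{-1/2}
    if mode == "rw":
        return A / deg[:, None]             # D^{-1} A
    raise ValueError(f"unknown mode: {mode}")


A = np.array([
    [0.0, 2.0, -1.0],
    [2.0, 0.0, 1.0],
    [-1.0, 1.0, 0.0],
])
A_sym = normalise(A, "sym")
A_rw = normalise(A, "rw")
print(A_sym)
print(A_rw)
```

The symmetric form keeps the matrix symmetric, while the random-walk form scales each row so that its absolute entries sum to one.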

inherit_attributes(data: torch_geometric.data.Data)
node_split(train_size: torch_geometric.typing.Union[int, float] = None, val_size: torch_geometric.typing.Union[int, float] = None, test_size: torch_geometric.typing.Union[int, float] = None, seed_size: torch_geometric.typing.Union[int, float] = None, train_size_per_class: torch_geometric.typing.Union[int, float] = None, val_size_per_class: torch_geometric.typing.Union[int, float] = None, test_size_per_class: torch_geometric.typing.Union[int, float] = None, seed_size_per_class: torch_geometric.typing.Union[int, float] = None, seed: List[int] = [], data_split: int = 2)

Train/Val/Test/Seed split for node classification tasks. Each size parameter can be an int or a float: an int is an absolute number of nodes, and a float is a ratio. Either train_size or train_size_per_class is mandatory; the former is applied regardless of class labels. Validation and seed masks are optional. Seed masks mark nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are used for testing.

Parameters:
  • data (torch_geometric.data.Data or DirectedData, required) – The data object for data split.

  • train_size (int or float, optional) – The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) – The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) – The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) – The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) – The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) – The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) – The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) – The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (An empty list or a list with the length of data_split, optional) – The random seed list for each data split.

  • data_split (int, optional) – Number of splits. (Default: 2)
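The int-versus-float convention for the size parameters can be sketched in plain Python as below. This is an illustration of the semantics only, ignoring class labels and seed nodes; resolve_size and split_nodes are hypothetical helpers, not functions of the library.

```python
import random
from typing import List, Optional, Tuple, Union

Size = Union[int, float]


def resolve_size(size: Optional[Size], total: int) -> int:
    """Turn an int (count) or float (ratio) size into a node count."""
    if size is None:
        return 0
    if isinstance(size, float):
        return round(size * total)  # float: ratio of the available nodes
    return size                     # int: already an absolute count


def split_nodes(num_nodes: int, train_size: Size,
                val_size: Optional[Size] = None,
                seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Random train/val/test split; leftover nodes become the test set."""
    rng = random.Random(seed)
    order = list(range(num_nodes))
    rng.shuffle(order)
    n_train = resolve_size(train_size, num_nodes)
    n_val = resolve_size(val_size, num_nodes)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # everything not chosen above
    return train, val, test


# A ratio for training and an absolute count for validation.
train, val, test = split_nodes(10, train_size=0.5, val_size=2, seed=0)
print(len(train), len(val), len(test))
```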

Get train/val/test dataset for the link sign prediction task.

Arg types:
  • data (torch_geometric.data.Data or DirectedData object) - The input dataset.

  • prob_val (float, optional) - The proportion of edges selected for validation (Default: 0.05).

  • prob_test (float, optional) - The proportion of edges selected for testing (Default: 0.15).

  • splits (int, optional) - The split size (Default: 10).

  • size (int, optional) - The size of the input graph. If none, the graph size is the maximum index of nodes plus 1 (Default: None).

  • task (str, optional) - The evaluation task: four_class_signed_digraph (four-class sign and direction prediction); five_class_signed_digraph (five-class sign, direction and existence prediction); sign (link sign prediction). (Default: ‘sign’)

  • seed (int, optional) - The random seed for positive edge selection (Default: 0). Negative edges are selected by PyTorch Geometric negative_sampling.

  • maintain_connect (bool, optional) - Whether to maintain connectivity when removing edges for validation and testing. Connectivity is maintained by first obtaining the edges of the minimum spanning tree/forest; these edges are not removed for validation and testing. (Default: False).

  • ratio (float, optional) - The maximum ratio of edges used for dataset generation. (Default: 1.0)

  • device (int, optional) - The device to hold the return value (Default: ‘cpu’).

Return types:
  • datasets - A dict that includes the training/validation/testing splits of edges and labels. For split index i:

    1. datasets[i]['graph'] (torch.LongTensor): the observed edge list after removing edges for validation and testing.

    2. datasets[i]['train'/'val'/'testing']['edges'] (List): the edge list for training/validation/testing.

    3. datasets[i]['train'/'val'/'testing']['label'] (List): the labels of edges:

      • If task == “four_class_signed_digraph”: 0 (the positive directed edge exists in the graph), 1 (the negative directed edge exists in the graph), 2 (the positive edge of the reversed direction exists), 3 (the negative edge of the reversed direction exists). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “five_class_signed_digraph”: 0 (the positive directed edge exists in the graph), 1 (the negative directed edge exists in the graph), 2 (the positive edge of the reversed direction exists), 3 (the negative edge of the reversed direction exists), 4 (the edge doesn’t exist in either direction). The undirected edges in the directed input graph are removed to avoid ambiguity.

      • If task == “sign”: 0 (negative edge), 1 (positive edge).
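The label encodings can be illustrated with a small pure-Python sketch. It assumes that label 3 denotes a negative edge in the reversed direction (by symmetry with label 2) and that each node pair has an edge in at most one direction; the helper names and the dictionary representation of edges are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def four_class_label(query: Edge, signs: Dict[Edge, int]) -> int:
    """Label a node pair (u, v) given the signs (+1 or -1) of directed edges."""
    u, v = query
    if (u, v) in signs:
        return 0 if signs[(u, v)] > 0 else 1  # the edge u -> v exists
    if (v, u) in signs:
        return 2 if signs[(v, u)] > 0 else 3  # only the reversed edge v -> u exists
    # In the five-class task this case would receive label 4 instead.
    raise ValueError("no edge in either direction")


def sign_label(sign: int) -> int:
    """Label for the 'sign' task: 1 for a positive edge, 0 for a negative edge."""
    return 1 if sign > 0 else 0


# Toy directed signed graph: 0 -> 1 is positive, 1 -> 2 is negative.
signs = {(0, 1): +1, (1, 2): -1}
labels = [four_class_label(q, signs) for q in [(0, 1), (1, 2), (1, 0), (2, 1)]]
print(labels)
```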