torch_geometric_signed_directed.utils.general.node_split

Functions

node_class_split(→ torch_geometric.data.Data)

Train/Val/Test/Seed split for node classification tasks.

sample_per_class(→ List[int])

This function is modified from https://github.com/flyingtango/DiGCN/blob/main/code/Citation.py. It samples a set of nodes per class.

get_train_val_test_seed_split(→ Tuple[List[int], ...)

Get train/validation/test/seed splits based on the input setting.

Module Contents

node_class_split(data: torch_geometric.data.Data, train_size: int | float = None, val_size: int | float = None, test_size: int | float = None, seed_size: int | float = None, train_size_per_class: int | float = None, val_size_per_class: int | float = None, test_size_per_class: int | float = None, seed_size_per_class: int | float = None, seed: List[int] = [], data_split: int = 10) torch_geometric.data.Data

Train/Val/Test/Seed split for node classification tasks. Each size parameter can be either an int or a float: an int gives the actual number of nodes, while a float gives a ratio. Either train_size or train_size_per_class is mandatory; train_size selects nodes regardless of class labels. Validation and seed masks are optional. A seed mask marks nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected will be included in the test set.

Arg types:
  • data (Data or DirectedData, required) - The data object for data split.

  • train_size (int or float, optional) - The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size (int or float, optional) - The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size (int or float, optional) - The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) - The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) - The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) - The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) - The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) - The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • seed (List[int], either empty or of length data_split, optional) - The list of random seeds, one for each data split.

  • data_split (int, optional) - Number of splits. (Default: 10)

Return types:
  • data (Data or DirectedData) - The data object, which includes train_mask, val_mask and test_mask.
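
The int-versus-float convention for the size parameters can be illustrated with a minimal sketch. The helper resolve_size below is hypothetical and is not part of the library. In particular, rounding a ratio down to a whole number of nodes is an assumption made here; the library may round differently.

```python
def resolve_size(size, population):
    """Resolve an int-or-float size to a node count (hypothetical helper)."""
    if size is None:
        return None  # not specified; the caller decides the fallback
    if isinstance(size, bool) or not isinstance(size, (int, float)):
        raise TypeError("size must be an int, a float, or None")
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a float size is a ratio and must lie in [0, 1]")
        count = int(size * population)  # assumption: ratios round down
    else:
        count = size  # an int is the actual number of nodes
    if not 0 <= count <= population:
        raise ValueError("size is negative or exceeds the available nodes")
    return count


print(resolve_size(20, 100))    # int: the actual number of nodes
print(resolve_size(0.25, 100))  # float: a ratio of the 100 available nodes
```

Under this reading, train_size=20 and train_size=0.25 request a fixed count and a fraction of the nodes respectively, which is why the type of the argument matters as much as its value.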

sample_per_class(random_state: numpy.random.RandomState, labels: List[int], num_examples_per_class: int | float, forbidden_indices: List[int] | None = None, force_indices: List[int] | None = None) List[int]

This function is modified from https://github.com/flyingtango/DiGCN/blob/main/code/Citation.py. It samples a set of nodes per class. If num_examples_per_class is an int, it gives the actual number of nodes per class; if it is a float, it gives a ratio.

Arg types:
  • random_state (np.random.RandomState) - Numpy random state for random selection.

  • labels (List[int]) - Node labels array.

  • num_examples_per_class (int or float) - Number of nodes per class.

  • forbidden_indices (List[int]) - Nodes to be avoided during selection.

  • force_indices (List[int]) - Node list to be selected.

Return types:
  • selection (List) - A list of node indices to be selected.
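
Per-class sampling with excluded nodes can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it uses the standard-library random.Random in place of numpy.random.RandomState so that it runs without dependencies, the name sample_per_class_sketch is hypothetical, ratios are rounded down, and force_indices is read as the pool that the sample must be drawn from (for example, drawing seed nodes from the training set).

```python
import random
from collections import defaultdict


def sample_per_class_sketch(rng, labels, num_examples_per_class,
                            forbidden_indices=None, force_indices=None):
    """Sample node indices class by class (hypothetical sketch)."""
    forbidden = set(forbidden_indices or [])
    # Assumption: force_indices restricts the candidate pool.
    pool = None if force_indices is None else set(force_indices)

    # Group the eligible node indices by their class label.
    candidates_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        if index in forbidden:
            continue
        if pool is not None and index not in pool:
            continue
        candidates_by_class[label].append(index)

    selection = []
    for label in sorted(candidates_by_class):
        candidates = candidates_by_class[label]
        if isinstance(num_examples_per_class, float):
            # float: a ratio of the eligible nodes in this class
            count = int(num_examples_per_class * len(candidates))
        else:
            # int: the actual number of nodes for this class
            count = num_examples_per_class
        # Sampling without replacement; raises ValueError if count is too large.
        selection.extend(rng.sample(candidates, count))
    return selection


labels = [0, 0, 0, 1, 1, 1]
picked = sample_per_class_sketch(random.Random(0), labels, 2,
                                 forbidden_indices=[0])
print(sorted(picked))  # two nodes per class, never node 0
```

Passing the nodes already chosen for training as forbidden_indices is one way to keep a later validation or test sample disjoint from the training set.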

get_train_val_test_seed_split(random_state: numpy.random.RandomState, labels: List[int], train_size_per_class: int | float = None, val_size_per_class: int | float = None, test_size_per_class: int | float = None, seed_size_per_class: int | float = None, train_size: int | float = None, val_size: int | float = None, test_size: int | float = None, seed_size: int | float = None) Tuple[List[int], List[int], List[int], List[int]]

Get train/validation/test/seed splits based on the input setting. Each size parameter can be either an int or a float: an int gives the actual number of nodes, while a float gives a ratio. Either train_size or train_size_per_class is mandatory; train_size selects nodes regardless of class labels. Validation and seed masks are optional. A seed mask marks nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected will be included in the test set.

Arg types:
  • random_state (np.random.RandomState) - Numpy random state for random selection.

  • labels (List[int]) - Node labels array.

  • train_size (int or float, optional) - The size of random splits for the training dataset.

  • val_size (int or float, optional) - The size of random splits for the validation dataset.

  • test_size (int or float, optional) - The size of random splits for the testing dataset. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size (int or float, optional) - The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

  • train_size_per_class (int or float, optional) - The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • val_size_per_class (int or float, optional) - The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.

  • test_size_per_class (int or float, optional) - The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)

  • seed_size_per_class (int or float, optional) - The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

Return types:
  • train_indices (List) - A list of the node indices for training.

  • val_indices (List) - A list of the node indices for validation.

  • test_indices (List) - A list of the node indices for testing.

  • seed_indices (List) - A list of the node indices for seed nodes (could be empty).
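
The relationship between the four returned index lists can be sketched as below. This is a simplified illustration rather than the library's code: split_sketch is a hypothetical name, only int sizes that ignore class labels are handled, and random.Random stands in for numpy.random.RandomState. It shows the two behaviours described above: the test set defaults to all remaining nodes, and seed nodes are drawn from within the training set.

```python
import random


def split_sketch(rng, num_nodes, train_size, val_size=None,
                 test_size=None, seed_size=None):
    """Split node indices into train/val/test/seed lists (hypothetical sketch)."""
    remaining = list(range(num_nodes))
    rng.shuffle(remaining)

    # Training nodes are taken first; train_size is mandatory.
    train, remaining = remaining[:train_size], remaining[train_size:]

    # Validation nodes are optional and never overlap the training nodes.
    val = []
    if val_size is not None:
        val, remaining = remaining[:val_size], remaining[val_size:]

    # With no test size, every node that is left goes to the test set.
    test = remaining if test_size is None else remaining[:test_size]

    # Seed nodes are a subset of the training nodes (possibly empty).
    seed = [] if seed_size is None else rng.sample(train, seed_size)
    return train, val, test, seed


train, val, test, seed = split_sketch(random.Random(0), num_nodes=10,
                                      train_size=4, val_size=2, seed_size=2)
print(len(train), len(val), len(test), len(seed))
```

Because the seed nodes come from the training list, the training, validation and test lists still partition the nodes whenever test_size is left as None.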