torch_geometric_signed_directed.utils.general.node_split

Functions

`node_class_split`(→ torch_geometric.data.Data)	Train/Val/Test/Seed split for node classification tasks.
`sample_per_class`(→ List[int])	This function is modified from https://github.com/flyingtango/DiGCN/blob/main/code/Citation.py. It samples a set of nodes per class.
`get_train_val_test_seed_split`(→ Tuple[List[int], ...)	Get train/validation/test/seed splits based on the input setting.

Module Contents

node_class_split(data: torch_geometric.data.Data, train_size: int | float = None, val_size: int | float = None, test_size: int | float = None, seed_size: int | float = None, train_size_per_class: int | float = None, val_size_per_class: int | float = None, test_size_per_class: int | float = None, seed_size_per_class: int | float = None, seed: List[int] = [], data_split: int = 10) → torch_geometric.data.Data

Train/Val/Test/Seed split for node classification tasks. Each size parameter can be an int or a float: an int is an absolute number of nodes, and a float is a ratio. Either train_size or train_size_per_class is mandatory; train_size selects nodes regardless of class labels. Validation and seed masks are optional. A seed mask marks nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are used for testing.
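The int-versus-float convention described above can be illustrated with a minimal sketch. `resolve_size` is a hypothetical helper, not part of this module, and truncating the ratio with `int()` is an assumption; the library's rounding is not documented here.

```python
from typing import Union


def resolve_size(size: Union[int, float], total: int) -> int:
    """Hypothetical helper: turn an int count or a float ratio into a node count."""
    if isinstance(size, int):
        # An int is taken as the actual number of nodes.
        return size
    # A float is taken as a ratio of `total`.
    # Assumption: the product is truncated toward zero.
    return int(size * total)


print(resolve_size(20, 1000))    # 20 nodes
print(resolve_size(0.25, 1000))  # 250 nodes
print(resolve_size(1, 1000))     # 1 node
print(resolve_size(1.0, 1000))   # 1000 nodes
```

Under this convention `1` and `1.0` mean different things, so it is worth passing the type you intend explicitly.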

Arg types:

data (Data or DirectedData, required) - The data object for data split.
train_size (int or float, optional) - The size of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
val_size (int or float, optional) - The size of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
test_size (int or float, optional) - The size of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)
seed_size (int or float, optional) - The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.
train_size_per_class (int or float, optional) - The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
val_size_per_class (int or float, optional) - The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
test_size_per_class (int or float, optional) - The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)
seed_size_per_class (int or float, optional) - The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.
seed (List[int], optional) - The random seed for each data split. Either an empty list or a list of length data_split.
data_split (int, optional) - Number of splits. (Default: 10)

Return types:

data (Data or DirectedData) - The data object, which now includes train_mask, val_mask and test_mask.
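As an illustration of how seed and data_split relate, the sketch below draws one training mask per seed using NumPy alone. `one_split` is a hypothetical helper, and stacking the masks into a (num_nodes, data_split) array is an assumption about layout rather than documented behavior.

```python
import numpy as np


def one_split(seed_value: int, num_nodes: int, train_count: int) -> np.ndarray:
    """Hypothetical helper: boolean training mask for a single split."""
    rng = np.random.RandomState(seed_value)  # one random state per split
    perm = rng.permutation(num_nodes)
    mask = np.zeros(num_nodes, dtype=bool)
    mask[perm[:train_count]] = True
    return mask


seeds = [10, 20, 30]  # one seed per split, so data_split would be 3
# Assumption: masks are stacked along the last axis, one column per split.
train_masks = np.stack([one_split(s, 12, 4) for s in seeds], axis=-1)

print(train_masks.shape)        # (12, 3)
print(train_masks.sum(axis=0))  # [4 4 4]
```

Because each split has its own seed, rerunning with the same seed list reproduces the same masks.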

sample_per_class(random_state: numpy.random.RandomState, labels: List[int], num_examples_per_class: int | float, forbidden_indices: List[int] | None = None, force_indices: List[int] | None = None) → List[int]

This function is modified from https://github.com/flyingtango/DiGCN/blob/main/code/Citation.py. It samples a set of nodes per class. If num_examples_per_class is an int, it is the actual number of nodes; if it is a float, it is a ratio.

Arg types:

random_state (np.random.RandomState) - Numpy random state for random selection.
labels (List[int]) - Node labels array.
num_examples_per_class (int or float) - Number of nodes per class.
forbidden_indices (List[int]) - Nodes to exclude from selection.
force_indices (List[int]) - Node list to be selected.

Return types:

selection (List) - A list of node indices to be selected.
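The per-class sampling described above can be sketched as follows. This is not the library's code: `sample_per_class_sketch` is a hypothetical stand-in, and two behaviors are assumptions, namely that a float ratio is taken relative to the eligible nodes of each class and that force_indices restricts the candidate pool to the given nodes.

```python
import numpy as np


def sample_per_class_sketch(random_state, labels, num_examples_per_class,
                            forbidden_indices=None, force_indices=None):
    """Hypothetical sketch: sample node indices class by class."""
    labels = np.asarray(labels)
    forbidden = set(forbidden_indices) if forbidden_indices is not None else set()
    allowed = set(force_indices) if force_indices is not None else None
    selection = []
    for c in np.unique(labels):
        # Eligible nodes of class c: not forbidden and, if given, inside force_indices.
        candidates = [i for i in np.flatnonzero(labels == c).tolist()
                      if i not in forbidden and (allowed is None or i in allowed)]
        if isinstance(num_examples_per_class, int):
            k = num_examples_per_class
        else:
            # Assumption: ratio of the eligible nodes in this class, truncated.
            k = int(num_examples_per_class * len(candidates))
        selection.extend(random_state.choice(candidates, k, replace=False).tolist())
    return selection


labels = [0, 0, 0, 0, 1, 1, 1, 1]
rng = np.random.RandomState(0)
train = sample_per_class_sketch(rng, labels, 2)                            # 2 per class
val = sample_per_class_sketch(rng, labels, 0.5, forbidden_indices=train)   # half of the rest
seed_nodes = sample_per_class_sketch(rng, labels, 1, force_indices=train)  # 1 per class, from train
print(len(train), len(val), len(seed_nodes))  # 4 2 2
```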

get_train_val_test_seed_split(random_state: numpy.random.RandomState, labels: List[int], train_size_per_class: int | float = None, val_size_per_class: int | float = None, test_size_per_class: int | float = None, seed_size_per_class: int | float = None, train_size: int | float = None, val_size: int | float = None, test_size: int | float = None, seed_size: int | float = None) → Tuple[List[int], List[int], List[int], List[int]]

Get train/validation/test/seed splits based on the input setting. Each size parameter can be an int or a float: an int is an absolute number of nodes, and a float is a ratio. Either train_size or train_size_per_class is mandatory; train_size selects nodes regardless of class labels. Validation and seed masks are optional. A seed mask marks nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are used for testing.

Arg types:

random_state (np.random.RandomState): Numpy random state for random selection.
labels (List[int]): Node labels array.
train_size (int or float, optional): The size of random splits for the training dataset.
val_size (int or float, optional): The size of random splits for the validation dataset.
test_size (int or float, optional): The size of random splits for the testing dataset. (Default: None. All nodes not selected for training/validation are used for testing)
seed_size (int or float, optional): The size of random splits for the seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.
train_size_per_class (int or float, optional): The size per class of random splits for the training dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
val_size_per_class (int or float, optional): The size per class of random splits for the validation dataset. If the input is a float number, the ratio of nodes in each class will be sampled.
test_size_per_class (int or float, optional): The size per class of random splits for the testing dataset. If the input is a float number, the ratio of nodes in each class will be sampled. (Default: None. All nodes not selected for training/validation are used for testing)
seed_size_per_class (int or float, optional): The size per class of random splits for seed nodes within the training set. If the input is a float number, the ratio of nodes in each class will be sampled.

Return types:

train_indices (List) - A list of the node indices for training.
val_indices (List) - A list of the node indices for validation.
test_indices (List) - A list of the node indices for testing.
seed_indices (List) - A list of the node indices for seed nodes (could be empty).
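The order of operations described for this function (training first, then validation, then testing from what is left, with seed nodes drawn from inside the training set) can be sketched in a class-agnostic form. `split_sketch` is a hypothetical function, not the library's implementation. It assumes that train/val/test ratios are relative to the total number of nodes and that the seed ratio is relative to the training set.

```python
import numpy as np


def split_sketch(random_state, num_nodes, train_size,
                 val_size=None, test_size=None, seed_size=None):
    """Hypothetical sketch of a class-agnostic train/val/test/seed split."""
    def count(size, total):
        # int: actual number; float: ratio of `total` (assumed truncated).
        return size if isinstance(size, int) else int(size * total)

    remaining = random_state.permutation(num_nodes).tolist()

    n_train = count(train_size, num_nodes)
    train, remaining = remaining[:n_train], remaining[n_train:]

    val = []
    if val_size is not None:
        n_val = count(val_size, num_nodes)
        val, remaining = remaining[:n_val], remaining[n_val:]

    # With no test size, every node not used for training/validation is a test node.
    test = remaining if test_size is None else remaining[:count(test_size, num_nodes)]

    seed = []
    if seed_size is not None:
        # Seed nodes are drawn from within the training set.
        n_seed = count(seed_size, len(train))
        seed = random_state.choice(train, n_seed, replace=False).tolist()
    return train, val, test, seed


train, val, test, seed = split_sketch(
    np.random.RandomState(0), 100, train_size=0.5, val_size=0.25, seed_size=10)
print(len(train), len(val), len(test), len(seed))  # 50 25 25 10
```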