Introduction

PyTorch Geometric Signed Directed is a signed/directed graph neural network extension library for PyTorch Geometric. It builds on open-source deep-learning and graph processing libraries. PyTorch Geometric Signed Directed consists of various signed and directed geometric deep learning, embedding, and clustering methods from a variety of published research papers and selected preprints.

Citing

If you find PyTorch Geometric Signed Directed useful in your research, please consider adding the following citation:

@article{he2022pytorch,
        title={{PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs}},
        author={He, Yixuan and Zhang, Xitong and Huang, Junjie and Rozemberczki, Benedek and Cucuringu, Mihai and Reinert, Gesine},
        journal={arXiv preprint arXiv:2202.10793},
        year={2022}
        }

We briefly overview the fundamental concepts and features of PyTorch Geometric Signed Directed through simple examples.

Data Structures

PyTorch Geometric Signed Directed provides easy-to-use data loaders and data generators.

Data Classes

PyTorch Geometric Signed Directed offers data classes for signed and directed datasets.

  • SignedData - Designed for signed networks (possibly directed and weighted) defined on a static graph.

  • DirectedData - Designed for directed networks (possibly weighted) defined on a static graph.

Directed Unsigned Data Class

A directed data object is a PyTorch Geometric Data object. Please take a look at this readme for the details. The returned data object has the following major attributes:

  • edge_index - A PyTorch LongTensor of edge indices stored in COO format (optional).

  • edge_weight - A PyTorch FloatTensor of edge weights stored in COO format (optional).

  • edge_attr - A PyTorch FloatTensor of edge features stored in COO format (optional).

  • x - A PyTorch FloatTensor of vertex features (optional).

  • y - A PyTorch LongTensor of node labels (optional).

  • A - A scipy.sparse spmatrix of the adjacency matrix (optional).
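To make the COO layout of edge_index and edge_weight concrete, here is a minimal sketch that converts a small dense adjacency matrix into the two arrays. It uses plain Python lists in place of the tensors the library stores, and the helper name dense_to_coo is hypothetical (it is not part of the library).

```python
# Illustrative only: plain Python lists stand in for the LongTensor / FloatTensor
# attributes described above. `dense_to_coo` is a hypothetical helper.
def dense_to_coo(adj):
    """Return (edge_index, edge_weight) for the nonzero entries of a square matrix."""
    sources, targets, weights = [], [], []
    for i, row in enumerate(adj):
        for j, w in enumerate(row):
            if w != 0:
                sources.append(i)
                targets.append(j)
                weights.append(float(w))
    # edge_index has two rows: source nodes first, target nodes second.
    return [sources, targets], weights


# A directed, weighted graph on three nodes:
# 0 -> 1 (weight 2), 1 -> 2 (weight 1), 2 -> 0 (weight 3).
adj = [
    [0, 2, 0],
    [0, 0, 1],
    [3, 0, 0],
]
edge_index, edge_weight = dense_to_coo(adj)
print(edge_index)   # [[0, 1, 2], [1, 2, 0]]
print(edge_weight)  # [2.0, 1.0, 3.0]
```

Because the graph is directed, the edge 0 -> 1 appears once; the reverse pair (1, 0) is listed only if that edge also exists.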

Signed Data Class (compatible with signed undirected and signed directed graphs)

A signed data object is a PyTorch Geometric Data object. Please take a look at this readme for the details. The returned data object has the following major attributes:

  • edge_index - A PyTorch LongTensor of edge indices stored in COO format (optional).

  • edge_weight - A PyTorch FloatTensor of edge weights stored in COO format (optional).

  • edge_attr - A PyTorch FloatTensor of edge features stored in COO format (optional).

  • x - A PyTorch FloatTensor of vertex features (optional).

  • y - A PyTorch LongTensor of node labels (optional).

  • A - A scipy.sparse spmatrix of the adjacency matrix (optional).

  • edge_index_p - A PyTorch LongTensor of edge indices for the positive part of the adjacency matrix stored in COO format (optional).

  • edge_weight_p - A PyTorch FloatTensor of edge weights for the positive part of the adjacency matrix stored in COO format (optional).

  • A_p - A scipy.sparse spmatrix of the positive part of the adjacency matrix (optional).

  • edge_index_n - A PyTorch LongTensor of edge indices for the negative part of the adjacency matrix stored in COO format (optional).

  • edge_weight_n - A PyTorch FloatTensor of edge weights for the negative part of the adjacency matrix stored in COO format (optional).

  • A_n - A scipy.sparse spmatrix of the negative part of the adjacency matrix (optional).
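The relationship between the full signed edge list and its positive and negative parts can be sketched as follows. This is an illustration with plain Python lists, not the library's implementation: the helper split_by_sign is hypothetical, and storing the negative part as magnitudes (rather than keeping the minus sign) is an assumption of this sketch that should be checked against the library.

```python
# Illustrative only: `split_by_sign` is a hypothetical helper, and plain lists
# stand in for the tensors described above.
def split_by_sign(edge_index, edge_weight):
    """Separate a signed COO edge list into positive and negative parts."""
    pos = [k for k, w in enumerate(edge_weight) if w > 0]
    neg = [k for k, w in enumerate(edge_weight) if w < 0]
    edge_index_p = [[edge_index[0][k] for k in pos], [edge_index[1][k] for k in pos]]
    edge_index_n = [[edge_index[0][k] for k in neg], [edge_index[1][k] for k in neg]]
    edge_weight_p = [edge_weight[k] for k in pos]
    # Assumption: negative weights are stored as magnitudes in the negative part.
    edge_weight_n = [-edge_weight[k] for k in neg]
    return edge_index_p, edge_weight_p, edge_index_n, edge_weight_n


# Four signed, directed edges: 0->1 (+1), 0->2 (-2), 1->2 (+0.5), 2->0 (-1).
edge_index = [[0, 0, 1, 2], [1, 2, 2, 0]]
edge_weight = [1.0, -2.0, 0.5, -1.0]

edge_index_p, edge_weight_p, edge_index_n, edge_weight_n = split_by_sign(
    edge_index, edge_weight
)
print(edge_index_p, edge_weight_p)  # [[0, 1], [1, 2]] [1.0, 0.5]
print(edge_index_n, edge_weight_n)  # [[0, 2], [2, 0]] [2.0, 1.0]
```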

Benchmark Datasets

We have released and included a number of datasets that can be used to compare the performance of signed/directed graph neural network algorithms. The related machine learning tasks are node-level and edge-level learning. We also provide synthetic data generators for both signed and directed networks.

Real-World Data Loaders

For example, the Telegram Dataset can be loaded by the following code snippet. The dataset returned is a DirectedData object.

from torch_geometric_signed_directed.data import load_directed_real_data

dataset = load_directed_real_data(dataset='telegram', root='./tmp_data/')

Node Splitting

We provide a function to create node splits of the data objects. Each size parameter can be an int or a float: an int is an absolute number of nodes, and a float is a ratio. One of train_size or train_size_per_class is mandatory; train_size selects nodes regardless of class labels. Validation and seed masks are optional. A seed mask marks a subset of nodes within the training set, e.g., in a semi-supervised setting as described in the SSSNET: Semi-Supervised Signed Network Clustering paper. If test_size and test_size_per_class are both None, all nodes remaining after the training (and validation) nodes are selected are placed in the test set. The function returns the new data object with train, validation, test, and possibly seed masks. The splitting can be done either during data loading or separately.

from torch_geometric_signed_directed.data import load_directed_real_data

# Split during data loading.
dataset = load_directed_real_data(dataset='telegram', root='./tmp_data/', train_size_per_class=0.8, val_size_per_class=0.1, test_size_per_class=0.1)

# Or split separately on an already loaded data object, here also with seed masks.
dataset.node_split(train_size_per_class=0.8, val_size_per_class=0.1, test_size_per_class=0.1, seed_size_per_class=0.1)
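The int-versus-float size convention described above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: resolve_size and node_split_per_class are hypothetical helpers, masks are plain lists of booleans, and treating a float as a fraction of each class's size is an assumption of this sketch.

```python
import random


def resolve_size(size, total):
    """Interpret an int as an absolute count and a float as a ratio of `total`."""
    if isinstance(size, int):
        return min(size, total)
    return int(round(size * total))


def node_split_per_class(labels, train_size_per_class, val_size_per_class, seed=0):
    """Build train/val/test masks; nodes left over in each class go to test."""
    rng = random.Random(seed)
    n = len(labels)
    train_mask, val_mask, test_mask = [False] * n, [False] * n, [False] * n

    # Group node indices by class label.
    by_class = {}
    for node, label in enumerate(labels):
        by_class.setdefault(label, []).append(node)

    for nodes in by_class.values():
        rng.shuffle(nodes)
        n_train = resolve_size(train_size_per_class, len(nodes))
        n_val = resolve_size(val_size_per_class, len(nodes))
        for node in nodes[:n_train]:
            train_mask[node] = True
        for node in nodes[n_train:n_train + n_val]:
            val_mask[node] = True
        for node in nodes[n_train + n_val:]:
            test_mask[node] = True
    return train_mask, val_mask, test_mask


# Two classes with ten nodes each.
labels = [0] * 10 + [1] * 10

# Floats are ratios: 80% train and 10% validation per class, the rest is test.
train_f, val_f, test_f = node_split_per_class(labels, 0.8, 0.1)
print(sum(train_f), sum(val_f), sum(test_f))  # 16 2 2

# Ints are absolute counts: 3 train and 2 validation nodes per class.
train_i, val_i, test_i = node_split_per_class(labels, 3, 2)
print(sum(train_i), sum(val_i), sum(test_i))  # 6 4 10
```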

Edge Splitting

We provide a function to create edge splits. The splitting can either be done via data loading or separately.

Directed Unsigned Edge Splitting

from torch_geometric_signed_directed.data import load_directed_real_data
from torch_geometric_signed_directed.utils import link_class_split

# Split with the standalone utility function.
directed_dataset = load_directed_real_data(dataset='telegram', root='./tmp_data/')
datasets = link_class_split(directed_dataset, prob_val=0.15, prob_test=0.05, task='direction')

from torch_geometric_signed_directed.data import load_directed_real_data

# Split with the data object's own method.
directed_dataset = load_directed_real_data(dataset='telegram', root='./tmp_data/')
datasets = directed_dataset.link_split(prob_val=0.15, prob_test=0.05, task='direction')
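As a rough intuition for what prob_val and prob_test control, the sketch below randomly partitions an edge list into training, validation, and test sets by those proportions. It is deliberately simplified and is not the library's splitting routine, which also builds task-specific labels (for example for task='direction'); the helper random_edge_split and the toy edge list are hypothetical.

```python
import random


def random_edge_split(edges, prob_val=0.15, prob_test=0.05, seed=0):
    """Randomly partition edges into train/val/test by the given proportions."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = int(round(prob_val * len(shuffled)))
    n_test = int(round(prob_test * len(shuffled)))
    return {
        'val': shuffled[:n_val],
        'test': shuffled[n_val:n_val + n_test],
        'train': shuffled[n_val + n_test:],
    }


# A toy directed graph with 40 distinct edges on 20 nodes.
edges = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]

split = random_edge_split(edges, prob_val=0.15, prob_test=0.05)
print(len(split['train']), len(split['val']), len(split['test']))  # 32 6 2
```

Every edge lands in exactly one of the three sets, so validation and test edges are never seen during training.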