Task01: Basic Graph Theory, Environment Setup, and PyG

2021-06-16 17:32:44 Author: the Internet

I. Basic Graph Theory

For details, see the Datawhale open-source materials.

Building on that material, here is a brief summary of how graphs are used in drug discovery (to be expanded):

Definition 1 (molecular graph):

A molecular graph is denoted $\mathcal{G}=\{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V}=\left\{v_{1}, \ldots, v_{N}\right\}$ is the set of $N=|\mathcal{V}|$ atoms and $\mathcal{E}=\left\{e_{1}, \ldots, e_{M}\right\}$ is the set of $M$ chemical bonds.
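To make Definition 1 concrete, here is a minimal pure-Python sketch of a molecular graph. The choice of ethanol, the restriction to heavy atoms, and the variable names are illustrative assumptions, not part of the original material.

```python
# Ethanol (CH3-CH2-OH), heavy atoms only; hydrogens are left implicit.
atoms = ["C", "C", "O"]        # the atom set V = {v_1, ..., v_N}
bonds = [(0, 1), (1, 2)]       # the bond set E = {e_1, ..., e_M}, as pairs of atom indices

N = len(atoms)                 # N = |V|
M = len(bonds)                 # M = |E|

# Chemical bonds are undirected, so each bond is recorded for both endpoints.
neighbors = {i: [] for i in range(N)}
for u, v in bonds:
    neighbors[u].append(v)
    neighbors[v].append(u)

print(N, M)        # 3 2
print(neighbors)   # {0: [1], 1: [0, 2], 2: [1]}
```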

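Machine learning on molecular graph data

Node prediction: predict a node's class or the value of one of its attributes
1. Example: predicting atom types, which can be used to pretrain molecular representations (similar to MLM in BERT)
Edge prediction: predict whether a link exists between two nodes
1. Example: protein-protein interactions, drug-drug interactions
Graph prediction: classify different graphs or predict a property of a graph
1. Example: molecular property prediction
Node clustering: detect whether nodes form a community
1. Example: identifying functional groups (motifs)
Other tasks
1. Graph generation, e.g. molecule generation
2. Retrosynthesis prediction
3. …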
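The difference between node-level and graph-level tasks is mainly one of output shape: node prediction needs one output per atom, graph prediction one output per molecule. One common way to get from the first to the second is to pool the node vectors. The sketch below uses mean pooling, which is an assumption chosen for illustration and is not prescribed by the source; the feature values are made up.

```python
def mean_readout(node_features):
    """Average per-node feature vectors into a single graph-level vector."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(row[d] for row in node_features) / n for d in range(dim)]


# Hypothetical 2-dimensional features for a 3-atom molecule (one row per atom).
node_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

graph_vector = mean_readout(node_features)   # one vector for the whole molecule
print(len(node_features), "node vectors ->", len(graph_vector), "graph-level features")
```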

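II. Environment Setup

I installed PyG both on my own computer and on the lab server. The first step is to check the PyTorch version on the computer, and the PyTorch and cudatoolkit versions on the server.

Installing the matching version of PyG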

# General form
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

# Local computer (PyTorch 1.8.0, CPU only)
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-geometric

# Server (PyTorch 1.7.0, CUDA 10.1)
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
pip install torch-geometric
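The `${TORCH}` and `${CUDA}` placeholders in the commands above can be derived from the installed PyTorch itself. The helper below is a sketch: the function name `pyg_wheel_index` is hypothetical, and it only reproduces the URL pattern used in this post. Wheel pages may exist only for particular PyTorch releases, so check the PyG installation notes if the generated URL does not resolve.

```python
def pyg_wheel_index(torch_version, cuda_version):
    """Build the `-f` index URL in the pattern used by the pip commands above.

    torch_version: e.g. "1.8.0" or "1.7.0+cu101" (the form of torch.__version__)
    cuda_version:  e.g. "10.1" (the form of torch.version.cuda), or None for CPU-only builds
    """
    base = torch_version.split("+")[0]   # drop a local suffix such as "+cu101" or "+cpu"
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://pytorch-geometric.com/whl/torch-{base}+{tag}.html"


if __name__ == "__main__":
    try:
        import torch
        print(pyg_wheel_index(torch.__version__, torch.version.cuda))
    except ImportError:
        print("PyTorch is not installed in this environment")
```

With the two setups from this post, `pyg_wheel_index("1.8.0", None)` gives the CPU URL and `pyg_wheel_index("1.7.0", "10.1")` gives the `cu101` URL.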

Finally, test whether the installation succeeded.
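One quick way to run such a check is to ask Python whether it can locate each package. This is a sketch, not the check used in the original post: `check_installed` is a hypothetical helper, and it only confirms that the modules can be found. Actually importing `torch_geometric` is the stronger test, because it also catches binary mismatches between the extension wheels and PyTorch.

```python
import importlib.util

# Top-level module names of the packages installed by the pip commands above.
PYG_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]


def check_installed(module_names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


for name, found in check_installed(PYG_MODULES).items():
    print(f"{name:20s} {'found' if found else 'MISSING'}")
```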

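III. Representing and Using Graphs and Graph Datasets in PyG

PyTorch Geometric (PyG for short) is a PyTorch extension library for geometric deep learning, that is, deep learning applied to graphs and other irregular, unstructured data. With PyG we can easily build a graph object from data and work with it, and we can just as easily define a dataset class for a graph dataset and feed it to a neural network. PyG was written by Matthias Fey; his homepage is on GitHub.

Creating `Data` and `Dataset` objects

Here are a few related links; reading them gives an intuitive understanding of the Data and Dataset classes:

I ran into problems when downloading the Cora dataset from Planetoid with the Datawhale open-source code: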

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/dataset/Cora', name='Cora')
# Cora()

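Error 1: because of the firewall, GitHub cannot be reached reliably. This can be worked around by changing the dataset's download URL; see this article for details.
Error 2: in principle the dataset should download normally once the URL is changed, and many people in the study group had no trouble, but in my case the load still failed with an error.

The downloaded data was most likely corrupted. What finally worked was replacing the existing files directly with copies downloaded from GitHub, after which the dataset loaded successfully.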
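When replacing the files by hand, it helps to confirm that every raw file is present and non-empty before loading the dataset again. The sketch below lists the eight raw file names of the Planetoid Cora data. The `raw_dir` location is an assumption: the folder layout under `root` depends on the PyG version, so adjust the path to wherever your installation keeps its `raw` folder. `missing_raw_files` is a hypothetical helper.

```python
import os

# Raw file names of the Planetoid Cora data.
CORA_RAW_FILES = [
    f"ind.cora.{suffix}"
    for suffix in ("x", "tx", "allx", "y", "ty", "ally", "graph", "test.index")
]


def missing_raw_files(raw_dir, expected=CORA_RAW_FILES):
    """Return the expected raw files that are absent or empty in raw_dir."""
    missing = []
    for name in expected:
        path = os.path.join(raw_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


# Assumed location for root='/dataset/Cora', name='Cora'; verify against your PyG version.
raw_dir = os.path.join("/dataset/Cora", "Cora", "raw")
print("Missing or empty:", missing_raw_files(raw_dir))
```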

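IV. Assignment

Implement a class that inherits from Data and represents an "organization-author-paper" network. The network has three node types (organization, author, paper) and two edge types (author-organization and author-paper). Requirements: 1) store the attributes of each node type in a separate attribute; 2) store each edge type in a separate attribute (the edges themselves have no attributes); 3) implement one method per node type that returns the number of nodes of that type.

Reference code was provided in the live session.
The version below draws on two other participants' solutions (1, 2), tidied up: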

# Constructor of the Data subclass:
'''
[OAP: Organization-Author-Paper]
O - Organization
A - Author
P - Paper
'''
import torch
from torch_geometric.data import Data


class OAP_Data(Data):
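    def __init__(self, x_O=None, x_A=None, x_P=None, edge_index_A_O=None, edge_index_A_P=None, edge_attr_A_O=None, edge_attr_A_P=None, y=None, **kwargs):
        r"""
        Args:
            x_O (Tensor, optional): node feature matrix of shape `[num_nodes_O, num_node_O_features]`
            x_A (Tensor, optional): node feature matrix of shape `[num_nodes_A, num_node_A_features]`
            x_P (Tensor, optional): node feature matrix of shape `[num_nodes_P, num_node_P_features]`
            edge_index_A_O (LongTensor, optional): edge index matrix of shape `[2, num_edges_A_O]`; row 0 holds the tail (source) nodes, row 1 the head (target) nodes, and each edge points from tail to head
            edge_index_A_P (LongTensor, optional): edge index matrix of shape `[2, num_edges_A_P]`; row 0 holds the tail (source) nodes, row 1 the head (target) nodes, and each edge points from tail to head
            edge_attr_A_O (Tensor, optional): edge feature matrix of shape `[num_edges_A_O, 1]`  # edges have no attributes, hence a single column
            edge_attr_A_P (Tensor, optional): edge feature matrix of shape `[num_edges_A_P, 1]`  # edges have no attributes, hence a single column
            y (Tensor, optional): node-level or graph-level labels of arbitrary shape (edge labels are also possible)
        """
        # Initialise the base Data object first; newer PyG releases set up
        # their attribute storage there, so it must run before any assignment.
        super().__init__(**kwargs)
        self.x_O = x_O  # organization nodes
        self.x_A = x_A  # author nodes
        self.x_P = x_P  # paper nodes
        self.edge_index_A_O = edge_index_A_O  # author-organization edge indices
        self.edge_index_A_P = edge_index_A_P  # author-paper edge indices
        # The edges have no attributes; these stay None unless provided
        self.edge_attr_A_O = edge_attr_A_O  # author-organization edge attributes
        self.edge_attr_A_P = edge_attr_A_P  # author-paper edge attributes
        self.y = y  # labels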
    
    # Read-only properties that report node and edge counts
    @property
    def num_nodes_O(self):
        return self.x_O.shape[0]   # number of organization nodes
    
    @property
    def num_nodes_A(self):
        return self.x_A.shape[0]   # number of author nodes
    
    @property    
    def num_nodes_P(self):
        return self.x_P.shape[0]   # number of paper nodes
    
    @property
    def num_edges_A_O(self):
        return self.edge_index_A_O.shape[1]   # number of author-organization edges
    
    @property
    def num_edges_A_P(self):
        return self.edge_index_A_P.shape[1]   # number of author-paper edges

# Build example data: 3 authors, 4 publishing organizations, 5 papers
x_A = torch.randn(3, 6)
x_P = torch.randn(5, 7)
x_O = torch.randn(4, 5)
# Node connectivity (the ids appear to be global: papers 0-4, authors 5-7, organizations 8-11)
edge_index_A_P = torch.tensor([
    [0, 1, 2, 3, 4],
    [5, 5, 5, 6, 7],
])
edge_index_A_O = torch.tensor([
    [8, 9, 10, 11],
    [5, 6, 7, 5],
])

# Collect the tensors in a dict
OAP_graph_dict = {
    'x_O': x_O,
    'x_A': x_A,
    'x_P': x_P,
    'edge_index_A_O': edge_index_A_O,
    'edge_index_A_P': edge_index_A_P,
}

# Convert the dict into a Data object
OAP_graph_data = OAP_Data.from_dict(OAP_graph_dict)

# Number of nodes and edges of each type in the OAP graph
print(f'Number of organization nodes: {OAP_graph_data.num_nodes_O}')  # organization nodes
print(f'Number of author nodes: {OAP_graph_data.num_nodes_A}')  # author nodes
print(f'Number of paper nodes: {OAP_graph_data.num_nodes_P}')  # paper nodes
print(f'Number of author-organization edges: {OAP_graph_data.num_edges_A_O}')  # author-organization edges
print(f'Number of author-paper edges: {OAP_graph_data.num_edges_A_P}')  # author-paper edges

# Output
Number of organization nodes: 4
Number of author nodes: 3
Number of paper nodes: 5
Number of author-organization edges: 4
Number of author-paper edges: 5
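One detail of the example data is worth spelling out. The edge indices seem to use a single global numbering (papers 0-4, authors 5-7, organizations 8-11), whereas the feature matrices `x_P`, `x_A` and `x_O` are each indexed from 0. To look up the features of an edge endpoint, the global id therefore has to be shifted by its type's offset. The sketch below assumes that numbering, which is inferred from the example rather than stated in the source; `OFFSETS` and `to_local` are hypothetical names, and plain lists stand in for tensors.

```python
# Assumed global numbering: papers start at 0, authors at 5, organizations at 8.
OFFSETS = {"P": 0, "A": 5, "O": 8}


def to_local(edge_index, row0_type, row1_type):
    """Shift each row of a 2 x E edge list from global ids to per-type local ids."""
    row0 = [i - OFFSETS[row0_type] for i in edge_index[0]]
    row1 = [i - OFFSETS[row1_type] for i in edge_index[1]]
    return [row0, row1]


# Same connectivity as in the example above, written as plain lists.
edge_index_A_P = [[0, 1, 2, 3, 4], [5, 5, 5, 6, 7]]
edge_index_A_O = [[8, 9, 10, 11], [5, 6, 7, 5]]

print(to_local(edge_index_A_P, "P", "A"))  # [[0, 1, 2, 3, 4], [0, 0, 0, 1, 2]]
print(to_local(edge_index_A_O, "O", "A"))  # [[0, 1, 2, 3], [0, 1, 2, 0]]
```

After the shift, every index in the author row is a valid row of the 3-row matrix `x_A`, and likewise for papers and organizations.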

Source: https://blog.csdn.net/m0_46306014/article/details/117962587