
Hand Pose Detection (per Video Clip) v2 (ST-GCN)

Author: Internet

Reference links:

https://github.com/yongqyu/st-gcn-pytorch

https://www.cnblogs.com/shyern/p/11262926.html

https://blog.csdn.net/qq_36893052/article/details/79860328

https://www.zhihu.com/collection/437834930

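Network characteristics:
ST-GCN (spatial-temporal graph convolution):
1. Proposed in 2018. The network stacks nine spatial-temporal graph convolution (ST-GCN) units with a temporal kernel size of 9, and each ST-GCN unit uses a residual connection.
2. Linearly stacking GCN and TCN layers indirectly enlarges the receptive field of each node, which helps a great deal in extracting spatio-temporal information.
3. OpenPose extracts 2D keypoint coordinates from the video, and the detection confidence is then used as the z axis, giving a 3D skeleton sequence.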
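
Point 3 above can be sketched as follows. This is a minimal illustration, assuming the detector output is already available as arrays; the function name `build_skeleton_sequence` and the toy shapes are hypothetical and not part of the original project.

```python
import numpy as np

def build_skeleton_sequence(xy, conf):
    """Stack 2D keypoints and confidences into a (T, V, 3) array.

    xy:   (T, V, 2) pixel coordinates per frame and keypoint
    conf: (T, V)    detection confidence per frame and keypoint
    """
    xy = np.asarray(xy, dtype=np.float32)
    conf = np.asarray(conf, dtype=np.float32)
    if xy.shape[:2] != conf.shape:
        raise ValueError("xy and conf must agree on frames and keypoints")
    # The confidence becomes the third channel ("z axis") of every keypoint
    return np.concatenate([xy, conf[..., None]], axis=-1)

# Toy input: 2 frames, 3 keypoints
xy = np.zeros((2, 3, 2))
conf = np.full((2, 3), 0.9)
seq = build_skeleton_sequence(xy, conf)
print(seq.shape)
```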

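Experiments:

Each entry lists: No., idea, rationale, priority, data status, before the change, after the change, notes.

1. Idea: improve hand keypoint detection
   Rationale: the current hand keypoint detector often produces points that fly far off target; a more stable detector is wanted.
   Priority: on hold. No better detector has been found so far; waiting for the Megvii (旷视) interface.
   Data status: 6 people (1,504 samples).
     1. The movements are not very standard, and many hand keypoints are not detected.
     2. The three classes are extremely imbalanced.
   Before: 1. Low accuracy. 2. Training overfits.
   After: (none)
   Notes: no better option is available for now; later recordings should keep the movements as standard as possible.

2. Idea: collect more data
   Rationale: balance the data across classes.
   Priority: tried.
   Data status: 11 people (3,639 samples): Beijing 6, Shenzhen 5 (standard movements).
   Before: 1. Low accuracy. 2. Training overfits.
   After: accuracy improved to a little over 80% on the in-group test set, but generalization accuracy is only about 45%-60%. Overfitting is severe.

3. Idea: tune 1. weight decay, 2. dropout, 3. learning rate
   Rationale: reduce overfitting.
   Priority: tried.
   Data status: 11 people (3,639 samples): Beijing 6, Shenzhen 5 (standard movements).
   Before: 1. Low accuracy. 2. Training overfits.
   After: overfitting on the validation set is largely relieved, but generalization is not fully solved. Testing on the 6 Beijing people gives very poor results, which is suspected to be related to how standard the movements are.

4. Idea: a "super user" imitates more movements
   Rationale: one person imitates the different movement styles of various people.
   Priority: tried.
   Data status: 11 people (3,639 samples): Beijing 6, Shenzhen 5 (standard movements) + 1 super user.
   Before: 1. Low accuracy. 2. Training overfits.
   After: 1. Low accuracy. 2. Training overfits.
   Notes: the super user had no noticeable effect.

5. Idea: keep adding data
   Rationale: more data can address overfitting and also helps accuracy.
   Priority: tried.
   Data status: 22 people (7,000 samples): Beijing 6 + 6 recorded later (inconsistent distance), Shenzhen 10 (standard movements).
   Before: 1. Low accuracy (60%). 2. Training overfits (slightly).
   After: accuracy improved considerably, to a little over 70%, a gain of about 10 points. The Shenzhen recordings use a fairly uniform distance; with Shenzhen as the test set, accuracy is higher than with Beijing.

6. Idea: normalize the keypoint data
   Rationale: divide the relative coordinates by the face width and height to normalize the features and address the distance problem.
   Priority: try first.
   Data status: 22 people (7,000 samples): Beijing 6 + 6 recorded later (inconsistent distance), Shenzhen 10 (standard movements).
   Before: 1. Low accuracy (70%). 2. Training overfits.
   After: worse than before.
   Notes: poor result, possibly because the differences in the data shrink after normalization.

7. Idea: clean the data
   Rationale: the following filtering rules were added:
     1. Frames with no detected face, or with a face width below 50 pixels, are dropped (frame level).
     2. Of the 7 hand keypoints, if no more than 4 have a confidence above 0.1, the frame is dropped (frame level).
     3. At 30 frames per second, if 15 frames fail the checks above, the whole clip is dropped (video level).
   Priority: try first.
   Data status: 22 people (about 4,000 samples left after cleaning): Beijing 6 + 6 recorded later (inconsistent distance), Shenzhen 10 (standard movements).
   Before: 1. Low accuracy (70%). 2. Training overfits (slightly).
   After: 1. Accuracy improved considerably, to a little over 80%, a gain of more than 10 points. 2. Overfitting is also reduced.
   Notes: if hand detection were more accurate, accuracy could improve further.

8. Idea: lengthen each clip (2 s / 5 s)
   Rationale: a longer clip can express more content.
   Priority: try first.
   Data status: batches 2/3/4/9 (16 people): Shenzhen 10 + Beijing 6.
   Before: 1. A little over 80% accuracy on cleaned data. 2. Very low accuracy on raw data.
   After: 1. A gain of two to three points on cleaned data (about 85%). 2. No real effect on raw data (about 75%).
   Notes:
     1. There is still too much dirty data affecting the predictions; a better hand detector is needed.
     2. After cross-comparing 1 s / 2 s / 4 s clips, the best settings so far are: 2 s + 15 passing frames (cleaned data), and 2 s + 15 passing frames + all frames merged (uncleaned data).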

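Current best: 2 s clips, 64 frames, with at least 15 frames passing the checks; accuracy 90%.

To let more data qualify, the threshold can be lowered to 8 passing frames, at a small cost in accuracy.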
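
The cleaning rules from experiment 7 and the passing-frame threshold can be sketched as below. This is only an illustration under stated assumptions: the function names, the `(face_width, hand_confidences)` frame representation, and the reading of "passing 15 frames" as "at least 15 valid frames" are hypothetical and not taken from the original code.

```python
def frame_is_valid(face_width, hand_confidences,
                   min_face_width=50, conf_threshold=0.1, min_confident=5):
    """Frame-level check.

    face_width: detected face width in pixels, or None if no face was found
    hand_confidences: confidences of the 7 hand keypoints
    """
    # Rule 1: drop frames without a face or with a face narrower than 50 px
    if face_width is None or face_width < min_face_width:
        return False
    # Rule 2: drop frames where no more than 4 keypoints exceed confidence 0.1
    confident = sum(1 for c in hand_confidences if c > conf_threshold)
    return confident >= min_confident

def clip_is_valid(frames, min_valid_frames=15):
    """Video-level check: keep the clip if enough frames pass."""
    valid = sum(frame_is_valid(width, confs) for width, confs in frames)
    return valid >= min_valid_frames

good = (80, [0.9] * 7)               # face found, all keypoints confident
no_face = (None, [0.9] * 7)          # no face detected
weak = (80, [0.9] * 4 + [0.0] * 3)   # only 4 confident keypoints

print(clip_is_valid([good] * 15 + [no_face] * 49))
print(clip_is_valid([good] * 8 + [weak] * 56, min_valid_frames=8))
```

Lowering `min_valid_frames` to 8 corresponds to the looser setting mentioned above.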

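Accuracy comparison. The header labels that survive in the source are "raw data: 64 frames", "64 frames (best)", "data", "cleaned data" and "cleaned data, 15 passing frames"; which label belongs to which score column cannot be recovered from the flattened table.

data         score 1   score 2
2+3-811      0.917     0.851
2+4-811      0.855     0.667
3+4-811      0.926     0.695
2+3-811-2    0.918     0.743
2+4-811-2    0.868     0.686
3+4-811-2    0.92      0.833

Sample counts per label (0, 1, 3) before and after cleaning:

batch    before cleaning                 after cleaning
         0      1      3      total      0      1      3      total
2        545    327    93     965        369    193    90     652
3        473    300    129    902        375    269    128    772
4        475    304    162    941        319    217    162    698
9        184    99     48     331        152    82     48     282
total    1677   1030   432    3139       1215   761    428    2404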

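Code:

data.py    

Data processing:
1. A 2-minute video is cut into 1 s segments numbered 1-n, giving n clips (n samples). Each clip also gets a person ID 1-m and an action-type label xx.
2. Keypoint coordinates are extracted for every frame. A stable central point is chosen as the origin and subtracted from all other points,
so the origin becomes (0,0) and all coordinates are relative.
3. The coordinates are stored as a 1-D vector and reshaped back to (n,-1,2) in code when needed (the script below keeps three values per keypoint and reshapes to (-1,3)).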
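
Steps 2 and 3 above can be illustrated with a short sketch. The choice of keypoint 0 as the origin and the function name `to_relative` are assumptions made for the example; the original text does not say which point is used.

```python
import numpy as np

def to_relative(frame, origin_index=0):
    """Subtract a reference keypoint so that it becomes the origin (0, 0)."""
    frame = np.asarray(frame, dtype=np.float32)   # (V, 2)
    return frame - frame[origin_index]

# One frame with three keypoints in pixel coordinates (toy values)
frame = np.array([[100.0, 200.0],
                  [110.0, 190.0],
                  [ 90.0, 230.0]])

rel = to_relative(frame)            # relative coordinates
flat = rel.reshape(-1)              # stored as a 1-D vector
restored = flat.reshape(1, -1, 2)   # reshaped back to (n, -1, 2) when needed
print(rel)
print(restored.shape)
```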

from torch.utils import data
import torch
import os
import random
import numpy as np


list1=[14,18,23,30,40,48]   # person IDs held out for testing
list2=[13,17,22,29,39,47]   # person IDs held out for validation

forder='2+3+4+5+6'
forder2='2+3+4+5+6_v1'
file_name = 'dataset/test9/src/'+str(forder)+'.txt'

save='dataset/test9/src/'+str(forder2)+"/"
if not os.path.exists(save):
	os.makedirs(save)

split_label=1
frame_nums=64
with open(file_name) as f:
	lines = f.readlines()

prev_video = int(lines[0].strip().split(' ')[1])   # video ID
prev_categ = int(lines[0].strip().split(' ')[2])   # class label

datas=[]
datas_label=[]
frames = []
train = []
valid = []
test  = []
train_label = []
valid_label = []
test_label  = []
m=0

prev_actor = int(lines[0].strip().split(' ')[0])   # person ID of the first video

# The sentinel line (video ID -1) flushes the last video, which would otherwise never be saved
for line in lines + ['-1 -1 -1']:
	line = line.strip().split(' ')
	vid = int(line[1])   # video ID
	aid = int(line[0])   # person ID
	cid = int(line[2])   # class label
	features = list(map(float, line[3:]))   # flattened keypoint values, 3 per keypoint
	
	# Frames with the same video ID are collected into one training sample
	if prev_video == vid:
		frames.append(np.reshape(np.asarray(features), (-1,3)))   # 1-D vector -> [15,3]
	else:
		# Enough frames: keep the first frame_nums frames and stack them into a tensor
		if len(frames) >= frame_nums:
			# frames = random.sample(frames, frame_nums)   # pick frames at random
			frames = frames[0:frame_nums]    # pick frames in order
			frames = torch.from_numpy(np.stack(frames, 0))  # stack the frames along dim 0

		# Too few frames: linearly interpolate up to frame_nums frames
		else:
			frames = np.stack(frames, 0) # n frames of [15,3] -> [n,15,3], one sample
			xloc = np.arange(frames.shape[0])   # original frame positions 0..n-1
			new_xloc = np.linspace(0, frames.shape[0]-1, frame_nums)  # frame_nums evenly spaced positions in [0, n-1]
			frames = np.reshape(frames, (frames.shape[0], -1)).transpose()  # [n,45] -> [45,n]
			# print(frames.shape,xloc.shape,new_xloc.shape)
			
			new_datas = []
			for channel in frames:
				new_datas.append(np.interp(new_xloc, xloc, channel))   # interpolate each coordinate over time
			frames = torch.from_numpy(np.stack(new_datas, 0)).t()  # [45,frame_nums] -> [frame_nums,45]

		frames = frames.reshape(frame_nums, -1, 3)  # [frame_nums,15,3]
		datas.append(frames)   # sample
		if split_label==1:
			datas_label.append([prev_actor, prev_video, prev_categ])   # full label triple of the finished video
		else:
			datas_label.append(prev_categ)   # class label only

		m+=1
		# 2. Split by the person ID of the video that was just finished
		if prev_actor in list1:
			test.append(frames)
			test_label.append(prev_categ)
		elif prev_actor in list2:
			valid.append(frames)
			valid_label.append(prev_categ)
		else:
			train.append(frames)
			train_label.append(prev_categ)

		frames = [np.reshape(np.asarray(features), (-1,3))]  # reset frames with the first frame of the next video

	prev_actor = aid   # update person ID
	prev_video = vid   # update video ID
	prev_categ = cid   # update class label


# # 3. Random split
# lens=len(datas)
# num=random.sample(range(lens),lens)   # random permutation of the indices

# for i in range(lens):
# 	index=num[i]
# 	if i <=int(lens*0.7):
# 		train.append(datas[index])
# 		train_label.append(datas_label[index])
# 	elif i <=int(lens*0.9):
# 		valid.append(datas[index])
# 		valid_label.append(datas_label[index])
# 	else:
# 		test.append(datas[index])
# 		test_label.append(datas_label[index])



train_label = torch.from_numpy(np.asarray(train_label))
valid_label = torch.from_numpy(np.asarray(valid_label))
test_label  = torch.from_numpy(np.asarray(test_label))
print(len(train_label),len(valid_label),len(test_label))
print(train[0].shape)


torch.save((torch.stack(train, 0), train_label), save+'train.pkl')
torch.save((torch.stack(valid, 0), valid_label), save+'valid.pkl')
torch.save((torch.stack(test, 0),  test_label),  save+'test.pkl')
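
The frame-count handling in data.py (stretching a short clip to a fixed number of frames by linear interpolation) can be looked at in isolation. The sketch below is a NumPy-only illustration; the function name `resample_frames` and the toy clip are hypothetical.

```python
import numpy as np

def resample_frames(frames, target_len):
    """Linearly interpolate a (T, V, C) clip to (target_len, V, C)."""
    frames = np.asarray(frames, dtype=np.float64)
    t, v, c = frames.shape
    flat = frames.reshape(t, -1)              # (T, V*C): one column per coordinate
    src = np.arange(t)                        # original frame positions 0..T-1
    dst = np.linspace(0, t - 1, target_len)   # evenly spaced target positions
    columns = [np.interp(dst, src, flat[:, k]) for k in range(flat.shape[1])]
    return np.stack(columns, axis=1).reshape(target_len, v, c)

# Toy clip: 4 frames whose values equal the frame index
clip = np.arange(4, dtype=np.float64)[:, None, None] * np.ones((4, 15, 3))
out = resample_frames(clip, 7)
print(out.shape)
print(out[:, 0, 0])
```

Interpolating over the interval [0, T-1] keeps the first and last original frames as the endpoints of the resampled clip.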



main.py

import os
import numpy as np
import torch
import torch.optim as optim
import torch.utils.data as data
import time
from model import *
from metric import accuracy
from config import get_args
args = get_args()

# Device selection: this script is pinned to the CPU
device = torch.device('cpu')

# Load the (tensor, label) pairs saved with torch.save
train_tensor, train_label = torch.load(args.train_path)
valid_tensor, valid_label = torch.load(args.valid_path)
test_tensor , test_label  = torch.load(args.test_path)

# Data loaders: all data is loaded up front and served batch_size samples at a time
train_loader = data.DataLoader(data.TensorDataset(train_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
valid_loader = data.DataLoader(data.TensorDataset(valid_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
test_loader  = data.DataLoader(data.TensorDataset(test_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
train_label = train_label.to(device)
valid_label = valid_label.to(device)
test_label  = test_label.to(device)

# Adjacency matrix of the 15-node skeleton graph
A = [[0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
	 [1,0,1,0,0,0,0,0,0,0,0,0,0,0,0],
	 [0,1,0,1,0,0,1,0,0,1,0,0,0,0,0],
	 [0,0,1,0,1,0,0,0,0,0,0,0,0,0,0],
	 [0,0,0,1,0,1,0,0,0,0,0,0,0,0,0],
	 [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0],
	 [0,0,1,0,0,0,0,1,0,0,0,0,0,0,0],
	 [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0],
	 [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
	 [0,0,1,0,0,0,0,0,0,0,1,0,1,0,0],
	 [0,0,0,0,0,0,0,0,0,1,0,1,0,0,0],
	 [0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
	 [0,0,0,0,0,0,0,0,0,1,0,0,1,0,0],
	 [0,0,0,0,0,0,0,0,0,0,0,1,0,1,0],
	 [0,0,0,0,0,0,0,0,0,0,0,0,1,0,0]]
A = torch.from_numpy(np.asarray(A)).to(device)

# Build the GCN model
model = GGCN(A, train_tensor.size(3), args.num_classes,
			 [train_tensor.size(3), train_tensor.size(3)*3], [train_tensor.size(3)*3, 16, 32, 64],
			 args.feat_dims, args.frame_nums, args.dropout_rate)
# print([train_tensor.size(3), train_tensor.size(3)*3], [train_tensor.size(3)*3, 16, 32, 64])

model = model.to(device)

# Count the model parameters
num_params = 0
for p in model.parameters():
	num_params += p.numel()
# print(model)

# Loss, optimizer and learning-rate decay
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = args.learning_rate,
					   betas=[args.beta1, args.beta2], weight_decay = args.weight_decay)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma = 0.1)

best_epoch = 0
best_acc = 0
def train():
	global best_epoch, best_acc

	# Resume training from a saved checkpoint
	if args.start_epoch:
		model.load_state_dict(torch.load(os.path.join(args.model_path, 'model-%d.pkl'%(args.start_epoch))))

	# Training
	for epoch in range(args.start_epoch, args.num_epochs):
		train_loss = 0
		train_acc  = 0
		model.train()  # model.train() enables Dropout and normalization training behaviour; model.eval() disables it


		for i, x in enumerate(train_loader):
			logit = model(x[0].float())   # model prediction
			target = train_label[i]  # ground-truth label (lines up because batch_size=1 and shuffle=False)
			loss = criterion(logit, target.view(1).long())   # compute the loss
			
			model.zero_grad()  # clear the gradients for every batch
			loss.backward()    # backpropagate the loss to compute gradients
			optimizer.step()   # update the network parameters from the gradients

			train_loss += loss.item()   # accumulate loss over the epoch
			train_acc  += accuracy(logit, target.view(1).long())   # accumulate accuracy over the epoch
		scheduler.step()   # update lr once per epoch, after the optimizer steps
		print('[epoch',epoch+1,'] Train loss:',train_loss/(i+1), 'Train Acc:',train_acc/(i+1))

		# Save the model
		if not os.path.exists(args.model_path):
			os.makedirs(args.model_path)
		if (epoch+1) % 20 ==0:
			torch.save(model.state_dict(), os.path.join(args.model_path, 'model-%d.pkl'%(epoch+1)))

		# Validate during training
		if (epoch+1) % args.val_step == 0:
			model.eval()
			val_loss = 0
			val_acc  = 0
			with torch.no_grad():
				for i, x in enumerate(valid_loader):
					logit = model(x[0].float())
					target = valid_label[i]

					val_loss += criterion(logit, target.view(1).long()).item()
					val_acc += accuracy(logit, target.view(1).long())

				# Keep a checkpoint only when validation accuracy does not get worse
				epoch_acc = val_acc/(i+1)
				if epoch_acc >= best_acc:
					best_acc = epoch_acc
					best_epoch = epoch+1
					torch.save(model.state_dict(), os.path.join(args.model_path, 'best_model-%d.pkl'%(best_epoch)))

			print('Val loss:',val_loss/(i+1), 'Val Acc:',val_acc/(i+1))

def test():
	global best_epoch
	
	model.load_state_dict(torch.load(os.path.join(args.model_path, 
												  'model-%d.pkl'%(best_epoch))))
	print("load model from 'model-%d.pkl'"%(best_epoch))

	model.eval()
	test_loss = 0
	test_acc  = 0
	with torch.no_grad():
		for i, x in enumerate(test_loader):
			star=time.time()
			logit = model(x[0].float())
			target = test_label[i]

			test_loss += criterion(logit, target.view(1).long()).item()
			test_acc  += accuracy(logit, target.view(1).long())

			end=int((time.time()-star)*1000)
			# print('pred:',torch.max(logit, 1)[1].float()
			# .cpu().numpy(),  'true:',target.cpu().numpy(),'time:',end, 'index:',i)
	print('Test loss:',test_loss/(i+1), 'Test Acc:',test_acc/(i+1))

if __name__ == '__main__':
	# if args.mode == 'train':
	# 	train()
	# elif args.mode == 'test':
	# 	best_epoch = args.test_epoch
	# 	test()
	
	train()
	# best_epoch = 200
	# test()
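
main.py looks up labels with `train_label[i]`, where `i` is the batch index; that only lines up with the data when `batch_size` is 1 and `shuffle` is False. One possible alternative, shown here as a sketch with made-up toy tensors rather than the author's setup, is to store features and labels together in the `TensorDataset` so that shuffling and larger batches keep them paired.

```python
import torch
import torch.utils.data as data

# Toy stand-ins for the tensors loaded from train.pkl (shapes are assumptions)
features = torch.randn(10, 64, 15, 3)   # (samples, frames, keypoints, channels)
labels = torch.randint(0, 4, (10,))     # one class index per sample

# Each dataset item is a (feature, label) pair, so batches stay aligned
dataset = data.TensorDataset(features, labels)
loader = data.DataLoader(dataset, batch_size=4, shuffle=True)

batches = []
for x, y in loader:
    # x: (batch, 64, 15, 3), y: (batch,)
    batches.append((x.shape, y.shape))
print(batches)
```

With this layout the loss call would take `y.long()` directly instead of `target.view(1).long()`.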

config.py

import argparse

def get_args():
	parser = argparse.ArgumentParser()

	parser.add_argument('--mode',  type=str, default='train')
	parser.add_argument('--test_epoch',type=int, default=80)
	
	parser.add_argument('--start_epoch',type=int, default=0)
	parser.add_argument('--num_epochs',type=int, default=100)
	parser.add_argument('--val_step',type=int, default=20)
	
	model_forder = "2+3+4+5+6_v3"
	data_forder = "2+3+4+5+6_v3"
	parser.add_argument('--train_path', type=str, default='dataset/test9/clear/'+str(data_forder)+'/train.pkl')
	parser.add_argument('--valid_path', type=str, default='dataset/test9/clear/'+str(data_forder)+'/valid.pkl')
	parser.add_argument('--test_path',  type=str, default='dataset/test9/clear/'+str(data_forder)+'/test.pkl')

	parser.add_argument('--model_path',  type=str, default='model/test9/clear/'+str(model_forder)+'/')

	parser.add_argument('--batch_size',  type=int, default=1)
	parser.add_argument('--learning_rate',type=float, default=0.01)
	parser.add_argument('--beta1',type=float, default=0.5)
	parser.add_argument('--beta2',type=float, default=0.99)
	parser.add_argument('--dropout_rate',type=float, default=0.5)
	parser.add_argument('--weight_decay',type=float, default=0.0)

	parser.add_argument('--frame_nums',type=int, default=64)   # frame_nums -> fc input size in layer.py: 32 -> 192, 64 -> 960, 128 -> 2496
	parser.add_argument('--num_classes',type=int, default=4)
	parser.add_argument('--feat_dims',type=int, default=13)
	
	
	args = parser.parse_args()

	return args

test.py

import os
import numpy as np
import torch
import torch.optim as optim
import torch.utils.data as data
import time
from model import *
from metric import accuracy
from test_config import get_args
args = get_args()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_tensor, train_label = torch.load(args.train_path)
valid_tensor, valid_label = torch.load(args.valid_path)
test_tensor , test_label  = torch.load(args.test_path)

train_loader = data.DataLoader(data.TensorDataset(train_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
valid_loader = data.DataLoader(data.TensorDataset(valid_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
test_loader  = data.DataLoader(data.TensorDataset(test_tensor.to(device)),
							   batch_size = args.batch_size, shuffle=False)
train_label = train_label.to(device)
valid_label = valid_label.to(device)
test_label  = test_label.to(device)

A = [[0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
	 [1,0,1,0,0,0,0,0,0,0,0,0,0,0,0],
	 [0,1,0,1,0,0,1,0,0,1,0,0,0,0,0],
	 [0,0,1,0,1,0,0,0,0,0,0,0,0,0,0],
	 [0,0,0,1,0,1,0,0,0,0,0,0,0,0,0],
	 [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0],
	 [0,0,1,0,0,0,0,1,0,0,0,0,0,0,0],
	 [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0],
	 [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
	 [0,0,1,0,0,0,0,0,0,0,1,0,1,0,0],
	 [0,0,0,0,0,0,0,0,0,1,0,1,0,0,0],
	 [0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
	 [0,0,0,0,0,0,0,0,0,1,0,0,1,0,0],
	 [0,0,0,0,0,0,0,0,0,0,0,1,0,1,0],
	 [0,0,0,0,0,0,0,0,0,0,0,0,1,0,0]]
A = torch.from_numpy(np.asarray(A)).to(device)

model = GGCN(A, train_tensor.size(3), args.num_classes, 
			 [train_tensor.size(3), train_tensor.size(3)*3], [train_tensor.size(3)*3, 16, 32, 64], 
			 args.feat_dims, args.frame_nums, args.dropout_rate)
model = model.to(device)

num_params = 0
for p in model.parameters():
	num_params += p.numel()
# print(model)
# print('The number of parameters: {}'.format(num_params))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = args.learning_rate,
					   betas=[args.beta1, args.beta2], weight_decay = args.weight_decay)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma = 0.1)

best_epoch = 0
best_acc = 0

def test():
	global best_epoch

	model.load_state_dict(torch.load(os.path.join(args.model_path, 
												  'model-%d.pkl'%(best_epoch))))
	print("load model from 'model-%d.pkl'"%(best_epoch))

	model.eval()
	test_loss = 0
	test_acc  = 0
	with torch.no_grad():
		for i, x in enumerate(test_loader):
			star=time.time()
			logit = model(x[0].float())
			target = test_label[i]

			test_loss += criterion(logit, target.view(1).long()).item()
			test_acc  += accuracy(logit, target.view(1).long())

			end=int((time.time()-star)*1000)
			# print('pred:',torch.max(logit, 1)[1].float().cpu().numpy(),  'true:',target.cpu().numpy(),'time:',end, 'index:',i)
	print('Test loss:',test_loss/(i+1), 'Test Acc:',test_acc/(i+1))

if __name__ == '__main__':
	# if args.mode == 'train':
	# 	train()
	# elif args.mode == 'test':
	# 	best_epoch = args.test_epoch
	# 	test()
	
	# train()
	best_epoch = 100
	test()

test_config.py

import argparse

def get_args():
	parser = argparse.ArgumentParser()

	parser.add_argument('--mode',  type=str, default='train')
	parser.add_argument('--test_epoch',type=int, default=80)

	parser.add_argument('--start_epoch',type=int, default=0)
	parser.add_argument('--num_epochs',type=int, default=100)
	parser.add_argument('--val_step',type=int, default=20)
	
	model_forder = "2+3+4+5+6_v1"
	data_forder = "2+3+4+5+6_v1"
	parser.add_argument('--train_path', type=str, default='dataset/test9/src/'+str(data_forder)+'/train.pkl')
	parser.add_argument('--valid_path', type=str, default='dataset/test9/src/'+str(data_forder)+'/valid.pkl')
	parser.add_argument('--test_path',  type=str, default='dataset/test9/src/'+str(data_forder)+'/test.pkl')

	parser.add_argument('--model_path',  type=str, default='model/test9/clear/'+str(model_forder)+'/')

	parser.add_argument('--batch_size',  type=int, default=1)
	parser.add_argument('--learning_rate',type=float, default=0.01)
	parser.add_argument('--beta1',type=float, default=0.5)
	parser.add_argument('--beta2',type=float, default=0.99)
	parser.add_argument('--dropout_rate',type=float, default=0.5)
	parser.add_argument('--weight_decay',type=float, default=0.0)

	parser.add_argument('--frame_nums',type=int, default=64)   # frame_nums -> fc input size in layer.py: 32 -> 192, 64 -> 960, 128 -> 2496
	parser.add_argument('--num_classes',type=int, default=4)
	parser.add_argument('--feat_dims',type=int, default=13)
	

	args = parser.parse_args()

	return args

model.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

from layer import GraphConvolution, StandConvolution, StandRecurrent

class GGCN(nn.Module):
	def __init__(self, adj, num_v, num_classes, gc_dims, sc_dims, feat_dims, frame_nums, dropout=0.5):
		super(GGCN, self).__init__()
		terminal_cnt = 5
		actor_cnt = 1
		self.frame_nums=frame_nums
		
		# torch.eye: returns a 2-D identity matrix (ones on the diagonal, zeros elsewhere) of shape adj.size(0) x adj.size(0)
		# detach(): returns a tensor cut from the current autograd graph; it shares data but carries no gradient (can help with GPU memory blow-up)
		adj = adj + torch.eye(adj.size(0)).to(adj).detach()   # [15,15], adjacency with self-loops
		ident = torch.eye(adj.size(0)).to(adj)   # [15,15]
		zeros = torch.zeros(adj.size(0), adj.size(1)).to(adj)   # [15,15]
		a=torch.cat([adj, ident, zeros], 1)   # three [15,15] blocks -> [15,45]
		b=torch.cat([ident, adj, ident], 1)   # three [15,15] blocks -> [15,45]
		c=torch.cat([zeros, ident, adj], 1)   # three [15,15] blocks -> [15,45]
		self.adj = torch.cat([a,b,c], 0).float()   # three [15,45] rows -> [45,45]


		# nn.Parameter: registers the tensor as a trainable model parameter
		# randn: samples from a normal distribution with mean 0 and variance 1
		self.terminal = nn.Parameter(torch.randn(terminal_cnt, actor_cnt, feat_dims))   #[5,1,13]
		self.gcl = GraphConvolution(gc_dims[0]+feat_dims, gc_dims[1], num_v, dropout=dropout) # input 3+13, output 9, num_v 3
		self.conv= StandConvolution(sc_dims, num_classes, dropout=dropout)   # head built from standard convolutions
		# self.conv= StandRecurrent(sc_dims, num_classes, dropout=dropout)   # LSTM head
		
		nn.init.xavier_normal_(self.terminal)   # Xavier normal initialization
		
	def forward(self, x):
		# F.interpolate(input, size, mode='nearest'):
		# resamples the last dimension to the given output size (6 here); the default mode is nearest
		head_la = F.interpolate(torch.stack([self.terminal[0],self.terminal[1]],2), 6)  # [1,13,6]
		head_ra = F.interpolate(torch.stack([self.terminal[0],self.terminal[2]],2), 6)  # [1,13,6]
		lw_ra = F.interpolate(torch.stack([self.terminal[3],self.terminal[4]],2), 6)  # [1,13,6]
		# print(lw_ra.shape)
		
		# Concatenate five [1,13,3] pieces along dim 2: node_features = [1,13,15]; x = [1,T,15,3] with T = frame_nums
		node_features = torch.cat([
								   (head_la[:,:,:3] + head_ra[:,:,:3])/2,   # [1,13,3]  mean of columns 0-2 of head_la and head_ra
								   torch.stack((lw_ra[:,:,2], lw_ra[:,:,1], lw_ra[:,:,0]), 2),  # [1,13,3]  columns 0-2 of lw_ra in reverse order
								   lw_ra[:,:,3:], head_la[:,:,3:], head_ra[:,:,3:]], 2).to(x)   # [1,13,3] each: columns 3-5
		
		
		# permute: reorders the dimensions; unsqueeze: inserts a size-1 dimension at the given position; repeat: tiles the tensor along each dimension
		# Shapes: [1,13,15] -> [1,15,13] -> [1,1,15,13] -> [1,T,15,13]
		node_features=node_features.permute(0,2,1).unsqueeze(1).repeat(1,self.frame_nums,1,1)
		# [1,T,15,3] + [1,T,15,13] -> [1,T,15,16]
		x = torch.cat((x, node_features), 3)
		# three shifted windows of [1,T-2,15,16] -> [1,T-2,45,16]
		concat_seq = torch.cat([x[:,:-2], x[:,1:-1], x[:,2:]], 2)
		
		# print(self.adj.shape, concat_seq.shape)
		multi_conv = self.gcl(self.adj, concat_seq)
		logit = self.conv(multi_conv)
		
		return logit
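
The two ideas in GGCN, a block adjacency matrix that links each frame to its neighbours and a sliding window that concatenates three consecutive frames, can be tried out on a tiny graph. The sketch below uses NumPy and a hypothetical 3-joint chain purely for illustration (`torch.einsum` accepts the same subscript strings); it is not the author's implementation.

```python
import numpy as np

def block_adjacency(adj):
    """Build the 3-frame block matrix [[A, I, 0], [I, A, I], [0, I, A]]."""
    n = adj.shape[0]
    a = adj + np.eye(n)        # add self-loops
    i = np.eye(n)              # links a joint to itself in the adjacent frame
    z = np.zeros((n, n))
    return np.block([[a, i, z], [i, a, i], [z, i, a]])

def normalize(m):
    """Symmetric normalization D^(-1/2) M D^(-1/2)."""
    d = m.sum(axis=0) ** -0.5
    return d[:, None] * m * d[None, :]

# Hypothetical skeleton: a chain of 3 joints
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float64)
big = normalize(block_adjacency(adj))                  # (9, 9)

rng = np.random.default_rng(0)
x = rng.random((1, 6, 3, 16))                          # (batch, frames, joints, channels)
window = np.concatenate([x[:, :-2], x[:, 1:-1], x[:, 2:]], axis=2)   # (1, 4, 9, 16)
weight = rng.random((16, 9))                           # channel projection

projected = np.einsum('abcd,de->abce', window, weight)   # (1, 4, 9, 9)
out = np.einsum('ij,abid->abjd', big, projected)         # mix features across graph nodes
print(big.shape, window.shape, out.shape)
```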
		

layer.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Graph convolution layer
class GraphConvolution(nn.Module):
	def __init__(self, input_dim, output_dim, num_vetex, act=F.relu, dropout=0.5, bias=True):
		super(GraphConvolution, self).__init__()

		self.alpha = 1.

		self.act = act
		self.dropout = nn.Dropout(dropout)
		# Create the parameters directly on the device so they stay registered, trainable nn.Parameters
		self.weight = nn.Parameter(torch.randn(input_dim, output_dim, device=device))  # [16,9]
		if bias:
			self.bias = nn.Parameter(torch.randn(output_dim, device=device))
		else:
			self.bias = None

		for w in [self.weight]:
			nn.init.xavier_normal_(w)

	# Symmetric normalization used by GCN: D^(-1/2) * M * D^(-1/2)
	def normalize(self, m):
		rowsum = torch.sum(m, 0)
		r_inv = torch.pow(rowsum, -0.5)
		r_mat_inv = torch.diag(r_inv).float()

		m_norm = torch.mm(r_mat_inv, m)
		m_norm = torch.mm(m_norm, r_mat_inv)

		return m_norm

	def forward(self, adj, x):
		x = self.dropout(x)  # [1,30,45,16]
		adj_norm = self.normalize(adj)  # [45,45]
		# torch.mm: an n×m matrix times an m×p matrix gives an n×p matrix
		sqr_norm = self.normalize(torch.mm(adj,adj))  # [45,45]
		m_norm = self.alpha*adj_norm + (1.-self.alpha)*sqr_norm  # [45,45]
		
		# torch.einsum keeps the computation in the autograd graph, so self.weight receives gradients
		x_tmp = torch.einsum('abcd,de->abce', x, self.weight)  # [1,30,45,16] x [16,9] -> [1,30,45,9]
		x_out = torch.einsum('ij,abid->abjd', m_norm, x_tmp)  # [1,30,45,9]
		
		if self.bias is not None:
			x_out = x_out + self.bias

		x_out = self.act(x_out)

		return x_out
		

# Classification head built from standard convolutions
class StandConvolution(nn.Module):
	def __init__(self, dims, num_classes, dropout):
		super(StandConvolution, self).__init__()
		# Conv2d output size: out = floor((in - kernel_size + 2*padding) / stride) + 1
		# Example: x = [10,16,30,32] with h=30, w=32; kernel size h:3, w:2; stride h:2, w:1; padding 0 (default)
		# h = floor((30 - 3 + 2*0) / 2) + 1 = 13 + 1 = 14
		# w = (32 - 2 + 2*0) / 1 + 1 = 30 + 1 = 31
		# batch = 10, out_channel = 33
		# so y = [10, 33, 14, 31]

		self.num_classes=num_classes
		self.dropout = nn.Dropout(dropout)
		self.conv = nn.Sequential(
								   nn.Conv2d(dims[0], dims[1], kernel_size=5, stride=2),
								   nn.InstanceNorm2d(dims[1]),
								   nn.ReLU(inplace=True),
								   # nn.AvgPool2d(5, stride=1),
								   nn.Conv2d(dims[1], dims[2], kernel_size=5, stride=2),
								   nn.InstanceNorm2d(dims[2]),
								   nn.ReLU(inplace=True),
								   #nn.AvgPool2d(3, stride=1),
								   nn.Conv2d(dims[2], dims[3], kernel_size=5, stride=2),
								   nn.InstanceNorm2d(dims[3]),
								   nn.ReLU(inplace=True),
								   #nn.AvgPool2d(3, stride=2)
								   ).to(device)

		# self.fc = nn.Linear(dims[3]*3, num_classes).to(device)
		self.fc = nn.Linear(960, num_classes).to(device)   # fc input size by frame_nums: 32 -> 192, 64 -> 960, 128 -> 2496
		

	def forward(self, x):
		x = self.dropout(x.permute(0,3,1,2))  # [1,9,30,45] for 32 frames, [1,9,62,45] for 64 frames
		x_tmp = self.conv(x)   # [1,64,1,3] for 32 frames, [1,64,5,3] for 64 frames
		# view: flattens in row-major order and regroups into the requested shape, e.g. [1,64,1,3] -> [1,192]
		x_tmp = x_tmp.view(x.size(0), -1)  # [1,64,1,3] -> [1,192]
		# print(x_tmp.shape)
		
		x_out = self.fc(x_tmp)   # [1,4], one logit per class
		
		return x_out
		
		return x_out


# Classification head built from an LSTM
class StandRecurrent(nn.Module):
	def __init__(self, dims, num_classes, dropout):
		super(StandRecurrent, self).__init__()
		self.lstm = nn.LSTM(dims[0]*45, dims[1], batch_first=True,dropout=0.5).to(device)
		self.fc = nn.Linear(dims[1], num_classes).to(device)

	def forward(self, x):
		x_tmp,_ = self.lstm(x.contiguous().view(x.size(0), x.size(1), -1))
		x_out = self.fc(x_tmp[:,-1])

		return x_out
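
The hard-coded `nn.Linear(960, ...)` input size in StandConvolution follows from the Conv2d output-size formula. The helper below is an illustrative sketch (the names `conv_out` and `fc_input_size` are hypothetical); it assumes three unpadded convolutions with kernel 5 and stride 2, an input height of `frame_nums - 2`, a width of 45 nodes, and 64 output channels, as in the code above.

```python
def conv_out(size, kernel=5, stride=2, padding=0):
    """Conv2d output length along one axis: floor((in + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def fc_input_size(frame_nums, num_nodes=45, channels=64, layers=3):
    """Flattened feature size after the three stride-2 convolutions."""
    h = frame_nums - 2   # the 3-frame sliding window shortens the sequence by 2
    w = num_nodes        # 3 frames x 15 keypoints
    for _ in range(layers):
        h, w = conv_out(h), conv_out(w)
    return channels * h * w

for n in (32, 64, 128):
    print(n, fc_input_size(n))
```

This reproduces the mapping noted next to `--frame_nums` in config.py (32 -> 192, 64 -> 960, 128 -> 2496), so the linear layer could be sized from `frame_nums` instead of being edited by hand.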

metric.py

import torch
from sklearn.metrics import accuracy_score

def accuracy(preds, target):
	# torch.max(a, dim): returns the maximum values of a along dim and their indices; dim=1 picks the highest-scoring class in each row
	preds = torch.max(preds, 1)[1].float()
	acc = accuracy_score(preds.cpu().numpy(), target.cpu().numpy())   # returns the accuracy

	# print(preds.cpu().numpy(), target.cpu().numpy())
	return acc
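
The same metric can also be computed without scikit-learn. The following is a small torch-only sketch with made-up logits; it is offered as an alternative illustration, not as the project's metric code.

```python
import torch

def accuracy(logits, target):
    """Fraction of rows whose arg-max class equals the target label."""
    preds = logits.argmax(dim=1)               # predicted class index per sample
    return (preds == target).float().mean().item()

# Toy batch: 3 samples, 4 classes
logits = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.1, 0.0],
                       [0.0, 0.1, 0.2, 3.0]])
target = torch.tensor([1, 0, 2])
acc = accuracy(logits, target)
print(acc)
```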

Source: https://blog.csdn.net/gm_Ergou/article/details/118393968