
Knowledge Tracing - Knowledge Tracing Machines: Factorization Machines for Knowledge Tracing


Notes on papers read in the knowledge tracing field.

What This Paper Studies

The paper applies factorization machines (FMs), which exploit pairwise feature interactions, to predict whether an exercise will be answered correctly. The core question is how to construct the features. Users, items, and skills are one-hot encodings of ID-type features (skills is a multi-hot vector), representing the user ID, the item ID, and the IDs of the skills the item involves. The Wins and Fails variables are the heart of the design: they count the correct and incorrect past attempts on each skill of the current item, accumulated only over the skills that the current item shares with the historical attempts (see the encoding sketch in the Features section below).

Another paper by the authors, still to read:
Deep Factorization Machines for Knowledge Tracing

Related Work

Limitations of BKT

It cannot model questions that involve multiple knowledge components.
A recent BKT variant, feature-aware student tracing (FAST), addresses this by modeling multiple knowledge components simultaneously.

DKT

Wilson, K. H.; Karklin, Y.; Han, B.; and Ekanadham, C. 2016a. Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation. In Proceedings of the 9th International Conference on Educational Data Mining (EDM), 539–544.

This study shows that some factor analysis models can match the performance of DKT.

Factor Analysis

Factor analysis starts from an assumption: all the observed variables x arise because there is a latent variable f behind them, the factor, and it is under the influence of this factor that x can be observed.
For example, suppose a student scores full marks in math, chemistry, and physics. We would then say this student has strong analytical reasoning; analytical reasoning is a factor, and it is under its influence that the science scores are so high. That is factor analysis.

Item Response Theory

Representative model: the Rasch model

$$\operatorname{logit}\, p_{ij} = \theta_i - d_j$$

$\theta_i$: the ability of student i (the student bias)
$d_j$: the difficulty of question j (the question bias)
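
As a quick, hypothetical illustration (not from the paper), the Rasch probability can be computed directly; the example values for $\theta$ and $d$ are made up:

```python
import numpy as np

def rasch_prob(theta_i, d_j):
    """Rasch / 1-PL IRT: logit p = theta_i - d_j."""
    return 1.0 / (1.0 + np.exp(-(theta_i - d_j)))

# Made-up values: a fairly able student on a fairly easy question
print(rasch_prob(1.2, -0.5))  # ~0.85
```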

Wilson, K. H.; Xiong, X.; Khajah, M.; Lindsey, R. V.; Zhao, S.; Karklin, Y.; Van Inwegen, E. G.; Han, B.; Ekanadham, C.; Beck, J. E.; et al. 2016b. Estimating student proficiency: Deep learning is not the panacea. Presented at the Workshop on Machine Learning for Education, Neural Information Processing Systems.

This study shows that even without temporal features, IRT can outperform DKT, possibly because DKT has too many parameters and overfits easily.

Multidimensional Item Response Theory (MIRT)

$$\operatorname{logit}\, p_{ij} = \langle \theta_i, d_j \rangle + \delta_j$$

$\theta_i$: the multidimensional ability of student i
$d_j$: the multidimensional discrimination of item j
$\delta_j$: the easiness of item j (the item bias)
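
Similarly, a minimal MIRT sketch; the 2-dimensional ability and discrimination vectors are made-up example values:

```python
import numpy as np

def mirt_prob(theta_i, d_j, delta_j):
    """MIRT: logit p = <theta_i, d_j> + delta_j."""
    return 1.0 / (1.0 + np.exp(-(np.dot(theta_i, d_j) + delta_j)))

# Made-up 2-dimensional ability and discrimination vectors
print(mirt_prob(np.array([0.8, -0.2]), np.array([1.0, 0.5]), 0.3))  # ~0.73
```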

Additive factor model (AFM)

AFM takes into account the number of attempts a learner has made on an item:

$$\operatorname{logit}\, p_{ij} = \sum_k q_{jk}\,(\beta_k + \gamma_k N_{ik})$$

$\beta_k$: the bias for skill k
$\gamma_k$: the bias for each opportunity to learn skill k
$N_{ik}$: the number of times student i attempted a question that requires skill k

AFM papers to read:

Cen, H.; Koedinger, K.; and Junker, B. 2006. Learning factors analysis: a general method for cognitive model evaluation and improvement. In International Conference on Intelligent Tutoring Systems, 164–175. Springer.

Cen, H.; Koedinger, K.; and Junker, B. 2008. Comparing two IRT models for conjunctive skills. In International Conference on Intelligent Tutoring Systems, 796–798. Springer.

Performance factor analysis model (PFA)

PFA counts positive and negative attempts separately:

$$\operatorname{logit}\, p_{ij} = \sum_k q_{jk}\,(\beta_k + \gamma_k W_{ik} + \delta_k F_{ik})$$

$\beta_k$: the bias for skill k
$\gamma_k$ ($\delta_k$): the bias for each opportunity to learn skill k after a successful (unsuccessful) attempt
$W_{ik}$ ($F_{ik}$): the number of successes (failures) of student i on questions that require skill k
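
A minimal sketch of the PFA predictor under the definitions above (my reconstruction, not the authors' code); the q-matrix row, counters, and all parameter values are hypothetical. Setting $\gamma_k = \delta_k$ recovers AFM:

```python
import numpy as np

def pfa_prob(q_j, beta, gamma, delta, W_i, F_i):
    """PFA: logit p = sum_k q_jk * (beta_k + gamma_k * W_ik + delta_k * F_ik).
    With gamma == delta this reduces to AFM, since N_ik = W_ik + F_ik."""
    logit = np.sum(q_j * (beta + gamma * W_i + delta * F_i))
    return 1.0 / (1.0 + np.exp(-logit))

# Made-up example with s = 3 skills; the question requires skills 0 and 2
q_j   = np.array([1., 0., 1.])      # q-matrix row of question j
beta  = np.array([0.2, -0.1, 0.4])  # skill biases
gamma = np.array([0.3, 0.3, 0.2])   # learning gain after successes
delta = np.array([0.1, 0.1, 0.05])  # learning gain after failures
W_i   = np.array([2., 0., 1.])      # past successes of student i per skill
F_i   = np.array([1., 0., 0.])      # past failures of student i per skill
print(pfa_prob(q_j, beta, gamma, delta, W_i, F_i))
```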

PFA paper to read:

Pavlik, P. I.; Cen, H.; and Koedinger, K. R. 2009. Performance factors analysis: a new alternative to knowledge tracing. In Proceedings of the 2009 Conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling, 531–538. IOS Press.

In fact, AFM is a special case of PFA, namely the case $\gamma_k = \delta_k$.

Wilson, K. H.; Xiong, X.; Khajah, M.; Lindsey, R. V.; Zhao, S.; Karklin, Y.; Van Inwegen, E. G.; Han, B.; Ekanadham, C.; Beck, J. E.; et al. 2016b. Estimating student proficiency: Deep learning is not the panacea. Presented at the Workshop on Machine Learning for Education, Neural Information Processing Systems.

Xiong, X.; Zhao, S.; Inwegen, E. V.; and Beck, J. 2016. Going deeper with deep knowledge tracing. In Proceedings of the 9th International Conference on Educational Data Mining (EDM), 545–550.

The studies above show that DKT and PFA perform comparably.

Factorization Machines

Thai-Nghe, N.; Drumond, L.; Horváth, T.; and Schmidt-Thieme, L. 2012. Using factorization machines for student modeling. In Proceedings of FactMod 2012 at the 20th Conference on User Modeling, Adaptation, and Personalization (UMAP 2012).

Sweeney, M.; Lester, J.; Rangwala, H.; and Johri, A. 2016. Next-term student performance prediction: A recommender systems approach. Journal of Educational Data Mining 8(1):22–51.

The two papers above use FMs for regression problems in student modeling.

This paper uses FMs for the classification problem in student modeling.

Knowledge Tracing Machines

KTMs model the binary outcome of an event (correct or incorrect) from a sparse set of weights over all the features involved in that event. The features involved in an event are encoded in a sparse vector $x$ of length N such that $x_i > 0$ only if feature $1 \le i \le N$ is involved in the event. For each event involving $x$, the probability $p(x)$ of observing a positive outcome verifies:

$$\psi(p(x)) = \mu + \sum_{i=1}^{N} w_i x_i + \sum_{1 \le i < j \le N} x_i x_j \langle v_i, v_j \rangle$$

$\mu$: a global bias
$w$: the vector of biases $(w_1, \ldots, w_N)$
$V$: the matrix of embeddings $v_i$, $i = 1, \ldots, N$

Each feature i is modeled by a bias $w_i \in \mathbb{R}$ and an embedding $v_i \in \mathbb{R}^d$ for some dimension d.
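
A minimal sketch of this prediction rule (assumed notation, not the authors' code), using the standard FM identity that rewrites the pairwise sum in O(Nd) time:

```python
import numpy as np

def ktm_predict(x, mu, w, V):
    """psi(p(x)) = mu + <w, x> + sum_{i<j} x_i x_j <v_i, v_j>, with psi = logit.
    Uses sum_{i<j} x_i x_j <v_i, v_j>
      = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    linear = mu + w @ x
    Vx = V.T @ x                                      # shape (d,)
    pair = 0.5 * (Vx @ Vx - ((V**2).T @ (x**2)).sum())
    return 1.0 / (1.0 + np.exp(-(linear + pair)))

# Made-up toy sizes: N = 6 features, d = 2 factors
rng = np.random.default_rng(0)
x  = np.array([1., 0., 1., 0., 2., 0.])  # sparse event encoding
mu = 0.1
w  = rng.normal(size=6)                  # feature biases
V  = rng.normal(scale=0.1, size=(6, 2))  # feature embeddings
print(ktm_predict(x, mu, w, V))
```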

Features

The data are encoded with the features above (feature length N = m + n + 3s):
[Image: example encoding of user, item, skill, win, and fail features]
An example: in round 1, User2 answers item2 with outcome 1, earning credit on skills 1 and 2.
In round 2, User2 answers item2 with outcome 0; the previous attempt makes Wins1 and Wins2 equal 1 in this round.
In round 3, User2 answers item2 with outcome 1; the previous attempts make Wins1 and Wins2 equal 1 and Fails1 and Fails2 equal 1 in this round.
The encoding continues in this way, as sketched below.
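
A minimal sketch of how such an encoding could be produced (my reconstruction, not the authors' code); the toy q-matrix and event log mirror the three rounds above:

```python
import numpy as np

n_users, n_items, n_skills = 2, 2, 2   # toy sizes matching the example
q = {2: [1, 2]}                        # q-matrix: item2 requires skills 1 and 2

def encode(user, item, wins, fails):
    """One row of the N = m + n + 3s encoding: user one-hot | item one-hot |
    skill multi-hot | per-skill win counters | per-skill fail counters."""
    x = np.zeros(n_users + n_items + 3 * n_skills)
    x[user - 1] = 1
    x[n_users + item - 1] = 1
    for k in q[item]:
        x[n_users + n_items + k - 1] = 1                         # skill
        x[n_users + n_items + n_skills + k - 1] = wins[k]        # Wins
        x[n_users + n_items + 2 * n_skills + k - 1] = fails[k]   # Fails
    return x

wins, fails = {1: 0, 2: 0}, {1: 0, 2: 0}
for outcome in [1, 0, 1]:              # the three rounds of User2 on item2
    print(encode(2, 2, wins, fails), "-> outcome", outcome)
    for k in q[2]:                     # update counters only on the item's skills
        wins[k] += outcome
        fails[k] += 1 - outcome
```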

Relation to Existing Models

With $\psi$ = logit and d = 0, only biases are learned for the features, no embeddings.

If the n student features have bias $w_i = \theta_i - \mu$ and the m question features have bias $-d_j$, the KTM becomes the 1-PL IRT model, i.e. the Rasch model. In that case $w = (\theta_1 - \mu, \ldots, \theta_n - \mu, -d_1, \ldots, -d_m)$.

If $w = (\beta_1, \ldots, \beta_s, \gamma_1, \ldots, \gamma_s, \delta_1, \ldots, \delta_s)$ and the encoding of "student i attempted question j" is given by $x = (q_{j1}, \ldots, q_{js}, q_{j1}W_{i1}, \ldots, q_{js}W_{is}, q_{j1}F_{i1}, \ldots, q_{js}F_{is})$, where $W_{ik}$ and $F_{ik}$ are the counters of successful and failed attempts at the skill level, then the KTM becomes the PFA model.
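
To make the reduction concrete, here is a small standalone check with hypothetical values that a d = 0 KTM over this encoding reproduces the PFA logit exactly:

```python
import numpy as np

# Made-up parameters for 3 skills
beta  = np.array([0.2, -0.1, 0.4])
gamma = np.array([0.3, 0.3, 0.2])
delta = np.array([0.1, 0.1, 0.05])
q_j = np.array([1., 0., 1.])             # skills required by question j
W_i = np.array([2., 0., 1.])             # successes of student i per skill
F_i = np.array([1., 0., 0.])             # failures of student i per skill

# KTM with d = 0: logit p = mu + <w, x>, taking mu = 0 here
w = np.concatenate([beta, gamma, delta])
x = np.concatenate([q_j, q_j * W_i, q_j * F_i])
ktm_logit = w @ x

# PFA directly: sum_k q_jk * (beta_k + gamma_k W_ik + delta_k F_ik)
pfa_logit = np.sum(q_j * (beta + gamma * W_i + delta * F_i))

print(np.isclose(ktm_logit, pfa_logit))  # True
```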

Training

KTMs are trained by minimizing the negative log-likelihood (NLL) over all S observed samples:

$$NLL(X, y) = -\sum_{i=1}^{S} \left[ y_i \log p(x_i) + (1 - y_i) \log \left(1 - p(x_i)\right) \right]$$

$X = (x_i)_{1 \le i \le S}$: the sample features
$y = (y_i)_{1 \le i \le S} \in \{0, 1\}^S$: the observed outcomes
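
Concretely, this NLL is the usual binary cross-entropy; a short sketch with made-up probabilities:

```python
import numpy as np

def nll(p, y):
    """NLL = -sum_i [ y_i * log p(x_i) + (1 - y_i) * log(1 - p(x_i)) ]."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up predictions vs. outcomes: confident and correct -> small NLL
print(nll(np.array([0.9, 0.2]), np.array([1, 0])))  # ~0.33
```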
To guide training and avoid overfitting, priors are assumed on the model parameters:
bias $w_k$: $w_k \sim \mathcal{N}(\mu, 1/\lambda)$
embedding component $v_{kf}$, $f = 1, \ldots, d$: $v_{kf} \sim \mathcal{N}(\mu, 1/\lambda)$
$\mu$ and $\lambda$: regularization parameters, following the hyperpriors $\mu \sim \mathcal{N}(0, 1)$ and $\lambda \sim \Gamma(1, 1)$
Thanks to these hyperpriors, the regularization parameters do not need to be tuned by hand. When $\psi$ = probit, i.e. the inverse CDF of the standard normal distribution, the model can be fitted with Gibbs sampling.

The model is learned using the MCMC Gibbs sampler implementation of libFM in C++, through the pywFM Python wrapper.
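
A minimal usage sketch along the lines of pywFM's README; the toy data and hyperparameter values are hypothetical, and libFM must be installed with the LIBFM_PATH environment variable pointing at its binaries:

```python
import numpy as np
import pywFM  # wrapper around libFM; requires LIBFM_PATH to point at libFM's bin/

# Made-up toy data: rows are sparse event encodings, y the binary outcomes
X_train = np.array([[1., 0., 1., 0.], [0., 1., 1., 1.]])
y_train = np.array([1, 0])
X_test  = np.array([[1., 0., 1., 1.]])
y_test  = np.array([1])

# MCMC (Gibbs sampling) is pywFM's default learning method, as in the paper
fm = pywFM.FM(task='classification', num_iter=100, k2=8)
model = fm.run(X_train, y_train, X_test, y_test)
print(model.predictions)  # predicted probabilities for X_test
```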

KTMs also make it possible to visualize the learned embeddings.
[Image: visualization of the learned embeddings]

Datasets

[Image: datasets used in the experiments]

Experimental Results

[Image: experimental results]

Summary

This paper introduces KTMs, which subsume several classic EDM models and address knowledge tracing as a classification problem. Even when the observed data are sparse, KTMs can estimate user and item parameters and give better predictions than existing models.

Future Directions

Improve the encoding of the features in KTMs according to how the data were collected.

Code Implementation

Source: https://blog.csdn.net/CZYruobing/article/details/114188091