
Concept Drift: a Python Summary


Concept Drift

For my master's thesis I have skimmed roughly a dozen papers, so I now have a basic understanding of concept drift (abbreviated CD below).
First, a few points from my own understanding:

  1. CD means that the concept/mechanism behind the dataset has changed.
  2. CD comes in five types: abrupt, gradual, incremental, recurrent, and blip; their shapes are illustrated in the figure below.
  3. When CD occurs, what we want to do is: detect it + adapt our model.
  4. CD mainly matters in incremental-training scenarios.
  5. The main Python framework: River (the merger of scikit-multiflow and creme).
  6. The main Java framework: MOA (from the University of Waikato) --> worth studying properly later.
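
The detect-then-adapt loop from point 3 can be sketched without any framework. Below is a minimal from-scratch sketch of the classic DDM-style detector (the same family of detectors that River ships in its drift module); the class name, thresholds, and synthetic error stream are my own simplification, not River's API.

```python
import numpy as np

class DDM:
  """Simplified sketch of the Drift Detection Method (Gama et al., 2004).
  Consumes a stream of 0/1 prediction errors and signals drift when the
  running error rate rises significantly above its historical minimum."""
  def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
    self.min_samples = min_samples
    self.warn_level = warn_level
    self.drift_level = drift_level
    self.reset()

  def reset(self):
    self.n = 0
    self.p = 1.0                  # running error rate
    self.s = 0.0                  # its standard deviation
    self.p_min = float("inf")
    self.s_min = float("inf")

  def update(self, error):
    """error: 1 if the model misclassified this sample, else 0.
    Returns 'drift', 'warning', or 'stable'."""
    self.n += 1
    self.p += (error - self.p) / self.n          # incremental mean
    self.s = np.sqrt(self.p * (1 - self.p) / self.n)
    if self.n < self.min_samples:
      return "stable"
    if self.p + self.s < self.p_min + self.s_min:
      self.p_min, self.s_min = self.p, self.s    # track the best point seen
    if self.p + self.s > self.p_min + self.drift_level * self.s_min:
      self.reset()                               # confirmed drift: start over
      return "drift"
    if self.p + self.s > self.p_min + self.warn_level * self.s_min:
      return "warning"
    return "stable"

# synthetic error stream: 10% error rate for 300 samples, then 50%
rng = np.random.default_rng(0)
errors = np.concatenate([rng.random(300) < 0.1, rng.random(300) < 0.5])
detector = DDM()
alarms = [i for i, e in enumerate(errors) if detector.update(int(e)) == "drift"]
print(alarms)  # alarm indices; the first should land shortly after the drift at 300
```

In a real stream the `error` fed to `update` would come from comparing the deployed model's prediction against the eventually observed label, and a "drift" signal would trigger retraining.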

A Simple Python Implementation + sklearn Training

STAGGER: Introduction

This dataset contains three nominal attributes, namely size = {small, medium, large}, color = {red, green}, and shape = {circular, non-circular}. Before the first drift point, instances are labeled positive if (color = red) ∧ (size = small). After this point and before the second drift, instances are classified positive if (color = green) ∨ (shape = circular), and finally after this second drift point, instances are classified positive only if (size = medium) ∨ (size = large).

import numpy as np
from enum import Enum

class Size(Enum):
  small = 0
  medium = 1
  large = 2

class Color(Enum):
  red = 0
  green = 1

class Shape(Enum):
  circular = 0
  noncircular = 1

def generate_stagger(size=1000):
  """
  One data row: (0, 1, 1, 1) == (Size.small, Color.green, Shape.noncircular, True).
  With a small change we could also emit rows like (small, green, noncircular, True),
  but that would just add work.
  """
  # the three concepts; the stream drifts from rule1 to rule2 to rule3
  rule1 = lambda si, co, sh: co == Color.red.value and si == Size.small.value
  rule2 = lambda si, co, sh: co == Color.green.value or sh == Shape.circular.value
  rule3 = lambda si, co, sh: si == Size.medium.value or si == Size.large.value

  np.random.seed(10)

  stagger_ = []
  ans_ = []
  for rule in [rule1, rule2, rule3]:
    size_ = np.random.randint(0, 3, size=size // 3)   # size attribute
    color_ = np.random.randint(0, 2, size=size // 3)  # color attribute
    shape_ = np.random.randint(0, 2, size=size // 3)  # shape attribute

    ans = list(map(rule, size_, color_, shape_))
    ans = [1 if a else 0 for a in ans]  # True -> 1, False -> 0

    stagger_.extend(zip(size_, color_, shape_))
    ans_.extend(ans)

  return stagger_, ans_  # fixed: return all labels (ans_), not just the last segment's ans

Test result with plain batch training and GaussianNB: accuracy visibly drops around sample 300.
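
That drop can be reproduced in a few lines of sklearn. The sketch below is my own, with the first two STAGGER rules inlined so it is self-contained: GaussianNB is batch-trained on the first concept only and then scored before and after the drift point.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(10)

def segment(rule, n=333):
  """Draw one STAGGER segment labeled by the given concept."""
  size = rng.integers(0, 3, n)
  color = rng.integers(0, 2, n)
  shape = rng.integers(0, 2, n)
  X = np.column_stack([size, color, shape])
  y = np.array([int(rule(s, c, h)) for s, c, h in X])
  return X, y

rule1 = lambda s, c, h: c == 0 and s == 0   # positive iff red and small
rule2 = lambda s, c, h: c == 1 or h == 0    # positive iff green or circular
X1, y1 = segment(rule1)
X2, y2 = segment(rule2)

clf = GaussianNB().fit(X1, y1)   # batch-trained on the pre-drift concept only
acc_before = clf.score(X1, y1)
acc_after = clf.score(X2, y2)    # same model evaluated after the drift point
print(f"pre-drift acc={acc_before:.2f}, post-drift acc={acc_after:.2f}")
```

The stale model keeps predicting with the "red ∧ small" concept, so its accuracy collapses once the stream switches to "green ∨ circular".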

SIN1: Introduction

It consists of two attributes x and y uniformly distributed in [0, 1]. The classification function is y = sin(x). Instances are classified as positive if they are under the curve; otherwise they are classified as negative. At a drift point, the class labels are reversed.

import numpy as np

# create synthetic sin1 dataset
def generate_sin(size=1000):
  random_state = np.random.RandomState(42)
  # note: the quoted description says x is uniform in [0, 1]; the original
  # code draws it from a normal distribution instead, kept as-is here
  x = random_state.normal(0.5, 0.5, size)
  noise = random_state.normal(0.3, 0.2, size) - 0.3  # zero-mean noise
  y = np.sin(x) + noise
  # noise < 0 means the point (x, y) lies under the curve y = sin(x)
  target1 = [1 if n < 0 else 0 for n in noise[:size // 2]]
  # drift point at the halfway mark -> class labels are reversed
  target2 = [0 if n < 0 else 1 for n in noise[size // 2:]]
  target1.extend(target2)
  return list(zip(x, y)), target1

Test result: it feels like NB never really learned this one; accuracy just keeps dropping.
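
One simple remedy is to retrain on a window of post-drift data once the flip happens. The sketch below is my own, under my own assumptions: x and y uniform in [0, 1] as the quoted description specifies, and an arbitrary 200-sample retraining window.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(42)
size = 1000
x = rng.uniform(0, 1, size)          # uniform in [0, 1], per the description
y = rng.uniform(0, 1, size)
labels = (y < np.sin(x)).astype(int) # positive iff the point is under the curve
half = size // 2
labels[half:] = 1 - labels[half:]    # drift: class labels reversed at midpoint

X = np.column_stack([x, y])

# stale model: trained before the drift, evaluated after it
stale = GaussianNB().fit(X[:half], labels[:half])
acc_stale = stale.score(X[half:], labels[half:])

# adapted model: retrained on a 200-sample post-drift window
retrained = GaussianNB().fit(X[half:half + 200], labels[half:half + 200])
acc_fresh = retrained.score(X[half + 200:], labels[half + 200:])
print(f"stale model acc={acc_stale:.2f}, retrained acc={acc_fresh:.2f}")
```

After the label flip the stale model scores below chance, while the retrained one recovers; in practice the retraining trigger would come from a drift detector rather than from knowing the drift point.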

Source: https://www.cnblogs.com/niemand-01/p/14195363.html