
Quantitative Trading with RiceQuant: Factor Data Processing -- Market-Cap Neutralization

2021-10-13 16:30:29 Author: Internet

Factor Data Processing -- Market-Cap Neutralization

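1. Why Neutralize

Neutralization keeps the selected stocks from concentrating in the same fixed group of stocks.

Market-cap effect: by default, assume that most factors carry some market-cap influence, and remove that influence from the other factors.
Removing the market-cap effect <==> removing a factor's correlation with market cap.
The removal is done with the regression method.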

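Build the regression equation: x (feature: market cap) * w + b = y (a given factor)

Predict with the regression equation: market cap (x) * coefficient (w) + bias (b) = predicted value (y_predict)

Compute the residual: factor (y) - predicted value (y_predict) = residual

Residual: the part of the factor that is not affected by market cap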
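
The three steps above can be sketched on synthetic data. This is only an illustration under stated assumptions, not RiceQuant code: the arrays `market_cap` and `factor` are made up, and `np.polyfit` stands in for the regression fit used later in this post.

```python
import numpy as np

# Synthetic data (assumption for illustration): a factor that partly depends on market cap.
rng = np.random.default_rng(0)
market_cap = rng.uniform(1.0, 100.0, size=500)               # x: feature
factor = 0.03 * market_cap + rng.normal(0.0, 1.0, size=500)  # y: target

# Step 1: fit y = w * x + b by ordinary least squares.
w, b = np.polyfit(market_cap, factor, deg=1)

# Step 2: predict the part of the factor explained by market cap.
y_predict = w * market_cap + b

# Step 3: the residual is the part of the factor not explained by market cap.
residual = factor - y_predict

corr_before = np.corrcoef(market_cap, factor)[0, 1]
corr_after = np.corrcoef(market_cap, residual)[0, 1]
print(f"correlation with market cap before: {corr_before:.3f}, after: {corr_after:.1e}")
```

Because the fit includes an intercept, the residual is uncorrelated with market cap up to floating-point error, which is what "removing the market-cap effect" means here.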

2. Regression API

from sklearn.linear_model import LinearRegression

Use market cap as the feature; market cap itself is not preprocessed in any way.
Use the other factor as the target.
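
A minimal sketch of how these two roles map onto `LinearRegression`, assuming scikit-learn and pandas are installed. The helper name `neutralize` and the sample values are hypothetical and not part of any RiceQuant API.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def neutralize(factor: pd.Series, market_cap: pd.Series) -> pd.Series:
    """Return the residual of `factor` after regressing it on `market_cap`."""
    # scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
    x = market_cap.to_numpy().reshape(-1, 1)
    y = factor.to_numpy()
    lr = LinearRegression()
    lr.fit(x, y)
    # Residual = actual factor value minus the value predicted from market cap.
    return pd.Series(y - lr.predict(x), index=factor.index, name=factor.name)


# Made-up sample values, for illustration only.
codes = ["A", "B", "C", "D", "E"]
market_cap = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=codes)
pb_ratio = pd.Series([1.2, 0.8, 2.5, 1.9, 3.1], index=codes, name="pb_ratio")

pb_neutral = neutralize(pb_ratio, market_cap)
print(pb_neutral)
```

Returning a Series with the original index keeps the neutralized values aligned with their stock codes, which matters when the result is written back into a DataFrame.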

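3. Case: Removing the Part of the Price-to-Book Ratio Linked to Market Cap

3.1 Analysis

Fetch the data for the two factors.
Winsorize and standardize the target factor, the price-to-book ratio (pb_ratio).
Fit a regression of pb_ratio on market cap.
Use the fitted coefficients to compute the predicted factor value y_predict.
Take the difference between pb_ratio and y_predict; this residual is the new factor value.

3.2 Code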

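# Note: mad() and stand() are the helper functions defined in section 4.2
import numpy as np
from sklearn.linear_model import LinearRegression

# Get all stock codes
stocks = all_instruments('CS').order_book_id

# 1. Fetch the two factors: price-to-book ratio and market cap
fund = get_factor(stocks, factor=['pb_ratio', 'market_cap'], start_date='20180103', end_date='20180103')
# Drop the date level of the index
fund = fund.reset_index(1, drop=True)
# Drop rows containing NaN
fund = fund.dropna()

# 2. Preprocess the factor: 3x MAD winsorization, then standardization
fund['pb_ratio'] = mad(fund['pb_ratio'])
fund['pb_ratio'] = stand(fund['pb_ratio'])

# 3. Choose the feature and the target for the regression
# The feature passed to fit() must be two-dimensional
x = np.array(fund['market_cap']).reshape(-1, 1)
y = fund['pb_ratio']

# 4. Fit the linear regression
lr = LinearRegression()
lr.fit(x, y)

# 5. Predict, then subtract the prediction from the actual factor value;
# the residual is the neutralized factor
y_predict = lr.predict(x)
res = y - y_predict
fund['pb_ratio'] = res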

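3.3 Summary

Why: to keep stock selection from concentrating during backtests
How: fit a regression relationship

3.4 Stock Selection With and Without Market-Cap Neutralization

With market-cap neutralization: selections are periodically spread across different stocks
Without market-cap neutralization: selections stay concentrated in certain stocks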
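
One way to see this contrast is a small simulation. The sketch below uses synthetic data only; the assumed relationship (larger companies tend to have a lower factor value) and the helper `pick_lowest` are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
market_cap = rng.uniform(1.0, 100.0, size=n)
# Assumed relationship: the factor falls as market cap rises, plus noise.
factor = -0.05 * market_cap + rng.normal(0.0, 1.0, size=n)

# Neutralize: regress the factor on market cap and keep the residual.
w, b = np.polyfit(market_cap, factor, deg=1)
neutral = factor - (w * market_cap + b)


def pick_lowest(values, fraction=0.05):
    """Indices of the stocks whose value falls in the lowest `fraction`."""
    threshold = np.quantile(values, fraction)
    return np.flatnonzero(values <= threshold)


raw_picks = pick_lowest(factor)
neutral_picks = pick_lowest(neutral)

raw_mean_cap = market_cap[raw_picks].mean()
neutral_mean_cap = market_cap[neutral_picks].mean()
print(f"mean market cap of picks, raw factor: {raw_mean_cap:.1f}")
print(f"mean market cap of picks, neutralized factor: {neutral_mean_cap:.1f}")
```

With the raw factor, the lowest-value picks cluster among the largest companies. After neutralization, the picks are no longer driven by market cap, so their average market cap moves back toward the middle of the range.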

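4. Case: Backtesting the Neutralized Factor

4.1 Neutralizing pb_ratio

1. Fetch the market-cap and price-to-book factor data
- Factor processing: winsorization, standardization, neutralization
2. Select the stock pool
- Stocks with a low price-to-book ratio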
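
The "low price-to-book" selection in step 2 can be expressed as a quantile cut. A minimal sketch with made-up values, assuming pandas is available; the stock codes `S01` to `S40` are placeholders, not real order book IDs.

```python
import pandas as pd

# Made-up (already neutralized) pb_ratio values for 40 hypothetical stocks.
codes = [f"S{i:02d}" for i in range(1, 41)]
pb = pd.Series([float(v) for v in range(1, 41)], index=codes)

# Keep the stocks at or below the 5% quantile of the factor.
threshold = pb.quantile(0.05)
selected = pb[pb <= threshold]
print(threshold, list(selected.index))
```

With 40 evenly spaced values the 5% quantile falls between the second and third smallest, so two stocks are kept.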

4.2 Code

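# Strategy outline:
#  1. Fetch the market-cap and price-to-book factor data
#   + Factor processing: winsorization, standardization, neutralization
#  2. Select the stock pool
#   + Stocks with a low price-to-book ratio
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

# Write any initialization logic in this function. The context object is passed between all functions of your strategy.
def init(context):
    # Get all stock codes
    # context.stocks = all_instruments('CS').order_book_id
    stocks = index_components('000300.XSHG')  # CSI 300 constituents
    context.stocks = stocks
    # Run once a month, on the first trading day
    scheduler.run_monthly(get_data, tradingday=1)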

def get_data(context, bar_dict):
    # Query the data for the two factors

    # Fetch the factors: price-to-book ratio and market cap
    fund = get_factor(context.stocks, ['pb_ratio', 'market_cap'])

    # Drop the date level of the index
    fund = fund.reset_index(1, drop=True)

    # Drop rows containing NaN
    fund = fund.dropna()

    context.fund = fund
    # Process the factor data: winsorization, standardization, market-cap neutralization
    treat_data(context)

    # Assumption of the strategy: stocks with a low price-to-book ratio perform well
    quantile = context.fund['pb_ratio'].quantile(0.05)  # 5% quantile
    fund_select = context.fund[context.fund['pb_ratio'] <= quantile]  # keep the low price-to-book stocks

    # Sort by market cap, ascending
    # fund_select = fund_select.sort_values(by='market_cap', ascending=True).head(10)

    # Get the list of stock codes
    context.stock_list = fund_select.index.values

    # Remove stocks with an abnormal contract status
    filter_active(context)
    update_universe(context.stock_list)

    print(context.stock_list)
    # print(fund)


def treat_data(context):
    """
    Process the factor data: winsorization, standardization, market-cap neutralization
    """
    context.fund['pb_ratio'] = mad(context.fund['pb_ratio'])
    context.fund['pb_ratio'] = stand(context.fund['pb_ratio'])

    # Market-cap neutralization: regress pb_ratio on market cap and keep the residual
    x = np.array(context.fund['market_cap']).reshape(-1, 1)
    y = context.fund['pb_ratio']

    lr = LinearRegression()
    lr.fit(x, y)

    y_predict = lr.predict(x)
    res = y - y_predict
    context.fund['pb_ratio'] = res


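# before_trading is called once per day, before the strategy starts trading
def before_trading(context):
    pass


# Data updates for the selected securities trigger this logic, e.g. daily or minute bars, or real-time data slices
def handle_bar(context, bar_dict):
    # get_data may not have run yet, so fall back to an empty stock pool
    stock_list = getattr(context, 'stock_list', [])

    # Sell positions that are no longer in the current stock pool
    for stock in list(context.portfolio.positions.keys()):
        if stock not in stock_list:
            order_target_percent(stock, 0)

    # Nothing to buy if the pool is empty (this also avoids dividing by zero)
    if len(stock_list) == 0:
        return

    # Buy the stocks in the updated pool with equal weights:
    # total portfolio value is split evenly across them
    weight = 1.0 / len(stock_list)
    for stock in stock_list:
        order_target_percent(stock, weight)

# after_trading is called once per day, after trading ends
def after_trading(context):
    pass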

def filter_active(context):
    # Remove stocks with an abnormal contract status
    stock_list = []
    for stock in context.stock_list:
        # The next three values are fetched for inspection only;
        # the filter below uses only `status`
        is_st = is_st_stock(stock, count=1)
        day_from = instruments(stock).days_from_listed()
        day_expire = instruments(stock).days_to_expire()
        status = instruments(stock).status
        # print(stock, is_st, day_from, day_expire, status)

        if status != 'Active':
            continue
        stock_list.append(stock)

    context.stock_list = np.array(stock_list)

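def mad(factor, n=3):
    """
    Winsorize with the median absolute deviation (MAD) method, n = 3 by default
    """
    # 1. Find the median
    med = np.median(factor)

    # 2. Compute the absolute deviations from the median
    # 3. Take the median of the absolute deviations: MAD
    mad_value = np.median(abs(factor - med))

    # Upper and lower bounds
    up = med + (1.4826 * mad_value) * n  # MAD_e = 1.4826 * MAD, scaled by the parameter n
    down = med - (1.4826 * mad_value) * n

    # Clip values outside the bounds
    factor = np.where(factor > up, up, factor)
    factor = np.where(factor < down, down, factor)

    return factor

def stand(factor):
    """
    Hand-written standardization (z-score)
    """
    mean = factor.mean()
    std = factor.std()

    return (factor - mean) / std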
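
To make the two helpers concrete, here is a small worked example on a made-up sample with one outlier. It recomputes the same quantities inline, so it does not depend on the strategy code. `np.clip` is used as a shorthand for the two `np.where` calls above.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up sample with one outlier

med = np.median(values)                      # median of the sample: 3.0
mad_value = np.median(np.abs(values - med))  # deviations are [2, 1, 0, 1, 97], so MAD is 1.0
up = med + 3 * 1.4826 * mad_value            # upper bound: 7.4478
down = med - 3 * 1.4826 * mad_value          # lower bound: -1.4478

# Values outside [down, up] are pulled back to the nearest bound.
clipped = np.clip(values, down, up)

# Standardize to zero mean and unit standard deviation.
standardized = (clipped - clipped.mean()) / clipped.std()
print(clipped)
```

Only the outlier 100.0 is changed; it is pulled down to the upper bound. One detail worth knowing: `std()` on a NumPy array divides by n, while `std()` on a pandas Series divides by n - 1 by default, so the standardized values differ slightly depending on which type is passed to `stand`.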

Strategy without neutralization
[Figure omitted]

Strategy with neutralization
[Figure omitted]

4.3 Stock Selection with the Price-to-Book Factor

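# Multi-factor selection
# Remove the market-cap effect from four factors
pcf_ratio pe_ratio revenue operating_revenue

With market-cap effect (concentrated selection) [01, 02, 06, 10, 09] ==> [01, 02, 06, 10, 09]
Effect removed (dispersed selection) [01, 02, 06, 10, 09] ==> [03, 05, 06, 11, 09]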
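
Extending the single-factor case to these four factors could look like the sketch below. It is an illustration on synthetic data: only the four factor names come from this post, while the helper `neutralize_columns`, the generated values, and the use of `np.polyfit` are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
factor_names = ["pcf_ratio", "pe_ratio", "revenue", "operating_revenue"]

# Synthetic data: each factor carries a different amount of market-cap influence.
market_cap = rng.uniform(1.0, 100.0, size=n)
data = pd.DataFrame({"market_cap": market_cap})
for i, name in enumerate(factor_names, start=1):
    data[name] = 0.01 * i * market_cap + rng.normal(0.0, 1.0, size=n)


def neutralize_columns(df, columns, cap_column="market_cap"):
    """Replace each listed column with its residual after regressing on market cap."""
    out = df.copy()
    x = df[cap_column].to_numpy()
    for col in columns:
        y = df[col].to_numpy()
        w, b = np.polyfit(x, y, deg=1)  # fit y = w * x + b
        out[col] = y - (w * x + b)      # keep only the residual
    return out


neutral = neutralize_columns(data, factor_names)
print(neutral[factor_names].corrwith(neutral["market_cap"]))
```

Each factor is regressed on market cap separately, so the residual columns are all uncorrelated with market cap while their relationships to each other are left untouched.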

Source: https://blog.csdn.net/weixin_45875105/article/details/120746479
