Machine Learning and Data Science with Python

Author: Internet

Brief overview

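  1. Machine learning draws on three disciplines: statistics, artificial intelligence, and computer science.
  2. In simple terms, machine learning feeds a large amount of data (split in some proportion into a training set and a test set) into a black box (some algorithm). Training on one part of the data and testing on the other yields useful information. That is: data --> algorithm (black box) --> result
  3. Machine learning is divided into supervised learning, which trains on labelled examples, and unsupervised learning, which looks for structure in unlabelled data.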
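
The two categories in the list above can be sketched on synthetic data. This is only an illustration: the data, the choice of LinearRegression for the supervised case and KMeans for the unsupervised case, and all variable names are assumptions made for the example, not part of the original article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Supervised: every sample has a known target y (here y = 3x + 2 plus noise).
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

# data --> algorithm (black box) --> result, with a held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R^2 on rows the model has not seen
print("learned slope:", model.coef_[0], "test R^2:", test_r2)

# Unsupervised: no targets at all, the algorithm only groups similar points.
points = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),  # blob around (0, 0)
    rng.normal(5, 0.5, size=(50, 2)),  # blob around (5, 5)
])
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print("cluster sizes:", np.bincount(cluster_ids))
```

In the supervised half the model is judged against known answers in the test set. In the unsupervised half there are no answers to compare with, only the grouping that the algorithm finds.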

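Example: forecast the closing price of Ping An Bank (stock code 000001) over the next 7 days and evaluate the model's accuracy (code implementation)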
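
Forecasting N days ahead is framed here as supervised regression: each day's features are paired with the closing price N rows later. A minimal sketch of that labelling step on a made-up series follows. The prices, the horizon of 2, and the names n_ahead, to_forecast and trainable are illustrative assumptions.

```python
import pandas as pd

# Made-up closing prices, oldest first.
df = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]})
n_ahead = 2  # illustrative forecast horizon

# The label of each row is the close n_ahead rows later.
df["label"] = df["close"].shift(-n_ahead)

# The newest n_ahead rows have no known future yet, so their label is NaN.
to_forecast = df[df["label"].isna()]
# Only rows with a label can be used for training and testing.
trainable = df.dropna()

print(df)
```

The full script applies the same labelling step to real price data.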

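import math

import numpy as np
import pandas as pd
import tushare as ts  # if not installed: pip install tushare / conda install tushare
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Daily price history for stock code 000001 (Ping An Bank)
df = ts.get_hist_data('000001')
# Make sure rows run oldest to newest so that shift(-n) looks forward in time
df = df.sort_index()
df = df[['open', 'high', 'close', 'low', 'volume']]

# Derived features: daily range and daily change, both in percent
df['High_Low_Pct'] = (df['high'] - df['low']) / df['low'] * 100
df['Change_Pct'] = (df['close'] - df['open']) / df['open'] * 100

df = df[['close', 'High_Low_Pct', 'Change_Pct', 'volume']]

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 1000)

# print(df.head())
future_value = 'close'
# Replace missing values with an extreme sentinel value
df.fillna(value=-99999, inplace=True)
# Forecast horizon: 1% of the number of rows, rounded up
how_far_I_want_to_forecast = int(math.ceil(0.01 * len(df)))
# print(how_far_I_want_to_forecast)
# Label of each row: the closing price how_far_I_want_to_forecast rows later
df['label'] = df[future_value].shift(-how_far_I_want_to_forecast)

# Scale all feature rows together, then split off the newest rows,
# which have no label yet and are the ones to forecast from
x = np.array(df.drop(columns=['label']))
x = preprocessing.scale(x)
x_recent_real_data = x[-how_far_I_want_to_forecast:]
x = x[:-how_far_I_want_to_forecast]

# Drop the unlabelled rows so that x and y have the same length
df.dropna(inplace=True)
# print(df.head())
y = np.array(df['label'])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

black_box = LinearRegression()
black_box.fit(x_train, y_train)

# Forecast closing prices for the coming days
forecast_set = black_box.predict(x_recent_real_data)
print(forecast_set)

# score() returns R^2 (the coefficient of determination) on the test set
accuracy = black_box.score(x_test, y_test)
print(accuracy)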
Output (7 forecast closing prices, followed by the score on the test set):
[9.52053828 9.86518547 9.39170026 9.41294818 9.34341797 9.34482912
 9.29581423]
0.9106378525453049 
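
The last number printed is the return value of score(), which for LinearRegression is R² (the coefficient of determination), not a classification accuracy. The sketch below recomputes R² by hand on synthetic data to show what the number measures. The data and variable names are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic regression problem: two features, linear target plus noise.
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(0, 0.3, size=100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The same quantity as reported by scikit-learn
r2_sklearn = model.score(X, y)
print(r2_manual, r2_sklearn)
```

A value of 1 means the predictions match the targets exactly, and 0 means the model does no better than always predicting the mean. Since train_test_split shuffles rows by default, the test rows in the stock example are interleaved in time with the training rows, so the score describes fit on similar days rather than performance on unseen future days.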

Source: https://blog.csdn.net/weixin_45290455/article/details/99321509