Batch nearest-neighbor matching of points to points and points to lines in Python (TransBigData)

Author: Internet

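Nearest-neighbor matching

The Python package TransBigData provides nearest-neighbor matching for points to points and for points to lines. The examples below show how to use it for both cases. The method is based on the k-d tree (see https://en.wikipedia.org/wiki/K-d_tree), for which a single nearest-neighbor query takes O(log n) time on average.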
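To make the k-d tree idea concrete, here is a minimal pure-Python sketch of a 2-D tree with a nearest-neighbor query. It is only an illustration and not TransBigData's implementation; the function names build_kdtree and nearest are hypothetical, and the sample points are the table B coordinates used later in this article.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a 2-D k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % 2  # alternate between x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point becomes the node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return (distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, target)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the best hit
    if abs(diff) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best

table_b = [(1, 3), (2, 5), (2, 2)]
tree = build_kdtree(table_b)
for p in [(2, 4), (2, 10), (24, 6)]:
    print(p, nearest(tree, p))
```

The pruning test in nearest is what lets a query skip most of the tree, which is where the logarithmic average query time comes from.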

Point-to-point matching (DataFrame with DataFrame)

Import the TransBigData package

import transbigdata as tbd

Create two GeoDataFrame tables that contain only longitude and latitude columns

import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
                        [2,10],[24,6],[21,6],
                        [22,6]],columns = ['lon1','lat1'])
dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])

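Use tbd.ckdnearest for point-to-point matching. When matching a DataFrame with a DataFrame (no geometry information), you need to specify the longitude and latitude columns of both tables.

transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])

Takes two DataFrames and the names of their longitude and latitude columns, matches each row of table A to its nearest point in table B, and computes the distance.

Input

dfA_origin : DataFrame

        Table A

dfB_origin : DataFrame

        Table B

Aname : List

        Longitude and latitude column names in table A

Bname : List

        Longitude and latitude column names in table B

Output

gdf : DataFrame

        Table A with the nearest point from table B matched to each row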

tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
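# The distance here is the real-world distance (in meters) converted from longitude/latitude
   lon1  lat1  index  lon  lat          dist
0     1     2      0    1    3  1.111949e+05
1     2     4      1    2    5  1.111949e+05
2     2     6      1    2    5  1.111949e+05
3     2    10      1    2    5  5.559746e+05
4    24     6      1    2    5  2.437393e+06
5    21     6      1    2    5  2.105798e+06
6    22     6      1    2    5  2.216318e+06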
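The dist values above are consistent with a great-circle distance on a sphere of radius 6,371 km. The following sketch uses the haversine formula to reproduce the first and fourth rows. That TransBigData uses exactly this formula and radius is an assumption, and haversine_m is a hypothetical helper name.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters between two lon/lat points (degrees).

    radius_m is an assumed mean Earth radius, not a value taken from TransBigData.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

print(haversine_m(1, 2, 1, 3))    # one degree of latitude, about 111194.9 m
print(haversine_m(2, 10, 2, 5))   # five degrees of latitude, about 555974.6 m
```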

Point-to-point matching (GeoDataFrame with GeoDataFrame)

Turn tables A and B into GeoDataFrames that carry point geometry

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])

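Use tbd.ckdnearest_point for point-to-point matching

transbigdata.ckdnearest_point(gdA, gdB)

Takes two GeoDataFrames, gdA and gdB, both containing points. The method joins each row of gdA to the nearest point in gdB and adds a distance field, dist.

Input

gdA : GeoDataFrame

        Table A, point features

gdB : GeoDataFrame

        Table B, point features

Output

gdf : GeoDataFrame

        Table A with the nearest point from table B matched to each row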

tbd.ckdnearest_point(dfA,dfB)
# The distance here is in longitude/latitude units (degrees), not meters
   lon1  lat1                geometry_x       dist  index  lon  lat               geometry_y
0     1     2   POINT (1.00000 2.00000)   1.000000      0    1    3  POINT (1.00000 3.00000)
1     2     4   POINT (2.00000 4.00000)   1.000000      1    2    5  POINT (2.00000 5.00000)
2     2     6   POINT (2.00000 6.00000)   1.000000      1    2    5  POINT (2.00000 5.00000)
3     2    10  POINT (2.00000 10.00000)   5.000000      1    2    5  POINT (2.00000 5.00000)
4    24     6  POINT (24.00000 6.00000)  22.022716      1    2    5  POINT (2.00000 5.00000)
5    21     6  POINT (21.00000 6.00000)  19.026298      1    2    5  POINT (2.00000 5.00000)
6    22     6  POINT (22.00000 6.00000)  20.024984      1    2    5  POINT (2.00000 5.00000)
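The dist column in this output matches the plain Euclidean distance between the coordinate pairs, treating longitude and latitude as planar x and y. A short check with the standard library, using three of the matched pairs from the table:

```python
import math

# (point from table A, matched point from table B), taken from the output above
pairs = [
    ((1, 2), (1, 3)),
    ((2, 10), (2, 5)),
    ((24, 6), (2, 5)),
]

for a, b in pairs:
    # math.dist is the straight-line distance in coordinate units (degrees here)
    print(a, b, round(math.dist(a, b), 6))
```

Because a degree of longitude shrinks toward the poles, these values are not proportional to ground distance. They are useful for ranking nearby candidates, not for reporting meters.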

Point-to-line matching (GeoDataFrame with GeoDataFrame)

Turn table A into points and table B into lines

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
                  LineString([[1,0],[1.5,0],[4,0]]),
                   LineString([[1,-1],[1.5,-2],[4,-4]])]
dfB.plot()

[Figure: plot of the three lines in dfB]

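transbigdata.ckdnearest_line(gdfA, gdfB)

Takes two GeoDataFrames, where gdfA contains points and gdfB contains lines. The method joins each row of gdfA to the nearest line in gdfB and adds a distance field, dist.

Input

gdfA : GeoDataFrame

        Table A, point features

gdfB : GeoDataFrame

        Table B, line features

Output

gdf : GeoDataFrame

        Table A with the nearest line from table B matched to each row

tbd.ckdnearest_line matches points to lines by extracting the vertices of each line and then running point-to-point matching on those vertices. The dist value is therefore the distance to the nearest vertex, not the perpendicular distance to the line.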

tbd.ckdnearest_line(dfA,dfB)
# The distance here is in longitude/latitude units (degrees), not meters
   lon1  lat1                geometry_x       dist  index  lon  lat                                          geometry_y
0     1     2   POINT (1.00000 2.00000)   0.707107      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
1     2     4   POINT (2.00000 4.00000)   1.200000      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
2     2     6   POINT (2.00000 6.00000)   2.332381      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
3     2    10  POINT (2.00000 10.00000)   6.118823      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
4    21     6  POINT (21.00000 6.00000)  17.912007      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
5    22     6  POINT (22.00000 6.00000)  18.906084      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
6    24     6  POINT (24.00000 6.00000)  20.880613      1    2    5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...
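The vertex-extraction idea described above can be sketched without any geometry library. The example flattens each line into its vertices, remembers which line each vertex came from, and picks the line that owns the nearest vertex. For brevity it uses a brute-force search rather than a k-d tree, and nearest_line is a hypothetical helper, not part of TransBigData.

```python
import math

# The three lines from table B, written as plain vertex lists
lines = [
    [(1, 1), (1.5, 2.5), (3.2, 4)],
    [(1, 0), (1.5, 0), (4, 0)],
    [(1, -1), (1.5, -2), (4, -4)],
]

# Flatten every line into (vertex, line_index) pairs
vertices = [(v, i) for i, line in enumerate(lines) for v in line]

def nearest_line(point):
    """Return (line_index, distance) of the line owning the vertex closest to point."""
    vertex, line_index = min(vertices, key=lambda item: math.dist(item[0], point))
    return line_index, math.dist(vertex, point)

for p in [(1, 2), (2, 4), (24, 6)]:
    print(p, nearest_line(p))
```

For the point (1, 2) this returns line 0 at a distance of about 0.707107, the distance to the vertex (1.5, 2.5), which agrees with the first row of the output table. Lines with long segments and few vertices can therefore be matched less accurately than densely sampled ones.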

 

Source: https://blog.csdn.net/u013410354/article/details/121305104