Similarity measurement (相似性度量)
- DTW algorithm
Introduction 1: https://blog.csdn.net/raym0ndkwan/article/details/45614813
Introduction 2: https://www.cnblogs.com/luxiaoxun/archive/2013/05/09/3069036.html
Introduction 3: https://blog.csdn.net/niuniuyuh/article/details/54809587
Introduction 4: https://blog.mythsman.com/2016/04/19/1/
Introduction 5: http://blog.sciencenet.cn/blog-212252-701037.html
Introduction 6 (video walkthrough): https://www.youtube.com/watch?v=_K1OsqCicBY&list=PLQ9wcil_P4A6WkBBRd6Ywsi5HiSe3pBZP&index=3&t=0s
Python implementation: https://www.jianshu.com/p/05bee48cc6a2
Python walkthrough (the most important one): https://nipunbatra.github.io/blog/2014/dtw.html
Best explanation in Chinese: https://www.cnblogs.com/StrayWolf/articles/6792261.html
The goal of the DTW algorithm is to find a correspondence between the two series: how each point of x maps to a point of y. For instance, x(3) may be mapped to y(4), and so on.
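To make the idea of a mapping concrete, here is a minimal sketch of DTW on two short lists. The function name `dtw_path`, the squared-difference point cost, and the sample data are assumptions chosen for illustration; they are not taken from any of the linked articles.

```python
def dtw_path(x, y):
    """Return (total_cost, path); path is a list of (i, j) pairs mapping x[i] to y[j]."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning x[:i+1] with y[:j+1]
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (x[i] - y[j]) ** 2  # assumed point cost: squared difference
            if i == 0 and j == 0:
                acc[i][j] = d
            else:
                best = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
                acc[i][j] = d + best
    # Backtrack from the last cell to (0, 0), always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


# y is x with its first sample repeated, so a zero-cost alignment exists.
x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 1, 0]
cost, path = dtw_path(x, y)
print(cost)  # 0
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

In the printed path, x(0) is mapped to both y(0) and y(1), which is exactly the kind of stretching that a point-by-point comparison cannot express.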
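- Hausdorff distance
Definition:
Hausdorff distance assigns to each point of one set the distance to its closest point on the other and takes the maximum over all these values. In other words: for each point on one curve, compute the distance to the nearest point on the other curve and call it d1; over the resulting values d1, d2, d3, ..., dN, the maximum is the Hausdorff distance.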
Use:
Given an object A, find the simplest object A0 that resembles A within a given tolerance.
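Formula:
Let $A, B \subset \mathbb{R}^2$ be compact. The one-sided Hausdorff distance from $A$ to $B$ is
$$\tilde{\delta}_H(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert,$$
and the bidirectional Hausdorff distance is
$$\delta_H(A, B) = \max\bigl(\tilde{\delta}_H(A, B),\ \tilde{\delta}_H(B, A)\bigr).$$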
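The one-sided and bidirectional definitions can be sketched with NumPy for finite point sets. The function names and the sample points below are assumptions made for illustration, and the Euclidean norm is assumed as the point-to-point distance.

```python
import numpy as np


def directed_hausdorff_distance(a, b):
    """One-sided Hausdorff distance from point set a to point set b.

    a and b are array-likes of shape (n, 2) and (m, 2).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # pairwise[i, j] = Euclidean distance between a[i] and b[j]
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # For each point of a take its nearest point of b, then the worst case.
    return pairwise.min(axis=1).max()


def hausdorff_distance(a, b):
    """Bidirectional Hausdorff distance: the larger of the two one-sided values."""
    return max(directed_hausdorff_distance(a, b),
               directed_hausdorff_distance(b, a))


A = [(0, 0), (1, 0), (2, 0)]
B = [(0, 1), (1, 1), (2, 1), (5, 1)]
d_ab = directed_hausdorff_distance(A, B)  # every point of A is 1 away from B
d_ba = directed_hausdorff_distance(B, A)  # (5, 1) is sqrt(10) away from A
d = hausdorff_distance(A, B)
print(d_ab, d_ba, d)
```

The example shows why the one-sided value is not symmetric: the outlier (5, 1) in B only affects the distance from B to A. For larger inputs, SciPy's `scipy.spatial.distance.directed_hausdorff` is probably a better choice than this pairwise matrix, which needs O(n·m) memory.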
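Algorithm
To compare how alike two curves are, we first need a model that quantifies likeness, that is, a definition of "similar".
Reference article
https://www.cnblogs.com/luxiaoxun/archive/2013/05/09/3069036.html#undefined
Dynamic Time Warping (DTW)
Implementation so far (2019-05-16)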
    import sys
    import time

    import matplotlib.pyplot as plt
    import numpy as np
    import pymysql

    # Arguments: <table_name> <start_time> <end_time>, for example:
    # sys.argv += 'if1906_20190516 09:30:00 10:00:00'.split()


    def normalizeArray(pa):
        """Min-max scale the list pa to [0, 1] in place."""
        amin, amax = min(pa), max(pa)
        span = amax - amin
        for j in range(len(pa)):
            # A constant series has zero range; map it to 0 rather than divide by zero.
            pa[j] = (pa[j] - amin) / span if span else 0


    def distance_cost_plot(distances):
        """Draw a cost matrix as a heat map (call plt.show() to display it)."""
        plt.imshow(distances, interpolation='nearest', cmap='Reds')
        plt.gca().invert_yaxis()
        plt.xlabel("X")
        plt.ylabel("Y")
        plt.grid()
        plt.colorbar()


    def path_cost(x, y, accumulated_cost, distances):
        """Backtrack the cheapest warping path; return (path, cost)."""
        i = len(y) - 1  # row index (y)
        j = len(x) - 1  # column index (x)
        path = [[j, i]]
        # "or" keeps walking along the first row or column until [0, 0].
        # With "and" the loop stopped at the border, skipped the remaining
        # border cells and could count [0, 0] twice.
        while i > 0 or j > 0:
            if i == 0:
                j = j - 1
            elif j == 0:
                i = i - 1
            else:
                best = min(accumulated_cost[i-1, j-1],
                           accumulated_cost[i-1, j],
                           accumulated_cost[i, j-1])
                if accumulated_cost[i-1, j] == best:
                    i = i - 1
                elif accumulated_cost[i, j-1] == best:
                    j = j - 1
                else:
                    i = i - 1
                    j = j - 1
            path.append([j, i])
        cost = 0
        for col, row in path:
            cost = cost + distances[row, col]
        return path, cost


    def price_query(table_name):
        """Build the price query for one table, between sys.argv[2] and sys.argv[3] (exclusive).

        Timestamps with hour <= 11 are shifted forward by 90 minutes.
        """
        return ('select lastprice, case when hour(happentime)<=11 '
                'then DATE_ADD(happentime, interval 90 minute) '
                'else happentime end from ' + table_name
                + ' where time(happentime)>"' + sys.argv[2]
                + '" and time(happentime)<"' + sys.argv[3] + '";')


    conn = pymysql.connect(host='localhost', user='root', password='MYSQLTB', db='shfuture')
    a = conn.cursor()

    sql = price_query(sys.argv[1])
    print(sql)
    a.execute(sql)
    x = [row[0] for row in a.fetchall()]
    normalizeArray(x)
    print('original array has', len(x), 'elements')
    print('************************************************')

    loopTableName = sys.argv[1]
    for _ in range(19):  # compare against up to 19 earlier tables
        # Most recently created "if%" table whose name sorts before the current one.
        sql = ('SELECT table_name FROM INFORMATION_SCHEMA.TABLES '
               'WHERE table_schema = "shfuture" and table_name like "if%" '
               'and table_name < "' + loopTableName + '" '
               'order by create_time desc limit 1;')
        a.execute(sql)
        data = a.fetchall()
        if not data:
            break  # no earlier table left
        loopTableName = data[0][0]
        print(loopTableName)

        a.execute(price_query(loopTableName))
        y = [row[0] for row in a.fetchall()]
        if not y:
            continue  # no rows in the requested time window
        normalizeArray(y)

        # Matrix with len(y) rows and len(x) columns, initialised to zero.
        distances = np.zeros((len(y), len(x)))
        s = time.time()
        for i in range(len(y)):
            for j in range(len(x)):
                distances[i, j] = (x[j] - y[i]) ** 2  # squared distance between y[i] and x[j]
        print("computing the x-y distance matrix took %f seconds" % (time.time() - s))
        # distance_cost_plot(distances)

        # Accumulated cost matrix.
        accumulated_cost = np.zeros((len(y), len(x)))
        accumulated_cost[0, 0] = distances[0, 0]
        s = time.time()
        for i in range(1, len(x)):
            accumulated_cost[0, i] = distances[0, i] + accumulated_cost[0, i-1]
        for i in range(1, len(y)):
            accumulated_cost[i, 0] = distances[i, 0] + accumulated_cost[i-1, 0]
        for i in range(1, len(y)):
            for j in range(1, len(x)):
                # This statement is the slowest part.
                accumulated_cost[i, j] = min(accumulated_cost[i-1, j-1],
                                             accumulated_cost[i-1, j],
                                             accumulated_cost[i, j-1]) + distances[i, j]
        print("computing the accumulated cost matrix took %f seconds" % (time.time() - s))
        # distance_cost_plot(accumulated_cost)  # no need to plot

        # Find the shortest path.
        s = time.time()
        path, cost = path_cost(x, y, accumulated_cost, distances)
        print("finding the shortest path took %f seconds" % (time.time() - s))
        print('DTW value from', sys.argv[1], 'to', loopTableName, 'is', cost)
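The drawback is that it is too slow.
How can it be made faster?
a) Aggregate the data: there are 120 data points per minute; replace them with a single value.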
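A minimal sketch of that aggregation step, assuming the prices are already in a flat sequence and that the single replacement value is the block mean (the note does not say whether the mean, the last price, or something else should be used). The function name `downsample_mean` is hypothetical.

```python
import numpy as np


def downsample_mean(values, block=120):
    """Replace every run of `block` consecutive samples by their mean.

    A trailing run shorter than `block` is dropped (an assumption made
    to keep the sketch short).
    """
    values = np.asarray(values, dtype=float)
    usable = len(values) - len(values) % block
    return values[:usable].reshape(-1, block).mean(axis=1)


# Three minutes of made-up data at 120 samples per minute.
ticks = np.arange(360, dtype=float)
bars = downsample_mean(ticks, block=120)
print(bars)  # [ 59.5 179.5 299.5]
```

Because both cost matrices have len(x) * len(y) cells, shrinking each series by a factor of 120 shrinks the matrices, and the nested loops that fill them, by a factor of 120 * 120 = 14400.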
- An existing Python library
https://jekel.me/2017/Comparing-measures-of-similarity-between-curves/
Test it with two sine-wave curves.
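One way such a sine-wave test could look is sketched below. It does not call the library from the link; `dtw_cost` is a small stand-in written for this note, and the phase shift of 0.5 and the 100 sample points are arbitrary assumptions.

```python
import numpy as np


def dtw_cost(x, y):
    """DTW cost between two 1-D series, using squared differences as point cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = (y[:, None] - x[None, :]) ** 2  # dist[i, j] = (y[i] - x[j])^2
    acc = np.empty_like(dist)
    acc[0, :] = np.cumsum(dist[0, :])      # first row: only horizontal moves
    acc[:, 0] = np.cumsum(dist[:, 0])      # first column: only vertical moves
    for i in range(1, len(y)):
        for j in range(1, len(x)):
            acc[i, j] = dist[i, j] + min(acc[i - 1, j - 1],
                                         acc[i - 1, j],
                                         acc[i, j - 1])
    return acc[-1, -1]


t = np.linspace(0, 2 * np.pi, 100)
a = np.sin(t)
b = np.sin(t - 0.5)  # same wave, shifted in phase

pointwise = float(np.sum((a - b) ** 2))  # compare sample k with sample k
warped = float(dtw_cost(a, b))           # let DTW choose the alignment
same = float(dtw_cost(a, a))

print("pointwise:", pointwise)
print("DTW:      ", warped)
print("DTW(a, a):", same)
```

For series of equal length the diagonal is itself a valid warping path, so the DTW cost can never exceed the pointwise sum; for a phase-shifted sine it should come out noticeably smaller, and comparing a curve with itself gives 0.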