Commit 569122d

Jai Shah authored and committed
my first commit
0 parents  commit 569122d

File tree

69 files changed

+3807
-0
lines changed

Canny Edge Detection/CannyEdgeDetection.ipynb

+274
@@ -0,0 +1,357 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Dynamic Time Warping (DTW)\n",
8+
"This notebook describes the DTW concepts.\n"
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"metadata": {},
14+
"source": [
15+
"## Time Series\n",
16+
"* A time series is a collection of observations made sequentially.\n",
17+
"* Time series occur in different fields such as medical, scientific and businesses domains.\n",
18+
"* Finding the similarity between two time series is useful for clustering and classification.\n",
19+
"\n",
20+
"\n",
21+
"\n",
22+
"## Dynamic Time Warping (DTW)\n",
23+
"Dynamic Time Warping (DTW) is an algorithm for measuring similarity between two sequences which may not have the same length.\n",
24+
"\n",
25+
"<img src=\"BWQ6YDNCD5RH21TRCV6K7URPA05UCM0T.png\"/>\n",
26+
"\n",
27+
"\n",
28+
"In general, DTW maps each element in the first sequence to an element in the second sequence. Assuming that a distance function is defined for each pair of points, the goal is to find a mapping that minimizes the total distance between the mapped points.\n"
29+
]
30+
},
31+
{
32+
"cell_type": "code",
33+
"execution_count": 3,
34+
"metadata": {
35+
"scrolled": false
36+
},
37+
"outputs": [
38+
{
39+
"name": "stdout",
40+
"output_type": "stream",
41+
"text": [
42+
"Automatic pdb calling has been turned OFF\n"
43+
]
44+
},
45+
{
46+
"data": {
47+
"application/vnd.jupyter.widget-view+json": {
48+
"model_id": "7f357ddcb04c4bcbb1fa5fec2667d13c"
49+
}
50+
},
51+
"metadata": {},
52+
"output_type": "display_data"
53+
}
54+
],
55+
"source": [
56+
"try:\n",
57+
"    if __IPYTHON__:\n",
58+
"        from IPython import get_ipython\n",
59+
"\n",
60+
"        get_ipython().run_line_magic('matplotlib', 'inline')\n",
61+
"        from ipython_utilities import *\n",
62+
"        from ipywidgets import interact, fixed, FloatSlider, IntSlider, Label, Checkbox, FloatRangeSlider\n",
63+
"        from IPython.display import display\n",
64+
"\n",
65+
"        in_ipython_flag = True\n",
66+
"except Exception:\n",
67+
"    in_ipython_flag = False\n",
68+
"import cv2 as cv\n",
69+
"from matplotlib import pyplot as plt\n",
70+
"import numpy as np\n",
71+
"from PIL import Image\n",
72+
"from ipywidgets import interact, fixed, FloatSlider, IntSlider, Label, Checkbox, FloatRangeSlider\n",
73+
"\n",
74+
"%matplotlib inline\n",
75+
"%pdb\n",
76+
"\n",
77+
"def display_two_sequences(n1,n2):\n",
78+
"    index1 = np.linspace(0,15,n1)\n",
79+
"    index2 = np.linspace(0,15,n2)\n",
80+
"    A = 5*np.sin(index1)\n",
81+
"    B = 3*np.sin(index2 + 1)\n",
82+
"    # s1 = [1, 2, 3, 4]\n",
83+
"    # s2 = [2, 3, 5, 6, 8]\n",
84+
"    # ob_dtw = cl_dtw()\n",
85+
"    # distance,_ = ob_dtw.calculate_dtw_distance(s1, s2)\n",
86+
"\n",
87+
"    fig = plt.figure(figsize=(12,4))\n",
88+
"    plt.plot(index1, A, '-ro', label='A')\n",
89+
"    plt.plot(index2, B, '-bo' ,label='B')\n",
90+
"    plt.ylabel('value')\n",
91+
"    plt.xlabel('index')\n",
92+
"    plt.legend()\n",
93+
"\n",
94+
"    plt.show()\n",
95+
"    plt.pause(0.001)\n",
96+
" \n",
97+
"interact(display_two_sequences,\n",
98+
" n1=IntSlider(min=0, \n",
99+
" max=100, step=1,value=5,\n",
100+
" description='# of points in sequence 1',\n",
101+
" continuous_update=True),\n",
102+
" n2=IntSlider(min=0, \n",
103+
" max=100, step=1,value=7,\n",
104+
" description='# of points in sequence 2',\n",
105+
" continuous_update=True));\n",
106+
"\n",
107+
"# arrange_widgets_in_grid(controls)\n"
108+
]
109+
},
110+
{
111+
"cell_type": "markdown",
112+
"metadata": {},
113+
"source": [
114+
"Consider the two sequences $A=\\{a_1, a_2, ... , a_M\\}$ and $B=\\{b_1, b_2, ... , b_N\\}$ as shown above, where $a_k$ is the $k^{th}$ sample point in the sequence $A$ and $b_k$ is the $k^{th}$ sample point in the sequence $B$. To measure the similarity of these two sequences we calculate the total distance between them: we connect each point in sequence $A$ to a point in sequence $B$ and accumulate the distances between the connected points. Since there are many different ways of connecting the points, the question is which arrangement results in the minimum total distance. \n",
115+
"In order for the similarity measure to be meaningful we need to impose some constraints on how the points of the two sequences are connected:\n",
116+
"* **Boundary conditions:** The first points should be connected to each other and the last points should be connected to each other.\n",
117+
"* **Monotonicity:** The alignment cannot go backward. \n",
118+
"* **Continuity:** The alignment cannot skip an element.\n",
119+
"* **Warping window:** The same point (feature) should not be repeated too many times.\n",
120+
"\n",
121+
"### Path\n",
122+
"A path $P$ is defined as an ordered set of 2-tuples:\n",
123+
"\n",
124+
"$$\\large P=p_1,p_2, .....p_q$$\n",
125+
"$$\\large p_k=(i_k,j_k)$$\n",
126+
"\n",
127+
"where $q$ is the number of connections (correspondences) and $i_k$ and $j_k$ are the indexes of the connected elements. $P$ is called a \"warping\" function.\n",
128+
"\n",
129+
"For example $P=(1,1), (1,2), (2,3), ... $ means:\n",
130+
"* Point 1 in sequence A is connected to point 1 in sequence B\n",
131+
"* Point 1 in sequence A is connected to point 2 in sequence B\n",
132+
"* Point 2 in sequence A is connected to point 3 in sequence B\n",
133+
"* ...\n",
134+
"\n",
135+
"\n",
136+
"\n",
137+
"\n",
138+
"A path can be shown on a grid of M rows by N columns. The image below shows two possible paths. Notice that there are many possible paths for connecting elements of the sequence A to elements of the sequence B without violating any of the constraints. The cost of each path is the accumulated distance between the corresponding points. **The goal of the DTW algorithm is to find the best path which minimizes the total cost.**\n",
139+
"<img src=\"V4VW3VNNRET6TKXJW2MR0A024BD0EMUM.png\"/>\n"
140+
]
141+
},
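The boundary, monotonicity and continuity constraints, and the cost of a path, can be sketched in a few lines of plain Python. The names `is_valid_path` and `path_cost` are hypothetical, the absolute difference is assumed as the distance between two points (the same choice the `dtw` cell in this notebook makes), and the warping-window constraint is left out.

```python
def is_valid_path(path, m, n):
    """Check boundary, monotonicity and continuity for a 1-based path."""
    # Boundary conditions: start at (1, 1) and end at (m, n).
    if not path or path[0] != (1, 1) or path[-1] != (m, n):
        return False
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        step = (i1 - i0, j1 - j0)
        # Monotonicity: no index decreases. Continuity: no index jumps by more than 1.
        if step not in ((0, 1), (1, 0), (1, 1)):
            return False
    return True


def path_cost(path, a, b):
    """Accumulated distance along a 1-based path (absolute difference assumed)."""
    return sum(abs(a[i - 1] - b[j - 1]) for i, j in path)


a = [1, 2, 3]
b = [1, 1, 2, 3]
path = [(1, 1), (1, 2), (2, 3), (3, 4)]
print(is_valid_path(path, len(a), len(b)))
print(path_cost(path, a, b))
```

Any path that passes `is_valid_path` is a candidate alignment; DTW searches among these candidates for the one with the smallest `path_cost`.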
142+
{
143+
"cell_type": "markdown",
144+
"metadata": {},
145+
"source": [
146+
"### Finding the Best Path\n",
147+
"To find the best path:\n",
148+
"* Set up an M by N matrix G.\n",
149+
"* Set $G\\left[1,1\\right]=d(1,1)$ where $d(i,j)$ is the distance between element $i$ of sequence $A$ and element $j$ of sequence $B$.\n",
150+
"* Calculate each remaining element of the matrix G, treating entries outside the matrix as infinite, as:\n",
151+
"\n",
152+
"$$\\large G\\left[i,j\\right]=d(i,j)+min(G\\left[i,j-1\\right],G\\left[i-1,j\\right],G\\left[i-1,j-1\\right])$$\n",
153+
"\n",
154+
"* Total distance will be $D(A,B)=G\\left[M,N\\right]$\n",
155+
"\n"
156+
]
157+
},
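The recurrence gives only the total distance. The best path itself can be recovered by walking back from $G[M,N]$ to $G[1,1]$, always moving to the cheapest of the three predecessor cells. The following is a minimal sketch of that idea, not the notebook's own implementation: `dtw_with_path` is a hypothetical name, the absolute difference is assumed as the distance, and no warping window is applied.

```python
INF = float("inf")


def dtw_with_path(a, b):
    """Return (total distance, best path) with 1-based path indexes."""
    m, n = len(a), len(b)
    G = [[INF] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                G[i][j] = d
                continue
            # Predecessors outside the matrix count as infinite.
            best = min(
                G[i][j - 1] if j > 0 else INF,
                G[i - 1][j] if i > 0 else INF,
                G[i - 1][j - 1] if i > 0 and j > 0 else INF,
            )
            G[i][j] = d + best

    # Backtrack from the last cell to the first one.
    i, j = m - 1, n - 1
    path = [(i + 1, j + 1)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((G[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((G[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((G[i][j - 1], i, j - 1))
        _, i, j = min(candidates)  # cheapest predecessor
        path.append((i + 1, j + 1))
    path.reverse()
    return G[m - 1][n - 1], path


distance, path = dtw_with_path([1, 2, 3], [1, 2, 2, 3])
print(distance)  # 0
print(path)      # [(1, 1), (2, 2), (2, 3), (3, 4)]
```

In this example the repeated `2` in the second sequence is absorbed by connecting point 2 of the first sequence to both points 2 and 3 of the second, so the total distance is zero.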
158+
{
159+
"cell_type": "code",
160+
"execution_count": 4,
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"data": {
165+
"application/vnd.jupyter.widget-view+json": {
166+
"model_id": "e160798a8b184acf9fdc255e3765d910"
167+
}
168+
},
169+
"metadata": {},
170+
"output_type": "display_data"
171+
}
172+
],
173+
"source": [
174+
"def dtw(s1, s2, window=3):\n",
175+
"    grid = np.inf*np.ones((len(s1), len(s2)))\n",
176+
"    grid[0, 0] = abs(s1[0] - s2[0])\n",
177+
"    for j in range(1, min(window + 1, len(s2))):\n",
178+
"        grid[0, j] = grid[0, j - 1] + abs(s1[0] - s2[j])  # first row: accumulate along s2\n",
179+
"    for i in range(1, min(window + 1, len(s1))):\n",
180+
"        grid[i, 0] = grid[i - 1, 0] + abs(s1[i] - s2[0])  # first column: accumulate along s1\n",
181+
"    \n",
182+
"    for i in range(1, len(s1)):\n",
183+
"        for j in range(1, len(s2)):\n",
184+
"            if abs(i-j) > window:\n",
185+
"                continue  # cell lies outside the warping window\n",
186+
"            grid[i, j] = abs(s1[i] - s2[j]) + min(grid[i - 1, j], grid[i, j-1], grid[i-1, j-1])\n",
187+
"    \n",
188+
"    print(grid)\n",
189+
"    print(grid[-1, -1])\n",
190+
"    return grid[-1, -1]\n",
191+
" \n",
192+
"def display_two_sequences(n1,n2):\n",
193+
"    index1 = np.linspace(0,15,n1)\n",
194+
"    index2 = np.linspace(0,15,n2)\n",
195+
"    A = [5, 6, 9, 2, 6]*2\n",
196+
"    B = [5, 7, 2, 6, 9 , 2]*2\n",
197+
"    #A = 5*np.sin(index1)\n",
198+
"    #B = 3*np.sin(index2 + 1)\n",
199+
"    # s1 = [1, 2, 3, 4]\n",
200+
"    # s2 = [2, 3, 5, 6, 8]\n",
201+
"    # ob_dtw = cl_dtw()\n",
202+
"    # distance,_ = ob_dtw.calculate_dtw_distance(s1, s2)\n",
203+
"    print(A)\n",
204+
"    print(B)\n",
205+
"    \n",
206+
"    dtw(A, B)\n",
207+
"# fig = plt.figure(figsize=(12,4))\n",
208+
"# plt.plot(index1, A, '-ro', label='A')\n",
209+
"# plt.plot(index2, B, '-bo' ,label='B')\n",
210+
"# plt.ylabel('value')\n",
211+
"# plt.xlabel('index')\n",
212+
"# plt.legend()\n",
213+
"\n",
214+
"# plt.show()\n",
215+
"# plt.pause(0.001)\n",
216+
"controls = interact(display_two_sequences,\n",
217+
" n1=IntSlider(min=0, \n",
218+
" max=100, step=1,value=5,\n",
219+
" description='# of points in sequence 1',\n",
220+
" continuous_update=True),\n",
221+
" n2=IntSlider(min=0, \n",
222+
" max=100, step=1,value=7,\n",
223+
" description='# of points in sequence 2',\n",
224+
" continuous_update=True));\n",
225+
"\n",
226+
"# arrange_widgets_in_grid(controls)\n"
227+
]
228+
},
229+
{
230+
"cell_type": "code",
231+
"execution_count": null,
232+
"metadata": {
233+
"collapsed": true
234+
},
235+
"outputs": [],
236+
"source": []
237+
},
238+
{
239+
"cell_type": "markdown",
240+
"metadata": {},
241+
"source": [
242+
"### Warping Window\n",
243+
"In order to guarantee that the alignment does not get stuck in one element, a warping window is defined. The warping window limits the difference between the indexes of the two sequences. In cases where the number of elements in both sequences is equal, the warping window forces the path not to wander too far from the diagonal.\n",
244+
"<img src=\"PMCFBXFYO2WUX5EPRCQM0K8DALBYR0KB.png\"/>\n",
245+
"\n",
246+
"In a given path the warping window constraint can be imposed by requiring $\\left| {i_k - j_k} \\right| \\le r$ with $r > 0$.\n",
247+
"\n",
248+
"\n",
249+
"\n"
250+
]
251+
},
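The effect of the window can be illustrated by filling only the cells with $|i - j| \le r$ and leaving every other cell at infinity. This is a rough sketch under assumptions: `dtw_banded` is a hypothetical name, the absolute difference is assumed as the distance, and the indexes inside the code are 0-based.

```python
INF = float("inf")


def dtw_banded(a, b, r):
    """DTW distance restricted to cells with |i - j| <= r (0-based indexes)."""
    m, n = len(a), len(b)
    G = [[INF] * n for _ in range(m)]
    for i in range(m):
        # Only visit the columns inside the warping window of row i.
        for j in range(max(0, i - r), min(n, i + r + 1)):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                G[i][j] = d
                continue
            best = min(
                G[i][j - 1] if j > 0 else INF,
                G[i - 1][j] if i > 0 else INF,
                G[i - 1][j - 1] if i > 0 and j > 0 else INF,
            )
            G[i][j] = d + best
    return G[m - 1][n - 1]


a = [1, 2, 3]
b = [1, 2, 2, 3]
print(dtw_banded(a, b, 1))  # 0
print(dtw_banded(a, b, 0))  # inf
```

With $r = 0$ only the diagonal is allowed, so the last cell of two sequences with different lengths is never reached and the distance stays infinite. The window therefore has to be at least as large as the difference between the two lengths.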
252+
{
253+
"cell_type": "markdown",
254+
"metadata": {},
255+
"source": [
256+
"### Slope Constraints\n",
257+
"In order to prevent a very short section of one sequence from matching a very long section of the other, a slope constraint is imposed on the path.\n",
258+
"<img src=\"A3VPKJJ0GMVJSNXICNJ5L5Y9JT9K2YKT.png\"/>\n",
259+
"\n",
260+
"$$\\large {{\\left( {{j_{k + s}} - {j_k}} \\right)} \\over {\\left( {{i_{k + s}} - {i_k}} \\right)}} \\le {s_h}$$\n",
261+
"\n",
262+
"and \n",
263+
"\n",
264+
"$$\\large {{\\left( {{i_{k + s}} - {i_k}} \\right)} \\over {\\left( {{j_{k + s}} - {j_k}} \\right)}} \\le {s_v}$$\n",
265+
"\n",
266+
"where $s_h>0$ and $s_v>0$ are the limiting slope constants.\n"
267+
]
268+
},
269+
{
270+
"cell_type": "markdown",
271+
"metadata": {},
272+
"source": [
273+
"### Time Normalized Distance Measure\n",
274+
"The time normalized distance between two sequences $A$ and $B$ for a particular path $P$ is defined as:\n",
275+
"\n",
276+
"$$\\large D(A,B)= {{{\\sum\\limits_{k = 1}^q {d({p_k}) \\cdot {w_k}} } \\over {\\sum\\limits_{k = 1}^q {{w_k}} }}} $$\n",
277+
"\n",
278+
"$$\\large p_k=(i_k,j_k)$$\n",
279+
"\n",
280+
"where $d(p_k)$ is the distance between element $i_k$ in the sequence A and element $j_k$ in the sequence B.\n",
281+
"\n",
282+
"$w_k$ is the weighting coefficient for the connection k. \n",
283+
"\n",
284+
"The best path $P^*$ is found by minimizing $D(A,B)$:\n",
285+
"\n",
286+
"$$\\large {P^*} = \\mathop {argmin }\\limits_P (D(A,B))$$\n",
287+
"\n",
288+
"\n",
289+
"The term $\\sum\\limits_{k = 1}^q {{w_k}} $ in the denominator of $D(A,B)$ complicates the search for the best path because it depends on the path itself. It is desirable to find weighting coefficients whose sum is independent of the path $P$. For example, if we define (with $i_0=j_0=0$) \n",
290+
"$$\\large w_k= (i_k-i_{k-1})+(j_k-j_{k-1})$$\n",
291+
"\n",
292+
"then \n",
293+
"$$\\large \\sum\\limits_{k = 1}^q {{w_k}}=M+N=C$$\n",
294+
"\n",
295+
"This means that the denominator of $D(A,B)$ is a constant, so only the numerator has to be minimized:\n",
296+
"\n",
297+
"$$\\large D(A,B)={1 \\over C}\\mathop {\\min }\\limits_P \\left[ {\\sum\\limits_{k = 1}^q {d({p_k}) \\cdot {w_k}} } \\right]$$ \n",
298+
"\n",
299+
"As an alternative we can define \n",
300+
"$\\large w_k= (i_k-i_{k-1})$ which implies $\\large \\sum\\limits_{k = 1}^q {{w_k}}=M=C$\n",
301+
"\n",
302+
"or $\\large w_k= (j_k-j_{k-1})$ which implies $\\large \\sum\\limits_{k = 1}^q {{w_k}}=N=C$\n",
303+
"\n",
304+
"The algorithm for finding the best path with time normalization $w_k=(i_k-i_{k-1})+(j_k-j_{k-1})$ will be:\n",
305+
"\n",
306+
"* Set up an M by N matrix G.\n",
307+
"* Set $G\\left[1,1\\right]=2d(1,1)$ where $d(i,j)$ is the distance between element $i$ of sequence $A$ and element $j$ of sequence $B$.\n",
308+
"* Calculate each remaining element of the matrix G, treating entries outside the matrix as infinite, as:\n",
309+
"\n",
310+
"$$\\large G\\left[i,j\\right]=min(G\\left[i,j-1\\right]+d(i,j),\\;G\\left[i-1,j\\right]+d(i,j),\\;G\\left[i-1,j-1\\right]+2d(i,j))$$\n",
311+
"* Total distance will be:\n",
312+
"$$\\large D(A,B)={{G\\left[M,N\\right]}\\over {(N+M)}}$$\n"
313+
]
314+
}
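A minimal sketch of the time-normalized variant, assuming the symmetric weights $w_k=(i_k-i_{k-1})+(j_k-j_{k-1})$: a horizontal or vertical step then has weight 1 and a diagonal step has weight 2. The name `dtw_normalized` is hypothetical and the absolute difference is assumed as the distance.

```python
INF = float("inf")


def dtw_normalized(a, b):
    """Time-normalized DTW distance with symmetric weights (1, 1, 2)."""
    m, n = len(a), len(b)
    G = [[INF] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                G[i][j] = 2 * d  # first connection has weight 2 (i_0 = j_0 = 0)
                continue
            G[i][j] = min(
                G[i][j - 1] + d if j > 0 else INF,                    # horizontal, w = 1
                G[i - 1][j] + d if i > 0 else INF,                    # vertical, w = 1
                G[i - 1][j - 1] + 2 * d if i > 0 and j > 0 else INF,  # diagonal, w = 2
            )
    # The weights along any valid path sum to M + N.
    return G[m - 1][n - 1] / (m + n)


print(dtw_normalized([1, 2, 3], [2, 3, 4]))  # 0.5
```

For these two sequences the cheapest weighted path costs 3 and the weights sum to $M+N=6$, which gives the normalized distance 0.5.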
315+
],
316+
"metadata": {
317+
"anaconda-cloud": {},
318+
"kernelspec": {
319+
"display_name": "Python [Root]",
320+
"language": "python",
321+
"name": "Python [Root]"
322+
},
323+
"language_info": {
324+
"codemirror_mode": {
325+
"name": "ipython",
326+
"version": 3
327+
},
328+
"file_extension": ".py",
329+
"mimetype": "text/x-python",
330+
"name": "python",
331+
"nbconvert_exporter": "python",
332+
"pygments_lexer": "ipython3",
333+
"version": "3.5.2"
334+
},
335+
"widgets": {
336+
"state": {
337+
"0555dd1618ac47b3879d19b0892a25a2": {
338+
"views": [
339+
{
340+
"cell_index": 2
341+
}
342+
]
343+
},
344+
"47a1a1eb88fb4c3d914c19a8c7d83596": {
345+
"views": [
346+
{
347+
"cell_index": 5
348+
}
349+
]
350+
}
351+
},
352+
"version": "1.2.0"
353+
}
354+
},
355+
"nbformat": 4,
356+
"nbformat_minor": 1
357+
}
