% !TEX root = Daniel-Miller-thesis.tex
\chapter{Discrepancy}\label{ch2:discrepancy}
\section{Equidistribution}
Discrepancy (also known as the Kolmogorov--Smirnov statistic) is a way of
measuring how closely sample data fits a predicted distribution. It has many
applications in computer science and statistics, but here we will focus only on
its basic properties, such as how discrepancy changes when sequences
are perturbed, transformed by a function, or combined.
First, recall that discrepancy provides a way of sharpening the soft
convergence results in \cite[A.1]{serre-1989}. Let $X$ be a compact
topological space, $\bx = (x_2,x_3,x_5,\dots)$ a sequence in $X$ indexed by
the rational primes, and $C(X)$ the space of continuous, $\bC$-valued functions
on $X$.
\begin{definition}
Let $\mu$ be a continuous probability measure on $X$. The sequence $\bx$ is
\emph{equidistributed} with respect to $\mu$ if for all $f\in C(X)$, we have
\[
\lim_{N\to \infty} \frac{1}{\pi(N)} \sum_{p\leqslant N} f(x_p) = \int f\, \dd \mu .
\]
\end{definition}
In other words, $\bx$ is $\mu$-equidistributed if the empirical measures
$P_{\bx,N} = \frac{1}{\pi(N)} \sum_{p\leqslant N} \delta_{x_p}$ converge to
$\mu$ in the weak topology. It is easy to see that $\bx$ is
$\mu$-equidistributed if and only if
$\left| \sum_{p\leqslant N} f(x_p)\right| = o(\pi(N))$ for all $f\in C(X)$
having $\int f\, \dd\mu = 0$. One can restrict to any set of functions which
generates a dense subspace of $C(X)^{\mu = 0}$.
In the discussion in \cite[A.1]{serre-1989}, $X$ is the space of conjugacy
classes in a compact Lie group, and $f$ is allowed to range over the characters
of irreducible, nontrivial, unitary representations of the group---these
generate a dense subspace of $C(X)^{\mu = 0}$ by the Peter--Weyl theorem.
Serre's results can be generalized to a much broader class of Dirichlet series,
which are of the form
\[
L_f(\bx,s) = \prod_p \left(1-f(x_p)p^{-s}\right)^{-1} .
\]
In fact, in light of the following theorem, we can consider functions $f$
which are only continuous almost everywhere. This allows us to consider
step functions like $1_{[0,\pi/2)} - 1_{(\pi/2,\pi]}$ on $[0,\pi]$.
\begin{theorem}
Let $X$ be a compact topological space, $\mu$ a Radon probability measure on
$X$, and $f\colon X\to \bC$ bounded and continuous $\mu$-almost everywhere. If
$\bx$ is $\mu$-equidistributed, then
\[
\lim_{N\to \infty} \frac{1}{\pi(N)} \sum_{p\leqslant N} f(x_p) = \int f\, \dd\mu .
\]
\end{theorem}
\begin{proof}
We prove the more general result that if $\{\mu_n\}$ is a sequence of Radon
(i.e.~finite and regular)
probability measures on $X$ which converges weakly to $\mu$, then
$\mu_n(f) \to \mu(f)$ for all $f$ which are bounded and continuous
$\mu$-almost everywhere.
Let $D$ be the set of points at which $f$ is not continuous. For every
$\epsilon>0$, there exists an open $U_\epsilon\supset D$ with
% Tietze extension: Rudin R&C th 20.4
$\mu(U_\epsilon) < \epsilon$. By the Tietze extension theorem, there exists
$f_\epsilon\in C(X)$ such that $|f_\epsilon|_\infty \leqslant |f|_\infty$ and
$f_\epsilon|_{X\smallsetminus U_\epsilon} = f|_{X\smallsetminus U_\epsilon}$.
Note that
\begin{equation}\label{eq:measures-ineq}
|\mu_n f - \mu f| \leqslant |\mu_n f - \mu_n f_\epsilon| + |\mu_n f_\epsilon - \mu f_\epsilon| + |\mu f_\epsilon - \mu f| .
\end{equation}
Now $|\mu_n f - \mu_n f_\epsilon| \leqslant 2\mu_n(U_\epsilon) |f|_\infty$. Since
$U_\epsilon$ is open, this converges to
$2\mu(U_\epsilon) |f|_\infty < 2\epsilon |f|_\infty$. The second term in
\eqref{eq:measures-ineq} converges to zero because $f_\epsilon$ is continuous,
and the third term can be bounded as
$|\mu f_\epsilon - \mu f| \leqslant 2\mu(U_\epsilon) |f|_\infty < 2\epsilon |f|_\infty$.
We have shown that
$\limsup_{n\to \infty} |\mu_n f - \mu f| \leqslant 4 \epsilon |f|_\infty$.
Since $\epsilon$ was arbitrary, the result follows.
\end{proof}
\section{Definitions and first results}
We will define discrepancy for measures on the $d$-dimensional half-open box
$[\vzero,\vinfty) = [0,\infty)^d\subset \bR^d$. For vectors
$\vx,\vy\in [\vzero,\vinfty)$, we say $\vx<\vy$ if $x_i<y_i\,\forall i$, and
in that case write $[\vx,\vy)$ for the half-open box
$[x_1,y_1)\times \cdots \times [x_d,y_d)$.
\begin{definition}
Let $\mu, \nu$ be probability measures on $[\vzero,\vinfty)$. The
\emph{discrepancy between $\mu$ and $\nu$} is
$\D(\mu,\nu) = \sup_{\vx < \vy} \left| \mu[\vx,\vy) - \nu[\vx,\vy)\right|$,
where $\vx$ and $\vy$ range over $[\vzero,\vinfty)$.
The \emph{star discrepancy between $\mu$ and $\nu$} is
$\D^\star(\mu,\nu) = \sup_{\vzero\leqslant \vy} \left| \mu[\vzero,\vy) - \nu[\vzero,\vy)\right|$,
where $\vy$ ranges over $[\vzero,\vinfty)$.
\end{definition}
\begin{lemma}\label{lem:star-reg-disc}
$\D^\star(\mu,\nu) \leqslant \D(\mu,\nu) \leqslant 2^d \D^\star(\mu,\nu)$.
\end{lemma}
\begin{proof}
The first inequality holds because the supremum defining discrepancy is
taken over a larger set than that defining star discrepancy. To prove the
second inequality, let $\vx<\vy$ be in $[\vzero,\vinfty)$. For
$S\subset \{1,\dots,d\}$, let
$I_S = \{ \vt \in [\vzero,\vy) : t_i < x_i \,\forall\,i\in S\}$.
\begin{figure}[h]
\caption{The sets $I_{\{1\}}$ and $I_{\{2\}}$ when $d = 2$.}
\centering
\begin{tikzpicture}
\draw (1,1) rectangle (4,4);
\draw (0,0) rectangle (4,4);
\draw (0,0) rectangle (1,4);
\draw (0,0) rectangle (4,1);
\node [above right] at (1, 1) {$\vx$};
\node at (1,1) {\textbullet};
\node [below left] at (4,4) {$\vy$};
\node at (.5,1) {$I_{\{1\}}$};
\node at (1,.5) {$I_{\{2\}}$};
\end{tikzpicture}
\end{figure}
Inclusion-exclusion tells us that
$\mu[\vx,\vy) = \sum_{S\subset \{1,\dots,d\}} (-1)^{\# S} \mu(I_S)$,
and similarly for $\nu$. Since each $I_S$ is a half-open box of the form
$[\vzero,\vz)$, we know that
$|\mu(I_S) - \nu(I_S)| \leqslant \D^\star(\mu,\nu)$. It follows that
\[
|\mu[\vx,\vy) - \nu[\vx,\vy)| \leqslant \sum_{S\subset \{1,\dots,d\}} |\mu(I_S) - \nu(I_S)| \leqslant 2^d \D^\star(\mu,\nu) .
\]
For a discussion and related context, see
\cite[Ch.~2 Ex.~1.2]{kuipers-niederreiter-1974}.
\end{proof}
Since we are only interested in the asymptotics of discrepancy, we will
sometimes gloss over the distinction between discrepancy and star discrepancy,
using whichever type of discrepancy makes a proof easier to follow.
We are usually interested in comparing empirical measures and their conjectured
asymptotic distribution.
Let $\bvx = (\vx_1,\vx_2,\vx_3,\dots)$ be a sequence in
$[\vzero,\vinfty)$, and $\mu$ a probability measure on $[\vzero,\vinfty)$. For any
$N\geqslant 1$, the empirical measure associated to the truncated
sequence $\bvx_{\leqslant N} = (\vx_n)_{n\leqslant N}$ is
$P_{\bvx,N} = \frac{1}{N} \sum_{n\leqslant N} \delta_{\vx_n}$. Write
$\D_N(\bvx,\mu) = \D\left( P_{\bvx,N},\mu\right)$, and similarly for star
discrepancy. In this context,
\[
\D^\star_N(\bvx,\mu) = \sup_{\vy\in [\vzero,\vinfty)} \left| \frac{\# \{n\leqslant N : \vx_n \in [\vzero,\vy)\}}{N} - \mu[\vzero,\vy)\right| .
\]
When the measure $\mu$ is clear from the context, we will refer to
$\D_N(\bvx,\mu)$ (resp.~$\D_N^\star(\bvx,\mu)$) as the discrepancy (resp.~star
discrepancy) of the sequence $\bvx$.
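For a concrete illustration, when $d = 1$ and $\mu$ has a continuous cdf, the
supremum defining $\D^\star_N$ can be computed exactly from the order statistics
of the sample by the classical Kolmogorov--Smirnov formula. The following
Python sketch (purely illustrative; the function names are not used elsewhere)
does this.
\begin{verbatim}
import numpy as np

def star_discrepancy_1d(sample, cdf):
    # Exact star discrepancy of a 1-D sample against a measure with a
    # continuous cdf: sort the points and compare the cdf with the
    # empirical fractions i/N and (i-1)/N at each order statistic.
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - f)         # empirical mass exceeding the cdf
    d_minus = np.max(f - (i - 1) / n)  # cdf exceeding the empirical mass
    return max(d_plus, d_minus)

# Example: 100 points of the rotation by sqrt(2) modulo 1, against the
# uniform (Lebesgue) measure on [0, 1).
pts = np.mod(np.sqrt(2.0) * np.arange(1, 101), 1.0)
print(star_discrepancy_1d(pts, lambda t: np.clip(t, 0.0, 1.0)))
\end{verbatim}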
If the measure $\mu$ is only defined on a Borel subset of $[\vzero,\vinfty)$, we
tacitly extend it by zero to $\bR^d$. It is also possible to define discrepancy
for sequences lying in compact Lie groups. For example, if the sequence $\bvx$
lies in a real torus $T$, choose a Lie isomorphism $T\simeq \bT^d = (\bR/\bZ)^d$,
and using that isomorphism identify the torus (as a measure space, not a
topological space) with
$[0,1)^d\subset [\vzero,\vinfty)$.
This gives a definition of discrepancy for sequences in $T$. Two different
Lie isomorphisms $T\simeq \bT^d$ will give two different definitions of
discrepancy, but asymptotically they will be bounded above and below by
constant multiples of each other as long as the measure in question is the
normalized Haar measure. In that case, we write $\D_N(\bvx)$ in place of
$\D_N(\bvx, \mu)$.
We can even define discrepancy for sequences in compact, connected Lie groups,
though we will only use $G = \SU(2)$. Let $G$ be such a group, and consider a
sequence lying in the space $G^\natural$ of conjugacy classes of $G$. Choose a
maximal torus $T\subset G$, and recall that
$G^\natural = T/W$, for $W$ the Weyl group of $T$. Choose a Lie isomorphism
$\bT^d = (\bR/\bZ)^d \simeq T$, and as before we can identify $T$ as a measure
space with
$[0,1)^d$. The Weyl group acts on $T$, and we can choose a connected fundamental
domain $Q$ for the action of $W$. Identifying
$G^\natural$ with $Q\subset [0,1)^d$, this gives a definition of discrepancy for
a sequence in $G^\natural$ with respect to the Haar measure. Of course this
definition depends on the choice of $T$, $Q$, and the Lie isomorphism
$T\simeq \bT^d$, but
asymptotically these definitions are all the same. The paper
\cite{rosengarten-2013} gives a different definition of discrepancy which only
works for semisimple simply-connected groups, but unlike this thesis, proves an
Erd\"os--Tur\'an inequality for that definition. It is likely that a reasonable
application of isotropic discrepancy would render these definitions equivalent,
at least for asymptotic purposes, but as the two definitions coincide for
$\SU(2)$, we do not explore this further.
Sometimes the sequence $\bx$ will not be indexed by the natural numbers, but
by the rational primes, or some other discrete subset of $\bR^+$. In that case
we will still use the notations $\D_N(\bx,\mu)$,
$\bx_{\leqslant N} = (\vx_\alpha)_{\alpha\leqslant N}$,
$\bx_{\geqslant N} = (\vx_\alpha)_{\alpha\geqslant N}$, etc.,
keeping in mind that the set $\{\vx_\alpha : \alpha \leqslant N\}$ is involved,
and that in formulas $\frac{1}{N}$ is replaced by
$\#\{\textnormal{indices }\leqslant N\}^{-1}$.
\emph{Why half-open boxes?} The choice of sets of the form $[\vx,\vy)$ in the
definition of discrepancy seems rather arbitrary, and it is. One could easily
define another kind of discrepancy as a supremum over all open or closed
balls---and
those definitions generalize to arbitrary metric spaces. There are also more
subtle definitions involving suprema over open or closed convex sets
(isotropic discrepancy). See \cite{kuipers-niederreiter-1974} for a discussion
and comparison of these differing definitions. In this thesis, we restrict to
half-open boxes because they are computationally tractable, fit well with
Diophantine approximation on tori, and the theory is best developed for
this definition.
\section{Statistical heuristics}
Let $\Omega$ be a probability space, and let $\{\theta_p\}$ be a collection of
prime-indexed iid (that is, independent and identically distributed)
random variables on
$\Omega$ with continuous common distribution $\mu$. By this, we mean each
$\theta_p\colon \Omega \to \bR$ is measurable, and if $P$ is the probability
measure on $\Omega$, then
$\left(\prod_{p\in S} \theta_p\right)_\ast P = \prod_{p\in S} \mu$ for all
finite sets $S$ of primes. In
the language of statistics, $\{\theta_p\}$ is a sequence of iid random
variables, each distributed according to $\mu$. For the sake of concreteness, the
reader may take $\mu = \frac{2}{\pi} \sin^2 \theta\, \dd\theta$,
supported on $[0,\pi]$. Then the discrepancy (known as the
Kolmogorov--Smirnov statistic in this context) is the random variable
\[
D_N = \sup_{x\in [0,\pi]} \left|\frac{1}{\pi(N)} \sum_{p\leqslant N} 1_{[0,x]}\circ \theta_p - \int 1_{[0,x]}\, \dd\mu\right| .
\]
The strong law of large numbers implies that, for each fixed $x$, the quantity
inside the absolute value converges to zero almost surely. The
Glivenko--Cantelli theorem strengthens this to $D_N \to 0$ almost surely, and
Kolmogorov's theorem shows that the normalized discrepancy $\sqrt{\pi(N)}\, D_N$
converges in distribution to the supremum of the absolute value of a Brownian
bridge, a limit which does not depend on $\mu$.
Now let $\theta_p$ be a collection of angles drawn at random from the
distribution $\ST$. Then the Glivenko--Cantelli theorem implies that
$\D_N(\btheta,\ST)\to 0$ almost surely as $N\to\infty$, and Kolmogorov's theorem
suggests that its typical size is of order $\pi(N)^{-\frac 1 2}$. Of course,
if $E$ is an elliptic curve, the $\theta_p$ are not, \emph{a priori}, drawn at
random from $\ST$, so the Glivenko--Cantelli theorem does not imply this bound
for any actual elliptic curve. However, since we expect the $\theta_p$ of an
actual elliptic curve to behave roughly as if they were drawn randomly from
$\ST$, the Glivenko--Cantelli theorem suggests that at least
$\D_N(\btheta,\ST) \ll N^{-\frac 1 2+\epsilon}$ (this is the
Akiyama--Tanigawa conjecture, i.e.~Conjecture \ref{conj:akiyama-tanigawa}).
See \cite{akiyama-tanigawa-1999} for computational evidence for this
conjecture. In a perfect world, the normalized discrepancy
$\sqrt{\pi(N)}\,\D_N(\btheta,\ST)$ would also converge in distribution to the
limiting law of the Kolmogorov--Smirnov statistic, but sadly, numerical
experiments conducted by the author suggest this is not the case.
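The iid model itself is easy to simulate. The following Python sketch (for
illustration only) draws iid angles from the Sato--Tate measure by rejection
sampling and prints the normalized discrepancy $\sqrt{n}\,D_n$ for a few
independent trials; by Kolmogorov's theorem these values should fluctuate
around a constant of order one.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def sato_tate_cdf(theta):
    # cdf of the measure (2/pi) sin^2(t) dt on [0, pi]
    return (theta - np.sin(theta) * np.cos(theta)) / np.pi

def sample_sato_tate(n):
    # Rejection sampling against the uniform proposal on [0, pi]:
    # accept a uniform draw t with probability sin(t)^2.
    out = []
    while len(out) < n:
        t = rng.uniform(0.0, np.pi, size=2 * (n - len(out)))
        u = rng.uniform(0.0, 1.0, size=t.size)
        out.extend(t[u < np.sin(t) ** 2])
    return np.array(out[:n])

def star_discrepancy(sample, cdf):
    x = np.sort(sample)
    n = len(x)
    i = np.arange(1, n + 1)
    f = cdf(x)
    return max(np.max(i / n - f), np.max(f - (i - 1) / n))

n = 10_000
for _ in range(5):
    d = star_discrepancy(sample_sato_tate(n), sato_tate_cdf)
    print(np.sqrt(n) * d)  # fluctuates around a constant of order one
\end{verbatim}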
\section{The Koksma--Hlawka inequality}
In this section we summarize the results of the paper \cite{okten-1999},
generalizing them as needed for our context. Recall that a function $f$ on
$[\vzero,\vinfty)\subset \bR^d$ is said to be of \emph{bounded variation} (in
the measure-theoretic sense) if there is a finite signed Radon measure $\nu$
such that $f(\vx) - f(\vzero) = \nu[\vzero,\vx]$. In such a case we write
$\Var(f) = |\nu|$ for the total variation of $\nu$. If $f\in C^d(\bR^d)$, then
$\Var(f) = \int_{[\vzero,\vinfty)} \left|\frac{\dd^d f}{\dd t_1 \dots \dd t_d} \right|$.
\begin{theorem}[Koksma--Hlawka]\label{thm:koksma-hlawka}
Let $\mu$ be a probability measure on $[\vzero,\vinfty)$ and $f$ a function of
bounded variation. For any sequence $\bvx = (\vx_1,\vx_2,\dots)$ in $[\vzero,\vinfty)$,
we have
\[
\left| \frac{1}{N} \sum_{n\leqslant N} f(\vx_n) - \int f\, \dd\mu \right| \leqslant \Var(f) \D_N(\bvx,\mu) .
\]
\end{theorem}
\begin{proof}
By assumption, there is a finite signed Radon measure $\nu$
such that $f(\vy) - f(\vzero) = \nu[\vzero,\vy]$. Recall that
$1_{[\vzero,\vz]}(\vy) = 1_{[\vy,\vinfty)}(\vz)$ for any
$\vy,\vz\in [\vzero,\vinfty)$. Then
\begin{align*}
\frac{1}{N} \sum_{n\leqslant N} f(\vx_n) - \int f\, \dd\mu
&= \frac{1}{N} \sum_{n\leqslant N} \left(f(\vx_n) - f(\vzero)\right) - \int \left(f(\vx) - f(\vzero)\right)\, \dd\mu(\vx) \\
&= \frac{1}{N} \sum_{n\leqslant N} \int 1_{[\vzero,\vx_n]}(\vy)\, \dd \nu(\vy) - \int \int 1_{[\vzero,\vx]}(\vy)\, \dd\nu(\vy) \, \dd\mu(\vx) \\
&= \int \left(\frac{1}{N} \sum_{n\leqslant N} 1_{[\vy,\vinfty)}(\vx_n) - \int 1_{[\vy,\vinfty)}\, \dd\mu \right) \dd\nu(\vy) .
\end{align*}
It follows that
\[
\left| \frac{1}{N} \sum_{n\leqslant N} f(\vx_n) - \int f\, \dd\mu \right|
\leqslant \sup_{\vy\in [\vzero,\vinfty)} \left| \frac{1}{N} \sum_{n\leqslant N} 1_{[\vy,\vinfty)}(\vx_n) - \int 1_{[\vy,\vinfty)}\, \dd\mu\right| \cdot |\nu| .
\]
The supremum is bounded above by $\D_N(\bvx,\mu)$, so the proof is complete.
\end{proof}
This theorem is proved in a somewhat restrictive setting, and there are more
general versions of the theorem for less restrictive notions of bounded
variation. For example, for $f$ a function on $\bR^+$ that is of bounded variation
in the traditional sense (any bounded piecewise monotone function will do) and $\mu$
an absolutely continuous probability measure, the inequality
\[
\left| \frac{1}{N} \sum_{n\leqslant N} f(x_n) - \int f\, \dd\mu\right| \leqslant \Var(f) \D_N^\star(\bx,\mu)
\]
holds \cite[Ch.~2, Th.~5.1]{kuipers-niederreiter-1974}. In particular,
when $\mu$ is the Sato--Tate measure and $f$ is piecewise continuous, we can
apply this inequality. When $d>1$, the non-measure-theoretic definition of
variation is more general, but much more complicated, than the
measure-theoretic one. See \cite[Ch.~2 \S 5]{kuipers-niederreiter-1974} for a
discussion.
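As a sanity check on the one-dimensional inequality above, the following
Python sketch (illustrative only) compares both sides for a step function of
total variation $2$, the Lebesgue measure on $[0,1)$, and the Kronecker
sequence $x_n = n\sqrt{2} \bmod 1$.
\begin{verbatim}
import numpy as np

def star_discrepancy(sample, cdf):
    # Star discrepancy of a 1-D sample against a continuous cdf.
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    f = cdf(x)
    return max(np.max(i / n - f), np.max(f - (i - 1) / n))

# f = 1_[0,1/2) - 1_[1/2,1) has Var(f) = 2 and integral 0 against Lebesgue.
N = 5_000
x = np.mod(np.sqrt(2.0) * np.arange(1, N + 1), 1.0)
f = lambda t: np.where(t < 0.5, 1.0, -1.0)

lhs = abs(np.mean(f(x)))  # |(1/N) sum f(x_n) - int f dmu|
rhs = 2.0 * star_discrepancy(x, lambda t: np.clip(t, 0.0, 1.0))  # Var(f) D*_N
print(lhs, rhs, lhs <= rhs)
\end{verbatim}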
\section{Comparing and combining sequences}
Throughout this section, $\lambda$ is the Lebesgue measure on $\bR$. Recall
that for another measure $\mu$ on $\bR$, the \emph{Radon--Nikodym derivative}
of $\mu$ with respect to $\lambda$ (when it exists) is the $\lambda$-almost
everywhere unique function satisfying
$\cdf_\mu(x) = \mu(-\infty,x] = \int_{-\infty}^x \frac{\dd\mu}{\dd\lambda}(t)\, \dd t$. The
Radon--Nikodym derivative exists whenever $\mu$ is absolutely continuous with
respect to $\lambda$, i.e.~$\lambda(S) = 0\Rightarrow \mu(S)=0$ (this is the
Lebesgue--Radon--Nikodym theorem) \cite[Th.~3.8]{folland-1999}. The following
result, a generalization of \cite[Ch.~2 Th.~4.1]{kuipers-niederreiter-1974},
quantifies how much the discrepancy of a sequence changes when the elements of
the sequence are perturbed slightly.
\begin{lemma}\label{lem:disc-of-two-seq}
Let $\bx$ and $\by$ be sequences in $[0,\infty)$. Suppose $\mu$ is an
absolutely continuous probability measure on $[0,\infty)$ with bounded
Radon--Nikodym derivative $\frac{\dd \mu}{\dd\lambda}$. Let $\epsilon>0$ be
arbitrary. Then
\[
\left|\D_N^\star(\bx, \mu) - \D_N^\star(\by,\mu)\right| \leqslant \left|\frac{\dd\mu}{\dd\lambda}\right|_\infty \epsilon + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N} .
\]
\end{lemma}
\begin{proof}
Fix $\epsilon>0$, and let $t\in [0,\infty)$ be arbitrary. Recall that
$P_{\bx,N} = \frac 1 N \sum_{n\leqslant N} \delta_{x_n}$ is the empirical
measure associated to $(x_n)_{n\leqslant N}$. For all $n\leqslant N$ such that
$y_n<t$, either $x_n < t+\epsilon$ or $|x_n - y_n| \geqslant \epsilon$. It
follows that
\[
P_{\by,N}[0,t) \leqslant P_{\bx,N}[0,t+\epsilon) + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N} .
\]
Moreover, we have
$\left| P_{\bx,N}[0,t+\epsilon) - \mu[0,t+\epsilon)\right| \leqslant \D_N^\star(\bx,\mu)$. Putting these together, we get:
\begin{align*}
P_{\by,N}[0,t) - \mu[0,t)
&\leqslant P_{\bx,N}[0,t+\epsilon) - \mu[0,t) + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N} \\
&\leqslant \mu[t,t+\epsilon) + \D_N^\star(\bx,\mu) + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N} \\
&\leqslant \left|\frac{\dd\mu}{\dd\lambda}\right|_\infty \epsilon + \D_N^\star(\bx,\mu) + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N}
\end{align*}
This tells us that
$\D_N^\star(\by,\mu) \leqslant \left|\frac{\dd\mu}{\dd\lambda}\right|_\infty \epsilon + \D_N^\star(\bx,\mu) + \frac{\#\{ n\leqslant N : |x_n - y_n| \geqslant \epsilon\}}{N}$.
Reversing the roles of $\bx$ and $\by$, we obtain the desired result.
\end{proof}
We can apply the above result to the case where the elements of two sequences
get closer and closer together. In the proof below, the choice of $\epsilon_N$
balances the two terms of Lemma \ref{lem:disc-of-two-seq}, and the resulting
exponent $\frac{\alpha}{\alpha+1}$ is optimal for this argument.
\begin{corollary}\label{cor:close-seq-disc}
Let $\bx$, $\by$, and $\mu$ be as in Lemma \ref{lem:disc-of-two-seq}. Suppose
$|x_n - y_n| \ll n^{-\alpha}$ for some fixed $\alpha>0$. Then
$|\D_N^\star(\bx,\mu) - \D_N^\star(\by,\mu)| \ll N^{-\frac{\alpha}{\alpha+1}}$.
\end{corollary}
\begin{proof}
Let $C>0$ be such that $|x_n - y_n| < C n^{-\alpha}$ for all $n$. Given
$N$, let $\epsilon_N = N^{-\frac{\alpha}{\alpha+1}}$. Note that
$C n^{-\alpha} \geqslant \epsilon_N$ if and only if
$\log C - \alpha\log n \geqslant -\frac{\alpha}{\alpha+1} \log N$, which is
equivalent to $n \leqslant N^{\frac{1}{\alpha+1}} C^{1/\alpha}$.
Lemma \ref{lem:disc-of-two-seq} with $\epsilon = \epsilon_N$ and the estimate
$\# \{n\leqslant N : |x_n - y_n| \geqslant \epsilon_N\} \leqslant N^{\frac{1}{\alpha+1}} C^{1/\alpha}$
tells us that
$|\D_N^\star(\bx,\mu) - \D_N^\star(\by,\mu)| \ll N^{-\frac{\alpha}{\alpha+1}} + N^{\frac{1}{\alpha+1} - 1} C^{1/\alpha}\ll N^{-\frac{\alpha}{\alpha+1}}$.
\end{proof}
This next result shows that if a sequence is transformed by an isometry of
$\bR$, the discrepancy of the transformed sequence is the same as the
discrepancy of the original sequence.
\begin{lemma}\label{lem:flip-discrepancy}
Let $\sigma$ be an isometry of $\bR$, and $\bx$ a sequence in $[0,\infty)$
such that $\sigma(\bx)$ is also in $[0,\infty)$. Let $\mu$ be an absolutely
continuous measure on $[0,\infty)$ such that $\sigma_\ast \mu$ is supported on
$[0,\infty)$. Then $\D_N(\bx, \mu) = \D_N(\sigma_\ast \bx, \sigma_\ast \mu)$.
\end{lemma}
\begin{proof}
Every isometry of $\bR$ is a composition of a translation and a reflection.
The statement is clear if $\sigma$ is a translation, since translating the
sequence and the measure simultaneously translates every interval appearing in
the suprema defining the two discrepancies. So, suppose $\sigma(t) = a - t$ for some $a>0$. Since
$\mu$ is absolutely continuous, $\mu\{t\}=0$ for all $t\geqslant 0$, and
similarly for $\sigma_\ast\mu$. Thus
$\mu[s,t) = \sigma_\ast\mu(a-t,a-s] = \sigma_\ast\mu[a-t,a-s)$ for any
$[s,t)\subset [0,\infty)$. Let $\epsilon>0$ be arbitrary, and let
$I = [s,t)$ be such that $|P_{\bx,N}I - \mu I| > \D_N(\bx,\mu) - \epsilon$.
Since $P_{\bx,N}$ can assign positive measure to a point, it may not be that
$P_{\sigma_\ast \bx,N}(a-t,a-s] = P_{\sigma_\ast \bx,N}[a-t,a-s)$. Consider the
family of intervals $I_n = \left[a-t+\frac 1 n, a-s+\frac 1 n\right)$. For
$n$ sufficiently large,
$P_{\sigma_\ast\bx,N}I_n = P_{\sigma_\ast\bx,N}(a-t,a-s]$ because no element of
$\sigma_\ast\bx$ is in either $\left(a-t,a-t+\frac 1 n\right)$ or
$\left(a-s,a-s+\frac 1 n\right)$. Moreover, since $\sigma_\ast\mu$ is absolutely
continuous,
$\sigma_\ast\mu(I_n) \to \sigma_\ast\mu(a-t,a-s] = \sigma_\ast\mu[a-t,a-s)$.
It follows that
\begin{align*}
|P_{\sigma_\ast\bx,N}(I_n) - \sigma_\ast\mu(I_n)|
&\to |P_{\sigma_\ast\bx,N}(a-t,a-s] - \sigma_\ast\mu(a-t,a-s]| \\
&= |P_{\bx,N}(I) - \mu(I)| .
\end{align*}
This means there exists $I_n$ such that
$|P_{\sigma_\ast\bx,N}(I_n) - \sigma_\ast\mu(I_n)|> \D_N(\bx,\mu) - \epsilon$.
We have proved that
$\D_N(\sigma_\ast\bx,\sigma_\ast\mu) \geqslant \D_N(\bx,\mu)$. Since $\sigma^2$
is the identity, we can repeat this argument with $\sigma_\ast\mu$ and
$\sigma_\ast\bx$ to get the other inequality, so the proof is complete.
\end{proof}
A technique we will use throughout this thesis involves comparing the
discrepancy of a sequence with the discrepancy of a pushforward sequence,
with respect to the pushforward measure.
\begin{lemma}\label{lem:push-discrepancy}
Let $I$, $J$ be closed intervals in $[0,\infty)$ and $f\colon I\to J$ a continuous,
order-preserving (or order-reversing) bijection. If $\mu,\nu$ are probability
measures on $I$, then
$\D(\mu,\nu) = \D(f_\ast \mu,f_\ast\nu)$.
\end{lemma}
\begin{proof}
First we suppose that $f$ is order-preserving. Then for any interval
$[s,t)\subset I$, we know that $f[s,t) = [u,v)$ for some $u,v\in J$. It follows
that $|\mu[s,t) - \nu[s,t)| = |f_\ast\mu[u,v) - f_\ast\nu[u,v)|$, so
$\D(f_\ast\mu,f_\ast\nu) \geqslant \D(\mu,\nu)$. Similarly, for any
$[u,v)\subset J$, we know that $f^{-1}[u,v) = [s,t)$ for some
$s,t\in I$. It follows that
$|f_\ast\mu[u,v) - f_\ast\nu[u,v)| = |\mu[s,t) - \nu[s,t)|$, which means
$\D(\mu,\nu) \geqslant \D(f_\ast\mu,f_\ast\nu)$.
If $f$ is order-reversing, then we may write $f$ as the composition of a
reflection and an order-preserving bijection. Combining Lemma
\ref{lem:flip-discrepancy} and the first part of this proof, we see that
$\D(f_\ast\mu,f_\ast\nu) = \D(\mu,\nu)$.
\end{proof}
Now we show that the discrepancy behaves as expected when two sequences are
interleaved.
\begin{definition}
Let $\bx$ and $\by$ be sequences in $[\vzero,\vinfty)\subset \bR^d$. We write
$\bx\wr\by$ for the interleaved sequence $(x_1,y_1,x_2,y_2,x_3,y_3,\dots)$.
\end{definition}
Write
$P_{\bx\wr\by,N} = \frac{1}{2} \left( P_{\bx,N} + P_{\by,N}\right)$ for the
empirical measure of the first $2N$ terms of the interleaved sequence $\bx\wr\by$.
\begin{theorem}\label{thm:wreath-seq}
Let $I$ and $J$ be disjoint open boxes in $[\vzero,\vinfty)$, and let $\mu$,
$\nu$ be probability measures on $I$ and $J$, respectively. Let $\bx$ be a
sequence in $I$ and $\by$ be a sequence in $J$. Then
\[
\tfrac{1}{2}\max\{\D_N(\bx,\mu),\D_N(\by,\nu)\} \leqslant \D_N\left(\bx\wr\by, \tfrac{1}{2}(\mu+\nu)\right) \leqslant \tfrac{1}{2}\left(\D_N(\bx,\mu) + \D_N(\by,\nu)\right) .
\]
\end{theorem}
\begin{proof}
Any half-open box in $[\vzero,\vinfty)$ can be split by a hyperplane (parallel to a
coordinate hyperplane) into two disjoint half-open boxes
$[\va,\vb)\sqcup [\vs,\vt)$, each
of which intersects at most one of $I$ and $J$. We may assume that
$[\va,\vb)\cap J=\varnothing$ and $[\vs,\vt)\cap I = \varnothing$. Write
$A = [\va,\vb)$ and $S = [\vs,\vt)$. Then
\begin{align*}
\left| P_{\bx\wr\by,N}(A\cup S) - \tfrac{1}{2}(\mu+\nu)(A\cup S)\right|
&\leqslant \tfrac{1}{2}|P_{\bx,N}(A) - \mu(A)| + \tfrac{1}{2}|P_{\by,N}(S) - \nu(S)| \\
&\leqslant \tfrac{1}{2}\left(\D_N(\bx,\mu) + \D_N(\by,\nu)\right) .
\end{align*}
This yields the second inequality in the statement of the theorem. To see the
first, assume that the maximum discrepancy is $\D_N(\bx,\mu)$, and let
$[\vs,\vt)$ be a half-open box such that $|P_{\bx,N}[\vs,\vt) - \mu[\vs,\vt)|$
is within some arbitrary $\epsilon$ of $\D_N(\bx,\mu)$. Just as at the
beginning of this proof, use a hyperplane (parallel to a coordinate hyperplane)
between $I$ and $J$ to ``cut off'' the part of $[\vs,\vt)$ that
does not intersect $I$. Replacing $[\vs,\vt)$ with this smaller box, we may
assume it does not intersect $J$.
\begin{figure}[h]
\caption{The sets $I$, $J$, and $[\vs,\vt)$ when $d = 2$.}
\centering
\begin{tikzpicture}
\draw (0,0) rectangle (3,1);
\draw (1,2) rectangle (4,4);
\draw (-1,1.5) -- (5,1.5);
\draw (2,0.5) rectangle (4,1.5);
\node at (6,1.5) {hyperplane};
\node at (3.5,1) {$[\vs,\vt)$};
\node at (1,0.5) {$I$};
\node at (2.5,3) {$J$};
\end{tikzpicture}
\end{figure}
Assuming
$[\vs,\vt)\cap J=\varnothing$, we have
$\left|P_{\bx\wr\by,N}[\vs,\vt) - \tfrac{1}{2}(\mu+\nu)[\vs,\vt)\right| = \tfrac{1}{2}|P_{\bx,N}[\vs,\vt) - \mu[\vs,\vt)|$,
which yields the result.
\end{proof}
\section{Examples}
Historically, one of the first interesting examples of an equidistributed
sequence is the set of translates of an irrational number modulo one.
\begin{theorem}[Weyl, Sierpi\'nski, Bohl]
Let $a\in \bR$ be irrational. Then the sequence
$\bx=(a\mod 1,2a\mod 1,3a\mod 1,\dots)$ is equidistributed in $[0,1)$.
\end{theorem}
We will prove this result in Chapter \ref{chapter:irrationality-exponent}. It
is known, and we will prove, that
sequences of this form have discrepancy which decays roughly like
$N^{-\alpha}$, for some $\alpha\in \left(0,\frac 1 2\right)$ which controls the
``goodness'' of rational approximations of $a$. It is useful to have a sequence
whose discrepancy decays faster. The optimal rate of decay is achieved by
the following example.
\begin{definition}\label{def:van-der-corput}
For $n\in \bN$, write $n$ in base $2$ as $n = \sum a_i 2^i$, and put
$v_n = \sum a_i 2^{-(i+1)}$. The \emph{van der Corput sequence} is
$\bv = (v_1,v_2,v_3,\dots)$.
\end{definition}
The van der Corput sequence has generalizations to other bases and higher
dimensions, but we will not use them. The discrepancy of the van der Corput
sequence has extremely fast convergence to zero.
\begin{lemma}[{\cite[Ch.~2 Th.~3.5]{kuipers-niederreiter-1974}}]
\label{lem:vdc-disc}
$\D_N(\bv) \leqslant \frac{\log(N+1)}{N\log 2}$.
\end{lemma}
By \cite[Ch.~2 Th.~2.3]{kuipers-niederreiter-1974}, this is (asymptotically)
the fastest rate of decay possible.
The van der Corput sequence is uniformly distributed (equidistributed with
respect to the Lebesgue measure). We can use the results of the previous
section to construct sequences equidistributed with respect to more general
measures.
\begin{theorem}\label{thm:van-der-corput}
Let $\mu$ be an absolutely continuous probability measure on an interval $I$.
Then there exists a sequence $\bx=(x_1,x_2,\dots)$ in $I$ such that
$\D_N(\bx,\mu) \ll \frac{\log(N)}{N}$.
\end{theorem}
\begin{proof}
Let $I = [a,b]$. We rephrase the proof of
\cite[Ch.~2 Lem.~4.2]{kuipers-niederreiter-1974} for our context. Let
$\bv = (v_1,v_2,\dots)$ be the van der Corput sequence (Definition
\ref{def:van-der-corput}). For each $n$, there exists $x_n\in I$ such that
$\cdf_\mu^{-1}[0,v_n] = [a,x_n]$. It follows that for any $t\in I$, $x_n < t$
if and only if $v_n < \cdf_\mu(t)$, and thus
\[
\left|P_{\bx,N}[a,t) - \mu[a,t)\right| = \left|P_{\bv,N}[0,\cdf_\mu(t)) - \cdf_\mu(t)\right| \leqslant \D_N^\star(\bv) .
\]
It follows from Lemma \ref{lem:vdc-disc} that
$\D_N^\star(\bx,\mu) \ll \frac{\log N}{N}$, hence
$\D_N(\bx,\mu) \ll \frac{\log N}{N}$.
\end{proof}
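For instance, when $\mu$ is the Sato--Tate measure on $[0,\pi]$, whose cdf is
continuous and strictly increasing, the construction amounts to applying the
inverse cdf to the van der Corput sequence. The following Python sketch (with
an illustrative bisection solver for the inverse cdf) prints the resulting star
discrepancy alongside $\log N / N$.
\begin{verbatim}
import numpy as np

def van_der_corput(n):
    v, denom = 0.0, 1.0
    while n > 0:
        denom *= 2.0
        v += (n & 1) / denom
        n >>= 1
    return v

def sato_tate_cdf(theta):
    return (theta - np.sin(theta) * np.cos(theta)) / np.pi

def inverse_cdf(u, lo=0.0, hi=np.pi, steps=60):
    # Solve sato_tate_cdf(x) = u by bisection; the cdf is strictly increasing.
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if sato_tate_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# x_n = cdf^{-1}(v_n): a Sato--Tate-equidistributed sequence inheriting the
# O(log N / N) discrepancy bound of the van der Corput sequence.
N = 1_000
x = np.array([inverse_cdf(van_der_corput(n)) for n in range(1, N + 1)])

i = np.arange(1, N + 1)
f = sato_tate_cdf(np.sort(x))
d_star = max(np.max(i / N - f), np.max(f - (i - 1) / N))
print(d_star, np.log(N) / N)
\end{verbatim}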
Now that we can construct sequences whose discrepancy decays rapidly with
respect to a fixed measure $\mu$, we can use them to construct sequences whose
discrepancy decays at any specified (slower) rate. The $N^{-\alpha}$ in the
following theorem could be replaced by any decreasing function of $N$ which
converges to zero but does not decay faster than $N^{-1}$.
\begin{theorem}\label{thm:discrepancy-arbitrary}
Let $\mu$ be an absolutely continuous probability measure, supported on $I$,
whose cdf is strictly increasing on $I$. Fix $\alpha\in (0,1)$. Then there
exists a sequence $\bx=(x_1,x_2,\dots)$ such that
$\D_N(\bx,\mu) = \Theta(N^{-\alpha})$.
\end{theorem}
The proof is similar in concept to the proof that a conditionally (but not
absolutely!) convergent series may be rearranged to sum to any desired value.
We start with a van der Corput sequence with rapidly decaying discrepancy. Our
sequence begins by adding van der Corput elements until the discrepancy is
smaller than $N^{-\alpha}$, then repeatedly adds the same element to the end
of the sequence, pushing up the discrepancy until it is bigger than
$N^{-\alpha}$. There are two main difficulties: first, we need to show that
repeatedly adding the same element to the end of a sequence eventually forces
the discrepancy to increase; second, that while doing so the discrepancy does
not increase or decrease too rapidly.
\begin{proof}
Let $I = [a,b]$. If $\bx_{\leqslant N}$ is a sequence of length $N$, let
$\bx_{\leqslant N}: a^M$ be the sequence $(x_1,\dots,x_N,a,\dots,a)$ ($M$ copies
of $a$). We begin by showing that the discrepancy of $\bx_{\leqslant N}:a^M$
is eventually large relative to $N^{-\alpha}$. Recalling that $\mu\{a\} = 0$,
we have:
\[
\D(\bx_{\leqslant N}: a^M,\mu)
\geqslant \left| \frac{\#\{ n\leqslant N+M : x_n = a\}}{N+M} - \mu\{a\}\right|
\geqslant \frac{M}{N+M} .
\]
So for fixed $N$, if we add enough $a$'s to the end of $\bx_{\leqslant N}$, the
discrepancy $\D(\bx_{\leqslant N}: a^M,\mu)$ will be larger than
$(N+M)^{-\alpha}$. On the other hand, for $J = [s,t)\subset I$,
\begin{align*}
\left| P_{\bx_{\leqslant N}: a^M}(J) - P_{\bx_{\leqslant N}}(J)\right|
&= \frac{\left|M\cdot 1_{a\in J} - \frac{M}{N}\#\{n\leqslant N : x_n\in J\}\right|}{M+N} \\
&\leqslant \frac{M}{M+N} ,
\end{align*}
since both $M\cdot 1_{a\in J}$ and $\frac{M}{N}\#\{n\leqslant N : x_n\in J\}$ lie
in $[0,M]$. This implies that
$\D\left(\bx_{\leqslant N}: a^M,\mu\right) \leqslant \D\left(\bx_{\leqslant N},\mu\right) + \frac{M}{M+N}$, which lets us control how rapidly the discrepancy can increase.
Let $\bv$ be the $\mu$-equidistributed sequence in $I = [a,b]$ produced by
Theorem \ref{thm:van-der-corput} from the van der Corput sequence. We know
that $\D_N(\bv, \mu) \ll \frac{\log N}{N}$, which converges
to zero faster than $N^{-\alpha}$ since $\alpha\in \left(0, 1\right)$.
We construct the sequence $\bx$ via the following recipe. Start with
$(x_1 = v_1,x_2 = v_2,\dots)$ until, for some $N_1$,
$\D_{N_1}(\bx,\mu) < N_1^{-\alpha}$. Then set $x_{N_1+1} = a$,
$x_{N_1+2} = a$, \dots, until
$\D_{N_1+M_1}(\bx,\mu) > (N_1+M_1)^{-\alpha}$. Then set
$x_{N_1+M_1+1} = v_{N_1+1}$, $x_{N_1+M_1+2} = v_{N_1+2}$, \dots,
until once again
$\D_{N_1+M_1+N_2}(\bx,\mu) < (N_1+M_1+N_2)^{-\alpha}$. Repeat
indefinitely. We will show, first, that each of these two steps terminates, and
second, that $\D_N(\bx,\mu)$ never differs from $N^{-\alpha}$ by too much.
Note that $\frac{M+1}{N+M+1} - \frac{M}{N+M} \leqslant N^{-1}$. This tells
us that when we are adding $a$'s at the end of $\bx_{\leqslant N}$, the
discrepancy of
$\bx_{\leqslant N}: a^M$ is eventually increasing, and can increase by at most
$N^{-1}$ at each
step. So if $\D(\bx_{\leqslant N},\mu) < N^{-\alpha}$, we can ensure that
$\D(\bx_{\leqslant N}: a^M,\mu)$ is at most $N^{-1}$ greater than
$N^{-\alpha}$.
Moreover, appending any single element to $\bx_{\leqslant N}$ changes the
discrepancy by at most $\frac{2}{N+1}$. So when adding van der
Corput elements to the end of the sequence, the discrepancy cannot decay any
faster than by $\frac{2}{N+1}$ per element added.
\[
\left|\D_N(\bx,\mu) - N^{-\alpha}\right| \ll N^{-1} ,
\]
which implies $\D_N(\bx,\mu) \sim N^{-\alpha}$, both of which are even
stronger than we need.
\end{proof}
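The recipe in the proof is easy to simulate. The following Python sketch
(illustrative only) carries it out for the uniform measure on $[0,1]$, $a = 0$,
and $\alpha = \frac 1 2$, appending a van der Corput point whenever the
discrepancy is at least $N^{-\alpha}$ and appending $a$ otherwise; star
discrepancy is used in place of $\D$ purely for computational convenience.
\begin{verbatim}
import numpy as np

def van_der_corput(n):
    v, denom = 0.0, 1.0
    while n > 0:
        denom *= 2.0
        v += (n & 1) / denom
        n >>= 1
    return v

def star_discrepancy_uniform(points):
    # Star discrepancy of `points` against the uniform measure on [0, 1].
    x = np.sort(np.asarray(points, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

alpha, a = 0.5, 0.0
seq, next_vdc = [], 1
for _ in range(2_000):
    N = len(seq)
    if N > 0 and star_discrepancy_uniform(seq) < N ** (-alpha):
        seq.append(a)                         # push the discrepancy back up
    else:
        seq.append(van_der_corput(next_vdc))  # pull it back down
        next_vdc += 1

N = len(seq)
print(star_discrepancy_uniform(seq), N ** (-alpha))  # comparable sizes
\end{verbatim}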