<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License
Name : Pollinating
Description: A two-column, fixed-width design with dark color scheme.
Version : 1.0
Released : 20101114
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Gregory Szorc's Digital Home
</title>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/blog/feed" />
<link rel="alternate" type="application/atom+xml" title="Atom 1.0"
href="/blog/feed/atom" />
<link rel="stylesheet" href="/style/style.css" type="text/css" />
<link rel="stylesheet" href="/css/pygments_murphy.css" type="text/css" />
</head>
<body>
<div id="wrapper">
<div id="menu">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/notes">Notes</a></li>
<li><a href="/work.html">Work</a></li>
<li><a href="/skills.html">Skills</a></li>
<li><a href="/thoughts.html">Thoughts</a></li>
<li><a href="/resume.pdf">Resume</a></li>
</ul>
</div>
<div id="page">
<div id="page-bgtop">
<div id="page-bgbtm">
<div id="content-wide">
<h1>Body of Work</h1>
<i>Last updated June 2019</i>
<p>This page contains information on my technical history. You can think of it
as a very descriptive resume.</p>
<h2>The College Years (2002-2006)</h2>
<p>In my 4 years at Case Western Reserve University, I dabbled in a number
of small projects and contributed to a few larger ones. Most of my efforts
were primarily focused on projects related to the university, but some
impacted people outside.</p>
<h3>Case Wiki and Related Extensions</h3>
<p>As a student employee of the university's IT team, I was responsible for
rolling out the <a href="http://wiki.case.edu/">Case Wiki</a>, a central and
unified wiki for the university. A designer prototyped the layout and I
created a MediaWiki skin for it.</p>
<p>While maintaining the wiki, I developed numerous MediaWiki extensions.
These were all released as open source. These extensions include:</p>
<ul>
<li>WikiFeeds - Provided enhanced Atom and RSS feeds for MediaWiki, including
a secure feed for an individual's watchlist. Many of the features have
slowly been incorporated into the official MediaWiki code base, but they
were exclusive to WikiFeeds for years.</li>
<li>OpenSearch - Provided OpenSearch support for MediaWiki. OpenSearch has
since been incorporated into MediaWiki.</li>
<li>Graph Data Structure and Visualization - Implemented a MediaWiki library
to create graph data structures from wiki content (e.g. articles are nodes;
links, etc. are edges). Wrote an extension that turned this structure into
Graphviz markup and rendered it into pages. Also wrote an extension for
Touchgraph visualization of the same data.</li>
<li>AJAX Watcher - Exposed real-time wiki changes without page refreshing via
AJAX calls. I believe MediaWiki has since incorporated something similar.
</li>
<li>Farmer - Turns MediaWiki into a wiki farm from a single install.</li>
</ul>
<h3>University Single Sign-On</h3>
<p>While working for the university, I deployed
<a href="http://www.jasig.org/cas">CAS</a> as a single sign-on service for
the entire university. Prior to its rollout, individual sites were taking
login credentials from HTML forms, HTTP basic auth, etc. and trying to log in
to the university's LDAP server.</p>
<p>Initially, very few services used CAS. Over time, more and more people
got on board and I'm pretty sure that now almost every site at the university
uses it.</p>
<h3>opensource.case.edu</h3>
<p>A few friends and I thought it would be a great idea for people at the
university to have a central service that hosted projects. We deployed
an instance of <a href="http://trac.edgewall.org/">Trac</a> at
opensource.case.edu (which was initially hosted on my personal Linux desktop
machine). Students, staff, and faculty slowly but surely added projects there.
Keep in mind this was before GitHub, when SourceForge was king. But, with
SourceForge, you couldn't integrate with the university's authentication
system, easily show community involvement, etc.</p>
<p>Unfortunately, after I left the university, the server hosting
opensource.case.edu had a catastrophic hardware failure. Data was lost and the
service is no longer available. I had transitioned ownership of the site before
that happened, so I like to think that the data loss was not my fault.</p>
<h3>Undergraduate Student Government</h3>
<p>I was part of a small team hired by Undergraduate Student Government to
implement web site features for that group. This included an election
voting system and a custom management system used by members of USG to help
them organize bills, events, etc.</p>
<h3>Course Review Statistics</h3>
<p>At the time I was at the university, there was a very crude system in place
for performing course and instructor evaluations. During class, students would
fill out cards with a #2 pencil with rankings of various aspects of the course
and instructor. These were compiled by the university and made available as
text files on a central server. These were exposed through a web site with
only a crude search interface.</p>
<p>I wrote a tool that scraped all data (I believe it went back 10+ years),
imported it into a database, then exposed the data, complete with pretty
graphs, via a web site. This alone was a major improvement over the existing
system. From there, I built out extra features, such as aggregation of an
individual instructor's ratings across all courses and all time. This
made it easier for students to determine which instructors were likely to
teach better and thus which course offering to sign up for.</p>
<p>Feedback was generally positive, especially from the students. However,
some didn't care for the site because it exposed information they didn't want
to see (the graph of average ratings per department apparently drew the
attention of the administration). Some wanted me to take down the site. But,
I argued that I was just using the data already made available by the university in
new and creative ways. With backing from USG, I won out.</p>
<h2>Tellme / Microsoft (Summer 2006, January 2007 - June 2010)</h2>
<p>During my summer break senior year in college, I was an intern at Tellme
Networks, which at the time was a successful startup in the speech recognition
and telephone space. I loved every minute of it.</p>
<h3>VoIP Telephone Provisioning</h3>
<p>My summer internship project was to design and implement a provisioning
system for VoIP telephones. The system consisted of a dynamic TFTP server
(I rolled my own TFTP server because no TFTP servers had dynamic hooks and
the protocol is very simple to implement), dynamic HTTP server to process
requests from telephones, a backend store, and a web administration interface.
</p>
<p>When completed, one could simply plug in any supported Cisco or Polycom
VoIP telephone into the ethernet network and the phone would download the
latest firmware and bootstrap with a suitable configuration. Phones that
weren't yet provisioned received an initial configuration that dialed a
VXML application hosted on Tellme's platform when the receiver was lifted.
This application talked to the company LDAP server, prompted the person for
his or her name, looked up information in the directory, and updated the phone
provisioning system with the information. The phone would then reboot, picking
up the user's configuration, complete with saved dialing preferences, etc.</p>
<p>By far the coolest feature of the system was the ability via the admin
interface to remotely reboot all connected phones. Combined with a custom
bootup chime and an open office space, it was quite amusing listening to 20+
phones play simple jingles simultaneously!</p>
<p>I must have done a good job with the project and generally impressed people,
because I was given a full-time offer at the end of my internship. I accepted,
finished my senior year at college, and started full-time at Tellme in January
2007.</p>
<h3>www.tellme.com and m.tellme.com</h3>
<p>At one time, I was maintaining www.tellme.com and m.tellme.com. Both were
written in PHP and used the <a href="http://framework.zend.com/">
Zend Framework</a>. Since 90%+ of the content on www.tellme.com was
static, aggressive caching was in place so that most page views did
not involve much PHP, keeping server load down. As part of this role, I worked
closely with the creative team for designs and graphics and with marketing to
manage content.</p>
<p>At one time m.tellme.com hosted a download component for Tellme's mobile
application, which ran on most BlackBerry devices and other misc phones. One
of the most frustrating experiences I have had to date was verifying that the
site worked on all these devices. Each device had different support for
rendering web sites. On top of that, you were dealing with different
resolutions. Keep in mind this was before modern browsers came to mobile
devices. Supporting these all concurrently was a real chore. Even the
identification part was a nightmare, as phones could usually only be identified
by the User-Agent HTTP request header or by the presence of some other
non-standard header. And, this wasn't always consistent across carriers or
guaranteed. What a nightmare! I'm glad I don't have to worry about this
any more.</p>
<p>Since the assimilation of Tellme by Microsoft, www.tellme.com and
m.tellme.com are no longer available. Fortunately, the Internet Archive
<a href="http://replay.web.archive.org/20100523111225/http://www.tellme.com/">
contains</a> a semi-working snapshot.</p>
<h3>Deployment of Subversion</h3>
<p>When I started at Tellme, CVS was the lone version control system for
engineering. (I even think one group was still on RCS!) I asked around
about why a more modern version control system wasn't in use, and nobody
seemed to have a good answer (I think the common answer was "because nobody
has set one up").</p>
<p>After consulting a lot of peers, it was decided that Subversion was the
best fit for a new, supported VCS. It wasn't that Subversion was the best
of breed (I believe Perforce arguably was at the time). It was chosen because
it was very similar to CVS (their motto at the time was <i>a compelling
replacement for CVS</i>), it was free and open source, and had a great hooks
system.</p>
<p>I worked with the operations team to deploy Subversion and announced it to
the company. Slowly but surely, people started using it. Usage increased
dramatically after I did a brownbag presentation on Subversion and its merits
over CVS and started to roll out custom hooks to supplement functionality.
Teams loved the ability to receive emails on commits, close bugs (via a
special syntax in the commit messages), and require buddies on commit (also via
special syntax in commit messages). This all required special commit hooks,
of course. Unfortunately, those were developed after Tellme became Microsoft,
so they will likely never see light outside company walls.</p>
<p>At some point, I lobbied someone in the engineering org to splurge for a new
server for hosting the source code services. I worked with the operations team
to get that installed. And, I received a lot of hacker cred for making the
version control systems 5x faster (which was a big deal for some of the teams,
with their large trees that took a couple of minutes to switch branches with
CVS).</p>
<h3>Weekly Brownbags</h3>
<p>At some point in my time at Tellme/Microsoft, I organized a weekly brown
bag series. Every Wednesday, individuals or small groups would present topics
to whoever was in attendance. These were often engineer-centric, but anyone
could and did present.</p>
<p>I seeded the sessions with presentations by myself and a few select people.
After that, I either had people contact me to be added to the calendar or I
sought out and encouraged people to present. Every week, I also recorded
the sessions and made them available on the intranet.</p>
<h3>Web-Based Code Review System</h3>
<p>I believe sometime in 2009, I grew frustrated with the way we were doing
code reviews. The act of looking at plain text diffs and recording comments
in email or similar just felt outdated. When others felt the same, I started
researching solutions. I heard Google had a pretty nifty one, but it was only
partially available to the public. I decided on <a href="http://www.reviewboard.org/">
Review Board</a> and deployed it. It became an overnight success with almost
immediate adoption by every team. Great tool.</p>
<h3>Go-To Person</h3>
<p>In my time at Tellme/Microsoft, I became a go-to person in the org for
questions on Perl, mod_perl, Apache HTTP Server, compiling packages, and
other misc topics. The first two sort of scared me at the time because when I
started at Tellme, I knew almost nothing about Perl. Somehow, I became good
enough that I could dole out advice and answers and know what I was doing in
code reviews.</p>
<p>For the Apache HTTP Server, I was very familiar with the internals, since I
wrote a couple of C modules while at Tellme. As such, all the weird bugs or
questions about server behavior seemed to always come my way. When a team was
deploying a new site or service, I would often be the one reviewing the
configs or lending advice on how to configure it.</p>
<p>For compiling packages, I always took satisfaction in wrangling things
to compile on Solaris x86 (although, we ran a typical GNU toolchain as opposed
to the Sun stack, so it wasn't as bad as it sounds). I was also pretty adamant
about packages being built properly (e.g. considering which libraries were
statically and dynamically linked, defining proper linker flags to foster
easier linking in the future, getting an optimized binary, etc). So, people
would often come to me and ask me to review the compilation procedure.</p>
<h3>Server Management</h3>
<p>For most of my time at Tellme, I was a member of a roughly 15-person team
that wrote and maintained many of the Tellme/Microsoft-branded VXML
applications and the services they directly required. (VXML applications often
consist of static VXML documents and dynamic web services with which they
talk at run-time.) One of my major roles on the team was managing the servers
these resided on.</p>
<p>This role involved numerous responsibilities. First, I needed to be
familiar with all aspects of the dynamic services so I would know how to
triage them, make informed decisions about scalability, etc. I was often
developing these services myself. When I wasn't, I typically became informed
by being part of the code reviews. I would often address concerns around
scalability and failure, such as timeouts to remote services, expected
latency, ensuring adequate monitors were present and tested, etc.</p>
<p>Once the services were in my court, I needed to figure out where to host
them. We had a number of different server pools from which to choose. Or,
there was always the option of buying new hardware, but you want to keep
costs down whenever possible, of course. I had to take into consideration the
expected load, security requirements, network ACL connectivity requirements,
technology requirements, etc. This was always a complicated juggling act. But,
people knew that I would get it done and things would just work.</p>
<p>As part of this role, I interacted closely with the operations team and
the NOC. The operations team would be involved whenever new servers were
needed, new ACLs were to be deployed, etc. At a large company, this was often
a formal process, which required working with various project and product
managers, navigating political waters, etc. I could talk the lingo with the
operations people, so having me always act as the go-between was effective at
getting things done. As for the NOC, they were involved any time we changed
anything. The operations mindset is to achieve 99.999% uptime on
everything. As part of that, any change is communicated and planned well in
advance along with the exact procedure to be performed. I knew most of the
individuals in the NOC and they knew me.</p>
<h3>Logging System Improvements</h3>
<p>Sometime in 2009, I set out to make improvements to Tellme's proprietary
event logging and transport infrastructure. The first part of this involved
writing a new C library that performed writing. When I started, there existed
the initial C++ implementation and feature-minimal implementations in Perl
and a few other languages. However, the C++ implementation required a sizeable
number of external dependencies (on the order of 40MB) and interop with C
was difficult. People in C land found themselves in dependency hell and were
cursing the computer gods.</p>
<p>I implemented a C library from scratch, using the C++ code as a guide, but
only for the low-level details of the protocol, which weren't documented
well outside of the code. The initial version only supported writing. It
was free of external dependencies, compiled on both Solaris and Windows, and
weighed in at a svelte 50kB.</p>
<p>A few months later, an individual came to me and said something along the
lines of, "I really love the simplicity of your library. But, I need to perform
reading. Can you help me?" So, I implemented reading support in the library.
It turned out that the reading library was much more efficient than the C++
one. And, for what my colleague was using it for (iterating over hundreds of
gigabytes of data), this made a huge difference! What followed was a lot of
low-level performance optimization to maximize the reading throughput of the
library. A stack allocation instead of a dynamic allocation here, a register variable
there, all contributed to significant performance improvements to the already
insanely fast library. Towards the end, we were in the territory of the x86
calling convention hurting us more than any other part.</p>
<p>I achieved great satisfaction on this project. And, a lot of it was after
it was complete (and I had even left the company). People were using the
library in ways I had not foreseen (like in C# or on non-server devices). There
was even talk of shipping the library as part of an external release, which
never would have been possible with the C++ version because it linked to open
source software and Microsoft is very sensitive to that (sadly). A few months
after I left the company, somebody emailed me and said something along the
lines of, "your XXX library is the best C library I have ever seen. I wish
all C libraries were this easy to use and would 'just work.'" Sadly, the
library is Microsoft IP, so I can't share it with you. But, I can refer you
to people who have used it!</p>
<h2>Xobni (July 2010 - May 2011)</h2>
<p>It is my understanding that Xobni hired me to bring experience to their
at-the-time frail cloud system, Xobni One (now known as Xobni Cloud). When I
joined, their operations procedures seemed like they were out of the wild west
(at least this was my perspective coming from Tellme/Microsoft). System
monitoring for the product I was to beef up consisted of someone looking at
some Graphite graphs, noticing a change in the pattern, and investigating.
We could do better.</p>
<h3>Better Monitoring</h3>
<p>One of my first acts at Xobni was to deploy a real monitoring system. We
decided on <a href="http://www.opsview.com/">Opsview</a> because it is built
on top of Nagios, a popular and well-known monitoring system (although not
one without its flaws) and offers a compelling front-end for managing Nagios,
which is typically a real chore.</p>
<p>I set up Opsview on a central host, deployed NRPE on all the hosts, and
started monitoring. Email alerts were configured and all were happy we now
knew in near real time when stuff was breaking. At one point, I even had a
custom Nagios notification script that used the Twilio API to call people
with alert info. But, we didn't pursue that further. Cool idea, though!</p>
<h3>Better Metrics</h3>
<p>When I arrived, only minimal server metrics were being collected. While
they were being fed to <a href="http://graphite.wikidot.com/">Graphite</a>,
a compelling replacement for RRDTool, they were being collected via an
in-house Python daemon and any new collection required custom plugins. There
exist many open source tools for metrics collection, so I swapped out the
custom code for <a href="http://www.collectd.org/">Collectd</a>, which I think
is an excellently-designed collection system. (I like Collectd because of
its plugin system and ability to write plugins in C, Java, Perl, and Python.)
</p>
<p>The only drawback was Collectd used RRD out of the box. We used that for
a little bit. But, we quickly missed Graphite, so I
<a href="https://github.com/indygreg/collectd-carbon">wrote</a> a Collectd
plugin that writes data to Graphite instead of RRD.</p>
<p>At the end of the day, we were recording more metrics with less effort
and had much more data to back up our decisions.</p>
<h3>Xobni Cloud Scalability Improvements</h3>
<p>When I joined Xobni, Xobni Cloud was still considered beta and was fraught
with stability issues. The overall architecture of the service was fine, but
things were rough around the edges. When I first started, I believe the
service had around 500 users and was crashing daily. We could do better.</p>
<p>One of my first contributions was to learn how Cassandra worked and how to
make it run faster. A well-tuned Cassandra running on top of a properly
configured JVM makes a world of difference. We saw tons of improvement by
experimenting here.</p>
<p>One of my first major code contributions to the product was to transition
to <a href="https://code.google.com/p/protobuf/">Protocol Buffers</a> for data
encoding. Previously, the system was storing JSON anywhere there was
structured data. Switching to Protocol Buffers cut down on CPU (most of the CPU
savings were because the JSON library was utilizing reflection). More
importantly, it reduced our storage size dramatically. And less data
stored means less work for the hard drives.</p>
<p>On the failure resiliency side of the scaling problem, I replaced the
<a href="http://kr.github.com/beanstalkd/">Beanstalk</a>-based queue system
with jobs stored in Cassandra. I have nothing against Beanstalk. But, in our
architecture, Beanstalk was a central point of failure. If the server died
and we couldn't recover Beanstalk's binary log on disk, we would have lost
user data. We did not want to lose user data, so we stored the queue in a
highly-available data store, Cassandra. I also made a number of application
changes that allowed us to upgrade the product without incurring any
downtime. (When I joined, you had to turn the service off when upgrading
the software.)</p>
<p>Another change that saw significant performance gains was moving the
Cassandra sstable store to <a href="https://btrfs.wiki.kernel.org/index.php/Main_Page">
btrfs</a>, a modern Linux filesystem. The big performance win came not from
the filesystem switch itself, but from enabling filesystem compression. At
the expense of CPU (which we had plenty of on the Cassandra nodes), we cut
down significantly on the number of sectors being accessed, which gave
Cassandra more head room. I would have preferred using ZFS instead of btrfs
because it is stable and has deduplication (which I theorize would help
Cassandra because of its immutable sstables which carry data forward during
compactions). But, we weren't willing to switch to FreeBSD or OpenSolaris to
obtain decent ZFS performance.</p>
<p>There were a number of smaller changes that all amounted to significant
performance gains. But, they are difficult to explain without intimate
knowledge of the system. I will say that at the end of the day, we went
from crashing every day on 500 users to being stable for weeks on end and
serving over 10,000 users on roughly the same hardware configuration.</p>
<h2>Mozilla (July 2011 - December 2018)</h2>
<p>I started working for Mozilla on July 18, 2011 as part of the Services
team. The Services team is responsible for writing and maintaining a
number of Mozilla's hosted services. When I started, the main service was
Firefox Sync, but a number of other services were in the pipeline.</p>
<h3>Firefox Sync</h3>
<p>I was hired to work on Firefox Sync for the desktop version of Firefox.
The Sync client in Firefox is authored primarily in JavaScript, a programming
language I only knew casually before working at Mozilla.</p>
<p>My first major contribution to Sync/Firefox was add-on sync, which keeps
your add-ons in sync across devices. It shipped as part of Firefox 11.
Add-on sync was a heavily requested feature, so I felt quite good about
shipping it.</p>
<p>As part of working on Firefox Sync, I learned a lot about various
Firefox components and how they work. I also expanded my syncing
knowledge (the product I worked on at Xobni was essentially contact
sync) to cover scenarios where synchronization must be performed in a
distributed manner on clients (Firefox Sync data was encrypted and the
server only sees an opaque blob). I also learned a bit about
cryptography. Some of my understanding of this space is demonstrated
in this <a href="/blog/2012/04/08/comparing-the-security-and-privacy-of-browser-syncing">
blog post</a> summarizing the security of various browser syncing
implementations.</p>
<h3>Firefox Health Report</h3>
<p>I was the lead implementor on the Firefox desktop implementation of
Firefox Health Report, a feature that collects data from every Firefox
install and sends it to Mozilla (in a privacy-conscious manner, of
course).</p>
<p>Firefox Health Report (or FHR) was a huge project. It's not every day
that you have the opportunity to write a feature that will be used by
over 100 million people! FHR was also one of those projects that had a
lot of interest from management and leadership. FHR was going to be the
first time Mozilla collected so much data from every Firefox user by
default. Up to that point, Mozilla collected very little data from its
entire user base. What collection existed was opt-in (Telemetry) and
thus had a low activation rate or collected very little data at all (before
FHR, Mozilla measured active daily users by counting the number of
<i>update ping</i> requests: HTTP requests sent by Firefox to see whether
a new Firefox release and/or add-on versions are available). FHR was a
huge change of direction and a lot of people from metrics/statistics,
security, privacy, performance, etc. all wanted a seat at the table.
There were a lot of cooks in the kitchen and I got to feel what it was
like to be in that position at Mozilla. It was a learning experience
to say the least.</p>
<p>Firefox Health Report consists of a background Firefox service
implemented in JavaScript. It stores data in SQLite. The source code
is <a href="https://hg.mozilla.org/mozilla-central/file/1f932e462b84/services/healthreport">
available</a> for reference.</p>
<p>As part of implementing FHR, I was troubled by the lack of a clean
and useful SQLite API accessible to Firefox. So, I wrote
<a href="https://hg.mozilla.org/mozilla-central/file/1f932e462b84/toolkit/modules/Sqlite.jsm">
Sqlite.jsm</a>, a standalone module for interacting with SQLite from
JavaScript. You can read more about Sqlite.jsm on
<a href="/blog/2013/04/14/sqlite.jsm---sqlite-done-betterer/">my blog</a>
and in the <a href="https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Sqlite.jsm">
online documentation</a>. Sqlite.jsm was very well received because it
took away most of the footguns associated with SQLite interaction and
provided compelling features such as task/generator based transactions
and easier memory management.</p>
<h3>Firefox Build System</h3>
<p>In my early days at Mozilla, I grew frustrated with the build system (as
most Mozilla people do). So, I did what curious engineers often do and
started digging deeper into the rabbit hole. One thing led to another
and I eventually became the module owner (Mozilla speak for the person
with governance responsibility over something).</p>
<p>When I first got involved with the build system, Firefox was built from
over 1000 Makefiles. We employed what is called recursive make. Essentially,
you have a tree of Makefiles which is iterated upon. This technique is
far from efficient. There's even a
<a href="http://aegis.sourceforge.net/auug97.pdf">Recursive Make Considered
Harmful</a> paper explaining it.</p>
<p>Everyone knew the build system sucked and needed to be improved. But
nobody knew what to do or had the will to tackle a major change. I spent
a lot of time looking at the problem space, experimenting with various
solutions. I wrote up a
<a href="/blog/2012/06/25/improving-mozilla's-build-system/">very detailed
blog post</a> laying out a transition plan. After much technical deliberation,
we <a href="/blog/2013/02/28/moz.build-files-and-the-firefox-build-system">
adopted a plan</a> to use sandboxed Python files to define our build
configuration. At the time, I thought this was a novel idea. I
thought it was somewhat risky because it had never been tried before. I later
learned that the solution was effectively invented at Google years
before. The Google project is called Blaze and the approach has been copied
by Twitter's Pants build tool, Facebook's Buck build tool, and eventually
Chromium's GN tool. I essentially independently arrived at the same solution
that Google did and that felt pretty reassuring!</p>
<p>Over time, the Firefox build config data was slowly transitioned to
moz.build files. We couldn't do this atomically in a flag day because it
would be too much work. Instead, we moved things over and continued to emit
Makefiles behind the scenes. Where we could, we would consolidate data from
various parts of the source tree together and emit optimized build rules.
moz.build files enabled us to do things we couldn't do with recursive make
and enabled us to build Firefox more efficiently.</p>
<h3>Improving the Development Experience and Developer Productivity</h3>
<p>From my first days contributing to Firefox, I was frustrated at how
difficult everything was. There were so many hurdles preventing people
from getting started and once you got up and running, there were so
many tasks that were non-intuitive. At Mozilla, I was on a never-ending
quest to improve the developer experience and to make developers more
productive.</p>
<p>One of my major contributions to Firefox development is a tool called
<i>mach</i>. Mach is effectively a command-line command dispatcher. You
register commands and it runs them. Simple, right? Before mach came along,
Firefox developers had to run over a dozen different commands to perform
common tasks. The command locations and their options were non-intuitive.
Mach fixed all of that.</p>
<p>I first blogged about Mach in
<a href="/blog/2012/05/07/improving-the-mozilla-build-system-experience/">May 2012</a>.
I think that post details some of the empathy I feel towards new contributors
and onboarding new members. After an uphill battle where I couldn't find
someone willing to buy in to my vision and allow mach to be checked into
the tree, mach <a href="/blog/2012/09/26/mach-has-landed/">finally landed</a>
in September 2012. In the time since, mach has increased in popularity and
gained dozens of commands. It's now used by most developers and I commonly
hear things like "I can't believe we lived in a world without mach
for so long!".</p>
<p>Onboarding new contributors has always been important to me. When I
started at Mozilla, if you wanted to build Firefox, you needed to
install all the build dependencies manually. You did this by following
instructions on a wiki that were frequently out of date. I think it's
inefficient to perform actions that can be automated, so I
<a href="/blog/2012/09/18/bootstrap-your-system-to-build-firefox/">wrote
a tool to automate it</a>. People can now type a one-liner into the shell
to configure their system to build Firefox.</p>
<p>There was a growing movement at Mozilla in 2012 and 2013 to use Git
for developing Firefox. (The canonical repository is Mercurial.) A tool
called <a href="http://hg-git.github.io/">hg-git</a> was being used to
allow developers to convert Mercurial commits to and from Git commits.
A major complaint was that it was too slow. So, I started learning a lot about
Mercurial and Git's internals and set about improving it. The results
<a href="/blog/2013/04/14/making-hg-git-faster/">speak for themselves</a>.
</p>
<h3>Improving Mozilla's Automation</h3>
<p>When I started at Mozilla, it was clear that Mozilla's build and testing
automation had a lot of potential to grow. I've been casually involved
in making it better.</p>
<p>One of the things about Mozilla's automation that troubled me a lot was
the lack of machine-readable output. For example, to determine whether a
test job was successful, we would parse the log output and look for certain
strings using regular expressions. This process was fragile: it was prone
to breaking, and it made changing the output difficult because any change
risked breaking the parser. I wrote a
<a href="/blog/2012/12/06/thoughts-on-logging---part-1---structured-logging/">
blog post on structured logging</a> and later worked with the automation
team to integrate that approach into our testing automation. I even mentored
a summer intern in 2013 who had this as his chief project. As of April 2014,
things are still moving forward and Mozilla is on the trajectory of emitting
machine-readable data from automation. I can't wait for that day to come.</p>
<p>Related to the lack of machine-readable output from automation was
the lack of data being captured at all. For example, Mozilla was not recording
system resource usage (CPU, I/O, memory, etc) and thus was not measuring how
efficient our automation was. The optimization engineer with server-side
experience in me tells me that you should try to get 100% out of your
servers or you are wasting money. So, I
<a href="/2013/07/14/quantifying-mozilla's-automation-efficiency/">patched
our automation code to record system resource usage</a>.</p>
<p>I also built some tools for analyzing Mozilla's automation data.
One tool <a href="/blog/2013/08/30/visualizing-mozilla's-release-infrastructure-machine-efficiency/">
visualized the efficiency of every machine in automation</a>. Although the
tool is no longer live, it was used to show people that a lot of the money
we were spending on machines was being wasted. It turned some heads. I also
<a href="/blog/2013/04/01/bulk-analysis-of-mozilla's-build-and-test-data/">
wrote a tool</a> that aggregated and allowed analysis of bulk automation
data.</p>
<h3>Version Control Geek and Maintainer</h3>
<p>Somehow I became a version control geek during my time at Mozilla.
It likely started with my
<a href="/blog/2013/04/14/making-hg-git-faster/">hg-git
optimization work</a>. I think what captivated me was the scaling problems
within both Mercurial and Git. I was also writing a lot of Python at the
time and was drawn to Mercurial's extensibility and hackability.
I <a href="/blog/2013/05/12/thoughts-on-mercurial-(and-git)/">wrote about
the topic</a>.</p>
<p>It was the summer of 2013 that I became a Mercurial convert. I used to
loathe working with Mercurial (preferring Git instead). With what I know
now, pretty much the only reasons I'd use Git are for GitHub and because
most everyone seems to know Git these days. Those are big reasons. But
when it comes down to your version control system as a tool, Mercurial
wins hands down.</p>
<p>My casual interest in version control somehow culminated with me becoming
the maintainer of hg.mozilla.org and a lot of code and services at Mozilla
that interact with version control.</p>
<p>Before I became involved in Mercurial development and server operations,
Mozilla largely functioned as a downstream consumer of Mercurial. There
wasn't a lot of communication from Mozilla to upstream Mercurial. I changed
that, becoming a liaison of sorts between the two. I started identifying
pain points at Mozilla and started contributing changes upstream to mitigate
them. I helped communicate the needs and challenges of Mozilla to Mercurial
so they could be considered as part of product development. It is now common
for patches impacting Mercurial performance to use the Firefox repository
for measurements.</p>
<p>As part of maintaining hg.mozilla.org, I established automated tests
for the infrastructure, something that was lacking before. We not only have
lower-level unit tests, but also have a container-based environment for
running integration tests. We spawn a cluster of containers using the same
Ansible playbooks for provisioning production servers and run tests against
them. Tests hit actual SSH servers and other services rather than mocked
endpoints. Many have commented that it is one of the most comprehensive
test environments they've seen and are in awe that you can do all that
without deploying to remote servers.</p>
<p>Before a robust testing infrastructure was in place, there was a culture
of FUD around deploying updates to hg.mozilla.org. We updated Mercurial
rarely (maybe once a year) because we thought things might break. We were
scared to touch such critical infrastructure. We are now capable of
deploying multiple times a day and minutes after a change lands.</p>
<p>As part of improving the robustness of hg.mozilla.org, I implemented a
version control replication system built on top of Apache Kafka. This
solved a lot of pain points related to slow pushes, out-of-sync mirrors,
notification of changes, etc. This gave us much more flexibility for
our server architecture, such as allowing us to incur individual
machine downtime with confidence that things will automatically re-sync.
The aforementioned testing infrastructure runs a Kafka cluster in containers,
allowing us to test failure scenarios. Distributed systems are complex beasts
and this testing is invaluable.</p>
<p>I've also spent a lot of time overhauling how processes interact with
version control. There are scaling problems for repositories of Firefox's
size. Simple interaction strategies that clone all the time just don't
work because they are wasteful. Our automation infrastructure has intelligent
use patterns and caching to facilitate optimal version control interaction
so work completes sooner.</p>
<p>Another considerable set of projects has revolved around syncing
commits between version control repositories. The Firefox repository
acts as a monorepo and has numerous sub-projects and vendored projects
that are developed externally from core Firefox. There are various
systems to keep these projects in sync.</p>
<h3>Learning Python</h3>
<p>It was at Mozilla that my Python knowledge developed from intermediate to
I'd say pretty advanced. A lot of my projects for Firefox's build system
and automation are written in Python. Mercurial is Python.</p>
<p>I even got <a href="http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases">
Slashdotted</a> writing about Python.</p>
<h3>Misc</h3>
<p>I was a <a href="/blog/2013/05/19/using-docker-to-build-firefox/">
very early fan of Docker</a>. When it came out, I immediately saw the
potential for use in Mozilla's automation infrastructure. I went over
to dotCloud (now Docker Inc) to discuss Mozilla's use cases very early
in Docker's lifetime. For a while, a quote of mine was in the main
<i>Learn about Docker</i> slide deck that was prominently featured on
Docker's website!</p>
<p>My <a href="/blog/2012/07/18/one-year-at-mozilla/">One Year at Mozilla</a>
is worth reading.</p>
<p>Pretty much all my blog posts from 2012 and on are related to Mozilla
in one way or another.</p>
<h2>Airbnb (April 2019 - present)</h2>
<p>I joined the Developer Productivity team at Airbnb in April 2019. My
primary job role is to help support source control and build system
infrastructure at Airbnb. At the time I wrote this in June 2019 I was
still very new at Airbnb and didn't have any meaningful job activity
to report. This will presumably change over time...</p>
<h2>Outside of Full-Time Job (Over All Time)</h2>
<h3>Clang Python Bindings</h3>
<p>I have contributed significant patches to Clang's Python bindings.
The Clang Python bindings allow you to examine the token stream and
AST that Clang generates. It is my understanding that the Clang Python
bindings are heavily used in the science and research arenas, where people
are using higher-level tools for examining source code.</p>
<h3>Zippylog</h3>
<p><a href="https://github.com/indygreg/zippylog">Zippylog</a> is a high
performance stream processing platform. I started the project in late July
2010 and hack on it when I have time.</p>
<p>I started the project to solve what I thought was a gap in the market. I
also wanted to start a personal project to assess what my skills were as an
individual developer. And, since I was working behind Microsoft's walls, where
open source contributions were difficult to swing, I wanted to do something
in the open for all to see.</p>
<h3>Lua Support for Protocol Buffers</h3>
<p>My GitHub project,
<a href="https://github.com/indygreg/lua-protobuf">lua-protobuf</a>,
integrates the programming language Lua with Google's Protocol Buffer
serialization format. Both are extremely fast, so it is a marriage that needed
to happen.</p>
<p>I started the project because I wanted to consume protocol buffers in Lua
from within zippylog.</p>
<p>Interestingly, lua-protobuf is a Python program that generates C/C++ code
that provides Lua scripts access to protocol buffers. Yeah, that makes no
sense to me either, but it works.</p>
<h3>Graphite/Carbon Writer for Collectd</h3>
<p><a href="http://www.collectd.org/">Collectd</a> is an excellent metrics
collection and dispatching daemon.
<a href="http://graphite.wikidot.com/">Graphite</a> is a great data recording
and visualization tool. I decided to marry them by writing a Collectd plugin,
<a href="https://github.com/indygreg/collectd-carbon">collectd-carbon</a>,
which writes values to Graphite/Carbon. (Carbon is the name of the network
service that receives values.)</p>
<h3>Clang Python Bindings</h3>
<p>I've made contributions to Clang's Python bindings. These bindings
allow you to consume the Clang C API (libclang) using pure Python (via
ctypes).</p>
<h3>Mercurial Contributions</h3>
<p>I am a significant contributor to the Mercurial open source version control
system.</p>
<p>I serve on the Mercurial Steering Committee, which is the governance group
for the Mercurial Project. I also have reviewing privileges, allowing me
to accept incoming patches for incorporation in the project.</p>
<p>My Mercurial contributions are too numerous to enumerate. A
<a href="https://www.mercurial-scm.org/repo/hg/log?rev=author(szorc)&revcount=10000">
full list</a> can be viewed. Significant contributions include:</p>
<ul>
<li>Support for seeding clones from pre-generated bundles (saves Mozilla
~1 PB/day in server-originated data and is also used by Bitbucket)</li>
<li>Zstandard compression support</li>
<li>JSON API support on HTTP server</li>
<li>Various performance improvements for multiple facets of operation</li>
<li>Python 3 porting</li>
<li>Rewriting and updating security code so Mercurial is more secure by
default</li>
</ul>
<p>The <a href="https://gregoryszorc.com/blog/category/mercurial/">Mercurial
category</a> on my blog also draws attention to work I've done.</p>
<h3>Python Zstandard Bindings</h3>
<p><a href="https://github.com/indygreg/python-zstandard">python-zstandard</a>
is a Python package providing high-quality, high-performance, and
fully-featured bindings to the Zstandard compression library.</p>
<p>I started the project a few days after Zstandard 1.0 was released. This
followed interest in Zstandard stemming from conversation with engineers
at Facebook who were extolling its virtues for months leading up to the
release. I wrote more about Zstandard in my
<a href="/blog/2017/03/07/better-compression-with-zstandard/">Better
Compression with Zstandard</a> post, which made the rounds and racked up
50,000+ views.</p>
<p>python-zstandard was the first time I seriously dabbled in Python C
extensions and <a href="https://pypi.python.org/pypi/cffi">CFFI</a>. Needless
to say, I learned a lot, including a deeper understanding of CPython's
internals, which I've found quite valuable in enabling me to become a
better Python programmer.</p>
<p>python-zstandard is bundled with Mercurial. It is also used in production
at a few companies, including Facebook.</p>
</div>
</div>
</div>
</div>
<div id="footer">
<hr/>
<p>Copyright (c) 2012- Gregory Szorc. All rights reserved. Design by <a href="http://www.freecsstemplates.org/"> CSS Templates</a>.</p>
</div>
</div>
</body>
</html>