\chapter{Introduction}
% < Server: Microsoft-IIS/6.0
% < X-Powered-By: ASP.NET
% < X-AspNet-Version: 1.1.4322
This dissertation was submitted for registration over an unencrypted connection, to a web server running software that had passed the end of its security-support lifetime.
Together with the dissertation itself, single sign-on credentials, useful to access all digital university systems, were submitted for authentication purposes.
All of this data was neatly packaged and transmitted over an insecure connection to a vulnerable remote host.
If you are a security researcher, you probably feel slightly distraught after reading the previous paragraph.
However, while inherently insecure, this is the reality of many systems today -- the system at the university is just one instance of a widespread lack of cyber security.
Organizations maintain complex systems while administrative responsibilities may be unclear, systems may be forgotten, or there may simply be a mindset of ``if it is not broken, why fix it?''.
% if it ain't broke don't fix it
% if it is not broken, why fix it?
At the same time,
%In the information society of today, information technology is constantly evolving in an ever increasing speed.
growing amounts of information and expectations of easy access increase the risk of information manipulation and leakage if systems are left vulnerable.
As the demand for increased computational power and storage has grown, there has also been a shift in how computer resources are consumed.
Traditionally, data has been stored and used on-premise, or at least on dedicated hardware bought and managed by the owner.
However, the current trend is to instead rely on computer resources managed and owned by someone else, commonly called \emph{cloud computing}.
This ubiquitous access to computer resources relieves the user from managing their own hardware, and reduces upfront costs for buying their own hardware.
It also allows for flexible scaling of resources, allowing the consumer to increase or decrease the consumed resources depending on workload, only paying for the actual resources consumed.
% 2) Some problems
The storage and use of information on remote resources do require a discussion about cyber security.
While outsourcing the maintenance of systems to a remote entity \emph{may} reduce the risk of ending up with unmaintained systems -- though very much dependent on the type of service bought -- it also introduces whole new classes of questions related to information security.
Some issues such as data ownership, availability guarantees, performance, and the type of service provided, can be solved by using Service-Level Agreements as legally binding contracts.
Such legal measures can allow customer compensation if the terms are not met, thus providing an incentive for the service provider to fulfil their promises.
However, legal approaches only provide mitigation \emph{after} an incident has occurred.
Consider the example of a leak of information from a cloud provider.
While the customer can demand compensation for a breach of contract, it does not change the fact that the information is now leaked and potentially distributed and publicly available.
The point is that legal agreements, while important in their own right, only go so far, since they mainly deal with mitigation, not prevention.
For the customer, the ideal situation would have been preventing the leak from occurring in the first place.
Preventive measures instead focus on trying to prevent such incidents from occurring at all.
Such measures can take many forms, including risk assessments, security policies, as well as purely technical measures used in the implementation of systems.
While all preventive measures are important and relevant, the focus of this dissertation will be different technical preventive measures.
In particular, the following areas are considered in more detail; each can serve as a preventive measure in the following ways.
\begin{itemize}
\item Cryptography can be used to protect data at rest and during transmission, preventing information leakage.
\item Trusted computing can be used to verify the integrity of code running remotely, preventing malicious code from processing information.
\item Proactive approaches to assess software vulnerabilities can reduce the attack surface of deployed software, preventing attacks.
\end{itemize}
This dissertation presents technical contributions to several areas of the preventive measures described above.
\section{Dissertation Outline}
After this brief introduction and motivation to the scope of this work, the rest of the dissertation is structured as follows.
The following chapter, \chapref{ch:kappa-background}, provides both technical background and the current state of research in the topics used throughout this dissertation.
In more detail, Section~\ref{sec:kappa-trustedcomputing} starts with an overview of trusted computing in general, together with technical background on several trusted computing technologies and their potential applications.
This is followed by Section~\ref{sec:kappa-recommendersystems}, covering an overview of recommender systems, their possible applications in cyber security, and potential privacy implications of their use.
Third, in Section~\ref{sec:kappa-cryptography} we discuss the area of cryptography, with an emphasis on symmetric cryptography, stream ciphers in particular, and the analysis of such ciphers.
These three sections together present the background and context of the research contributions.
The contributions of each paper to the research field are described in Section~\ref{sec:kappa-contributions}, which is followed by some final conclusions and remarks in Section~\ref{sec:kappa-conclusions}.
Finally, in the second part of this dissertation, the individual included publications are presented in their published form, although reformatted for stylistic consistency with the rest of the dissertation.
\chapter{Background}
\label{ch:kappa-background}
This chapter presents some background from the various research fields of the contributions of this dissertation.
We start by presenting the area of trusted computing, which is then followed by recommender systems, and finally cryptography.
\section{Trusted Computing}
\label{sec:kappa-trustedcomputing}
In modern society, computers are ubiquitous and take part in our lives in many different ways.
While some of them are used directly by end-users -- devices such as desktop computers, laptops, or smartphones -- others are only used indirectly.
Such indirect use includes connecting to remote servers serving web pages, relying on your bank to maintain your account balances, and relying on your mobile phone network to service your calls.
Trusted computing is a technology to achieve \emph{trust} in a computer system.
The definition of trust here is important, and the Trusted Computing Group (TCG) defines it as follows:
\begin{quote}
```trust' is meant to convey an expectation of behavior'' \cite{TPM2.0r38}.
\end{quote}
From this definition, we can restate the goal of trusted computing as guaranteeing a predictable behaviour of a computer system.
This is the definition that will be used throughout this dissertation when discussing trusted computing.
Providing such guarantees is important for the overall functionality of a system.
If the behaviour of crucial parts of the system is non-predictable, the behaviour of the system as a whole may be unknown.
Non-predictable behaviour may affect the security of the system as well; a subsystem performing unintended tasks may affect the confidentiality, integrity, or availability of data.
Trusted computing can be used both to guarantee the behaviour of local devices and to obtain guarantees about the behaviour of a remote system, thus covering all of the different types of devices mentioned in the beginning of this section.
Practical uses include protecting the integrity of local devices, for example ensuring that a computer's software has not been modified since last use, and verifying the behaviour of remote devices, which is especially relevant in a cloud computing context where the hardware is not controlled by the user.
A central concept in trusted computing is the reliance on a Trusted Computing Base (TCB), a set of hardware, firmware, and software critical to enforcing the security properties of a system.
Since the TCB is crucial to the guarantees provided by trusted computing, it is important that it behaves in the expected ways.
While this will be assumed for the remainder of this dissertation, we refer to techniques such as formal verification \cite{dsilva:2008,klein:2009,kern:1999} to actually achieve this important property.
\subsection{Attacker Model}
\label{sec:kappa-attackermodel}
For the work on trusted computing in this dissertation, the following attacker model is used.
The model is similar to attacker models of other works in trusted computing such as \cite{maene:2018,peters:2017}.
As described above in the definition of the TCB, trusted computing requires trust rooted in the hardware.
While software at any level (except software in the TCB) may be compromised by an attacker, the hardware is considered trusted and tamper-resistant from attackers throughout this dissertation.
In particular, the operating system may be compromised by an attacker, which means that access control policies provided by the OS may not be enforced.
An attacker may also eavesdrop on or modify traffic between systems, i.e. both passive and active network attacks can be performed.
Finally, we assume that cryptographic primitives such as symmetric and asymmetric ciphers, hash functions, and digital signatures are secure.
While all technologies require some level of trust rooted in hardware, \emph{which} hardware components must be trusted differs between the various trusted computing technologies.
Such details are discussed in connection to the description of each technology.
\subsection{Security Properties}
\label{sec:kappa-secprop}
When discussing trusted computing, it is helpful to look at smaller features, which will be used throughout the dissertation.
These features are called \emph{security properties}.
The following security properties will be used according to the definitions below, following definitions from earlier work such as \cite{maene:2018,vasudevan:2012}.
\begin{description}
\item[Isolation]
Also called isolated execution, ensures confidentiality and integrity of code and data inside an environment.
Such an isolated environment is typically provided by hardware in a trusted computing setting.
This feature ensures that someone outside the isolated environment cannot read or modify information inside the isolated execution environment.
\item[Attestation]
A property which allows a local or remote party to request proof that an application is in a certain state.
This allows the verifying party to be confident in the integrity of an application.
By extension, such integrity guarantees provide some guarantees of the behaviour of the attested code.
While it does not guarantee that the code runs according to a written specification, it does provide a guarantee that it is indeed the expected binary implementation.
\item[Sealing]
A process of protecting the confidentiality of data such that it can only be read given that certain conditions are met.
Common examples are sealing data to a particular hardware device, or to a certain state (cf. attestation above).
\end{description}
\subsection{Roots of Trust}
As described in the attacker model in Section~\ref{sec:kappa-attackermodel}, trust is generally rooted in some hardware component, either directly or indirectly.
A Root of Trust (RoT) can be different depending on the functionality it provides.
Among others, the TCG defines the following roots of trust, which have been chosen due to their relevance for this dissertation:
\begin{description}
\item[Root of Trust for Measurement] The root from which measurement starts.
There are two main ways to construct the base for measurements: either a Static Root of Trust for Measurement (SRTM), or a Dynamic Root of Trust for Measurement (DRTM).
An example of SRTM is when measurement always starts at system startup.
The measurements then include major parts of the system: BIOS, boot loader, kernel, etc.
With DRTM, measurements can start at different points in time, but still give assurance that the system is in a known state.
\item[Root of Trust for Reporting] Provides a RoT for attesting the origin and authenticity of a platform state, such as measurements.
This is used during attestation of a system state.
\item[Root of Trust for Storage] Provides a RoT for storage that protects both the integrity and confidentiality of data.
\end{description}
\subsection{Technologies}
To achieve the guarantees of trusted computing, there have been several different technologies proposed.
Each technology provides some set of the security properties described in Section~\ref{sec:kappa-secprop}, which are used to provide the guarantees of trusted computing.
The differences in selection of security properties provide us with several different technologies.
While all of them can be said to support trusted computing, they do so in different ways, are suitable for different use cases, and work on different hardware.
While this dissertation focuses on a few selected technologies, it starts with a brief overview of several currently existing technologies.
Later in this chapter some of them are described in more detail -- enough to provide a solid foundation for the discussion of the contributions in the dissertation.
The first technology to be discussed is the Trusted Platform Module (TPM).
The TPM is typically a discrete chip mounted on a motherboard, which together with software support can provide certain security properties.
A TPM can be used for attestation and sealing, but does not in itself provide an isolated execution environment.
Section~\ref{sec:kappa-tpm} covers TPM in more detail.
Looking at the class of commodity desktop, laptop, and server hardware using the x86 architecture, two well-known trusted computing technologies are Trusted Execution Technology (TXT) and Software Guard Extensions (SGX), both developed by Intel.
Intel TXT is described briefly in Section~\ref{sec:kappa-othertc}, but will not be covered in detail; instead, we focus on Intel SGX.
Intel SGX is a more recent development, only available on Intel x86 hardware.
In short, SGX allows isolated execution of code inside something called \emph{enclaves}.
The enclaves can be both locally and remotely attested, and both code and data can be included in measurements.
If desired, data can also be sealed.
SGX is discussed in more detail in Section~\ref{sec:kappa-sgx}.
While this dissertation will focus on the aforementioned technologies, some work in this dissertation is not necessarily limited to a specific technology for trusted computing, which makes it relevant to also briefly discuss alternative technologies.
A few other relevant technologies are mentioned later, in Section~\ref{sec:kappa-othertc}.
\subsection{Trusted Platform Module}
\label{sec:kappa-tpm}
The Trusted Platform Module (TPM) is a computer component, typically present as a discrete hardware component on the motherboard, which can perform various cryptographic operations.
This includes functionality such as encryption, signing, sealing, and attestation.
The functionality of a TPM is defined by the Trusted Computing Group (TCG) in their specifications, e.g., \cite{TPM1.2spec,TPM2.0r38}.
There are two main versions of the specification in use today, TPM 1.2 and TPM 2.0, which differ in some areas.
However, both versions support the same general functionality.
Here we describe both versions, starting with TPM 1.2.
\subsubsection{TPM 1.2}
\label{sec:kappa-tpm12}
The first TPM 1.2 specification was released in 2003, with the most current revision of the specification being \cite{TPM1.2spec} released in 2011.
The TPM can be used to build a key hierarchy of asymmetric keys -- useful for signing or encryption -- where the root key of this hierarchy is called the Storage Root Key (SRK) and is stored inside the TPM.
The SRK can have child keys of different types, but all child keys are part of the key hierarchy by having their private key material encrypted by their parent's public key.
%Keys can either be generated internally inside the TPM, which ensures that the private key is only available inside the TPM, or the keys can be externally generated and then imported into the TPM.
Another important key of the TPM is the Endorsement Key (EK).
This key is a 2048-bit RSA key and is only generated once, usually before an end-user receives the module \cite{TPM1.2spec}.
The purpose of the EK is to have a Root of Trust for Reporting (RTR), so that an individual TPM can be authenticated.
This raises privacy concerns, since it means that an individual platform can be identified across services if the EK were to be used as the sole key.
To try to mitigate this issue, the TPM supports the use of Attestation Identity Keys (AIKs).
The purpose of the AIK is to allow the use of different keys for different contexts, to avoid the privacy issues of using a single key which uniquely identifies a TPM.
\paragraph{Platform configuration registers}
A TPM has at least 16 registers called Platform Configuration Registers (PCRs).
Each register can store a 160-bit hash value, suitable for holding the output of a SHA-1 hash.
The values of the PCRs are cumulative, such that after a reset (at platform boot, or later), the updated value of the PCR with index $i$ is calculated as
\begin{align}
\text{PCR}_i^{\text{new}} = h(\text{PCR}_i || \text{value}) \label{eq:kappa-pcrextend}
\end{align}
where $h$ is the hash function (SHA-1), and $\text{value}$ is the value to add to the PCR.
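The extend operation in (\ref{eq:kappa-pcrextend}) can be illustrated with a short Python sketch. This is illustrative only: the component names and the direct use of SHA-1 over raw byte strings are assumptions made for the example, not the exact encoding used by a real TPM.

```python
import hashlib

def pcr_extend(pcr: bytes, value: bytes) -> bytes:
    """Extend a PCR: new value = SHA-1(old PCR value || measured value)."""
    return hashlib.sha1(pcr + value).digest()

# PCRs are reset to all zeroes at platform boot (20 bytes for SHA-1).
pcr = b"\x00" * 20

# Each boot component is measured (hashed) and extended into the PCR in turn.
for component in [b"firmware", b"bootloader", b"kernel"]:
    measurement = hashlib.sha1(component).digest()
    pcr = pcr_extend(pcr, measurement)

print(pcr.hex())
```

Because each new value depends on the previous one, the final PCR value commits to the whole sequence of measurements, in order.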
\paragraph{Attestation}
In TPM 1.2, attestation is performed by using an AIK together with PCRs.
Attestation of the platform is performed as follows.
During platform boot, the platform state is measured, for example firmware, bootloader, operating system, and configuration.
Measurements are cumulatively stored in the PCRs, which guarantees that if any part of the chain is modified, the resulting hash value stored in the PCR will be different.
To perform attestation, the PCR hash values -- which now represent the state of the platform -- are signed using the AIK.
This resulting package is called a \emph{quote}.
Since the AIK is tied to the specific TPM it is loaded into, the quote can be verified by a local or remote entity by first verifying the signature and then inspecting the PCR values, which show that the platform was in a certain state when the quote was created.
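The verifier's side of this process can be sketched as replaying a list of known-good measurements (an event log) and comparing the result against the quoted PCR value. This sketch omits the AIK signature check (the Python standard library has no RSA support) and uses hypothetical measurement values; it only demonstrates the replay-and-compare step.

```python
import hashlib

def replay_event_log(event_log):
    """Recompute the expected PCR value by replaying measurements in order."""
    pcr = b"\x00" * 20  # PCRs are reset to zero at platform boot
    for measurement in event_log:
        pcr = hashlib.sha1(pcr + measurement).digest()
    return pcr

# Known-good measurements of firmware, bootloader, OS (hypothetical values).
good_log = [hashlib.sha1(c).digest() for c in (b"firmware", b"bootloader", b"os")]

# The quoted PCR value reported by the platform (here: honestly computed).
quoted_pcr = replay_event_log(good_log)

# The verifier accepts only if the replayed value matches the quoted value.
print(replay_event_log(good_log) == quoted_pcr)  # prints True for an unmodified chain
```

If any component in the chain is modified, the replayed value diverges from the quote and verification fails.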
\paragraph{Migration}
While the TPM is designed to provide secure storage of keys, such that the private key material is protected by the TPM, it may be desirable to have the same key material on several different TPMs in certain circumstances.
An example of such a situation is when older hardware -- including the TPM -- is replaced, but encrypted data is to be retained.
This requires the new TPM to have the decryption keys from the older TPM.
Another situation is when designing a high-availability system with redundancy; each node then needs their own TPM with the required keys.
TPM 1.1 introduced \emph{migratable} keys to make it possible to migrate -- or rather duplicate -- keys to another TPM \cite{TPM1.1bspec}.
Keys can be marked as \emph{non-migratable} or \emph{migratable}, so that migration can be either disallowed or allowed.
To authorize the migration of a key, the TPM owner must authorize the destination, and the \emph{migration secret} must be presented to the TPM.
The implementation of non-migratable keys also uses this scheme; for a non-migratable key the migration secret is simply a secret value only known by the TPM (called tpmProof) thus making migration impossible to authorize for users.
TPM 1.2 further introduced another type of migratable key, called Certifiable Migration Key (CMK), which allows a third party to be in control over the destination of the migration.
This third party, the Migration Selection Authority (MSA), can be used to ensure that the destination of the migration is the intended one, for example by checking that it is another TPM.
Migration, and scenarios where it is suitable, are also discussed in more depth when we make use of migratable keys as part of a solution to the problems discussed in \paperref{ch:tpmhas}.
We also discuss migration in \paperref{ch:tpm12to20} when we design a solution to migrate keys from TPM 1.2 to TPM 2.0.
\subsubsection{TPM 2.0}
\label{sec:kappa-tpm20}
TPM 2.0 is a newer version of the TPM specification \cite{TPM2.0r38}, and in many ways, TPM 2.0 is a generalization of TPM 1.2.
The key hierarchy has been replaced with a more general object hierarchy,
the use of RSA has been extended with support for several different types of cryptographic primitives,
and secrets have been partially generalized by the use of \emph{policies}.
As the changes between the old and the new standard are significant, there is no backwards compatibility with TPM 1.2.
The lack of backwards compatibility may require significant effort from system designers if a move from older TPM 1.2 hardware to newer TPM 2.0 hardware is desired.
Keys may require conversion, and the mismatch of features between TPM 1.2 and TPM 2.0 can cause issues with authorization, migration, and other tasks.
\paperref{ch:tpm12to20} of this dissertation tries to solve such issues by describing ways to achieve the old TPM 1.2 behaviour using the new functionality of TPM 2.0.
The goal of this section is to highlight some differences between TPM 1.2 and TPM 2.0 which will be important for the remainder of this dissertation.
For more details, refer to the specification \cite{TPM2.0r38}, or comprehensive guides such as \cite{arthur:2015}.
\paragraph{Duplication}
Migration has been renamed to \emph{duplication} in TPM 2.0, which is a more truthful description of the functionality, since the keys are indeed duplicated rather than moved.
There are also differences regarding the possibility to perform duplication in the first place.
While TPM 1.2 has migratable and non-migratable keys, TPM 2.0 has two new attributes, \emph{fixedTPM} and \emph{fixedParent}, which are set on keys upon their creation.
If the former is set on a key, it is never allowed to leave the TPM.
If the latter is set, the key is fixed to its parent, and cannot be duplicated on its own, but might be duplicated indirectly if duplication of the parent key is allowed.
There is no longer a migration secret connected to keys in TPM 2.0; instead, authorization is performed using the more general framework of policies, which is described next.
\paragraph{Policies}
Policies are a new, general authorization mechanism in TPM 2.0, which can be used to authorize several different operations on objects in the TPM.
As mentioned earlier, a policy can be used to authorize migration, but also to authorize for example regular usage of a key.
Policies work by including a value \texttt{authPolicy} in a key at creation time.
The policy is calculated by building a hash chain of individual policy commands, called \emph{policy assertions}.
For an example, see Figure~\ref{fig:kappa-tpm2policy}, which outlines the idea.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=.75, every node/.style={scale=.75}, node distance=3cm, font=\sffamily]
\node[] (zerostart) {\texttt{000..00}};
\node[draw,right=.75cm of zerostart] (policypcr) {\texttt{TPM2\_PolicyPCR}};
\node[right=.75cm of policypcr] (authpol1) {\texttt{a53..1d}};
\node[draw,right=.75cm of authpol1] (policyauth) {\texttt{TPM2\_PolicyAuthValue}};
\node[right=.75cm of policyauth] (authpol2) {\texttt{7c9..3a}};
\node[above=.5cm of policypcr] (pcrs) {$\text{PCR}_i$};
\node[above=.5cm of policyauth] (authval) {Auth value};
\draw[->] (zerostart) -- (policypcr);
\draw[->] (policypcr) -- (authpol1);
\draw[->] (authpol1) -- (policyauth);
\draw[->] (policyauth) -- (authpol2);
\draw[->] (pcrs) -- (policypcr);
\draw[->] (authval) -- (policyauth);
\end{tikzpicture}
\caption{Calculation of the \texttt{authPolicy} for a key requiring both PCR values and a secret value, called the auth value}
\label{fig:kappa-tpm2policy}
\end{figure}
When calculating the policy hash, the TPM first starts with an all-zero hash, then for each consecutive policy assertion the previous hash value is concatenated with the output from the policy function, and then hashed again as:
\begin{align}
\text{policyDigest}^{\text{new}} = h(\text{policyDigest } || \text{ policyAssertionOutput})
\end{align}
similar to how the PCR values are updated in (\ref{eq:kappa-pcrextend}).
The output from the policy assertion is based on the provided parameters.
Using \texttt{TPM2\_PolicyPCR} as an example, the output will depend on the selection and values of PCRs.
When the TPM performs access control, it executes the policy assertions, and then compares the calculated hash with the value stored in the \texttt{authPolicy} field of the object.
Since the \texttt{authPolicy} is a hash value, we note that the order of the policy assertions is important.
Evaluating two policy assertions in a different order will result in a different hash value.
Chaining several policy assertions in sequence means that both assertions need to be valid for the authorization to be true, thus it can be seen as a logical AND.
To realize a logical OR, a separate policy command is required, \texttt{TPM2\_PolicyOR}.
This policy assertion is true if any of the previous branches is true.
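The AND-chaining and its order sensitivity can be sketched in Python. This is a simplified model: the assertion outputs below are stand-in strings, whereas a real TPM hashes in command codes and the assertions' actual parameters, so the digests here are purely illustrative.

```python
import hashlib

def policy_extend(digest: bytes, assertion_output: bytes) -> bytes:
    """One step of the policy hash chain (simplified; a real TPM 2.0
    also includes the command code and parameters of the assertion)."""
    return hashlib.sha256(digest + assertion_output).digest()

ZERO = b"\x00" * 32  # a policy session starts from an all-zero digest

# Chaining assertions acts as a logical AND: both steps shape the digest.
d = policy_extend(ZERO, b"TPM2_PolicyPCR:expected-pcr-values")
d = policy_extend(d, b"TPM2_PolicyAuthValue")

# Order matters: swapping the assertions yields a different authPolicy.
d_swapped = policy_extend(ZERO, b"TPM2_PolicyAuthValue")
d_swapped = policy_extend(d_swapped, b"TPM2_PolicyPCR:expected-pcr-values")

print(d != d_swapped)  # prints True: the orderings define different policies
```

Since the TPM only compares the final digest against the stored \texttt{authPolicy}, satisfying the policy requires executing exactly the same assertions, with the same parameters, in the same order.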
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[node distance=0.3cm,auto,>=latex',scale=0.8, every node/.style={scale=0.8}]
\matrix (m) [matrix of nodes, row sep=0.3cm]
{
\texttt{TPM2\_PolicyPCR} & & \\
\texttt{TPM2\_PolicyAuthValue} & & \texttt{TPM2\_PolicySigned} \\
& & \\
};
\path (m-2-1) -- (m-2-3) node[midway] (m-2-2) {}; % helper node for placing the node below.
\node[below=0.5cm of m-2-2] (m-3-2) {\texttt{TPM2\_PolicyOR}};
\draw[->] (m-1-1) to (m-2-1);
\draw[->] (m-2-1) to (m-3-2);
\draw[->] (m-2-3) to (m-3-2);
\end{tikzpicture}
\caption{Example of a policy chain with \texttt{TPM2\_PolicyOR}}
\label{fig:kappa-tpm2policystructure}
\end{figure}
Throughout this dissertation, we will describe policy chains using figures similar to Figure~\ref{fig:kappa-tpm2policystructure}.
The final hash value, stored in the object's \texttt{authPolicy}, is the result of the evaluation of the final (bottom) policy, thus the figures should be evaluated from top to bottom.
Using Figure~\ref{fig:kappa-tpm2policystructure} as an example, access to the object is granted if \emph{any} of the following two conditions are met:
\begin{enumerate}
\item The PCR values match some desired values \emph{and} a correct authorization secret is given.
\item A valid signature can be provided over some parameter.
\end{enumerate}
By combining logical AND and logical OR in this way, it is possible to construct a wide variety of policy assertions.
\subsection{Intel SGX}
\label{sec:kappa-sgx}
\newcommand{\sgxcommand}[1]{\texttt{#1}}
\newcommand{\sgxadd}{\sgxcommand{EADD}}
\newcommand{\sgxcreate}{\sgxcommand{ECREATE}}
\newcommand{\sgxenter}{\sgxcommand{EENTER}}
\newcommand{\sgxexit}{\sgxcommand{EEXIT}}
\newcommand{\sgxextend}{\sgxcommand{EEXTEND}}
\newcommand{\sgxgetkey}{\sgxcommand{EGETKEY}}
\newcommand{\sgxinit}{\sgxcommand{EINIT}}
\newcommand{\sgxremove}{\sgxcommand{EREMOVE}}
\newcommand{\sgxreport}{\sgxcommand{EREPORT}}
\newcommand{\sgxresume}{\sgxcommand{ERESUME}}
\newcommand{\sgxmrenclave}{\sgxcommand{MRENCLAVE}}
\newcommand{\sgxmrsigner}{\sgxcommand{MRSIGNER}}
Intel Software Guard Extensions (SGX) is a set of instructions built into modern CPUs produced by Intel.
Being a trusted computing technology, it provides several different security properties, including isolation, attestation, and sealing.
Because the security properties are provided within the processor itself, it suffices to trust only the CPU vendor.
This is in contrast to, for example, the TPM, which requires trust both in the TPM vendor for attestation purposes and in the processor that executes the machine code.
The instruction set was first proposed in 2013, separately describing isolated execution \cite{mckeen:2013}, and attestation and sealing \cite{anati:2013}.
The first hardware with support for SGX was shipped in 2015, starting with Intel's Skylake architecture.
Since SGX is an extension of the instructions supported by the CPU, applications can use SGX-specific instructions to access the various security properties.
The most central concept in SGX is the \emph{enclave}.
An enclave is a trusted execution environment (TEE) where code and data can be loaded at creation time.
After creation, the code can be measured for attestation purposes, and execution of the enclave is done in isolation, so that no other process or enclave on the system can read its memory.
SGX2 is an extension to the original SGX specification, which adds a set of extra instructions related to SGX memory and thread management.
The instructions are described in great detail in \cite{intel64b}, but are not discussed here, since SGX2 has not been used in this dissertation.
The life cycle of an enclave is described next, followed by the security properties, and their specific implementation and usage in SGX.
\subsubsection{Enclave Life Cycle}
An overview of the life cycle of an enclave can be seen in Figure~\ref{fig:kappa-lifeofenclave}.
The enclave is first created using \sgxcreate{}, which creates internal structures with metadata about the enclave.
This is followed by repeatedly calling \sgxadd{} for each 4~KiB page of memory to add to the enclave.
This copies data from unprotected memory into the Enclave Page Cache (EPC) of the enclave.
If desired, \sgxextend{} can be called to measure 256 bytes of memory of the enclave.
The measurement will be stored in a measurement log, which can later be used to attest the integrity of the enclave.
\sgxextend{} needs to be repeated for every 256-byte block that should be measured.
When all desired pages have been added to the enclave, \sgxinit{} should be called to finalize the creation stage.
After this, no more pages can be added to the enclave, and the measurement of the added pages will be finalized.
After initialization, the enclave code can be executed.
This is done by calling \sgxenter{}, which starts execution of enclave code at a predefined location.
The enclave then continues execution until either a controlled exit occurs using \sgxexit{}, or until an exception or interrupt occurs, which is handled by Asynchronous Enclave Exit (AEX).
To avoid clutter, the AEX flow is not described in Figure~\ref{fig:kappa-lifeofenclave}, but after AEX, execution can be resumed in the enclave by issuing \sgxresume{}.
\begin{figure}[t]
\centering
\begin{tikzpicture}[scale=1, every node/.style={scale=1}, node distance=3cm,
ibox/.style={draw,minimum width=2cm},
stagebox/.style={dashed,gray,opacity=0.5}]
\node[ibox] (create) {\sgxcreate};
\node[ibox,right of=create] (add) {\sgxadd};
\node[ibox,right of=add] (extend) {\sgxextend};
\node[ibox,right of=extend] (init) {\sgxinit};
\node[ibox,below=1.5cm of create] (enter) {\sgxenter};
\node[ibox,right of=enter] (exit) {\sgxexit};
\node[ibox,below=1.5cm of enter] (remove) {\sgxremove};
\coordinate (createadd) at ($(create)!0.5!(add)$);
\coordinate (addextend) at ($(add)!0.5!(extend)$);
\coordinate (extendinit) at ($(extend)!0.5!(init)$);
\coordinate (initenter) at ($(init)!0.5!(enter)$);
\coordinate (exitremove) at ($(exit)!0.5!(remove)$);
\coordinate (removedone) at ([xshift=0.25cm,yshift=0.25cm]remove.north east);
\coordinate[above=0.25cm of extendinit |- extend.north east] (abovepos1);
\coordinate (abovepos2) at ([xshift=0.25cm,yshift=0.25cm]exit.north east);
\coordinate (createentera) at ($(create)!0.4!(enter)$);
\coordinate (createenterb) at ($(create)!0.6!(enter)$);
\coordinate (enterremovea) at ($(enter)!0.4!(remove)$);
\coordinate (enterremoveb) at ($(enter)!0.6!(remove)$);
\draw[->] (create) -- (add);
\draw[->] (add) -- (extend);
\draw[->] (extend) -- (init);
\draw[->] (init) |- (initenter) -| (enter);
\draw[->] (enter) -- (exit);
\draw[->] (exit) |- (exitremove) -| (remove);
\draw[->] ([xshift=-0.075cm]extendinit) |- (abovepos1 -| extend) -| (addextend);
\draw[->] ([xshift=+0.075cm]extendinit) |- ([yshift=0.15cm]abovepos1) -| (createadd);
\draw[->] (exit.east) -| (abovepos2) -| ([xshift=-0.25cm]enter.north east);
\draw[->] (remove.east) -| (removedone) -| ([xshift=-0.25cm]remove.north east);
%\draw[dashed,gray] (createenter -| create.west) -- (createenter -| init.east);
\draw[stagebox] ([xshift=-0.75cm]createentera -| create.west) rectangle ([xshift=0.25cm,yshift=0.75cm]init.north east);
\draw[stagebox] ([xshift=-0.75cm]enterremovea -| enter.west) rectangle ([xshift=0.25cm]createenterb -| init.north east);
\draw[stagebox] ([xshift=-0.75cm,yshift=-0.5cm]remove.south west -| remove.west) rectangle ([xshift=0.25cm]enterremoveb -| init.north east);
\node[gray,left=0.75cm of create.west,anchor=north,rotate=90] {\footnotesize creation};
\node[gray,left=0.75cm of enter.west,anchor=north,rotate=90] {\footnotesize use};
\node[gray,left=0.75cm of remove.west,anchor=north,rotate=90] {\footnotesize teardown};
\end{tikzpicture}
\caption{Overview of enclave life cycle with instructions used in the creation, use, and teardown stage}
\label{fig:kappa-lifeofenclave}
\end{figure}
\subsubsection{Isolation}
Recall that isolation, or isolated execution, means that code and data are stored within an isolated environment.
An entity outside the isolated environment should not be able to read or modify data within the environment.
This section briefly describes the various means SGX has to provide isolation.
For a more in-depth description, refer to \cite{mckeen:2013} and \cite{intel64b}.
An interesting property of SGX is that enclaves provide isolation from other processes on the system, the operating system, hypervisors, as well as other enclaves.
This allows several -- potentially mutually distrusting -- enclaves to coexist on the same system while still providing isolation guarantees.
A special area of the system memory, called the Enclave Page Cache (EPC), is used by enclaves to store data.
This area cannot be accessed by peripherals, other enclaves, or other parts of the system, including hypervisors or operating systems.
This is enforced by the memory controller in the CPU, using the Enclave Page Cache Map (EPCM) structure, which stores information about each page in the EPC.
The EPCM is then used by the processor to make access control decisions when enclave memory is accessed.
The contents of the memory are also encrypted so that enclave data is never stored in DRAM in plaintext, since the CPU is the only trusted agent in the SGX model.
To maintain isolation of the enclave, the entry and exit of enclaves is controlled and done with the \sgxenter{} and \sgxexit{} instructions, respectively.
This also ensures that the enclave can only be entered at predefined locations.
In case of an interrupt or an exception, the enclave will save its state securely using a process called Asynchronous Enclave Exit (AEX).
AEX ensures that an exception or interrupt handler in an untrusted domain outside the enclave cannot gain access to the internal enclave state.
\subsubsection{Attestation}
An important part of SGX is the possibility to perform attestation of enclaves.
A key component of attestation is to first get a measurement of the code and data of the enclave.
There are two important measurement registers related to attestation in SGX, \sgxmrenclave{} and \sgxmrsigner{}.
Both registers are 256 bits wide, and store SHA-256 hashes.
\sgxmrsigner{} is a hash of the public key that has signed the enclave, thus identifying the author, while \sgxmrenclave{} stores the hash of code and data that is included during enclave creation, which will be described more in depth below.
\sgxmrenclave{} is calculated during enclave creation, as pages are added to the EPC between \sgxcreate{} and \sgxinit{}.
The measurement calculation is shown in Figure~\ref{fig:kappa-mrenclavecalculation}.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=0.8, every node/.style={scale=0.8}, node distance=3cm]
\node[draw] (shainit) {\texttt{SHA\_Init}};
\node[draw,right of=shainit] (shaupdate1) {\texttt{SHA\_Update}};
\node[ right=0.2cm of shaupdate1] (shaupdate2) {\ldots};
\node[draw,right=0.2cm of shaupdate2] (shaupdaten) {\texttt{SHA\_Update}};
\node[draw,right of=shaupdaten] (shafinal) {\texttt{SHA\_Final}};
\node[draw,right of=shafinal] (mrenclave) {\sgxmrenclave{}};
\draw[->] (shainit) -- (shaupdate1);
\draw[->] (shaupdate1) -- (shaupdate2);
\draw[->] (shaupdate2) -- (shaupdaten);
\draw[->] (shaupdaten) -- (shafinal);
\draw[->] (shafinal) -- (mrenclave);
% dashed rectangles
\node[draw,shape=rectangle,dashed,anchor=center,minimum width=2.7cm,minimum height=2cm] (shainitbox) at (shainit.north) {};
\node[draw,shape=rectangle,dashed,anchor=center,minimum width=2.7cm,minimum height=2cm] (shaupdate1box) at (shaupdate1.north) {};
\node[draw,shape=rectangle,dashed,anchor=center,minimum width=2.7cm,minimum height=2cm] (shaupdatenbox) at (shaupdaten.north) {};
\node[draw,shape=rectangle,dashed,anchor=center,minimum width=2.7cm,minimum height=2cm] (shafinalbox) at (shafinal.north) {};
\node[anchor=north] at (shainitbox.north) {\sgxcreate{}};
\node[anchor=north] at (shaupdate1box.north) {\sgxadd{}\texttt{/}\sgxextend{}};
\node[anchor=north] at (shaupdatenbox.north) {\sgxadd{}\texttt{/}\sgxextend{}};
\node[anchor=north] at (shafinalbox.north) {\sgxinit{}};
\end{tikzpicture}
\caption{Overview of \sgxmrenclave{} derivation during enclave creation}
\label{fig:kappa-mrenclavecalculation}
\end{figure}
The \sgxmrenclave{} measurement register is initialized during \sgxcreate{}.
Then, after each successive \sgxadd{} and \sgxextend{}, the hash is updated.
A call to \sgxadd{} will update the hash with metadata about the page that was added to the EPC, for example security properties, while a call to \sgxextend{} will update \sgxmrenclave{} with the content and metadata of a 256 byte data chunk.
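The measurement flow of Figure~\ref{fig:kappa-mrenclavecalculation} can be sketched as an incremental hash. The following Python snippet mimics the \sgxcreate{}, \sgxadd{}, \sgxextend{}, and \sgxinit{} sequence; the byte encodings of the metadata are illustrative placeholders, not the real SGX structures.

```python
import hashlib

PAGE_SIZE = 4096  # EADD adds 4 KiB pages
CHUNK = 256       # EEXTEND measures 256-byte chunks

def measure_enclave(pages):
    """Mimic the MRENCLAVE derivation: SHA_Init on ECREATE, SHA_Update
    on each EADD/EEXTEND, SHA_Final on EINIT. The metadata encodings
    used here are illustrative, not the real SGX formats."""
    h = hashlib.sha256()                       # ECREATE: SHA_Init
    h.update(b"ECREATE<enclave-metadata>")
    for offset, content in pages:              # one EADD per 4 KiB page
        h.update(b"EADD" + offset.to_bytes(8, "little"))
        for i in range(0, len(content), CHUNK):
            h.update(b"EEXTEND" + content[i:i + CHUNK])
    return h.digest()                          # EINIT: SHA_Final

mrenclave = measure_enclave([(0, b"\x90" * PAGE_SIZE)])
print(mrenclave.hex())
```

Since every measured page's content and metadata enter the hash, any modification of the loaded code or data, or of the order in which pages are added, yields a different \sgxmrenclave{} value.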
SGX allows local attestation between enclaves on the same host, as well as remote attestation where an entity wishes to attest the integrity of an enclave running on a remote host.
This dissertation only considers the latter.
However, the local attestation sequence is relevant to understanding how the remote attestation sequence works, thus both will be described below.
An overview of the local attestation flow can be found in Figure~\ref{fig:kappa-localsgxattestation}.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=1.0, every node/.style={scale=1.0}, node distance=3cm, font=\small\sffamily]
\tikzset{circled/.style={draw,shape=circle,outer sep=2pt,inner sep=0.5pt,minimum size=0.35cm,font=\footnotesize,fill=white}}
\node[draw,dashed,minimum width=9cm,minimum height=4cm] (userhost) {};
\node[draw,solid,minimum width=3.2cm,minimum height=2.7cm,anchor=north west,below right=0.4cm and 0.4cm of userhost.north west] (appA) {};
\node[draw,solid,minimum width=3.2cm,minimum height=2.7cm,anchor=north east,below left=0.4cm and 0.4cm of userhost.north east] (appB) {};
\node[draw,solid,minimum width=2.4cm,minimum height=1.5cm,anchor=north,below=0.4cm of appA.north] (appAenclave) {};
\node[draw,solid,minimum width=2.4cm,minimum height=1.5cm,anchor=north,below=0.4cm of appB.north] (appBenclave) {};
\node[above=0.0cm of appA.south] {Application A};
\node[above=0.0cm of appB.south] {Application B};
\node[above=0.0cm of appAenclave.south] {Enclave A};
\node[above=0.0cm of appBenclave.south] {Enclave B};
\node[above=0.0cm of userhost.south] {Single host};
\draw[<-] ([yshift=+0.5cm]appAenclave.east) -- node[above,circled] {1} ([yshift=+0.5cm]appBenclave.west);
\draw[->] ([yshift=+0.0cm]appAenclave.east) -- node[ circled] {2} ([yshift=+0.0cm]appBenclave.west);
\draw[<-] ([yshift=-0.5cm]appAenclave.east) -- node[below,circled] {3} ([yshift=-0.5cm]appBenclave.west);
\end{tikzpicture}
\caption{Overview of local attestation flow in SGX}
\label{fig:kappa-localsgxattestation}
\end{figure}
The goal of the local attestation flow is for Enclave B to ensure that Enclave A is running on the same host as itself, that it is running inside an enclave, and that the expected code is running.
This is performed with the following steps \cite{anati:2013}:
\begin{enumerate}
\item Enclave B sends its \sgxmrenclave{} value to Enclave A.
\item Enclave A calls the \sgxreport{} instruction to generate a report structure, integrity protected using a MAC, containing data including \sgxmrenclave{} and \sgxmrsigner{}.
The report is sent to Enclave B.
Note that the key used to create this MAC is only known to Enclave B and the CPU itself; it is \emph{not} known by Enclave A.
This proves that the report was created by trusted hardware, and not forged by Enclave A itself.
\item Enclave B receives the report, and starts by verifying the MAC using its report key, retrieved by \sgxgetkey{}.
If the MAC is valid, the contents of the report can be read, and Enclave B can verify the code and data integrity using \sgxmrenclave{} and \sgxmrsigner{}.
If mutual attestation is desired, Enclave B can now send its own report to Enclave A.
\end{enumerate}
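The report mechanism in steps 2 and 3 can be sketched with a keyed hash. The following Python snippet uses HMAC-SHA256 as a stand-in for the CPU-internal MAC and key derivation; in real SGX the report key derivation performed by \sgxgetkey{} happens inside the CPU and is not exposed in this form, so all constants and encodings below are illustrative.

```python
import hashlib
import hmac

# Stand-in for the CPU-internal secret. In real SGX, report keys are
# derived inside the CPU and released only to the target enclave.
CPU_SECRET = b"cpu-root-secret (illustrative)"

def egetkey_report(target_mrenclave: bytes) -> bytes:
    """Derive the report key bound to the target enclave's identity."""
    return hmac.new(CPU_SECRET, b"REPORT" + target_mrenclave,
                    hashlib.sha256).digest()

def ereport(report_body: bytes, target_mrenclave: bytes):
    """EREPORT as executed by the CPU on behalf of Enclave A: the report
    is MACed with the report key of the *target* (Enclave B)."""
    key = egetkey_report(target_mrenclave)
    return report_body, hmac.new(key, report_body, hashlib.sha256).digest()

# Step 2: Enclave A creates a report targeted at Enclave B.
mrenclave_b = hashlib.sha256(b"enclave B code").digest()
report, mac = ereport(b"<MRENCLAVE/MRSIGNER of Enclave A>", mrenclave_b)

# Step 3: Enclave B retrieves its own report key and verifies the MAC.
key_b = egetkey_report(mrenclave_b)
assert hmac.compare_digest(mac, hmac.new(key_b, report, hashlib.sha256).digest())
```

Because the MAC key is bound to the \emph{target} enclave's identity, only Enclave B and the CPU can produce or verify the MAC, which is what prevents Enclave A from forging reports.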
If we instead look at the remote attestation flow, a Verifier wishes to remotely attest an enclave running on a remote host.
An overview of the attestation sequence can be seen in Figure~\ref{fig:kappa-sgxattestation}.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=1.0, every node/.style={scale=1.0}, node distance=3cm, font=\small\sffamily]
\tikzset{circled/.style={draw,shape=circle,outer sep=2pt,inner sep=0.5pt,minimum size=0.35cm,font=\footnotesize}}
\node[draw] (verifier) {Verifier};
\node[draw,right of=verifier] (application) {Application};
\node[draw,right of=application,align=center] (enclave) {Application\\Enclave};
\node[draw,below=1.5cm of application,align=center] (qe) {Quoting\\Enclave};
\node[draw,below=1.5cm of verifier,align=center] (ias) {Attestation\\Service};
\draw[->] ([yshift=+0.15cm]verifier.east) -- node[above,circled] {1} ([yshift=+0.15cm]application.west);
\draw[<-] ([yshift=-0.15cm]verifier.east) -- node[below,circled] {6} ([yshift=-0.15cm]application.west);
\draw[->] ([yshift=+0.15cm]application.east) -- node[above,circled] {2} ([yshift=+0.15cm]enclave.west);
\draw[<-] ([yshift=-0.15cm]application.east) -- node[below,circled] {3} ([yshift=-0.15cm]enclave.west);
\draw[->] ([xshift=+0.15cm]application.south) -- node[right,circled] {4} ([xshift=+0.15cm]qe.north);
\draw[<-] ([xshift=-0.15cm]application.south) -- node[left,circled] {5} ([xshift=-0.15cm]qe.north);
\draw[<->] (verifier.south) -- node[right,circled] {7} (ias.north);
% dashed rectangle and label.
\draw[dashed] ([xshift=0.25cm,yshift=0.25cm]enclave.north east) rectangle ([xshift=-0.5cm,yshift=-0.25cm]qe.south west);
\node[below=1.5cm of enclave] (remotehost) {{Remote host}};
\end{tikzpicture}
\caption{Overview of remote attestation flow in SGX}
\label{fig:kappa-sgxattestation}
\end{figure}
%\noindent
The numbered steps in the figure correspond to the following steps in the attestation flow \cite{anati:2013}:
\begin{enumerate}
\item The Verifier issues a challenge to the remote host, challenging it to prove that it runs code inside an enclave.
\item The (untrusted) application forwards the challenge, together with the identity of the Quoting Enclave to the Application Enclave.
The Quoting Enclave is an Intel-provided enclave.
\item The Application Enclave calls the \sgxreport{} instruction to generate a report structure, integrity protected using a MAC, containing data such as \sgxmrenclave{} and \sgxmrsigner{}.
The report is generated by the CPU, using a key only accessible to the CPU and the Quoting Enclave.
The report is sent back to the application.
\item The report is forwarded to the Quoting Enclave.
\item The Quoting Enclave verifies the report by checking the MAC using its report key, retrieved by \sgxgetkey{}.
If the MAC is valid, it creates a quote structure containing the report, and then signs the quote with a platform-specific private key, using the EPID scheme \cite{brickell:2010}.
Since this is a group signature scheme, it allows the creation of signatures without revealing the identity of the signing CPU.
\item The signed quote is sent back to the Verifier.
\item The Verifier can now verify the EPID signature of the quote by contacting an attestation service, typically the Intel Attestation Service (IAS), to ensure that the signature of the quote is valid. If it is, it can extract the \sgxmrenclave{} and \sgxmrsigner{} fields and compare them to known good values.
\end{enumerate}
After these steps, the Verifier can be sure that the code is running inside an enclave on genuine hardware, and that the code and data in the enclave were not modified before it was started.
These guarantees hold as long as the Verifier accepts the trust model of SGX.
Note that the communication between Application Enclave and Quoting Enclave is the local attestation flow described earlier, where the Quoting Enclave wishes to attest the Application Enclave.
Only if this local attestation is valid will the Quoting Enclave sign the quote and pass it on.
For a more in-depth description of attestation, refer to \cite{anati:2013} and \cite{intel64b}.
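The Verifier's decision in the final steps can be sketched as follows. Since the EPID group signature cannot be reproduced with standard primitives, the snippet below treats the attestation service's verdict as a boolean input and focuses on the comparison of the measurement registers against known good values; all names and values are illustrative.

```python
import hashlib

# Known good measurement values, maintained by the Verifier (illustrative).
KNOWN_GOOD_MRENCLAVE = {hashlib.sha256(b"trusted enclave build").digest()}
KNOWN_GOOD_MRSIGNER = {hashlib.sha256(b"trusted vendor key").digest()}

def verify_quote(quote: dict, signature_valid: bool) -> bool:
    """Accept a quote only if the attestation service judged the EPID
    signature valid and the measurements match known good values."""
    if not signature_valid:
        return False
    return (quote["mrenclave"] in KNOWN_GOOD_MRENCLAVE
            and quote["mrsigner"] in KNOWN_GOOD_MRSIGNER)

quote = {
    "mrenclave": hashlib.sha256(b"trusted enclave build").digest(),
    "mrsigner": hashlib.sha256(b"trusted vendor key").digest(),
}
assert verify_quote(quote, signature_valid=True)
assert not verify_quote(quote, signature_valid=False)
```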
\subsubsection{Sealing}
An enclave protects the integrity and confidentiality of its data while running in memory, but it can also be desirable to persist data to non-volatile media before the enclave exits.
This can be done by using sealing, which can be performed using several different policies in SGX.
Regardless of policy, the \sgxgetkey{} instruction is called together with the desired policy, and a 128-bit key is derived and returned by the CPU.
The key can then be used for encryption and/or integrity protection of data, before the data is persisted to disk.
Also, regardless of policy, the derived keys are unique to the particular CPU that derives the key; two identical enclaves running on different CPUs will derive different keys.
The sealing policies can be categorized into two major categories depending on the measurement register the key derivation is based on: either \sgxmrenclave{}-based or \sgxmrsigner{}-based.
In the first case, the sealing key will only be available to enclaves with the same \sgxmrenclave{} hash.
This ensures that only a particular enclave implementation will be able to unseal the data.
However, this also means that an updated version of the enclave, for example with security patches, will not be able to unseal data from a previous enclave version.
In the second case, the sealing key will be based on the \sgxmrsigner{} hash, together with a product ID and a security version number (SVN).
The SVN can be used to allow an upgraded enclave to access sealed data from a previous version of the enclave.
This makes the management of security patches easier, since sealed data is accessible to newer enclave versions.
It is also possible to seal data without a product ID, thus making it possible to share data between different enclaves from the same vendor.
For more details about sealing policies, refer to \cite{anati:2013} and \cite{intel64b}.
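The difference between the two policies can be sketched as a key derivation. The snippet below uses HMAC-SHA256 as a stand-in for the CPU-internal derivation performed by \sgxgetkey{}; the field encodings and the CPU secret are illustrative, and the CPU's check that the requested SVN does not exceed the enclave's own SVN is omitted.

```python
import hashlib
import hmac

# Stand-in for the per-CPU secret; in real SGX the derivation happens
# inside the CPU and the secret never leaves the processor.
CPU_SECRET = b"per-cpu-fused-secret (illustrative)"

def seal_key(policy: str, mrenclave: bytes, mrsigner: bytes,
             product_id: int, svn: int) -> bytes:
    """Derive a 128-bit sealing key. The MRENCLAVE policy binds the key
    to the exact enclave build; the MRSIGNER policy binds it to the
    signer identity, product ID, and SVN."""
    if policy == "MRENCLAVE":
        info = b"SEAL|ENC|" + mrenclave
    else:  # "MRSIGNER"
        info = (b"SEAL|SIG|" + mrsigner
                + product_id.to_bytes(2, "little")
                + svn.to_bytes(2, "little"))
    return hmac.new(CPU_SECRET, info, hashlib.sha256).digest()[:16]

enc_v1 = hashlib.sha256(b"enclave v1 code").digest()
enc_v2 = hashlib.sha256(b"enclave v2 code").digest()
sig = hashlib.sha256(b"vendor signing key").digest()

# MRSIGNER policy: an upgraded enclave (v2) requesting the SVN-1 key
# derives the same key as the old enclave, so it can unseal old data.
assert seal_key("MRSIGNER", enc_v1, sig, product_id=1, svn=1) == \
       seal_key("MRSIGNER", enc_v2, sig, product_id=1, svn=1)

# MRENCLAVE policy: any code change yields a different key.
assert seal_key("MRENCLAVE", enc_v1, sig, 1, 1) != seal_key("MRENCLAVE", enc_v2, sig, 1, 1)
```

The sketch shows why the \sgxmrsigner{}-based policy eases patch management: an upgraded enclave can still derive the key for an older SVN, whereas under the \sgxmrenclave{}-based policy any change to the enclave code yields a different key.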
\subsubsection{Attacks on SGX}
Since the design of SGX was made public, research on potential attacks has progressed steadily.
Much of this work concerns side-channel attacks related to memory management in Intel SGX, starting with the discussion of the lack of side-channel protections in \cite{costan:2016}.
Fairly soon after these observations, practical side-channel attacks on SGX started to appear.
In \cite{gotzfried:2017} the authors show a practical attack that can extract the AES key used for decryption inside an SGX enclave.
The attack is based on cache timing, can extract the AES key in less than 10 seconds, and is performed from outside the enclave, thus breaking the isolation property of SGX.
Other attacks include \cite{schwarz:2017}, which performs an attack on an RSA implementation running within an enclave.
The attack is performed from within an attacker enclave, and attacks another enclave running on the same host.
It manages to recover the private RSA key either partially with a single trace, or completely when using multiple traces.
Similarly, \cite{brasser:2017} shows that the recovery of a private RSA key is possible, as well as the recovery of potentially sensitive genome data.
The extent of potential side-channel attacks is also discussed systematically in \cite{wang:2017}.
One way to avoid side-channel attacks is to implement algorithms in a data-oblivious way.
In \cite{ohrimenko:2016} this is discussed in the context of machine learning algorithms running inside SGX, where the authors propose data-oblivious algorithms for several common machine learning methods.
Also on the defensive side, \cite{gruss:2017} presents \emph{Cloak}, a technique to prevent attackers from observing cache misses.
The overall idea is to use hardware transactional memory, such as Intel TSX, which can be used to ensure that certain memory stays in the processor cache for the duration of the calculations, thus preventing cache-timing attacks.
The performance overhead is, however, heavily dependent on the number of memory accesses in the code.
Moreover, such mitigations do not always work.
In \cite{vanbulck:2018} the authors present Foreshadow, an attack based on the ideas from the Meltdown attacks \cite{lipp:2018}, but applied to SGX.
The attack leaks information from within the enclave, and also attacks the architectural enclaves such as the Quoting Enclave.
While the envisioned use case for SGX is to run trusted code within an enclave, shielding it from the unprotected world outside of SGX, \citeauthor{schwarz:2019} show that malware can be executed within an enclave \cite{schwarz:2019}.
Because of the isolation property of enclaves, such malware enclaves may avoid detection from anti-malware software running on the host.
%\cite{brasser:2017} % builds on others, easy to deploy, avoid detection rsa decryption, genomic processing
%\cite{gotzfried:2017} %
%\cite{schwarz:2017}
%\cite{wang:2017} % considers many different sc in a systematic approach.
%\cite{gruss:2017}
%\cite{vanbulck:2018} % immune to gruss:2017 (cloak)
%\cite{schwarz:2019}
\subsection{Other Trusted Computing Technologies}
\label{sec:kappa-othertc}
While the previous sections have discussed trusted computing technologies used later on in this dissertation, it is also valuable to briefly discuss other potential options.
The main motivation behind this is that some contributions are not necessarily tied to a specific trusted computing technology, but could be implemented by using several others as well, as long as they provide the required combination of security properties.
One technology is Intel TXT, which allows the use of a DRTM for an execution environment \cite{TXT-mle}.
It allows for an application or operating system to launch a Measured Launch Environment (MLE) of trusted code.
Intel TXT uses the PCR values of the TPM to store measurements, together with new instructions to actually perform the launch of the MLE.
One variant of the TPM is the Mobile Trusted Module (MTM), based on the TPM 1.2 specification but with adaptations for mobile use cases.
The specification can be seen in \cite{MTM1.0spec}, and for an overview of the functionality the reader is referred to \cite{ekberg:2007}.
Several options for actual MTM implementations have been proposed, for example implementations based on TrustZone (described next) \cite{winter:2008}, M-Shield \cite{ekberg:2009}, or a physical chip \cite{kim:2010}.
Another technology is ARM TrustZone, available in ARM processors starting with ARMv6 in 2002 \cite{zhang:2016}.
TrustZone provides two different worlds: a secure world, and a normal world.
The normal world runs a regular, rich, operating system as well as all other regular software installed on the system.
The secure world typically runs a specialized trusted software stack, implementing the desired security features.
The processor ensures that software in the normal world cannot access memory of the secure world, while the secure world has access to all memory: both in the secure and normal world.
Other trusted computing technologies focus on other platforms, such as Sanctum \cite{costan:2016b} for RISC-V, SecureBlue++ \cite{boivie:2013} for POWER, and Bastion \cite{champagne:2010} for OpenSPARC.
For surveys of other technologies, we refer to for example \cite{zhang:2016,peters:2017,maene:2018}.
The different trusted computing technologies each provide a different API, and different ways to use the security properties they provide.
Ongoing efforts to abstract differences between different technologies include for example Asylo \cite{asylo} and Open Enclave \cite{openenclave}.
\subsection{Research on Trusted Computing}
With the previous sections describing the background of the various trusted computing technologies, this section describes research trends as well as potential applications of the technology.
Research on trusted computing covers a wide area, and can have applications both on client-oriented devices, as well as in more server-oriented use cases.
On client-oriented devices, BitLocker \cite{Bitlocker} is perhaps the most well-known use of the TPM today, being available in certain Windows versions since the launch of Windows Vista in 2007.
BitLocker is a disk encryption solution, and can use TPM sealing such that the disk encryption keys are only released when the boot environment is unmodified.
On mobile devices, where use of ARM processors is widespread, use of TrustZone is most common.
Examples include the use of TrustZone in the Android Keystore \cite{androidkeystore}, and Samsung Knox, which provides containers useful to, e.g., separate business and personal data on a single device \cite{kanonov:2016}.
TrustZone can also be used as a trust root to implement other technologies.
An example is the Mobile Trusted Module which can be implemented by the use of TrustZone as a root as in \cite{winter:2008}.
While interesting, we will not consider such client-oriented applications further in this dissertation, and instead focus on more server-oriented use cases.
We first discuss work related to virtualization and cloud computing, followed by privacy-preserving computations.
\subsubsection{Virtualization and Cloud Computing}
There has been much focus on the use of trusted computing technologies in the area of virtualization and cloud computing.
Starting with the TPM, which in itself is not trivially shared between multiple virtualized machines, \citeauthor{berger:2006} proposed virtualization of TPMs, vTPM, in \cite{berger:2006}, allowing virtual TPMs to be connected to virtual machines at the same time as trust in the vTPMs is rooted in the single hardware physical TPM.
Other approaches include para-virtualized TPM sharing \cite{england:2008}, and property-based TPM virtualization \cite{sadeghi:2008}.
A possible benefit of the cloud infrastructure is the possibility to increase availability.
Migrating services between different physical hosts allows flexibility in hardware maintenance, and is an important part of a cloud service.
However, migration introduces difficulties when combined with TPMs (both physical and virtualized), because of the underlying change in physical host.
The problems are discussed in papers such as \cite{danev:2011} and \cite{wan:2012}.
In \paperref{ch:tpmhas} we also consider the availability problem by describing the use of TPMs in a high-availability system (HAS).
Instead of a virtualization approach, we consider a system with multiple, independent, computational units (CUs) which together constitute the HAS.
We propose a solution for how each TPM-equipped CU can share the same keys required to decrypt secure storage for availability purposes, by describing a key migration scheme using the functionality in either TPM 1.2 or TPM 2.0.
In \paperref{ch:tpm12to20} the key migration features of both TPM versions are discussed in depth, as we design a way to migrate key material from TPM 1.2 to TPM 2.0, thus allowing the migration from old to new hardware while keeping the same keys.
Due to the lack of backwards compatibility between the two versions, this requires careful considerations to maintain the same key material and behaviour with regard to authorization.
Since SGX was presented in 2013, it has attracted much attention in virtualization and cloud computing.
Proposed use cases include distributed MapReduce computations while maintaining confidentiality and integrity \cite{schuster:2015}, shielded execution of applications in the cloud \cite{baumann:2015}, running Docker containers inside enclaves \cite{arnautov:2016}, and secure database engines \cite{priebe:2018}.
Network function virtualization (NFV) and Software-Defined Networking (SDN) are two technologies often used together with virtualization and in the cloud, since they allow an increased flexibility with regard to network management.
For an overview of the respective technologies, the reader is referred to \cite{casado:2014} for SDN and \cite{mijumbi:2016} for NFV.
While security in SDN networks is a broad topic in itself, there has been previous research discussing trusted computing technologies in connection with SDN.
Examples include the discussions of securing inter-domain routing in \cite{kim:2015}, securing NFV states in \cite{shih:2016}, and a framework to securely bootstrap virtual network infrastructure in SDN \cite{paladi:2016b}.
In \paperref{ch:trustanchors} we also present contributions to the security in SDN networks, by presenting work protecting credentials and cryptographic context of network elements, as well as a secure enrollment mechanism for network elements in a software-defined network.
For both parts, we utilize trusted computing technologies.
\subsubsection{Privacy-preserving Computations}
The isolation guarantees of SGX are also interesting from a privacy-preserving perspective.
Privacy-preserving computations can be performed in many ways, including fully homomorphic encryption \cite{gentry:2009}, secure multiparty computation \cite{yao:1982}, and trusted execution environments, the last of which is the focus of this section.
Privacy-related topics where SGX has been used include, e.g., a framework for MapReduce computations \cite{schuster:2015}, a sandbox to perform arbitrary calculations on secret data \cite{hunt:2018}, functional encryption \cite{fisch:2017}, multiparty computations \cite{kucuk:2016,bahmani:2017}, private web search \cite{mokhtar:2017,pires:2018}, and machine-learning settings \cite{ohrimenko:2016,chandra:2017,hunt:2018b}.
In \paperref{ch:recsyssgx} we also use Intel SGX to design a privacy-preserving system, but in the context of a recommender system (recommender systems in general, and privacy in recommender systems in particular will be discussed shortly, in Section~\ref{sec:kappa-recommendersystems} and \ref{sec:kappa-recsysprivacy}, respectively).
In this recommender system, the sensitive information consists of the user profiles, which are protected by designing an intermediary running inside an SGX enclave.
The design using an intermediary is similar to the work presented in \cite{lie:2017}, where the intermediary is verifiable by the client before it is entrusted with sensitive information.
In contrast to that work, \paperref{ch:recsyssgx} targets a different kind of recommender system, one which is not collaborative, making the use of validation predicates unnecessary: malicious input would only affect the profile of the user itself.
Instead of blinding by combining multiple inputs, we derive pseudo profiles, which are used to issue fake queries together with the genuine query, similar to the approach for private web search in \cite{mokhtar:2017}.
\newpage
\section{Recommender Systems}
\label{sec:kappa-recommendersystems}
The general goal of a recommender system is to provide recommendations from a set of items to some user, the idea being that the recommendations are adapted to the user's interests or needs.
Recommender systems are widely used in many different scenarios, including e-commerce, music, videos, and social media.
The common denominator is that there is a large set of content, such as books, songs, videos, or social media posts -- more than the user can or wishes to manually browse.
The purpose of a recommender is to filter this large set, which likely contains many items not relevant for the user, and present the filtered view to the user to further interact with.
The end goal of presenting this filtered view can be one of many; a few examples are listed below:
\begin{itemize}
\item In an e-commerce setting, the goal is to increase the sales.
By providing recommendations that the users are likely to be interested in, or at least likely to buy, the revenue of the seller can increase.
\item For a monthly fixed-rate streaming service for music or videos, the goal may be to increase user satisfaction by providing recommendations of new or old content.
\item For content financed with advertisements, the goal of the recommender could be to increase the ad revenue by recommending interesting content to the users, increasing the chance that they stay on the service and are exposed to more advertisements.
\end{itemize}
A conceptual view of a recommender system can be seen in Figure~\ref{fig:kappa-recommender}.
The recommender takes two inputs: user information and item information.
\begin{description}
\item[User information] This can be any kind of information about a user, and the information can be supplied both explicitly and implicitly by the user.
Examples of explicit user information would be user-stated preferences or properties such as interests, ratings, or age.
Implicit user information could instead be automatically collected data about users, such as visited web pages, or purchase history.
\item[Item information] This is instead information about the set of possible items to recommend.
This is highly dependent on the items in question, but common examples are price, manufacturer, and product type, in case of a recommender system for manufactured goods.
\end{description}
\begin{figure}[tb]
\centering
\begin{tikzpicture}[scale=1, every node/.style={scale=1}, node distance=1cm, font=\small\sffamily,
database/.style={
cylinder,
shape border rotate=90,
aspect=0.15,
draw
}]
\node[draw,align=center] (recommender) {Recommender\\system};
\coordinate[left=2cm of recommender,anchor=east] (midpoint);
\node[database,above=0.5cm of midpoint,align=center] (userinfo) {User\\information};
\node[database,below=0.5cm of midpoint,align=center] (iteminfo) {Item\\information};
\node[right=of recommender] (recommendations) {Recommendations};
\draw[->] (userinfo) -- (recommender);
\draw[->] (iteminfo) -- (recommender);
\draw[->] (recommender) -- (recommendations);
\end{tikzpicture}
\caption{Conceptual view of a recommender system}
\label{fig:kappa-recommender}
\end{figure}
The recommender system combines user and item information, and outputs a set of recommendations, which is a subset or ranking of items that is likely to be relevant to the user in question.
The output can be of many forms, some common examples include:
\begin{itemize}
\item A subset of all items, such that the subset contains items relevant to the user.
\item A ranking of items, such that items are ordered according to their perceived relevance to the user.
\end{itemize}
Recommender systems can generally be categorized depending on the internal method used to generate the recommendations \cite{aggarwal:2016}.
Three major methods are: knowledge-based methods, content-based methods, and collaborative filtering methods.
They are described in more detail in the following sections.
\subsection{Knowledge-based Methods}
In a knowledge-based recommender system, the input of the recommender takes the form of user \emph{requirements} and item information, together with knowledge of the domain the recommender works in.
A distinctive feature of this type of recommender is that it does not work with user ratings or user history from previous interactions, but rather with a set of user-specified requirements.
There are two major classes of knowledge-based systems: case-based and constraint-based \cite{ricci:2011}.
Case-based systems are built around a similarity function, which calculates the similarity between the desired item (the user requirements) and the items actually available.
Constructing the similarity function requires knowledge of the domain in question, but after construction it can be used to support several different interfaces.
The user can either specify individual requirements directly, or request items similar to other items; both interfaces are possible because both rely on the same similarity function.
The other class is constraint-based systems.
In these systems, the user typically provides their requirements in the form of constraints.
In the case of a house or an apartment, these could be the minimum and maximum number of rooms, the living area, and so on.
The constraints do not have to explicitly map to item attributes; the designer of the recommender system can use domain-specific rules to map between user requirements and item attributes.
Compared to case-based systems above, the main difference lies in how case-based systems use a similarity function, while constraint-based systems use a set of rules that defines the mapping from requirements to items.
In general, knowledge-based systems are suitable in situations where items are bought rarely, such as cars, houses, apartments, and similar expensive items \cite{aggarwal:2016}.
In these cases, it can be hard to gather enough purchase history to generate suitable recommendations.
Requirements may also change with time, consider the previous example of a recommender for real estate; it is very likely that the user's preferences have changed since the last time they bought a house or an apartment -- after all, there is a reason they are looking for something new.
Knowledge-based systems solve this by not attempting to learn from past user behaviour; instead, they focus on the user's current requirements and use domain-specific knowledge to compare these requirements to item attributes.
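As an illustration, the mapping from user requirements to item attributes in a constraint-based system can be sketched as follows; the listings, attribute names, and the ``family friendly'' rule are hypothetical, chosen only to show how a domain-specific rule can translate a user requirement into constraints on item attributes.

```python
# Sketch of a constraint-based knowledge recommender for real estate.
# All items, attributes, and rules are hypothetical, for illustration only.

ITEMS = [
    {"id": "A", "rooms": 2, "area_m2": 55, "price": 1_500_000},
    {"id": "B", "rooms": 4, "area_m2": 110, "price": 3_200_000},
    {"id": "C", "rooms": 3, "area_m2": 72, "price": 2_100_000},
]

def matches(item, requirements):
    """Check an item against user-specified constraints.

    Domain rule: "family friendly" is not an item attribute, but is
    mapped by the system designer to at least 3 rooms and 70 m^2.
    """
    if item["rooms"] < requirements.get("min_rooms", 0):
        return False
    if item["price"] > requirements.get("max_price", float("inf")):
        return False
    if requirements.get("family_friendly"):
        if item["rooms"] < 3 or item["area_m2"] < 70:
            return False
    return True

def recommend(requirements):
    """Return the ids of all items satisfying the constraints."""
    return [item["id"] for item in ITEMS if matches(item, requirements)]

print(recommend({"family_friendly": True, "max_price": 2_500_000}))  # ['C']
```

Note that no user history is involved: the recommendation is derived entirely from the current requirements and the designer's rules.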
\subsection{Content-based Methods}
Content-based methods take the individual history of the user into account when generating recommendations.
By considering the user's rating of previous items, the recommender tries to find similar items that the user may like.
This is done by comparing the attributes of previously liked items, and finding other items with similar attribute values.
A typical example is a movie recommender system, where a user has given a movie a high rank.
The recommender then looks at the attributes of that movie, tries to find other movies with similar attributes, and presents these movies to the user.
Content-based systems have both advantages and disadvantages \cite{aggarwal:2016}.
Advantages include that newly added items can be recommended, since their attributes can be compared with older items immediately.
Conversely, a well-known disadvantage is that new users cannot be given any reasonable recommendations, since they have not yet rated any items.
This is an example of the cold-start problem of recommender systems.
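As a minimal sketch of the content-based approach, the following compares hypothetical movies by their attribute sets, using the Jaccard index as the similarity measure; the titles, genre attributes, and the choice of Jaccard are assumptions for illustration.

```python
# Sketch of a content-based movie recommender (hypothetical titles and
# genre attributes). Movies similar to one the user rated highly are
# found by comparing attribute sets with the Jaccard index.

MOVIES = {
    "Movie A": {"sci-fi", "action"},
    "Movie B": {"sci-fi", "drama"},
    "Movie C": {"romance", "drama"},
    "Movie D": {"sci-fi", "action", "thriller"},
}

def jaccard(a, b):
    """Jaccard index: |intersection| / |union| of two attribute sets."""
    return len(a & b) / len(a | b)

def recommend(liked, n=2):
    """Rank all other movies by attribute similarity to a liked movie."""
    scores = {
        title: jaccard(MOVIES[liked], attrs)
        for title, attrs in MOVIES.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("Movie A"))  # ['Movie D', 'Movie B']
```

The cold-start problem is visible here as well: a new user has no liked movie to start from, so no comparison can be made.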
\subsection{Collaborative Filtering}
Collaborative filtering methods are typically designed around a rating matrix, such as the matrix in Figure~\ref{fig:kappa-cf}.
The rating matrix is sparse, and each known entry contains the rating given by a user $u_a$ to an item $i_b$.
Generating recommendations can now be seen as filling out this sparse matrix, such that unknown ratings are estimated by using data from other parts of the matrix.
An important distinction between collaborative filtering methods and previously described methods is that collaborative filtering also considers information from \emph{other} users of the system.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=0.5, every node/.style={scale=0.5}, node distance=1cm]
\draw[step=1cm,very thin] (0, 0) grid (5, 5);
\foreach \x in {0,...,4}
\node at (\x+0.5, 5.5) {\LARGE $i_{\x}$};
\foreach \y in {0,...,4}
\node at (-0.5, 4-\y+0.5) {\LARGE $u_{\y}$};
\draw[fill=gray!20] (3,3) rectangle (4,4);
\foreach \x/\y/\v in {3/1/?, 1/2/5, 3/3/3, 0/0/1, 1/3/2, 4/1/4, 2/0/5, 2/4/1}
\node[] at (\x+0.5, 4-\y+0.5) {\huge \v};
\end{tikzpicture}
\caption{Example of a rating matrix used in collaborative filtering}
\label{fig:kappa-cf}
\end{figure}
Consider the rating matrix in Figure~\ref{fig:kappa-cf}.
The goal is to estimate the rating of some unknown $(u_a, i_b)$-pair in the rating matrix, for example the $(u_1, i_3)$-pair in Figure~\ref{fig:kappa-cf}, marked by the grey background.
A common methodology is to use neighbourhood-based collaborative filtering, which works by finding neighbourhoods of two different types.
A neighbourhood can either consist of similar users, or of similar items.
Again, using Figure~\ref{fig:kappa-cf} as an example, the former case corresponds to finding users with similar taste to $u_1$, and then finding \emph{their} ratings for the item $i_3$.
Thus, we look at ratings within the column $i_3$ from several users.
This is called user-based collaborative filtering, or user-user methods, and a more in-depth discussion can be found in \cite{herlocker:1999}.
The latter case, finding items that are similar to $i_3$, instead corresponds to looking at ratings on the same row, for those items that are similar to $i_3$.
This is called item-based collaborative filtering, or item-item methods.
One advantage of item-item methods is that the relations between items are less likely to change, as opposed to user-user relations.
This can be exploited for performance optimizations, since the online phase of the computation can be reduced \cite{sarwar:2001}.
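A minimal sketch of user-based collaborative filtering on a small rating matrix is given below; the ratings are hypothetical, cosine similarity is one common choice of similarity measure, and the prediction is a similarity-weighted average over the item's column.

```python
# Sketch of user-based (user-user) collaborative filtering on a small
# rating matrix. Ratings are hypothetical; 0 denotes "not rated".
import math

# Rows: users u0..u3, columns: items i0..i3.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
]

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, item):
    """Estimate R[user][item] as a similarity-weighted average of the
    ratings other users gave the same item (the item's column)."""
    num = den = 0.0
    for other, row in enumerate(R):
        if other == user or row[item] == 0:
            continue  # skip the target user and users without a rating
        s = cosine(R[user], row)
        num += s * row[item]
        den += abs(s)
    return num / den if den else 0.0

# Predict u1's unknown rating of i1 from similar users' ratings.
print(round(predict(1, 1), 2))
```

An item-item variant would instead compare columns of the matrix and average over the user's row; the structure of the computation is otherwise the same.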
\subsection{Construction of Recommender Systems}
In this section, some aspects of the construction of recommender systems are discussed.
We start by discussing similarity and utility functions, and how they are used in the different recommender system methods.
We then discuss how several recommender system methods can be combined together -- creating hybrid recommender systems.
\subsubsection{Similarity and Utility Functions}
A concept used in several of the previously mentioned recommender methods is that of similarity and utility functions.
First, a discussion and definition of the two kinds of functions are required; the definitions will then be used throughout the rest of this dissertation.
This is followed by examples of different similarity functions.
As the name implies, similarity functions are used to measure the similarity between two values.
In the general case, a similarity function can be defined in two ways: as the similarity between two users or items as a whole, or between two individual attribute values.
\begin{figure}[ht]
\begin{subfigure}[b]{0.5\linewidth}
\centering
\begin{tikzpicture}[scale=1, every node/.style={scale=1}, node distance=1.0cm, font=\small\sffamily,
item/.style={minimum width=2cm, minimum height=1.5cm,draw}]
\node[item] (item1) {Item 1};
\node[item,below=of item1] (item2) {Item 2};
\coordinate (item15) at ($(item1)!0.5!(item2)$);
\node[draw,right=1.5cm of item15,align=center] (sim) {\textsf{sim}};
\draw[->,thick] (item1) -| (sim);
\draw[->,thick] (item2) -| (sim);
\draw[->] (sim.east) -- +(0.5cm,0);
\end{tikzpicture}
% \captionsetup{singlelinecheck=true}
\caption{Between two items}
\label{fig:kappa-simlargevsattributelarge}
\end{subfigure}
\begin{subfigure}[b]{0.5\linewidth}
\centering
\begin{tikzpicture}[scale=1, every node/.style={scale=1}, node distance=1.0cm, font=\small\sffamily,
item/.style={anchor=west,minimum width=2cm, minimum height=1.5cm,text width=1.8cm,draw},
attr/.style={anchor=north east,minimum width=0.85cm,minimum height=0.375cm,draw,font=\scriptsize}]
\node[item] (item1) {Item 1};
\node[item,below=of item1] (item2) {Item 2};
\coordinate (item15) at ($(item1)!0.5!(item2)$);
% attributes
\node[attr] (i1a1) at (item1.north east) {$\text{attr}_1$};
\node[attr,below=0cm of i1a1] (i1a2) {$\text{attr}_2$};
\node[anchor=north east,below=-0.1cm of i1a2,inner sep=0] (i1a3) {\vdots};
\node[attr] (i2a1) at (item2.north east) {$\text{attr}_1$};
\node[attr,below=0cm of i2a1] (i2a2) {$\text{attr}_2$};
\node[anchor=north east,below=-0.1cm of i2a2,inner sep=0] (i2a3) {\vdots};
% sim func and arrows
\node[draw,right=1.5cm of item15,align=center] (sim) {\textsf{sim}};
\draw[->] (i1a1) -| (sim);
\draw[->] (i2a1) -| (sim);
\draw[->] (sim.east) -- +(0.5cm,0);
\end{tikzpicture}
% \captionsetup{singlelinecheck=true}
\caption{Between individual attributes}
\label{fig:kappa-simlargevsattributeattribute}
\end{subfigure}
\caption{Two ways to define similarity functions (\textsf{sim})}
\label{fig:kappa-simlargevsattribute}
\end{figure}
The distinction is visualized in Figure~\ref{fig:kappa-simlargevsattribute}, where \figref{fig:kappa-simlargevsattributelarge} shows the case where the similarity function takes two complete items as input, and outputs a similarity metric.
In \figref{fig:kappa-simlargevsattributeattribute} the similarity function is instead defined between individual attributes.
While the figures show items and item attributes, note that it is equally possible to use users and user attributes instead.
Depending on the recommender type, it may also be applicable for the similarity function to be defined between an item and a user.
This is the case for knowledge-based recommender systems, where user requirements and item attributes are compared using a similarity function.
Utility functions can be defined in the same way as similarity functions, but as the name suggests, they are often used to give a more general metric of \emph{utility}, as opposed to just similarity.
Consider the example in Figure~\ref{fig:kappa-simvsutility}, where the user requirements for a computer are compared to two different items.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=1.0, every node/.style={scale=1.0}, node distance=1cm, font=\small\sffamily]
\node[draw,align=left] (req) {\emph{User requirements}\\CPU: 3.0 GHz\\RAM: 16 GiB\\Price: 4000 SEK};
\node[draw,align=left,below left=of req] (item1) {\emph{Item 1}\\CPU: 3.0 GHz\\RAM: 16 GiB\\Price: 4500 SEK};
\node[draw,align=left,below right=of req] (item2) {\emph{Item 2}\\CPU: 3.0 GHz\\RAM: 16 GiB\\Price: 2000 SEK};
\draw[<->] (req) -- node[above] {?} (item1);
\draw[<->] (req) -- node[above] {?} (item2);
\end{tikzpicture}
\caption{User requirements being compared with two different items}
\label{fig:kappa-simvsutility}
\end{figure}
Assuming CPU, RAM, and price are the only possible attributes of a computer, which of the two items best matches the user requirements?
If a strict definition of similarity is used, the more expensive Item 1 is clearly more similar to the user requirements; the desired price of 4000 SEK is closer to 4500 SEK than to 2000 SEK.
While mathematically sound, this is most probably not the desired behaviour from a user perspective: if the user can get the same specifications at a lower price, that is clearly more desirable.
Therefore, it is reasonable to say that Item 2 has a higher utility than Item 1.
Utility metrics are highly dependent on the domain in which they are used; while a lower price is generally more desirable than a higher one, the opposite may be true for other attributes.
However, the distinction between similarity and utility can sometimes be unclear, and the notation varies between different contexts.
After these generic descriptions, we now move on to the definitions used in the remainder of this dissertation.
With regard to similarity and utility functions, the following notation will be used:
\begin{itemize}
\item Similarity functions will compare \emph{individual attributes}.
Attributes can be both user attributes, and item attributes.
For consistency, the term \emph{similarity function} will be used both when comparisons are based on similarity and when they are based on a metric that could be considered more utility-based.
\item Utility functions will be used to describe the utility of a \emph{complete item} for a specific user.
To calculate the utility, it will apply similarity functions to the individual attributes, and combine the similarities to derive a utility metric.
\end{itemize}
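Under these definitions, a utility function for the computer example in Figure~\ref{fig:kappa-simvsutility} might be sketched as follows; the attribute names, weights, and scale parameters are hypothetical, and the asymmetric price similarity encodes the intuition that a price below the requirement should not be penalized.

```python
# Sketch of a utility function composed of per-attribute similarity
# functions. Attribute names, weights, and scales are hypothetical.

def sim_numeric(required, actual, scale):
    """Symmetric similarity: 1 at an exact match, falling off with distance."""
    return max(0.0, 1.0 - abs(required - actual) / scale)

def sim_price(max_price, actual, scale):
    """Asymmetric similarity: prices below the requirement are not
    penalized; prices above it are."""
    if actual <= max_price:
        return 1.0
    return max(0.0, 1.0 - (actual - max_price) / scale)

# One similarity function per attribute, plus a weight for each.
SIMS = {
    "cpu_ghz": lambda r, a: sim_numeric(r, a, scale=2.0),
    "ram_gib": lambda r, a: sim_numeric(r, a, scale=16.0),
    "price_sek": lambda r, a: sim_price(r, a, scale=2000.0),
}
WEIGHTS = {"cpu_ghz": 0.3, "ram_gib": 0.3, "price_sek": 0.4}

def utility(requirements, item):
    """Combine attribute similarities into a single utility metric."""
    return sum(
        WEIGHTS[attr] * SIMS[attr](req, item[attr])
        for attr, req in requirements.items()
    )

req = {"cpu_ghz": 3.0, "ram_gib": 16, "price_sek": 4000}
item1 = {"cpu_ghz": 3.0, "ram_gib": 16, "price_sek": 4500}  # more expensive
item2 = {"cpu_ghz": 3.0, "ram_gib": 16, "price_sek": 2000}  # cheaper
print(utility(req, item1) < utility(req, item2))  # the cheaper item wins
```

With a symmetric similarity on price, Item 1 would have scored higher; the asymmetric variant captures the utility-based behaviour discussed above.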
The use is visualized in Figure~\ref{fig:kappa-simutilitydefs}.
\begin{figure}[htb]
\centering
\begin{tikzpicture}[scale=1, every node/.style={scale=1}, node distance=1.2cm, font=\small\sffamily,
item/.style={anchor=west,minimum width=2cm, minimum height=1.9cm,text width=1.8cm,draw},
attr/.style={anchor=north east,minimum width=0.85cm,minimum height=0.45cm,draw,font=\scriptsize}]
\node[item] (item1) {Item};
\node[item,below=of item1] (user1) {User};