% $Id: faq-bits+pieces.tex,v 1.32 2014/01/28 18:17:36 rf10 Exp rf10 $
\section{Bits and pieces of \AllTeX{}}
\Question[Q-dvi]{What is a \acro{DVI} file?}
`\acro{DVI}' is supposed to be an acronym for
\acro{D}e\acro{V}ice-\acro{I}ndependent, meaning that the file may be
processed for printing or viewing on most kinds of typographic output
device or display.
A \acro{DVI} file (that is, a file with the type or extension
\extension{dvi}) is the main output file of ``original'' \TeX{} (later
\TeX{}-like systems, such as \Qref*{\pdftex{}}{Q-whatpdftex}, may use
other formats).
A \acro{DVI} file contains all the information that is needed for
printing or previewing, except for the actual bitmaps or outlines of
fonts, and any material to be introduced by means of % !line break
\Qref*{\csx{special} commands}{Q-specials}. Characters in the
\acro{DVI} file (representing glyphs for printing or display) appear
in an encoding determined in the document.
Any \TeX{} input file should produce the same \acro{DVI} file
regardless of which implementation of \TeX{} is used to produce it.
A \acro{DVI} file may be processed by a \Qref*{DVI driver}{Q-driver}
to produce further output designed specifically for a particular
printer, or for output in another format (for distribution), or it may
be used by a previewer for display on a computer screen.
Note that \Qref*{\xetex{}}{Q-xetex} (released some time after
\pdftex{}) uses an ``extended \acro{DVI} format'' (\acro{XDV}) to send
its output to a close-coupled \Qref*{\acro{DVI} driver}{Q-driver},
\ProgName{xdvipdfmx}.
The canonical reference for the structure of a \acro{DVI} file is the
source of Knuth's program \ProgName{dvitype} (whose original purpose,
as its name implies, was to view the content of a \acro{DVI} file).
A partially complete ``standard'' for the way they should be
processed may offer further enlightenment.
\begin{ctanrefs}
\item[\nothtml{\rmfamily}DVI processing standard]\CTANref{dvistd}
\item[dvitype]\CTANref{dvitype}
\end{ctanrefs}
\LastEdit{2013-03-15}
\Question[Q-driver]{What is a \acro{DVI} driver?}
A \acro{DVI} driver is a program that takes as input a
\Qref*{\acro{DVI} file}{Q-dvi}
and (usually) produces a file in a format that something \emph{other}
than a \TeX{}-related program can process.
A driver may be designed for producing output for printing (e.g.,
\PS{}), for later processing (e.g., \PS{} for inclusion in a later
document), or for document exchange (e.g., \acro{PDF}).
As well as the \acro{DVI} file, the driver typically also needs font
information. Font information may be held as bitmaps or as outlines,
or simply as a set of pointers into the fonts that a printer itself
provides. Each driver will expect the font information in a particular
form.
For more information on the forms of font information, see
\Qref[questions]{\acro{PK} files}{Q-pk},
% ! line break
\Qref[]{\acro{TFM} files}{Q-tfm},
\Qref[]{virtual fonts}{Q-virtualfonts}
and \Qref[]{Using \PS{} fonts with \TeX{}}{Q-usepsfont}.
\LastEdit{2011-10-10}
\Question[Q-pk]{What are \acro{PK} files?}
\acro{PK} files (packed raster) are the canonical form of \TeX{} font
bitmaps. The output from \Qref*{\MF{}}{Q-useMF} includes a generic
font (\acro{GF}) file and the utility \ProgName{gftopk} produces a
\acro{PK} file from that.
There are potentially a lot of \acro{PK} files, as one
is needed for each font: that is for each magnification of each
design (point) size for each weight for each font in each family.
Further, since the \acro{PK} files for one printer do not necessarily
work well for another, the whole set needs to be duplicated for each
printer type at a site.
While this menagerie of bitmaps can (in principle) provide fonts that
are closely matched to the capabilities of each printer, the size of
the collection (and the resulting difficulty of maintaining it) has
been a potent driver to the move towards outline fonts such as
\Qref*{Adobe Type 1 fonts}{Q-adobetypen}.
\LastEdit{2012-10-20}
\Question[Q-tfm]{What are \acro{TFM} files?}
\acro{TFM} is an acronym for `\TeX{} Font Metrics'; \acro{TFM} files hold
information about the sizes of the characters of the font in question,
and about ligatures and kerns within that font. One \acro{TFM} file is
needed for each font used by \TeX{}, that is for each design (point)
size for each weight for each family; each \acro{TFM} file serves for all
magnifications of `its' font, so that there are (typically) fewer
\acro{TFM} files than there are \Qref*{\acro{PK}}{Q-pk} files. \TeX{},
\LaTeX{}, etc.,\@
themselves need only know about the sizes of characters and their
interactions with each other, but not what characters look like. By
contrast, \acro{TFM} files are not, in principle, needed by the
\acro{DVI} driver, which only needs to know about the glyphs that each
character selects, so as to print or display them.
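
As a minimal \plaintex{} sketch of one \acro{TFM} file serving several
magnifications (the control sequence names \csx{normalfnt} and
\csx{bigfnt} are made up for illustration), both of the following
declarations read the same \File{cmr10.tfm}, although a driver using
bitmap fonts would need a separate \acro{PK} file for each:
\begin{quote}
\begin{verbatim}
% design size: reads cmr10.tfm
\font\normalfnt=cmr10
% twice design size: still reads cmr10.tfm
\font\bigfnt=cmr10 scaled 2000
\normalfnt Small text and \bigfnt big text.
\bye
\end{verbatim}
\end{quote}
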
Note that TrueType and OpenType fonts contain the necessary metrics,
so that \Qref{\xetex{}}{Q-xetex} and \Qref{\LuaTeX{}}{Q-luatex}, using
such fonts, have no need of \acro{TFM} files. A corollary of this is
that setting up fonts for use by these engines is far \emph{easier}.
\LastEdit{2012-10-20}
\Question[Q-virtualfonts]{What are virtual fonts?}
Virtual fonts provide a means of collecting bits and pieces together
to make the glyphs of a font: the bits and pieces may be glyphs from
``other'' fonts, rules and other ``basic'' typesetting commands, and
the positioning information that specifies how everything comes
together.
An early instance of something like virtual fonts for \TeX{} was
implemented by David Fuchs to use an unusual printer. However, for
practical purposes for the rest of us, virtual fonts date from when Knuth
specified a format and wrote some support software, in 1989 (he
published an % ! line break
\href{http://tug.org/TUGboat/tb11-1/tb27knut.pdf}{article in \textsl{TUGboat}}
at the time; a plain text copy is available on \acro{CTAN}).
Virtual fonts provide a way of telling \TeX{} about something more
complicated than just a one-to-one character mapping. \TeX{} reads a
\acro{TFM} file of the font, just as before, but the \acro{DVI}
processor will read the \acro{VF} and use its content to specify how
each glyph is to be processed.
The virtual font may contain commands:
\begin{itemize}
\item to `open' one or more (real) fonts for subsequent use,
\item to remap a glyph from one of the (real) fonts for use in the
virtual font,
\item to build up a more complicated effect (using \acro{DVI} commands).
\end{itemize}
% !this has to be generated as a new paragraph by the translator, so
% leave the blank line in place

In practice, the most common use of virtual fonts is to remap
Adobe Type 1 fonts (see \Qref[question]{font metrics}{Q-metrics}),
though there has also been useful work building `fake' maths
fonts (by bundling glyphs from several fonts into a single virtual
font). Virtual Computer Modern fonts, making a % ! line break
\Qref*{Cork encoded}{Q-ECfonts} font from Knuth's originals by using
remapping and fragments of \acro{DVI} for single-glyph `accented
characters', were the first ``Type~1 format'' Cork-encoded Computer
Modern fonts available.
Virtual fonts are normally created in a single \acro{ASCII} \acro{VPL}
(Virtual Property List) file, which holds two sets of information: the
font's metrics and the instructions for building each glyph.
The \ProgName{vptovf} utility will use the \acro{VPL} file to create
the binary \acro{TFM} and \acro{VF} files.
A ``how-to'' document, explaining how to generate a \acro{VPL},
describes the endless hours of fun that may be had, doing the job by
hand. Despite the pleasures to be had, the commonest way (nowadays)
of generating a \acro{VPL} file is to use the
\ProgName{fontinst} package, which is described in more detail
\htmlonly{together with the discussion of}
\Qref[in answer]{\PS{} font metrics}{Q-metrics}.
\Package{Qdtexvpl} is another utility for creating ad-hoc virtual
fonts (it uses \TeX{} to parse a description of the virtual font, and
\ProgName{qdtexvpl} itself processes the resulting \acro{DVI} file).
\begin{ctanrefs}
\item[fontinst]\CTANref{fontinst}
\item[\nothtml{\rmfamily}Knuth on virtual fonts]\CTANref{vf-knuth}
\item[\nothtml{\rmfamily}Virtual fonts ``how to'']\CTANref{vf-howto}
\item[qdtexvpl]\CTANref{qdtexvpl}
\end{ctanrefs}
\LastEdit{2012-10-20}
\Question[Q-whatmacros]{What are (\TeX{}) macros?}
\TeX{} is a \emph{macro processor}: this is a computer-science-y term
meaning ``text expander'' (more or less); \TeX{} typesets text as it
goes along, but \emph{expands} each macro it finds. \TeX{}'s macros
may include instructions to \TeX{} itself, on top of the simple text
generation one might expect.
Macros are a \emph{good thing}, since they allow the user to
manipulate documents according to context. For example, the macro
\csx{TeX} is usually defined to produce ``TEX'' with the `E' lowered
(the original idea was Knuth's),
but in these \acro{FAQ}s the default definition of the macro is
overridden, and it simply expands to the letters ``TeX''. (\emph{You}
may not think this a good thing, but the author of the macros has his
reasons~-- see \Qref[question]{\TeX{}-related logos}{Q-logos}.)
Macro names are conventionally built from a \texttt{\textbackslash }
followed by a sequence of letters, which may be upper or lower case
(as in \csx{TeX}, mentioned above). They may also be % ! line break
\texttt{\textbackslash \meta{any single character}}, which allows all
sorts of oddities (many built in to most \TeX{} macro sets, all the
way up from the apparently simple `\csx{ }' meaning ``insert a space
here'').
Macro programming can be a complicated business, but at their very
simplest macros need little introduction~--- you'll hardly need to be
told that:
\begin{quote}
\begin{verbatim}
\def\foo{bar}
\end{verbatim}
\end{quote}
replaces each instance of \csx{foo} with the text ``bar''. The
command \csx{def} is \plaintex{} syntax for defining commands;
\LaTeX{} offers a macro \csx{newcommand} that goes some way towards
protecting users from themselves, but basically does the same thing:
\begin{quote}
\begin{verbatim}
\newcommand{\foo}{bar}
\end{verbatim}
\end{quote}
Macros may have ``arguments'', which are used to substitute for marked
bits of the macro expansion:
\begin{quote}
\begin{verbatim}
\def\foo#1{This is a #1 bar}
...
\foo{2/4}.
\end{verbatim}
\end{quote}
which produces:
\begin{quote}
This is a 2/4 bar.
\end{quote}
or, in \LaTeX{} speak:
\begin{quote}
\begin{verbatim}
\newcommand{\foo}[1]{This is a #1 bar}
...
\foo{3/4}.
\end{verbatim}
\end{quote}
which produces:
\begin{quote}
This is a 3/4 bar.
\end{quote}
(\LaTeX{} users waltz through life, perhaps?)
You will have noticed that the arguments, above, were enclosed in
braces (\texttt{\obracesymbol{}\dots{}\cbracesymbol{}}); this is the
normal way of typing arguments, though \TeX{} is enormously flexible,
and you may find all sorts of other ways of passing arguments (if you
stick with it).
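
As a small illustration of one such ``other way'' (a sketch only; the
macro name \csx{pair} is invented for the purpose), \plaintex{} syntax
lets a macro's arguments be delimited by arbitrary text instead of
braces:
\begin{quote}
\begin{verbatim}
\def\pair(#1,#2){first: #1, second: #2}
...
\pair(a,b)
\end{verbatim}
\end{quote}
which produces:
\begin{quote}
first: a, second: b
\end{quote}
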
Macro writing can get very complicated, very quickly. If you are a
beginner \AllTeX{} programmer, you are well advised to read something
along the lines of the \Qref*{\TeX{}book}{Q-tex-books}; once you're under
way, \Qref*{\TeX{} by Topic}{Q-ol-books} is possibly a more satisfactory
choice. Rather a lot of the answers in these \acro{FAQ}s tell you
about various issues of how to write macros.
\LastEdit{2011-10-12}
\Question[Q-specials]{\csx{special} commands}
\TeX{} provides the means to express things that device drivers can
do, but about which \TeX{} itself knows nothing. For example, \TeX{}
itself knows nothing about how to include \PS{} figures into
documents, or how to set the colour of printed text; but some device
drivers do.
Instructions for such things are introduced to your document by means
of \csx{special} commands; all that \TeX{} does with these commands is
to expand their
arguments and then pass the command to the \acro{DVI} file. In most
cases, there are macro packages provided (often with the driver) that
provide a human-friendly interface to the \csx{special}; for example,
there's little point including a figure if you leave no gap for it in
your text, and changing colour proves to be a particularly fraught
operation that requires real wizardry. \LaTeXe{}
has standard graphics and colour packages that make figure inclusion,
rotation and scaling, and colour typesetting relatively
straightforward, despite the rather daunting \csx{special} commands
involved. (\CONTeXT{} provides similar support, though not by way of
packages.)
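
A minimal sketch of that package-level interface, assuming a
\LaTeXe{} installation with the standard \Package{graphicx} and
\Package{color} packages (the text being manipulated is arbitrary).
No \csx{special} appears in the document itself, since the packages
generate whatever the selected driver needs:
\begin{quote}
\begin{verbatim}
\documentclass{article}
\usepackage{graphicx}
\usepackage{color}
\begin{document}
\textcolor{red}{Red text},
\rotatebox{90}{rotated text} and
\scalebox{2}{scaled text}.
\end{document}
\end{verbatim}
\end{quote}
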
The allowable arguments of \csx{special} depend on the device driver
you're using. Apart from the examples above, there are \csx{special}
commands in the em\TeX{} drivers (e.g., \ProgName{dvihplj}, \ProgName{dviscr},
\emph{etc}.)~that will draw lines at arbitrary orientations, and
commands in \ProgName{dvitoln03} that permit the page to be set in
landscape orientation.
Note that \csx{special} behaves rather differently in \PDFTeX{}, since
there is no device driver around. There \emph{is} a concept of
\acro{PDF} specials, but in most cases \csx{special} will provoke a
warning when used in \PDFTeX{}.
\LastEdit{2011-10-15}
\Question[Q-write]{Writing (text) files from \TeX{}}
\TeX{} allows you to write to output files from within your document.
The facility is handy in many circumstances, but it is vital for
several of the things \LaTeX{} (and indeed almost any higher-level
\TeX{}-based macro package) does for you.
The basic uses of writing to an external file are ``obvious''~---
remembering titles of sections for a table of contents, remembering
label names and corresponding section or figure numbers, all for a
later run of your document. However, the ``non-obvious'' thing is
easy to forget: that page numbers, in \TeX{}, are slippery beasts, and
have to be captured with some care. The trick is that \csx{write}
operations are only executed as the page is sent to the \acro{DVI}
or \acro{PDF} file. Thus, if you arrange that your page-number macro
(\csx{thepage}, in \LaTeX{}) is not expanded until the page is
written, then the number written is correct, since that time is where
\TeX{} guarantees the page number tallies with the page being sent
out.
Now, there are times when you want to write something straight away:
for example, to interact with the user. \TeX{} captures that
requirement, too, with the primitive command \csx{immediate}:
\begin{quote}
\begin{verbatim}
\immediate\write\terminal{I'm waiting...}
\end{verbatim}
\end{quote}
writes a ``computer-irritates-user'' message, to the terminal.
Which brings us to the reason for that \csx{terminal}. \TeX{} can
``\csx{write}'' up to 16 streams simultaneously, and that argument to
\csx{write} says which is to be used. Macro packages provide the
means of allocating streams for your use: \plaintex{} provides a macro
\csx{newwrite} (used as ``\csx{newwrite}\csx{streamname}'', which sets
\csx{streamname} as the stream number). In fact, \csx{terminal} (or
its equivalent) is the first output stream ever set up (in most macro
packages): it is never attached to a file, and if \TeX{} is asked to
write to \emph{any} stream that isn't attached to a file it will send
the output to the terminal (and the log).
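
The following \LaTeX{} sketch puts those pieces together; the stream
name \csx{notes} and the file extension \extension{nts} are
assumptions made for the example, not anything \LaTeX{} defines:
\begin{quote}
\begin{verbatim}
\documentclass{article}
% allocate a stream and attach a file to it
\newwrite\notes
\immediate\openout\notes=\jobname.nts
\begin{document}
Some text.
% deferred: \thepage is expanded at shipout
\write\notes{deferred: page \thepage}%
% immediate: \thepage is expanded right now
\immediate\write\notes{immediate: page \thepage}%
\end{document}
\end{verbatim}
\end{quote}
The immediate line should reach the file first, since the deferred
\csx{write} waits until its page is shipped out. The file is not
closed explicitly here: an \csx{immediate}\csx{closeout} before the
end of the document would close the stream before the deferred
\csx{write} had been executed.
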
\LastEdit{2011-10-15}
\Question[Q-spawnprog]{Spawning programs from \AllTeX{}: \csx{write18}}
The \TeX{} \Qref*{\csx{write} primitive instruction}{Q-write} is used
to write to different file `streams'; \TeX{} refers to each open file by
a number, not by a file name (although most of the time we hide this).
Originally, \TeX{} would write to a file connected to a stream
numbered 0--15. More recently, a special ``stream 18'' has been
implemented: it is not writing to a file, but rather tells \TeX{} to ask
the operating system to do something. To run a command, we put it as
the argument to \csx{write18}. So to run the \progname{epstopdf}
utility on a file with name stored as \csx{epsfilename}, we would
write:
\begin{quote}
\begin{verbatim}
\write18{epstopdf \epsfilename}
\end{verbatim}
\end{quote}
When using something like the \Package{epstopdf} package, the `stream'
write operation is hidden away and you don't need to worry about the
exact way it's done.
However, there is a security issue. If you download some \alltex{} code from
the Internet, can you be sure that there is not some command in it
(perhaps in a hidden way) to do stuff that might be harmful to your
computer (let's say: delete everything on the hard disk!)? In the
face of this problem, both \miktex{} and \TeX{}~Live have, for some
time, disabled \csx{write18} by default. To turn the facility on,
both distributions support an additional argument when starting \TeX{}
from the command shell:
\begin{quote}
\begin{verbatim}
(pdf)(la)tex --shell-escape <file>
\end{verbatim}
\end{quote}
The problem with this is that many people use \alltex{} via a graphical
editor, so to use \csx{write18} for a file the editor's settings must
be changed. Of course, the settings need restoring after the file is
processed: otherwise, you defeat the point of the original protection.
The latest \miktex{} (version 2.9), and recent \TeX{}~Live (from the
2010 release) get
around this by having a special ``limited'' version of \csx{write18}
enabled `out of the box'. The idea is to allow only a pre-set list of
commands (for example, \BibTeX{}, \progname{epstopdf}, \TeX{} itself,
and so on). Those on the list are regarded as safe enough to allow,
whereas anything else (for example deleting files) still needs to be
authorised by the user. This seems to be a good balance: most people
most of the time will not need to worry about \csx{write18} at all,
but it will be available for things like \Package{epstopdf}.
Note that the \TeX{} system may tell you that the mechanism is in use:
\begin{wideversion}
\begin{quote}
\begin{verbatim}
This is pdfTeX, Version 3.1415926-1.40.11 (TeX Live 2010)
restricted \write18 enabled.
\end{verbatim}
\end{quote}
\end{wideversion}
\begin{narrowversion}
\begin{quote}
\begin{verbatim}
This is pdfTeX, Version 3.1415926-1.40.11
(TeX Live 2010)
restricted \write18 enabled.
\end{verbatim}
\end{quote}
\end{narrowversion}
when it starts.
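
If you need to know, from inside a document, which of these modes is
in force, \PDFTeX{} provides a read-only integer
\csx{pdfshellescape} (0 for disabled, 1 for fully enabled, 2 for
restricted). The following is a sketch for \LaTeX{} running on
\PDFTeX{} only; other engines may need a different test:
\begin{quote}
\begin{verbatim}
\ifcase\pdfshellescape
  \typeout{write18 disabled}%
\or
  \typeout{write18 enabled}%
\or
  \typeout{write18 restricted}%
\fi
\end{verbatim}
\end{quote}
Note also that, as with any other \csx{write}, a \csx{write18} is not
executed until its page is shipped out, unless it is prefixed with
\csx{immediate}.
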
\begin{ctanrefs}
\item[epstopdf.sty]Distributed with Heiko Oberdiek's packages
\CTANref{oberdiek}[epstopdf-pkg]
\end{ctanrefs}
\LastEdit{2012-12-03}
\Question[Q-hyphen]{How does hyphenation work in \TeX{}?}
Everyone knows what hyphenation is: we see it in most books we read,
and (if we're alert) will spot occasional ridiculous mis-hyphenation
(at one time, British newspapers were a fertile source).
Hyphenation styles are culturally-determined, and the same language
may be hyphenated differently in different countries~--- for example,
British and American styles of hyphenation of English are very
different. As a result, a typesetting system that is not restricted
to a single language at a single locale needs to be able to change its
hyphenation rules from time to time.
\TeX{} uses a pretty good system for hyphenation (originally designed
by Frank Liang~--- you may view his % ! line break
\href{http://tug.org/docs/liang/}{Ph.D.\ thesis} online) and while
it's capable of missing ``sensible'' hyphenation points, it seldom
selects grossly wrong ones. The
algorithm matches candidates for hyphenation against a set of
``hyphenation patterns''. The candidates for hyphenation must be
sequences of letters (or other single characters that \TeX{} may be
persuaded to think of as letters). Non-letters interrupt hyphenation;
this applies to \TeX{}'s \csx{accent} primitive (as in `syst\`eme')
just as much as the exclamation in `syst!eme'.
(Hyphenation takes place on the characters ``sent to the printer''.
The problem with \csx{accent} is avoided~---in \LaTeX{}~--- by the use
of the \Package{fontenc} package, as discussed in % ! line break
``\Qref*{Accented words aren't hyphenated}{Q-hyphenaccents}''.)
Sets of hyphenation patterns are usually derived from analysis of
a list of valid hyphenations (the process of derivation, using a tool
called \Package{patgen}, is not ordinarily a sport to be played by
ordinary mortals).
The patterns for the languages a \TeX{} system is going to deal with
may only be loaded when the system is installed. To change the set of
hyphenation patterns recognised by a \TeX{}-based or \xetex{} system,
a \Qref*{partial reinstallation}{Q-newlang} is necessary (note that
\Qref*{\LuaTeX{}}{Q-luatex} relaxes this constraint).
\TeX{} provides two ``user-level'' commands for control of
hyphenation: \csx{language} (which selects a hyphenation style), and
\csx{hyphenation} (which gives explicit instructions to the hyphenation
engine, overriding the effect of the patterns).
The ordinary \LaTeX{} user need not worry about \csx{language}, since
it is very thoroughly managed by the \Package{babel} package; use of
\csx{hyphenation} is discussed in
\begin{wideversion}
the context of
\end{wideversion}
% beware line wrap
\Qref[question]{hyphenation failure}{Q-nohyph}.
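
By way of illustration (the words chosen are arbitrary examples),
\csx{hyphenation} takes a space-separated list of words, with the
permitted break points marked by hyphens; a word listed with no
hyphen at all will not be hyphenated automatically. The exceptions
apply to whichever language is current when the command is given:
\begin{quote}
\begin{verbatim}
% allow breaks only at the marked points
\hyphenation{data-base man-u-script}
% never break this word
\hyphenation{FAQ}
\end{verbatim}
\end{quote}
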
\LastEdit{2012-12-03}
\Question[Q-clsvpkg]{What are \LaTeX{} classes and packages?}
\LaTeX{} aims to be a general-purpose document processor. Such an aim
could be achieved by a selection of instructions which would enable
users to use \TeX{} primitives, but such a procedure is considered too
inflexible (and probably too daunting for ordinary users). Thus the
designers of \LaTeX{} created a model which offered an
\emph{abstraction} of the design of documents. Obviously, not all
documents can look the same (even with the defocussed eye of
abstraction), so the model uses \emph{classes} of document. Base
\LaTeX{} offers five classes of document: \Class{book},
\Class{report}, \Class{article}, \Class{letter} and \Class{slides}.
For each class, \LaTeX{} provides a \emph{class file}; the user
arranges to use it via a \csx{documentclass} command at the top of the
document. So a document starting
\begin{quote}
\cmdinvoke{documentclass}{article}
\end{quote}
may be called ``an \emph{article} document''.
This is a good scheme, but it has a glaring flaw: the actual
typographical designs provided by the \LaTeX{} class files aren't
widely liked. The way around this is to \emph{refine} the class. To
refine a class, a programmer may write a new class file that loads an
existing class, and then does its own thing with the document design.
If the user finds such a refined class, all is well, but if not, the
common way is to load a \emph{package} (or several).
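
A refined class can be very short. The following sketch of a
hypothetical \File{myarticle.cls} (the class name, date and layout
changes are all invented for the example) passes any options on to
\Class{article}, loads it, and then adjusts the paragraph layout:
\begin{quote}
\begin{verbatim}
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{myarticle}[2014/01/28 example]
% hand unknown options on to article
\DeclareOption*{%
  \PassOptionsToClass{\CurrentOption}{article}}
\ProcessOptions\relax
\LoadClass{article}
% the refinement: unindented, spaced paragraphs
\setlength{\parindent}{0pt}
\setlength{\parskip}{\medskipamount}
\end{verbatim}
\end{quote}
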
The \LaTeX{} distribution, itself, provides rather few package files,
but there are lots of them, by a wide variety of authors, to be found
on the archives. Several packages are designed just to adjust the
design of a document~--- using such packages achieves what the
programmer might have achieved by refining the class.
Other packages provide new facilities: for example, the
\Package{graphics} package (actually provided as part of any \LaTeX{}
distribution) allows the user to load externally-provided graphics
into a document, and the \Package{hyperref} package enables the user
to construct hyper-references within a document.
On disc, class and package files only appear different by virtue of
their name ``extension''~--- class files are called \File{*.cls} while
package files are called \File{*.sty}. Thus we find that the \LaTeX{}
standard \Class{article} class is represented on disc by a file called
\File{article.cls}, while the \Package{hyperref} package is
represented on disc by a file called \File{hyperref.sty}.
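
In a document, the two kinds of file are loaded by different
commands; a minimal sketch, using only files mentioned above:
\begin{quote}
\begin{verbatim}
\documentclass{article}  % article.cls
\usepackage{graphics}    % graphics.sty
\usepackage{hyperref}    % hyperref.sty
\begin{document}
Hello, world.
\end{document}
\end{verbatim}
\end{quote}
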
The class vs.~package distinction was not clear in \LaTeXo{}~---
everything was called a style (``document style'' or ``document style
option''). It doesn't really matter that the nomenclature has
changed: the important requirement is to understand what other people
are talking about.
\LastEdit{2013-10-21}
\Question[Q-whatenv]{What are \LaTeX{} ``environments''?}
While \TeX{} makes direct provision for commands, \LaTeX{} adds a
concept of ``environment''; environments perform an action on a block
(of something or other) rather than just doing something at one
place in your document.
A totally trivial environment could change the font in use for a chunk
of text, as
\begin{quote}
\begin{verbatim}
\newenvironment{monoblock}%
{\ttfamily}%
{}
\end{verbatim}
\end{quote}
which defines a \environment{monoblock} which may be used as
\begin{quote}
\begin{verbatim}
\begin{monoblock}
some text set in monospace
\end{monoblock}
\end{verbatim}
\end{quote}
which will look like:
\begin{quote}
\texttt{some text set in monospace}
\end{quote}
so it is a particularly simple example. A rather complicated
environment is introduced by \cmdinvoke{begin}{document}; it looks
simple, but needs all sorts of special \TeX{} code to make it work
`transparently'; most environments are more elaborate than
\environment{monoblock} and \emph{much} simpler than
\environment{document}.
An environment puts its content inside a \TeX{} \emph{group}, so that
commands used inside the environment don't `leak out'~--- the
\environment{monoblock} environment, above, restricts its effect to
its own contents (the stuff between the \cmdinvoke{begin}{monoblock}
and \cmdinvoke{end}{monoblock}), which is just what you need for this
sort of thing.
So that's ``simple'' environments; the \environment{monoblock}, above,
doesn't actually gain us much over
\begin{quote}
\begin{verbatim}
{\ttfamily some text set in monospace}
\end{verbatim}
\end{quote}
though in fact many useful environments are just as simple (to look
at). Some, such as \environment{verbatim}, look simple but are
actually very tricky inside.
\LaTeX{} also allows arguments to an environment:
\begin{quote}
\begin{verbatim}
\newenvironment{fontblock}[1]%
{#1\selectfont}%
{}
\end{verbatim}
\end{quote}
and use of \environment{fontblock} as:
\begin{quote}
\begin{verbatim}
\begin{fontblock}{\ttfamily}
\end{verbatim}
\end{quote}
would produce the same effect as the \environment{monoblock}
environment.
Environments may also have optional arguments, in much the same way as
commands:
\begin{quote}
\begin{verbatim}
\newenvironment{normaltext}[1][\itshape]%
{#1}%
{}
\end{verbatim}
\end{quote}
which will ordinarily set its body in italic, but
\begin{quote}
\begin{verbatim}
\begin{normaltext}[\ttfamily]
...
\end{normaltext}
\end{verbatim}
\end{quote}
will observe its optional argument, and behave the same as the
\environment{monoblock} we started with.
Note that an environment's argument(s) (mandatory or optional) are
\emph{not} passed to the `\csx{end}' text of the environment~--- that
is specified as a macro with no arguments, so that
\begin{quote}
\begin{verbatim}
\newenvironment{normaltext}[1][\itshape]%
{#1}%
{\typeout{what was #1, again?}}
\end{verbatim}
\end{quote}
produces an error message
\begin{quote}
\begin{verbatim}
! Illegal parameter number in definition of \endnormaltext.
\end{verbatim}
\end{quote}
So, if you need to pass an environment argument to the end-code, you
have to wrap it in a macro of its own:
\begin{quote}
\begin{verbatim}
\newenvironment{normaltext}[1][Intro]%
{#1%
\newcommand{\foo}{#1}}%
{\typeout{what was \foo{}, again?}}
\end{verbatim}
\end{quote}
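
A complete (if contrived) document shows the technique in use; the
environment name \environment{titled} and the macro
\csx{savedtitle} are invented for the example, and \csx{def} is used
for the saved copy so that nested uses of the environment don't
complain that the macro is already defined:
\begin{quote}
\begin{verbatim}
\documentclass{article}
\newenvironment{titled}[1][Intro]%
  {\textbf{#1}\par
   % save the argument for the end-code
   \def\savedtitle{#1}}%
  {\typeout{Finished: \savedtitle}}
\begin{document}
\begin{titled}[Preface]
  Some text.
\end{titled}
\end{document}
\end{verbatim}
\end{quote}
Since the saved macro is defined inside the environment's group, it
is still available to the end-code but disappears afterwards.
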
\LastEdit*{2013-02-20}
\Question[Q-dtx]{Documented \LaTeX{} sources (\extension{dtx} files)}
\LaTeXe{}, and many contributed \LaTeX{} macro packages, are written
in a \Qref*{literate programming style}{Q-lit}, with source and
documentation in the
same file. This format in fact originated before the
days of the \LaTeX{} project as one of the ``Mainz'' series of
packages. A documented source file conventionally has the suffix
\extension{dtx}, and will normally be `stripped' before use with
\LaTeX{}; an installation (\extension{ins}) file is normally provided,
to automate this process of removing comments for speed of loading.
If the \extension{ins} file is available, you may process \emph{it}
with \LaTeX{} to produce the package (and, often, auxiliary files).
Output should look something like:
\begin{quote}
\begin{verbatim}
Generating file(s) ./foo.sty
Processing file foo.dtx (package) -> foo.sty
File foo.dtx ended by \endinput.
Lines processed: 2336
Comments removed: 1336
Comments passed: 2
Codelines passed: 972
\end{verbatim}
\end{quote}
The lines ``\texttt{Processing \dots{}\ ended by \csx{endinput}}'' may
be repeated if the \extension{dtx} file provides more than one
`unpacked' file.
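An installation file capable of producing output like that shown above
can be very short. The following is a minimal sketch, assuming (as in
the sample output) a source file \texttt{foo.dtx} whose package code is
marked with the guard \texttt{package}; real \extension{ins} files
usually add a preamble carrying licence details:
\begin{quote}
\begin{verbatim}
\input docstrip.tex    % load the docstrip program
\askforoverwritefalse  % don't ask before replacing an existing foo.sty
\generate{\file{foo.sty}{\from{foo.dtx}{package}}}
\endbatchfile
\end{verbatim}
\end{quote}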
To read the comments ``as a document'', you can run \LaTeX{} on the
\extension{dtx} file to produce a nicely formatted version of the
documented code. (Most \LaTeX{} packages on \ctan{}, nowadays, already
have a \acro{PDF} of the result of processing the \extension{dtx} file,
as ``documentation''.)
Several packages may be included in one \extension{dtx} file, with
conditional sections, and there are facilities for indexes of macros,
etc. All of this m\'elange is sorted out by directives in the
\extension{ins} file; conventional indexing utilities may be necessary
for ``full'' output.
Anyone may write \extension{dtx} files; the format is explained in
\Qref*{The \LaTeX{} Companion}{Q-latex-books}, and a tutorial is available
from \acro{CTAN} (which comes with skeleton \extension{dtx} and
\extension{ins} files).
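To give a flavour of the format, here is a much-reduced sketch of a
\extension{dtx} file; the file name \texttt{foo.dtx}, the macro
\csx{foo} and the guard names are all hypothetical, and the skeletons
that come with the tutorial are a better starting point for real work:
\begin{quote}
\begin{verbatim}
% \iffalse meta-comment
%<*driver>
\documentclass{ltxdoc}
\begin{document}
  \DocInput{foo.dtx}
\end{document}
%</driver>
% \fi
%
% \section{Implementation}
%
% \begin{macro}{\foo}
% A one-line description of the macro.
%    \begin{macrocode}
%<*package>
\newcommand{\foo}{bar}
%</package>
%    \end{macrocode}
% \end{macro}
%
% \Finale
\endinput
\end{verbatim}
\end{quote}
Running \LaTeX{} on such a file typesets the commentary (the `driver'
part reads the file a second time with the comment characters
ignored), while stripping it with the \texttt{package} guard keeps
only the \csx{newcommand} line.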
Composition of \extension{dtx} files is supported in \ProgName{emacs} by
\Qref*{\acro{AUC}-\TeX{}}{Q-editors}.
The (unix-based) script \ProgName{dtxgen} generates a proforma basic
\extension{dtx} file, which could be useful when starting a new
project.
Another route to a \extension{dtx} file is to write the
documentation and the code separately, and then to combine them using
the \ProgName{makedtx} system. This technique has particular value in
that the documentation file can be used separately to generate
\acro{HTML} output; it is often quite difficult to make % ! line break
\Qref*{\LaTeX{} to \acro{HTML} conversion}{Q-LaTeX2HTML} tools deal
with \extension{dtx} files, since they use an unusual class file.
The \ProgName{sty2dtx} system goes one step further: it attempts to
create a \extension{dtx} file from a `normal' \extension{sty} file
with comments. It works well, in some circumstances, but can become
confused by comments that aspire to ``structure'' (e.g., tabular
material, as in many older packages' file headers).
The \extension{dtx} files are not used by \LaTeX{} after they have been
processed to produce \extension{sty} or \extension{cls} (or whatever)
files. They need not be kept with the working system; however, for
many packages the \extension{dtx} file is the primary source of
documentation, so you may want to keep \extension{dtx} files elsewhere.
An interesting sideline to the story of \extension{dtx} files is the
\Package{docmfp} package, which extends the model of the \Package{doc}
package to
\begin{flatversion}
\MF{} and \MP{} (\Qref[see questions]{}{Q-MF} and \Qref[\nothtml]{}{Q-MP})
\end{flatversion}
\begin{hyperversion}
\Qref{\MF{}}{Q-MF} and \Qref{\MP{}}{Q-MP},
\end{hyperversion}
thus permitting documented distribution of bundles containing code for
\MF{} and \MP{} together with related \LaTeX{} code.
\begin{ctanrefs}
\item[AUC-TeX]\CTANref{auctex}
\item[clsguide.pdf]\CTANref{clsguide}
\item[docmfp.sty]\CTANref{docmfp}
\item[docstrip.tex]Part of the \LaTeX{} distribution
\item[DTX tutorial]\CTANref{dtxtut}
\item[dtxgen]\CTANref{dtxgen}
\item[makedtx]\CTANref{makedtx}
\item[sty2dtx]\CTANref{sty2dtx}
\end{ctanrefs}
\LastEdit{2014-06-03}
\Question[Q-whatenc]{What are encodings?}
Let's start by defining two concepts, the \emph{character} and the
\emph{glyph}.
The character is the abstract idea of the `atom' of a
language or other dialogue: so it might be a letter in an alphabetic
language, a syllable in a syllabic language, or an ideogram in an
ideographic language. The glyph is the mark created on screen or
paper which represents a character. Of
course, if reading is to be possible, there must be some agreed
relationship between the glyph and the character, so while the precise
shape of the glyph can be affected by many other factors, such as the
capabilities of the writing medium and the designer's style, the
essence of the underlying character must be retained.
Whenever a computer has to represent characters, someone has to define
the relationship between a set of numbers and the characters they
represent. This is the essence of an encoding: it is a mapping
between a set of numbers and a set of things to be represented.
\TeX{} of course deals in encoded characters all the time: the
characters presented to it in its input are encoded, and it emits
encoded characters in its \acro{DVI} or \acro{PDF} output. These
encodings have rather different properties.
The \TeX{} input stream was pretty unruly back in the days when Knuth
first implemented the language. Knuth himself prepared documents on
terminals that produced all sorts of odd characters, and as a result
\TeX{} contains some provision for translating its input (however
encoded) to something regular. Nowadays,
the operating system translates keystrokes into a code appropriate for
the user's language: the encoding used is usually a national or
international standard, though some operating systems use ``code
pages'' (as defined by Microsoft). These standards and code pages often
contain characters that may not appear in the \TeX{} system's input
stream. Somehow, these characters have to be dealt with~--- so
an input character like ``\'e'' needs to be interpreted by \TeX{} in
a way that at least mimics the way it interprets ``\csx{'}\texttt{e}''.
The \TeX{} output stream is in a somewhat different situation:
characters in it are to be used to select glyphs from the fonts to be
used. Thus the encoding of the output stream is notionally a font
encoding (though the font in question may be a
% beware line break (twice)
\nothtml{virtual one~--- see }%
\Qref[question]{virtual font}{Q-virtualfonts}). In principle, a
fair bit of what appears in the output stream could be direct
transcription of what arrived in the input, but the output stream
also contains the product of commands in the input, and translations
of the input such as ligatures like %
\texttt{fi}\nothtml{\ensuremath\Rightarrow``fi''}.
Font encodings became a hot topic when the
\Qref*{Cork encoding}{Q-ECfonts}
appeared, because of the possibility of suppressing
\csx{accent} commands in the output stream (and hence improving the
quality of the hyphenation of text in inflected languages, which is
interrupted by the \csx{accent} commands~--- see
% beware line break
\Qref[question]{``how does hyphenation work''}{Q-hyphen}).
To take advantage of the diacriticised characters represented in the
fonts, it is necessary to arrange that whenever the
command sequence ``\csx{'}\texttt{e}'' has been input
(explicitly, or implicitly via the sort of mapping of input mentioned
above), the character that codes the position of the ``\'e'' glyph is
used.
Thus we could have the odd arrangement that the diacriticised character in
the \TeX{} input stream is translated into \TeX{} commands that would
generate something looking like the input character; this sequence of
\TeX{} commands is then translated back again into a single
diacriticised glyph as the output is created. This is in fact
precisely what the \LaTeX{} packages \Package{inputenc} and
\Package{fontenc} do, if operated in tandem on (most) characters in
the \acro{ISO}~Latin-1 input encoding and the \acro{T}1 font encoding.
At first sight, it seems eccentric to have the first package do a thing, and
the second precisely undo it, but it doesn't always happen that way:
most font encodings can't match the corresponding input encoding
nearly so well, and the two packages provide the sort of symmetry the
\LaTeX{} system needs.
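The pairing can be seen in a document preamble. This is a sketch only;
it assumes a source file saved in \acro{ISO}~Latin-1 and an
installation that provides \acro{T}1-encoded fonts:
\begin{quote}
\begin{verbatim}
\documentclass{article}
\usepackage[latin1]{inputenc} % byte 0xE9 in the file is read as \'e
\usepackage[T1]{fontenc}      % \'e selects a single glyph in a T1 font
\begin{document}
caf\'e % typed explicitly here; the Latin-1 byte 0xE9 is treated the same
\end{document}
\end{verbatim}
\end{quote}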
\Question[Q-ECfonts]{What are the \acro{EC} fonts?}
A font provides a number of \emph{glyphs}. In order that the glyphs
may be printed, they are \Qref*{\emph{encoded}}{Q-whatenc}, and the
encoding is used as an index into tables within the font. For various
reasons, Knuth chose deeply eccentric encodings for his Computer
Modern family of fonts; in particular, he chose different encodings
for different fonts, so that the application using the fonts has to
remember which font of the family it's using before selecting a
particular glyph.
When \TeX{} version 3 arrived, most of the drivers for the
eccentricity of Knuth's encodings went away, and at \acro{TUG}'s Cork
meeting, an encoding for a set of 256 glyphs, for use in \TeX{} text,
was defined. The intention was that these glyphs should cover `most'
European languages that use Latin alphabets, in the sense of including
all accented letters needed. (Knuth's \acro{CMR} fonts missed things
necessary for Icelandic and Polish, for example, which the Cork fonts
do have, though even Cork encoding's coverage isn't complete.)
\LaTeX{} refers to the Cork encoding as \acro{T}1, and provides the
means to use fonts thus encoded to avoid problems with the interaction
of accents and hyphenation % ! line break
(see \Qref[question]{hyphenation of accented words}{Q-hyphenaccents}).
The first \MF{}-fonts to conform to the Cork encoding were the \acro{EC}
fonts. They look \acro{CM}-like, though their metrics differ from \acro{CM}-font
metrics in several areas. They have long been regarded as `stable' (in
the same sense that the \acro{CM} fonts are stable: their metrics are
unlikely ever to change). Each \acro{EC} font is, of course, roughly twice the
size of the corresponding \acro{CM} font, and there are far more of them than
there are \acro{CM} fonts. The sheer number of fonts proved problematic in
the production of Type~1 versions of the fonts, but \acro{EC} or
\acro{EC}-equivalent fonts are now available in Type~1 or TrueType form (the latter only from
\begin{wideversion}
\Qref{commercial suppliers}{Q-commercial}).
\end{wideversion}
\begin{narrowversion}
% ( <- paren matching
commercial suppliers~--- \Qref{question}{Q-commercial}).
\end{narrowversion}
Free \Qref*{auto-traced versions}{Q-textrace} (the \acro{CM}-super and
the \acro{LGC} fonts) and the Latin Modern series (rather directly generated
from \MF{} sources) are available.
Note that the Cork encoding doesn't cover mathematics (so that no
``T1-encoded'' font family can support it). If you're using
Computer-Modern-alike fonts, this doesn't actually matter: your system
will have the original Computer Modern mathematical fonts (or
those distributed with the Latin Modern set), which cover `basic' \TeX{}
mathematics; more advanced mathematics are likely to need separate
fonts anyway. Suitable mathematics fonts for use with other font
families are discussed in % ! line break
``\Qref*{choice of scalable fonts}{Q-psfchoice}''.
The \acro{EC} fonts are distributed with a set of `Text Companion' (\acro{TC}) fonts
that provide glyphs for symbols commonly used in text. The \acro{TC} fonts
are encoded according to the \LaTeX{} \acro{TS}1 encoding, and are not
necessarily as `stable' as the \acro{EC} fonts are. Note that modern
distributions tend not to distribute the \acro{EC} fonts in outline format, but
rather to provide Latin Modern for \acro{T}1-encoded Computer Modern-style
fonts. This can sometimes cause confusion when users are recompiling
old documents.
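In a \LaTeX{} document, these choices typically show up as a few
preamble lines. A sketch, assuming that the Latin Modern fonts are
installed:
\begin{quote}
\begin{verbatim}
\usepackage[T1]{fontenc}  % select the Cork (T1) encoding
\usepackage{lmodern}      % Latin Modern in place of the EC fonts
\usepackage{textcomp}     % TS1-encoded text symbols, e.g. \texteuro
\end{verbatim}
\end{quote}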
The Cork encoding is also implemented by virtual fonts provided in the
\acro{PSNFSS} system, for Adobe Type 1 fonts, and also by most other such
fonts that have been developed (or otherwise made available) for use
with \alltex{}.
Note that \acro{T}1 (and other eight-bit font encodings) are superseded in
the developing \TeX{}-family members \Qref*{\xetex{}}{Q-xetex} and
\Qref*{\LuaTeX{}}{Q-luatex}, which use Unicode as their base encoding,
and use Unicode-encoded fonts (typically in \FontFormat{ttf} or
\FontFormat{otf} formats). The \Package{cm-unicode} fonts carry the
flag in this arena, along with the Latin Modern set.
\begin{ctanrefs}
\item[CM-super fonts]\CTANref{cm-super}
\item[CM-LGC fonts]\CTANref{cm-lgc}
\item[CM unicode fonts]\CTANref{cm-unicode}
\item[EC and TC fonts]\CTANref{ec}
\item[Latin Modern fonts]\CTANref{lm}
\end{ctanrefs}
\Question[Q-unicode]{Unicode and \TeX{}}
Unicode is a character code scheme that has the capacity to express
the text of the languages of the world, as well as important symbols
(including mathematics). Any coding scheme that is directly
applicable to \TeX{} may be expressed in single bytes (expressing up
to 256 characters); Unicode characters may require several bytes, and
the scheme may express a very large number of characters.
For ``old-style'' applications (\TeX{} or \pdftex{}) to deal with
Unicode input, the sequence of bytes that makes up a Unicode character is
processed by a set of macros that delivers a glyph number in an
appropriate font. The set of macros that reads these bytes is complicated,
and manifests as the \pkgoption{utf8} option of the \LaTeX{} distribution's
\Package{inputenc} package; the coverage of that option is limited to
Unicode characters that can be represented using ``\LaTeX{} standard
encodings''. The separate package \Package{ucs} provides wider, but
less robust, coverage via an \Package{inputenc} option
\pkgoption{utf8x}. As a general rule, you should never use
\pkgoption{utf8x} until you have convinced yourself that
\pkgoption{utf8} can not do the job for you.
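As a sketch of the \pkgoption{utf8} route (the mapping chosen here
for U+2260 is an illustrative assumption, not something the package
prescribes):
\begin{quote}
\begin{verbatim}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% teach inputenc what to do with one further character
\DeclareUnicodeCharacter{2260}{\ensuremath{\neq}}
\end{verbatim}
\end{quote}
Declarations of this sort are often enough to cover the odd extra
character without resorting to \pkgoption{utf8x}.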
`Modern' \TeX{}-alike applications, \Qref*{\xetex{}}{Q-xetex} and
\Qref*{\LuaTeX{}}{Q-luatex}, read their input using \acro{UTF}-8
representations of Unicode as standard. They also use TrueType or
OpenType fonts for output; each such font has tables that tell the
application which part(s) of the Unicode space it covers; the tables
enable the engines to decide which font to use for which character
(assuming there is any choice at all).
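A corresponding sketch for the Unicode engines, assuming that the
\Package{fontspec} package (not otherwise discussed in this answer) is
installed, and that the document is compiled with \ProgName{xelatex}
or \ProgName{lualatex}:
\begin{quote}
\begin{verbatim}
\documentclass{article}
\usepackage{fontspec} % neither inputenc nor fontenc is needed
\begin{document}
Text may be typed directly in UTF-8.
\end{document}
\end{verbatim}
\end{quote}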
\begin{ctanrefs}
\item[inputenc.sty]Part of the \CTANref{latex} distribution
\item[ucs.sty]\CTANref{ucs}
\end{ctanrefs}
\LastEdit{2012-04-20}
\Question[Q-tds]{What is the \acro{TDS}?}
\acro{TDS} is an acronym for ``\TeX{} Directory Structure''; it
specifies a standard way of organising all the \TeX{}-related files on
a computer system.
Most modern distributions arrange their \TeX{} files in conformance
with the \acro{TDS}, using both a `distribution' directory tree and a
(set of) `local' directory trees, each containing \TeX{}-related
files. The \acro{TDS} recommends the name \texttt{texmf} for
the root directory (folder) of a hierarchy; in practice there are
typically several such trees, each of which has a name that compounds
that (e.g., \texttt{texmf-dist}, \texttt{texmf-var}).
Files supplied as part of the distribution are put into the
distribution's tree, but the location of the distribution's hierarchy is
system dependent. (On a Unix system it might be at
\path{/usr/share/texmf} or \path{/opt/texmf}, or a similar location.)
There may be more than one `local' hierarchy in which additional files
can be stored. An installation will also typically offer a local
hierarchy, while each user may have an individual local hierarchy.
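An abridged sketch of one such tree may help; the directory names are
those that the \acro{TDS} defines, while \texttt{foo} stands for a
hypothetical package:
\begin{quote}
\begin{verbatim}
texmf/
  tex/latex/foo/                    macro files (foo.sty, ...)
  doc/latex/foo/                    documentation
  source/latex/foo/                 documented sources (foo.dtx, foo.ins)
  fonts/tfm/<supplier>/<typeface>/  font metrics
  bibtex/bst/foo/                   BibTeX styles
\end{verbatim}
\end{quote}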
The \acro{TDS} itself is published as the output of a \acro{TUG} % ! line break
\Qref*{Technical Working Group}{Q-TUG*}. You may browse an
\href{http://tug.org/tds/}{on-line version} of the standard, and
copies in several other formats (including source) are available on
\acro{CTAN}.
\begin{ctanrefs}
\item[\nothtml{\rmfamily}\acro{TDS} specification]\CTANref{tds}
\end{ctanrefs}
\Question[Q-eps]{What is ``Encapsulated \PS{}'' (``\acro{EPS}'')?}
\PS{} has been for many years a \emph{lingua franca} of powerful
printers (though modern high-quality printers now tend to require some
constrained form of \acro{PDF}, instead); since \PS{} is also a
powerful graphical programming language, it is commonly used as an
output medium for drawing (and other) packages.
However, since \PS{} \emph{is} such a powerful language, some
rules need to be imposed, so that the output drawing may be included
in a document as a figure without ``leaking'' (and thereby destroying
the surrounding document, or failing to draw at all).
Appendix \acro{H} of the \PS{} Language Reference Manual (second
and subsequent editions) specifies a set of rules for \PS{} to
be used as figures in this way. The important features are:
\begin{itemize}
\item certain ``structured comments'' are required; important ones are
the identification of the file type, and information about the
``bounding box'' of the figure (i.e., the minimum rectangle
enclosing it);
\item some commands are forbidden~--- for example, a \texttt{showpage}
command will cause the image to disappear, in most \TeX{}-output
environments; and
\item ``preview information'' is permitted, for the benefit of things
such as word processors that don't have the ability to draw