/*ident "@(#)cls4:tools/demangler/README.dem 1.5" */
###############################################################################
#
# C++ source for the C++ Language System, Release 3.0.  This product
# is a new release of the original cfront developed in the computer
# science research center of AT&T Bell Laboratories.
#
# Copyright (c) 1993 UNIX System Laboratories, Inc.
# Copyright (c) 1991, 1992 AT&T and UNIX System Laboratories, Inc.
# Copyright (c) 1984, 1989, 1990 AT&T.  All Rights Reserved.
#
# THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE of AT&T and UNIX System
# Laboratories, Inc.  The copyright notice above does not evidence
# any actual or intended publication of such source code.
#
###############################################################################

Demangler Release Notes for 3.0.1
January 1992

INTRODUCTION

This is the first release of the new C++ external name demangler.  It
was rewritten for a couple of reasons, notably the desire to separate
the unmangling and printing functions and the need to fully support
template names and nested class names.

In what follows it is assumed that the reader is familiar with page 122
and following of the Annotated C++ Reference Manual, where the name
mangling scheme is given.

INSTALLATION AND CUSTOMIZATION

The demangler has two files, dem.h and dem.c.  dem.h is #included in
user programs.  dem.c can be compiled either for a standalone program:

    $ cc -DDEM_MAIN dem.c -o dem

or to produce an object to be linked with an application:

    $ cc -c dem.c
    ...
    $ cc appl.o dem.o

The standard main() driver program in dem.c will read from standard
input or from a set of command-line files, and unmangle names in place,
while passing all whitespace and invalid names through unchanged.

There are a couple of customization options.  CLIP_UNDER should be set
(#defined) at the top of dem.c if external names on your system have a
leading underscore added to them.  This is common, for example on the
Sun SPARCstation.
An #if has been added for the most common machine types.

dem.c uses its own space management system, both for speed and to
avoid having to call malloc().  SP_ALIGN is the alignment of blocks
that are allocated; 4 bytes is the default.  If your system requires
alignment on other than a power-of-2 boundary, you will have to adjust
the function gs() to use the mod operator rather than the mask
technique that is used by default.

There is a constant MAXDBUF defined in dem.h.  This is the maximum size
of buffer required for an unmangled name's data structure.  The 4096
size is the worst-case setting for a maximum input mangled name length
of 256.  If you routinely have names longer than this, you should
adjust MAXDBUF upwards accordingly.

Finally, there is a compile-time value EXPLAIN that can be set.  If
turned on with -DEXPLAIN, the demangler will append a brief description
of the type of each input name to the demangled output.

USE OF THE STANDALONE DEMANGLER

The standalone program reads from standard input or from one or more
files and processes each line in turn.  All whitespace and
non-alphanumeric characters are passed through unchanged.  All other
characters are separated into names and given to the demangler.  If
the demangler fails, the name is printed unchanged.  Otherwise, the
unmangled name is printed.

The standalone demangler has an exit status of 0 unless one or more
input files could not be opened for reading; in that case the exit
status is the number of unopenable files.

USE OF THE DEMANGLER LIBRARY

The basic idea is to call the dem() function:

    int ret;
    char inbuf[1024];
    DEM d;
    char sbuf[MAXDBUF];

    ret = dem(inbuf, &d, sbuf);

inbuf is the input name, d the data structure that dem() fills in, and
sbuf the internal buffer that the demangler writes into to allocate its
data structure.  dem() returns -1 on error, otherwise 0.  dem.h has
comments describing each field in the data structures.
The data structures are somewhat complicated by the need to handle
nested types and function arguments which themselves are function
pointers with their own arguments.

To format this data structure, there are several functions:

    dem_print         - format a complete name
    dem_printcl       - format a class name
    dem_printarg      - format a function argument
    dem_printarglist  - format a function argument list
    dem_printfunc     - format a function name
    dem_explain       - return a string explaining a type
    demangle          - demangle in (char*) to out (char*)

AN EXAMPLE APPLICATION

This particular application reads from standard input and displays the
class name for each mangled name read, or "(none)" on errors and C
functions/data.

    #include <stdio.h>
    #include <string.h>
    #include "dem.h"

    int main(void)
    {
        char sbuf[MAXDBUF];
        DEM d;
        int ret;
        char buf[1024];
        char buf2[1024];

        /* fgets() is used instead of gets(), which cannot bound its input */
        while (fgets(buf, sizeof buf, stdin) != NULL) {
            buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
            ret = dem(buf, &d, sbuf);
            if (ret || d.cl == NULL) {
                printf("%s --> (none)\n", buf);
            }
            else {
                dem_printcl(d.cl, buf2);
                printf("%s --> %s\n", buf, buf2);
            }
        }
        return 0;
    }

TYPENAMES

The demangler handles mangled class typenames, whether they are simple,
nested, or template classes.  For example:

    A__pt__2_i  -->  A<int>
    __Q2_1A1B   -->  A::B

LOCAL VARIABLES

The demangler also handles local variables of the form:

    __nnnxxx

For example:

    __2x  -->  x

USE OF THE OLD C++FILT PROGRAM

Many of the options previously accepted by c++filt are no longer
supported.  If you need to use them, you can build the old c++filt
using the source in the subdirectory "osrc".  Note that we are planning
to eliminate them in a future release, so please send mail to your C++
support person explaining what option you needed and why.

KNOWN BUGS AND AMBIGUITIES

1.  "signed" and "volatile" encodings are not handled.

2.  The encoding for nested classes as mentioned on page 123 of the ARM
    is handled slightly differently in cfront; there is a "_" after the
    digit after the "Q".

3.  A nested class starting with "Q" sometimes has the length encoded
    before it; the demangler handles either case.

4.  The "Tnn" and "Nnnn" notations mentioned on page 124 are not fully
    supported.  It is assumed that the number of the designated
    argument is less than or equal to 9.  So if you have 11 or more
    arguments, and you want to repeat argument 10 or greater, the
    demangler will reject the encoded name.

5.  All literal arguments to templates are assumed to be const.  For
    example, the non-const literal value "37" is encoded as "Ci".

6.  Some compilers will add a gratuitous "_" before external names.

7.  The grammar allows class names up to 999 characters.  This is
    considered important for handling templates.

THE GRAMMAR FOR EXTERNAL NAMES

start     --> name

######################### COMPLETE NAMES #########################

name      --> sti | std | ptbl | func | data | vtbl | cname3 | local
sti       --> "__sti" "__" id
std       --> "__std" "__" id
ptbl      --> "__ptbl_vec" "__" id
func      --> "__op" arg funcpost | id funcpost
funcpost  --> "__" funcpost2 | "__" cname funcpost2
funcpost2 --> csv "F" arglist
csv       --> "" | "C" | "S" | "V"
data      --> id | id "__" cname
vtbl      --> "__vtbl" "__" cname
local     --> "__" num regid

######################### CLASS NAMES #########################

cname     --> cname2 | nest
nest      --> "Q" digit "_" cnamelist
cnamelist --> cname2 | cnamelist cname2
cname2    --> cnlen cnid
cname3    --> cnid | "__" nest
cnlen     --> digit | digit digit | digit digit digit
cnid      --> id | id "__pt__" cnlen "_" arglist

######################### ARGUMENT LISTS #########################

arglist   --> arg | arglist arg
arg       --> modlist arg2 | "X" modlist arg2 lit
modlist   --> mod | modlist mod
mod       --> "" | "U" | "C" | "V" | "S" | "P" | "R" | arr | mptr
arr       --> "A" num "_"
mptr      --> "M" cname
arg2      --> fund | cname | funcp | repeat1 | repeat2
fund      --> "v" | "c" | "s" | "i" | "l" | "f" | "d" | "r" | "e"
funcp     --> "F" arglist "_" arg
repeat1   --> "T" digit | "T" digit digit
repeat2   --> "N" digit digit | "N" digit digit digit
lit       --> litnum | zero | litmptr | cnlen id | sptr
litnum    --> "L" digit lnum | "L" digit digit "_" lnum
litmptr   --> "LM" num "_" litnum "_" cnlen id
lnum      --> num | "n" num
sptr      --> cnlen id "__" cname
zero      --> 0

######################### LOW LEVEL STUFF #########################

digit     --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
id        --> special | regid
special   --> "__ct" | "__pp"     # etc.
regid     --> letter | letter restid
restid    --> letter | digit | restid letter | restid digit
letter    --> "A"-"Z" | "a"-"z" | "_"
num       --> digit | num digit
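
A WORKED SKETCH OF THE NESTED CLASS NAME RULES

To make the "nest", "cnamelist", "cname2" and "cnlen" rules above more
concrete, here is a minimal sketch that expands a nested class name such
as __Q2_1A1B into A::B.  It is not taken from dem.c: the function name
sketch_nest and the SKETCH_MAIN macro are hypothetical, chosen only for
this illustration.  The sketch assumes plain (non-template) class names
and does not build the DEM data structure.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of dem.c.  Expands
 *     nest --> "Q" digit "_" cnamelist
 * (optionally preceded by "__", as in cname3) into "A::B::...".
 * Each cnamelist element is a 1 to 3 digit length followed by that
 * many characters.  Template names ("__pt__") are not expanded.
 * Returns 0 on success, -1 on malformed input or if out is too small;
 * the contents of out are unspecified on failure.
 */
int sketch_nest(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    int count, i;

    assert(in != NULL && out != NULL);

    if (in[0] == '_' && in[1] == '_')
        in += 2;                        /* cname3 --> "__" nest */
    if (in[0] != 'Q' || !isdigit((unsigned char)in[1]) || in[2] != '_')
        return -1;
    count = in[1] - '0';                /* number of nesting levels */
    if (count < 1)
        return -1;
    in += 3;

    for (i = 0; i < count; i++) {
        size_t len = 0;
        int ndig = 0;

        /* cnlen --> digit | digit digit | digit digit digit */
        while (ndig < 3 && isdigit((unsigned char)*in)) {
            len = len * 10 + (size_t)(*in - '0');
            in++;
            ndig++;
        }
        if (ndig == 0 || len == 0 || strlen(in) < len)
            return -1;
        /* need room for "::" (after the first name), the name, and '\0' */
        if (used + (i ? 2 : 0) + len + 1 > outsz)
            return -1;
        if (i) {
            out[used++] = ':';
            out[used++] = ':';
        }
        memcpy(out + used, in, len);
        used += len;
        in += len;
    }
    if (*in != '\0')
        return -1;                      /* trailing characters */
    out[used] = '\0';
    return 0;
}

#ifdef SKETCH_MAIN
#include <stdio.h>

int main(void)
{
    char buf[64];

    if (sketch_nest("__Q2_1A1B", buf, sizeof buf) == 0)
        printf("__Q2_1A1B --> %s\n", buf);  /* prints "__Q2_1A1B --> A::B" */
    return 0;
}
#endif
```

Compiling with -DSKETCH_MAIN builds a small driver, in the same spirit
as -DDEM_MAIN for dem.c.  Reading the length digits greedily is safe
here only because the grammar requires an id to start with a letter or
underscore, so a digit can never be the first character of a name.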