The Jim Roskind C and C++ Grammars
2018-02-15 - By Robert Elder
Introduction
The purpose of this article is to discuss the works of computer programmer Jim Roskind. Specifically, his work related to the C/C++ programming languages will be emphasized. I became motivated to do this short write-up when I realized that a lot of his work on C/C++ programming was done in the 1990-1996 era, just after digital communication formats became popular, but just before 'blog' type documentation format became popular. As a result, a number of references to his works are becoming dead links, or are simply lacking any context somewhere in a decaying ftp site that nobody knows about.
This article is unlikely to be interesting to anyone with a casual interest in C programming, but it is instead intended for individuals looking to develop an extremely intimate understanding of the C/C++ programming languages and their histories. My interest in reviewing Roskind's work has been to look for compiler test cases.
Briefly, I'll discuss a bit about the man himself: As far as I can tell in my Googling, and assuming that I haven't confused multiple people with the same name, he was an independent consultant around 1990[1], was a co-founder of Infoseek Corporation[2][3] in 1994, joined Netscape in 1995[2], appeared in the popular documentary Project Code Rush around 1998, joined Google in 2008 and designed QUIC in 2012. He has a number of other technical accomplishments that I haven't listed.
The Jim Roskind Grammar
In terms of contributions to C programming, here are a couple files from around 1990 that are particularly worth mentioning:
c++grammar1.1.tar.Z
c++grammar2.0.tar.Z
Because you're reading this in the future and various links to these files may no longer be working, below are various hashes of these files. If you are lucky, searching for them might hopefully lead you to a mirror somewhere (at the moment they don't). I also checked for different sources of these files, and although I can find lots of references, I could only find a single mirror where I could download them:
md5 5c5e10f21f7f77dba73f2fff792eb5d4 c++grammar1.1.tar.Z
md5 dcb3b207920ae02674676b0dae63d78b c++grammar2.0.tar.Z
sha256 d6776adfab4def7f3f4c2411b5870edbd018eafe0a7badb2c5be48374ff15893 c++grammar1.1.tar.Z
sha256 9cb8bf51f9b54b998e2058bec9892f170985e303a24c6d774ed186ec49ee065b c++grammar2.0.tar.Z
sha512 1a80d3931ba154844d27efa9b8e88520a578a2938d0c5a4816d45c3753c4eef4d027dcd6df7fae48e3d9b24b05388b875c9dd6d03f5751827a7017baee447662 c++grammar1.1.tar.Z
sha512 be7618f638f120cb5ec21dbdf0d044391c6b936bea82aa02b0600b31342e2229a02da78ea5bf94a383c1d1cb128d72cd2d5d38db63e0424ee19d5177d151b27d c++grammar2.0.tar.Z
In fact, after checking the license of these files, I think it should be fine to mirror them myself too:
The c++grammar1.1.tar.Z document appears to simply be an earlier version of the c++grammar2.0.tar.Z version. Inside the c++grammar2.0.tar.Z version, you will find the following files (according to freegrm5.txt):
- FREEGRM5.TXT - This introductory file
- GRAMMAR5.TXT - Parsing ambiguities in C++, and in my grammar
- CPP5.Y - My YACC compatible C++ grammar
- C5.Y - My YACC compatible, ANSI C conformant grammar
- CPP5.L - Flex input file defining a C++ lexical analyzer
- SKELGRPH.C - A hacked file from the Berkeley YACC distribution
- AUTODOC5.TXT - Documentationfor my machine generated analysis
- Y.OUTPUT - Machine generated analysis of the C++ grammar.
The word 'my' as used in the above list should be interpreted to refer to Jim Roskind.
The file 'autodocs.txt' contains the following table of contents:
- 1 INTRODUCTION TO the YACC CROSS REFERENCE SYSTEM
- 2 DETAILED DISCUSSION OF TABLES
- 2.1 Reference Grammar
- 2.2 Alphabetized Grammar
- 2.3 Sample Expansions for the Non-terminal Tokens
- 2.4 Summary Descriptions of the Terminal Tokens
- 2.5 Symbol and Grammar Cross Reference for the Tokens
- 2.6 Sample Stack Context and Accessing Sentences for each State
- 2.7 Concise list of Conflicts
- 2.8 Canonical Description of Conflicts
- 2.9 Verbose listing of state transitions
- 2.10 Explanations for all reductions suggested in conflicts
- 3 CONFLICT ANALYSIS Via the YACC CROSS REFERENCE SYSTEM
- 3.1 LR(1) Parsing
- 3.1.1 Ambiguous grammars
- 3.1.2 LR(1) ambiguous grammars
- 3.1.2.1 Removing LR(1) conflicts from unambiguous grammars
- 3.1.3 LALR-only ambiguous grammars, and LALR-only conflict components
- 3.1.3.1 LR parsers vs parsers generated by LALR parser generators
- 3.1.3.2 Sample LALR-only ambiguous grammar
- 3.1.3.2.1 Analysis of the sample LALR-only ambiguous grammar
- 3.1.3.2.2 Sample Removal of LALR-only conflict via state splitting
- 3.1.3.2.3 Sample Removal of LALR-only conflict via grammar augmentation
- 3.2 Interpreting Conflict Analysis Documentation
- 3.2.1 Conflict Reduction Demonstrations
- 3.2.2 Summarizing Reduction Contexts
- 3.2.3 Automatic LALR-only Conflict Summaries
- 3.2.3.1 Direct Pruning of LR conflicts From the Context Tree
- 3.2.3.2 Pruning of LR conflicts From the Context Tree using State Information
- 3.2.3.3 LALR-only conflicts With No Consequences
- 3.2.3.4 Significant LALR-only conflicts
- 3.2.4 Augmentation Proposals to Remove LALR-only Conflicts
One might question what the purpose is of digging up ancient grammars for YACC now that gcc and clang have moved on from this paradigm, but for me this information is still useful. The detailed analysis provided by Roskind is a good source for developing compiler test cases and learning how the language evolved.
In fact, there were a few discussions made on comp.compilers in the early 90s by Roskind that, at first glance, might seem like any other un-noteworthy piece of commentary in technical discussion forum, but they are in fact widely cited because they are the only time anyone has ever discussed the issue. One that I found very useful was Jim Roskind on C ambiguity. The post was useful because of several very special corner cases involving typedefs that it illustrates. In case that link goes dead, here is another reference to the general email thread on Google Groups comp.compilers. And to back up my claim about citations, that post by Roskind is cited in "A Simple, Possibly Correct LR Parser for C11", Jacques-Henri Jourdan, François Pottier, among a number of other places.
If you end up browsing the Wikipedia article on The Lexer Hack, you might notice that the first reference in the article is a source of Jim Roskind. Unfortunately, as of today, the reference is a dead link! But if you pay attention, you'll note that the link references a file called 'grammar5.txt' with a date of 1991-07-11, and this is exactly the date inside the 'grammar5.txt' that you can find in the tar file mentioned above!
I'm not sure, but I believe the actual use of the term 'lexer hack' might actually be a reference specifically to Jim Roskind's lexer hack. Inside of grammar5.txt, there is a note: "I will refer to this feedback loop (from the parser that stores information in a symbol table, wherein the lexer extracts the information) as the "lex hack"." That sounds to me like he came up with the term first!
In addition, if it piques your interest, here is the table of contents listed in grammar5.txt:
- INTRODUCTION
- REVIEW: STANDARD LEXICAL ANALYSIS HACK: TYPEDEFname vs IDENTIFIER
- STATUS OF MY "DISAMBIGUATED" GRAMMAR
- SUMMARY OF CONFLICTS
- 17 EASY CONFLICTS, WITH HAPPY ENDINGS
- 1 SR CONFLICT WITH AN ALMOST HAPPY ENDING
- 6 NOVEL CONFLICT THAT YIELD TO SEMANTIC DISAMBIGUATION
- 1 CONFLICT THAT CONSTRAINTS SUPPORT THE RESOLUTION FOR
- THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY (17 CONFLICTS)
- THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY
- LALR-only CONFLICTS IN THE GRAMMAR
- SAMPLE RESOLUTIONS OF AMBIGUITIES BY MY C++ GRAMMAR
- DIFFICULT AMBIGUITIES FOR C++ 2.0 PARSER TO TRY
- COMMENTARY ON CURRENT C++ DISAMBIGUATING RULES
- SOURCE OF CONFLICTS IN C++ (MIXING TYPES AND EXPRESSIONS)
- FUNCTION LIKE CAST vs DECLARATION AMBIGUITIES
- FUNCTION LIKE CAST vs DECLARATION : THE TOUGH EXAMPLE:
- CONCLUSION
- APPENDIX A:
- PROPOSED GRAMMAR MODIFICATIONS (fixing '*', and '&' conflicts)
- APPENDIX B:
- CANONICAL DESCRIPTION OF CONFLICTS and STATES
Conclusion
If you've read up until this point, you should be sold on the value of this information, either for historical purposes, or for the investigation into grammar ambiguities into the C or C++ languages. There are a number of other useful posts made by Jim Roskind on the comp.compilers newsgroup back in the 90s, but I won't go to the effort of listing them here. You'll just have to hope that those pages don't go dead before you try to search for the content on them.
References
[1] https://pdos.csail.mit.edu/archive/l/c/roskind.html "Tues, Jan 14 1992 4:23 pm... Jim Roskind Independent Consultant"
[2] https://www2.cs.arizona.edu/colloquia/03-04/Roskind.html "Jim was a co-founder of Infoseek Corporation, and later Chief Scientist."
[3] https://en.wikipedia.org/wiki/Infoseek "January 10, 1994;"
How to Get Fired Using Switch Statements & Statement Expressions
Published 2016-10-27 |
$40.00 CAD |
7 Scandalous Weird Old Things About The C Preprocessor
Published 2015-09-20 |
Building A C Compiler Type System - Part 1: The Formidable Declarator
Published 2016-07-07 |
Modelling C Structs And Typedefs At Parse Time
Published 2017-03-30 |
Building A C Compiler Type System - Part 2: A Canonical Type Representation
Published 2016-07-21 |
The Magical World of Structs, Typedefs, and Scoping
Published 2016-05-09 |
An Artisan Guide to Building Broken C Parsers
Published 2017-03-30 |
Join My Mailing List Privacy Policy |
Why Bother Subscribing?
|