The Magical World of Structs, Typedefs, and Scoping
2016-05-09 - By Robert Elder
Introduction
Updated December 20, 2016: Added a couple new examples.
Updated January 1, 2017: Added another example.
Updated January 24, 2017: Added another example showing how identifiers can change meaning inside a declarator list.
Updated March 16, 2017: Added cases showing necessity of parse-time relationship of typedef redefinition management and the tag namespace hierarchy.
I decided to write this article to force myself to understand all of the complexities related to struct and union declarations, incomplete references, scoping rules, and how they interact with the 'typedef' specifier. If you read about scoping issues with structs, you'll realize that things can get pretty complicated, and to really understand them it is necessary to enumerate a list of compiler test cases. Reading through these test cases is a great alternative to spending a night out at the club, and I hope you have as much fun reading them as I had writing them.
This article considers only ISO C89, and all testing was done with gcc and clang. gcc version: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, Ubuntu clang version: 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4).
What Happens In Each Case?
Test Case | Notes |
---|---|
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
error: storage size of ‘a’ isn’t known |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
main.c:7:8: error: redefinition of ‘struct foo’ struct foo{ ^ main.c:1:8: note: originally defined here struct foo{ |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
In function ‘main’: main.c:13:11: error: ‘struct foo’ has no member named ‘i’ (void)b.f->i; |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
main.c:14:18: error: ‘struct foo’ has no member named ‘i’
(void)b.f->i; This case demonstrates that the side-effects of 'struct foo;' and 'struct foo ty;' are different with respect to what 'struct foo' references later in this scope. |
|
10:16: error: redefinition of ‘struct foo’ struct foo{ |
|
main.c:2:15: error: ‘foo’ defined as wrong kind of tag |
|
main.c:11:13: error: dereferencing pointer to incomplete type (void)f1.b->i; ^ main.c:18:13: error: dereferencing pointer to incomplete type (void)f2.b->j; ^ |
|
main.c: In function ‘main’: main.c:11:14: error: storage size of ‘f1’ isn’t known struct foo f1; ^ main.c:11:14: warning: unused variable ‘f1’ [-Wunused-variable] main.c:19:14: error: storage size of ‘f2’ isn’t known struct foo f2; ^ main.c:19:14: warning: unused variable ‘f2’ [-Wunused-variable] |
|
main.c:4:14: error: storage size of ‘f1’ isn’t known struct foo f1; |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
main.c:8:6: error: storage size of ‘f’ isn’t known |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c This case is interesting because, when compared with the previous one, it reinforces the idea that 'struct abc;' is treated in a very special (and difficult to detect) way. |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c This is an important test case, as it illustrates that typedefs are not fully resolved as if they were a string replacement, as soon as they are encountered. |
|
main.c:8:9: warning: useless type name in empty declaration [enabled by default]; still compiles. Important because it demonstrates that a typedefed type cannot be used to declare an incomplete structure in the same way that the specifier it resolves to can be. |
|
expected identifier or ‘(’ before ‘{’ token Demonstrates that a typedefed type cannot be used to complete a structure type, even though it would resolve to be the same specifier for that type. |
|
main.c:2:16: warning: ‘struct boo’ declared inside parameter list [enabled by default]
int foo(struct boo {int a;});
^
main.c:2:16: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] This demonstrates that you can declare or implement a function that cannot be called since no type will be compatible with it due to the fact that the scope of the structure definition is only inside the function parameter list. |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c Demonstrates that structures can be defined as a type specifier to virtually any declarator. |
|
Produces warning in gcc: main.c:6:22: warning: empty declaration with type qualifier does not redeclare tag [enabled by default]
const struct foo; But an error with clang: main.c:7:20: error: variable has incomplete type 'struct foo' struct foo f; |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c |
|
main.c:3:2: error: expected specifier-qualifier-list before ‘typedef’ typedef struct boo {int i;}; |
|
main.c:8:6: error: storage size of ‘f’ isn’t known str f; |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c An interesting case that demonstrates that you can have a structure that contains 2 members of type 'struct foo' where each implementation of 'struct foo' is different. |
|
Compiles with warning in gcc: main.c:7:27: warning: declaration does not declare anything [enabled by default]
struct foo;
^
Error in clang: main.c:8:28: error: field has incomplete type 'struct foo' struct foo g; |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c Clang warns about padding in structure. This case demonstrates that the incomplete structure 'foo' is registered in the 'main' scope just after the '{' character of the struct definition, and before the interpretation of the member 'f' of the same struct. |
|
main.c:4:9: error: ‘foo’ defined as wrong kind of tag
struct foo * f; Demonstrates that a reference to the tag 'foo' will attempt to refer to the incompatible union type from the outer scope instead of declaring an incomplete type in the inner scope which would be completed later. |
|
Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c Demonstrates that the incomplete type reference is not a problem for function pointers inside the struct. |
|
Compiles in gcc with warnings, issues error in clang: error: reference to 'foo' is ambiguous |
|
Compiles without errors in gcc and clang using: gcc -std=c89 -pedantic -Wall main.c && ./a.out && clang -std=c89 -pedantic -Weverything main.c && ./a.out Produces output: 1 4 4 1 4 4 Important because it demonstrates that the meaning (and size of) an identifier can change inside of a declarator list (and even inside of an init_declarator) |
|
Compiles in gcc and clang with warnings: main.c:8:15: warning: redefinition of typedef ‘type1’ [-Wpedantic] typedef type2 type1; Interesting because it demonstrates that non-trivial typedef re-definitions will compile just fine even when they require some form of type introspection. |
|
Does not compile with error: error: typedef redefinition with different types ('type1' (aka 'struct foo') vs 'struct foo') Very interesting because it demonstrates that non-trivial typedef re-definitions (which must be managed at parse time due to the ambiguity of typedef identifiers) need to have some awareness of the tree of tag namespaces that exist from struct, union or enum types. |
Conclusion
The point of all this was to try and deduce what rules should be used by the parser for struct and union specifiers. One observation is that
struct foo;
is very much a special case when compared with any of
struct foo f;
struct foo {int i;};
typedef struct foo ty;
The key difference comes from the fact that immediately after parsing 'struct foo', the 'struct foo' can almost always be taken to refer to whatever the closest declaration of 'struct foo' is, except when the 'struct foo' is followed by a semicolon. This is somewhat problematic for the parser, because the grammar rules allow for the possibility of a declarator (or possibly more specifiers) between the struct or union specifier and the semicolon. Furthermore, if 'struct foo' is not followed by a declarator, but is preceded by a type qualifier or storage class specifier, then the declaration is sometimes treated differently from the special case observed with an unqualified 'struct foo;'. This is seen in the 'const struct foo;' test case above, where gcc emits a warning and clang emits an error.
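As a small illustration of the "closest declaration" behaviour for the non-special forms, here is a minimal C89 sketch. It is not one of the original test cases, and the function name `typedef_keeps_outer_type` is made up for this example. The typedef binds to the file-scope 'struct foo' at the point where it is parsed, and keeps that meaning even after an inner scope defines a different 'struct foo'.

```c
struct foo { int i; };          /* file-scope 'struct foo' */
typedef struct foo ty;          /* 'ty' now names the file-scope type */

/* Hypothetical helper, written only for this illustration. */
int typedef_keeps_outer_type(void)
{
    struct foo { int j; };      /* a new, different 'struct foo' in this block */
    ty a;                       /* still the file-scope type: has member 'i' */
    struct foo b;               /* the block-scope type: has member 'j' */

    a.i = 1;                    /* would not compile if 'ty' meant the inner type */
    b.j = 2;                    /* would not compile if 'b' had the outer type */
    return a.i + b.j;
}
```

If 'ty' were expanded like a textual macro at its point of use, `a.i` would be rejected, because the inner 'struct foo' has no member 'i'.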
In addition, using
struct foo;
is colloquially understood to 'declare' the tag 'foo' as a structure, and using
struct foo{
int i;
};
is colloquially understood to 'declare' and 'define' the tag 'foo' as a structure.
However, the effect of 'declaring' is different in these two approaches: in the first, 'struct foo;' only declares the tag 'foo' as a new, incomplete type in the current scope, while the second declares it in the current scope and also completes it, so the definition can be used there and in any nested scope. The first form is described in the C89 standard section 3.5.2.3 Tags: "struct-or-union identifier ; specifies a structure or union type and declares a tag, both visible only within the scope in which the declaration occurs. It specifies a new type distinct from any type with the same tag in an enclosing scope (if any)."
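The passage quoted from the standard can be seen in a short C89 sketch. This is an illustration written for this section rather than one of the test cases from the table, and the names `reads_outer_member`, `reads_inner_member` and 'struct bar' are assumptions made for the example. Without the 'struct foo;' line in the second function, the pointer member would refer to the file-scope type and the access to 'j' would be rejected.

```c
struct foo { int i; };               /* file-scope 'struct foo' */

/* Hypothetical helper: a plain reference reuses the visible outer type. */
int reads_outer_member(void)
{
    struct foo outer;
    outer.i = 42;
    {
        struct foo *p = &outer;      /* no new type: this is the file-scope foo */
        return p->i;
    }
}

/* Hypothetical helper: 'struct foo;' starts a new type in this scope. */
int reads_inner_member(void)
{
    struct foo;                      /* new incomplete type, hides the outer one */
    struct bar { struct foo *f; } b; /* 'f' points to the new type */
    struct foo { int j; } inner;     /* completes the new type in the same scope */

    inner.j = 7;
    b.f = &inner;
    return b.f->j;                   /* member 'j' exists only in the inner foo */
}
```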
Another observation is that an incomplete type can only be completed in the same scope in which it is declared (and not in any deeper scope). Once it has been completed, the completed structure can be used in any deeper scope.
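A minimal C89 sketch of this observation follows. The names 'struct node', `next_of` and `sum_two` are hypothetical and chosen only for the example. The type is declared and completed at file scope, and the completed type is then used inside a nested block.

```c
struct node;                              /* incomplete type at file scope */
struct node *next_of(struct node *n);     /* pointers to it are already usable */

struct node {                             /* completed in the SAME scope */
    int value;
    struct node *next;
};

struct node *next_of(struct node *n)
{
    return n->next;
}

/* Hypothetical helper: uses the completed type from a deeper scope. */
int sum_two(void)
{
    struct node a;
    struct node b;

    b.value = 2;
    b.next = 0;
    a.value = 1;
    a.next = &b;
    {
        struct node *p = &a;              /* nested block sees the complete type */
        return p->value + next_of(p)->value;
    }
}
```

Had the braces of the definition appeared inside a nested block instead, they would have defined a new 'struct node' local to that block and left the file-scope one incomplete.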
Thoughts On Parsing
After parsing 'struct foo', do a lookup in the current scope for a tag of the name 'foo'. If nothing is found in the current scope, repeat this process outward to any enclosing scope looking for a complete or incomplete reference. If one is found, use that struct_or_union_id. The struct_or_union_id is an id that uniquely identifies the 'struct <tagname>' and the scope in which it resides. If no matching tag is found, declare an incomplete type of 'struct foo' in the current scope.
The only exception to the above paragraph would be the special case of 'struct foo;', where upon parsing the semicolon, any id from an outer scope would be discarded and an incomplete structure type would be declared in the current scope.
For struct definitions, when a '}' character is parsed, complete the reference to that type using the definition just parsed.
Easy as pie.
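The lookup rules above can be sketched as a tiny tag table in C. This is only a rough sketch under several assumptions, not the implementation used by any real compiler: every name here (`tag_entry`, `lookup_tag`, `reference_tag`, `declare_in_current_scope`, `complete_tag`, `leave_scope`) is invented for the illustration, scopes are modelled as a simple nesting depth, struct versus union kinds are not tracked, and redefinition errors are ignored. It also assumes that a definition ('struct foo {') is looked up the same way as 'struct foo;', that is, only in the current scope.

```c
#include <string.h>

#define MAX_TAGS 64

/* One struct/union tag as seen by a (hypothetical) parser. */
struct tag_entry {
    const char *name;  /* tag identifier, e.g. "foo" */
    int depth;         /* 0 = file scope; each nested block adds 1 */
    int is_complete;   /* becomes 1 once the closing '}' is parsed */
    int is_active;     /* cleared when the declaring scope is left */
};

struct tag_entry tags[MAX_TAGS];
int tag_count = 0;

/* Innermost visible tag named 'name' in scopes depth..0, or -1 if none. */
int lookup_tag(const char *name, int depth)
{
    int best = -1;
    int i;
    for (i = 0; i < tag_count; i++) {
        if (tags[i].is_active && tags[i].depth <= depth &&
            strcmp(tags[i].name, name) == 0 &&
            (best < 0 || tags[i].depth > tags[best].depth)) {
            best = i;
        }
    }
    return best;
}

/* Add a new incomplete tag to the scope at 'depth'; -1 if the table is full. */
int declare_tag(const char *name, int depth)
{
    if (tag_count >= MAX_TAGS) {
        return -1;
    }
    tags[tag_count].name = name;
    tags[tag_count].depth = depth;
    tags[tag_count].is_complete = 0;
    tags[tag_count].is_active = 1;
    return tag_count++;
}

/* 'struct foo' followed by a declarator: reuse the closest visible tag,
   or declare an incomplete one in the current scope if none is found. */
int reference_tag(const char *name, int depth)
{
    int id = lookup_tag(name, depth);
    return id >= 0 ? id : declare_tag(name, depth);
}

/* 'struct foo;' (and, by assumption, 'struct foo {'): a tag found in an
   outer scope is discarded and a new one is declared in the current scope. */
int declare_in_current_scope(const char *name, int depth)
{
    int id = lookup_tag(name, depth);
    if (id >= 0 && tags[id].depth == depth) {
        return id;
    }
    return declare_tag(name, depth);
}

/* Closing '}' of a struct definition: the type is now complete. */
void complete_tag(int id)
{
    tags[id].is_complete = 1;
}

/* Closing '}' of a block: tags declared at 'depth' stop being visible. */
void leave_scope(int depth)
{
    int i;
    for (i = 0; i < tag_count; i++) {
        if (tags[i].depth == depth) {
            tags[i].is_active = 0;
        }
    }
}
```

The index returned by these functions plays the role of the struct_or_union_id described above: it identifies both the tag name and the scope that owns it. A parser built this way would call `reference_tag` right after 'struct foo' and switch to `declare_in_current_scope` only once it sees the ';' or '{' that follows.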