The Magical World of Structs, Typedefs, and Scoping

2016-05-09 - By Robert Elder

Introduction

Updated December 20, 2016: Added a couple new examples.

Updated January 1, 2017: Added another example.

Updated January 24, 2017: Added another example showing how identifiers can change meaning inside a declarator list.

Updated March 16, 2017: Added cases showing that typedef redefinitions, which must be handled at parse time, need to be aware of the tag namespace hierarchy.

I decided to write this article to force myself to understand all of the complexities of struct and union declarations, references to incomplete types, scoping rules, and how they interact with the 'typedef' specifier. If you read about scoping issues with structs, you'll realize that things can get pretty complicated, and to really understand them it is necessary to enumerate a list of compiler test cases. Reading through these test cases is a great alternative to spending a night out at the club, and I hope you have as much fun reading them as I had writing them. This article considers only ISO C89, and all testing was done with gcc and clang. The gcc version was (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, and the clang version was Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4).

What Happens In Each Case?

Test Case	Notes
`struct foo{ int i; }; int main(void){ struct foo a; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo; struct foo{ int i; }; int main(void){ struct foo a; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo; int main(void){ struct foo a; return 0; }`	error: storage size of ‘a’ isn’t known
`struct boo{ struct foo * f; }; struct foo{ int i; }; int main(void){ struct boo b; (void)b.f->i; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int j; }; struct boo{ struct foo * f; }; struct foo{ int i; }; int main(void){ struct boo b; (void)b.f->i; return 0; }`	main.c:7:8: error: redefinition of ‘struct foo’ struct foo{ ^ main.c:1:8: note: originally defined here struct foo{
`int main(void){ struct boo{ struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int j; }; int main(void){ struct boo{ struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	In function ‘main’: main.c:13:11: error: ‘struct foo’ has no member named ‘i’ (void)b.f->i;
`struct foo{ int j; }; int main(void){ struct foo; struct boo{ struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int i; }; int main(void){ struct boo{ struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int j; }; int main(void){ struct foo ty; struct boo{ struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	main.c:14:18: error: ‘struct foo’ has no member named ‘i’ (void)b.f->i; This case demonstrates that the side-effects of 'struct foo;' and 'struct foo ty;' are different with respect to what 'struct foo' references later in this scope.
`struct foo{ int j; }; int main(void){ struct boo{ struct foo {int k;} abc; struct foo * f; }; struct foo{ int i; }; struct boo b; (void)b.f->i; return 0; }`	10:16: error: redefinition of ‘struct foo’ struct foo{
`struct foo{ union foo * i; }; int main(void){ return 0; }`	main.c:2:15: error: ‘foo’ defined as wrong kind of tag
`struct foo{ struct boo * b; }; int main(void){ { struct boo{ int i; }; struct foo f1; (void)f1.b->i; } { struct boo{ int j; }; struct foo f2; (void)f2.b->j; } return 0; }`	main.c:11:13: error: dereferencing pointer to incomplete type (void)f1.b->i; ^ main.c:18:13: error: dereferencing pointer to incomplete type (void)f2.b->j; ^
`struct foo{ struct boo * b; }; int main(void){ { struct foo; struct boo{ int i; }; struct foo f1; (void)f1.b->i; } { struct foo; struct boo{ int j; }; struct foo f2; (void)f2.b->j; } return 0; }`	main.c: In function ‘main’: main.c:11:14: error: storage size of ‘f1’ isn’t known struct foo f1; ^ main.c:11:14: warning: unused variable ‘f1’ [-Wunused-variable] main.c:19:14: error: storage size of ‘f2’ isn’t known struct foo f2; ^ main.c:19:14: warning: unused variable ‘f2’ [-Wunused-variable]
`int main(void){ struct foo; { struct foo f1; } struct foo{ int i; }; return 0; }`	main.c:4:14: error: storage size of ‘f1’ isn’t known struct foo f1;
`int main(void){ struct foo{ struct foo * f; }; struct foo f; (void)f; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int i; }; int main(void){ struct foo; typedef struct foo str; str f; (void)f; return 0; }`	main.c:8:6: error: storage size of ‘f’ isn’t known
`struct foo{ int i; }; int main(void){ (struct foo*)0; typedef struct foo str; str f; (void)f; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c This case is interesting because, when compared with the previous one, it reinforces the idea that a bare 'struct foo;' is treated in a very special (and difficult to detect) way.
`struct foo{ int i; }; int main(void){ typedef struct foo str; struct foo; str f; (void)f; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int i; }; int main(void){ typedef struct foo abc; typedef struct foo def; abc g; def h; (void)g; (void)h; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`struct foo{ int i; }; int main(void){ typedef struct foo str; struct foo; typedef str dir; dir f; (void)f; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c This is an important test case, as it illustrates that a typedef is not expanded like a string replacement at the point where it is used. 'str' stays bound to the complete 'struct foo' that was visible when 'str' was declared, so 'dir' is complete even though 'struct foo;' has since declared a new incomplete type in this scope.
`struct foo{ int i; }; typedef struct foo str; int main(void){ str; struct foo f; (void)f; return 0; }`	main.c:8:9: warning: useless type name in empty declaration [enabled by default]; still compiles. Important because it demonstrates that a typedef name cannot be used to declare an incomplete structure in the same way that the specifier it resolves to can be.
`struct foo; typedef struct foo str; str{ int i; }; int main(void){ return 0; }`	expected identifier or ‘(’ before ‘{’ token Demonstrates that a typedefed type cannot be used to complete a structure type, even though it would resolve to be the same specifier for that type.
`int foo(struct boo {int a;}); int main(void){ return 0; }`	main.c:2:16: warning: ‘struct boo’ declared inside parameter list [enabled by default] int foo(struct boo {int a;}); ^ main.c:2:16: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] This demonstrates that you can declare or implement a function that cannot be called since no type will be compatible with it due to the fact that the scope of the structure definition is only inside the function parameter list.
`int main(void){ struct foo {int i;} (*a[3])(int (int (int (int)))); (void)a; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c Demonstrates that structures can be defined as a type specifier to virtually any declarator.
`struct foo{ int i; }; int main(void){ const struct foo; struct foo f; (void)f; return 0; }`	Produces warning in gcc: main.c:6:22: warning: empty declaration with type qualifier does not redeclare tag [enabled by default] const struct foo; But an error with clang: main.c:7:20: error: variable has incomplete type 'struct foo' struct foo f;
`typedef struct foo {int i;} koo(struct foo); int main(void){ return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c
`typedef struct foo { typedef struct boo {int i;}; }; int main(void){ return 0; }`	main.c:3:2: error: expected specifier-qualifier-list before ‘typedef’ typedef struct boo {int i;};
`typedef struct foo str; int main(void){ struct foo{ int i; }; str f; return 0; }`	main.c:8:6: error: storage size of ‘f’ isn’t known str f;
`struct foo{ int j; }; int main(void){ struct boo{ struct foo f; struct foo {int i;}g; }; struct boo b; (void)b.f.j; (void)b.g.i; return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c An interesting case that demonstrates that you can have a structure that contains 2 members of type 'struct foo' where each implementation of 'struct foo' is different.
`struct foo{ int j; }; int main(void){ struct boo{ struct foo; struct foo g; }; struct boo b; (void)b.g.j; return 0; }`	Compiles with warning in gcc: main.c:7:27: warning: declaration does not declare anything [enabled by default] struct foo; ^ Error in clang: main.c:8:28: error: field has incomplete type 'struct foo' struct foo g;
`struct foo {int i;}; int main(void){ struct foo {struct foo * f; int j;}; struct foo a; return a.f->j; }`	Compiles without errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c The only diagnostic is clang's -Weverything warning about padding in the structure. This case demonstrates that the incomplete structure 'foo' is registered in the 'main' scope just after the '{' character of the struct definition, and before the interpretation of the member 'f' of the same struct.
`union foo {int i;}; int main(void){ struct foo * f; struct foo {int k;}; struct foo a; return a.k; }`	main.c:4:9: error: ‘foo’ defined as wrong kind of tag struct foo * f; Demonstrates that a reference to the tag 'foo' will attempt to refer to the incompatible union type from the outer scope instead of declaring an incomplete type in the inner scope which would be completed later.
`struct foo{ void (*f)(struct foo); }; int main(void){ return 0; }`	Compiles without warnings or errors: gcc -std=c89 -pedantic -Wall main.c && clang -std=c89 -pedantic -Weverything main.c Demonstrates that the incomplete type reference is not a problem for function pointers inside the struct.
`int foo(struct foo {int i;} a, int (* b)(struct foo {double j;} c)){ struct foo lol; } int main(void){ return 0; }`	Compiles in gcc with warnings, issues error in clang: error: reference to 'foo' is ambiguous
`#include <stdio.h> typedef char foo; int main(void) { unsigned int a = sizeof(foo), foo=sizeof(foo), b=sizeof(foo); printf("%u %u %u\n", a, foo, b); return 0; }`	Compiles without errors in gcc and clang using: gcc -std=c89 -pedantic -Wall main.c && ./a.out && clang -std=c89 -pedantic -Weverything main.c && ./a.out Each of the two binaries prints: 1 4 4 Important because it demonstrates that the meaning (and size) of an identifier can change inside of a declarator list (and even inside of an init_declarator).
`struct foo{ int i; }; typedef struct foo type1; typedef type1 type2; typedef type2 type1; int main(void){ return 0; }`	Compiles in gcc and clang with warnings: main.c:8:15: warning: redefinition of typedef ‘type1’ [-Wpedantic] typedef type2 type1; Interesting because it demonstrates that non-trivial typedef re-definitions will compile just fine even when they require some form of type introspection.
`struct foo{ int i; }; typedef struct foo type1; int main(void){ struct foo { int j;}; typedef struct foo type2; typedef type1 type2; /* Error here. */ return 0; }`	Fails to compile with the error: error: typedef redefinition with different types ('type1' (aka 'struct foo') vs 'struct foo') Very interesting because it demonstrates that non-trivial typedef re-definitions (which must be managed at parse time due to the ambiguity of typedef identifiers) need to have some awareness of the tree of tag namespaces that exist from struct, union or enum types.

Conclusion

The point of all this was to try and deduce what rules should be used by the parser for struct and union specifiers. One observation is that

struct foo;

is very much a special case when compared with any of

struct foo f;
struct foo {int i;};
typedef struct foo ty;

The key difference is that immediately after parsing 'struct foo', the 'struct foo' can almost always be taken to refer to the closest visible declaration of 'struct foo', except when the 'struct foo' is followed directly by a semicolon. This is somewhat problematic for the parser, because the grammar rules allow for a declarator (or possibly more specifiers) between the struct or union specifier and the semicolon, so the parser cannot know which case it is in until it reaches the semicolon. Furthermore, if 'struct foo' is not followed by a declarator but is preceded by a type qualifier or storage class specifier, compilers do not agree on whether the special case observed with an unqualified 'struct foo;' applies. This is seen in the 'const struct foo;' test case above, where gcc emits a warning and clang emits an error.

In addition, using

struct foo;

is colloquially understood to 'declare' the tag 'foo' as a structure, and using

struct foo{
	int i;
};

is colloquially understood to 'declare' and 'define' the tag 'foo' as a structure.

In both approaches the tag 'foo' is declared in the current scope only: a 'struct foo' that is already visible from an enclosing scope is hidden rather than reused or completed, and the new tag remains visible in any nested scope. This is what separates them from a plain reference such as 'struct foo f;', which picks up a tag from an enclosing scope whenever one is visible. The forward-declaration form is described in the C89 standard section 3.5.2.3 Tags: "struct-or-union identifier ; specifies a structure or union type and declares a tag, both visible only within the scope in which the declaration occurs. It specifies a new type distinct from any type with the same tag in an enclosing scope (if any)."

Another observation is that an incomplete type can only be completed in the same scope in which it is declared, not in any deeper scope. Once it has been completed, the completed structure can be used in any deeper scope.

Thoughts On Parsing

After parsing 'struct foo', do a lookup in the current scope for a tag of the name 'foo'. If nothing is found in the current scope, repeat this process outward to any enclosing scope looking for a complete or incomplete reference. If one is found, use that struct_or_union_id. The struct_or_union_id is an id that uniquely identifies the 'struct <tagname>' and the scope in which it resides. If no matching tag is found, declare an incomplete type of 'struct foo' in the current scope.

The only exception to the above paragraph would be the special case of 'struct foo;', where upon parsing the semicolon, any id from an outer scope would be discarded and an incomplete structure type would be declared in the current scope.

For struct definitions, when a '}' character is parsed, complete the reference to that type using the definition just parsed.

Easy as pie.
