Format for C compiler code generator

RoelH 20191228. Not the first draft.

Introduction

Traditionally, C compilers have three main components, among them a parser and a code generator.

When modifying an existing C compiler to support a new instruction set, the general approach is to choose a C compiler and change its code generator.
The problem is that once this is done, you are stuck with the C compiler that was chosen. Writing the code generator might also be hard, depending on the interface between the parser and the code generator of the chosen compiler. The parser might even perform optimizations that are not effective for your instruction set, or that make things worse.

Here I propose a standard interface between the parser and the code generator. If this interface is standardized, a code generator for a certain CPU can be used with all C compilers that comply with this standard.

The goals for this interface were:

The two main parts of the chosen interface are:

Semantics

The main job of a parser is to find out what the structure of the program is, so that should be all that we let the parser do! The interface format will be almost equal to the source code, but with structure added in the form of brackets and commas, and perhaps a small change in the ordering of the program elements.

Syntax

The syntax should be easy to parse. The JSON format was chosen because of its simple and well-known syntax, and because standard software components for reading and writing this format are widely available.

JSON introduction

JavaScript Object Notation (JSON) is an open-standard file format or data interchange format that uses human-readable text to transmit data objects.

JSON's basic data types are number, string, boolean, array, object, and null.

Strings are always in double quotes.

Example:
{ "Name": "John",
  "age": 27,
  "phoneNumbers": [
    { "type": "home",
      "number": "212 555-1234"
    },
    { "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": []
}
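
To illustrate how little software is needed to read and write JSON, here is a minimal sketch using Python's standard json module. The record is a shortened version of the example above; the variable names are only illustrative.

```python
import json

# Illustrative JSON text; the field names are taken from the example above.
text = '''
{ "Name": "John",
  "age": 27,
  "phoneNumbers": [
    { "type": "home", "number": "212 555-1234" }
  ],
  "children": []
}
'''

record = json.loads(text)      # JSON object -> dict, JSON array -> list
print(record["Name"])          # a JSON string becomes a Python str
print(record["phoneNumbers"][0]["type"])

# Writing is the reverse operation: dump the data and read it back.
round_trip = json.loads(json.dumps(record))
```

Most languages have a comparable library, which is the main reason for choosing JSON here.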

Standard code generator input format

Literals are written in double quotes. Unquoted names such as T, A and B are placeholders.

Expression

Format                            Meaning
string                            variable name
number                            numeric constant
{ "const": A }                    structured constant or string constant
[ T, "op-prefix", A ]             expression with prefix operator, like: * & ++ -- ! ~ -
[ T, A, "op-postfix" ]            expression with postfix operator, like: ++ --
[ T, A, "op-infix", B ]           general infix expression, like: + - * / % == != > < >= <= && || & | ^ << >>
[ T, A, ":", B ]                  array indexing A[B]
[ T, A, ".", B, N ]               struct/union member selection
[ T, A, "->", B ]                 indirect struct/union member selection
[ T, A, "?", B, C ]               conditional expression
[ T, A, arg-list ]                function call; A is the function name
{ "function" : function-name }    for assigning a function to a function pointer

Remarks:
T is the name of the type of the expression.
Arg-list is a list of expressions: [ A, B, C ]. It can be an empty list [] when there are no arguments.
For struct, N is the offset (in bytes) from the base address of the structure. For union, this is zero.
A typecast is handled as an infix operator: [ A, "castto", B ]
The function call also accepts a function pointer instead of a function name.
See also the list of C operators
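
As an illustration of how a code generator could walk this format, here is a minimal sketch that turns an expression back into C text. The helper name to_c and the way the three-element forms are told apart (by checking the operator sets) are my assumptions, not part of the format. Typecasts and structured constants are left out.

```python
import json

PREFIX_OPS = {"*", "&", "++", "--", "!", "~", "-"}
POSTFIX_OPS = {"++", "--"}

def to_c(e):
    """Render one expression node as C source text (hypothetical helper)."""
    if isinstance(e, str):                  # variable name
        return e
    if isinstance(e, (int, float)):         # numeric constant
        return str(e)
    if isinstance(e, dict):
        if "function" in e:                 # function used as a value
            return e["function"]
        return json.dumps(e["const"])       # simple string constants only
    # From here on e is a list whose first element T is the type name.
    if len(e) == 5 and e[2] == "?":
        return f"({to_c(e[1])} ? {to_c(e[3])} : {to_c(e[4])})"
    if len(e) == 5 and e[2] == ".":         # e[4] is the byte offset N
        return f"{to_c(e[1])}.{e[3]}"
    if len(e) == 4 and e[2] == ":":         # array indexing
        return f"{to_c(e[1])}[{to_c(e[3])}]"
    if len(e) == 4 and e[2] == "->":
        return f"{to_c(e[1])}->{e[3]}"
    if len(e) == 4:                         # general infix operator
        return f"({to_c(e[1])} {e[2]} {to_c(e[3])})"
    if len(e) == 3 and isinstance(e[1], str) and e[1] in PREFIX_OPS:
        return f"({e[1]}{to_c(e[2])})"
    if len(e) == 3 and isinstance(e[2], str) and e[2] in POSTFIX_OPS:
        return f"({to_c(e[1])}{e[2]})"
    if len(e) == 3 and isinstance(e[2], list):   # function call
        args = ", ".join(to_c(a) for a in e[2])
        return f"{to_c(e[1])}({args})"
    raise ValueError(f"unrecognized expression: {e!r}")

# a + b * 2, with the structure made explicit by the parser
print(to_c(["int", "a", "+", ["int", "b", "*", 2]]))   # prints (a + (b * 2))
```

A real code generator would emit instructions instead of text, but the case analysis would have the same shape.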

Statement

Format                            Meaning
[ T, A, "=", B ]                  assignment statement; statements like a += 2 are presented as a = a + 2, and i++ is presented as i = i + 1
[ T, A, arg-list ]                function call
{ "if": A,
  "then": statm-list,
  "else": statm-list }            if-statement
{ "while": A,
  "do_": statm-list }             while-statement
{ "do": statm-list,
  "while_": A }                   do-statement
{ "for": [ A, B, C ],
  "do_": statm-list }             for-statement
{ "//": "comment" }               comment
{ "seq": statm-list }             statement sequence (not used)
{ "return": A }                   return with value
{ "var": decl-list }              declaration of variables

Remarks:
Statm-list is a list of statements: [ statm1, statm2, statm3 ]
In assignment statements, the T field may be empty.
In a return statement, expression A is ignored in void functions.
In a for statement, A, B, or C must be replaced by "//" if it is empty.
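
Because statements are plain lists and objects, a code generator can rewrite them before emitting code. The sketch below lowers a for-statement into an initialization followed by a while-statement. The helper name lower_for is hypothetical, and representing an omitted condition by the constant 1 is my assumption (in C an omitted for-condition is always true).

```python
EMPTY = "//"   # placeholder the format uses for an omitted for-clause

def lower_for(stmt):
    """Return a statement list equivalent to one for-statement."""
    init, cond, step = stmt["for"]
    body = list(stmt["do_"])          # copy, so the input is not modified
    if step != EMPTY:
        body.append(step)             # the step runs at the end of each pass
    if cond == EMPTY:
        cond = 1                      # assumption: omitted condition = always true
    loop = {"while": cond, "do_": body}
    return [loop] if init == EMPTY else [init, loop]

# for (i = 0; i < 10; i = i + 1) { sum = sum + i; }
for_stmt = {
    "for": [
        ["int", "i", "=", 0],
        ["int", "i", "<", 10],
        ["int", "i", "=", ["int", "i", "+", 1]],
    ],
    "do_": [["int", "sum", "=", ["int", "sum", "+", "i"]]],
}

lowered = lower_for(for_stmt)
```

After this rewrite the code generator only has to handle the while-statement.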

type-spec

This specifies a type.

Format                                          Meaning
[ "array", size, typename ]                     array with elements of type "type-spec"
[ "pt", size, typename ]                        pointer to "type-spec"
[ "struct", size, decl-list ]                   structure definition
[ "union", size, decl-list ]                    union definition
[ "function", size, par-list, return-type ]     used for defining a function pointer
[ "alias", size, typename ]                     used to define a second name for a type

Remarks:
"Size" is the number of bytes of storage needed for a variable of this type.
Note that this differs from the terrible 'inside-out' type-spec syntax of C.
The par-list or decl-list is a list of variable declarations, for example: { "n" : "int", "i": "int", "x": "float" }
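
Since every type-spec carries its size, a parser can compute the member offset N used in member selection with very little code. The following is only a sketch: the base type sizes, the absence of padding, and the helper names size_of and member_offsets are all my assumptions.

```python
BASE_SIZES = {"char": 1, "int": 4, "float": 4}   # assumed sizes for some target

def size_of(t, typedefs):
    """Size in bytes of a type name or a type-spec list."""
    if isinstance(t, list):
        return t[1]                   # every type-spec stores its size second
    if t in BASE_SIZES:
        return BASE_SIZES[t]
    return size_of(typedefs[t], typedefs)

def member_offsets(spec, typedefs):
    """Byte offset of each member of a struct or union type-spec."""
    kind, _size, decls = spec
    offsets, offset = {}, 0
    for name, t in decls.items():     # JSON object order is the member order
        offsets[name] = 0 if kind == "union" else offset
        offset += size_of(t, typedefs)    # assumption: no padding
    return offsets

typedefs = {
    "point": ["struct", 8, {"x": "int", "y": "int"}],
    "line": ["struct", 16, {"a": "point", "b": "point"}],
    "value": ["union", 4, {"i": "int", "f": "float"}],
}

print(member_offsets(typedefs["line"], typedefs))   # prints {'a': 0, 'b': 8}
```

A target with alignment requirements would need a more careful offset calculation.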

declaration-list

Format                       Meaning
{ name1 : type-spec1,
  name2 : type-spec2,
  name3 : type-spec3 }       specifies a type for one or more variables

Remarks:
TODO Add var initialization

A typedef list:

Format                       Meaning
{ name1 : type-spec1,
  name2 : type-spec2,
  name3 : type-spec3 }       specifies a name for one or more types

function definition

Format                       Meaning
{ "function": name,
  "par": par-list,
  "return": return-type,
  "mod": mod-list,
  "body": statm-list }       specifies a function
par = parameter list
return-type can be void
mod = list of modifiers
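
To show how the pieces fit together, here is a sketch of what a parser might emit for a small C function. The function add is just an example of mine, and the empty modifier list is an assumption, since the possible modifiers are not listed here.

```python
import json

# int add(int a, int b) { return a + b; }
add_fn = {
    "function": "add",
    "par": {"a": "int", "b": "int"},
    "return": "int",
    "mod": [],                        # assumption: no modifiers
    "body": [
        {"return": ["int", "a", "+", "b"]},
    ],
}

text = json.dumps(add_fn, indent=2)   # what the parser would write out
print(text)

parsed = json.loads(text)             # what the code generator would read
```

The parameter order survives the round trip, because Python keeps JSON object members in the order in which they appear.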

Previous work

A similar idea was presented in this discussion, but the resulting syntax is much too verbose in my opinion.

The End