-rw-r--r--  FAQ             68
-rw-r--r--  LICENSE         31
-rw-r--r--  README          72
-rw-r--r--  lib.c            4
-rw-r--r--  lib.h            9
-rw-r--r--  parse.c         10
-rw-r--r--  parse.h          5
-rw-r--r--  pre-process.c    2
-rw-r--r--  scope.c          2
-rw-r--r--  scope.h          5
-rw-r--r--  symbol.c         5
-rw-r--r--  symbol.h         5
-rw-r--r--  test-lexing.c    6
-rw-r--r--  test-parsing.c   7
-rw-r--r--  token.h          7
-rw-r--r--  tokenize.c       8
16 files changed, 235 insertions, 11 deletions
@@ -0,0 +1,68 @@
+	FAQ - Why sparse?
+
+Q. Why not just use gcc?
+
+A. Gcc is big, complex, and the gcc maintainers are not interested in
+   other uses of the gcc front-end. In fact, gcc has explicitly
+   resisted splitting up the front and back ends and having some common
+   intermediate language because or religious license issues - you can
+   have multiple front ends and back ends, but they all have to be part
+   of gcc and licensed under the GPL.
+
+   This all (in my opinion) makes gcc development harder than it should
+   be, and makes the end result very ungainly. With "sparse", the
+   front-end is very explicitly separated into its own independent
+   project, and is totally independent from the users. I don't want to
+   know what you do in the back-end, because I don't think I _should_
+   know or care.
+
+
+Q. Why not GPL?
+
+A. See the previous question: I personally think that the front end
+   must be a totally separate project from the back end: any other
+   approach just leads to insanity. However, at the same time clearly
+   we cannot write intermediate files etc crud (since then the back end
+   would have to re-parse the whole thing and would have to have its
+   own front end and just do a lot of things that do not make any sense
+   from a technical standpoint).
+
+   I like the GPL, but as rms says, "Linus is just an engineer". I
+   refuse to use a license if that license causes bad engineering
+   decisions. I want the front-end to be considered a separate
+   project, yet the GPL considers the required linking to make the
+   combined thing a derived work. Which is against the whole point
+   of 'sparse'.
+
+   I'm not interested in code generation. I'm not interested in what
+   other people do with their back-ends. I _am_ interested in making a
+   good front-end, and "good" means that people find it usable. And
+   they shouldn't be scared away by politics or licenses. If they want
+   to make their back-end be BSD/MIT licensed, that's great. And if
+   they want to have a proprietary back-end, that's ok by me too. It's
+   their loss, not mine.
+
+   At the same time, I'm a big believer in "quid pro quo". I wrote the
+   front-end, and if you make improvements to the semantic parsing part
+   (as opposed to just using the resulting parse tree), you'd better
+   cough up. The front-end is intended to be an open-source project in
+   its own right, and if you improve the front end, you must give those
+   improvements back. That's your "quid" to my "quo".
+
+
+Q. So what _is_ the license?
+
+A. I don't know yet. I originally thought it would be LGPL, but I'm
+   possibly going for a license that is _not_ subsumable by the GPL.
+   In other words, I don't want to see a GPL'd project suck in the
+   LGPL'd front-end, and then make changes to the front end under the
+   GPL (this is something that the LGPL expressly allows, and see the
+   previous question for why I think it's the _only_ thing that I will
+   not allow).
+
+   So I'm currently considering just taking the LGPL and removing the
+   GPL subsumption clause, and calling it the LLPL ("Lesser Linus
+   Public License" or something). In the meantime, you have no rights
+   at all, except to send me useful suggestions about a license that
+   still requires people who work on the front-end to work as open
+   source, while allowing arbitrary back-ends.
@@ -0,0 +1,31 @@
+This is just a placeholder. I haven't decided on what the final license
+will be. But it most likely (note the "likely" - I'm not promising
+anything until I've made a real decision) will have the following
+properties:
+
+ - it will _require_ source code for the library itself (ie GPL-like in
+   that respect). Much like the LGPL.
+
+ - but it will expressly allow linking with arbitrary back-ends, and
+   require that too in perpetuam (ie anti-GPL in that respects, and this
+   means that it's almost certainly not going to be LGPL)
+
+and, if possible:
+
+ - it will be "open source(tm)" compatible as far as I can tell,
+   although if the anti-GPL part ends up being a problem, I may not care
+   enough to conform fully to OSI guidelines. Will need to check with
+   the OSI guys.
+
+In the meantime, if you agree with the above, and expect to agree with
+whatever license I will choose with the above in mind, you can play with
+this freely, and make changes and send patches if you explicitly mark
+those patches as being compatible with whatever I do (yeah yeah, you'll
+just need to trust me).
+
+Oh, and keep in mind that I'm famous for changing my mind. Maybe I'll
+call the license the "sucker" license, and sell whatever you send me for
+billions and billions of dollars without crediting you in the
+slightest.. Sucka!
+
+		Linus Torvalds
@@ -0,0 +1,72 @@
+
+  sparse (spärs), adj,., spars-er, spars-est.
+	1. thinly scattered or distributed; "a sparse population"
+	2. thin; not thick or dense: "sparse hair"
+	3. scanty; meager.
+	4. semantic parse
+	[ from Latin: spars(us) scattered, past participle of
+	  spargere 'to sparge' ]
+
+	Antonym: abundant
+
+Sparse is a semantic parser of source files: it's neither a compiler
+(although it could be used as a front-end for one) nor is it a
+preprocessor (although it contains as a part of it a preprocessing
+phase).
+
+It is meant to be a small - and simple - library. Scanty and meager,
+and partly because of that easy to use. It has one mission in life:
+create a semantic parse tree for some arbitrary user for further
+analysis. It's not a tokenizer, nor is it some generic context-free
+parser. In fact, context (semantics) is what it's all about - figuring
+out not just what the grouping of tokens are, but what the _types_ are
+that the grouping implies.
+
+And no, it doesn't use lex and yacc (or flex and bison). In my personal
+opinion, the result of using lex/yacc tends to end up just having to
+fight the assumptions the tools make.
+
+The parsing is done in three phases:
+
+ - full-file tokenization
+ - pre-processing (which can cause another tokenization phase of another
+   file)
+ - semantic parsing.
+
+Note the "full file" part. Partly for efficiency, but mostly for ease of
+use, there are no "partial results". The library completely parses one
+whole source file, and builds up the _complete_ parse tree in memory.
+
+This means that a user of the library will literally just need to do
+
+	struct token *token;
+	int fd = open(filename, O_RDONLY);
+	struct symbol_list *list = NULL;
+
+	if (fd < 0)
+		exit_with_complaint();
+
+	// Initialize parse symbols
+	init_symbols();
+
+	// Tokenize the input stream
+	token = tokenize(filename, fd, NULL);
+
+	// Pre-process the stream
+	token = preprocess(token);
+
+	// Parse the resulting C code
+	translation_unit(token, &list);
+
+and he is now done - having a full C parse of the file he opened. The
+library doesn't need any more setup, and once done does not impose any
+more requirements. The user is free to do whatever he wants with the
+parse tree that got built up, and needs not worry about the library ever
+again. There is no extra state, there are no parser callbacks, there is
+only the parse tree that is described by the header files.
+
+The library also contains (as an example user) a few clients that do the
+preprocessing and the parsing and just print out the results. These
+clients were done to verify and debug the library, and also as trivial
+examples of what you can do with the parse tree once it is formed, so
+that users can see how the tree is organized.
@@ -1,5 +1,7 @@
 /*
- * Helper routines
+ * 'sparse' library helper routines.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stddef.h>
 #include <stdarg.h>
@@ -1,5 +1,10 @@
-#ifndef LIST_H
-#define LIST_H
+#ifndef LIB_H
+#define LIB_H
+/*
+ * Basic helper routine descriptions for 'sparse'.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 extern unsigned int hexval(unsigned int c);
 
@@ -1,11 +1,15 @@
-#ifndef __GNUC__
-typedef int __builtin_va_list;
-#endif
 /*
  * Stupid C parser, version 1e-6.
  *
  * Let's see how hard this is to do.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
+
+#ifndef __GNUC__
+typedef int __builtin_va_list;
+#endif
+
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -1,5 +1,10 @@
 #ifndef PARSE_H
 #define PARSE_H
+/*
+ * Basic parsing data structures. Statements and symbols.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include "symbol.h"
 
diff --git a/pre-process.c b/pre-process.c
index aa73dae..8dd0a68 100644
--- a/pre-process.c
+++ b/pre-process.c
@@ -3,6 +3,8 @@
  * the tokenizer.
  *
  * This may not be the smartest preprocessor on the planet.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdio.h>
 #include <stdlib.h>
@@ -2,6 +2,8 @@
  * Symbol scoping.
  *
  * This is pretty trivial.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdlib.h>
 #include <string.h>
@@ -1,5 +1,10 @@
 #ifndef SCOPE_H
 #define SCOPE_H
+/*
+ * Symbol scoping is pretty simple.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 struct scope {
 	struct token *token;		/* Scope start information */
@@ -1,3 +1,8 @@
+/*
+ * Symbol lookup and handling.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -1,5 +1,10 @@
 #ifndef SEMANTIC_H
 #define SEMANTIC_H
+/*
+ * Basic symbol and namespace definitions.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include "token.h"
 
diff --git a/test-lexing.c b/test-lexing.c
index cb076e8..1cd82c9 100644
--- a/test-lexing.c
+++ b/test-lexing.c
@@ -1,3 +1,9 @@
+/*
+ * Example test program that just uses the tokenization and
+ * preprocessing phases, and prints out the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/test-parsing.c b/test-parsing.c
index 4b9a931..154b11a 100644
--- a/test-parsing.c
+++ b/test-parsing.c
@@ -1,3 +1,10 @@
+/*
+ * Example trivial client program that uses the sparse library
+ * to tokenize, pre-process and parse a C file, and prints out
+ * the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -1,5 +1,12 @@
 #ifndef TOKEN_H
 #define TOKEN_H
+/*
+ * Basic tokenization structures. NOTE! Those tokens had better
+ * be pretty small, since we're going to keep them all in memory
+ * indefinitely.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include <sys/types.h>
 
@@ -1,10 +1,8 @@
 /*
- * This is a really stupid C tokenizer, intended to run after the
- * preprocessor.
+ * This is a really stupid C tokenizer. It doesn't do any include
+ * files or anything complex at all. That's the pre-processor.
  *
- * A smart preprocessor would be integrated and pass the compiler
- * the tokenized input directly, but lacking that we just tokenize
- * the preprocessor output.
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdio.h>
 #include <stdlib.h>