16 files changed, 235 insertions, 11 deletions
diff --git a/FAQ b/FAQ
new file mode 100644
index 0000000..a4041e5
--- /dev/null
+++ b/FAQ
@@ -0,0 +1,68 @@
+	FAQ - Why sparse?
+
+Q.  Why not just use gcc?
+
+A.  Gcc is big, complex, and the gcc maintainers are not interested in
+    other uses of the gcc front-end.  In fact, gcc has explicitly
+    resisted splitting up the front and back ends and having some common
+    intermediate language because of religious license issues - you can
+    have multiple front ends and back ends, but they all have to be part
+    of gcc and licensed under the GPL. 
+
+    This all (in my opinion) makes gcc development harder than it should
+    be, and makes the end result very ungainly.  With "sparse", the
+    front-end is very explicitly separated into its own independent
+    project, and is totally independent from the users.  I don't want to
+    know what you do in the back-end, because I don't think I _should_
+    know or care. 
+
+
+Q.  Why not GPL?
+
+A.  See the previous question: I personally think that the front end
+    must be a totally separate project from the back end: any other
+    approach just leads to insanity.  However, at the same time clearly
+    we cannot write intermediate files etc crud (since then the back end
+    would have to re-parse the whole thing and would have to have its
+    own front end and just do a lot of things that do not make any sense
+    from a technical standpoint).
+
+    I like the GPL, but as rms says, "Linus is just an engineer". I
+    refuse to use a license if that license causes bad engineering
+    decisions.  I want the front-end to be considered a separate
+    project, yet the GPL considers the required linking to make the
+    combined thing a derived work. Which is against the whole point
+    of 'sparse'.
+
+    I'm not interested in code generation. I'm not interested in what
+    other people do with their back-ends.  I _am_ interested in making a
+    good front-end, and "good" means that people find it usable. And
+    they shouldn't be scared away by politics or licenses. If they want
+    to make their back-end be BSD/MIT licensed, that's great. And if
+    they want to have a proprietary back-end, that's ok by me too. It's
+    their loss, not mine.
+
+    At the same time, I'm a big believer in "quid pro quo". I wrote the
+    front-end, and if you make improvements to the semantic parsing part
+    (as opposed to just using the resulting parse tree), you'd better
+    cough up.  The front-end is intended to be an open-source project in
+    its own right, and if you improve the front end, you must give those
+    improvements back. That's your "quid" to my "quo".
+
+
+Q.  So what _is_ the license?
+
+A.  I don't know yet.  I originally thought it would be LGPL, but I'm
+    possibly going for a license that is _not_ subsumable by the GPL. 
+    In other words, I don't want to see a GPL'd project suck in the
+    LGPL'd front-end, and then make changes to the front end under the
+    GPL (this is something that the LGPL expressly allows, and see the
+    previous question for why I think it's the _only_ thing that I will
+    not allow). 
+
+    So I'm currently considering just taking the LGPL and removing the
+    GPL subsumption clause, and calling it the LLPL ("Lesser Linus
+    Public License" or something). In the meantime, you have no rights
+    at all, except to send me useful suggestions about a license that
+    still requires people who work on the front-end to work as open
+    source, while allowing arbitrary back-ends. 
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..02282e5
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,31 @@
+This is just a placeholder.  I haven't decided on what the final license
+will be.  But it most likely (note the "likely" - I'm not promising
+anything until I've made a real decision) will have the following
+properties:
+
+ - it will _require_ source code for the library itself (ie GPL-like in
+   that respect). Much like the LGPL.
+
+ - but it will expressly allow linking with arbitrary back-ends, and
+   require that too in perpetuum (ie anti-GPL in that respect, and this
+   means that it's almost certainly not going to be LGPL)
+
+and, if possible:
+
+ - it will be "open source(tm)" compatible as far as I can tell,
+   although if the anti-GPL part ends up being a problem, I may not care
+   enough to conform fully to OSI guidelines.  Will need to check with
+   the OSI guys. 
+
+In the meantime, if you agree with the above, and expect to agree with
+whatever license I will choose with the above in mind, you can play with
+this freely, and make changes and send patches if you explicitly mark
+those patches as being compatible with whatever I do (yeah yeah, you'll
+just need to trust me).
+
+Oh, and keep in mind that I'm famous for changing my mind.  Maybe I'll
+call the license the "sucker" license, and sell whatever you send me for
+billions and billions of dollars without crediting you in the
+slightest..  Sucka!
+
+			Linus Torvalds
diff --git a/README b/README
new file mode 100644
index 0000000..82b90e1
--- /dev/null
+++ b/README
@@ -0,0 +1,72 @@
+
+  sparse (spärs), adj., spars-er, spars-est.
+	1. thinly scattered or distributed; "a sparse population"
+	2. thin; not thick or dense: "sparse hair"
+	3. scanty; meager.
+	4. semantic parse
+  	[ from Latin: spars(us) scattered, past participle of
+	  spargere 'to sparge' ]
+
+	Antonym: abundant
+
+Sparse is a semantic parser of source files: it's neither a compiler
+(although it could be used as a front-end for one) nor is it a
+preprocessor (although it contains as a part of it a preprocessing
+phase). 
+
+It is meant to be a small - and simple - library.  Scanty and meager,
+and partly because of that easy to use.  It has one mission in life:
+create a semantic parse tree for some arbitrary user for further
+analysis.  It's not a tokenizer, nor is it some generic context-free
+parser.  In fact, context (semantics) is what it's all about - figuring
+out not just what the grouping of tokens is, but what the _types_ are
+that the grouping implies.
+
+And no, it doesn't use lex and yacc (or flex and bison).  In my personal
+opinion, the result of using lex/yacc tends to end up just having to
+fight the assumptions the tools make. 
+
+The parsing is done in three phases:
+
+ - full-file tokenization
+ - pre-processing (which can cause another tokenization phase of another
+   file)
+ - semantic parsing.
+
+Note the "full file" part. Partly for efficiency, but mostly for ease of
+use, there are no "partial results". The library completely parses one
+whole source file, and builds up the _complete_ parse tree in memory.
+
+This means that a user of the library will literally just need to do
+
+	struct token *token;
+	int fd = open(filename, O_RDONLY);
+	struct symbol_list *list = NULL;
+
+	if (fd < 0)
+		exit_with_complaint();
+
+	// Initialize parse symbols
+	init_symbols();
+
+	// Tokenize the input stream
+	token = tokenize(filename, fd, NULL);
+
+	// Pre-process the stream
+	token = preprocess(token);
+
+	// Parse the resulting C code
+	translation_unit(token, &list);
+
+and he is now done - having a full C parse of the file he opened.  The
+library doesn't need any more setup, and once done does not impose any
+more requirements.  The user is free to do whatever he wants with the
+parse tree that got built up, and need not worry about the library ever
+again.  There is no extra state, there are no parser callbacks, there is
+only the parse tree that is described by the header files. 
+
+The library also contains (as example users) a few clients that do the
+preprocessing and the parsing and just print out the results. These
+clients were done to verify and debug the library, and also as trivial
+examples of what you can do with the parse tree once it is formed, so
+that users can see how the tree is organized.
diff --git a/lib.c b/lib.c
index 165463f..d44f08f 100644
--- a/lib.c
+++ b/lib.c
@@ -1,5 +1,7 @@
 /*
- * Helper routines
+ * 'sparse' library helper routines.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stddef.h>
 #include <stdarg.h>
diff --git a/lib.h b/lib.h
index 5069987..6ad1a77 100644
--- a/lib.h
+++ b/lib.h
@@ -1,5 +1,10 @@
-#ifndef LIST_H
-#define LIST_H
+#ifndef LIB_H
+#define LIB_H
+/*
+ * Basic helper routine descriptions for 'sparse'.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 extern unsigned int hexval(unsigned int c);
 
diff --git a/parse.c b/parse.c
index 87464f8..3802d62 100644
--- a/parse.c
+++ b/parse.c
@@ -1,11 +1,15 @@
-#ifndef __GNUC__
-typedef int __builtin_va_list;
-#endif
 /*
  * Stupid C parser, version 1e-6.
  *
  * Let's see how hard this is to do.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
+
+#ifndef __GNUC__
+typedef int __builtin_va_list;
+#endif
+
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/parse.h b/parse.h
index f95f249..7828b46 100644
--- a/parse.h
+++ b/parse.h
@@ -1,5 +1,10 @@
 #ifndef PARSE_H
 #define PARSE_H
+/*
+ * Basic parsing data structures. Statements and symbols.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include "symbol.h"
 
diff --git a/pre-process.c b/pre-process.c
index aa73dae..8dd0a68 100644
--- a/pre-process.c
+++ b/pre-process.c
@@ -3,6 +3,8 @@
  * the tokenizer.
  *
  * This may not be the smartest preprocessor on the planet.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/scope.c b/scope.c
index 50edcad..c13fe5e 100644
--- a/scope.c
+++ b/scope.c
@@ -2,6 +2,8 @@
  * Symbol scoping.
  *
  * This is pretty trivial.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdlib.h>
 #include <string.h>
diff --git a/scope.h b/scope.h
index f564018..b0006a8 100644
--- a/scope.h
+++ b/scope.h
@@ -1,5 +1,10 @@
 #ifndef SCOPE_H
 #define SCOPE_H
+/*
+ * Symbol scoping is pretty simple.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 struct scope {
 	struct token *token;		/* Scope start information */
diff --git a/symbol.c b/symbol.c
index ba2096d..c0f71d5 100644
--- a/symbol.c
+++ b/symbol.c
@@ -1,3 +1,8 @@
+/*
+ * Symbol lookup and handling.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/symbol.h b/symbol.h
index 3934e27..24a8b60 100644
--- a/symbol.h
+++ b/symbol.h
@@ -1,5 +1,10 @@
 #ifndef SEMANTIC_H
 #define SEMANTIC_H
+/*
+ * Basic symbol and namespace definitions.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include "token.h"
 
diff --git a/test-lexing.c b/test-lexing.c
index cb076e8..1cd82c9 100644
--- a/test-lexing.c
+++ b/test-lexing.c
@@ -1,3 +1,9 @@
+/*
+ * Example test program that just uses the tokenization and
+ * preprocessing phases, and prints out the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/test-parsing.c b/test-parsing.c
index 4b9a931..154b11a 100644
--- a/test-parsing.c
+++ b/test-parsing.c
@@ -1,3 +1,10 @@
+/*
+ * Example trivial client program that uses the sparse library
+ * to tokenize, pre-process and parse a C file, and prints out
+ * the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/token.h b/token.h
index 7e7d9de..a5f24e0 100644
--- a/token.h
+++ b/token.h
@@ -1,5 +1,12 @@
 #ifndef TOKEN_H
 #define TOKEN_H
+/*
+ * Basic tokenization structures. NOTE! Those tokens had better
+ * be pretty small, since we're going to keep them all in memory
+ * indefinitely.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
 
 #include <sys/types.h>
 
diff --git a/tokenize.c b/tokenize.c
index aee4248..a2a63bc 100644
--- a/tokenize.c
+++ b/tokenize.c
@@ -1,10 +1,8 @@
 /*
- * This is a really stupid C tokenizer, intended to run after the
- * preprocessor.
+ * This is a really stupid C tokenizer. It doesn't do any include
+ * files or anything complex at all. That's the pre-processor.
  *
- * A smart preprocessor would be integrated and pass the compiler
- * the tokenized input directly, but lacking that we just tokenize
- * the preprocessor output.
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
  */
 #include <stdio.h>
 #include <stdlib.h>