-rw-r--r-- FAQ 68
-rw-r--r-- LICENSE 31
-rw-r--r-- README 72
-rw-r--r-- lib.c 4
-rw-r--r-- lib.h 9
-rw-r--r-- parse.c 10
-rw-r--r-- parse.h 5
-rw-r--r-- pre-process.c 2
-rw-r--r-- scope.c 2
-rw-r--r-- scope.h 5
-rw-r--r-- symbol.c 5
-rw-r--r-- symbol.h 5
-rw-r--r-- test-lexing.c 6
-rw-r--r-- test-parsing.c 7
-rw-r--r-- token.h 7
-rw-r--r-- tokenize.c 8
16 files changed, 235 insertions, 11 deletions
diff --git a/FAQ b/FAQ
new file mode 100644
index 0000000..a4041e5
--- /dev/null
+++ b/FAQ
@@ -0,0 +1,68 @@
+ FAQ - Why sparse?
+
+Q. Why not just use gcc?
+
+A. Gcc is big, complex, and the gcc maintainers are not interested in
+ other uses of the gcc front-end. In fact, gcc has explicitly
+ resisted splitting up the front and back ends and having some common
+ intermediate language because of religious license issues - you can
+ have multiple front ends and back ends, but they all have to be part
+ of gcc and licensed under the GPL.
+
+ This all (in my opinion) makes gcc development harder than it should
+ be, and makes the end result very ungainly. With "sparse", the
+ front-end is very explicitly separated into its own independent
+ project, and is totally independent from the users. I don't want to
+ know what you do in the back-end, because I don't think I _should_
+ know or care.
+
+
+Q. Why not GPL?
+
+A. See the previous question: I personally think that the front end
+ must be a totally separate project from the back end: any other
+ approach just leads to insanity. However, at the same time clearly
+ we cannot write intermediate files etc crud (since then the back end
+ would have to re-parse the whole thing and would have to have its
+ own front end and just do a lot of things that do not make any sense
+ from a technical standpoint).
+
+ I like the GPL, but as rms says, "Linus is just an engineer". I
+ refuse to use a license if that license causes bad engineering
+ decisions. I want the front-end to be considered a separate
+ project, yet the GPL considers the required linking to make the
+ combined thing a derived work. Which is against the whole point
+ of 'sparse'.
+
+ I'm not interested in code generation. I'm not interested in what
+ other people do with their back-ends. I _am_ interested in making a
+ good front-end, and "good" means that people find it usable. And
+ they shouldn't be scared away by politics or licenses. If they want
+ to make their back-end be BSD/MIT licensed, that's great. And if
+ they want to have a proprietary back-end, that's ok by me too. It's
+ their loss, not mine.
+
+ At the same time, I'm a big believer in "quid pro quo". I wrote the
+ front-end, and if you make improvements to the semantic parsing part
+ (as opposed to just using the resulting parse tree), you'd better
+ cough up. The front-end is intended to be an open-source project in
+ its own right, and if you improve the front end, you must give those
+ improvements back. That's your "quid" to my "quo".
+
+
+Q. So what _is_ the license?
+
+A. I don't know yet. I originally thought it would be LGPL, but I'm
+ possibly going for a license that is _not_ subsumable by the GPL.
+ In other words, I don't want to see a GPL'd project suck in the
+ LGPL'd front-end, and then make changes to the front end under the
+ GPL (this is something that the LGPL expressly allows, and see the
+ previous question for why I think it's the _only_ thing that I will
+ not allow).
+
+ So I'm currently considering just taking the LGPL and removing the
+ GPL subsumption clause, and calling it the LLPL ("Lesser Linus
+ Public License" or something). In the meantime, you have no rights
+ at all, except to send me useful suggestions about a license that
+ still requires people who work on the front-end to work as open
+ source, while allowing arbitrary back-ends.
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..02282e5
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,31 @@
+This is just a placeholder. I haven't decided on what the final license
+will be. But it most likely (note the "likely" - I'm not promising
+anything until I've made a real decision) will have the following
+properties:
+
+ - it will _require_ source code for the library itself (ie GPL-like in
+ that respect). Much like the LGPL.
+
+ - but it will expressly allow linking with arbitrary back-ends, and
+ require that too in perpetuum (ie anti-GPL in that respect, and this
+ means that it's almost certainly not going to be LGPL)
+
+and, if possible:
+
+ - it will be "open source(tm)" compatible as far as I can tell,
+ although if the anti-GPL part ends up being a problem, I may not care
+ enough to conform fully to OSI guidelines. Will need to check with
+ the OSI guys.
+
+In the meantime, if you agree with the above, and expect to agree with
+whatever license I will choose with the above in mind, you can play with
+this freely, and make changes and send patches if you explicitly mark
+those patches as being compatible with whatever I do (yeah yeah, you'll
+just need to trust me).
+
+Oh, and keep in mind that I'm famous for changing my mind. Maybe I'll
+call the license the "sucker" license, and sell whatever you send me for
+billions and billions of dollars without crediting you in the
+slightest.. Sucka!
+
+ Linus Torvalds
diff --git a/README b/README
new file mode 100644
index 0000000..82b90e1
--- /dev/null
+++ b/README
@@ -0,0 +1,72 @@
+
+ sparse (spärs), adj., spars-er, spars-est.
+ 1. thinly scattered or distributed; "a sparse population"
+ 2. thin; not thick or dense: "sparse hair"
+ 3. scanty; meager.
+ 4. semantic parse
+ [ from Latin: spars(us) scattered, past participle of
+ spargere 'to sparge' ]
+
+ Antonym: abundant
+
+Sparse is a semantic parser of source files: it's neither a compiler
+(although it could be used as a front-end for one) nor is it a
+preprocessor (although it contains as a part of it a preprocessing
+phase).
+
+It is meant to be a small - and simple - library. Scanty and meager,
+and partly because of that easy to use. It has one mission in life:
+create a semantic parse tree for some arbitrary user for further
+analysis. It's not a tokenizer, nor is it some generic context-free
+parser. In fact, context (semantics) is what it's all about - figuring
+out not just what the grouping of tokens is, but what the _types_ are
+that the grouping implies.
+
+And no, it doesn't use lex and yacc (or flex and bison). In my personal
+opinion, the result of using lex/yacc tends to end up just having to
+fight the assumptions the tools make.
+
+The parsing is done in three phases:
+
+ - full-file tokenization
+ - pre-processing (which can cause another tokenization phase of another
+ file)
+ - semantic parsing.
+
+Note the "full file" part. Partly for efficiency, but mostly for ease of
+use, there are no "partial results". The library completely parses one
+whole source file, and builds up the _complete_ parse tree in memory.
+
+This means that a user of the library will literally just need to do
+
+ struct token *token;
+ int fd = open(filename, O_RDONLY);
+ struct symbol_list *list = NULL;
+
+ if (fd < 0)
+ exit_with_complaint();
+
+ // Initialize parse symbols
+ init_symbols();
+
+ // Tokenize the input stream
+ token = tokenize(filename, fd, NULL);
+
+ // Pre-process the stream
+ token = preprocess(token);
+
+ // Parse the resulting C code
+ translation_unit(token, &list);
+
+and he is now done - having a full C parse of the file he opened. The
+library doesn't need any more setup, and once done does not impose any
+more requirements. The user is free to do whatever he wants with the
+parse tree that got built up, and need not worry about the library ever
+again. There is no extra state, there are no parser callbacks, there is
+only the parse tree that is described by the header files.
+
+The library also contains (as an example user) a few clients that do the
+preprocessing and the parsing and just print out the results. These
+clients were done to verify and debug the library, and also as trivial
+examples of what you can do with the parse tree once it is formed, so
+that users can see how the tree is organized.
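
Note on the README above: the "full-file tokenization" phase it describes (read everything, keep every token in memory, hand back one list) can be pictured with a toy sketch. This is not sparse's tokenizer and does not use its API. The names toy_token, toy_tokenize and toy_free are hypothetical, and the three token kinds are a simplification chosen only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds; a real C tokenizer distinguishes many more. */
enum toy_kind { TOY_IDENT, TOY_NUMBER, TOY_SPECIAL };

/* Hypothetical token: a kind, a private copy of its text, and a link. */
struct toy_token {
	enum toy_kind kind;
	char *text;
	struct toy_token *next;
};

/* Free a whole token list, including each token's text. */
void toy_free(struct toy_token *list)
{
	while (list) {
		struct toy_token *next = list->next;
		free(list->text);
		free(list);
		list = next;
	}
}

/*
 * Append a token holding a copy of buf[0..len) at *tail.
 * Returns the address of the new tail link, or NULL if allocation failed.
 */
static struct toy_token **toy_append(struct toy_token **tail, enum toy_kind kind,
				     const char *buf, size_t len)
{
	struct toy_token *tok = malloc(sizeof(*tok));

	if (!tok)
		return NULL;
	tok->text = malloc(len + 1);
	if (!tok->text) {
		free(tok);
		return NULL;
	}
	memcpy(tok->text, buf, len);
	tok->text[len] = '\0';
	tok->kind = kind;
	tok->next = NULL;
	*tail = tok;
	return &tok->next;
}

/*
 * Tokenize the whole NUL-terminated buffer in one pass and return the
 * list head. Returns NULL for input without tokens and also on
 * allocation failure (a toy simplification).
 */
struct toy_token *toy_tokenize(const char *src)
{
	struct toy_token *head = NULL;
	struct toy_token **tail = &head;
	const char *p = src;

	assert(src != NULL);
	while (*p) {
		const char *start = p;
		enum toy_kind kind;

		if (isspace((unsigned char)*p)) {
			p++;		/* whitespace separates tokens */
			continue;
		}
		if (isalpha((unsigned char)*p) || *p == '_') {
			kind = TOY_IDENT;
			while (isalnum((unsigned char)*p) || *p == '_')
				p++;
		} else if (isdigit((unsigned char)*p)) {
			kind = TOY_NUMBER;
			while (isdigit((unsigned char)*p))
				p++;
		} else {
			kind = TOY_SPECIAL;	/* any other single character */
			p++;
		}
		tail = toy_append(tail, kind, start, (size_t)(p - start));
		if (!tail) {
			toy_free(head);
			return NULL;
		}
	}
	return head;
}
```

A caller would run toy_tokenize() once on the complete file contents, walk the returned list as often as it likes, and call toy_free() when done. There are no callbacks and no partial results, which is the property the README emphasizes.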
diff --git a/lib.c b/lib.c
index 165463f..d44f08f 100644
--- a/lib.c
+++ b/lib.c
@@ -1,5 +1,7 @@
/*
- * Helper routines
+ * 'sparse' library helper routines.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
*/
#include <stddef.h>
#include <stdarg.h>
diff --git a/lib.h b/lib.h
index 5069987..6ad1a77 100644
--- a/lib.h
+++ b/lib.h
@@ -1,5 +1,10 @@
-#ifndef LIST_H
-#define LIST_H
+#ifndef LIB_H
+#define LIB_H
+/*
+ * Basic helper routine descriptions for 'sparse'.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
extern unsigned int hexval(unsigned int c);
diff --git a/parse.c b/parse.c
index 87464f8..3802d62 100644
--- a/parse.c
+++ b/parse.c
@@ -1,11 +1,15 @@
-#ifndef __GNUC__
-typedef int __builtin_va_list;
-#endif
/*
* Stupid C parser, version 1e-6.
*
* Let's see how hard this is to do.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
*/
+
+#ifndef __GNUC__
+typedef int __builtin_va_list;
+#endif
+
#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
diff --git a/parse.h b/parse.h
index f95f249..7828b46 100644
--- a/parse.h
+++ b/parse.h
@@ -1,5 +1,10 @@
#ifndef PARSE_H
#define PARSE_H
+/*
+ * Basic parsing data structures. Statements and symbols.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include "symbol.h"
diff --git a/pre-process.c b/pre-process.c
index aa73dae..8dd0a68 100644
--- a/pre-process.c
+++ b/pre-process.c
@@ -3,6 +3,8 @@
* the tokenizer.
*
* This may not be the smartest preprocessor on the planet.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
*/
#include <stdio.h>
#include <stdlib.h>
diff --git a/scope.c b/scope.c
index 50edcad..c13fe5e 100644
--- a/scope.c
+++ b/scope.c
@@ -2,6 +2,8 @@
* Symbol scoping.
*
* This is pretty trivial.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
*/
#include <stdlib.h>
#include <string.h>
diff --git a/scope.h b/scope.h
index f564018..b0006a8 100644
--- a/scope.h
+++ b/scope.h
@@ -1,5 +1,10 @@
#ifndef SCOPE_H
#define SCOPE_H
+/*
+ * Symbol scoping is pretty simple.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
struct scope {
struct token *token; /* Scope start information */
diff --git a/symbol.c b/symbol.c
index ba2096d..c0f71d5 100644
--- a/symbol.c
+++ b/symbol.c
@@ -1,3 +1,8 @@
+/*
+ * Symbol lookup and handling.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include <stdlib.h>
#include <stdio.h>
#include <stdlib.h>
diff --git a/symbol.h b/symbol.h
index 3934e27..24a8b60 100644
--- a/symbol.h
+++ b/symbol.h
@@ -1,5 +1,10 @@
#ifndef SEMANTIC_H
#define SEMANTIC_H
+/*
+ * Basic symbol and namespace definitions.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include "token.h"
diff --git a/test-lexing.c b/test-lexing.c
index cb076e8..1cd82c9 100644
--- a/test-lexing.c
+++ b/test-lexing.c
@@ -1,3 +1,9 @@
+/*
+ * Example test program that just uses the tokenization and
+ * preprocessing phases, and prints out the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
diff --git a/test-parsing.c b/test-parsing.c
index 4b9a931..154b11a 100644
--- a/test-parsing.c
+++ b/test-parsing.c
@@ -1,3 +1,10 @@
+/*
+ * Example trivial client program that uses the sparse library
+ * to tokenize, pre-process and parse a C file, and prints out
+ * the results.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
diff --git a/token.h b/token.h
index 7e7d9de..a5f24e0 100644
--- a/token.h
+++ b/token.h
@@ -1,5 +1,12 @@
#ifndef TOKEN_H
#define TOKEN_H
+/*
+ * Basic tokenization structures. NOTE! Those tokens had better
+ * be pretty small, since we're going to keep them all in memory
+ * indefinitely.
+ *
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
+ */
#include <sys/types.h>
diff --git a/tokenize.c b/tokenize.c
index aee4248..a2a63bc 100644
--- a/tokenize.c
+++ b/tokenize.c
@@ -1,10 +1,8 @@
/*
- * This is a really stupid C tokenizer, intended to run after the
- * preprocessor.
+ * This is a really stupid C tokenizer. It doesn't do any include
+ * files or anything complex at all. That's the pre-processor.
*
- * A smart preprocessor would be integrated and pass the compiler
- * the tokenized input directly, but lacking that we just tokenize
- * the preprocessor output.
+ * Copyright (C) 2003 Linus Torvalds, all rights reserved.
*/
#include <stdio.h>
#include <stdlib.h>