| 1 | +# Notes on loading/compiling |
| 2 | + |
| 3 | +## The interface in clojure.core |
| 4 | + |
| 5 | +We start in `core.clj` and then trace our way into the underlying code (C#). |
| 6 | + |
| 7 | +There is a quite long and elaborate set of functions in `core.clj` that relate to loading in its various guises. |
| 8 | +That code currently starts at line 5862, and probably won't be far from there. Look for |
| 9 | + |
| 10 | +``` |
| 11 | +;;;;;;;;;;; require/use/load, contributed by Stephen C. Gilardi ;;;;;;;;;;;;;;;;;; |
| 12 | +``` |
| 13 | + |
| 14 | +The two functions to start looking at are `compile` and `load`, at the bottom of this section of code. |
| 15 | +The `load` function is actually used by functions preceding it in the file, so there is a forward declaration for it near the beginning of this section. |
| 16 | + |
| 17 | +``` |
| 18 | +(defn compile |
| 19 | + "Compiles the namespace named by the symbol lib into a set of |
| 20 | + classfiles. The source for the lib must be in a proper |
| 21 | + classpath-relative directory. The output files will go into the |
| 22 | + directory specified by *compile-path*, and that directory too must |
| 23 | + be in the classpath." |
| 24 | + {:added "1.0"} |
| 25 | + [lib] |
| 26 | + (binding [*compile-files* true] |
| 27 | + (load-one lib true true)) |
| 28 | + lib) |
| 29 | +``` |
| 30 | + |
| 31 | +`load-one` calls `load` and then does some checks and some bookkeeping. |
| 32 | +So, essentially, `compile` just calls `load` with `*compile-files*` bound to `true`. |
| 33 | + |
| 34 | +Note in the docstring the reference to `*compile-path*` and the suspicious statements |
| 35 | +"source for the lib must be in a proper classpath-relative directory" and "that directory too must be in the classpath". |
| 36 | +We'll need appropriate translation for the .Net world. |
| 37 | + |
| 38 | +Note also that the argument to `compile` is a symbol, as opposed to the string used in `load`. |
| 39 | +`load-one` does that translation: the symbol `a-b.c.d` would convert to `"/a_b/c/d"`. |
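As a rough illustration of that translation, here is a minimal sketch. The name `lib->path` is hypothetical (it is not a function in `core.clj`), and it uses `clojure.string` instead of host interop so that the same code should run on either platform.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, for illustration only: mimics the symbol-to-path
;; translation described above (hyphens -> underscores, periods -> slashes,
;; plus a leading slash).
(defn lib->path
  [lib]
  (str "/" (-> (name lib)
               (str/replace "-" "_")
               (str/replace "." "/"))))

(lib->path 'a-b.c.d) ;; => "/a_b/c/d"
```

The result is the kind of string that `load` expects: slash-separated, underscores in place of hyphens, and a leading slash to mark it as classpath-relative.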
| 40 | + |
| 41 | +Now on to `load`. |
| 42 | + |
| 43 | +``` |
| 44 | +(defn load |
| 45 | + "Loads Clojure code from resources in classpath. A path is interpreted as |
| 46 | + classpath-relative if it begins with a slash or relative to the root |
| 47 | + directory for the current namespace otherwise." |
| 48 | + {:redef true |
| 49 | + :added "1.0"} |
| 50 | + [& paths] |
| 51 | + (doseq [^String path paths] |
| 52 | + (let [^String path (if (.StartsWith path "/") |
| 53 | + path |
| 54 | + (str (root-directory (ns-name *ns*)) \/ path))] |
| 55 | + (when *loading-verbosely* |
| 56 | + (printf "(clojure.core/load \"%s\")\n" path) |
| 57 | + (flush)) |
| 58 | + (check-cyclic-dependency path) |
| 59 | + (when-not (= path (first *pending-paths*)) |
| 60 | + (binding [*pending-paths* (conj *pending-paths* path)] |
| 61 | + (clojure.lang.RT/load (.Substring path 1))))))) |
| 62 | +``` |
| 63 | + |
| 64 | +Basically some name hacking and some bookkeeping, then a call to `clojure.lang.RT.load`. (Note that `load` takes a seq of strings, doing the load for each one.) |
| 65 | +We see here the distinction made between the input argument (a string) being "classpath-relative if it begins with a slash or relative to the root |
| 66 | +directory for the current namespace otherwise." What does that mean? |
| 67 | + |
| 68 | +If a supplied path looks like `"/a_b/c/d"`, i.e., starting with a slash, it will be used as is. |
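| 69 | +If there is no initial slash, say `"a_b/c/d"`, then the root directory of the current namespace is prepended. |
| 70 | +If the current namespace is `my-precious.sss`, its root directory is `/my_precious` (the directory that holds the namespace's own source file, so the last segment of the namespace name is dropped), and you will end up with the path `"/my_precious/a_b/c/d"`. |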
| 71 | + |
| 72 | +(Make sure you've already converted your hyphens to underscores before you get here. The conversion will be done on the namespace name for you, but not on the path you supplied.) |
| 73 | + |
| 74 | +The binding of `*pending-paths*` here records where we are in case there are nested load calls during this load: `check-cyclic-dependency` uses it to detect cyclic loads, and we avoid loading the same path recursively. |
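To make the bookkeeping concrete, here is a minimal sketch of the same pattern outside of `core.clj`. Everything named here is hypothetical: `*pending*` stands in for `*pending-paths*`, `load-sketch` stands in for `load`, and the `deps` map simulates which paths each file would load while it is being loaded. The cycle test is my assumption about what `check-cyclic-dependency` does, not a copy of it.

```clojure
;; Hypothetical stand-in for *pending-paths*: most recent path first.
(def ^:dynamic *pending* ())

(defn load-sketch
  "Simulates loading `path`, where (get deps path) lists the paths that
  loading it would trigger. Throws if a path is already pending further up
  the chain; silently skips a path that is the one currently being loaded."
  [deps path]
  ;; Assumed cycle check: the path is pending, but not as the immediate parent.
  (when (some #{path} (rest *pending*))
    (throw (ex-info "Cyclic load dependency" {:path path :pending *pending*})))
  ;; Same guard as in `load`: do nothing if we are already loading this path.
  (when-not (= path (first *pending*))
    (binding [*pending* (conj *pending* path)]
      (doseq [dep (get deps path)]
        (load-sketch deps dep)))))

(load-sketch {"/a" ["/b"] "/b" []} "/a")   ;; no cycle, returns nil
(load-sketch {"/a" ["/a"]} "/a")           ;; self-load is skipped, returns nil
;; (load-sketch {"/a" ["/b"] "/b" ["/a"]} "/a") would throw
```

Using a dynamic var means the pending chain unwinds automatically when each nested load returns or throws.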
| 75 | + |
| 76 | +## In the C# code: what we look for |
| 77 | + |
| 78 | +So we now transition to the C# code of `clojure.lang.RT.load`. Note that it is called without the leading slash. |
| 79 | + |
| 80 | +This stuff is so arcane, I can barely understand it myself. Needs to be cleaned up, which is to say, rethought completely, when I do the rewrite. |
| 81 | + |
| 82 | +Given a string representing a path as shown above, say `"a_b/c/d"` (remember no leading slash on the input here), |
| 83 | +what are we looking for and where are we looking for it? |
| 84 | + |
| 85 | +We are looking for either source files or assemblies, with preference given to whichever is newer. |
| 86 | +The source files will be named one of |
| 87 | + |
| 88 | +1. `a_b.c.d.clj` |
| 89 | +2. `a_b.c.d.cljc` |
| 90 | + |
| 91 | +The assemblies will be named one of |
| 92 | + |
| 93 | +3. `a_b.c.d.clj.dll` |
| 94 | +4. `a_b.c.d.cljc.dll` |
| 95 | + |
| 96 | +>>> Convention #1: When a .clj(c) source file is compiled, the compiled assembly has the same name with `.dll` appended. |
| 97 | +
|
| 98 | +We check for all four of these. |
| 99 | + |
| 100 | +>>> Convention #2: It is assumed that there will not be both a `.clj` and `.cljc` version of either kind. |
| 101 | +
|
| 102 | +If a DLL exists and either the source file does not exist or the DLL is newer, then we call `Compiler.LoadAssembly` on it. (See below.) |
| 103 | +Else if a source file exists, we call either `RT.Compile` or `RT.LoadScript` on it, depending on the value of `*compile-files*`. (Also see below.) |
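The decision just described can be sketched as a small pure function. This is only a model of the rule as stated in these notes, not the C# code: the function name, the map keys, and the use of plain numbers for timestamps are all hypothetical, and a missing file is represented by `nil`.

```clojure
;; Hypothetical model of the choice described above.
;; assembly-time / source-time: last-write times as numbers, or nil if the
;; file was not found. compile-files?: the value of *compile-files*.
(defn choose-load-action
  [{:keys [assembly-time source-time compile-files?]}]
  (cond
    ;; DLL exists and (no source, or DLL strictly newer than source)
    (and assembly-time
         (or (nil? source-time) (> assembly-time source-time)))
    :load-assembly

    ;; otherwise a source file, if there is one
    source-time
    (if compile-files? :compile :load-script)

    ;; nothing on the file system: fall through to the other possibilities
    :else
    :keep-looking))

(choose-load-action {:assembly-time 200 :source-time 100})      ;; => :load-assembly
(choose-load-action {:assembly-time 100 :source-time 200})      ;; => :load-script
(choose-load-action {:source-time 100 :compile-files? true})    ;; => :compile
(choose-load-action {})                                         ;; => :keep-looking
```

Note that in this sketch a tie in timestamps goes to the source file, since the rule only prefers the DLL when it is newer.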
| 104 | + |
| 105 | +If neither is found, we are not done. There are two more possibilities. |
| 106 | + |
| 107 | +5. There might be a type called `__INIT__a_b$c$d` located in some loaded assembly. |
| 108 | +(Yes, we look at all loaded assemblies. I eventually would like to make this look at a more restricted set.) |
| 109 | +If it exists, we call the `Initialize` static method on that type. |
| 110 | + |
| 111 | +>>> Convention #3: The work of initializing a compiled Clojure assembly is done by calling the `Initialize` method of the type in the assembly named |
| 112 | +as `__INIT__<name>`, where the `<name>` comes from the name of the source file/assembly with periods replaced by dollar signs. |
| 113 | +
|
| 114 | +6. We look for an embedded resource of the appropriate name in all loaded assemblies. (As in (5), I'd eventually like to restrict which assemblies we look at.) |
| 115 | +This is done because of how we chose to deliver NuGet packages for libraries: library source files are included as embedded resources. |
| 116 | +There are two types of resources we look for, distinguished by name: an embedded assembly and an embedded text file. |
| 117 | + |
| 118 | +I'm not sure if I even have any cases where an embedded assembly is found and I think there may be a bug there created when `.cljc` extensions were introduced. |
| 119 | +At any rate, it appears now to look only for an embedded resource named `a_b.c.d.cljc.dll`. |
| 120 | +If it finds it, it uses an overload of `System.Reflection.Assembly.Load` that takes a byte array to load the assembly, |
| 121 | +then initializes it as described above. |
| 122 | + |
| 123 | +Otherwise we look for an embedded resource named either `a_b.c.d.clj` or `a_b.c.d.cljc`, treat it like a source file, and call either `RT.Compile` or `RT.LoadScript` on it, depending on `*compile-files*`. |
| 124 | + |
| 125 | +>>> Convention #4: an assembly or source file named appropriately as an embedded resource in a loaded assembly will be loaded. |
| 126 | +However, if something on the file system is found, that takes precedence. |
| 127 | +(Maybe all possibilities should be looked at and an error declared if more than one exists.) |
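The names involved in fallbacks (5) and (6) can be summarized with a short sketch. The function name and map keys are hypothetical; the name patterns are taken from the description above, including the observation that only the `.cljc.dll` name currently appears to be checked for an embedded assembly.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: given the path handed to RT.load (no leading slash),
;; compute the names looked for once nothing is found on the file system.
(defn fallback-names
  [path]
  (let [base (str/replace path "/" ".")]
    {;; (5) initializer type: periods replaced by dollar signs
     :init-type         (str "__INIT__" (str/replace base "." "$"))
     ;; (6) embedded assembly resource (only this name, per the notes above)
     :embedded-assembly (str base ".cljc.dll")
     ;; (6) embedded source resources
     :embedded-sources  [(str base ".clj") (str base ".cljc")]}))

(:init-type (fallback-names "a_b/c/d")) ;; => "__INIT__a_b$c$d"
```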
| 128 | + |
| 129 | +## Where do we look |
| 130 | + |
| 131 | +Where on the file system do we look? The answer is in `RT.FindFile`. |
| 132 | +Well, actually that just iterates through the result of `RT.GetFindFilePaths()` to find the directories to search. |
| 133 | +And what are those directories? |
| 134 | + |
| 135 | +1. `System.AppDomain.CurrentDomain.BaseDirectory` -- where the Clojure executable resides |
| 136 | +2. `System.AppDomain.CurrentDomain.BaseDirectory` + `\bin` -- I don't know why |
| 137 | +3. `Directory.GetCurrentDirectory()` -- the current working directory of the application |
| 138 | +4. `Path.GetDirectoryName(typeof(RT).Assembly.Location)` -- where the Clojure.dll assembly is located. Not sure if that is ever different from (1). |
| 139 | +5. The directory of the assembly `Assembly.GetEntryAssembly()` -- this was added for some case I don't remember |
| 140 | +6. The set of paths that is the value of the environment variable `CLOJURE_LOAD_PATH`. This is my workaround for not having the equivalent of the Java classpath. |
| 141 | + |
| 142 | +There is one final option. I no longer remember why this exists. I do not know if it is ever used. |
| 143 | +In `core-clr.clj` there is this: |
| 144 | + |
| 145 | +``` |
| 146 | +(defn add-ns-load-mapping |
| 147 | + "Convenience function to assist with loading .clj files embedded in |
| 148 | + C# projects. ns-root specifies part of a namespace such as MyNamespace.A and |
| 149 | + fs-root specifies the filesystem location in which to look for files within that |
| 150 | + namespace. For example, if MyNamespace.A mapped to MyNsA would allow |
| 151 | + MyNamespace.A.B to be loaded from MyNsA\\B.clj. When a .clj file is marked as an |
| 152 | + embedded resource in a C# project, it will be stored in the resulting .dll with |
| 153 | + the default project namespace prefixed to its path. To allow these files to |
| 154 | + be loaded dynamically during development, the paths to these files can be mapped |
| 155 | + to allow them to be loaded from a different directory other than their root namespace |
| 156 | + (i.e. the common case where the project directory is different from its default |
| 157 | + namespace)." |
| 158 | + {:added "1.5"} |
| 159 | + [^String ns-root ^String fs-root] |
| 160 | + (swap! *ns-load-mappings* conj |
| 161 | + [(.Replace ns-root "." "/") fs-root])) |
| 162 | +``` |
| 163 | + |
| 164 | +Essentially, the variable `*ns-load-mappings*` holds a sequence of two-element vectors, each of the form `["MyNamespace/A" "MyNSA"]` (the periods in the namespace root are replaced by slashes when the mapping is added). |
| 165 | +If we are looking for a source file for `MyNamespace/A/B.clj`, say, we will see if `MyNSA/B.clj` exists. Again, I have no idea if this is ever used. |
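A minimal sketch of how such a mapping could be applied to a path follows. This is my reading of the description, not the actual lookup code: the function name `mapped-candidates` is hypothetical, the mappings are passed in as a plain vector, and the sketch only computes candidate paths without touching the file system.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: for each [ns-root fs-root] mapping whose ns-root is a
;; directory prefix of `path`, swap that prefix for fs-root.
(defn mapped-candidates
  [mappings path]
  (for [[ns-root fs-root] mappings
        :when (str/starts-with? path (str ns-root "/"))]
    (str fs-root (subs path (count ns-root)))))

(mapped-candidates [["MyNamespace/A" "MyNSA"]] "MyNamespace/A/B.clj")
;; => ("MyNSA/B.clj")
```

Requiring the trailing slash in the prefix test keeps a mapping for `MyNamespace/A` from matching an unrelated path such as `MyNamespace/AB/C.clj`; whether the real code is that careful is not something these notes establish.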
| 166 | + |
| 167 | +>>> Convention #(I don't want to dignify this): we can map namespaces to directories to search. |
| 168 | +
|
| 169 | +## The missing pieces |
| 170 | + |
| 171 | +As promised above: |
| 172 | + |
| 173 | +1. `Compiler.LoadAssembly` -- pretty straightforward. Just call `System.Reflection.Assembly.LoadFrom` on the located assembly, then call its initialization method, as described above. |
| 174 | +2. `RT.Compile` calls `Compiler.Compile` on appropriate arguments. The var `*compile-path*` must be set. This creates a dynamic assembly, compiles all the forms from the source file into that assembly (as well as evaluating them), then saves the assembly. |
| 175 | +3. `RT.LoadScript` calls `Compiler.load`, which iterates through all the forms and evaluates them. (They still have to be compiled in order to do this. There is just less work to do, and there is some special handling for `do` and `def` forms.) |
| 176 | + |
| 177 | + |
| 178 | + |
| 179 | + |