Commit b709398

committed
Add some doc for how loading works
1 parent c891ee1 commit b709398

1 file changed

+179
-0
lines changed

docs/what-a-load-of-clojure.md

+179
@@ -0,0 +1,179 @@
1+
# Notes on loading/compiling
2+
3+
## The interface in clojure.core
4+
5+
We start in `core.clj` and then trace our way into the underlying code (C#).
6+
7+
There is a quite long and elaborate set of functions in `core.clj` that relate to loading in its various guises.
8+
That code currently starts at line 5862 and is unlikely to move far from there. Look for
9+
10+
```
11+
;;;;;;;;;;; require/use/load, contributed by Stephen C. Gilardi ;;;;;;;;;;;;;;;;;;
12+
```
13+
14+
The two functions to start looking at are `compile` and `load`, at the bottom of this section of code.
15+
The `load` function is actually used by functions preceding it in the file -- there is a forward declaration (via `declare`) up at the beginning.
16+
17+
```
18+
(defn compile
19+
"Compiles the namespace named by the symbol lib into a set of
20+
classfiles. The source for the lib must be in a proper
21+
classpath-relative directory. The output files will go into the
22+
directory specified by *compile-path*, and that directory too must
23+
be in the classpath."
24+
{:added "1.0"}
25+
[lib]
26+
(binding [*compile-files* true]
27+
(load-one lib true true))
28+
lib)
29+
```
30+
31+
`load-one` essentially calls `load` and then does some checks and some bookkeeping.
32+
So `compile` amounts to calling `load` with the `*compile-files*` flag bound to `true`.
33+
34+
Note in the docstring the reference to `*compile-path*` and the suspicious statements
35+
"source for the lib must be in a proper classpath-relative directory" and "that directory too must be in the classpath".
36+
We'll need an appropriate translation for the .NET world.
37+
38+
Note also that the argument to `compile` is a symbol, as opposed to the string used in `load`.
39+
`load-one` does that translation: the symbol `a-b.c.d` would convert to `"/a_b/c/d"`.
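
A minimal sketch of that symbol-to-path translation, written as portable Clojure. The name `lib->path` is hypothetical (it is not a function in `core.clj`); it only illustrates the hyphen-to-underscore and period-to-slash conversion described above.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not part of core.clj: mirrors the translation
;; described above for turning a lib symbol into a load path.
(defn lib->path
  [lib]
  (str "/"
       (-> (name lib)
           (string/replace "-" "_")    ; hyphens become underscores
           (string/replace "." "/")))) ; periods become path separators

(lib->path 'a-b.c.d) ;=> "/a_b/c/d"
```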
40+
41+
Now on to `load`.
42+
43+
```
44+
(defn load
45+
"Loads Clojure code from resources in classpath. A path is interpreted as
46+
classpath-relative if it begins with a slash or relative to the root
47+
directory for the current namespace otherwise."
48+
{:redef true
49+
:added "1.0"}
50+
[& paths]
51+
(doseq [^String path paths]
52+
(let [^String path (if (.StartsWith path "/")
53+
path
54+
(str (root-directory (ns-name *ns*)) \/ path))]
55+
(when *loading-verbosely*
56+
(printf "(clojure.core/load \"%s\")\n" path)
57+
(flush))
58+
(check-cyclic-dependency path)
59+
(when-not (= path (first *pending-paths*))
60+
(binding [*pending-paths* (conj *pending-paths* path)]
61+
(clojure.lang.RT/load (.Substring path 1)))))))
62+
```
63+
64+
Basically some name hacking and some bookkeeping, then a call to `clojure.lang.RT.load`. (Note that `load` takes a seq of strings, doing the load for each one.)
65+
We see here the distinction made between the input argument (a string) being "classpath-relative if it begins with a slash or relative to the root
66+
directory for the current namespace otherwise." What does that mean?
67+
68+
If a supplied path looks like `"/a_b/c/d"`, i.e., starting with a slash, it will be used as is.
69+
If there is no initial slash, say `"a_b/c/d"`, then the root directory of the current namespace is prepended.
70+
The root directory is the namespace's own path with its last segment dropped, so if the current namespace is `my-precious.sss`, you will end up with the path `"/my_precious/a_b/c/d"`.
71+
72+
(Make sure you've already converted your hyphens to underscores before you get here. The conversion will be done on the namespace name for you, but not on the path you supplied.)
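
The two cases can be sketched as follows. Both function names are hypothetical, and `root-dir` simply stands for whatever `(root-directory (ns-name *ns*))` returns; the value `"/some/root"` in the usage lines is made up for illustration.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: the path decision made at the top of `load`.
(defn resolve-load-path
  [root-dir path]
  (if (string/starts-with? path "/")
    path                       ; leading slash: used as is
    (str root-dir "/" path)))  ; otherwise: relative to the namespace root directory

;; Hypothetical helper: `load` strips the leading slash before calling RT.load.
(defn rt-load-arg
  [full-path]
  (subs full-path 1))

(resolve-load-path "/some/root" "/a_b/c/d") ;=> "/a_b/c/d"
(resolve-load-path "/some/root" "a_b/c/d")  ;=> "/some/root/a_b/c/d"
(rt-load-arg "/a_b/c/d")                    ;=> "a_b/c/d"
```

Note that nothing here touches hyphens: the supplied path is used verbatim.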
73+
74+
The binding of `*pending-paths*` here is to record where we are in case we have load calls during this load -- we want to avoid trying to load recursively.
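
A rough sketch of that bookkeeping as a pure function. `load-action` is a hypothetical name, and `pending` models the value of `*pending-paths*` as a list with the most recent load first. The `:skip` case follows the `when-not` test in `load`; treating a match deeper in the list as a cycle is my assumption about what `check-cyclic-dependency` does, since its body is not shown here.

```clojure
;; Hypothetical model of the decisions made around *pending-paths*.
(defn load-action
  [pending path]
  (cond
    ;; Assumption: a path already pending further down the stack is a cycle.
    (some #{path} (rest pending)) :cycle
    ;; From `load`: the path currently being loaded is not loaded again.
    (= path (first pending))      :skip
    :else                         :load))

(load-action '() "/a")          ;=> :load
(load-action '("/a") "/a")      ;=> :skip
(load-action '("/b" "/a") "/a") ;=> :cycle
```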
75+
76+
## In the C# code: what we look for
77+
78+
So we now transition to the C# code of `clojure.lang.RT.load`. Note that it is called without the leading slash.
79+
80+
This stuff is so arcane, I can barely understand it myself. Needs to be cleaned up, which is to say, rethought completely, when I do the rewrite.
81+
82+
Given a string representing a path as shown above, say `"a_b/c/d"` (remember no leading slash on the input here),
83+
what are we looking for and where are we looking for it?
84+
85+
We are looking for either source files or assemblies, with preference given to whichever is newer.
86+
The source files will be named one of
87+
88+
1. `a_b.c.d.clj`
89+
2. `a_b.c.d.cljc`
90+
91+
The assemblies will be named one of
92+
93+
3. `a_b.c.d.clj.dll`
94+
4. `a_b.c.d.cljc.dll`
95+
96+
>>> Convention #1: When a .clj(c) source file is compiled, the compiled assembly with the same name.
97+
98+
We check for all four of these.
99+
100+
>>> Convention #2: It is assumed that there will not be both a `.clj` and `.cljc` version of either kind.
101+
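
To make the four candidates concrete, here is a small sketch that derives them from the path handed to `RT.load`. `candidate-names` is a hypothetical helper, and the naming simply follows the lists above.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: the source and assembly names checked for a path
;; such as "a_b/c/d", following the naming listed above.
(defn candidate-names
  [path]
  (let [base (string/replace path "/" ".")]
    {:sources    [(str base ".clj") (str base ".cljc")]
     :assemblies [(str base ".clj.dll") (str base ".cljc.dll")]}))

(candidate-names "a_b/c/d")
;=> {:sources    ["a_b.c.d.clj" "a_b.c.d.cljc"]
;    :assemblies ["a_b.c.d.clj.dll" "a_b.c.d.cljc.dll"]}
```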
102+
If a DLL exists and either the source file does not exist or the DLL is newer, then we call `Compiler.LoadAssembly` on it. (See below.)
103+
Else if a source file exists, we call either `RT.Compile` or `RT.LoadScript` on it, depending on the value of `*compile-files*`. (Also see below.)
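
The decision just described can be sketched as a pure function. `choose-action` and its map keys are hypothetical; the timestamps are plain numbers standing in for last-write times, with `nil` meaning the file does not exist.

```clojure
;; Hypothetical model of the DLL-versus-source decision described above.
(defn choose-action
  [{:keys [dll-time source-time compile-files?]}]
  (cond
    ;; DLL exists and (no source, or DLL is newer): load the assembly.
    (and dll-time (or (nil? source-time) (> dll-time source-time)))
    :load-assembly

    ;; Otherwise a source file, if present, is compiled or just loaded.
    source-time
    (if compile-files? :compile :load-script)

    ;; Nothing on the file system: fall through to the remaining options.
    :else
    :keep-looking))

(choose-action {:dll-time 2 :source-time 1})                      ;=> :load-assembly
(choose-action {:dll-time 1 :source-time 2 :compile-files? true}) ;=> :compile
(choose-action {:source-time 2})                                  ;=> :load-script
(choose-action {})                                                ;=> :keep-looking
```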
104+
105+
If neither is found, we are not done. There are two more possibilities.
106+
107+
5. There might be a type called `__INIT__a_b$c$d` located in some loaded assembly.
108+
(Yes, we look at all loaded assemblies. I eventually would like to make this look at a more restricted set.)
109+
If it exists, we call the `Initialize` static method on that type.
110+
111+
>>> Convention #3: The work of initializing a compiled Clojure assembly is done by calling the `Initialize` method of type in the assembly named
112+
as `__INIT__<name>`, where the `<name>` comes from the name of the source file/assembly with periods replaced by dollar signs.
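
A short sketch of how that type name is formed from the load path. `init-type-name` is a hypothetical helper that only follows the convention as stated here.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: builds the initializer type name for a load path.
(defn init-type-name
  [path]
  (str "__INIT__"
       (-> path
           (string/replace "/" ".")    ; path -> file/assembly base name
           (string/replace "." "$")))) ; periods -> dollar signs

(init-type-name "a_b/c/d") ;=> "__INIT__a_b$c$d"
```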
113+
114+
6. We look for an embedded resource of the appropriate name in all loaded assemblies. (As in (5), I'd eventually like to restrict which assemblies we look at.)
115+
This is done because of how we chose to deliver Nuget packages for libraries: library source files are included as embedded resources.
116+
There are two types of resources we look for, distinguished by name: an embedded assembly and an embedded text file.
117+
118+
I'm not sure if I even have any cases where an embedded assembly is found and I think there may be a bug there created when `.cljc` extensions were introduced.
119+
At any rate, it appears now to look only for an embedded resource named `a_b.c.d.cljc.dll`.
120+
If it finds it, it uses an overload of `System.Reflection.Assembly.Load` that takes a byte array to load the assembly,
121+
then initializes it as described above.
122+
123+
Otherwise we look for an embedded resource named either `a_b.c.d.clj` or `a_b.c.d.cljc`, which is treated like a source file: we call either `RT.Compile` or `RT.LoadScript` on it, depending on the value of `*compile-files*`.
124+
125+
>>> Convention #4: an assembly or source file named appropriately as an embedded resource in a loaded assembly will be loaded.
126+
However, if something on the file system is found, that takes precedence.
127+
(Maybe all possibilities should be looked at and an error declared if more than one exists.)
128+
129+
## Where do we look
130+
131+
Where on the file system do we look? The answer is in `RT.FindFile`.
132+
Well, actually that just iterates through the result of `RT.GetFindFilePaths()` to find the directories to search.
133+
And what are those directories?
134+
135+
1. `System.AppDomain.CurrentDomain.BaseDirectory` -- where the Clojure executable resides
136+
2. `System.AppDomain.CurrentDomain.BaseDirectory` + `\bin` -- I don't know why
137+
3. `Directory.GetCurrentDirectory()` -- the current working directory of the application
138+
4. `Path.GetDirectoryName(typeof(RT).Assembly.Location)` -- where the Clojure.dll assembly is located. Not sure if that is ever different from (1).
139+
5. The directory of the assembly `Assembly.GetEntryAssembly()` -- this was added for some case I don't remember
140+
6. The set of paths that is the value of the environment variable `CLOJURE_LOAD_PATH`. This is my workaround for not having the equivalent of the Java classpath.
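
The search itself is a first-match walk over that directory list. A minimal sketch, with everything in it hypothetical: `find-file` here is not the C# `RT.FindFile`, the `exists?` predicate stands in for a file-system check, and joining with `/` is an assumption made to keep the example platform-neutral.

```clojure
;; Hypothetical model of a first-match search over the directories above.
(defn find-file
  [dirs file-name exists?]
  (->> dirs
       (map #(str % "/" file-name)) ; candidate full path in each directory
       (filter exists?)             ; keep those that exist
       first))                      ; earliest directory in the list wins

;; A set works as the predicate: it "contains" the files that exist.
(find-file ["/app" "/app/bin" "/work"]
           "a_b.c.d.clj"
           #{"/work/a_b.c.d.clj"})
;=> "/work/a_b.c.d.clj"
```

Because the first hit wins, the order of the list above matters when the same file name exists in more than one of those directories.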
141+
142+
There is one final option. I no longer remember why this exists. I do not know if it is ever used.
143+
In `core-clr.clj` there is this:
144+
145+
```
146+
(defn add-ns-load-mapping
147+
"Convenience function to assist with loading .clj files embedded in
148+
C# projects. ns-root specifies part of a namespace such as MyNamespace.A and
149+
fs-root specifies the filesystem location in which to look for files within that
150+
namespace. For example, if MyNamespace.A mapped to MyNsA would allow
151+
MyNamespace.A.B to be loaded from MyNsA\\B.clj. When a .clj file is marked as an
152+
embedded resource in a C# project, it will be stored in the resulting .dll with
153+
the default project namespace prefixed to its path. To allow these files to
154+
be loaded dynamically during development, the paths to these files can be mapped
155+
to allow them to be loaded from a different directory other than their root namespace
156+
(i.e. the common case where the project directory is different from its default
157+
namespace)."
158+
{:added "1.5"}
159+
[^String ns-root ^String fs-root]
160+
(swap! *ns-load-mappings* conj
161+
[(.Replace ns-root "." "/") fs-root]))
162+
```
163+
164+
Essentially, the variable `*ns-load-mappings*` holds (in an atom) a sequence of two-element vectors, each of the form `["MyNamespace/A" "MyNSA"]`; the periods in the namespace root are replaced with slashes when the mapping is added.
165+
If we are looking for a source file for `MyNamespace/A/B.clj`, say, we will see if `MyNSA/B.clj` exists. Again, I have no idea if this is ever used.
166+
167+
>>> Convention #(I don't want to dignify this): we can map namespaces to directories to search.
168+
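
A sketch of how such a mapping could be applied. `mapped-path` is a hypothetical helper, and the prefix-matching rule is my assumption about how the lookup works. The stored form of the namespace root uses slashes because `add-ns-load-mapping` replaces the periods before storing it.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: rewrites a path using the first mapping whose
;; namespace root is a prefix of it; returns nil when no mapping applies.
(defn mapped-path
  [mappings path]
  (some (fn [[ns-root fs-root]]
          (when (string/starts-with? path ns-root)
            (str fs-root (subs path (count ns-root)))))
        mappings))

(mapped-path [["MyNamespace/A" "MyNSA"]] "MyNamespace/A/B.clj") ;=> "MyNSA/B.clj"
(mapped-path [["MyNamespace/A" "MyNSA"]] "Other/B.clj")         ;=> nil
```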
169+
## The missing pieces
170+
171+
As promised above:
172+
173+
1. `Compiler.LoadAssembly` -- pretty straightforward. Just call `System.Reflection.Assembly.LoadFrom` on the located assembly, then call its initialization method, as described above.
174+
2. `RT.Compile` calls `Compiler.Compile` on appropriate arguments. The var `*compile-path*` must be set. This creates a dynamic assembly, compiles all the forms from the source file into that assembly (as well as evaluating them), then saves the assembly.
175+
3. `RT.LoadScript` calls `Compiler.load`, which iterates through all the forms and evaluates them. (They still have to be compiled in order to do this. There is just less work to do and some special handling for `do` and `def` forms.)
176+
177+
178+
179+
