Skip to content

Commit d9eb33d

Browse files
committed
[Java] Permit block field access whenever it is safe (and, unfortunately, in some cases where it is not).
This commit makes the field order checks more-lenient. The aim is to allow the application code to access fields in the root/top-level block or blocks inside groups whenever it is safe. However, the approach is broken and does not work in general. To (try to) achieve this aim, we rely on the fact that once a block is positioned correctly, it is safe to access during the remainder of encoding/decoding. For example, the following code is safe: ``` encoder.wrap(...); encoder.blockField1(42); encoder.varData("abc"); encoder.blockField2(43); ``` As soon as `wrap(...)` is called, the top-level block will be positioned over a valid region for the remainder of encoding. Groups are slightly trickier. The following code is _safe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.next() .x(1).y(2) .next(); .x(3).y(4); ``` But the following code is _unsafe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.x(1).y(2) .next(); .x(3).y(4); ``` And the following code is _unsafe_ too: ``` encoder.wrap(...) Foo.BarEncoder bar = encoder.groupBarCount(0); encoder.baz("abc"); bar.x(1).y(2); ``` The `offset` for the group encoder is only set when `next()` is called. Therefore, calling `x(1)` and `y(2)` will likely overwrite some data elsewhere in the buffer (probably near offset 0). We can still rely on the fact that once `next()` has been called - even if it is called multiple times, or too many times (causing it to throw) - the group's block fields will be safe to encode/decode. Thinking about this in terms of our state machine that models codec state and field accesses, we can put this another way. There is at least one transition that represents the `next()` method call. The states (transitively) reachable from these transitions can safely access the associated group block. As already mentioned, despite the passing tests, the approach in this commit doesn't always work. I had thought it would be possible to assign a number to each state as we find them using DFS and use this number as a shortcut to determining whether a block access was safe. For example, in each block field we would check: ``` if (currentState.number() >= firstStateAfterNextIsCalled.number()) { // ALLOW } else { // DENY } ``` However, (even ignoring the cycles due to repeating groups and the use of DFS) this solution makes the following assumptions: 1. There is a single state: `firstStateAfterNextIsCalled`, such that all subsequent transitively reachable states are safe. 2. It is possible to number the states in such a way that `numberOfState(b) >= numberOfState(a)` implies `b` is (transitively) reachable from `a`. Let's consider the following psuedo-schema w.r.t. our assumptions above: ``` group a { x: int32 } group b { y: int32 } ``` Can we set `x` if we've called `BEncoder#next`? Sometimes. The following code is invalid. ``` AEncoder a = encoder.aCount(0); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); // Whoops! We overwrote some data somewhere. ``` But this code is valid. ``` AEncoder a = encoder.aCount(1).next(); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); ``` Therefore, multiple transitions must exist for the `BEncoder#next` call with different exit states, which breaks Assumption 1. Assumption 2 is also problematic. As alluded to above, we must have a bifurcation of states depending on whether a group has elements or is empty. With arbitrary bifurcations, it is not possible to number the states in such a way that Assumption 2 holds. Consider the following state diagram: ``` state N / \ empty non-empty / \ state X state Y ``` Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g., `aCount(1)`, we transition to either `state X` or `state Y` from `state N`. For Assumption 2 to hold, we need `numberOf(state X) >= numberOf(state N)` (reachable) and `numberOf(state Y) >= numberOf(state N)` (reachable) and `numberOf(state X) < numberOf(state Y)` (unreachable) and `numberOf(state Y) < numberOf(state X)` (unreachable). Clearly, this is impossible. Prior to this commit, we avoided such problems by enforcing a more-strict ordering of field accesses. With this more-strict ordering, setting fields in a group block in later states was not allowed. Therefore, the state machine didn't have to have to split into mutually exclusive subgraphs of reachable states depending on whether or not the group is empty. For example, let's consider a psuedo-schema with two groups. ``` group a { x: int32 } group b { y: int32 } ``` Once the application code calls `encoder.bCount(2).next()`, our state machine will enter the `V0_B_N_BLOCK` state regardless of whether previously the application code called `encoder.aCount(0)` or `encoder.aCount(1).next()`. In `V_B_N_BLOCK` the application code is not allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no false positives. However, it is also too-strict, i.e., it has false negatives, e.g., the `encoder.aCount(1).next()` case. How likely are these false negatives to come up in the "wild"? If we think false negatives will often come up, is it possible to remove them whilst still using a state machine? We could consider generating more states to represent the reachable states under each fork of group emptiness. The number of states would grow exponentially w.r.t. the number of top-level groups, but let's ignore that for the moment and just consider whether or not it is possible. Continuing the example above with two groups, when the application code calls `encoder.bCount(2).next()`, the state it enters would depend on whether it had previously called `encoder.aCount(0)` or `encoder.aCount(n)...next()` for some value `n > 0`. To safely model this example, we'd need at least the following states: 1. has no `a` elements 2. has no `a` elements, and has no `b` elements 3. has `a` elements, but has not called `AEncoder#next` 4. has `a` elements, and has called `AEncoder#next` 5. has `a` elements, has called `AEncoder#next`, has `b` elements, but has not called `BEncoder#next` 6. has `a` elements, has called `AEncoder#next`, has `b` elements, and has called `BEncoder#next` 7. has no `a` elements, has `b` elements, but has not called `BEncoder#next` 8. has no `a` elements, has `b` elements, and has called `BEncoder#next` A trickier case is with nested groups: ``` group foo { x: int32 group bar { y: int32 } r: string s: string } ``` When is `Bar#y` safe to call? I believe it is safe to call in states where `FooEncoder#next` and `BarEncoder#next` have been called in-order more recently than `BarEncoder#count(0)`. Note, as `bar` is a group within `foo`, the `BarEncoder#next` and `BarEncoder#count` methods may be called repeatedly during the encoding of a single message. Implementing a more-permissive safety check, using this model, would entail checking that the current state of the encoder/decoder is in a state where this condition holds. To safely model this example, we'd need (at least) the following states: 1. has no `foo` elements 2. has `foo` elements, but has not called `FooEncoder#next` 3. has `foo` elements, and has called `FooEncoder#next` 4. has `foo` elements, has called `FooEncoder#next`, and has no `bar` elements 5. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `r` 6. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `s` 7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, but has not called `BarEncoder#next` 8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `r` 9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `s` 10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next` 11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `r` 12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `s` ``` FooEncoder foo = encoder.fooCount(3); // State 2 foo.next(); // State 3 foo.barCount(0); // State 4 foo.r("abc"); // State 5 foo.s("def"); // State 6 foo.next(); // State 4 BarEncoder bar = foo.barCount(1); // State 7 foo.r("abc"); // State 8 bar.next(); // State 11 bar.y(42); // State 11 foo.s("def"); // State 12 foo.next(); // State X? bar.y(43); // Weird but allowed. foo.barCount(0); // State Y? State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count ``` OK. We didn't have enough states. In the case where we "wrap back around", we need to model whether `bar` block access is allowed (State 13 vs State 3): 13. has `foo` elements, has called `FooEncoder#next`, has called `BarEncoder#next` for last iteration, but hasn't set `bar` group size in this iteration. An even trickier case is with doubly-nested groups: ``` group foo { x: int32 group bar { y: int32 group baz { z: int32 } } } ``` When is `Baz#z` safe to call? I believe it is safe to call in states where `FooEncoder#next`, `BarEncoder#next` and `BazEncoder#next` have been called in-order more recently than `BarEncoder#count(0)` or `BazEncoder#count(0). Encoding this in states is likely to result in more cases like State 13 above. Generalising: a group block field may be accessed only when all outer groups' and its own `#next` methods have been called in-order more recently than any outer groups' or its own `#count` method with a `0` argument. Can model "more-recently" exactly, i.e., with no false positives and no false negatives, using a state machine? Another way of looking at the state machines we generate, is as parsers for a language. Where our language, based on an SBE schema, comprises valid "strings" of calls to an encoder/decoder. Is this language a regular language? If not, representing it using a FSM will not work. Cue Pumping Lemma? Next steps: - Add more tests to show where this approach breaks - Revert the changes and see if it is acceptable that some valid uses of codecs will be prevented when enabling field order checking. I.e., it will give false negatives when testing the hypothesis "this encoding/decoding is valid". - Personally, I think false negatives are better than false positives. (False positives are possible as of this commit.) Having said that, it will discourage use of these checks in existing systems where valid code already exists that does not pass the checks.
1 parent 0e05c73 commit d9eb33d

File tree

8 files changed

+331
-102
lines changed

8 files changed

+331
-102
lines changed

sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/FieldOrderModel.java

+40-9
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@
3535
final class FieldOrderModel
3636
{
3737
private final Int2ObjectHashMap<State> states = new Int2ObjectHashMap<>();
38-
private final Map<Token, TransitionGroup> transitions = new LinkedHashMap<>();
38+
private final Map<Token, FieldTransitions> transitions = new LinkedHashMap<>();
3939
private final Int2ObjectHashMap<State> versionWrappedStates = new Int2ObjectHashMap<>();
4040
private final Set<String> reservedNames = new HashSet<>();
4141
private final State notWrappedState = allocateState("NOT_WRAPPED");
@@ -71,13 +71,24 @@ public void forEachState(final Consumer<State> consumer)
7171

7272
public List<Transition> getTransitions(final TransitionContext context, final Token token)
7373
{
74-
final TransitionGroup transitionGroup = transitions.get(token);
75-
if (transitionGroup == null)
74+
final FieldTransitions fieldTransitions = transitions.get(token);
75+
if (fieldTransitions == null)
7676
{
7777
return Collections.emptyList();
7878
}
7979

80-
return transitionGroup.transitions.get(context);
80+
return fieldTransitions.transitions.get(context);
81+
}
82+
83+
public State minimumEntryState(final Token token)
84+
{
85+
final FieldTransitions fieldTransitions = transitions.get(token);
86+
if (fieldTransitions == null)
87+
{
88+
return null;
89+
}
90+
91+
return fieldTransitions.minimumEntryState();
8192
}
8293

8394
public static FieldOrderModel findTransitions(
@@ -128,8 +139,8 @@ public void generateGraph(
128139
final String indent)
129140
{
130141
sb.append(indent).append("digraph G {\n");
131-
transitions.values().forEach(transitionGroup ->
132-
transitionGroup.transitions.forEach((context, transitions1) ->
142+
transitions.values().forEach(fieldTransitions ->
143+
fieldTransitions.transitions.forEach((context, transitions1) ->
133144
{
134145
transitions1.forEach(transition ->
135146
{
@@ -382,9 +393,9 @@ private void allocateTransition(
382393
final List<State> from,
383394
final State to)
384395
{
385-
final TransitionGroup transitionGroup = transitions.computeIfAbsent(token, ignored -> new TransitionGroup());
396+
final FieldTransitions fieldTransitions = transitions.computeIfAbsent(token, ignored -> new FieldTransitions());
386397
final Transition transition = new Transition(description, from, to);
387-
transitionGroup.add(firingContext, transition);
398+
fieldTransitions.add(firingContext, transition);
388399
}
389400

390401
static final class State
@@ -461,7 +472,7 @@ enum TransitionContext
461472
LAST_ELEMENT_IN_GROUP
462473
}
463474

464-
private static final class TransitionGroup
475+
private static final class FieldTransitions
465476
{
466477
private final Map<Object, List<Transition>> transitions = new LinkedHashMap<>();
467478

@@ -490,5 +501,25 @@ public void add(final Object context, final Transition transition)
490501

491502
transitionsForContext.add(transition);
492503
}
504+
505+
public State minimumEntryState()
506+
{
507+
final MutableReference<State> minimumState = new MutableReference<>();
508+
transitions.values().forEach(transitionsForContext ->
509+
{
510+
transitionsForContext.forEach(transition ->
511+
{
512+
transition.from.forEach(state ->
513+
{
514+
if (null == minimumState.get() || state.number < minimumState.get().number)
515+
{
516+
minimumState.set(state);
517+
}
518+
});
519+
});
520+
});
521+
522+
return minimumState.get();
523+
}
493524
}
494525
}

sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/JavaGenerator.java

+47-9
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@
2323
import org.agrona.MutableDirectBuffer;
2424
import org.agrona.Strings;
2525
import org.agrona.Verify;
26+
import org.agrona.collections.MutableBoolean;
2627
import org.agrona.generation.DynamicPackageOutputManager;
2728
import org.agrona.sbe.*;
2829

@@ -295,10 +296,32 @@ private static CharSequence generateFieldOrderStates(final FieldOrderModel field
295296
sb.append(" * }</pre>\n");
296297
sb.append(" */\n");
297298
sb.append(" private enum CodecState\n")
298-
.append(" {\n");
299+
.append(" {");
300+
final MutableBoolean isFirstState = new MutableBoolean(true);
299301
fieldOrderModel.forEachState(state ->
300-
sb.append(" ").append(unqualifiedStateCase(state)).append(",\n"));
301-
302+
{
303+
if (isFirstState.get())
304+
{
305+
isFirstState.set(false);
306+
}
307+
else
308+
{
309+
sb.append(",");
310+
}
311+
sb.append("\n ").append(unqualifiedStateCase(state))
312+
.append("(").append(state.number()).append(")");
313+
});
314+
sb.append(";\n\n");
315+
316+
sb.append(" private final int stateNumber;\n\n");
317+
sb.append(" CodecState(final int stateNumber)\n");
318+
sb.append(" {\n");
319+
sb.append(" this.stateNumber = stateNumber;\n");
320+
sb.append(" }\n\n");
321+
sb.append(" int stateNumber()\n");
322+
sb.append(" {\n");
323+
sb.append(" return stateNumber;\n");
324+
sb.append(" }\n");
302325
sb.append(" }\n\n");
303326

304327
sb.append(" private CodecState codecState = ")
@@ -372,7 +395,7 @@ private static CharSequence generateFieldOrderStateTransitions(
372395

373396
generateFieldOrderStateTransitions(
374397
sb,
375-
indent + " ",
398+
indent + " ",
376399
fieldOrderModel,
377400
FieldOrderModel.TransitionContext.SELECT_MULTI_ELEMENT_GROUP,
378401
token);
@@ -404,11 +427,26 @@ private static void generateFieldOrderStateTransitions(
404427
.append(indent).append(" break;\n");
405428
});
406429

407-
sb.append(indent).append(" default:\n")
408-
.append(indent).append(" throw new IllegalStateException(")
409-
.append("\"Cannot access field \\\"").append(token.name())
410-
.append("\\\" in state: \" + codecState());\n")
411-
.append(indent).append("}\n");
430+
sb.append(indent).append(" default:\n");
431+
final FieldOrderModel.State minimumEntryState = fieldOrderModel.minimumEntryState(token);
432+
if (token.signal() == Signal.BEGIN_FIELD && minimumEntryState != null)
433+
{
434+
sb.append(indent).append(" if (codecState().stateNumber() < ")
435+
.append(qualifiedStateCase(minimumEntryState)).append(".stateNumber())\n");
436+
sb.append(indent).append(" {\n");
437+
sb.append(indent).append(" throw new IllegalStateException(")
438+
.append("\"Cannot access field \\\"").append(token.name())
439+
.append("\\\" in state: \" + codecState());\n");
440+
sb.append(indent).append(" }\n");
441+
}
442+
else
443+
{
444+
sb.append(indent).append(" throw new IllegalStateException(")
445+
.append("\"Cannot access field \\\"").append(token.name())
446+
.append("\\\" in state: \" + codecState());\n");
447+
}
448+
449+
sb.append(indent).append("}\n");
412450
}
413451

414452
private static CharSequence generateFieldOrderStateTransitionsForNextGroupElement(

sbe-tool/src/main/java/uk/co/real_logic/sbe/ir/generated/FrameCodecDecoder.java

+29-8
Original file line numberDiff line numberDiff line change
@@ -33,11 +33,23 @@ public final class FrameCodecDecoder
3333
*/
3434
private enum CodecState
3535
{
36-
NOT_WRAPPED,
37-
V0_BLOCK,
38-
V0_PACKAGENAME_DONE,
39-
V0_NAMESPACENAME_DONE,
40-
V0_SEMANTICVERSION_DONE,
36+
NOT_WRAPPED(0),
37+
V0_BLOCK(1),
38+
V0_PACKAGENAME_DONE(2),
39+
V0_NAMESPACENAME_DONE(3),
40+
V0_SEMANTICVERSION_DONE(4);
41+
42+
private final int stateNumber;
43+
44+
CodecState(final int stateNumber)
45+
{
46+
this.stateNumber = stateNumber;
47+
}
48+
49+
int stateNumber()
50+
{
51+
return stateNumber;
52+
}
4153
}
4254

4355
private CodecState codecState = CodecState.NOT_WRAPPED;
@@ -249,7 +261,10 @@ public int irId()
249261
codecState(CodecState.V0_BLOCK);
250262
break;
251263
default:
252-
throw new IllegalStateException("Cannot access field \"irId\" in state: " + codecState());
264+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
265+
{
266+
throw new IllegalStateException("Cannot access field \"irId\" in state: " + codecState());
267+
}
253268
}
254269
}
255270

@@ -312,7 +327,10 @@ public int irVersion()
312327
codecState(CodecState.V0_BLOCK);
313328
break;
314329
default:
315-
throw new IllegalStateException("Cannot access field \"irVersion\" in state: " + codecState());
330+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
331+
{
332+
throw new IllegalStateException("Cannot access field \"irVersion\" in state: " + codecState());
333+
}
316334
}
317335
}
318336

@@ -375,7 +393,10 @@ public int schemaVersion()
375393
codecState(CodecState.V0_BLOCK);
376394
break;
377395
default:
378-
throw new IllegalStateException("Cannot access field \"schemaVersion\" in state: " + codecState());
396+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
397+
{
398+
throw new IllegalStateException("Cannot access field \"schemaVersion\" in state: " + codecState());
399+
}
379400
}
380401
}
381402

sbe-tool/src/main/java/uk/co/real_logic/sbe/ir/generated/FrameCodecEncoder.java

+29-8
Original file line numberDiff line numberDiff line change
@@ -33,11 +33,23 @@ public final class FrameCodecEncoder
3333
*/
3434
private enum CodecState
3535
{
36-
NOT_WRAPPED,
37-
V0_BLOCK,
38-
V0_PACKAGENAME_DONE,
39-
V0_NAMESPACENAME_DONE,
40-
V0_SEMANTICVERSION_DONE,
36+
NOT_WRAPPED(0),
37+
V0_BLOCK(1),
38+
V0_PACKAGENAME_DONE(2),
39+
V0_NAMESPACENAME_DONE(3),
40+
V0_SEMANTICVERSION_DONE(4);
41+
42+
private final int stateNumber;
43+
44+
CodecState(final int stateNumber)
45+
{
46+
this.stateNumber = stateNumber;
47+
}
48+
49+
int stateNumber()
50+
{
51+
return stateNumber;
52+
}
4153
}
4254

4355
private CodecState codecState = CodecState.NOT_WRAPPED;
@@ -206,7 +218,10 @@ public FrameCodecEncoder irId(final int value)
206218
codecState(CodecState.V0_BLOCK);
207219
break;
208220
default:
209-
throw new IllegalStateException("Cannot access field \"irId\" in state: " + codecState());
221+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
222+
{
223+
throw new IllegalStateException("Cannot access field \"irId\" in state: " + codecState());
224+
}
210225
}
211226
}
212227

@@ -270,7 +285,10 @@ public FrameCodecEncoder irVersion(final int value)
270285
codecState(CodecState.V0_BLOCK);
271286
break;
272287
default:
273-
throw new IllegalStateException("Cannot access field \"irVersion\" in state: " + codecState());
288+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
289+
{
290+
throw new IllegalStateException("Cannot access field \"irVersion\" in state: " + codecState());
291+
}
274292
}
275293
}
276294

@@ -334,7 +352,10 @@ public FrameCodecEncoder schemaVersion(final int value)
334352
codecState(CodecState.V0_BLOCK);
335353
break;
336354
default:
337-
throw new IllegalStateException("Cannot access field \"schemaVersion\" in state: " + codecState());
355+
if (codecState().stateNumber() < CodecState.V0_BLOCK.stateNumber())
356+
{
357+
throw new IllegalStateException("Cannot access field \"schemaVersion\" in state: " + codecState());
358+
}
338359
}
339360
}
340361

0 commit comments

Comments
 (0)