Conversation
Fixes hyperparameter sets universal_transformer_big and universal_transformer_base_tpu.
I can see
The dimensions of the tiled partial targets were wrong: (a, b, c, d) --> (a, b, c, d, a, b, c, d). The correct expansion is (a, b, c, d) --> (a, a, b, b, c, c, d, d), because the layout is (batch_size * beam_size), not (beam_size * batch_size). Essentially, tf.tile needs to be replaced by tf.repeat, which was introduced in TF 1.15. This is a workaround for TF 1.14.
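For illustration, a minimal sketch of the difference between the two ops (the values are made up, not taken from the PR):

```python
import tensorflow as tf

x = tf.constant([1, 2, 3, 4])  # e.g. one partial-target token per batch item

# tf.tile appends whole copies of the tensor: [1, 2, 3, 4, 1, 2, 3, 4]
wrong = tf.tile(x, [2])

# tf.repeat (available from TF 1.15) repeats each element in place,
# matching the (batch_size * beam_size) layout: [1, 1, 2, 2, 3, 3, 4, 4]
right = tf.repeat(x, 2)
```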
The same problem with
# Workaround for: tf.one_hot(
#     tf.repeat(partial_targets[:, i], [beam_size]), vocab_size, 0.0,
#     -1e9)
# Can be replaced by the above in future versions (from tf 1.15).
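The TF 1.14 replacement itself is not quoted here; a plausible equivalent of the commented-out tf.repeat call is the usual expand-tile-reshape pattern (a sketch assuming partial_targets is a 2-D int tensor and beam_size a Python int, not the exact code from the diff):

```python
# Sketch of a TF 1.14 substitute for
# tf.repeat(partial_targets[:, i], [beam_size]):
column = partial_targets[:, i]                           # shape [batch_size]
repeated = tf.reshape(
    tf.tile(tf.expand_dims(column, 1), [1, beam_size]),  # [batch_size, beam_size]
    [-1])                                                # [batch_size * beam_size]
forced_logits = tf.one_hot(repeated, vocab_size, 0.0, -1e9)
```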
Hi @senarvi -- thanks for the PR, I'm going to be releasing on TF 1.15 really soon.
Can you "flip" the comments, i.e. make the 1.15 one main and put the other in a comment?
Also, a brief explanation of why this workaround is needed would be great.
@@ -957,6 +964,13 @@ def _decode_input_tensor_to_features_dict(feature_map, hparams):
   features["decode_length"] = (
       IMAGE_DECODE_LENGTH if input_is_image else tf.shape(x)[1] + 50)
   features["inputs"] = x
+  # Save inputs to "partial_targets" when prepending inputs to targets. Also
+  # keep "inputs" as some models crash if they don't exist.
+  if getattr(hparams, "prepend_mode", "none") != "none":
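The quoted hunk stops at the if statement, so the body of the new block is not shown. A minimal sketch of the intent, assuming x is a [batch, length] int tensor of input ids (the helper name and the shapes are assumptions, not the actual diff):

```python
import tensorflow as tf

def _maybe_add_partial_targets(features, x, hparams):
  """Hypothetical helper mirroring the added lines above."""
  features["inputs"] = x
  if getattr(hparams, "prepend_mode", "none") != "none":
    # Mirror preprocess_example_common(): expose the inputs plus the zero
    # separator token to the decoder as "partial_targets".
    zero_separator = tf.zeros([tf.shape(x)[0], 1], dtype=x.dtype)
    features["partial_targets"] = tf.concat([x, zero_separator], axis=1)
  return features
```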
if hasattr(hparams, "prepend_mode"):
...
seems more idiomatic python, here and above
Hi @senarvi -- Thanks for the PR, a few minor things and then I'll be able to merge this in. Thanks!
Thanks @senarvi -- I'll fix the changes on my end!
This reverts commit c825d12.
Hi @senarvi -- Actually this makes both transformer_test and evolved_transformer_test fail on my end for the same reason, can you PTAL and fix?
Sorry, reverted the revert -- this is good to go!
PiperOrigin-RevId: 281806525
Changes
As far as I can tell, prepending inputs to the targets is otherwise functional, except that during decoding the inputs are not saved to the "partial_targets" feature. This pull request attempts to fix that.
There are a few extra calls to update_hparams_for_universal_transformer() that cause some Universal Transformer hyperparameter sets to crash. I removed them.

Some background
During training, preprocess_example_common() sets targets to the concatenation of inputs + zero token + targets: https://github.com/tensorflow/tensor2tensor/blob/v1.14.1/tensor2tensor/data_generators/problem.py#L150
It looks like during inference the function sets the feature "partial_targets" to the concatenation of inputs + zero token. However, this function is not called during decoding!
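A condensed paraphrase of the prepend logic in preprocess_example_common() (see the link above; details may differ from the actual function):

```python
import tensorflow as tf

def prepend_inputs(example, mode, hparams):
  # Paraphrased prepend logic; not a verbatim copy of preprocess_example_common().
  if hparams.prepend_mode != "none":
    # Zero separator token, matching the dtype of the input ids.
    sep = tf.zeros([1], dtype=example["inputs"].dtype)
    if mode == tf.estimator.ModeKeys.PREDICT:
      # Inference: the decoder should continue from inputs + separator.
      example["partial_targets"] = tf.concat([example["inputs"], sep], 0)
    else:
      # Training: targets become inputs + separator + original targets.
      example["targets"] = tf.concat(
          [example["inputs"], sep, example["targets"]], 0)
  return example
```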
The Transformer models already support reading the "partial_targets" feature, so all that was left was to save the inputs to "partial_targets" in the _*_input_tensor_to_features_dict() functions in decoding.py. Or let me know if I've missed something and there's a better way to do it.