Commit 211cb6f

Ryan Sepassi authored and copybara-github committed
Disable WMT in preparation for rewrite (#254)
PiperOrigin-RevId: 239486187
1 parent af8ef42 commit 211cb6f

4 files changed: +3 -170 lines changed
docs/datasets.md (-170)

@@ -65,8 +65,6 @@ np_datasets = tfds.as_numpy(datasets)
   * [`"flores_translate_neen"`](#flores_translate_neen)
   * [`"flores_translate_sien"`](#flores_translate_sien)
   * [`"ted_multi_translate"`](#ted_multi_translate)
-  * [`"wmt_translate_ende"`](#wmt_translate_ende)
-  * [`"wmt_translate_enfr"`](#wmt_translate_enfr)
 * [`video`](#video)
   * [`"bair_robot_pushing_small"`](#bair_robot_pushing_small)
   * [`"moving_mnist"`](#moving_mnist)
@@ -2122,174 +2120,6 @@ VALIDATION | 6,049

 ---

-### `"wmt_translate_ende"`
-
-Translate dataset based on the data from statmt.org.
-
-
-* URL: [http://www.statmt.org/wmt18/](http://www.statmt.org/wmt18/)
-* `DatasetBuilder`: [`tfds.translate.wmt_ende.WmtTranslateEnde`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/translate/wmt_ende.py)
-
-`wmt_translate_ende` is configured with `tfds.translate.wmt_ende.WMTConfig` and has the following
-configurations predefined (defaults to the first one):
-
-* `"ende_plain_text_t2t"` (`v0.0.2`) (`Size: 1.60 GiB`): Translation dataset from en to de, uses encoder plain_text. It uses the following data files (see the code for exact contents): {"dev": ["wmt17_newstest13"], "train": ["wmt18_news_commentary_ende", "wmt13_commoncrawl_ende", "wmt13_europarl_ende"]}.
-
-* `"ende_subwords8k_t2t"` (`v0.0.2`) (`Size: 1.60 GiB`): Translation dataset from en to de, uses encoder subwords8k. It uses the following data files (see the code for exact contents): {"dev": ["wmt17_newstest13"], "train": ["wmt18_news_commentary_ende", "wmt13_commoncrawl_ende", "wmt13_europarl_ende"]}.
-
-
-#### `"wmt_translate_ende/ende_plain_text_t2t"`
-
-```python
-Translation({
-    'de': Text(shape=(), dtype=tf.string, encoder=None),
-    'en': Text(shape=(), dtype=tf.string, encoder=None),
-})
-```
-
-
-
-#### `"wmt_translate_ende/ende_subwords8k_t2t"`
-
-```python
-Translation({
-    'de': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8267>),
-    'en': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8216>),
-})
-```
-
-
-
-
-#### Statistics
-Split | Examples
-:----- | ---:
-ALL | 4,595,289
-TRAIN | 4,592,289
-VALIDATION | 3,000
-
-
-#### Urls
-* [http://www.statmt.org/wmt18/](http://www.statmt.org/wmt18/)
-
-#### Supervised keys (for `as_supervised=True`)
-`(u'en', u'de')`
-
-#### Citation
-```
-@InProceedings{bojar-EtAl:2018:WMT1,
-author = {Bojar, Ond {r}ej and Federmann, Christian and Fishel, Mark
-and Graham, Yvette and Haddow, Barry and Huck, Matthias and
-Koehn, Philipp and Monz, Christof},
-title = {Findings of the 2018 Conference on Machine Translation (WMT18)},
-booktitle = {Proceedings of the Third Conference on Machine Translation,
-Volume 2: Shared Task Papers},
-month = {October},
-year = {2018},
-address = {Belgium, Brussels},
-publisher = {Association for Computational Linguistics},
-pages = {272--307},
-url = {http://www.aclweb.org/anthology/W18-6401}
-}
-```
-
----
-
-### `"wmt_translate_enfr"`
-
-Translate dataset based on the data from statmt.org.
-
-
-* URL: [http://www.statmt.org/wmt18/](http://www.statmt.org/wmt18/)
-* `DatasetBuilder`: [`tfds.translate.wmt_enfr.WmtTranslateEnfr`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/translate/wmt_enfr.py)
-
-`wmt_translate_enfr` is configured with `tfds.translate.wmt_enfr.WMTConfig` and has the following
-configurations predefined (defaults to the first one):
-
-* `"enfr_plain_text_t2t_small"` (`v0.0.2`) (`Size: ?? GiB`): Translation dataset from en to fr, uses encoder plain_text. It uses the following data files (see the code for exact contents): {"dev": ["opennmt_1M_enfr_valid"], "train": ["opennmt_1M_enfr_train"]}.
-
-* `"enfr_subwords8k_t2t_small"` (`v0.0.2`) (`Size: ?? GiB`): Translation dataset from en to fr, uses encoder subwords8k. It uses the following data files (see the code for exact contents): {"dev": ["opennmt_1M_enfr_valid"], "train": ["opennmt_1M_enfr_train"]}.
-
-* `"enfr_plain_text_t2t_large"` (`v0.0.2`) (`Size: ?? GiB`): Translation dataset from en to fr, uses encoder plain_text. It uses the following data files (see the code for exact contents): {"dev": ["wmt17_newstest13"], "train": ["wmt13_commoncrawl_enfr", "wmt13_europarl_enfr", "wmt14_news_commentary_enfr", "wmt13_undoc_enfr"]}.
-
-* `"enfr_subwords8k_t2t_large"` (`v0.0.2`) (`Size: ?? GiB`): Translation dataset from en to fr, uses encoder subwords8k. It uses the following data files (see the code for exact contents): {"dev": ["wmt17_newstest13"], "train": ["wmt13_commoncrawl_enfr", "wmt13_europarl_enfr", "wmt14_news_commentary_enfr", "wmt13_undoc_enfr"]}.
-
-
-#### `"wmt_translate_enfr/enfr_plain_text_t2t_small"`
-
-```python
-Translation({
-    'en': Text(shape=(), dtype=tf.string, encoder=None),
-    'fr': Text(shape=(), dtype=tf.string, encoder=None),
-})
-```
-
-
-
-#### `"wmt_translate_enfr/enfr_subwords8k_t2t_small"`
-
-```python
-Translation({
-    'en': Text(shape=(), dtype=tf.string, encoder=None),
-    'fr': Text(shape=(), dtype=tf.string, encoder=None),
-})
-```
-
-
-
-#### `"wmt_translate_enfr/enfr_plain_text_t2t_large"`
-
-```python
-Translation({
-    'en': Text(shape=(), dtype=tf.string, encoder=None),
-    'fr': Text(shape=(), dtype=tf.string, encoder=None),
-})
-```
-
-
-
-#### `"wmt_translate_enfr/enfr_subwords8k_t2t_large"`
-
-```python
-Translation({
-    'en': Text(shape=(), dtype=tf.string, encoder=None),
-    'fr': Text(shape=(), dtype=tf.string, encoder=None),
-})
-```
-
-
-
-
-#### Statistics
-None computed
-
-#### Urls
-* [http://www.statmt.org/wmt18/](http://www.statmt.org/wmt18/)
-
-#### Supervised keys (for `as_supervised=True`)
-`(u'en', u'fr')`
-
-#### Citation
-```
-@InProceedings{bojar-EtAl:2018:WMT1,
-author = {Bojar, Ond {r}ej and Federmann, Christian and Fishel, Mark
-and Graham, Yvette and Haddow, Barry and Huck, Matthias and
-Koehn, Philipp and Monz, Christof},
-title = {Findings of the 2018 Conference on Machine Translation (WMT18)},
-booktitle = {Proceedings of the Third Conference on Machine Translation,
-Volume 2: Shared Task Papers},
-month = {October},
-year = {2018},
-address = {Belgium, Brussels},
-publisher = {Association for Computational Linguistics},
-pages = {272--307},
-url = {http://www.aclweb.org/anthology/W18-6401}
-}
-```
-
----
-
-
 ## [`video`](#video)

 ### `"bair_robot_pushing_small"`

tensorflow_datasets/translate/wmt.py (+1)

@@ -104,6 +104,7 @@ def __init__(self,
 class WmtTranslate(tfds.core.GeneratorBasedBuilder):
   """WMT translation dataset."""
   _URL = "http://www.statmt.org/wmt18/"
+  IN_DEVELOPMENT = True

   @abc.abstractproperty
   def translate_datasets(self):

tensorflow_datasets/translate/wmt_ende.py (+1)

@@ -61,6 +61,7 @@

 class WmtTranslateEnde(wmt.WmtTranslate):
   """WMT English-German translation dataset."""
+  IN_DEVELOPMENT = True

   BUILDER_CONFIGS = [
       wmt.WMTConfig(

tensorflow_datasets/translate/wmt_enfr.py (+1)

@@ -90,6 +90,7 @@

 class WmtTranslateEnfr(wmt.WmtTranslate):
   """English-French WMT translation dataset."""
+  IN_DEVELOPMENT = True

   BUILDER_CONFIGS = [
       # EN-FR translations (matching the data used by Tensor2Tensor library).
