开发验证
执行以下命令进行验证。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | python3 - <<'PY' import tokenizers from tokenizers import Tokenizer from tokenizers.models import WordLevel from tokenizers.pre_tokenizers import Whitespace print("tokenizers_version=" + tokenizers.__version__) tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1, "kunpeng": 2, "world": 3}, unk_token="[UNK]")) tok.pre_tokenizer = Whitespace() out = tok.encode("hello kunpeng world") print("ids=" + ",".join(str(i) for i in out.ids)) print("tokens=" + ",".join(out.tokens)) assert tokenizers.__version__ == "0.22.2" assert out.ids == [1, 2, 3] assert out.tokens == ["hello", "kunpeng", "world"] PY |
预期输出如下信息:
1 2 3 | tokenizers_version=0.22.2 ids=1,2,3 tokens=hello,kunpeng,world |
父主题: 开发指南