Installation and Verification
- Install Tokenizers.
    python3 -m pip install --upgrade pip setuptools wheel
    python3 -m pip install tokenizers==0.22.2
- Run the following command to verify that Tokenizers is installed correctly.
    python3 - <<'PY'
    import tokenizers
    from tokenizers import Tokenizer
    from tokenizers.models import WordLevel
    from tokenizers.pre_tokenizers import Whitespace

    print("tokenizers_version=" + tokenizers.__version__)

    # Build a tiny word-level tokenizer with a fixed four-entry vocabulary.
    tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1, "kunpeng": 2, "world": 3}, unk_token="[UNK]"))
    tok.pre_tokenizer = Whitespace()

    # Encode a sample sentence and print the resulting IDs and tokens.
    out = tok.encode("hello kunpeng world")
    print("ids=" + ",".join(str(i) for i in out.ids))
    print("tokens=" + ",".join(out.tokens))

    # Fail loudly if the version or the encoding differs from what is expected.
    assert tokenizers.__version__ == "0.22.2"
    assert out.ids == [1, 2, 3]
    assert out.tokens == ["hello", "kunpeng", "world"]
    PY
The expected output is as follows.
    tokenizers_version=0.22.2
    ids=1,2,3
    tokens=hello,kunpeng,world
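In a scripted setup (for example, a CI job), it can be useful to confirm the pinned version before importing the package at all. The following is a minimal sketch using only the Python standard library (`importlib.metadata`, Python 3.8 or later). The helper names `installed_version` and `matches_pin` are hypothetical and are not part of Tokenizers or pip.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, expected: str) -> bool:
    """Return True only when the distribution is installed at exactly `expected`."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # "tokenizers" and "0.22.2" mirror the install command in this section.
    print("tokenizers installed:", installed_version("tokenizers"))
    print("matches 0.22.2:", matches_pin("tokenizers", "0.22.2"))
```

This check reads package metadata only, so it reports a missing or mismatched install without raising an `ImportError`. It does not replace the encoding test above, which also confirms that the library loads and works.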