  1. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
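What `Tokenizer` actually does is build a word index ranked by frequency and map texts to integer sequences, with `num_words` capping how many of the most frequent words are kept. A minimal stdlib sketch of that behavior (whitespace splitting and lowercasing stand in for Keras's full filtering; index 0 is reserved, as in Keras):

```python
from collections import Counter

def fit_on_texts(texts):
    """Build a word -> index map ranked by frequency (index 0 reserved),
    mimicking keras.preprocessing.text.Tokenizer.fit_on_texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    # the most frequent word gets index 1, the next gets 2, ...
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, num_words=None):
    """Map each text to a list of word indices, dropping words ranked
    at or beyond num_words (as Tokenizer(num_words=...) does)."""
    keep = lambda i: num_words is None or i < num_words
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and keep(word_index[w])]
            for t in texts]

texts = ["the cat sat", "the cat ran", "dogs ran"]
word_index = fit_on_texts(texts)
seqs = texts_to_sequences(texts, word_index, num_words=4)
# rarer words ("sat", "dogs") fall outside the num_words cap and are dropped
```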

  2. How to do Tokenizer Batch processing? - HuggingFace

    Jun 7, 2023 · In the Tokenizer documentation from HuggingFace, the call function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional) — The sequence or batch of …
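The documented signature accepts a single string, a batch of strings, or a batch of pre-tokenized word lists (in `transformers`, the last case additionally requires `is_split_into_words=True`). A stdlib sketch of just that shape dispatch, with whitespace splitting standing in for real subword tokenization:

```python
def tokenize_batch(text, is_split_into_words=False):
    """Accept str, List[str], or List[List[str]], mirroring the shape
    handling of a HuggingFace tokenizer's __call__ (the tokenization
    itself is stand-in whitespace splitting)."""
    if isinstance(text, str):            # single sequence
        return [text.split()]
    if is_split_into_words:              # List[List[str]]: already words
        return [list(words) for words in text]
    return [t.split() for t in text]     # List[str]: batch of sequences

tokenize_batch("a b c")                  # -> [['a', 'b', 'c']]
tokenize_batch(["a b", "c"])             # -> [['a', 'b'], ['c']]
tokenize_batch([["a", "b"], ["c"]], is_split_into_words=True)
```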

  3. python - AutoTokenizer.from_pretrained fails to load locally saved ...

    from transformers import AutoTokenizer, AutoConfig tokenizer = AutoTokenizer.from_pretrained('distilroberta-base') config = …
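The usual round trip is `tokenizer.save_pretrained(directory)` followed by `AutoTokenizer.from_pretrained(directory)` with the same local path. A stdlib sketch of that save-then-load-by-path pattern, using a single JSON vocabulary file (the file name here is illustrative, not the exact set of files `transformers` writes):

```python
import json
import os
import tempfile

def save_pretrained(vocab, directory):
    # persist the vocabulary so it can be reloaded by path,
    # analogous to tokenizer.save_pretrained(directory)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "vocab.json"), "w") as f:
        json.dump(vocab, f)

def from_pretrained(directory):
    # reload from a local directory, analogous to
    # AutoTokenizer.from_pretrained(directory)
    with open(os.path.join(directory, "vocab.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    save_pretrained({"hello": 0, "world": 1}, d)
    vocab = from_pretrained(d)   # identical to what was saved
```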

  4. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · # add the tokens to the tokenizer vocabulary tokenizer.add_tokens(list(new_tokens)) # add new, random embeddings for the new tokens …
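The snippet pairs two steps: `add_tokens` grows the vocabulary, and the model's embedding table must then be resized to match (in `transformers`, via `model.resize_token_embeddings`). A stdlib sketch of both steps, with small random vectors standing in for the new embeddings (`"<mol>"` is a made-up example token):

```python
import random

def add_tokens(vocab, new_tokens):
    """Append genuinely new tokens at fresh indices; return how many
    were added (mirrors tokenizer.add_tokens)."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added

def resize_embeddings(embeddings, new_size, dim):
    """Extend the embedding table with random rows for the new tokens,
    the counterpart of model.resize_token_embeddings."""
    while len(embeddings) < new_size:
        embeddings.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return embeddings

vocab = {"hello": 0, "world": 1}
emb = [[0.1, 0.2], [0.3, 0.4]]
n = add_tokens(vocab, ["<mol>", "world"])   # only "<mol>" is new
resize_embeddings(emb, len(vocab), dim=2)   # one random row appended
```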

  5. Looking for a clear definition of what a "tokenizer", "parser" and ...

    Mar 28, 2018 · A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context …
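The distinction in that answer can be sketched directly: a tokenizer only breaks the stream apart, while a lexer additionally tags each token with a category. A minimal Python illustration:

```python
import re

def tokenize(text):
    # tokenizer: just break the stream on whitespace
    return text.split()

def lex(text):
    # lexer: a tokenizer that also attaches context (a token category)
    def kind(tok):
        if re.fullmatch(r"\d+", tok):
            return "NUMBER"
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            return "IDENT"
        return "SYMBOL"
    return [(kind(tok), tok) for tok in tokenize(text)]

tokenize("x = 42")   # -> ['x', '=', '42']
lex("x = 42")        # -> [('IDENT', 'x'), ('SYMBOL', '='), ('NUMBER', '42')]
```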

  6. How to download punkt tokenizer in nltk? - Stack Overflow

    How to download punkt tokenizer in nltk? Asked 2 years, 1 month ago Modified 6 months ago Viewed 24k times

  7. java - Why is StringTokenizer deprecated? - Stack Overflow

    From the javadoc for StringTokenizer: StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that …

  8. Tokenizing strings in C - Stack Overflow

    I have been trying to tokenize a string using SPACE as delimiter but it doesn't work. Does any one have suggestion on why it doesn't work? Edit: tokenizing using: strtok(string, " "); The code is...
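Two properties of C's `strtok` commonly trip people up: it modifies its input in place (so it must not be called on a string literal), and it treats a run of consecutive delimiters as a single separator. The delimiter behavior can be shown in Python terms, where `str.split(' ')` keeps empty tokens but `str.split()` collapses runs the way `strtok(string, " ")` does:

```python
s = "one  two   three"

# str.split(' ') keeps an empty string for each extra delimiter...
raw = s.split(' ')    # -> ['one', '', 'two', '', '', 'three']

# ...while str.split() with no argument collapses runs of whitespace,
# matching how C's strtok(string, " ") skips past repeated delimiters
tokens = s.split()    # -> ['one', 'two', 'three']
```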

  9. parsing - lexers vs parsers - Stack Overflow

    Are lexers and parsers really that different in theory? It seems fashionable to hate regular expressions: coding horror, another blog post. However, popular lexing based tools: …

  10. Does huggingface have a model that is based on word-level tokens?

    The idea here is that the tokenizer would first tokenize at the word level by default because it expects the input as a word (in its base form) by default and then falls back on lower levels …
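The fallback idea described above can be sketched as: look each word up in a word-level vocabulary first, and only drop to a lower level (here, marked characters) for out-of-vocabulary words. The `##` prefix is borrowed from WordPiece-style notation purely for illustration:

```python
def tokenize_with_fallback(text, vocab):
    """Word-level first; unknown words fall back to character-level
    pieces, a sketch of the word-then-lower-level fallback."""
    tokens = []
    for word in text.lower().split():
        if word in vocab:
            tokens.append(word)          # known word: keep it whole
        else:
            # OOV: fall back to a lower level and emit marked characters
            tokens.extend(f"##{ch}" for ch in word)
    return tokens

vocab = {"the", "cat"}
tokenize_with_fallback("the cat ran", vocab)
# -> ['the', 'cat', '##r', '##a', '##n']
```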