Examples of tokenize()


Examples of org.galagosearch.core.parse.TagTokenizer.tokenize()

  public String[] processContent(String text) {
    TagTokenizer tokenizer = new TagTokenizer();
    Document doc = null;

    try {
      doc = tokenizer.tokenize(text);
    } catch (IOException e) {
      e.printStackTrace();
      // return early so the null doc below can't be dereferenced
      return new String[0];
    }

    List<String> toks = doc.terms;
    return toks.toArray(new String[toks.size()]);
  }

Examples of org.galagosearch.core.parse.TagTokenizer.tokenize()

  public String[] processContent(String text) {
    TagTokenizer tokenizer = new TagTokenizer();
    Document doc = null;

    try {
      doc = tokenizer.tokenize(text);
    } catch (IOException e) {
      e.printStackTrace();
      return null;
    }

    List<String> toks = doc.terms;
    return toks.toArray(new String[toks.size()]);
  }
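Both snippets above are truncated at the same point; here is a minimal standalone sketch, assuming only what they show (the old org.galagosearch jar on the classpath, tokenize() returning a Document whose public terms field holds the token list):

import org.galagosearch.core.parse.Document;
import org.galagosearch.core.parse.TagTokenizer;

public class GalagoTokenizeDemo {
  public static void main(String[] args) throws Exception {
    TagTokenizer tokenizer = new TagTokenizer();
    // tokenize() parses the markup and splits the remaining text into terms
    Document doc = tokenizer.tokenize("<title>Hello World</title> plain text");
    System.out.println(doc.terms);
  }
}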

Examples of org.jasen.core.token.SimpleWordTokenizer.tokenize()

  /**
   * Creates and initializes the analyzer.
   */
  public void initialize() throws IOException {
    // Get the dictionary as a resource stream
    SimpleWordTokenizer t = new SimpleWordTokenizer(this.getClass().getClassLoader().getResourceAsStream(ENGLISH_DICTIONARY_PATH));
    t.tokenize();
    tokens = t.getTokens();
    Arrays.sort(tokens);
    buildTrees();
  }

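A minimal standalone sketch, assuming only the API visible in the snippet above (an InputStream constructor, a no-argument tokenize(), and getTokens() returning a sortable array); the sample input is made up:

import java.io.ByteArrayInputStream;
import java.util.Arrays;

import org.jasen.core.token.SimpleWordTokenizer;

public class SimpleWordTokenizeDemo {
  public static void main(String[] args) throws Exception {
    SimpleWordTokenizer t = new SimpleWordTokenizer(
        new ByteArrayInputStream("alpha beta gamma".getBytes("UTF-8")));
    t.tokenize();                    // read the stream and split it into words
    String[] tokens = t.getTokens(); // array type inferred from Arrays.sort(tokens) above
    System.out.println(Arrays.toString(tokens));
  }
}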

Examples of org.jboss.common.beans.property.token.ArrayTokenizer.tokenize()

    protected String[] tokenize(String text) {
        // makes us iterate twice...
        ArrayTokenizer arrayTokenizer = getTokenizer();
        return arrayTokenizer.tokenize(text);
    }

    protected String encode(String[] v) {
        StringBuffer text = new StringBuffer();
        for (int index = 0; index < v.length; index++) {
            // (completion of the truncated snippet; the actual delimiter comes
            // from the tokenizer configuration -- a comma is assumed here)
            text.append(v[index]);
            if (index < v.length - 1) {
                text.append(',');
            }
        }
        return text.toString();
    }

Examples of org.jitterbit.integration.data.script.Transform.tokenize()

    private String setFirstInstance(String expr) {
        Transform transform = new Transform();
        List<Token> tokenList = transform.tokenize(expr);
        for (Token token : tokenList) {
            if (token.m_id == Transform.t_DE) {
                String sourceDe = token.m_str;
                Matcher matcher = PATTERN.matcher(sourceDe);
                if (matcher.find()) {
                    // (truncated in the original; presumably the matched
                    // data-element reference is rewritten to its first instance)
                }
            }
        }
        return expr; // placeholder; the rest of the method is truncated above
    }

Examples of org.languagetool.tokenizers.SentenceTokenizer.tokenize()

    // display stats unless we're in a buffered XML mode
    if (xmlMode == StringTools.XmlPrintMode.NORMAL_XML) {
      SentenceTokenizer sentenceTokenizer = lt.getLanguage().getSentenceTokenizer();
      int sentenceCount = sentenceTokenizer.tokenize(contents).size();
      displayTimeStats(startTime, sentenceCount, apiFormat);
    }
    return ruleMatches.size();
  }

Examples of org.languagetool.tokenizers.Tokenizer.tokenize()

      System.out.println("Checking " + file.getAbsolutePath());
      String text = StringTools.readFile(new FileInputStream(file.getAbsolutePath()));
      text = textFilter.filter(text);
      if (CHECK_BY_SENTENCE) {
        final Tokenizer sentenceTokenizer = langTool.getLanguage().getSentenceTokenizer();
        final List<String> sentences = sentenceTokenizer.tokenize(text);
        for (String sentence : sentences) {
          Tools.checkText(sentence, langTool, false, 1000);
        }
      } else {
        Tools.checkText(text, langTool);
      }

Examples of org.languagetool.tokenizers.WordTokenizer.tokenize()

  protected void testPerformance(LanguageModel model, int ngramLength) throws Exception {
    try (FileInputStream fis = new FileInputStream(FILE)) {
      String content = StringTools.readStream(fis, "UTF-8");
      WordTokenizer wordTokenizer = new WordTokenizer();
      List<String> words = wordTokenizer.tokenize(content);
      String prevPrevWord = null;
      String prevWord = null;
      int i = 0;
      long totalMicros = 0;
      for (String word : words) {
        // (truncated in the original; presumably each (prevPrevWord, prevWord,
        // word) n-gram count is looked up in the model and the elapsed
        // microseconds accumulated into totalMicros)
        prevPrevWord = prevWord;
        prevWord = word;
        i++;
      }
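The three LanguageTool examples above obtain a tokenizer either from a Language object or by direct construction. A minimal sketch combining both, assuming languagetool-core plus an English language module on the classpath (AmericanEnglish is one of the bundled Language implementations):

import java.util.List;

import org.languagetool.language.AmericanEnglish;
import org.languagetool.tokenizers.SentenceTokenizer;
import org.languagetool.tokenizers.WordTokenizer;

public class LanguageToolTokenizeDemo {
  public static void main(String[] args) {
    SentenceTokenizer sentenceTokenizer = new AmericanEnglish().getSentenceTokenizer();
    WordTokenizer wordTokenizer = new WordTokenizer();
    for (String sentence : sentenceTokenizer.tokenize("One sentence. And another.")) {
      // note: WordTokenizer returns whitespace tokens as well as words
      List<String> tokens = wordTokenizer.tokenize(sentence);
      System.out.println(sentence.trim() + " -> " + tokens);
    }
  }
}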

Examples of org.pdf4j.saxon.regex.RegularExpression.tokenize()

                err.setLocator(this);
                throw err;
            }

        }
        return re.tokenize(input);
    }



Examples of org.springframework.batch.item.file.transform.DelimitedLineTokenizer.tokenize()

    mapper.setStrict(true);
    mapper.setTargetType(GreenBean.class);
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    String[] names = { "brown", "green", "great", "groin", "braun" };
    lineTokenizer.setNames(names);
    GreenBean bean = mapper.mapFieldSet(lineTokenizer.tokenize("brown,green,great,groin,braun"));
    Assert.assertEquals("green", bean.getGreen());
  }

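The test above feeds the tokenized FieldSet into a bean mapper; tokenize() is also usable on its own. A minimal sketch, assuming spring-batch-infrastructure on the classpath (the field names and sample line are made up):

import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;

public class DelimitedTokenizeDemo {
  public static void main(String[] args) {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[] { "first", "last", "age" });
    // the line is split on the delimiter (comma by default) and each
    // value is bound to the field name in the same position
    FieldSet fields = tokenizer.tokenize("Ada,Lovelace,36");
    System.out.println(fields.readString("first")); // Ada
    System.out.println(fields.readInt("age"));      // 36
  }
}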