Examples of tokenize()


Examples of edu.harvard.wcfia.yoshikoder.document.tokenizer.TokenizationService.tokenize()

      TokenizationService service = TokenizationService.getTokenizationService();
      Map<YKDocument,Concordance> map = new HashMap<YKDocument,Concordance>();
      for (YKDocument doc : docs) {
        // Tokenize only on a cache miss, then cache the result
        TokenList tl = tcache.getTokenList(doc);
        if (tl == null){
          tl = service.tokenize(doc);
          tcache.putTokenList(doc, tl);
        }
        // Build a concordance over the token list; n and wsize come from the caller
        Concordance c = yoshikoder.getDictionary().getConcordance(tl, n, wsize);
        map.put(doc, c);
      }

Examples of edu.stanford.nlp.process.PTBTokenizer.tokenize()

      int sNum = 0;
      int wNum = 0;

      // Tokenize the raw document text; the flags are tokenizeNLs = false, invertible = true
      PTBTokenizer<CoreLabel> ptb = PTBTokenizer.newPTBTokenizer(new BufferedReader(new StringReader(doc)), false, true);
      List<CoreLabel> words = ptb.tokenize();

      List<CoreLabel> result = new ArrayList<CoreLabel>();

      CoreLabel prev = null;
      String prevString = "";
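The tokenizer can also be driven on its own; a minimal sketch, assuming the usual imports (edu.stanford.nlp.ling.CoreLabel, edu.stanford.nlp.process.PTBTokenizer, java.io.*, java.util.List) — the sample text and variable names are illustrative:

      String text = "Dr. Smith paid $4.50 for coffee.";
      // newPTBTokenizer(reader, tokenizeNLs, invertible), as in the snippet above
      PTBTokenizer<CoreLabel> tok =
          PTBTokenizer.newPTBTokenizer(new BufferedReader(new StringReader(text)), false, true);
      for (CoreLabel label : tok.tokenize()) {
        System.out.println(label.word());   // one Penn Treebank token per line, e.g. "Dr.", "Smith", ...
      }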

Examples of edu.udo.cs.wvtool.generic.tokenizer.WVTTokenizer.tokenize()

                wordFilter = (WVTWordFilter) config.getComponentForStep(WVTConfiguration.STEP_WORDFILTER, d);
                stemmer = (WVTStemmer) config.getComponentForStep(WVTConfiguration.STEP_STEMMER, d);

                // Process the document: the nested calls run inside-out --
                // load, convert to plain text, convert characters, tokenize,
                // filter words, then stem

                TokenEnumeration tokens = stemmer.stem(wordFilter.filter(tokenizer.tokenize(charConverter.convertChars(infilter.convertToPlainText(loader.loadDocument(d), d), d), d), d), d);

                while (tokens.hasMoreTokens()) {
                    wordList.addWordOccurance(tokens.nextToken());
                }

Examples of edu.udo.cs.wvtool.generic.tokenizer.WVTTokenizer.tokenize()

                outputFilter = (WVTOutputFilter) config.getComponentForStep(WVTConfiguration.STEP_OUTPUT, d);

                // Process the document

                TokenEnumeration tokens = stemmer.stem(wordFilter.filter(tokenizer.tokenize(charConverter.convertChars(infilter.convertToPlainText(loader.loadDocument(d), d), d), d), d), d);

                while (tokens.hasMoreTokens()) {
                    wordList.addWordOccurance(tokens.nextToken());
                }

Examples of edu.udo.cs.wvtool.generic.tokenizer.WVTTokenizer.tokenize()

            vectorCreator = (WVTVectorCreator) config.getComponentForStep(WVTConfiguration.STEP_VECTOR_CREATION, d);

            // Process the document: same pipeline, but starting from an
            // in-memory StringReader rather than a loaded document

            TokenEnumeration tokens = stemmer.stem(wordFilter.filter(tokenizer.tokenize(charConverter.convertChars(new StringReader(text), d), d), d), d);

            while (tokens.hasMoreTokens()) {
                wordList.addWordOccurance(tokens.nextToken());
            }
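The nested one-liner can also be read step by step; a sketch of the same chain unrolled, assuming the usual WVTool intermediate types (a Reader flowing into tokenize(), a TokenEnumeration between the later stages) — variable names are illustrative:

            // Unrolled version of the nested call above (intermediate types assumed)
            Reader converted = charConverter.convertChars(new StringReader(text), d);
            TokenEnumeration rawTokens = tokenizer.tokenize(converted, d);
            TokenEnumeration filtered = wordFilter.filter(rawTokens, d);
            TokenEnumeration stemmed = stemmer.stem(filtered, d);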

Examples of edu.udo.cs.wvtool.generic.tokenizer.WVTTokenizer.tokenize()

                wordFilter = (WVTWordFilter) config.getComponentForStep(WVTConfiguration.STEP_WORDFILTER, d);
                stemmer = (WVTStemmer) config.getComponentForStep(WVTConfiguration.STEP_STEMMER, d);

                // Process the document

                TokenEnumeration tokens = stemmer.stem(wordFilter.filter(tokenizer.tokenize(charConverter.convertChars(infilter.convertToPlainText(loader.loadDocument(d), d), d), d), d), d);

                while (tokens.hasMoreTokens()) {
                    listener.processWord(tokens.nextToken());
                }

Examples of net.sf.saxon.expr.Tokenizer.tokenize()

  }

  public static String replaceNameInPathOrQuery( String pathOrQuery, String oldName, String newName ) throws Exception
  {
    Tokenizer t = new Tokenizer();
    // Lex the path or query, then walk the token stream until EOF
    t.tokenize( pathOrQuery, 0, -1, 1 );
    StringBuffer result = new StringBuffer();
    int lastIx = 0;

    while( t.currentToken != Token.EOF )
    {

Examples of net.sf.saxon.regex.RegularExpression.tokenize()

                err.setLocator(this);
                throw err;
            }

        }
        // Split the input string at each match of the regular expression
        return re.tokenize(input);
    }


    /**
     * Simple command-line interface for testing.

Examples of opennlp.ccg.lexicon.DefaultTokenizer.tokenize()

        lm.debugScore = true;
        int secs = (int) (System.currentTimeMillis() - start) / 1000;
        System.out.println("secs: " + secs);
        System.out.println();
        // Tokenize the input string into Word objects before scoring
        Tokenizer tokenizer = new DefaultTokenizer();
        List<Word> words = tokenizer.tokenize(tokens);
        System.out.println("scoring: " + tokens);
        System.out.println();
        lm.setWordsToScore(words, true);
        lm.prepareToScoreWords();
        double logprob = lm.logprob();
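The tokenizer itself needs none of the language-model setup; a minimal standalone sketch (imports from opennlp.ccg.lexicon and java.util assumed, the sample string is illustrative):

        Tokenizer tokenizer = new DefaultTokenizer();
        List<Word> words = tokenizer.tokenize("the quick brown fox");
        for (Word w : words) {
            System.out.println(w);   // each Word wraps one surface token
        }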