Examples of tokenize()


Examples of batch.internal.support.DelimitedLineTokenizer.tokenize()

public class DelimitedLineTokenizerTests extends TestCase {

  public void testDelimitedLineTokenizer() {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    String[] line = tokenizer.tokenize("a,b,c");
    assertEquals(3, line.length);
  }

  public void testDelimitedLineTokenizerChar() {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(' ');
View Full Code Here

Examples of client.net.sf.saxon.ce.regex.ARegularExpression.tokenize()

            // check that it's not a pattern that matches ""
            if (re.matches("")) {
                dynamicError("The regular expression in tokenize() must not be one that matches a zero-length string", "FORX0003", null);
            }

            return re.tokenize(input);

        } catch (XPathException err) {
            err.setErrorCode("FORX0002");
            err.setXPathContext(c);
            err.maybeSetLocation(this.getSourceLocator());
View Full Code Here

Examples of com.aliasi.tokenizer.Tokenizer.tokenize()

   public String getPOS(String sentence, boolean allTags)
   {
    StringBuffer xmlOutput =  new StringBuffer();
    char[] cs = sentence.toCharArray();
    Tokenizer tokenizer = TOKENIZER_FACTORY.tokenizer(cs, 0, cs.length);
    String[] tokens = tokenizer.tokenize();
    String[] tags = decoder.firstBest(tokens); int len = tokens.length;
    for (int i = 0; i < len; i++)
    {
     //*-- set the adjective tags
     if (tags[i].startsWith("j") || tags[i].equals("cd") || tags[i].endsWith("od") )
View Full Code Here

Examples of com.aliasi.tokenizer.Tokenizer.tokenize()

   {
    //*-- extract the sentence boundaries
    if (in.length() > Constants.DOC_LENGTH_MAXLIMIT) in = in.substring(0, Constants.DOC_LENGTH_MAXLIMIT - 1);
    ArrayList<Token> tokenList = new ArrayList<Token>(); ArrayList<Token> whiteList = new ArrayList<Token>();
    Tokenizer tokenizer = TOKENIZER_FACTORY.tokenizer(in.toCharArray(), 0, in.length() );
    tokenizer.tokenize(tokenList, whiteList);
    tokens = new String[tokenList.size()]; tokenList.toArray(tokens);
    whites = new String[whiteList.size()]; whiteList.toArray(whites);

    sentenceBoundaries = SENTENCE_MODEL.boundaryIndices(tokens, whites);  
    int numPossibleSentences = sentenceBoundaries.length;
View Full Code Here

Examples of com.aliasi.tokenizer.Tokenizer.tokenize()

   public String[] tokenizer(String in)
   {  
    if (in.length() > Constants.DOC_LENGTH_MAXLIMIT) in = in.substring(0, Constants.DOC_LENGTH_MAXLIMIT - 1);
    ArrayList<Token> tokenList = new ArrayList<Token>(); ArrayList<Token> whiteList = new ArrayList<Token>();
    Tokenizer tokenizer = new StandardBgramTokenizerFactory().tokenizer(in.toCharArray(), 0, in.length() );
    tokenizer.tokenize(tokenList, whiteList);
    String[] tokens = new String[tokenList.size()]; tokenList.toArray(tokens);
    return(tokens);
   }
  
View Full Code Here

Examples of com.aliasi.tokenizer.Tokenizer.tokenize()

  private void tokenize() {
    tokenList.clear();
    whiteList.clear();
    Tokenizer tokenizer = tokenizerFactory.tokenizer(text.toCharArray(),
        0, text.length());
    tokenizer.tokenize(tokenList, whiteList);
//    System.out.println(tokenList.size() + " TOKENS");
//    System.out.println(whiteList.size() + " WHITESPACES");
  }
 
  private void storeTokensInArrays() {
View Full Code Here

Examples of com.aliasi.tokenizer.Tokenizer.tokenize()

  private void tokenize() {
    tokenList.clear();
    whiteList.clear();
    Tokenizer tokenizer = tokenizerFactory.tokenizer(text.toCharArray(),
        0, text.length());
    tokenizer.tokenize(tokenList, whiteList);
//    System.out.println(tokenList.size() + " TOKENS");
//    System.out.println(whiteList.size() + " WHITESPACES");
  }
 
  private void storeTokensInArrays() {
View Full Code Here

Examples of com.atilika.kuromoji.AbstractTokenizer.tokenize()

    }
    System.out.println("AbstractTokenizer ready.  Provide input text and press RET.");
    BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = reader.readLine()) != null) {
      List<Token> result = tokenizer.tokenize(line);
      for (Token token : result) {
        System.out.println(token.getSurfaceForm() + "\t"
            + token.getAllFeatures());
      }
    }
View Full Code Here

Examples of com.github.pmerienne.trident.ml.preprocessing.EnglishTokenizer.tokenize()

  @Test
  public void testWithSmallWiki() {
    EnglishTokenizer tokenizer = new EnglishTokenizer();

    KLDClassifier kldClassifier = new KLDClassifier(2);
    kldClassifier.update(0, tokenizer.tokenize(NOSQL_WIKI));
    kldClassifier.update(0, tokenizer.tokenize(MYSQL_WIKI));
    kldClassifier.update(1, tokenizer.tokenize(LILIUM_WIKI));
    kldClassifier.update(1, tokenizer.tokenize(ROSE_WIKI));

    assertEquals(0, (int) kldClassifier.classify(tokenizer.tokenize(DATABASE_WIKI)));
View Full Code Here

Examples of com.github.pmerienne.trident.ml.preprocessing.EnglishTokenizer.tokenize()

  public void testWithSmallWiki() {
    EnglishTokenizer tokenizer = new EnglishTokenizer();

    KLDClassifier kldClassifier = new KLDClassifier(2);
    kldClassifier.update(0, tokenizer.tokenize(NOSQL_WIKI));
    kldClassifier.update(0, tokenizer.tokenize(MYSQL_WIKI));
    kldClassifier.update(1, tokenizer.tokenize(LILIUM_WIKI));
    kldClassifier.update(1, tokenizer.tokenize(ROSE_WIKI));

    assertEquals(0, (int) kldClassifier.classify(tokenizer.tokenize(DATABASE_WIKI)));
    assertEquals(1, (int) kldClassifier.classify(tokenizer.tokenize(FLOWER_WIKI)));
View Full Code Here
TOP
Copyright © 2018 www.massapi.com. All rights reserved.
All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc., which is owned by Oracle Inc. Contact coftware#gmail.com.