Package edu.uci.ics.crawler4j.crawler

Examples of edu.uci.ics.crawler4j.crawler.CrawlController.addSeed()


    controller1.addSeed("http://www.ics.uci.edu/~lopes/");
    controller1.addSeed("http://www.cnn.com/POLITICS/");

    controller2.addSeed("http://en.wikipedia.org/wiki/Main_Page");
    controller2.addSeed("http://en.wikipedia.org/wiki/Obama");
    controller2.addSeed("http://en.wikipedia.org/wiki/Bing");

    /*
     * The first crawler will have 5 concurrent threads and the second
     * crawler will have 7 threads.
     */
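Since `CrawlController.start()` blocks until the crawl finishes, running two controllers with different thread counts at the same time generally means driving them from separate threads (or using `startNonBlocking()` in versions that provide it). As a plain-Java analogy with no crawler4j dependency, the sketch below shows two independent fixed pools of 5 and 7 workers each processing their own seed list; `crawlAll` is a hypothetical stand-in for a controller run, and the "fetch" is simulated by a counter:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPoolsSketch {

    // Hypothetical stand-in for one controller run: process every seed
    // on a fixed-size pool and return how many were "fetched".
    static int crawlAll(List<String> seeds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger fetched = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(seeds.size());
        for (String seed : seeds) {
            pool.submit(() -> {
                fetched.incrementAndGet(); // stand-in for a real page fetch
                done.countDown();
            });
        }
        try {
            done.await(); // block until all seeds are processed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return fetched.get();
    }

    public static void main(String[] args) {
        int first = crawlAll(List.of("http://www.ics.uci.edu/~lopes/",
                                     "http://www.cnn.com/POLITICS/"), 5);
        int second = crawlAll(List.of("http://en.wikipedia.org/wiki/Main_Page",
                                      "http://en.wikipedia.org/wiki/Obama",
                                      "http://en.wikipedia.org/wiki/Bing"), 7);
        System.out.println(first + " " + second);
    }
}
```

This mirrors the seed lists above but is only an analogy for the threading model, not crawler4j itself.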
 


    PageFetcher pageFetcher = new PageFetcher(config);
    RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
    RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
    CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
    for (String domain : crawlDomains) {
      controller.addSeed(domain);
    }

    ImageCrawler.configure(crawlDomains, storageFolder);

    controller.start(ImageCrawler.class, numberOfCrawlers);
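An image crawler typically restricts itself to URLs that look like image resources. The helper below is a hypothetical, dependency-free sketch of such a filter (the name `isImageUrl` and the extension list are assumptions, not crawler4j API); a real `ImageCrawler` would apply a check like this inside its `shouldVisit` method:

```java
import java.util.regex.Pattern;

public class ImageUrlFilter {
    // Hypothetical filter: accept only URLs whose path ends in a
    // common image extension (case-insensitive).
    private static final Pattern IMAGE_EXTENSIONS =
        Pattern.compile(".*\\.(?:png|jpe?g|gif|bmp)$", Pattern.CASE_INSENSITIVE);

    static boolean isImageUrl(String url) {
        return IMAGE_EXTENSIONS.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isImageUrl("http://www.ics.uci.edu/logo.PNG"));  // true
        System.out.println(isImageUrl("http://www.ics.uci.edu/~lopes/"));   // false
    }
}
```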

    /*
     * For each crawl, you need to add some seed URLs. These are the first
     * URLs that are fetched; the crawler then starts following the links
     * found in those pages.
     */
    controller.addSeed("http://www.ics.uci.edu/~welling/");
    controller.addSeed("http://www.ics.uci.edu/~lopes/");
    controller.addSeed("http://www.ics.uci.edu/");

    /*
     * Start the crawl. This is a blocking operation, meaning that your code
     * will reach the line after this only when crawling is finished.
     */
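Which discovered links the crawler actually follows is decided by the crawler class's `shouldVisit` logic. A dependency-free sketch of the kind of filter used in the classic crawler4j `BasicCrawler` example is shown below: skip binary resources by extension and stay inside the seed domain (the class name and exact extension list here are illustrative assumptions):

```java
import java.util.regex.Pattern;

public class SeedScopeFilter {
    // Binary file extensions to skip, in the style of crawler4j's
    // BasicCrawler example (a sketch; adjust the list to taste).
    private static final Pattern FILTERS = Pattern.compile(
        ".*(\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz|pdf))$");

    // Follow a link only if it is not a binary resource and it stays
    // inside the seed domain used above.
    static boolean shouldVisit(String url) {
        String lower = url.toLowerCase();
        return !FILTERS.matcher(lower).matches()
            && lower.startsWith("http://www.ics.uci.edu/");
    }

    public static void main(String[] args) {
        System.out.println(shouldVisit("http://www.ics.uci.edu/~lopes/"));    // true
        System.out.println(shouldVisit("http://www.ics.uci.edu/slides.pdf")); // false
        System.out.println(shouldVisit("http://www.cnn.com/"));               // false
    }
}
```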

    PageFetcher pageFetcher = new PageFetcher(config);
    RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
    RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
    CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

    controller.addSeed("http://www.ics.uci.edu/");
    controller.start(LocalDataCollectorCrawler.class, numberOfCrawlers);

    List<Object> crawlersLocalData = controller.getCrawlersLocalData();
    long totalLinks = 0;
    long totalTextSize = 0;
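`getCrawlersLocalData()` returns one object per crawler thread, and the `totalLinks`/`totalTextSize` counters above are then summed across them. The sketch below completes that aggregation pattern with a hypothetical `CrawlStat` class standing in for whatever each crawler returns from `getMyLocalData()` (the field names are assumptions based on the variables in the snippet):

```java
import java.util.List;

public class LocalDataAggregation {
    // Hypothetical per-crawler statistics object; in the real
    // LocalDataCollectorCrawler example each crawler thread returns
    // its own stats object from getMyLocalData().
    static class CrawlStat {
        long totalLinks;
        long totalTextSize;
        CrawlStat(long links, long textSize) {
            this.totalLinks = links;
            this.totalTextSize = textSize;
        }
    }

    // Sum the per-thread statistics into overall totals, mirroring the
    // loop that typically follows getCrawlersLocalData().
    static long[] aggregate(List<Object> crawlersLocalData) {
        long totalLinks = 0;
        long totalTextSize = 0;
        for (Object data : crawlersLocalData) {
            CrawlStat stat = (CrawlStat) data;
            totalLinks += stat.totalLinks;
            totalTextSize += stat.totalTextSize;
        }
        return new long[] { totalLinks, totalTextSize };
    }

    public static void main(String[] args) {
        long[] totals = aggregate(
            List.of(new CrawlStat(10, 2048), new CrawlStat(5, 1024)));
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

The cast to `CrawlStat` reflects the untyped `List<Object>` that `getCrawlersLocalData()` hands back.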

    /*
     * For each crawl, you need to add some seed URLs. These are the first
     * URLs that are fetched; the crawler then starts following the links
     * found in those pages.
     */

    controller.addSeed("http://www.ics.uci.edu/");
    controller.addSeed("http://www.ics.uci.edu/~lopes/");
    controller.addSeed("http://www.ics.uci.edu/~welling/");

    /*
     * Start the crawl. This is a blocking operation, meaning that your code
     * will reach the line after this only when crawling is finished.
     */

    public static void main(String[] args) throws Exception {
      String rootFolder = "/tmp";
      int numberOfCrawlers = 1;

      CrawlController controller = new CrawlController(rootFolder);
      controller.addSeed("http://hadoop.apache.org/");
      controller.addSeed("http://hadoop.apache.org/common/");
      controller.addSeed("http://hadoop.apache.org/hdfs/");
      controller.addSeed("http://hadoop.apache.org/mapreduce/");
      controller.addSeed("http://avro.apache.org/");
      controller.addSeed("http://hbase.apache.org/");

Copyright © 2018 www.massapi.com. All rights reserved.
All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc., now owned by Oracle Corporation. Contact coftware#gmail.com.