Examples of org.apache.hadoop.mapreduce.Mapper

org.apache.hadoop.mapreduce.Mapper
Maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into a intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.

The Hadoop Map-Reduce framework spawns one map task for each {@link InputSplit} generated by the {@link InputFormat} for the job.Mapper implementations can access the {@link Configuration} for the job via the {@link JobContext#getConfiguration()}.
The framework first calls {@link #setup(org.apache.hadoop.mapreduce.Mapper.Context)}, followed by {@link #map(Object,Object,Context)} for each key/value pair in the InputSplit. Finally {@link #cleanup(Context)} is called.
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a {@link Reducer} to determine the final output. Users can control the sorting and grouping by specifying two key {@link RawComparator} classes.
The Mapper outputs are partitioned per Reducer. Users can control which keys (and hence records) go to which Reducer by implementing a custom {@link Partitioner}.
Users can optionally specify a combiner, via {@link Job#setCombinerClass(Class)}, to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
Applications can specify if and how the intermediate outputs are to be compressed and which {@link CompressionCodec}s are to be used via the Configuration.

If the job has zero reduces then the output of the Mapper is directly written to the {@link OutputFormat} without sorting by keys.
Example:
```
 public class TokenCounterMapper  extends Mapper{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context) throws IOException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.collect(word, one); } } } 
```
Applications may override the {@link #run(Context)} method to exert greater control on map processing e.g. multi-threaded Mappers etc.
@see InputFormat @see JobContext @see Partitioner @see Reducer


  @Test
  public void testOutputFormatWithMismatchInOutputClasses() throws IOException {
    @SuppressWarnings({ "rawtypes", "unchecked" })
    final MapDriver<Text, Text, LongWritable, Text> driver = MapDriver
        .newMapDriver(new Mapper());
    driver.withOutputFormat(TextOutputFormat.class, TextInputFormat.class);
    driver.withInput(new Text("a"), new Text("1"));
    driver.withOutput(new LongWritable(), new Text("a\t1"));
    driver.runTest();
  }

View Full Code Here

  public void testUseOfWritableRegisteredComparator() throws IOException {
    
    // this test should use the comparator registered inside TestWritable
    // to output the keys in reverse order
    MapReduceDriver<TestWritable,Text,TestWritable,Text,TestWritable,Text> driver 
      = MapReduceDriver.newMapReduceDriver(new Mapper(), new Reducer());
    
    driver.withInput(new TestWritable("A1"), new Text("A1"))
      .withInput(new TestWritable("A2"), new Text("A2"))
      .withInput(new TestWritable("A3"), new Text("A3"))
      .withKeyGroupingComparator(new TestWritable.SingleGroupComparator())

View Full Code Here


  @Test
  public void testOutputFormatWithMismatchInOutputClasses() throws IOException {
    @SuppressWarnings({ "rawtypes", "unchecked" })
    final MapDriver<Text, Text, LongWritable, Text> driver = MapDriver
        .newMapDriver(new Mapper());
    driver.withOutputFormat(TextOutputFormat.class, TextInputFormat.class);
    driver.withInput(new Text("a"), new Text("1"));
    driver.withOutput(new LongWritable(), new Text("a\t1"));
    driver.runTest();
  }

View Full Code Here

  public void testUseOfWritableRegisteredComparator() throws IOException {


    // this test should use the comparator registered inside TestWritable
    // to output the keys in reverse order
    MapReduceDriver<TestWritable, Text, TestWritable, Text, TestWritable, Text> driver = MapReduceDriver
        .newMapReduceDriver(new Mapper(), new Reducer());


    driver
        .withInput(new TestWritable("A1"), new Text("A1"))
        .withInput(new TestWritable("A2"), new Text("A2"))
        .withInput(new TestWritable("A3"), new Text("A3"))

View Full Code Here


  @Test
  public void testOutputFormatWithMismatchInOutputClasses() throws IOException {
    @SuppressWarnings({ "rawtypes", "unchecked" })
    final MapDriver<Text, Text, LongWritable, Text> driver = MapDriver
        .newMapDriver(new Mapper());
    driver.withOutputFormat(TextOutputFormat.class, TextInputFormat.class);
    driver.withInput(new Text("a"), new Text("1"));
    driver.withOutput(new LongWritable(), new Text("a\t1"));
    driver.runTest();
  }

View Full Code Here


  @Test
  public void testOutputFormatWithMismatchInOutputClasses() throws IOException {
    @SuppressWarnings({ "rawtypes", "unchecked" })
    final MapDriver<Text, Text, LongWritable, Text> driver = MapDriver
        .newMapDriver(new Mapper());
    driver.withOutputFormat(TextOutputFormat.class, TextInputFormat.class);
    driver.withInput(new Text("a"), new Text("1"));
    driver.withOutput(new LongWritable(), new Text("a\t1"));
    driver.runTest();
  }

View Full Code Here

    basicMapReduceContext.getMetricsCollectionService().startAndWait();


    // now that the context is created, we need to make sure to properly close all datasets of the context
    try {
      String userMapper = context.getConfiguration().get(ATTR_MAPPER_CLASS);
      Mapper delegate = createMapperInstance(basicMapReduceContext.getProgram().getClassLoader(), userMapper);


      // injecting runtime components, like datasets, etc.
      try {
        Reflections.visit(delegate, TypeToken.of(delegate.getClass()),
                          new PropertyFieldSetter(basicMapReduceContext.getSpecification().getProperties()),
                          new MetricsFieldSetter(basicMapReduceContext.getMetrics()),
                          new DataSetFieldSetter(basicMapReduceContext));
      } catch (Throwable t) {
        LOG.error("Failed to inject fields to {}.", delegate.getClass(), t);
        throw Throwables.propagate(t);
      }


      LoggingContextAccessor.setLoggingContext(basicMapReduceContext.getLoggingContext());


      // this is a hook for periodic flushing of changes buffered by datasets (to avoid OOME)
      WrappedMapper.Context flushingContext = createAutoFlushingContext(context, basicMapReduceContext);


      if (delegate instanceof ProgramLifecycle) {
        try {
          ((ProgramLifecycle<BasicMapReduceContext>) delegate).initialize(basicMapReduceContext);
        } catch (Exception e) {
          LOG.error("Failed to initialize mapper with {}", basicMapReduceContext, e);
          throw Throwables.propagate(e);
        }
      }


      delegate.run(flushingContext);
      // sleep to allow metrics to be written
      TimeUnit.SECONDS.sleep(2L);


      // transaction is not finished, but we want all operations to be dispatched (some could be buffered in
      // memory by tx agent

View Full Code Here

  }


  @SuppressWarnings("unchecked")
  void runMapper(TaskInputOutputContext context, int index) throws IOException,
      InterruptedException {
    Mapper mapper = mappers.get(index);
    RecordReader rr = new ChainRecordReader(context);
    RecordWriter rw = new ChainRecordWriter(context);
    Mapper.Context mapperContext = createMapContext(rr, rw, context,
        getConf(index));
    mapper.run(mapperContext);
    rr.close();
    rw.close(context);
  }

View Full Code Here

      Class<? extends Mapper> klass = jobConf.getClass(prefix
          + CHAIN_MAPPER_CLASS + i, null, Mapper.class);
      Configuration mConf = getChainElementConf(jobConf, prefix
          + CHAIN_MAPPER_CONFIG + i);
      confList.add(mConf);
      Mapper mapper = ReflectionUtils.newInstance(klass, mConf);
      mappers.add(mapper);


    }


    Class<? extends Reducer> klass = jobConf.getClass(prefix

View Full Code Here

  }


  @SuppressWarnings("unchecked")
  void runMapper(TaskInputOutputContext context, int index) throws IOException,
      InterruptedException {
    Mapper mapper = mappers.get(index);
    RecordReader rr = new ChainRecordReader(context);
    RecordWriter rw = new ChainRecordWriter(context);
    Mapper.Context mapperContext = createMapContext(rr, rw, context,
        getConf(index));
    mapper.run(mapperContext);
    rr.close();
    rw.close(context);
  }

View Full Code Here

0 1

TOP

Related Classes of org.apache.hadoop.mapreduce.Mapper

co.cask.cdap.internal.app.runtime.batch.MapperWrapper

org.apache.hadoop.mapreduce.lib.chain.Chain

org.apache.hadoop.mrunit.mapreduce.TestMapDriver

org.apache.hadoop.mrunit.mapreduce.TestMapReduceDriver

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.