Module to configure Datahub::Factory.





Version information

  • 1.0.0 (latest)
released Mar 21st 2017
This version is compatible with:
  • Ubuntu



packedvzw/datahub_factory — version 1.0.0 Mar 21st 2017


Table of Contents

  1. Description
  2. Setup
  3. Usage
  4. Reference
  5. Development


Description

Install and configure Datahub::Factory, an application to transfer and convert data from a (museum) Collection Management System to an exchange format (LIDO) or a Datahub instance.


Setup

What datahub_factory affects

  • This module uses meltwater-cpan to install the Datahub::Factory CPAN package. The meltwater-cpan module is only included, not declared with parameters, so you are free to configure it elsewhere in your manifests without running into duplicate declaration errors.

Beginning with datahub_factory

Usage


There are two parts to this module: the class datahub_factory, which installs and configures Datahub::Factory, and the defined type datahub_factory::pipeline, which creates and manages pipeline configuration files.

datahub_factory takes no options:

include datahub_factory

To create a pipeline configuration file (installed in /etc/datahub-factory/pipelines), use the defined type datahub_factory::pipeline:

datahub_factory::pipeline {'test.ini':
    importer_plugin  => 'Adlib',
    importer_options => {
        file_name => '/tmp/adlib.xml',
        data_path => 'recordList.record.*',
    },
    fixer_plugin     => 'Fix',
    fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
    fixer_options    => {
        file_name => '/tmp/msk.fix',
    },
    exporter_plugin  => 'Exporter',
    exporter_options => {
        datahub_url         => '',
        datahub_format      => 'LIDO',
        oauth_client_id     => 'datahub',
        oauth_client_secret => 'datahub',
        oauth_username      => 'datahub',
        oauth_password      => 'datahub',
    },
    setup_cron       => true,
    cron_frequency   => {
        hour   => 2,
        minute => 0,
    },
}

This creates the pipeline /etc/datahub-factory/pipelines/test.ini, which reads records from an Adlib data dump (/tmp/adlib.xml), transforms them with the Catmandu fix file /tmp/msk.fix and submits the result to the Datahub instance configured in datahub_url. Cron runs the entire operation every night at 2:00.


Reference

Class datahub_factory

The base class must be included before you can define a pipeline, but takes no options.
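
As a sketch of that ordering, the manifest below includes the base class and then declares a pipeline that explicitly requires it. The class name profile::datahub and the title example.ini are hypothetical, the plugin values are copied from the usage example, and the require metaparameter is a standard Puppet ordering hint that this module may not actually need.

```puppet
# Hypothetical wrapper class; only datahub_factory and
# datahub_factory::pipeline come from this module.
class profile::datahub {
    # The base class takes no options, so a plain include is enough.
    include datahub_factory

    datahub_factory::pipeline { 'example.ini':
        importer_plugin  => 'Adlib',
        importer_options => {
            file_name => '/tmp/adlib.xml',
            data_path => 'recordList.record.*',
        },
        fixer_plugin     => 'Fix',
        fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
        fixer_options    => {
            file_name => '/tmp/msk.fix',
        },
        exporter_plugin  => 'Exporter',
        exporter_options => {
            datahub_url         => '', # left empty, as in the usage example
            datahub_format      => 'LIDO',
            oauth_client_id     => 'datahub',
            oauth_client_secret => 'datahub',
            oauth_username      => 'datahub',
            oauth_password      => 'datahub',
        },
        setup_cron       => true,
        cron_frequency   => {
            hour   => 2,
            minute => 0,
        },
        # Standard Puppet metaparameter: apply the base class first.
        require          => Class['datahub_factory'],
    }
}
```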

Defined type datahub_factory::pipeline

Creates a pipeline configuration file in /etc/datahub-factory/pipelines and optionally creates a cron job to run the pipeline at a certain interval.

Configure the pipeline by first selecting the importer, fixer and exporter plugins to use (<type>_plugin) and then passing a hash of key-value pairs to the matching <type>_options parameter. The contents of each hash depend on the options the selected plugin requires.

Add a cron job by setting setup_cron to true and passing a frequency (in the format puppet-cron expects) to cron_frequency. The job runs as the datahub-factory user, which is created automatically.
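
As a sketch of a different schedule, the pipeline below would run every six hours instead of nightly. It assumes that the values in cron_frequency are handed to puppet-cron unchanged, so that a standard cron step expression such as '*/6' is accepted for hour; only the hour and minute keys appear in this README. The title six_hourly.ini and the variable $plugin_settings are made up for illustration, and the `*` splat attribute requires Puppet 4 or later.

```puppet
include datahub_factory

# Plugin settings copied from the usage example, kept in one hash
# so the cron-related attributes below stand out.
$plugin_settings = {
    'importer_plugin'  => 'Adlib',
    'importer_options' => {
        file_name => '/tmp/adlib.xml',
        data_path => 'recordList.record.*',
    },
    'fixer_plugin'     => 'Fix',
    'fixer_id_path'    => 'administrativeMetadata.recordWrap.recordID.0._',
    'fixer_options'    => {
        file_name => '/tmp/msk.fix',
    },
    'exporter_plugin'  => 'Exporter',
    'exporter_options' => {
        datahub_url         => '', # left empty, as in the usage example
        datahub_format      => 'LIDO',
        oauth_client_id     => 'datahub',
        oauth_client_secret => 'datahub',
        oauth_username      => 'datahub',
        oauth_password      => 'datahub',
    },
}

datahub_factory::pipeline { 'six_hourly.ini':
    # Splat the shared plugin settings into the resource.
    *              => $plugin_settings,
    setup_cron     => true,
    # Assumption: step values are passed through to puppet-cron as-is.
    cron_frequency => {
        hour   => '*/6',
        minute => 0,
    },
}
```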


  • importer_plugin: select the importer plugin to use.
  • importer_options: pass options to the importer plugin. Valid options are dependent on the plugin used.
  • fixer_plugin: select the fixer plugin.
  • fixer_options: options for the fixer plugin.
  • fixer_id_path: set the path to the ID of each item in the fixer's output (that is, after the transformation). The ID is used to identify items in logging.
  • exporter_plugin: select the exporter plugin.
  • exporter_options: options for the exporter plugin.
  • setup_cron: set to true to create a cron job for this pipeline.
  • cron_frequency: pass a frequency for cron (in the format puppet-cron expects).
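
Because datahub_factory::pipeline is a defined type, the parameters above can be reused to declare several pipelines at once. The sketch below loops over a hash of hypothetical pipeline file names and Adlib dump paths (museum_a and museum_b are invented for illustration) and varies only the importer's file_name; every other value is copied from the usage example. It assumes Puppet 4 or later for the each function.

```puppet
include datahub_factory

# Hypothetical collections: pipeline file name => Adlib dump to import.
$dumps = {
    'museum_a.ini' => '/tmp/museum_a.xml',
    'museum_b.ini' => '/tmp/museum_b.xml',
}

$dumps.each |String $ini_file, String $dump_file| {
    datahub_factory::pipeline { $ini_file:
        importer_plugin  => 'Adlib',
        importer_options => {
            file_name => $dump_file, # the only value that differs per pipeline
            data_path => 'recordList.record.*',
        },
        fixer_plugin     => 'Fix',
        fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
        fixer_options    => {
            file_name => '/tmp/msk.fix',
        },
        exporter_plugin  => 'Exporter',
        exporter_options => {
            datahub_url         => '', # left empty, as in the usage example
            datahub_format      => 'LIDO',
            oauth_client_id     => 'datahub',
            oauth_client_secret => 'datahub',
            oauth_username      => 'datahub',
            oauth_password      => 'datahub',
        },
        setup_cron       => true,
        cron_frequency   => {
            hour   => 2,
            minute => 0,
        },
    }
}
```

Both pipelines in this sketch share the same fix file and the same 2:00 schedule; a nested hash per pipeline would be one way to vary those as well.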


Development

Pull requests welcome at