datahub_factory

Module to configure Datahub::Factory.

Version information

  • 1.0.0 (latest)
released Mar 21st 2017

Start using this module

Add this module to your Puppetfile:

mod 'packedvzw-datahub_factory', '1.0.0'

Add this module to your Bolt project:

bolt module add packedvzw-datahub_factory

Manually install this module globally with the Puppet module tool:

puppet module install packedvzw-datahub_factory --version 1.0.0

Documentation

packedvzw/datahub_factory — version 1.0.0 Mar 21st 2017

datahub_factory

Table of Contents

  1. Description
  2. Setup
  3. Usage
  4. Reference
  5. Development

Description

Install and configure Datahub::Factory, an application to transfer and convert data from a (museum) Collection Management System to an exchange format (LIDO) or a Datahub instance.

Setup

What datahub_factory affects

  • This module uses meltwater-cpan to install the Datahub::Factory CPAN package. The module only includes meltwater-cpan and does not configure it, so you are free to configure it elsewhere in your manifests without running into errors.
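
As an illustration of configuring meltwater-cpan elsewhere, the sketch below declares it with an explicit parameter before including datahub_factory. The class name cpan and the parameter manage_package are assumptions about meltwater-cpan, not something this module documents; check them against that module's own reference. As a general Puppet rule, a resource-like class declaration has to be evaluated before any include of the same class, so it is placed first here.

```puppet
# Sketch only: 'cpan' and 'manage_package' are assumed names from
# meltwater-cpan; verify them in that module's documentation.
class { 'cpan':
    manage_package => true,
}

# datahub_factory only includes the CPAN module, so the parameters
# declared above stay in effect.
include datahub_factory
```

Setting the same parameters through Hiera avoids the ordering concern, since automatic parameter lookup applies no matter where the class is included.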

Beginning with datahub_factory

Usage

This module has two parts: the class datahub_factory, which installs and configures Datahub::Factory, and the defined type datahub_factory::pipeline, which creates and manages pipeline configuration files.

datahub_factory takes no options:

include datahub_factory

To create a pipeline configuration file (installed in /etc/datahub-factory/pipelines), use the defined type datahub_factory::pipeline:

datahub_factory::pipeline {'test.ini':
    importer_plugin  => 'Adlib',
    importer_options => {
        file_name => '/tmp/adlib.xml',
        data_path => 'recordList.record.*',
    },

    fixer_plugin     => 'Fix',
    fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
    fixer_options    => {
        file_name => '/tmp/msk.fix'
    },

    exporter_plugin  => 'Exporter',
    exporter_options => {
        datahub_url         => 'my.thedatahub.io',
        datahub_format      => 'LIDO',
        oauth_client_id     => 'datahub',
        oauth_client_secret => 'datahub',
        oauth_username      => 'datahub',
        oauth_password      => 'datahub',
    },
    
    setup_cron       => true,
    cron_frequency   => {
        hour   => 2,
        minute => 0
    },
}

This creates the pipeline /etc/datahub-factory/pipelines/test.ini, which reads records from an Adlib data dump (/tmp/adlib.xml), transforms them with the Catmandu fix file /tmp/msk.fix and submits the result to a Datahub instance running at my.thedatahub.io. Cron runs the whole pipeline every night at 2:00.

Reference

Class datahub_factory

The base class must be included before you can define a pipeline, but takes no options.

Defined type datahub_factory::pipeline

Creates a pipeline configuration file in /etc/datahub-factory/pipelines and optionally creates a cron job that runs the pipeline at a set interval.

To configure the pipeline, first select the importer, fixer and exporter plugins (<type>_plugin), then pass a hash of key-value pairs to <type>_options. The contents of the hash depend on the options the selected plugin requires.

Add a cron job by setting setup_cron to true and passing a frequency (in the format puppet-cron expects) to cron_frequency. The job runs as the datahub-factory user, which is created automatically.
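
To make the effect of setup_cron concrete, the following sketch shows roughly what an equivalent job for test.ini would look like using Puppet's built-in cron resource type. It is an approximation under stated assumptions, not the module's actual implementation: the resource title and the command line are hypothetical, and the command Datahub::Factory really uses should be taken from its own documentation.

```puppet
# Illustration only. Do not declare this alongside setup_cron => true,
# or the pipeline would be scheduled twice.

# Assumed command line; check the Datahub::Factory documentation.
$run_pipeline = 'factory transport -p /etc/datahub-factory/pipelines/test.ini'

cron { 'datahub-factory-test.ini':   # hypothetical resource title
    command => $run_pipeline,
    user    => 'datahub-factory',    # the user named in this README
    hour    => 2,                    # from cron_frequency in the Usage example
    minute  => 0,
}
```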

Parameters

  • importer_plugin: select the importer plugin to use.
  • importer_options: pass options to the importer plugin. Valid options are dependent on the plugin used.
  • fixer_plugin: select the fixer plugin.
  • fixer_options: options for the fixer plugin.
  • fixer_id_path: set the path to the ID of each item the fixer transforms. The path is resolved after the transformation and the ID is used in logging.
  • exporter_plugin: select the exporter plugin.
  • exporter_options: options for the exporter plugin.
  • setup_cron: set to true to create a cron job for this pipeline.
  • cron_frequency: pass a frequency for cron (in the format puppet-cron expects).
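
As a further sketch of how these parameters combine, the pipeline below reuses the plugins and options from the Usage example but runs weekly instead of nightly. The file name weekly.ini is made up for illustration. The weekday key assumes that cron_frequency accepts the same keys as Puppet's cron resource type (hour, minute, weekday and so on); only hour and minute appear in this README.

```puppet
include datahub_factory

datahub_factory::pipeline { 'weekly.ini':   # hypothetical file name
    importer_plugin  => 'Adlib',
    importer_options => {
        file_name => '/tmp/adlib.xml',
        data_path => 'recordList.record.*',
    },

    fixer_plugin     => 'Fix',
    fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
    fixer_options    => {
        file_name => '/tmp/msk.fix',
    },

    exporter_plugin  => 'Exporter',
    exporter_options => {
        datahub_url         => 'my.thedatahub.io',
        datahub_format      => 'LIDO',
        oauth_client_id     => 'datahub',
        oauth_client_secret => 'datahub',
        oauth_username      => 'datahub',
        oauth_password      => 'datahub',
    },

    setup_cron       => true,
    cron_frequency   => {
        weekday => 0,    # assumed key: Sunday in cron's numbering
        hour    => 3,
        minute  => 30,
    },
}
```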

Development

Pull requests welcome at https://github.com/thedatahub/puppet-datahub_factory.