Version information
This version is compatible with:
- Puppet Enterprise 2023.8.x, 2023.7.x, 2023.6.x, 2023.5.x, 2023.4.x, 2023.3.x, 2023.2.x, 2023.1.x, 2023.0.x, 2021.7.x, 2021.6.x, 2021.5.x, 2021.4.x, 2021.3.x, 2021.2.x, 2021.1.x, 2021.0.x, 2019.8.x, 2019.7.x, 2019.5.x, 2019.4.x, 2019.3.x, 2019.2.x, 2019.1.x, 2019.0.x, 2018.1.x, 2017.3.x, 2017.2.x, 2017.1.x, 2016.5.x, 2016.4.x
- Puppet >= 3.4.0
Start using this module
Add this module to your Puppetfile:
mod 'cesnet-oozie', '1.0.0'
Apache Oozie Puppet Module
Table of Contents
- Module Description - What the module does and why it is useful
- Setup - The basics of getting started with oozie
- Usage - Configuration options and additional functionality
- Reference - An under-the-hood peek at what the module is doing and how
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
Module Description
This Puppet module installs the Oozie server or client, optionally with these features:
- security based on Kerberos
- HTTPS
Supported platforms:
- Debian 7/wheezy: Cloudera distribution (tested with CDH 5.4.2, Oozie 4.1.0)
- Ubuntu 14/trusty: Cloudera distribution
- RHEL 6 and clones: Cloudera distribution (tested with CDH 5.4.2, Oozie 4.1.0)
Setup
What oozie affects
- Packages: client and server packages can be installed
- Alternatives:
  - alternatives are used for /etc/oozie/conf in Debian (Cloudera); This module switches to the new alternative by default, so the Cloudera original configuration can be kept intact.
  - alternatives are used between http/https for /etc/oozie/tomcat-conf
- Files modified:
  - /etc/oozie/conf.*
  - /var/lib/oozie/ext-2.2: ext-2.2.zip is downloaded and extracted to /var/lib/oozie. If the file is already available locally at /var/lib/oozie/ext-2.2.zip (or the directory /var/lib/oozie/ext-2.2 already exists), the file is not downloaded.
  - /var/lib/oozie/*.jar: JDBC files are copied from /usr/share/java according to configured database type in db parameter
  - /etc/profile.d/oozie.*: created for client by default
- Database schema imported: according to the selected database type
- Services:
  - oozie
- Helper Files: /var/lib/oozie/.puppet-oozie-setup, /var/lib/oozie/.puppet-oozie-schema-created, /var/lib/hadoop-hdfs/.puppet-oozie-dir-created
- Secret Files (keytabs, certificates): some files are copied to oozie home directory /var/lib/oozie
- HDFS directory and its content: /user/oozie
- Databases: for supported databases and when not disabled: user created and database schema imported using puppetlabs modules
Setup Requirements
There are several known or intended limitations in this module.
Be aware of:
- Repositories: see cesnet-hadoop module Setup Requirements for details
- Database setup: MariaDB/MySQL or PostgreSQL are supported. You need to install puppetlabs-mysql or puppetlabs-postgresql module, because they are not in dependencies.
- Secure mode: keytabs must be prepared in /etc/security/keytabs/ (see realm parameter)
- HTTPS: HTTP/<HOST> keytab must be available, keystore must be prepared in https_keystore, and signature secret file in /etc/security/http-auth-signature-secret
- No inter-node dependencies
Beginning with oozie
Basic example without security: a configured Hadoop cluster without security is needed (at least the defaultFS or hdfs_hostname parameter in the hadoop class). You will also need to grant Oozie permissions in Hadoop (the hadoop.proxyuser.oozie.* properties).
class{'hadoop':
  hdfs_hostname => ...,
  #defaultFS => ...,
  ...
  properties => {
    ...
    'hadoop.proxyuser.oozie.groups' => '*',
    'hadoop.proxyuser.oozie.hosts' => '*',
  },
  ...
}
class{'oozie':
  realm => '',
  version => 5,
}
node default {
  include oozie::server
  include oozie::client
  include oozie::hdfs
  Class['oozie::hdfs'] -> Class['oozie::server::service']
}
Note: The class oozie::server::config requires a fully working HDFS (the namenode and enough datanodes) and oozie::hdfs. On a multi-node cluster, the setup may need to be split into several stages.
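One way to read the staging note above is to split the work by node and apply the HDFS side first. The sketch below assumes hypothetical hostnames (namenode.example.com, oozie.example.com) and leaves out the hadoop class declaration from the basic example. Treat it as an illustration under those assumptions, not as the module's prescribed layout.

```puppet
# Shared settings. The hostname is a hypothetical placeholder.
class { 'oozie':
  oozie_hostname => 'oozie.example.com',
  realm          => '',
}

# Stage 1: HDFS namenode. oozie::hdfs creates /user/oozie on HDFS.
node 'namenode.example.com' {
  include oozie::hdfs
}

# Stage 2: Oozie node. Apply it only after HDFS is fully up
# and the namenode run above has finished.
node 'oozie.example.com' {
  include oozie::server
  include oozie::client
}
```

Puppet cannot order resources across nodes, so the sequencing here comes from the order in which the agents are run, not from the manifest itself.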
Usage
Using a real database backend is recommended; see the MySQL and PostgreSQL sections below. If you choose Oracle, you also need to copy the JDBC jar file to /var/lib/oozie.
Note: When changing the database type and creating a new schema, either remove the puppet helper file /var/lib/oozie/.puppet-oozie-schema-created or create the new schema manually:
su oozie -s /bin/bash
/usr/lib/oozie/bin/ooziedb.sh create -run
Note 2: You can override any module preset using the properties parameter:
class{'oozie':
  ...
  properties => {
    'oozie.service.JPAService.jdbc.driver' => 'my.custom.jdbc.Driver',
    'oozie.service.JPAService.jdbc.url' => 'jdbc:mysql://myserver:myport/oozie'
  },
  ...
}
MySQL
Example MySQL: Oozie with MySQL. The puppetlabs-mysql module must be installed.
Add this to the initial example:
class{'oozie':
  ...
  db => 'mysql',
  #db => 'mariadb',
  db_password => 'ooziepassword',
}
node ... {
  class { 'mysql::server':
    root_password => 'strongpassword',
  }
  class { 'mysql::bindings':
    java_enable => true,
    #java_package_name => 'libmariadb-java',
  }
}
Database is created in oozie::server::db (oozie::server) class.
PostgreSQL
Example PostgreSQL: Oozie with PostgreSQL, using puppetlabs-postgresql module:
class{'oozie':
  ...
  db => 'postgresql',
  db_password => 'ooziepassword',
}
node ... {
  ...
  class { 'postgresql::server':
    listen_addresses => 'localhost',
  }
  include postgresql::lib::java
}
Database is created in oozie::server::db (oozie::server) class.
Security
Security is enabled by setting the Kerberos realm in the realm parameter. HTTPS can optionally be enabled as well.
Security files must be prepared in the proper locations (see Setup Requirements); files from Hadoop can be reused. The keystore passphrase must be the same as the passphrase of the key inside the store.
Example:
class{'oozie':
  ...
  https => true,
  https_keystore_password => 'changeit',
  realm => 'MY.REALM',
}
Note: The class oozie::hdfs creates the directory on HDFS. With security enabled, it must be included on the HDFS namenode (or the class must be applied on a machine with the HDFS service admin keytab).
Note 2: Consider modifying or removing oozie.authentication.kerberos.name.rules. The default value is needed only when using cross-realm authentication:
properties => {
  'oozie.authentication.kerberos.name.rules' => '::undef',
}
Cross-realm
A cross-realm environment is problematic; see issue OOZIE-2704.
Workarounds are possible:
- setup
The krb5.conf file must be modified temporarily so that the default realm matches the realm of the oozie/HOSTNAME principal. Then launch the setup manually (and mark it as done for the oozie puppet module):
#defaultfs='hdfs://....'
oozie-setup sharelib create -fs $defaultfs -locallib /usr/lib/oozie/oozie-sharelib-yarn
touch /var/lib/oozie/.puppet-oozie-setup
- runtime
Link the /etc/hadoop/conf/core-site.xml file into the Tomcat lib directory. For example:
ln -s /etc/hadoop/conf/core-site.xml /usr/lib/bigtop-tomcat/lib/
This is already done by the site_hadoop CESNET puppet module.
Compatibility
To use the module with older versions of Cloudera (like CDH 5.3.1 / Oozie 4.0.0), you need to change the alternatives_ssl and oozie_sharelib parameters. The default values are tested with CDH 5.4.2 / Oozie 4.1.0:
alternatives_ssl => 'oozie-tomcat-conf',
oozie_sharelib => '/usr/lib/oozie/oozie-sharelib-yarn.tar.gz',
Cluster with more HDFS Name nodes
If the Hadoop cluster uses more than one HDFS namenode (high availability, namespaces, ...), the 'oozie' system user must exist on all of them for authorization to work properly. You could install the full Oozie client (using oozie::client::install), but just creating the user (using oozie::user) is enough.
Note: the oozie::hdfs class must be used too, but only on one of the HDFS namenodes. It includes oozie::user.
Example:
node <HDFS_NAMENODE> {
  include oozie::hdfs
}
node <HDFS_OTHER_NAMENODE> {
  include oozie::user
}
Upgrade
Configurations
The best way is to refresh the configuration from the new original (i.e. remove the old one) and rerun puppet on top of it. You may also need to remove the helper files ~oozie/.puppet-ssl* when the Hadoop SSL configuration files are recreated.
For example:
alternative='cluster'
d='oozie'
mv /etc/${d}/conf.${alternative} /etc/${d}/conf.cdhXXX
update-alternatives --auto ${d}-conf
rm -fv ~oozie/.puppet-ssl*
# upgrade
...
puppet agent --test
#or: puppet apply ...
Database schema
As the oozie user:
/usr/lib/oozie/bin/ooziedb.sh create -run
Shared library
oozie-setup sharelib upgrade -fs hdfs://${DEFAULT_FS} -locallib /usr/lib/oozie/oozie-sharelib-yarn
Reference
Classes
- oozie: Apache Oozie Workflow Scheduler - configure class
- oozie::client: Oozie Client
- oozie::config
- oozie::install
- oozie::common::config
- oozie::common::postinstall
- oozie::server: Oozie Server
- oozie::server::config
- oozie::server::install
- oozie::server::service
- oozie::hdfs: HDFS Initializations
- oozie::params
- oozie::user: Create oozie system user
Parameters (oozie class)
acl
Determines whether the setfacl command is available and /etc/hadoop is on a filesystem supporting POSIX ACLs. Default: undef.
It is used to set privileges of ssl-server.xml and ssl-client.xml for Oozie. If POSIX ACLs are not supported, disable this parameter in the cesnet-hadoop puppet module as well.
adminusers
Administrator users. Default: undef.
alternatives
Switches the alternatives used for the configuration. Default: 'cluster' (Debian) or undef.
It can be used only when supported (for example with Cloudera distribution).
alternatives_ssl
Switches the alternatives used for tomcat http/https configuration. Default: 'oozie-tomcat-conf'.
It must be set according to the Oozie version used. There have been several changes in Cloudera. Another valid value may be oozie-tomcat-deployment.
database_setup_enable
Enables database setup (if supported). Default: true.
db
Database type. Default: 'derby'.
Values can be:
- derby
- mysql
- postgresql
- oracle
db_host
Database host. Default: 'localhost'.
db_name
Database name. Default: 'oozie'.
db_user
Database user. Default: 'oozie'.
db_password
Database password. Default: ' ' (space).
Note: Oozie requires a space when an empty password is used.
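The database parameters above can be combined to point Oozie at a database that is managed outside this module. The following is a minimal sketch, assuming a hypothetical host db.example.com and assuming that the user and database already exist there, since database_setup_enable is switched off:

```puppet
class { 'oozie':
  db                    => 'postgresql',
  db_host               => 'db.example.com',  # hypothetical external host
  db_name               => 'oozie',           # documented default
  db_user               => 'oozie',           # documented default
  db_password           => 'ooziepassword',
  # Skip the user and database creation done through the puppetlabs modules.
  database_setup_enable => false,
}
```

The matching JDBC driver is still expected in /usr/share/java on the Oozie server, because the module copies the jar files from there.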
environment
Define environment variable OOZIE_URL on clients. Default: true.
gui_enable
Downloads and deploys Oozie extras GUI. Default: true.
There may be reasons to disable it:
- Its license is GPL (probably incompatible and less free than Apache 2.0, but IANAL)
- GUI is not compatible with Java >= 8 (tested with CDH <= 5.7.1, Oozie <= 4.1.0)
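If either of the reasons above applies, the GUI can be switched off in the class declaration. A minimal sketch, with all other parameters left at their defaults:

```puppet
class { 'oozie':
  # Do not download or deploy ext-2.2 (the Oozie extras GUI).
  gui_enable => false,
}
```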
https
Enable HTTPS. Default: false.
https_keystore
Certificates keystore file. Default: '/etc/security/server.keystore'.
https_keystore_password
Certificates keystore file password. Default: 'changeit'.
Note: the ::undef value can reset https_keystore_password to an empty value. However, Oozie doesn't accept an empty password; the startup scripts set it to the default value "password" in that case.
https_keytab
Keytab file for SPNEGO HTTPS. Default: '/etc/security/keytab/http.service.keytab'.
The file is copied into oozie home directory.
keytab
Oozie keytab file. Default: '/etc/security/keytab/oozie.service.keytab'.
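To show how the security-related parameters fit together, here is a sketch of a secured declaration with every path written out. The paths and the password are simply the documented defaults, and MY.REALM is the placeholder realm from the Security section, so adjust them to the actual environment:

```puppet
class { 'oozie':
  realm                   => 'MY.REALM',  # non-empty realm enables security
  keytab                  => '/etc/security/keytab/oozie.service.keytab',
  https                   => true,
  https_keystore          => '/etc/security/server.keystore',
  https_keystore_password => 'changeit',
  https_keytab            => '/etc/security/keytab/http.service.keytab',
}
```

Spelling out the defaults is optional; it mainly documents which files have to exist before the Puppet run.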
hue_hostnames
Authenticated Apache Hue hostnames. Default: [].
Sets properties oozie.service.ProxyUserService.proxyuser.hue.hosts and oozie.service.ProxyUserService.proxyuser.hue.groups. They can be overridden by properties parameter.
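A short sketch of how this could look, assuming two hypothetical Hue hosts and a hypothetical group name used to narrow the generated groups property through the properties parameter:

```puppet
class { 'oozie':
  # Hypothetical Hue servers allowed to act as a proxy user.
  hue_hostnames => ['hue1.example.com', 'hue2.example.com'],
  properties    => {
    # Optional override of the value derived from hue_hostnames.
    # The group name is a made-up example.
    'oozie.service.ProxyUserService.proxyuser.hue.groups' => 'hue-users',
  },
}
```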
oozie_hostname
Oozie server hostname. Default: $::fqdn.
Needed when any Oozie client runs on a separate node.
oozie_sharelib
Path to oozie sharelib for setup. Default: '/usr/lib/oozie/oozie-sharelib-yarn'.
Note: the path changed in Cloudera somewhere between 5.3.1 and 5.4.2; the older path was '/usr/lib/oozie/oozie-sharelib-yarn.tar.gz'.
realm
Enables security and sets the Kerberos realm to use. Default: ''.
An empty string disables security.
version
Oozie version. Default: 5.
Used to distinguish the differences between Oozie 4.x and Oozie 5.x:
- moved from Tomcat to Jetty + SSL configured using properties instead of alternatives
- properties names changes
- credential classes list changes
Limitations
See Setup Requirements section.
Only Puppet 3 can be covered by the unit tests; Puppet 4 can't use a custom site.pp.
Development
- Repository: https://github.com/MetaCenterCloudPuppet/cesnet-oozie
- Testing:
- basic: see .travis.yml
- vagrant: https://github.com/MetaCenterCloudPuppet/hadoop-tests
See https://github.com/MetaCenterCloudPuppet/cesnet-oozie/commits/master.
Incompatible changes
1.0.0:
- removed hdfs_hostname parameter (replaced by hadoop::_defaultFS)
- removed defaultFS parameter (replaced by hadoop::_defaultFS)
Dependencies
- puppetlabs/stdlib (>= 1.0.0 <7.0.0)
- adrien/alternatives (>= 0.3.0 <1.0.0)
- cesnet/hadoop (>= 3.0.0 <4.0.0)
- cesnet/hadoop_lib (>= 0.4.0 <1.0.0)
The MIT License (MIT)
Copyright (c) 2014-2020 CESNET
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.