Version information
This version is compatible with:
- Debian
Start using this module
Add this module to your Puppetfile:
mod 'MrAlias-druid', '0.2.0'
druid
Module Description
Druid is a data store solution designed for online analytical processing of time-series data. This module is used to manage the Druid services used throughout a Druid cluster, namely the Historical, Broker, Coordinator, Indexing Services, and Realtime services.
Setup
What Druid Affects
- Installs the Druid jars to the local filesystem.
- Creates Druid configuration files on the local filesystem.
- Creates and manages a systemd service for each Druid service to run.
What Druid Does Not Manage
This module will not set up all the additional services a full Druid cluster needs. Specifically, it does not manage the following:
- Cluster deep storage
- Metadata storage
- ZooKeeper
Setup Requirements
- Puppet >= 3.x (It might work with a previous version, but no guarantees are made).
- Ruby >= 1.9.3
- systemd
Beginning With Druid
To install the Druid jars and set up a basic configuration, the following is all you need:
include druid
Usage
The module is designed to be called differently depending on the type of Druid service the node should be running.
Standalone Druid Packages
If you want to run the Druid services on your own, the base druid class can be used to maintain settings. The following will install the Druid 0.8.0 release in /opt/, create a symlink from the installed directory to /opt/druid/, and then install a basic common properties file in /opt/druid/configs/:
class { 'druid':
version => '0.8.0',
install_dir => '/opt',
config_dir => '/opt/druid/configs',
}
Hiera Support
All parameters for all druid classes are definable in hiera. The following hiera excerpt shows how the above example could have been accomplished by defining the parameters in hiera.
druid::version: '0.8.0'
druid::install_dir: '/opt'
druid::config_dir: '/opt/druid/configs'
Historical Node
The druid::historical class is used to set up and manage Druid as a historical node.
class { 'druid::historical':
host => $::ipaddress_lo,
port => 8080,
}
If you need to manage common configuration options you can additionally call the base druid class.
class { 'druid':
storage_type => 's3',
s3_access_key => 'some_access_key',
s3_secret_key => 'super_secret_fake_key',
s3_bucket => 'druid',
s3_base_key => 'test-druid',
s3_archive_bucket => 'druid-archive',
s3_archive_base_key => 'test_archive_base_key',
}
However, a simpler way to accomplish this would be to define the whole configuration via hiera.
druid::storage_type: 's3'
druid::s3_access_key: 'some_access_key'
druid::s3_secret_key: 'super_secret_fake_key'
druid::s3_bucket: 'druid'
druid::s3_base_key: 'test-druid'
druid::s3_archive_bucket: 'druid-archive'
druid::s3_archive_base_key: 'test_archive_base_key'
druid::historical::host: "%{::ipaddress_lo}"
druid::historical::port: 8080
Reference
Private Classes
These classes should not be called outside of the module.
druid::indexing
Provides management of all common indexing related configurations.
druid::indexing::azure_logs_container
The Azure Blob Store container to store logs at.
druid::indexing::azure_logs_prefix
The path to prepend to logs for Azure storage.
druid::indexing::hdfs_logs_directory
The HDFS directory to store logs.
druid::indexing::local_logs_directory
Local filesystem path to store logs.
Default value: '/var/log'
druid::indexing::logs_type
Where to store task logs.
Valid values: 'noop'
, 's3'
, 'azure'
, 'hdfs'
, 'file'
.
Default value: 'file'
.
druid::indexing::s3_logs_bucket
S3 bucket name to store logs at.
druid::indexing::s3_logs_prefix
S3 key prefix.
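Although these classes are private, their parameters can still be set via Hiera like any other. As a sketch, shipping task logs to S3 might look like the following (the bucket and prefix values are illustrative, not defaults):

```yaml
# Store indexing task logs in S3 (example values)
druid::indexing::logs_type: 's3'
druid::indexing::s3_logs_bucket: 'my-druid-logs'
druid::indexing::s3_logs_prefix: 'task-logs'
```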
Public Classes
druid
The main module class. This class is not intended to be called directly. All parameters are expected to be configured via Hiera, but support is still provided to configure this class directly as a Puppet class resource.
druid::announcer_max_bytes_per_node
Max byte size for Znode.
Default value: 524288
.
druid::announcer_segments_per_node
Each Znode contains info for up to this many segments.
Default value: 50
.
druid::announcer_type
The type of data segment announcer to use.
Valid values: 'legacy'
or 'batch'
.
Default value: 'batch'
.
druid::cache_expiration
Memcached expiration time.
Default value: 2592000
(30 days).
druid::cache_hosts
Array of Memcached hosts ('host:port').
Default value: []
.
druid::cache_initial_size
Initial size of the hashtable backing the local cache.
Default value: 500000
.
druid::cache_log_eviction_count
If non-zero, log local cache eviction for that many items.
Default value: 0
.
druid::cache_max_object_size
Maximum object size in bytes for a Memcached object.
Default value: 52428800
(50 MB).
druid::cache_memcached_prefix
Key prefix for all keys in Memcached.
Default value: 'druid'
.
druid::cache_size_in_bytes
Maximum local cache size in bytes. Zero disables caching.
Default value: 0
.
druid::cache_timeout
Maximum time in milliseconds to wait for a response from Memcached.
Default value: 500
.
druid::cache_type
The type of cache to use for queries.
Valid values: 'local'
or 'memcached'
.
Default value: 'local'
.
druid::cache_uncacheable
All query types to not cache.
druid::cassandra_host
Cassandra host to access for deep storage.
druid::cassandra_keyspace
Cassandra key space.
druid::config_dir
Directory where Druid keeps its configuration files.
Default value: '/etc/druid'
.
druid::curator_compress
Boolean flag for whether or not created Znodes should be compressed.
Default value: true
.
druid::discovery_curator_path
Services announce themselves under this ZooKeeper path.
Default value: '/druid/discovery'
.
druid::emitter_http_flush_count
How many messages can the internal message buffer hold before flushing (sending).
Default value: 500
.
druid::emitter_http_flush_millis
How often the internal message buffer is flushed (data is sent).
Default value: 60000
.
druid::emitter_http_recipient_base_url
The base URL to emit messages to. Druid will POST JSON to be consumed at the HTTP endpoint specified by this property.
Default value: ''
.
druid::emitter_http_time_out
The timeout for data reads.
Default value: 'PT5M'
.
druid::emitter_logging_log_level
The log level at which messages are logged.
Valid values: 'debug'
, 'info'
, 'warn'
, or 'error'
Default value: 'info'
.
druid::emitter_logging_logger_class
The class used for logging.
Valid values: 'HttpPostEmitter'
, 'LoggingEmitter'
, 'NoopServiceEmitter'
, or 'ServiceEmitter'
Default value: 'LoggingEmitter'
.
druid::emitter
Emitter module to use.
Valid values: 'noop'
, 'logging'
, or 'http'
.
Default value: 'logging'
.
druid::extensions_coordinates
Array of "groupId:artifactId[:version]" maven coordinates.
Default value: []
.
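As a sketch, loading the S3 deep-storage extension could look like the following; the exact artifact coordinate depends on the Druid version you run, so treat it as illustrative:

```puppet
# Pull the S3 extension from the remote maven repositories (coordinate illustrative)
class { 'druid':
  extensions_coordinates => [
    'io.druid.extensions:druid-s3-extensions:0.8.0',
  ],
}
```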
druid::extensions_default_version
Version to use for extension artifacts without version information.
druid::extensions_local_repository
The way maven gets dependencies is that it downloads them to a "local repository" on your local disk and then collects the paths to each of the jars. This specifies the directory to consider the "local repository". If this is set, extensions_remote_repositories is not required.
Default value: '~/.m2/repository'
.
druid::extensions_remote_repositories
Array of remote repositories to load dependencies from.
Default value: ['http://repo1.maven.org/maven2/', 'https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local', ]
.
druid::extensions_search_current_classloader
This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath.
Default value: true
.
druid::hdfs_directory
HDFS directory to use as deep storage.
druid::install_dir
Directory druid will be installed in.
Default value: '/usr/local/lib'
.
druid::java_pkg
Name of the java package to ensure installed on system.
Default value: 'openjdk-7-jre-headless'
.
druid::metadata_storage_connector_create_tables
Specifies whether to create tables in the metadata storage if they do not exist.
Default value: true
.
druid::metadata_storage_connector_password
The password to connect with.
Default value: 'insecure_pass'
.
druid::metadata_storage_connector_uri
The URI for the metadata storage.
Default value: 'jdbc:mysql://localhost:3306/druid?characterEncoding=UTF-8'
druid::metadata_storage_connector_user
The username to connect to the metadata storage with.
Default value: 'druid'
.
druid::metadata_storage_tables_audit
The table to use for audit history of configuration changes.
Default value: 'druid_audit'
.
druid::metadata_storage_tables_base
The base name for tables.
Default value: 'druid'
.
druid::metadata_storage_tables_config_table
The table to use to look for configs.
Default value: 'druid_config'
.
druid::metadata_storage_tables_rule_table
The table to use to look for segment load/drop rules.
Default value: 'druid_rules'
.
druid::metadata_storage_tables_segment_table
The table to use to look for segments.
Default value: 'druid_segments'
.
druid::metadata_storage_tables_task_lock
Used by the indexing service to store task locks.
Default value: 'druid_taskLock'
.
druid::metadata_storage_tables_task_log
Used by the indexing service to store task logs.
Default value: 'druid_taskLog'
.
druid::metadata_storage_tables_tasks
The table used by the indexing service to store tasks.
Default value: 'druid_tasks'
.
druid::metadata_storage_type
The type of metadata storage to use.
Valid values: 'mysql'
, 'postgres'
, or 'derby'
Default value: 'mysql'
.
druid::monitoring_emission_period
How often metrics are emitted.
Default value: 'PT1m'
.
druid::monitoring_monitors
Array of Druid monitors used by a node. See below for names and more information.
Valid array values:
| Array Value | Description |
|---|---|
| 'io.druid.client.cache.CacheMonitor' | Emits metrics (to logs) about the segment results cache for Historical and Broker nodes. Reports typical cache statistics including hits, misses, rates, and size (bytes and number of entries), as well as timeouts and errors. |
| 'com.metamx.metrics.SysMonitor' | Uses the SIGAR library to report on various system activities and statuses. Make sure to add the SIGAR library jar to your classpath if using this monitor. |
| 'io.druid.server.metrics.HistoricalMetricsMonitor' | Reports statistics on Historical nodes. |
| 'com.metamx.metrics.JvmMonitor' | Reports JVM-related statistics. |
| 'io.druid.segment.realtime.RealtimeMetricsMonitor' | Reports statistics on Realtime nodes. |
Default value: []
.
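For instance, JVM and historical metrics reporting could be enabled via Hiera with an entry like this (monitor class names taken from the table above):

```yaml
# Enable JVM and Historical node metrics reporting
druid::monitoring_monitors:
  - 'com.metamx.metrics.JvmMonitor'
  - 'io.druid.server.metrics.HistoricalMetricsMonitor'
```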
druid::request_logging_dir
Historical, Realtime, and Broker nodes maintain request logs of all of the requests they receive (interaction is via POST, so normal request logs don’t generally capture information about the actual query). This specifies the directory to store the request logs in.
Default value: ''
.
druid::request_logging_feed
Feed name for requests.
Default value: 'druid'
.
druid::request_logging_type
Specifies the type of logging used.
Valid values: 'noop'
, 'file'
, 'emitter'
.
Default value: 'noop'
.
druid::s3_access_key
The access key to use to access S3.
druid::s3_archive_base_key
S3 object key prefix for archiving.
druid::s3_archive_bucket
S3 bucket name for archiving when running the indexing-service archive task.
druid::s3_base_key
S3 object key prefix for storage.
druid::s3_bucket
S3 bucket name.
druid::s3_secret_key
The secret key to use to access S3.
druid::selectors_indexing_service_name
The service name of the indexing service Overlord node. To start the Overlord with a different name, set it with this property.
Default value: 'druid/overlord'
.
druid::storage_directory
Directory on disk to use as deep storage if $storage_type
is set to 'local'
.
Default value: '/tmp/druid/localStorage'
.
druid::storage_disable_acl
Boolean flag for ACL.
Default value: false
.
druid::storage_type
The type of deep storage to use.
Valid values: 'local'
, 'noop'
, 's3'
, 'hdfs'
, or 'c*'
Default value: 'local'
.
druid::version
Version of druid to install.
Default value: '0.8.0'
.
druid::zk_paths_announcements_path
Druid node announcement path.
druid::zk_paths_base
Base Zookeeper path.
Default value: '/druid'
.
druid::zk_paths_coordinator_path
Used by the coordinator for leader election.
druid::zk_paths_indexer_announcements_path
Middle managers announce themselves here.
druid::zk_paths_indexer_base
Base zookeeper path for indexers.
druid::zk_paths_indexer_leader_latch_path
Used for Overlord leader election.
druid::zk_paths_indexer_status_path
Parent path for announcement of task statuses.
druid::zk_paths_indexer_tasks_path
Used to assign tasks to middle managers.
druid::zk_paths_live_segments_path
Current path for where Druid nodes announce their segments.
druid::zk_paths_load_queue_path
Entries here cause historical nodes to load and drop segments.
druid::zk_paths_properties_path
Zookeeper properties path.
druid::zk_service_host
The ZooKeeper hosts to connect to.
Default value: 'localhost'
.
druid::zk_service_session_timeout_ms
ZooKeeper session timeout, in milliseconds.
Default value: 30000
.
druid::broker
druid::broker::host
Host address for the service to listen on.
Default value: The $ipaddress
fact.
druid::broker::port
Port the service listens on.
Default value: 8082
.
druid::broker::service
The name of the service.
This is used as a dimension when emitting metrics and alerts. It is used to differentiate between the various services.
Default value: 'druid/broker'
.
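A minimal broker declaration, analogous to the historical example in the Usage section, might look like this sketch (values shown are the documented defaults):

```puppet
# Declare a Druid broker node listening on the node's primary address
class { 'druid::broker':
  host => $::ipaddress,
  port => 8082,
}
```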
druid::broker::balancer_type
The way connections to historical nodes are balanced.
Valid values:
- 'random': Choose randomly.
- 'connectionCount': Use node with the fewest active connections.
Default value: 'random'
.
druid::broker::cache_expiration
Memcached expiration time.
Default value: 2592000
(30 days).
druid::broker::cache_hosts
Array of Memcached hosts (i.e. '<host:port>').
Default value: []
.
druid::broker::cache_initial_size
Initial size of the hashtable backing the cache.
Default value: 500000
.
druid::broker::cache_log_eviction_count
If non-zero, log cache eviction every logEvictionCount items.
Default value: 0
.
druid::broker::cache_max_object_size
Maximum object size in bytes for a Memcached object.
Default value: 52428800
(50 MB).
druid::broker::cache_memcached_prefix
Key prefix for all keys in Memcached.
Default value: 'druid'
.
druid::broker::cache_populate_cache
Populate the cache on the broker.
Valid values: true
, false
.
Default value: false
.
druid::broker::cache_size_in_bytes
Maximum cache size in bytes.
Zero disables caching.
Default value: 0
.
druid::broker::cache_timeout
Maximum time in milliseconds to wait for a response from Memcached.
Default value: 500
.
druid::broker::cache_type
Type of cache to use for queries.
Valid values:
- 'local': Use local file cache.
- 'memcached': Use Memcached.
Default value: 'local'
.
druid::broker::cache_uncacheable
All query types to not cache.
Default value: ['groupBy', 'select']
.
druid::broker::cache_use_cache
Enable the cache on the broker.
Valid values: true
, false
.
Default value: false
.
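Putting the cache parameters together, a broker backed by Memcached could be sketched in Hiera as follows (the host value is illustrative):

```yaml
# Enable broker query caching backed by Memcached (example host)
druid::broker::cache_use_cache: true
druid::broker::cache_populate_cache: true
druid::broker::cache_type: 'memcached'
druid::broker::cache_hosts:
  - 'memcached1.example.com:11211'
```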
druid::broker::http_num_connections
Connection pool size.
Specifically, this is the number of connections the Broker uses to connect to historical and real-time nodes. If there are more queries than this number that all need to speak to the same node, then they will queue up.
Default value: 5
.
druid::broker::http_read_timeout
The timeout for data reads.
Default value: 'PT15M'
.
druid::broker::jvm_opts
Array of options to set for the JVM running the service.
Default value: ['-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager']
druid::broker::processing_buffer_size_bytes
Buffer size for the storage of intermediate results.
The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed.
Default value: 1073741824
(1GB).
druid::broker::processing_column_cache_size_bytes
Maximum size in bytes for the dimension value lookup cache.
Any value greater than 0 enables the cache. It is currently disabled by default. Enabling the lookup cache can significantly improve the performance of aggregators operating on dimension values, such as the JavaScript aggregator, or cardinality aggregator, but can slow things down if the cache hit rate is low (i.e. dimensions with few repeating values). Enabling it may also require additional garbage collection tuning to avoid long GC pauses.
Default value: 0
(disabled).
druid::broker::processing_format_string
Format string to name processing threads.
Default value: 'processing-%s'
.
druid::broker::processing_num_threads
Number of processing threads available for processing of segments.
Rule of thumb is num_cores - 1, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value 1.
druid::broker::query_group_by_max_intermediate_rows
Maximum number of intermediate rows.
Default value: 50000
.
druid::broker::query_group_by_max_results
Maximum number of results.
Default value: 500000
.
druid::broker::query_group_by_single_threaded
Run single threaded group by queries.
Valid values: true
, false
.
Default value: false
.
druid::broker::query_search_max_search_limit
Maximum number of search results to return.
Default value: 1000
.
druid::broker::retry_policy_num_tries
Number of tries.
Default value: 1
.
druid::broker::select_tier_custom_priorities
Array of integer priorities to select server tiers with.
Default value: []
.
druid::broker::select_tier
Preference of tier to select based on priority.
If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.
Valid values:
- 'highestPriority'
- 'lowestPriority'
- 'custom'
Default value: 'highestPriority'
.
druid::broker::server_http_max_idle_time
The Jetty max idle time for a connection.
Default value: 'PT5m'
.
druid::broker::server_http_num_threads
Number of threads for HTTP requests.
Default value: 10
.
druid::coordinator
Sets up configuration and manages the Druid coordinator service.
druid::coordinator::host
Host address the service listens on.
Default value: The $ipaddress
fact.
druid::coordinator::port
Port the service listens on.
Default value: 8081
.
druid::coordinator::service
The name of the service.
This is used as a dimension when emitting metrics and alerts. It is used to differentiate between the various services.
Default value: 'druid/coordinator'
.
druid::coordinator::conversion_on
Specify if old segment indexing versions should be converted to the latest.
Valid values: true
or false
.
Default value: false
.
druid::coordinator::jvm_opts
Array of options to set for the JVM running the service.
Default value: [ '-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager' ]
druid::coordinator::load_timeout
Time before the coordinator assigns a segment to a historical node.
Default value: 'PT15M'
.
druid::coordinator::manager_config_poll_duration
Time between polls of the config table for updates.
Default value: 'PT1m'
.
druid::coordinator::manager_rules_alert_threshold
The duration after a failed poll upon which an alert should be emitted.
Default value: 'PT10M'
.
druid::coordinator::manager_rules_default_tier
The default tier which default rules will be loaded from.
Default value: '_default'
.
druid::coordinator::manager_rules_poll_duration
Time between polls of the set of active rules for updates.
Defines the amount of lag time it can take for the coordinator to notice rule changes.
Default value: 'PT1M'
.
druid::coordinator::manager_segment_poll_duration
Time between polls of the set of active segments for updates.
Defines the amount of lag time it can take for the coordinator to notice segment changes.
Default value: 'PT1M'
.
druid::coordinator::merge_on
Specify if the coordinator should try and merge small segments.
This helps keep segments at a more optimal segment size.
Default value: false
.
druid::coordinator::period
The run period for the coordinator.
The coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of segments available and segments being served to make decisions about whether any changes need to be made to the data topology. This parameter sets the delay between each of these runs.
Default value: 'PT60S'
.
druid::coordinator::period_indexing_period
How often to send indexing tasks to the indexing service.
Only applies if merge or conversion is turned on.
Default value: 'PT1800S'
(30 mins).
druid::coordinator::start_delay
Time to let the service think it has all data.
The operation of the Coordinator works on the assumption that it has an up-to-date view of the state of the world when it runs. The current ZK interaction code, however, is written in a way that doesn’t allow the Coordinator to know for a fact that it’s done loading the current state of the world. This delay is a hack to give it enough time to believe that it has all the data.
Default value: 'PT300S'
.
druid::historical
Historical node class.
druid::historical::host
The host for the current node.
Default value: The $ipaddress
fact.
druid::historical::port
This is the port to actually listen on; unless port mapping is used, this will be the same port as is on druid::historical::host.
Default value: 8083
.
druid::historical::service
The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services.
Default value: 'druid/historical'
.
druid::historical::jvm_opts
Array of options to set for the JVM running the service.
Default value: ['-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager']
druid::historical::populate_cache
Populate the cache on the historical.
Default value: false
.
druid::historical::processing_buffer_size_bytes
This specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed.
Default value: 1073741824
(1GB).
druid::historical::processing_column_cache_size_bytes
Maximum size in bytes for the dimension value lookup cache. Any value greater than 0 enables the cache. Enabling the lookup cache can significantly improve the performance of aggregators operating on dimension values, such as the JavaScript aggregator, or cardinality aggregator, but can slow things down if the cache hit rate is low (i.e. dimensions with few repeating values). Enabling it may also require additional garbage collection tuning to avoid long GC pauses.
Default value: 0
(disabled).
druid::historical::processing_format_string
Realtime and historical nodes use this format string to name their processing threads.
Default value: 'processing-%s'
.
druid::historical::processing_num_threads
The number of processing threads to have available for parallel processing of segments. Druid will default to using 'num_cores - 1 || 1'.
druid::historical::query_groupBy_max_intermediate_rows
Maximum number of intermediate rows.
Default value: 50000
.
druid::historical::query_groupBy_max_results
Maximum number of results.
Default value: 500000
.
druid::historical::query_groupBy_single_threaded
Run single threaded group By queries.
Default value: false
.
druid::historical::query_search_max_search_limit
Maximum number of search results to return.
Default value: 1000
.
druid::historical::segment_cache_announce_interval_millis
How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.
Default value: 5000
(5 seconds).
druid::historical::segment_cache_delete_on_remove
Delete segment files from cache once a node is no longer serving a segment.
Default value: true
.
druid::historical::segment_cache_drop_segment_delay_millis
How long a node delays before completely dropping segment.
Default value: 30000
(30 seconds).
druid::historical::segment_cache_info_dir
Historical nodes keep track of the segments they are serving so that when the process is restarted they can reload the same segments without waiting for the Coordinator to reassign. This path defines where this metadata is kept.
druid::historical::segment_cache_locations
Segments assigned to a Historical node are first stored on the local file system (in a disk cache) and then served by the Historical node. These locations define where that local cache resides.
Valid values: 'none' (also undef) or an absolute file path.
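Assuming the parameter accepts a single absolute path as documented above, a dedicated cache location could be sketched as (path illustrative):

```puppet
# Keep the local segment cache on a dedicated path (example value)
class { 'druid::historical':
  segment_cache_locations => '/var/lib/druid/segment-cache',
}
```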
druid::historical::segment_cache_num_loading_threads
How many segments to load concurrently from deep storage.
Default value: 1
.
druid::historical::server_http_max_idle_time
The Jetty max idle time for a connection.
Default value: 'PT5m'
.
druid::historical::server_http_num_threads
Number of threads for HTTP requests.
Default value: 10
.
druid::historical::server_max_size
The maximum number of bytes-worth of segments that the node wants assigned to it. This is not a limit that Historical nodes actually enforce, just a value published to the Coordinator node so it can plan accordingly.
Default value: 0
.
druid::historical::server_priority
In a tiered architecture, the priority of the tier, thus allowing control over which nodes are queried. Higher numbers mean higher priority. The Druid default (no priority) works for architectures with no cross replication (tiers that have no data-storage overlap). Data centers typically have equal priority.
Default value: 0
.
druid::historical::server_tier
A string to name the distribution tier that the storage node belongs to. Many of the rules Coordinator nodes use to manage segments can be keyed on tiers.
Default value: '_default_tier'
.
druid::historical::uncacheable
All query types to not cache.
Default value: ["groupBy", "select"]
.
druid::historical::use_cache
Enable the cache on the historical.
Default value: false
.
druid::indexing::overlord
Manages and sets up the indexing overlord.
druid::indexing::overlord::host
Host address the service listens on.
Default value: The $ipaddress
fact.
druid::indexing::overlord::port
Port the service listens on.
Default value: 8090
.
druid::indexing::overlord::service
The name of the service.
This is used as a dimension when emitting metrics and alerts. It is used to differentiate between the various services.
Default value: 'druid/overlord'
.
druid::indexing::overlord::autoscale
Enable autoscaling.
Valid values: true
or false
.
Default value: false
.
druid::indexing::overlord::autoscale_max_scaling_duration
Time to wait for a middle manager to appear before giving up.
Default value: 'PT15M'
.
druid::indexing::overlord::autoscale_num_events_to_track
The number of autoscaling events to track.
This includes node creation and termination.
Default value: 10
.
druid::indexing::overlord::autoscale_origin_time
Starting time stamp the terminate period increments upon.
Default value: '2012-01-01T00:55:00.000Z'
.
druid::indexing::overlord::autoscale_pending_task_timeout
Time a task can be "pending" before scaling up.
Default value: 'PT30S'
.
druid::indexing::overlord::autoscale_provision_period
Time window to check if new middle managers should be added.
Default value: 'PT1M'
.
druid::indexing::overlord::autoscale_strategy
Strategy to run when autoscaling is required.
Valid values: 'noop'
or 'ec2'
.
Default value: 'noop'
.
druid::indexing::overlord::autoscale_terminate_period
Time window to check if middle managers should be removed.
Default value: 'PT5M'
.
druid::indexing::overlord::autoscale_worker_idle_timeout
Idle time of a worker before it is considered for termination.
Default value: 'PT90M'
.
druid::indexing::overlord::autoscale_worker_port
Port that middle managers will listen on.
Default value: 8080
.
druid::indexing::overlord::autoscale_worker_version
Node versions to create.
If set, the overlord will only create nodes of set version during autoscaling. This overrides dynamic configuration.
druid::indexing::overlord::jvm_opts
Array of options to set for the JVM running the service.
Default value: ['-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager']
druid::indexing::overlord::queue_max_size
Maximum number of active tasks at one time.
druid::indexing::overlord::queue_restart_delay
Sleep time when queue management throws an exception.
Default value: 'PT30S'
.
druid::indexing::overlord::queue_start_delay
Sleep time before starting overlord queue management.
This can be useful to give a cluster time to re-orient itself (e.g. after a widespread network issue).
Default value: 'PT1M'
.
druid::indexing::overlord::queue_storage_sync_rate
Rate to sync state with an underlying task persistence mechanism.
Default value: 'PT1M'
.
druid::indexing::overlord::runner_compress_znodes
Expect middle managers to compress Znodes.
Valid values: true
or false
.
Default value: true
.
druid::indexing::overlord::runner_max_znode_bytes
Maximum Znode size in bytes that can be created in Zookeeper.
Default value: 524288
.
druid::indexing::overlord::runner_min_worker_version
Minimum middle manager version to send tasks to.
Default value: '0'
.
druid::indexing::overlord::runner_task_assignment_timeout
Time to wait after a task has been assigned to a middle manager.
Tasks that take longer than this result in an error being thrown.
Default value: 'PT5M'
.
druid::indexing::overlord::runner_task_cleanup_timeout
Time to wait before failing a task when the middle manager assigned the task becomes disconnected from Zookeeper.
Default value: 'PT15M'
.
druid::indexing::overlord::runner_type
Specify if tasks are run locally or in a distributed environment.
Valid values: 'local'
or 'remote'
.
Default value: 'local'
.
druid::indexing::overlord::storage_recently_finished_threshold
Time to store task results.
Default value: 'PT24H'
.
druid::indexing::overlord::storage_type
Specify where incoming tasks are stored.
Storing incoming tasks in metadata storage allows for tasks to be resumed if the overlord should fail.
Valid values:
* `'local'`: In heap
* `'metadata'`: Metadata storage.
Default value: 'local'
.
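Combining the two parameters above, a distributed indexing setup that survives overlord failure could be sketched in Hiera as:

```yaml
# Run tasks on remote middle managers and persist task state in metadata storage
druid::indexing::overlord::runner_type: 'remote'
druid::indexing::overlord::storage_type: 'metadata'
```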
druid::indexing::middle_manager
Manages and sets up the indexing middle managers.
druid::indexing::middle_manager::host
Host address the service listens on.
Default value: The $ipaddress
fact.
druid::indexing::middle_manager::port
Port the service listens on.
Default value: 8080
.
druid::indexing::middle_manager::service
The name of the service.
This is used as a dimension when emitting metrics and alerts. It is used to differentiate between the various services.
Default value: 'druid/middlemanager'
.
druid::indexing::middle_manager::fork_properties
Hash of explicit child peon config options.
Peons inherit the configurations of their parent middle managers, but if this is undesired for certain config options they can be explicitly passed here.
These key value pairs are expected in Druid config format and are unvalidated. The keys should NOT include 'druid.indexer.fork.property' as a prefix.
Example:
{
"druid.monitoring.monitors" => "[\"com.metamx.metrics.JvmMonitor\"]",
"druid.processing.numThreads" => 2,
}
Default value: {}
druid::indexing::middle_manager::jvm_opts
Array of options to set for the JVM running the service.
Default value: ['-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager']
druid::indexing::middle_manager::peon_mode
Mode peons are run in.
Valid values:
* `'local'`: Standalone mode (Not recommended).
* `'remote'`: Pooled.
Default value: 'remote'.
druid::indexing::middle_manager::remote_peon_max_retry_count
Max retries a remote peon makes communicating with the overlord.
Default value: 10.
druid::indexing::middle_manager::remote_peon_max_wait
Maximum retry wait time for a remote peon communicating with the overlord.
Default value: 'PT10M'.
druid::indexing::middle_manager::remote_peon_min_wait
Minimum retry wait time for a remote peon communicating with the overlord.
Default value: 'PT1M'.
druid::indexing::middle_manager::runner_allowed_prefixes
Array of prefixes of configs that are passed down to peons.
Default value: ['com.metamx', 'druid', 'io.druid', 'user.timezone', 'file.encoding'].
druid::indexing::middle_manager::runner_classpath
Java classpath for the peons.
druid::indexing::middle_manager::runner_compress_znodes
Specify if Znodes are compressed.
Default value: true.
druid::indexing::middle_manager::runner_java_command
Command for peons to use to execute java.
Default value: 'java'.
druid::indexing::middle_manager::runner_java_opts
Java "-X" options for the peon to use in its own JVM.
druid::indexing::middle_manager::runner_max_znode_bytes
Maximum Znode size in bytes that can be created in Zookeeper.
Default value: 524288.
druid::indexing::middle_manager::runner_start_port
Port peons begin running on.
Default value: 8100.
druid::indexing::middle_manager::task_base_dir
Base temporary working directory.
Default value: '/tmp'.
druid::indexing::middle_manager::task_base_task_dir
Base temporary working directory for tasks.
Default value: '/tmp/persistent/tasks'.
druid::indexing::middle_manager::task_chat_handler_type
Specify service discovery type.
Certain tasks will use service discovery to announce an HTTP endpoint that events can be posted to.
Valid values: 'noop' or 'announce'.
Default value: 'noop'.
druid::indexing::middle_manager::task_default_hadoop_coordinates
Array of default Hadoop coordinates to use.
This is used with HadoopIndexTasks that do not request a particular version.
Default value: ['org.apache.hadoop:hadoop-client:2.3.0'].
druid::indexing::middle_manager::task_default_row_flush_boundary
Highest row count before persisting to disk.
Used by index-generating tasks.
Default value: 50000.
druid::indexing::middle_manager::task_hadoop_working_path
Temporary working directory for Hadoop tasks.
Default value: '/tmp/druid-indexing'.
druid::indexing::middle_manager::worker_capacity
Maximum number of tasks to accept.
druid::indexing::middle_manager::worker_ip
The IP of the worker.
Default value: 'localhost'.
druid::indexing::middle_manager::worker_version
Version identifier for the middle manager.
Default value: '0'.
druid::realtime
Sets up configuration and manages the Druid realtime service.
druid::realtime::host
Host address the service listens on.
Default value: The $ipaddress fact.
druid::realtime::port
Port the service listens on.
Default value: 8084.
druid::realtime::service
The name of the service.
This is used as a dimension when emitting metrics and alerts, and differentiates between the various services.
Default value: 'druid/realtime'.
druid::realtime::jvm_opts
Array of options to set for the JVM running the service.
Default value: [ '-server', '-Duser.timezone=UTC', '-Dfile.encoding=UTF-8', '-Djava.io.tmpdir=/tmp', '-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager' ]
druid::realtime::processing_buffer_size_bytes
Buffer size for the storage of intermediate results.
The computation engine uses a scratch buffer of this size to do all intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed.
Default value: 1073741824 (1GB).
druid::realtime::processing_column_cache_size_bytes
Maximum size in bytes for the dimension value lookup cache.
Any value greater than 0 enables the cache. Enabling the lookup cache can significantly improve the performance of aggregators operating on dimension values, such as the JavaScript aggregator or cardinality aggregator, but can slow things down if the cache hit rate is low (i.e. dimensions with few repeating values). Enabling it may also require additional garbage collection tuning to avoid long GC pauses.
Default value: 0 (disabled).
druid::realtime::processing_format_string
Format string to name processing threads.
Default value: 'processing-%s'.
druid::realtime::processing_num_threads
Number of processing threads for processing of segments.
Rule of thumb is num_cores - 1, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments.
druid::realtime::publish_type
Where to publish segments.
Valid values: 'noop' or 'metadata'.
Default value: 'metadata'.
druid::realtime::query_group_by_max_intermediate_rows
Maximum number of intermediate rows.
Default value: 50000.
druid::realtime::query_group_by_max_results
Maximum number of results.
Default value: 500000.
druid::realtime::query_group_by_single_threaded
Run groupBy queries single threaded.
Default value: false.
druid::realtime::query_search_max_search_limit
Maximum number of search results to return.
Default value: 1000.
druid::realtime::segment_cache_locations
Where intermediate segments are stored.
druid::realtime::spec_file
File location of realtime specFile.
druid::realtime::spec_file_content
Content to ensure in spec_file.
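As a sketch of the realtime parameters above, a realtime node that reads its ingestion spec from a Puppet-managed file could be declared as follows. The spec file path and the inlined spec content are hypothetical examples, not values the module ships with:

```puppet
# Realtime node whose specFile is written and kept in sync by Puppet.
class { 'druid::realtime':
  spec_file         => '/opt/druid/configs/realtime.spec',
  spec_file_content => file('profile/druid/realtime.spec'),
}
```

Because `spec_file_content` is ensured in `spec_file`, changing the spec in your control repository propagates to the node on the next Puppet run.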
Private Defined Types
druid::service
Resource used to set up common files needed for Druid services.
druid::service::service_name
Name the service is known by (e.g. historical, broker, realtime, ...).
Default value: $namevar
druid::service::config_content
Required content of the service properties file.
druid::service::service_content
Required content of the systemd service file.
Limitations
The module has been designed to run on Debian-based systems using systemd as the service manager.
Testing is currently only being done on Debian 8 (Jessie).
Change Log
Unreleased
Fixed
- Explicitly set the service provider to systemd. This is currently the only provider the module supports.
- Middle manager template sets service configuration properties.
- Corrected the druid.cache.hosts common configuration property to be a comma-separated list.
Added
- Explicit documentation of the systemd requirement.
- Missing configuration parameters for druid::indexing::middle_manager.
Changed
- Druid runtime property templates are no longer populated with blank configuration options. When the Puppet parameter is undef or empty, no default value is written to the template. This avoids repeating default values in two places and allows values to be explicitly omitted.
0.1.1 - 2015-09-03
Fixed
- All systemd service unit configurations to expect signal termination exit codes from the druid process.
- Validation of input parameters in the druid::historical class.
- Documentation errors.
- Unit test errors.
Added
- Project change log to track release changes.
- Project contributing guidelines in CONTRIBUTING.md.
- Travis CI config file.
0.1.0 - 2015-08-25
Added
- Main puppet module structure.
- Manifests for each service druid performs.
- Main druid class to handle package install and common configuration.
- Defined type druid::service to abstract implementation of druid services.
- Unit tests with 100% coverage of all resources defined.
- Acceptance tests for each service the module is expected to manage.
Dependencies
- puppetlabs/stdlib (>= 4.0.0 < 5.0.0)
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "{}" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright {yyyy} {name of copyright owner} Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.