Version information
This version is compatible with:
- Puppet Enterprise 2023.7.x, 2023.6.x, 2023.5.x, 2023.4.x, 2023.3.x, 2023.2.x, 2023.1.x, 2023.0.x, 2021.7.x, 2021.6.x, 2021.5.x, 2021.4.x, 2021.3.x, 2021.2.x, 2021.1.x, 2021.0.x
- Puppet >= 7.0.0 < 9.0.0
Tasks:
- reconfig
Start using this module
Add this module to your Puppetfile:
mod 'treydock-slurm', '4.0.1'
puppet-slurm
Table of Contents
- Overview
- Usage - Configuration options
- Reference - Parameter and detailed reference to all options
- Limitations - OS compatibility, etc.
Overview
Manage SLURM.
Supported Versions of SLURM
This module is designed to work with SLURM 22.05.x, 23.02.x and 23.11.x.
SLURM Version | SLURM Puppet module versions |
---|---|
20.02.x | 0.x |
20.11.x | 1.x |
21.08.x & 22.05.x | 2.x |
23.02.x | 3.x |
23.11.x | 4.x |
Usage
This module is designed so the majority of configuration changes are made through the slurm class directly.
In most cases, all that is needed to begin using this module is to declare include slurm. The following usage examples all assume that a host already has include slurm applied and that the rest of the configuration is done via Hiera.
It's advisable to put as much of the Hiera data as possible in a shared location like common.yaml.
Setup
In order to use SLURM, the munge daemon must be configured. This module will include the munge class from treydock/munge but will not configure munge. The minimal configuration needed is to set the munge key source to a munge key stored in a module somewhere.
munge::munge_key_source: "puppet:///modules/profile/munge.key"
As of version v2.3.0 you can also provide the content of the munge key, for example if you're using EYAML in Hiera.
munge::munge_key_content: "supersecret"
Dependencies
The following parameter changes can be made to avoid dependencies on several modules:
- slurm::manage_firewall: false - Disable dependency on puppetlabs/firewall
- slurm::use_nhc: false OR slurm::include_nhc: false - Disable dependency on treydock/nhc
- slurm::manage_rsyslog: false OR slurm::use_syslog: false - Disable dependency on saz/rsyslog
- slurm::manage_logrotate: false - Disable dependency on puppet/logrotate
- slurm::source_install_manage_alternatives: false - When install_method is source and installing on a system without a default Python install, this will disable a dependency on puppet/alternatives
- slurm::tuning_net_core_somaxconn: false - Disable dependency on herculesteam/augeasproviders_sysctl
NOTE: If use_syslog is set to true there is a soft dependency on saz/rsyslog
NOTE: If use_nhc and include_nhc are set to true there is a soft dependency on treydock/nhc
common
The following could be included in common.yaml. This assumes your site has access to SLURM RPMs.
slurm::repo_baseurl: "https://repo.hpc.osc.edu/internal/slurm/%{facts.os.release.major}/"
slurm::install_torque_wrapper: true
slurm::install_pam: true
slurm::slurm_group_gid: 93
slurm::slurm_user_uid: 93
slurm::slurm_user_home: /var/lib/slurm
slurm::manage_firewall: false
slurm::use_syslog: true
slurm::cluster_name: example
slurm::slurmctld_host:
  - slurmctld.example.com
slurm::slurmdbd_host: slurmdbd.example.com
slurm::greses:
  nvml:
    auto_detect: nvml
slurm::slurmd_spool_dir: /var/spool/slurmd
slurm::slurm_conf_override:
  AccountingStorageTRES:
    - gres/gpu
    - gres/gpu:tesla
    - license/ansys
  Licenses:
    - ansys:2
  ReturnToService: 2
  SelectType: select/cons_tres
  SelectTypeParameters:
    - CR_CPU
slurm::partitions:
  batch:
    default: 'YES'
    def_mem_per_cpu: 1700
    max_mem_per_cpu: 1750
    nodes: slurmd01
slurm::nodes:
  slurmd01:
    node_hostname: slurmd01.example.com
    cpus: 4
    threads_per_core: 1
    cores_per_socket: 1
    sockets: 4
    real_memory: 7000
Roles
The behavior of this module is determined by six booleans that set the role for a host.
- client - When true, will set up a host as a SLURM client
- slurmctld - When true, will set up a host to run slurmctld
- slurmdbd - When true, will set up a host to run slurmdbd
- database - When true, will set up a host to manage the slurmdbd MySQL database
- slurmd - When true, will set up a host to run slurmd
- slurmrestd - When true, will set up a host to run slurmrestd
NOTE: The only role enabled by default is client.
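The role booleans are ordinary parameters of the slurm class, so they can also be set from a manifest instead of Hiera. The following is a minimal sketch, assuming a site-specific profile class (the name profile::slurm_compute is hypothetical); Hiera remains the approach the rest of this README uses.

```puppet
# Hypothetical profile class for a compute node; the class name is an
# assumption and is not provided by this module.
class profile::slurm_compute {
  class { 'slurm':
    client => true,  # already the default, shown for clarity
    slurmd => true,  # run the slurmd daemon on this host
  }
}
```

A resource-like declaration such as this must be evaluated before any plain include slurm for the same node, otherwise Puppet reports a duplicate declaration.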
Role: slurmdbd and database
The following example will setup an instance of slurmdbd that exports the database resource that can be collected by a database server:
slurm::client: true
slurm::slurmdbd: true
slurm::database: true
slurm::slurmdbd_storage_host: db.example.com
slurm::slurmdbd_storage_loc: slurm_acct_db
slurm::slurmdbd_storage_user: slurmdbd
slurm::slurmdbd_storage_pass: changeme
slurm::export_database: true
slurm::export_database_tag: "%{lookup('slurm::slurmdbd_storage_host')}"
slurm::slurmdbd_conf_override:
  MaxQueryTimeRange: '90-00:00:00'
  MessageTimeout: '10'
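The database server would have something like the following to collect the exported database resources. The structured networking.fqdn fact is used here because legacy facts such as fqdn are not available by default on Puppet 8:
Mysql::Db <<| tag == $facts['networking']['fqdn'] |>>
The following example would avoid the PuppetDB dependency and requires including the slurm class on the MySQL server: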
# common.yaml
slurm::slurmdbd_storage_host: db.example.com
slurm::slurmdbd_storage_loc: slurm_acct_db
slurm::slurmdbd_storage_user: slurmdbd
slurm::slurmdbd_storage_pass: changeme
# fqdn/db.example.com.yaml
slurm::client: false
slurm::database: true
# fqdn/slurmdbd.example.com.yaml
slurm::slurmdbd: true
slurm::database: false
Role: slurmctld
The following enables a host to act as the slurmctld daemon with a remote slurmdbd.
slurm::client: true
slurm::slurmdbd: false
slurm::database: false
slurm::slurmctld: true
If you wish to enable configless SLURM:
slurm::enable_configless: true
Role: slurmd
The following enables a host to act as a slurmd compute node
slurm::client: true
slurm::slurmdbd: false
slurm::database: false
slurm::slurmctld: false
slurm::slurmd: true
To have slurmd pull configs via configless SLURM:
slurm::configless: true
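The two halves of a configless setup can be seen side by side in a manifest. This is only a sketch: the node names and the regular expression are hypothetical, and the same values could equally be supplied through Hiera as shown above.

```puppet
# Controller: serves configuration to configless clients.
# The hostname is a placeholder taken from the earlier Hiera example.
node 'slurmctld.example.com' {
  class { 'slurm':
    slurmctld         => true,
    enable_configless => true,
  }
}

# Compute nodes: slurmd pulls its configuration from the controller.
# The node name pattern is an assumption for illustration.
node /^slurmd\d+\.example\.com$/ {
  class { 'slurm':
    slurmd     => true,
    configless => true,
  }
}
```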
Role: client
If the majority of your configuration is done in common.yaml then the default for slurm::client of true is sufficient to configure a host to act as a SLURM client.
Role: slurmrestd
First, the common Hiera data, such as common.yaml, should have something like the below. Setting auth_alt_types to include auth/jwt will activate the Puppet code to manage JWT resources where appropriate.
slurm::auth_alt_types:
- auth/jwt
slurm::jwt_key_source: 'puppet:///modules/site_slurm/jwt.key'
For the host to run slurmrestd:
slurm::slurmrestd: true
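For sites that prefer manifests over Hiera, the slurmrestd settings above map directly onto class parameters. A hedged sketch for the slurmrestd host only; the JWT key path is the same placeholder used in the Hiera example, and other hosts would still need auth_alt_types and the JWT key set for the JWT resources to be managed there.

```puppet
# Sketch of a slurmrestd host; all parameter names come from the slurm
# class reference, the key location is a placeholder.
class { 'slurm':
  slurmrestd     => true,
  auth_alt_types => ['auth/jwt'],
  jwt_key_source => 'puppet:///modules/site_slurm/jwt.key',
}
```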
slurm::conf usage
It's possible to deploy multiple slurm.conf files using this module.
The following example will deploy /etc/slurm/slurm-ascend.conf with only ClusterName and SlurmctldHost changed.
include slurm
$cluster_conf = {
  'ClusterName'   => 'ascend',
  'SlurmctldHost' => 'ascend-slurm01.example.com',
}
slurm::conf { 'ascend':
  configs => $slurm::slurm_conf + $cluster_conf,
}
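When several additional clusters are needed, the same pattern can be driven from a hash rather than repeating the declaration. This is a sketch that would replace, not accompany, the single declaration above; the second cluster name and both controller hostnames are hypothetical.

```puppet
include slurm

# Hypothetical map of cluster name => slurmctld host; replace with real values.
$cluster_hosts = {
  'ascend'       => 'ascend-slurm01.example.com',
  'othercluster' => 'othercluster-slurm01.example.com',
}

# One slurm::conf per cluster, each overriding only ClusterName and
# SlurmctldHost on top of the module's base configuration hash.
$cluster_hosts.each |String $cluster, String $host| {
  slurm::conf { $cluster:
    configs => $slurm::slurm_conf + {
      'ClusterName'   => $cluster,
      'SlurmctldHost' => $host,
    },
  }
}
```

With the default config_name of "slurm-${name}.conf", this would be expected to produce one file per key in the hash.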
Reference
http://treydock.github.io/puppet-slurm/
Limitations
This module has been tested on:
- RedHat/CentOS 7 x86_64
- RedHat/Rocky/AlmaLinux 8 x86_64
- Debian 10 x86_64
- Ubuntu 20.04 x86_64
Reference
Table of Contents
Classes
Public Classes
slurm: Manage SLURM
Private Classes
slurm::client
slurm::common::config
slurm::common::install
slurm::common::install::apt
slurm::common::install::rpm
slurm::common::install::source
slurm::common::munge
slurm::common::setup
slurm::common::user
slurm::params
slurm::resources: Manage SLURM resources using Puppet types
slurm::slurmctld
slurm::slurmctld::config
slurm::slurmctld::service
slurm::slurmd
slurm::slurmd::config
slurm::slurmd::service
slurm::slurmdbd
slurm::slurmdbd::config
slurm::slurmdbd::db
slurm::slurmdbd::service
slurm::slurmrestd
slurm::slurmrestd::service
Defined types
slurm::conf: Manage Slurm main configuration
slurm::down_node: Manage SLURM down node configuration
slurm::gres: Manage SLURM GRES configuration
slurm::job_container: Manage SLURM job_container.conf entry
slurm::node: Manage SLURM node configuration
slurm::nodeset: Manage SLURM nodeset configuration
slurm::partition: Manage a SLURM partition configuration
slurm::spank: Manage SLURM SPANK plugin
slurm::switch: Add switch to topology.conf
Data types
Slurm::CPUBind: Type for CPU bind settings
Slurm::DownNodeState
Slurm::NodeState
Slurm::PartitionState
Slurm::PreemptMode
Slurm::SelectTypeParameters
Slurm::YesNo
Tasks
reconfig: Execute 'scontrol reconfig'
Classes
slurm
Roles
Parameters
The following parameters are available in the slurm class:
slurmd
slurmctld
slurmdbd
database
client
slurmrestd
repo_baseurl
install_method
install_prefix
package_ensure
install_torque_wrapper
install_pam
version
source_dependencies
configure_flags
source_install_manage_alternatives
slurmd_service_ensure
slurmd_service_enable
slurmd_service_limits
slurmd_options
slurmctld_service_ensure
slurmctld_service_enable
slurmctld_service_limits
slurmctld_options
slurmdbd_service_ensure
slurmdbd_service_enable
slurmdbd_service_limits
slurmdbd_options
slurmctld_restart_on_failure
slurmdbd_restart_on_failure
reload_services
restart_services
slurmctld_conn_validator_timeout
reconfig_ignore_errors
manage_slurm_user
slurm_user_group
slurm_group_gid
slurm_user
slurm_user_uid
slurm_user_comment
slurm_user_home
slurm_user_managehome
slurm_user_shell
slurmd_user
slurmd_user_group
manage_munge
munge_key_source
munge_key_content
manage_slurm_conf
manage_scripts
manage_firewall
use_syslog
manage_logrotate
logrotate_syslog_pid_path
manage_rsyslog
manage_database
export_database
export_database_tag
cli_filter_lua_source
cli_filter_lua_content
scrun_lua_source
scrun_lua_content
state_dir_nfs_device
state_dir_nfs_options
job_submit_lua_source
job_submit_lua_content
cluster_name
slurmctld_host
slurmdbd_host
conf_dir
log_dir
env_dir
spank_plugins
enable_configless
configless
conf_server
slurm_conf_override
slurm_conf_template
slurm_conf_source
partition_template
partition_source
node_template
node_source
switch_template
topology_source
gres_template
gres_source
partitions
nodes
nodesets
switches
greses
job_containers
slurmd_log_file
slurmd_spool_dir
slurmctld_log_file
state_save_location
slurmdbd_archive_dir
slurmdbd_log_file
slurmdbd_storage_host
slurmdbd_storage_loc
slurmdbd_storage_pass
slurmdbd_storage_port
slurmdbd_storage_type
slurmdbd_storage_user
slurmdbd_db_charset
slurmdbd_db_collate
slurmdbd_conf_override
slurmdbd_archive_dir_nfs_device
slurmdbd_archive_dir_nfs_options
use_nhc
include_nhc
health_check_program
health_check_program_source
manage_epilog
epilog
epilog_source
epilog_sourceselect
manage_prolog
prolog
prolog_source
prolog_sourceselect
manage_task_epilog
task_epilog
task_epilog_source
manage_task_prolog
task_prolog
task_prolog_source
auth_alt_types
jwt_key_content
jwt_key_source
slurmrestd_listen_address
slurmrestd_disable_token_creation
slurmrestd_user
slurmrestd_user_group
slurmrestd_service_ensure
slurmrestd_service_enable
slurmrestd_service_limits
slurmrestd_options
slurmrestd_restart_on_failure
cgroup_conf_template
cgroup_conf_source
cgroup_mountpoint
cgroup_plugin
cgroup_allowed_ram_space
cgroup_allowed_swap_space
cgroup_constrain_cores
cgroup_constrain_devices
cgroup_constrain_ram_space
cgroup_constrain_swap_space
cgroup_max_ram_percent
cgroup_max_swap_percent
cgroup_memory_swappiness
cgroup_min_ram_space
cgroup_signal_child_processes
oci_conf_template
oci_conf_source
oci_container_path
oci_create_env_file
oci_debug_flags
oci_disable_cleanup
oci_disable_hooks
oci_env_exclude
oci_mount_spool_dir
oci_run_time_env_exclude
oci_file_debug
oci_ignore_file_config_json
oci_run_time_create
oci_run_time_delete
oci_run_time_kill
oci_run_time_query
oci_run_time_run
oci_run_time_start
oci_srun_path
oci_srun_args
oci_std_io_debug
oci_syslog_debug
slurm_sh_template
slurm_csh_template
profile_d_env_vars
slurmd_port
slurmctld_port
slurmdbd_port
slurmrestd_port
tuning_net_core_somaxconn
include_resources
clusters
qoses
reservations
accounts
users
licenses
purge_qos
slurmdbd_conn_validator_timeout
slurmd
Data type: Boolean
Default value: false
slurmctld
Data type: Boolean
Default value: false
slurmdbd
Data type: Boolean
Default value: false
database
Data type: Boolean
Default value: false
client
Data type: Boolean
Default value: true
slurmrestd
Data type: Boolean
Default value: false
repo_baseurl
Data type: Optional[Variant[Stdlib::HTTPSUrl, Stdlib::HTTPUrl, Pattern[/^file:\/\//]]]
Default value: undef
install_method
Data type: Optional[Enum['package','source','none']]
Default value: undef
install_prefix
Data type: Stdlib::Absolutepath
Default value: '/usr'
package_ensure
Data type: String
Default value: 'present'
install_torque_wrapper
Data type: Boolean
Default value: false
install_pam
Data type: Boolean
Default value: true
version
Data type: String
Default value: '23.11.5'
source_dependencies
Data type: Array
Default value: []
configure_flags
Data type: Array
Default value: []
source_install_manage_alternatives
Data type: Boolean
Default value: true
slurmd_service_ensure
Data type: Enum['running','stopped']
Default value: 'running'
slurmd_service_enable
Data type: Boolean
Default value: true
slurmd_service_limits
Data type: Hash
Default value: {}
slurmd_options
Data type: Optional[String[1]]
Default value: undef
slurmctld_service_ensure
Data type: Enum['running','stopped']
Default value: 'running'
slurmctld_service_enable
Data type: Boolean
Default value: true
slurmctld_service_limits
Data type: Hash
Default value: {}
slurmctld_options
Data type: Optional[String[1]]
Default value: undef
slurmdbd_service_ensure
Data type: Enum['running','stopped']
Default value: 'running'
slurmdbd_service_enable
Data type: Boolean
Default value: true
slurmdbd_service_limits
Data type: Hash
Default value: {}
slurmdbd_options
Data type: Optional[String[1]]
Default value: undef
slurmctld_restart_on_failure
Data type: Boolean
Default value: true
slurmdbd_restart_on_failure
Data type: Boolean
Default value: true
reload_services
Data type: Boolean
Default value: false
restart_services
Data type: Boolean
Default value: true
slurmctld_conn_validator_timeout
Data type: Integer
Default value: 60
reconfig_ignore_errors
Data type: Boolean
Default value: false
manage_slurm_user
Data type: Boolean
Default value: true
slurm_user_group
Data type: String[1]
Default value: 'slurm'
slurm_group_gid
Data type: Optional[Integer]
Default value: undef
slurm_user
Data type: String[1]
Default value: 'slurm'
slurm_user_uid
Data type: Optional[Integer]
Default value: undef
slurm_user_comment
Data type: String[1]
Default value: 'SLURM User'
slurm_user_home
Data type: Stdlib::Absolutepath
Default value: '/var/lib/slurm'
slurm_user_managehome
Data type: Boolean
Default value: true
slurm_user_shell
Data type: Stdlib::Absolutepath
Default value: '/sbin/nologin'
slurmd_user
Data type: String[1]
Default value: 'root'
slurmd_user_group
Data type: String[1]
Default value: 'root'
manage_munge
Data type: Boolean
Default value: false
munge_key_source
Data type: Optional[String]
Default value: undef
munge_key_content
Data type: Optional[String]
Default value: undef
manage_slurm_conf
Data type: Boolean
Default value: true
manage_scripts
Data type: Boolean
Default value: true
manage_firewall
Data type: Boolean
Default value: true
use_syslog
Data type: Boolean
Default value: false
manage_logrotate
Data type: Boolean
Default value: true
logrotate_syslog_pid_path
Data type: Stdlib::Absolutepath
Default value: '/var/run/syslogd.pid'
manage_rsyslog
Data type: Boolean
Default value: true
manage_database
Data type: Boolean
Default value: true
export_database
Data type: Boolean
Default value: false
export_database_tag
Data type: Optional[String[1]]
Default value: $facts['networking']['domain']
cli_filter_lua_source
Data type: Optional[String[1]]
Default value: undef
cli_filter_lua_content
Data type: Optional[String[1]]
Default value: undef
scrun_lua_source
Data type: Optional[String[1]]
Default value: undef
scrun_lua_content
Data type: Optional[String[1]]
Default value: undef
state_dir_nfs_device
Data type: Optional[String[1]]
Default value: undef
state_dir_nfs_options
Data type: String[1]
Default value: 'rw,sync,noexec,nolock,auto'
job_submit_lua_source
Data type: Optional[String[1]]
Default value: undef
job_submit_lua_content
Data type: Optional[String[1]]
Default value: undef
cluster_name
Data type: String[1]
Default value: 'linux'
slurmctld_host
Data type: Variant[Array, String]
Default value: 'slurm'
slurmdbd_host
Data type: Stdlib::Host
Default value: 'slurmdbd'
conf_dir
Data type: Stdlib::Absolutepath
Default value: '/etc/slurm'
log_dir
Data type: Stdlib::Absolutepath
Default value: '/var/log/slurm'
env_dir
Data type: Stdlib::Absolutepath
Default value: '/etc/sysconfig'
spank_plugins
Data type: Hash
Default value: {}
enable_configless
Data type: Boolean
Default value: false
configless
Data type: Boolean
Default value: false
conf_server
Data type: Optional[String]
Default value: undef
slurm_conf_override
Data type: Hash
Default value: {}
slurm_conf_template
Data type: String[1]
Default value: 'slurm/slurm.conf/slurm.conf.erb'
slurm_conf_source
Data type: Optional[String[1]]
Default value: undef
partition_template
Data type: String[1]
Default value: 'slurm/slurm.conf/conf_values.erb'
partition_source
Data type: Optional[String[1]]
Default value: undef
node_template
Data type: String[1]
Default value: 'slurm/slurm.conf/conf_values.erb'
node_source
Data type: Optional[String[1]]
Default value: undef
switch_template
Data type: String[1]
Default value: 'slurm/slurm.conf/conf_values.erb'
topology_source
Data type: Optional[String[1]]
Default value: undef
gres_template
Data type: String[1]
Default value: 'slurm/slurm.conf/conf_values.erb'
gres_source
Data type: Optional[String[1]]
Default value: undef
partitions
Data type: Hash
Default value: {}
nodes
Data type: Hash
Default value: {}
nodesets
Data type: Hash
Default value: {}
switches
Data type: Hash
Default value: {}
greses
Data type: Hash
Default value: {}
job_containers
Data type: Hash
Default value: {}
slurmd_log_file
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
slurmd_spool_dir
Data type: Stdlib::Absolutepath
Default value: '/var/spool/slurmd'
slurmctld_log_file
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
state_save_location
Data type: Stdlib::Absolutepath
Default value: '/var/spool/slurmctld.state'
slurmdbd_archive_dir
Data type: Stdlib::Absolutepath
Default value: '/var/lib/slurmdbd.archive'
slurmdbd_log_file
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
slurmdbd_storage_host
Data type: Stdlib::Host
Default value: 'localhost'
slurmdbd_storage_loc
Data type: String[1]
Default value: 'slurm_acct_db'
slurmdbd_storage_pass
Data type: String[1]
Default value: 'slurmdbd'
slurmdbd_storage_port
Data type: Variant[Stdlib::Port, String[0,0]]
Default value: 3306
slurmdbd_storage_type
Data type: String[1]
Default value: 'accounting_storage/mysql'
slurmdbd_storage_user
Data type: String[1]
Default value: 'slurmdbd'
slurmdbd_db_charset
Data type: String[1]
Default value: 'utf8'
slurmdbd_db_collate
Data type: String[1]
Default value: 'utf8_general_ci'
slurmdbd_conf_override
Data type: Hash
Default value: {}
slurmdbd_archive_dir_nfs_device
Data type: Optional[String[1]]
Default value: undef
slurmdbd_archive_dir_nfs_options
Data type: String[1]
Default value: 'rw,sync,noexec,nolock,auto'
use_nhc
Data type: Boolean
Default value: false
include_nhc
Data type: Boolean
Default value: false
health_check_program
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
health_check_program_source
Data type: Optional[String[1]]
Default value: undef
manage_epilog
Data type: Boolean
Default value: true
epilog
Data type: Optional[String[1]]
Default value: undef
epilog_source
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
epilog_sourceselect
Data type: Optional[String[1]]
Default value: undef
manage_prolog
Data type: Boolean
Default value: true
prolog
Data type: Optional[String[1]]
Default value: undef
prolog_source
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
prolog_sourceselect
Data type: Optional[String[1]]
Default value: undef
manage_task_epilog
Data type: Boolean
Default value: true
task_epilog
Data type: Optional[String[1]]
Default value: undef
task_epilog_source
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
manage_task_prolog
Data type: Boolean
Default value: true
task_prolog
Data type: Optional[String[1]]
Default value: undef
task_prolog_source
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
auth_alt_types
Data type: Array
Default value: []
jwt_key_content
Data type: Optional[String]
Default value: undef
jwt_key_source
Data type: Optional[String]
Default value: undef
slurmrestd_listen_address
Data type: String
Default value: $facts['networking']['ip']
slurmrestd_disable_token_creation
Data type: Boolean
Default value: false
slurmrestd_user
Data type: String
Default value: 'daemon'
slurmrestd_user_group
Data type: String
Default value: 'daemon'
slurmrestd_service_ensure
Data type: Enum['running','stopped']
Default value: 'running'
slurmrestd_service_enable
Data type: Boolean
Default value: true
slurmrestd_service_limits
Data type: Hash
Default value: {}
slurmrestd_options
Data type: Optional[String[1]]
Default value: undef
slurmrestd_restart_on_failure
Data type: Boolean
Default value: true
cgroup_conf_template
Data type: String
Default value: 'slurm/cgroup/cgroup.conf.erb'
cgroup_conf_source
Data type: Optional[String]
Default value: undef
cgroup_mountpoint
Data type: Stdlib::Absolutepath
Default value: '/sys/fs/cgroup'
cgroup_plugin
Data type: String
Default value: 'autodetect'
cgroup_allowed_ram_space
Data type: Integer
Default value: 100
cgroup_allowed_swap_space
Data type: Integer
Default value: 0
cgroup_constrain_cores
Data type: Boolean
Default value: false
cgroup_constrain_devices
Data type: Boolean
Default value: false
cgroup_constrain_ram_space
Data type: Boolean
Default value: false
cgroup_constrain_swap_space
Data type: Boolean
Default value: false
cgroup_max_ram_percent
Data type: Integer
Default value: 100
cgroup_max_swap_percent
Data type: Integer
Default value: 100
cgroup_memory_swappiness
Data type: Optional[Integer[0,100]]
Default value: undef
cgroup_min_ram_space
Data type: Integer
Default value: 30
cgroup_signal_child_processes
Data type: Optional[Boolean]
Default value: undef
oci_conf_template
Data type: String
Default value: 'slurm/oci.conf.erb'
oci_conf_source
Data type: Optional[String]
Default value: undef
oci_container_path
Data type: Optional[String[1]]
Default value: undef
oci_create_env_file
Data type: String[1]
Default value: 'disabled'
oci_debug_flags
Data type: Optional[String[1]]
Default value: undef
oci_disable_cleanup
Data type: Boolean
Default value: false
oci_disable_hooks
Data type: Optional[String[1]]
Default value: undef
oci_env_exclude
Data type: Optional[String[1]]
Default value: undef
oci_mount_spool_dir
Data type: Stdlib::Absolutepath
Default value: '/var/run/slurm/'
oci_run_time_env_exclude
Data type: Optional[String[1]]
Default value: undef
oci_file_debug
Data type: Optional[String[1]]
Default value: undef
oci_ignore_file_config_json
Data type: Boolean
Default value: false
oci_run_time_create
Data type: Optional[String[1]]
Default value: undef
oci_run_time_delete
Data type: Optional[String[1]]
Default value: undef
oci_run_time_kill
Data type: Optional[String[1]]
Default value: undef
oci_run_time_query
Data type: Optional[String[1]]
Default value: undef
oci_run_time_run
Data type: Optional[String[1]]
Default value: undef
oci_run_time_start
Data type: Optional[String[1]]
Default value: undef
oci_srun_path
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
oci_srun_args
Data type: Optional[String[1]]
Default value: undef
oci_std_io_debug
Data type: Optional[String[1]]
Default value: undef
oci_syslog_debug
Data type: Optional[String[1]]
Default value: undef
slurm_sh_template
Data type: String[1]
Default value: 'slurm/profile.d/slurm.sh.erb'
slurm_csh_template
Data type: String[1]
Default value: 'slurm/profile.d/slurm.csh.erb'
profile_d_env_vars
Data type: Hash
Default value: {}
slurmd_port
Data type: Stdlib::Port
Default value: 6818
slurmctld_port
Data type: Stdlib::Port
Default value: 6817
slurmdbd_port
Data type: Stdlib::Port
Default value: 6819
slurmrestd_port
Data type: Stdlib::Port
Default value: 6820
tuning_net_core_somaxconn
Data type: Variant[Boolean, Integer]
Default value: 1024
include_resources
Data type: Boolean
Default value: false
clusters
Data type: Hash
Default value: {}
qoses
Data type: Hash
Default value: {}
reservations
Data type: Hash
Default value: {}
accounts
Data type: Hash
Default value: {}
users
Data type: Hash
Default value: {}
licenses
Data type: Hash
Default value: {}
purge_qos
Data type: Boolean
Default value: false
slurmdbd_conn_validator_timeout
Data type: Integer
Default value: 30
Defined types
slurm::conf
Manage Slurm main configuration
Examples
Create /etc/slurm/slurm-ascend.conf
include slurm
$cluster_conf = {
  'ClusterName'   => 'ascend',
  'SlurmctldHost' => 'ascend-slurm01.example.com',
}
slurm::conf { 'ascend':
  configs => $slurm::slurm_conf + $cluster_conf,
}
Parameters
The following parameters are available in the slurm::conf defined type:
configs
Data type: Hash
Hash of Slurm configs
Default value: {}
template
Data type: Optional[String]
Template to use to generate slurm.conf contents
Default value: undef
source
Data type: Optional[String]
Source of configuration instead of templated configs
Default value: undef
config_name
Data type: String
Name of configuration file
Default value: "slurm-${name}.conf"
slurm::down_node
Manage SLURM down node configuration
Parameters
The following parameters are available in the slurm::down_node defined type:
down_nodes
Data type: String
Default value: $name
reason
Data type: Optional[String]
Default value: undef
state
Data type: Slurm::DownNodeState
Default value: 'UNKNOWN'
target
Data type: String
Default value: 'slurm.conf'
order
Data type: Variant[String[1], Integer]
Default value: '75'
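The reference gives no example for slurm::down_node, so the following is a minimal sketch using only the parameters listed above. The node name and reason text are hypothetical, and state is left at its documented default of 'UNKNOWN' because the accepted values of Slurm::DownNodeState are not listed here.

```puppet
# Mark a node as down in slurm.conf; the title doubles as down_nodes.
# Node name and reason are placeholders.
slurm::down_node { 'slurmd01':
  reason => 'Hardware failure',
}
```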
slurm::gres
Manage SLURM GRES configuration
Examples
Add static GPU GRES
slurm::gres { 'gpu':
  type  => 'v100',
  file  => '/dev/nvidia0',
  cores => '0,1',
}
Add nvml AutoDetect gres
slurm::gres { 'nvml':
  auto_detect => 'nvml',
}
Parameters
The following parameters are available in the slurm::gres defined type:
gres_name
Data type: String[1]
Default value: $name
type
Data type: Optional[String[1]]
Default value: undef
node_name
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
auto_detect
Data type: Optional[Enum['nvml','rsmi','oneapi','off']]
Default value: undef
count
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
cores
Data type: Optional[Variant[String[1], Integer, Array[Variant[String[1],Integer]]]]
Default value: undef
file
Data type: Optional[Stdlib::Absolutepath]
Default value: undef
flags
Data type: Optional[Enum['CountOnly']]
Default value: undef
links
Data type: Optional[Variant[Integer, Array[Integer]]]
Default value: undef
order
Data type: Variant[String[1], Integer]
Default value: '50'
slurm::job_container
Manage SLURM job_container.conf entry
Parameters
The following parameters are available in the slurm::job_container defined type:
base_path
Data type: Stdlib::Absolutepath
job_container.conf BasePath
auto_base_path
Data type: Boolean
job_container.conf AutoBasePath
Default value: false
dirs
Data type: Optional[Array[Stdlib::Absolutepath]]
job_container.conf Dirs
Default value: undef
init_script
Data type: Optional[Stdlib::Absolutepath]
job_container.conf InitScript
Default value: undef
node_name
Data type: Optional[String]
job_container.conf NodeName
Default value: undef
shared
Data type: Optional[Boolean]
job_container.conf Shared
Default value: undef
order
Data type: Variant[String[1], Integer]
Order in job_container.conf
Default value: '50'
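As an illustration of the slurm::job_container parameters above, here is a hedged sketch of a single job_container.conf entry. The resource title and every path are hypothetical; only the parameter names and their data types come from this reference.

```puppet
# Sketch of one job_container.conf entry. base_path is the only
# parameter without a default, so it must always be supplied.
slurm::job_container { 'default':
  base_path      => '/var/spool/slurm/containers',  # hypothetical path
  auto_base_path => true,
  dirs           => ['/tmp', '/dev/shm'],           # hypothetical directories
}
```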
slurm::node
Manage SLURM node configuration
Parameters
The following parameters are available in the slurm::node defined type:
node_name
node_hostname
node_addr
bcast_addr
boards
core_spec_count
cores_per_socket
cpu_bind
cpus
cpu_spec_list
features
gres
mem_spec_limit
port
real_memory
reason
sockets
sockets_per_board
state
threads_per_core
tmp_disk
weight
target
order
node_name
Data type: String[1]
Default value: $name
node_hostname
Data type: Optional[Stdlib::Host]
Default value: undef
node_addr
Data type: Optional[Stdlib::IP::Address]
Default value: undef
bcast_addr
Data type: Optional[Stdlib::IP::Address]
Default value: undef
boards
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
core_spec_count
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
cores_per_socket
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
cpu_bind
Data type: Optional[Slurm::CPUBind]
Default value: undef
cpus
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
cpu_spec_list
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
features
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
gres
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
mem_spec_limit
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
port
Data type: Optional[Stdlib::Port]
Default value: undef
real_memory
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
reason
Data type: Optional[String[1]]
Default value: undef
sockets
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
sockets_per_board
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
state
Data type: Slurm::NodeState
Default value: 'UNKNOWN'
threads_per_core
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
tmp_disk
Data type: Optional[Integer]
Default value: undef
weight
Data type: Optional[Integer]
Default value: undef
target
Data type: String[1]
Default value: 'slurm.conf'
order
Data type: Variant[String[1], Integer]
Default value: '90'
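The slurm::node parameters above can be exercised directly in a manifest. This sketch mirrors the slurmd01 entry from the slurm::nodes Hiera example in the Usage section; the hostname and hardware values are the same illustrative ones used there.

```puppet
# Direct declaration equivalent to the slurm::nodes Hiera example.
# The title is used as node_name by default.
slurm::node { 'slurmd01':
  node_hostname    => 'slurmd01.example.com',
  cpus             => 4,
  threads_per_core => 1,
  cores_per_socket => 1,
  sockets          => 4,
  real_memory      => 7000,
}
```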
slurm::nodeset
Manage SLURM nodeset configuration
Parameters
The following parameters are available in the slurm::nodeset defined type:
feature
Data type: Optional[String]
Default value: undef
nodes
Data type: Optional[String]
Default value: undef
node_set
Data type: String
Default value: $name
target
Data type: String[1]
Default value: 'slurm.conf'
order
Data type: Variant[String[1], Integer]
Default value: '40'
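No example is given for slurm::nodeset, so the following sketch shows the two ways the listed parameters suggest a node set could be defined: by feature or by an explicit node expression. The set names, the feature name and the node range are all hypothetical.

```puppet
# Node set selected by a node feature (feature name is a placeholder).
slurm::nodeset { 'gpu_nodes':
  feature => 'gpu',
}

# Node set selected by an explicit node expression (range is a placeholder).
slurm::nodeset { 'batch_nodes':
  nodes => 'slurmd[01-04]',
}
```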
slurm::partition
Manage a SLURM partition configuration
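As a minimal sketch, a partition can be declared directly rather than through the slurm::partitions hash. The values below mirror the batch partition from the Hiera example in the Usage section and are illustrative only.

```puppet
# Direct declaration equivalent to the slurm::partitions Hiera example.
# The title is used as partition_name by default.
slurm::partition { 'batch':
  default         => 'YES',
  def_mem_per_cpu => 1700,
  max_mem_per_cpu => 1750,
  nodes           => 'slurmd01',
}
```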
Parameters
The following parameters are available in the slurm::partition defined type:
partition_name
alloc_nodes
allow_accounts
allow_groups
allow_qos
alternate
cpu_bind
default
def_cpu_per_gpu
def_mem_per_cpu
def_mem_per_gpu
def_mem_per_node
deny_accounts
deny_qos
default_time
disable_root_jobs
exclusive_user
grace_time
hidden
lln
max_cpus_per_node
max_mem_per_cpu
max_mem_per_node
max_nodes
max_time
min_nodes
nodes
over_subscribe
over_time_limit
preempt_mode
priority_job_factor
priority_tier
qos
req_resv
resume_timeout
root_only
select_type_parameters
shared
state
suspend_time
suspend_timeout
tres_billing_weights
target
order
partition_name
Data type: String[1]
Default value: $name
alloc_nodes
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
allow_accounts
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
allow_groups
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
allow_qos
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
alternate
Data type: Optional[String[1]]
Default value: undef
cpu_bind
Data type: Optional[Slurm::CPUBind]
Default value: undef
default
Data type: Optional[Slurm::YesNo]
Default value: undef
def_cpu_per_gpu
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
def_mem_per_cpu
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
def_mem_per_gpu
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
def_mem_per_node
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
deny_accounts
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
deny_qos
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
default_time
Data type: Optional[String[1]]
Default value: undef
disable_root_jobs
Data type: Optional[Slurm::YesNo]
Default value: undef
exclusive_user
Data type: Optional[Slurm::YesNo]
Default value: undef
grace_time
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
hidden
Data type: Optional[Slurm::YesNo]
Default value: undef
lln
Data type: Optional[Slurm::YesNo]
Default value: undef
max_cpus_per_node
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
max_mem_per_cpu
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
max_mem_per_node
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
max_nodes
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
max_time
Data type: Optional[String[1]]
Default value: undef
min_nodes
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
nodes
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
over_subscribe
Data type: Optional[Enum['EXCLUSIVE','FORCE','YES','NO']]
Default value: undef
over_time_limit
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
preempt_mode
Data type: Optional[Slurm::PreemptMode]
Default value: undef
priority_job_factor
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
priority_tier
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
qos
Data type: Optional[String[1]]
Default value: undef
req_resv
Data type: Optional[Slurm::YesNo]
Default value: undef
resume_timeout
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
root_only
Data type: Optional[Slurm::YesNo]
Default value: undef
select_type_parameters
Data type: Optional[Slurm::SelectTypeParameters]
Default value: undef
shared
Data type: Optional[Enum['EXCLUSIVE','FORCE','YES','NO']]
Default value: undef
state
Data type: Slurm::PartitionState
Default value: 'UP'
suspend_time
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
suspend_timeout
Data type: Optional[Variant[String[1], Integer]]
Default value: undef
tres_billing_weights
Data type: Optional[Variant[String[1], Array[String[1]]]]
Default value: undef
target
Data type: String[1]
Default value: 'slurm.conf'
order
Data type: Variant[String[1], Integer]
Default value: '50'
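A partition declaration using a handful of these parameters could look like the sketch below. The partition name, node ranges, and time limits are hypothetical; each value was chosen only to match the data types documented above.

```puppet
# Hypothetical default partition; all values are illustrative.
slurm::partition { 'batch':
  default        => 'YES',                      # Slurm::YesNo
  nodes          => ['c0[1-2]', 'c0[3-4]'],     # a single String[1] is also accepted
  default_time   => '01:00:00',
  max_time       => '2-00:00:00',
  over_subscribe => 'NO',
  state          => 'UP',                       # Slurm::PartitionState
}
```

Parameters left unset keep their undef default, so this sketch sets only what is shown.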
slurm::spank
Manage SLURM SPANK plugin
Parameters
The following parameters are available in the slurm::spank defined type:
ensure
Data type: Enum['present','absent']
Ensure state of SPANK plugin
Default value: 'present'
plugin
Data type: String
The shared library
Default value: "${name}.so"
arguments
Data type: Optional[Variant[Hash, Array, String]]
Arguments for the plugin
Default value: undef
required
Data type: Boolean
Is this plugin required?
Default value: false
manage_package
Data type: Boolean
Manage plugin package?
Default value: true
package_name
Data type: String
Plugin package name
Default value: "slurm-spank-${name}"
package_ensure
Data type: String
Plugin package ensure value
Default value: 'installed'
order
Data type: Variant[String[1], Integer]
Order in plugstack.conf
Default value: '50'
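The sketch below shows one way the slurm::spank parameters might be used, once to enable a plugin and once to remove one. The plugin names `demo` and `legacy` and the argument string are hypothetical; with the defaults listed above, `demo` would resolve to the shared library demo.so and the package slurm-spank-demo.

```puppet
# Hypothetical required plugin, placed early in plugstack.conf.
slurm::spank { 'demo':
  arguments => 'verbose=1',   # hypothetical plugin argument
  required  => true,
  order     => '10',
}

# Hypothetical plugin being removed; package management is skipped.
slurm::spank { 'legacy':
  ensure         => 'absent',
  manage_package => false,
}
```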
slurm::switch
Add switch to topology.conf
Examples
# Top-level switch connected to two other switches
slurm::switch { 'switch1':
  switches => 'switch[2-3]',
}
# Leaf switch connected directly to nodes
slurm::switch { 'switch2':
  nodes => 'c0[1-2]',
}
Parameters
The following parameters are available in the slurm::switch defined type:
switch_name
Data type: String[1]
SwitchName value; see the man page for topology.conf
Default value: $name
switches
Data type: Optional[String[1]]
Switches value; see the man page for topology.conf
Default value: undef
nodes
Data type: Optional[String[1]]
Nodes value; see the man page for topology.conf
Default value: undef
link_speed
Data type: Optional[String[1]]
LinkSpeed value; see the man page for topology.conf
Default value: undef
order
Data type: Variant[String[1], Integer]
Order inside topology.conf
Default value: '50'
Data types
Slurm::CPUBind
Type for CPU bind settings
Alias of Enum['none', 'socket', 'ldom', 'core', 'thread', 'UNSET']
Slurm::DownNodeState
The Slurm::DownNodeState data type.
Alias of Enum['DOWN', 'DRAIN', 'FAIL', 'FAILING', 'UNKNOWN']
Slurm::NodeState
The Slurm::NodeState data type.
Alias of Variant[Enum['CLOUD','FUTURE'], Slurm::DownNodeState]
Slurm::PartitionState
The Slurm::PartitionState data type.
Alias of Enum['UP', 'DOWN', 'DRAIN', 'INACTIVE']
Slurm::PreemptMode
The Slurm::PreemptMode data type.
Alias of Enum['OFF', 'CANCEL', 'CHECKPOINT', 'GANG', 'REQUEUE', 'SUSPEND']
Slurm::SelectTypeParameters
The Slurm::SelectTypeParameters data type.
Alias of Enum['CR_Core', 'CR_Core_Memory', 'CR_Socket', 'CR_Socket_Memory']
Slurm::YesNo
The Slurm::YesNo data type.
Alias of Enum['YES', 'NO', 'UNSET']
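These aliases can also be used to type parameters in site code, so that invalid values fail at catalog compilation. The following is a sketch only: the class name `profile::slurm_partitions`, the partition name, and the node range are hypothetical, and only the Slurm::* type names come from this module.

```puppet
# Hypothetical profile class that reuses the module's type aliases.
class profile::slurm_partitions (
  Slurm::YesNo          $hidden       = 'NO',
  Slurm::PartitionState $state        = 'UP',
  Slurm::PreemptMode    $preempt_mode = 'OFF',
) {
  # Values outside the Enum lists above are rejected before this resource is evaluated.
  slurm::partition { 'debug':
    hidden       => $hidden,
    state        => $state,
    preempt_mode => $preempt_mode,
    nodes        => 'c0[1-2]',
  }
}
```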
Tasks
reconfig
Execute 'scontrol reconfig'
Supports noop? false
Parameters
scontrol
Data type: String[1]
Path to scontrol (default: 'scontrol', searches $PATH)
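One way to invoke the task is from a Bolt plan, sketched below. The plan name `profile::slurm_reconfig` and the `/usr/bin/scontrol` path are hypothetical, and the task name `slurm::reconfig` is assumed from the module and task names shown on this page.

```puppet
# Hypothetical Bolt plan wrapping the reconfig task.
plan profile::slurm_reconfig (
  TargetSpec $targets,
  String[1]  $scontrol = '/usr/bin/scontrol',   # illustrative path
) {
  # run_task is a built-in Bolt plan function and returns a ResultSet.
  $results = run_task('slurm::reconfig', $targets, { 'scontrol' => $scontrol })
  return $results
}
```

Because the task does not support noop, running the plan always executes the reconfigure command on the selected targets.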
Change log
All notable changes to this project will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
v4.0.1 (2024-04-16)
Fixed
- allow slurmdbd_storage_port to be an empty string (support socket connection to DB) #58 (jakerundall)
v4.0.0 (2024-03-25)
Changed
v3.2.0 (2024-02-18)
Added
v3.1.0 (2024-02-15)
Added
Fixed
v3.0.0 (2023-12-28)
Changed
- Drop Debian 10, Add EL9, Debian 11, Ubuntu 22.04 #50 (treydock)
- Support Slurm 23.02.x #48 (treydock)
- BREAKING: Many updates - read description #46 (treydock)
Added
v2.4.0 (2022-08-11)
Added
v2.3.0 (2022-08-09)
Added
v2.2.0 (2022-07-25)
Added
v2.1.0 (2022-06-30)
Added
- Bump default version to 21.08.8 #39 (treydock)
- add the ability to enable AMD GPUs #38 (v1peractual)
v2.0.2 (2022-01-20)
Fixed
v2.0.1 (2021-12-06)
Fixed
v2.0.0 (2021-12-06)
Changed
- Replace CentOS 8 support with Rocky 8 #33 (treydock)
- BREAKING: Support SLURM 21.08 and breaking changes (see description) #32 (treydock)
v1.0.0 (2021-10-06)
Changed
- Refactor how slurmrestd is configured to work with SLURM 20.11 #30 (treydock)
- Package version set using package_ensure, source uses version parameter #21 (treydock)
- Drop Puppet 5 support, add Puppet 7 #18 (treydock)
Added
- Support newer stdlib, logrotate and archive modules #31 (treydock)
- Updates to module dependencies #29 (treydock)
- Improve how source install is handled #28 (treydock)
- Support Ubuntu 18.04 and 20.04 #27 (treydock)
- Improved Debian 10 support - improve EL8 dependencies for source install #26 (treydock)
- Bump default version to 20.11.8 #25 (treydock)
- Add Debian 10 support #24 (martijndegouw)
- Allow configuring munge via this module #23 (martijndegouw)
- Allow spank plugin to be set to ensure => absent #20 (treydock)
- Add ability to manage job_container.conf #19 (treydock)
- Ensure all logging defaults to info #16 (treydock)
- Support SLURM 20.11 #15 (treydock)
Fixed
v0.7.0 (2020-12-02)
Added
- PDK update - Use Github Actions #14 (treydock)
- Improved support for slurmrestd as a daemon #12 (treydock)
v0.6.3 (2020-11-23)
Fixed
v0.6.2 (2020-08-21)
Fixed
v0.6.1 (2020-08-17)
Fixed
v0.6.0 (2020-08-11)
Added
v0.5.1 (2020-08-10)
Fixed
v0.5.0 (2020-08-10)
Added
v0.4.0 (2020-07-23)
Added
v0.3.0 (2020-07-16)
Added
v0.2.1 (2020-07-13)
Fixed
v0.2.0 (2020-07-13)
Added
v0.1.0 (2020-06-26)
Changed
0.0.2 (2014-10-14)
0.0.1 (2014-10-13)
* This Changelog was automatically generated by github_changelog_generator
Dependencies
- puppetlabs/stdlib (>= 4.25.1 < 9.0.0)
- puppetlabs/firewall (>= 1.0.0 < 6.0.0)
- puppetlabs/concat (>= 1.0.0 < 9.0.0)
- puppetlabs/mysql (>= 2.3.0 < 15.0.0)
- puppet/epel (>= 3.0.0 < 5.0.0)
- puppet/augeasproviders_sysctl (>= 2.0.0 < 4.0.0)
- puppet/logrotate (>= 3.4.0 < 8.0.0)
- treydock/munge (>= 1.1.0 < 6.0.0)
- puppet/systemd (>= 3.1.0 < 7.0.0)
- puppet/archive (>= 1.0.0 < 8.0.0)
- puppet/alternatives (>= 2.1.0 < 6.0.0)
- treydock/slurm_providers (>= 0.6.0 < 1.0.0)
Puppet-slurm - Puppet module for SLURM.
Copyright (C) 2012 CERN
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.