mbind — Set memory policy for a memory range
#include <numaif.h>
int mbind(void *start, unsigned long len, int policy,
          unsigned long *nodemask, unsigned long maxnode,
          unsigned flags);
cc ... -lnuma
mbind() sets the NUMA memory
policy for the memory
range starting with start and continuing for
len bytes. The memory
of a NUMA machine is divided into multiple nodes. The memory
policy defines in which node memory is allocated.
mbind() only has an effect for
new allocations; if the pages inside the range have
already been touched before the policy is set, then the policy
has no effect on them.
Available policies are MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE, and MPOL_PREFERRED. All policies except
MPOL_DEFAULT require the caller
to specify the nodes to which the policy applies in the
nodemask parameter.
nodemask is a bitmask
of nodes containing up to maxnode bits. The actual number
of bytes transferred via this argument is rounded up to the
next multiple of sizeof(unsigned
long), but the kernel will only use bits up to
maxnode. A NULL
argument means an empty set of nodes.
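The nodemask layout described above can be sketched with a few small helpers. This is only an illustration of the text's description (one bit per node, storage rounded up to whole unsigned long words); the helper names nodemask_words, nodemask_bytes, and nodemask_set are hypothetical and are not part of libnuma.

```c
#include <limits.h>
#include <stddef.h>

/* Number of node bits held by one unsigned long word. */
#define NODEMASK_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Words needed to hold maxnode bits, rounded up (hypothetical helper). */
size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + NODEMASK_BITS_PER_WORD - 1) / NODEMASK_BITS_PER_WORD;
}

/* Bytes transferred for the mask: a multiple of sizeof(unsigned long). */
size_t nodemask_bytes(unsigned long maxnode)
{
    return nodemask_words(maxnode) * sizeof(unsigned long);
}

/* Set the bit for node in mask. Returns 0, or -1 if node >= maxnode. */
int nodemask_set(unsigned long *mask, unsigned long maxnode,
                 unsigned long node)
{
    if (node >= maxnode)
        return -1;
    mask[node / NODEMASK_BITS_PER_WORD] |=
        1UL << (node % NODEMASK_BITS_PER_WORD);
    return 0;
}
```

A caller would typically zero an array of nodemask_words(maxnode) words, call nodemask_set() for each wanted node, and pass the array and maxnode on to mbind().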
The MPOL_DEFAULT policy is
the default and means to use the underlying process policy
(which can be modified with set_mempolicy(2)). Unless
the process policy has been changed this means to allocate
memory on the node of the CPU that triggered the allocation.
nodemask should be
specified as NULL.
The MPOL_BIND policy is a
strict policy that restricts memory allocation to the nodes
specified in nodemask. There won't be
allocations on other nodes.
MPOL_INTERLEAVE interleaves
allocations to the nodes specified in nodemask. This optimizes for
bandwidth instead of latency. To be effective, the memory area
should be fairly large, at least 1MB.
MPOL_PREFERRED sets the
preferred node for allocation. The kernel will try to
allocate on this node first and fall back to other nodes if
the preferred node is low on free memory. Only the first
node in the nodemask
is used. If no node is set in the mask, then the memory is
allocated on the node of the CPU that triggered the
allocation.
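As a minimal sketch of applying one of these policies, the function below maps a fresh anonymous region and requests MPOL_INTERLEAVE before any page is touched, since the policy only affects new allocations. Several things here are assumptions: the machine is assumed to have nodes 0 and 1, the function name map_interleaved is hypothetical, and the call goes through syscall(2) with constants from <linux/mempolicy.h> only so that the sketch builds without libnuma. With libnuma installed, calling mbind() from <numaif.h> with the same arguments is the documented route.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>

/*
 * Map len bytes of anonymous memory and ask for interleaving across
 * nodes 0 and 1 (assumed to exist) before the pages are first touched.
 * Returns the mapping, or NULL if mmap() fails. *mbind_errno receives 0
 * if the policy was set, otherwise the errno value of the mbind call.
 */
void *map_interleaved(size_t len, int *mbind_errno)
{
    unsigned long nodemask = (1UL << 0) | (1UL << 1);
    unsigned long maxnode = sizeof(nodemask) * CHAR_BIT;
    void *p;

    *mbind_errno = 0;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Same argument order as mbind(start, len, policy, nodemask, maxnode, flags). */
    if (syscall(SYS_mbind, p, (unsigned long)len, (long)MPOL_INTERLEAVE,
                &nodemask, maxnode, 0UL) == -1)
        *mbind_errno = errno;
    return p;
}
```

The mapping is returned even when the policy could not be set (for example on a single-node machine or a kernel without NUMA support), so the caller can decide whether to continue with the default placement.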
If MPOL_MF_STRICT is passed
in flags and
policy is not
MPOL_DEFAULT, then the call
will fail with the error EIO
if the existing pages in the mapping don't follow the policy.
In 2.6.16 or later the kernel will also try to move pages to
the requested node with this flag.
If MPOL_MF_MOVE is passed in
flags, then an
attempt will be made to move all the pages in the mapping so
that they follow the policy. Pages that are shared with other
processes are not moved. If MPOL_MF_STRICT is also specified, then the
call will fail with the error EIO if some pages could not be moved.
If MPOL_MF_MOVE_ALL is
passed in flags, then
all pages in the mapping will be moved regardless of whether
other processes use the pages. The calling process must be
privileged (CAP_SYS_NICE) to
use this flag. If MPOL_MF_STRICT is also specified, then the
call will fail with the error EIO if some pages could not be moved.
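The interaction between MPOL_MF_MOVE and MPOL_MF_STRICT can be sketched as follows: bind an already populated range to one node, ask the kernel to move existing pages, and treat EIO as "some pages could not be moved". The function name bind_and_migrate and the bind_result enum are hypothetical, and the call again goes through syscall(2) purely to avoid the libnuma link dependency in this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>

/* Outcome of the bind attempt (hypothetical classification). */
enum bind_result {
    BIND_OK,              /* policy set, existing pages follow it */
    BIND_PAGES_MISPLACED, /* EIO: some pages could not be moved */
    BIND_FAILED           /* any other error, see errno */
};

/*
 * Bind [start, start + len) to a single node and try to move pages
 * that are already allocated elsewhere. Only nodes that fit in one
 * unsigned long are handled in this sketch.
 */
enum bind_result bind_and_migrate(void *start, unsigned long len,
                                  unsigned long node)
{
    unsigned long nodemask;
    unsigned long maxnode = sizeof(nodemask) * CHAR_BIT;

    if (node >= maxnode) {
        errno = EINVAL;
        return BIND_FAILED;
    }
    nodemask = 1UL << node;

    if (syscall(SYS_mbind, start, len, (long)MPOL_BIND, &nodemask, maxnode,
                (unsigned long)(MPOL_MF_MOVE | MPOL_MF_STRICT)) == 0)
        return BIND_OK;

    return errno == EIO ? BIND_PAGES_MISPLACED : BIND_FAILED;
}
```

Replacing MPOL_MF_MOVE with MPOL_MF_MOVE_ALL would also move pages shared with other processes, which per the text above requires CAP_SYS_NICE.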
On success, mbind() returns
0; on error, -1 is returned and errno is set to indicate the error.
EFAULT
There was an unmapped hole in the specified memory range or a passed pointer was not valid.
EINVAL
An invalid value was specified for flags or policy; or start + len was less than
start; or
policy was
MPOL_DEFAULT and
nodemask
pointed to a non-empty set; or policy was MPOL_BIND or MPOL_INTERLEAVE and nodemask pointed to an
empty set.
ENOMEM
System out of memory.
EIO
MPOL_MF_STRICT was
specified and an existing page was already on a node
that does not follow the policy.
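The invalid-argument conditions listed above (the ones the real call reports as EINVAL) can be mirrored in a user-space pre-check. This is a sketch of the rules as stated in the text, not the kernel's actual validation: it only knows the four policies and three flags this page describes, and the names nodemask_is_empty and mbind_args_check are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/mempolicy.h>

/* 1 if no bit below maxnode is set, or if mask is NULL; otherwise 0. */
int nodemask_is_empty(const unsigned long *mask, unsigned long maxnode)
{
    const unsigned long bits = sizeof(unsigned long) * CHAR_BIT;
    unsigned long node;

    if (mask == NULL)
        return 1;
    for (node = 0; node < maxnode; node++)
        if (mask[node / bits] & (1UL << (node % bits)))
            return 0;
    return 1;
}

/* Returns 0 if the arguments pass the documented checks, else EINVAL. */
int mbind_args_check(const void *start, unsigned long len, int policy,
                     const unsigned long *nodemask, unsigned long maxnode,
                     unsigned flags)
{
    const unsigned known = MPOL_MF_STRICT | MPOL_MF_MOVE | MPOL_MF_MOVE_ALL;
    uintptr_t begin = (uintptr_t)start;
    int empty = nodemask_is_empty(nodemask, maxnode);

    if (flags & ~known)
        return EINVAL;              /* unknown flag bit */
    if (begin + len < begin)
        return EINVAL;              /* start + len wrapped around */

    switch (policy) {
    case MPOL_DEFAULT:
        return empty ? 0 : EINVAL;  /* must come with an empty set */
    case MPOL_BIND:
    case MPOL_INTERLEAVE:
        return empty ? EINVAL : 0;  /* need at least one node */
    case MPOL_PREFERRED:
        return 0;                   /* an empty set is allowed */
    default:
        return EINVAL;              /* not one of the four policies */
    }
}
```

Such a check cannot replace the error handling around the call itself, because the remaining errors (an unmapped hole, memory exhaustion, misplaced pages under MPOL_MF_STRICT) are only known to the kernel.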
NUMA policy is not supported on file mappings.
MPOL_MF_STRICT is ignored on
huge page mappings right now.
It is unfortunate that the same policy, MPOL_DEFAULT, has different effects for
mbind(2) and set_mempolicy(2). To select
"allocation on the node of the CPU that triggered the
allocation" (the behavior of MPOL_DEFAULT in set_mempolicy(2))
when calling
mbind(), specify a policy of MPOL_PREFERRED with an empty nodemask.
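A minimal sketch of that idiom: MPOL_PREFERRED with a NULL nodemask, which the nodemask description above treats as an empty set. The function name prefer_local_node is hypothetical, and the direct syscall(2) route is used only to keep the sketch independent of libnuma.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>

/*
 * Ask for local allocation on [start, start + len): MPOL_PREFERRED with
 * an empty node set. Returns 0 on success or -1 with errno set.
 */
int prefer_local_node(void *start, unsigned long len)
{
    return (int)syscall(SYS_mbind, start, len, (long)MPOL_PREFERRED,
                        NULL, 0UL, 0UL);
}
```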
The mbind(), get_mempolicy(2), and
set_mempolicy(2) system
calls were added to the Linux kernel with version 2.6.7. They
are only available on kernels compiled with CONFIG_NUMA.
Support for huge page policy was added with 2.6.16. For interleave policy to be effective on huge page mappings, the memory range under the policy needs to be tens of megabytes or larger.
MPOL_MF_MOVE and
MPOL_MF_MOVE_ALL are only
available on Linux 2.6.16 and later.
These system calls should not be used directly. Instead,
the higher level interface provided by the numa(3) functions in the
numactl package is
recommended. The numactl package is available
at ftp://ftp.suse.com/pub/people/ak/numa/.
You can link with -lnuma
to get system call definitions. libnuma is available in the
numactl package.
This package also has the numaif.h header.
numa(3), numactl(8), set_mempolicy(2), get_mempolicy(2), mmap(2)