Memory and Thread Placement Optimization Developer's Guide

Locality Groups and Thread and Memory Placement

This section discusses the APIs used to discover and affect thread and memory placement with respect to lgroups.

  • The lgrp_home(3LGRP) function is used to discover thread placement.

  • The meminfo(2) system call is used to discover memory placement.

  • The MADV_ACCESS flags to the madvise(3C) function are used to affect memory allocation among lgroups.

  • The lgrp_affinity_set(3LGRP) function can affect thread and memory placement by setting a thread's affinity for a given lgroup.

  • A thread's lgroup affinities can specify an order of preference among the lgroups from which resources are allocated.

  • The kernel needs information about the likely pattern of an application's memory use in order to allocate memory resources efficiently.

  • The madvise() function and its shared object analogue madv.so.1 provide this information to the kernel.

  • A running process can gather memory usage information about itself by using the meminfo() system call.

Using lgrp_home()

The lgrp_home() function returns the home lgroup for the specified process or thread.

#include <sys/lgrp_user.h>
lgrp_id_t lgrp_home(idtype_t idtype, id_t id);

On failure, the lgrp_home() function returns -1 and sets errno. The function fails with EINVAL when the ID type is not valid. It fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the threads. It fails with ESRCH when the specified process or thread is not found.
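As a rough illustration, the call described above might be wrapped as follows. The helper names current_home_lgroup() and print_home_lgroup() are hypothetical and not part of the lgroup API. The sketch assumes a Solaris-derived system linked with -llgrp; on any other platform it compiles to a stub that fails with ENOSYS.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#if defined(__sun)
#include <sys/lgrp_user.h>      /* lgrp_home(); link with -llgrp */
#endif

/*
 * Hypothetical helper: store the calling thread's home lgroup ID in *home.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int
current_home_lgroup(long *home)
{
#if defined(__sun)
        lgrp_id_t id = lgrp_home(P_LWPID, P_MYID);

        if (id == -1)
                return (-1);    /* errno was set by lgrp_home() */
        *home = (long)id;
        return (0);
#else
        (void) home;
        errno = ENOSYS;         /* assumption: no lgroup API on this platform */
        return (-1);
#endif
}

/* Hypothetical helper: print the result in human-readable form. */
void
print_home_lgroup(void)
{
        long home;

        if (current_home_lgroup(&home) == 0)
                printf("home lgroup: %ld\n", home);
        else
                printf("lgrp_home failed: %s\n", strerror(errno));
}
```

Passing P_PID with a process ID instead of P_LWPID with P_MYID would query another process, subject to the EPERM rule described above.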

Using madvise()

The madvise() function advises the kernel that a region of user virtual memory, starting at the address specified in addr and extending for len bytes, is expected to follow a particular pattern of use. The kernel uses this information to optimize the procedure for manipulating and maintaining the resources associated with the specified range. Programs that have specific knowledge of their memory access patterns can use madvise() to increase system performance.

#include <sys/types.h>
#include <sys/mman.h>
int madvise(caddr_t addr, size_t len, int advice);

The madvise() function provides the following flags to affect how a thread's memory is allocated among lgroups:

MADV_ACCESS_DEFAULT

This flag resets the kernel's expected access pattern for the specified range to the default.

MADV_ACCESS_LWP

This flag advises the kernel that the next LWP to touch the specified address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.

MADV_ACCESS_MANY

This flag advises the kernel that many processes or LWPs will access the specified address range randomly across the system. The kernel allocates the memory and other resources for this range accordingly.

On failure, the madvise() function returns -1 and sets errno to one of the following values:

EAGAIN

Some or all of the mappings in the specified address range, from addr to addr+len, are locked for I/O.

EINVAL

The value of the addr parameter is not a multiple of the page size as returned by sysconf(3C), the length of the specified address range is less than or equal to zero, or the advice is invalid.

EIO

An I/O error occurs while reading from or writing to the file system.

ENOMEM

Addresses in the specified address range are outside the valid range for the address space of a process or the addresses in the specified address range specify one or more pages that are not mapped.

ESTALE

The NFS file handle is stale.

Using madv.so.1

The madv.so.1 shared object enables the selective configuration of virtual memory advice for launched processes and their descendants. To use the shared object, the following string must be present in the environment:

LD_PRELOAD=$LD_PRELOAD:madv.so.1

The madv.so.1 shared object applies memory advice as specified by the value of the MADV environment variable. The MADV environment variable specifies the virtual memory advice to use for all heap, shared memory, and mmap regions in the process address space. This advice is applied to all created processes. The following values of the MADV environment variable affect resource allocation among lgroups:

access_default

This value resets the kernel's expected access pattern to the default.

access_lwp

This value advises the kernel that the next LWP to touch an address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.

access_many

This value advises the kernel that many processes or LWPs will access memory randomly across the system. The kernel allocates the memory and other resources accordingly.

The value of the MADVCFGFILE environment variable is the name of a text file that contains one or more memory advice configuration entries in the form exec-name:advice-opts.

The value of exec-name is the name of an application or executable. The value of exec-name can be a full pathname, a base name, or a pattern string.

The value of advice-opts is of the form region=advice. The values of advice are the same as the values for the MADV environment variable. Replace region with any of the following legal values:

madv

Advice applies to all heap, shared memory, and mmap(2) regions in the process address space.

heap

The heap is defined to be the brk(2) area. Advice applies to the existing heap and to any additional heap memory allocated in the future.

shm

Advice applies to shared memory segments. See shmat(2) for more information on shared memory operations.

ism

Advice applies to shared memory segments that are using the SHM_SHARE_MMU flag. The ism option takes precedence over shm.

dsm

Advice applies to shared memory segments that are using the SHM_PAGEABLE flag. The dsm option takes precedence over shm.

mapshared

Advice applies to mappings established by the mmap() system call using the MAP_SHARED flag.

mapprivate

Advice applies to mappings established by the mmap() system call using the MAP_PRIVATE flag.

mapanon

Advice applies to mappings established by the mmap() system call using the MAP_ANON flag. The mapanon option takes precedence when multiple options apply.
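To make the exec-name:advice-opts entry format concrete, the sketch below splits one configuration line into its exec-name and its region=advice pairs. It only illustrates the syntax and is not how madv.so.1 itself parses the file. The struct names, the parse_entry() helper, the buffer sizes, and the assumption that pairs are separated by commas (as in the multi-region example of this section) are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OPTS        8       /* arbitrary limit chosen for this sketch */

struct advice_opt {
        char region[16];        /* for example "heap" or "ism" */
        char advice[32];        /* for example "access_lwp" */
};

struct advice_entry {
        char exec_name[128];
        int nopts;
        struct advice_opt opts[MAX_OPTS];
};

/* Parse "exec-name:region=advice,..." into *e. Returns 0, or -1 if malformed. */
int
parse_entry(const char *line, struct advice_entry *e)
{
        const char *colon;
        char buf[256];
        char *tok, *eq;
        size_t namelen;

        /* Skip leading blanks so indented entries are accepted. */
        while (*line == ' ' || *line == '\t')
                line++;

        colon = strchr(line, ':');
        if (colon == NULL || colon == line)
                return (-1);
        namelen = (size_t)(colon - line);
        if (namelen >= sizeof (e->exec_name) ||
            strlen(colon + 1) >= sizeof (buf))
                return (-1);

        memcpy(e->exec_name, line, namelen);
        e->exec_name[namelen] = '\0';
        e->nopts = 0;

        /* An empty option list, as in "ls:", leaves nopts at zero. */
        strcpy(buf, colon + 1);
        for (tok = strtok(buf, ",\n"); tok != NULL;
            tok = strtok(NULL, ",\n")) {
                eq = strchr(tok, '=');
                if (eq == NULL || e->nopts == MAX_OPTS)
                        return (-1);
                *eq = '\0';
                if (strlen(tok) >= sizeof (e->opts[0].region) ||
                    strlen(eq + 1) >= sizeof (e->opts[0].advice))
                        return (-1);
                strcpy(e->opts[e->nopts].region, tok);
                strcpy(e->opts[e->nopts].advice, eq + 1);
                e->nopts++;
        }
        return (0);
}
```

The sketch does not check that a region or advice name is one of the legal values listed above; a fuller version would validate both.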

The value of the MADVERRFILE environment variable is the name of the path where error messages are logged. In the absence of a MADVERRFILE location, the madv.so.1 shared object logs errors by using syslog(3C) with LOG_ERR as the severity level and LOG_USER as the facility descriptor.

Memory advice is inherited. A child process has the same advice as its parent. The advice is set back to the system default advice after a call to exec(2) unless a different level of advice is configured using the madv.so.1 shared object. Advice is only applied to mmap() regions explicitly created by the user program. Regions established by the run-time linker or by system libraries that make direct system calls are not affected.

madv.so.1 Usage Examples

The following examples illustrate specific aspects of the madv.so.1 shared object.

Example 1-2 Setting Advice for a Set of Applications

This configuration applies advice to all ISM segments for applications with exec names that begin with foo.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        foo*:ism=access_lwp

Example 1-3 Excluding a Set of Applications From Advice

This configuration sets advice for all applications with the exception of ls.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADV=access_many
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADV MADVCFGFILE
$ cat $MADVCFGFILE
        ls:

Example 1-4 Pattern Matching in a Configuration File

Because the configuration specified in MADVCFGFILE takes precedence over the value set in MADV, specifying * as the exec-name of the last configuration entry is equivalent to setting MADV. This example is equivalent to the previous example.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        ls:
        *:madv=access_many

Example 1-5 Advice for Multiple Regions

This configuration applies one type of advice for mmap() regions and different advice for heap and shared memory regions for applications whose exec names begin with foo.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        foo*:madv=access_many,heap=sequential,shm=access_lwp

Using meminfo()

The meminfo() function gives the calling process information about the virtual memory and physical memory that the system has allocated to that process.

#include <sys/types.h>
#include <sys/mman.h>
int meminfo(const uint64_t inaddr[], int addr_count,
    const uint_t info_req[], int info_count, uint64_t outdata[],
    uint_t validity[]);

The meminfo() function can return the following types of information:

MEMINFO_VPHYSICAL

The physical memory address corresponding to the given virtual address

MEMINFO_VLGRP

The lgroup to which the physical page corresponding to the given virtual address belongs

MEMINFO_VPAGESIZE

The size of the physical page corresponding to the given virtual address

MEMINFO_VREPLCNT

The number of replicated physical pages that correspond to the given virtual address

MEMINFO_VREPL|n

The nth physical replica of the given virtual address

MEMINFO_VREPL_LGRP|n

The lgroup to which the nth physical replica of the given virtual address belongs

MEMINFO_PLGRP

The lgroup to which the given physical address belongs

The meminfo() function takes the following parameters:

inaddr

An array of input addresses.

addr_count

The number of addresses that are passed to meminfo().

info_req

An array that lists the types of information that are being requested.

info_count

The number of pieces of information that are requested for each address in the inaddr array.

outdata

An array where the meminfo() function places the results. The array's size is equal to the product of the values of the info_count and addr_count parameters.

validity

An array of size equal to the value of the addr_count parameter. The validity array contains bitwise result codes. The 0th bit of the result code evaluates the validity of the corresponding input address. Each successive bit in the result code evaluates the validity of the response to the members of the info_req array in turn.

On failure, the meminfo() function returns -1 and sets errno. The function fails with EFAULT when the area of memory to which the outdata or validity arrays point cannot be written to, or when the area of memory to which the info_req or inaddr arrays point cannot be read from. The function fails with EINVAL when the value of info_count exceeds 31 or is less than 1, or when the value of addr_count is less than zero.
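The layout of the outdata and validity arrays can be easier to follow as code. The helpers below are hypothetical and call no system interface. They only encode the rules stated above: outdata holds info_count results per address, bit 0 of each validity word covers the address itself, and each successive bit covers the corresponding member of info_req.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of uint64_t slots that outdata must provide. */
size_t
outdata_slots(int addr_count, int info_count)
{
        return ((size_t)addr_count * (size_t)info_count);
}

/* Bit 0 of a validity word: is the input address itself valid? */
int
addr_is_valid(unsigned int validity_word)
{
        return ((validity_word & 1u) != 0);
}

/* Bit k + 1: is the answer to info_req[k] valid for this address? */
int
info_is_valid(unsigned int validity_word, int k)
{
        return ((validity_word & (1u << (k + 1))) != 0);
}

/* The result for address i and request k is outdata[i * info_count + k]. */
uint64_t
outdata_at(const uint64_t *outdata, int info_count, int i, int k)
{
        return (outdata[(size_t)i * (size_t)info_count + (size_t)k]);
}
```

Because info_count is limited to 31, every request fits alongside the address bit in a single 32-bit validity word.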

Example 1-6 Use of meminfo() to Print Out Physical Pages and Page Sizes Corresponding to a Set of Virtual Addresses

#include <sys/types.h>
#include <sys/mman.h>
#include <alloca.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void
print_info(void **addrvec, int how_many)
{
        /* Request the physical address and the page size for each input. */
        static const uint_t info[] = {
                MEMINFO_VPHYSICAL,
                MEMINFO_VPAGESIZE};
        uint64_t *inaddr = alloca(sizeof (uint64_t) * how_many);
        uint64_t *outdata = alloca(sizeof (uint64_t) * how_many * 2);
        uint_t *validity = alloca(sizeof (uint_t) * how_many);
        int i;

        for (i = 0; i < how_many; i++)
                inaddr[i] = (uint64_t)(uintptr_t)addrvec[i];

        if (meminfo(inaddr, how_many, info,
                    sizeof (info) / sizeof (info[0]),
                    outdata, validity) < 0) {
                perror("meminfo");
                return;
        }

        for (i = 0; i < how_many; i++) {
                /* Bit 0: the input address itself is valid. */
                if ((validity[i] & 1) == 0)
                        printf("address 0x%llx not part of address space\n",
                                (unsigned long long)inaddr[i]);

                /* Bit 1: the MEMINFO_VPHYSICAL result is valid. */
                else if ((validity[i] & 2) == 0)
                        printf("address 0x%llx has no physical page "
                                "associated with it\n",
                                (unsigned long long)inaddr[i]);

                else {
                        char buff[80];

                        /* Bit 2: the MEMINFO_VPAGESIZE result is valid. */
                        if ((validity[i] & 4) == 0)
                                strcpy(buff, "<Unknown>");
                        else
                                snprintf(buff, sizeof (buff), "%llu",
                                        (unsigned long long)outdata[i * 2 + 1]);
                        printf("address 0x%llx is backed by physical "
                                "page 0x%llx of size %s\n",
                                (unsigned long long)inaddr[i],
                                (unsigned long long)outdata[i * 2], buff);
                }
        }
}
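Since this section is about placement, a natural variation is to ask meminfo() which lgroup backs a page, using MEMINFO_VLGRP. The sketch below assumes a Solaris-derived system. The helper name lgroup_of_address() is hypothetical, and so is the choice of EFAULT for an address that is invalid or has no physical page behind it. Where MEMINFO_VLGRP is not defined, the helper compiles to a stub that fails with ENOSYS.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: report the lgroup that backs the page containing addr.
 * Returns 0 and fills *lgrp on success; otherwise -1 with errno set.
 */
int
lgroup_of_address(const void *addr, long *lgrp)
{
#ifdef MEMINFO_VLGRP
        uint64_t inaddr = (uint64_t)(uintptr_t)addr;
        uint_t req = MEMINFO_VLGRP;
        uint64_t out = 0;
        uint_t valid = 0;

        /* One address, one request: outdata needs a single slot. */
        if (meminfo(&inaddr, 1, &req, 1, &out, &valid) != 0)
                return (-1);

        /* Bit 0 covers the address, bit 1 the MEMINFO_VLGRP answer. */
        if ((valid & 1) == 0 || (valid & 2) == 0) {
                errno = EFAULT;         /* this sketch's own choice */
                return (-1);
        }
        *lgrp = (long)out;
        return (0);
#else
        (void) addr;
        (void) lgrp;
        errno = ENOSYS;         /* assumption: no meminfo() on this platform */
        return (-1);
#endif
}
```

A page that has never been touched may have no physical backing yet, so a caller would normally write to the address before querying it.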

Locality Group Affinity

The kernel assigns a thread to a locality group when the lightweight process (LWP) for that thread is created. That lgroup is called the thread's home lgroup. The kernel runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible. If resources from the home lgroup are unavailable, the kernel allocates resources from other lgroups. When a thread has affinity for more than one lgroup, the operating system allocates resources from lgroups chosen in order of affinity strength. A thread's affinity for an lgroup is at one of three levels:

  1. LGRP_AFF_STRONG indicates strong affinity. If this lgroup is the thread's home lgroup, the operating system avoids rehoming the thread to another lgroup if possible. Events such as dynamic reconfiguration, processor offlining, processor binding, and processor set binding and manipulation might still result in thread rehoming.

  2. LGRP_AFF_WEAK indicates weak affinity. If this lgroup is the thread's home lgroup, the operating system rehomes the thread if necessary for load balancing purposes.

  3. LGRP_AFF_NONE indicates no affinity. If a thread has no affinity to any lgroup, the operating system assigns a home lgroup to the thread.

The operating system uses lgroup affinities as advice when allocating resources for a given thread. The advice is factored in with the other system constraints. Processor binding and processor sets do not change lgroup affinities, but might restrict the lgroups on which a thread can run.

Using lgrp_affinity_get()

The lgrp_affinity_get(3LGRP) function returns the affinity that an LWP has for a given lgroup.

#include <sys/lgrp_user.h>
lgrp_affinity_t lgrp_affinity_get(idtype_t idtype, id_t id, lgrp_id_t lgrp);

The idtype and id arguments specify the LWP that the lgrp_affinity_get() function examines. If the value of idtype is P_PID, the lgrp_affinity_get() function gets the lgroup affinity for one of the LWPs in the process whose process ID matches the value of the id argument. If the value of idtype is P_LWPID, the lgrp_affinity_get() function gets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument. If the value of idtype is P_MYID, the lgrp_affinity_get() function gets the lgroup affinity for the current LWP.

On failure, the lgrp_affinity_get() function returns -1 and sets errno. The function fails with EINVAL when the given lgroup or ID type is not valid. It fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. It fails with ESRCH when a given lgroup or LWP is not found.

Using lgrp_affinity_set()

The lgrp_affinity_set(3LGRP) function sets the affinity that an LWP or set of LWPs has for a given lgroup.

#include <sys/lgrp_user.h>
int lgrp_affinity_set(idtype_t idtype, id_t id, lgrp_id_t lgrp,
                      lgrp_affinity_t affinity);

The idtype and id arguments specify the LWP or set of LWPs that the lgrp_affinity_set() function affects. If the value of idtype is P_PID, the lgrp_affinity_set() function sets the lgroup affinity for all of the LWPs in the process whose process ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_LWPID, the lgrp_affinity_set() function sets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_MYID, the lgrp_affinity_set() function sets the lgroup affinity for the current LWP or process to the affinity level specified in the affinity argument.

On failure, the lgrp_affinity_set() function returns -1 and sets errno. The function fails with EINVAL when the given lgroup, affinity, or ID type is not valid. It fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. It fails with ESRCH when a given lgroup or LWP is not found.
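The set and get calls might be combined as in the sketch below, which gives the calling LWP strong affinity for its current home lgroup and then reads the affinity back. The helper name pin_to_home_lgroup() is hypothetical. The sketch assumes a Solaris-derived system linked with -llgrp, and compiles to a stub that fails with ENOSYS elsewhere.

```c
#include <errno.h>

#if defined(__sun)
#include <sys/lgrp_user.h>      /* lgroup affinity API; link with -llgrp */
#endif

/*
 * Hypothetical helper: give the calling LWP strong affinity for its current
 * home lgroup, then read the affinity back to confirm the change.
 * Returns 0 on success, or -1 on failure; errno is set by the failing call.
 */
int
pin_to_home_lgroup(void)
{
#if defined(__sun)
        lgrp_id_t home = lgrp_home(P_LWPID, P_MYID);

        if (home == -1)
                return (-1);

        /* Strong affinity asks the system to avoid rehoming this LWP. */
        if (lgrp_affinity_set(P_LWPID, P_MYID, home, LGRP_AFF_STRONG) != 0)
                return (-1);

        /* lgrp_affinity_get() returns the affinity level, or -1 on error. */
        if (lgrp_affinity_get(P_LWPID, P_MYID, home) != LGRP_AFF_STRONG)
                return (-1);
        return (0);
#else
        errno = ENOSYS;         /* assumption: no lgroup API on this platform */
        return (-1);
#endif
}
```

Strong affinity remains advice: as noted above, events such as processor binding or dynamic reconfiguration might still rehome the thread.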

  Published under the terms of the Public Documentation License Version 1.01.