TIPS/Linux/kernel/OutOfMemory/overcommit-accounting

Last-modified: 2006-12-23 (Sat) 17:10:17
The Linux kernel supports the following overcommit handling modes:
0      -       Heuristic overcommit handling. Obvious overcommits of
               address space are refused. Used for a typical system. It
               ensures a seriously wild allocation fails while allowing
               overcommit to reduce swap usage.  root is allowed to
               allocate slightly more memory in this mode. This is the
               default.
1      -       Always overcommit. Appropriate for some scientific
               applications.
2      -       Don't overcommit. The total address space commit
               for the system is not permitted to exceed swap + a
               configurable percentage (default is 50) of physical RAM.
               Depending on the percentage you use, in most situations
               this means a process will not be killed while accessing
               pages but will receive errors on memory allocation as
               appropriate.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
The overcommit percentage is set via `vm.overcommit_ratio'.
The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.
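
The mode 2 limit and the two /proc/meminfo fields above can be checked
with a few lines of Python. This is a minimal sketch, not the kernel's
own calculation: the helper names (parse_meminfo, commit_limit_kb,
commit_headroom_kb) and the SAMPLE text are hypothetical, and the formula
follows only the description given here (swap plus a percentage of
physical RAM), ignoring any other adjustments a particular kernel version
may apply.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; fields without a unit (plain counts)
    are stored as the bare number.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def commit_limit_kb(ram_kb, swap_kb, ratio_percent=50):
    """Mode 2 limit as described in the text: swap + ratio% of RAM."""
    return swap_kb + ram_kb * ratio_percent // 100


def commit_headroom_kb(meminfo):
    """Address space (kB) that can still be committed before the limit."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


# Hypothetical sample data, for illustration only (not real output).
SAMPLE = """\
MemTotal:        8000000 kB
SwapTotal:       2000000 kB
CommitLimit:     6000000 kB
Committed_AS:    1500000 kB
"""

info = parse_meminfo(SAMPLE)
expected = commit_limit_kb(info["MemTotal"], info["SwapTotal"])
print("computed limit:", expected, "kB")
print("headroom:", commit_headroom_kb(info), "kB")
```

On a live system the same functions can be fed the contents of
/proc/meminfo instead of SAMPLE, and ratio_percent can be taken from
`vm.overcommit_ratio'.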
Gotchas
-------
The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really, really care.
In mode 2 the MAP_NORESERVE flag is ignored.
How It Works
------------
The overcommit is based on the following rules:
For a file-backed map
       SHARED or READ-only     -       0 cost (the file is the map not swap)
       PRIVATE WRITABLE        -       size of mapping per instance
For an anonymous or /dev/zero map
       SHARED                  -       size of mapping
       PRIVATE READ-only       -       0 cost (but of little use)
       PRIVATE WRITABLE        -       size of mapping per instance
Additional accounting
       Pages made into writable copies by mmap
       shmfs memory drawn from the same pool
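
The per-mapping rules above can be written as a small decision function.
This is an illustrative sketch only: commit_charge and its parameters are
hypothetical names, it models just the two tables (file-backed and
anonymous or /dev/zero maps), and it leaves out the additional accounting
items.

```python
def commit_charge(size, file_backed, shared, writable):
    """Return the bytes charged against the commit limit for one mapping.

    size        -- length of the mapping in bytes
    file_backed -- True for a file map, False for anonymous or /dev/zero
    shared      -- True for a SHARED map, False for PRIVATE
    writable    -- True if the mapping is writable
    """
    if file_backed:
        # SHARED or READ-only file maps cost nothing: the file backs the
        # pages, not swap. Only PRIVATE WRITABLE maps are charged.
        if shared or not writable:
            return 0
        return size
    # Anonymous or /dev/zero maps.
    if shared:
        return size  # SHARED: charged the full size of the mapping
    # PRIVATE: charged only when writable.
    return size if writable else 0


MB = 1024 * 1024
print(commit_charge(4 * MB, file_backed=True, shared=False, writable=True))
print(commit_charge(4 * MB, file_backed=True, shared=True, writable=True))
print(commit_charge(4 * MB, file_backed=False, shared=False, writable=False))
```

The charge for a PRIVATE WRITABLE mapping applies per instance, so every
process holding such a mapping is charged separately.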
Status
------
o      We account mmap memory mappings
o      We account mprotect changes in commit
o      We account mremap changes in size
o      We account brk
o      We account munmap
o      We report the commit status in /proc
o      Account and check on fork
o      Review stack handling/building on exec
o      SHMfs accounting
o      Implement actual limit enforcement
To Do
-----
o      Account ptrace pages (this is hard)