From: Adam Warner
Subject: Linux 2.6 Memory Map
Date: 
Message-ID: <pan.2004.10.26.00.10.48.909100@consulting.net.nz>
Hi all,

From Linux 2.6.9 onwards the i386 memory map has changed. The relevant
changelog entry is quoted below. Linux Weekly News has a great summary
with helpful diagrams: <http://lwn.net/Articles/91829/>

Memory is now allocated downwards (mmap() regions are placed top-down, below
an area reserved for the stack), so the stack's size is fixed by its ulimit
[but see the comment below: "fall back to the bottom-up layout if the stack
can grow unlimited (if the stack ulimit has been set to RLIM_INFINITY)"].

In case you didn't know, Linus' 2.6 kernel tree is no longer being developed
as a stable branch. This (<http://kerneltrap.org/node/view/3513>) is the best
summary of the policy:

   Andrew's vision, as expressed at the summit, is that the mainline
   kernel will be the fastest and most feature-rich kernel around, but
   not, necessarily, the most stable. Final stabilization is to be done
   by distributors (as happens now, really), but the distributors are
   expected to merge their patches quickly.

I briefly tried 2.6.9, so I know CMUCL and SBCL run. The question is whether
we can be confident that this change in the memory map doesn't break Lisp
implementations in insidious ways.

Note this comment in the LWN article:

   Any application which is sensitive to how virtual memory is laid out
   is buggy to begin with; according to Arjan van de Ven, the most common
   case is applications which store pointers in integer variables and then
   do the wrong thing when they see a "negative" value.

Regards,
Adam


From: <http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.9>

<·····@elte.hu>
	[PATCH] i386 virtual memory layout rework
	
	  Rework the i386 mm layout to allow applications to allocate more virtual
	  memory, and larger contiguous chunks.
	
	
	  - the patch is compatible with existing architectures that either make
	    use of HAVE_ARCH_UNMAPPED_AREA or use the default mmap() allocator - there
	    is no change in behavior.
	
	  - 64-bit architectures can use the same mechanism to clean up 32-bit
	    compatibility layouts: by defining HAVE_ARCH_PICK_MMAP_LAYOUT and
	    providing an arch_pick_mmap_layout() function - which can then decide
	    between various mmap() layout functions.
	
	  - I also introduced a new personality bit (ADDR_COMPAT_LAYOUT) to signal
	    older binaries that don't have PT_GNU_STACK.  x86 uses this to revert back
	    to the stock layout.  I also changed x86 to not clear the personality bits
	    upon exec(), like x86-64 already does.
	
	  - once every architecture that uses HAVE_ARCH_UNMAPPED_AREA has defined
	    its arch_pick_mmap_layout() function, we can get rid of
	    HAVE_ARCH_UNMAPPED_AREA altogether, as a final cleanup.
	
	  the new layout generation function (__get_unmapped_area()) got significant
	  testing in FC1/2, so I'm pretty confident it's robust.
	
	
	  Compiles & boots fine on an 'old' and on a 'new' x86 distro as well.
	
	  The two known breakages were:
	
	     http://www.redhatconfig.com/msg/67248.html
	
	     [ 'cyzload' third-party utility broke. ]
	
	     http://www.zipworld.com/au/~akpm/dde.tar.gz
	
	     [ your editor broke :-) ]
	
	  both were caused by application bugs that did:
	
		int ret = malloc();
	
		if (ret <= 0)
			failure;
	
	  Such bugs are easy to spot when they happen, and when one does it can be
	  worked around immediately, without having to change the binary, via the
	  patched setarch utility.
	
	  No other application has been found to be affected, and this particular
	  change has already had fairly wide coverage via RHEL3 and exec-shield; it
	  has been in use for more than a year.
	
	
	  The setarch utility can be used to trigger the compatibility layout on
	  x86, the following version has been patched to take the `-L' option:
	
	 	http://people.redhat.com/mingo/flexible-mmap/setarch-1.4-2.tar.gz
	
	  "setarch -L i386 <command>" will run the command with the old layout.
	
	From: Hugh Dickins <····@veritas.com>
	
	  The problem is in the flexible mmap patch: arch_get_unmapped_area_topdown
	  is liable to give your mmap a vm_start above TASK_SIZE with vm_end wrapped,
	  which is confusing and ends up as that BUG_ON(mm->map_count).
	
	  The patch below stops that behaviour, but it's not the full solution:
	  wilson_mmap_test -s 1000 then simply cannot allocate memory for the large
	  mmap, whereas it works fine non-top-down.
	
	  I think it's wrong to interpret a large or rlim_infinite stack rlimit as
	  an inviolable request to reserve that much for the stack: it makes much less
	  VM available than bottom up, not what was intended.  Perhaps top down should
	  go bottom up (instead of belly up) when it fails - but I'd probably better
	  leave that to Ingo.
	
	  Or perhaps the default should place stack below text (as WLI suggested and
	  ELF intended, with its text defaulting to 0x08048000, small progs sharing
	  page table between stack and text and data); with a further personality for
	  those needing bigger stack.
	
	From: Ingo Molnar <·····@elte.hu>
	
	  - fall back to the bottom-up layout if the stack can grow unlimited (if
	  the stack ulimit has been set to RLIM_INFINITY)
	
	  - try the bottom-up allocator if the top-down allocator fails - this can
	  utilize the hole between the true bottom of the stack and its ulimit, as a
	  last-resort effort.
	
	Signed-off-by: Ingo Molnar <·····@elte.hu>
	Signed-off-by: Andrew Morton <····@osdl.org>
	Signed-off-by: Linus Torvalds <········@osdl.org>