Hooney's Storytelling

2011. 10. 18. 13:21

Zones (category: OS Stories)

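Because of hardware limitations, the kernel cannot treat all pages as equal. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this restriction, the kernel divides pages into different zones and uses the zones to group pages with similar properties. In particular, Linux has to deal with the following two hardware shortcomings related to memory addressing: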

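- Some hardware devices can perform DMA (direct memory access) only to certain memory addresses.
- Some architectures can physically address more memory than they can virtually address. As a result, some memory is not permanently mapped into the kernel address space.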

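Because of these constraints, Linux uses three zones:
- ZONE_DMA: This zone contains pages that can undergo DMA.
- ZONE_NORMAL: This zone contains normal, regularly mapped pages.
- ZONE_HIGHMEM: This zone contains "high memory", which is not permanently mapped into the kernel's address space. (So does that mean it is mapped into the user address space?)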

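Summary!
"Some architectures can physically address more memory than they can virtually address" <-- pay attention to this part.
In other words, the physical address space is larger than the kernel's logical (virtual) address space.
So the kernel cannot keep every physical page permanently mapped at a logical address, and it can refer to the unmapped pages only through their struct page.
To get a usable address for such a page, the kernel has to call kmap() to map the high memory into its logical address space.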
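To make the summary concrete, here is a small user-space sketch (not kernel code) of the idea. The constants are assumptions on my part: a kernel virtual space starting at 0xC0000000 and a direct mapping that covers the first 896MB of physical memory, which are the usual 32-bit x86 defaults. The names toy_phys_to_virt and is_direct_mapped are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit x86 defaults; other architectures and configs differ. */
#define PAGE_OFFSET_ASSUMED  0xC0000000u             /* start of kernel virtual space */
#define LOWMEM_LIMIT_ASSUMED (896u * 1024u * 1024u)  /* end of the direct mapping */

/* True if the physical address is covered by the permanent direct mapping. */
static bool is_direct_mapped(uint64_t phys)
{
    return phys < LOWMEM_LIMIT_ASSUMED;
}

/*
 * Toy version of the low-memory translation: virtual = physical + offset.
 * It is only valid for direct-mapped pages. High memory has no such fixed
 * address and has to be mapped temporarily (kmap() in the real kernel).
 */
static uint32_t toy_phys_to_virt(uint64_t phys)
{
    assert(is_direct_mapped(phys));
    return (uint32_t)phys + PAGE_OFFSET_ASSUMED;
}
```

Under these assumptions a page at physical 1MB always lives at kernel virtual address 0xC0100000, while a page at physical 1GB has no permanent kernel address at all, which is why the kernel only holds its struct page until it is mapped.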

These zones are defined in <linux/mmzone.h>.
http://lxr.linux.no/linux+v2.6.39.4/include/linux/mmzone.h

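The actual use and layout of the memory zones is architecture dependent. For example, some architectures can perform DMA into any memory address. On those architectures ZONE_DMA is empty, and ZONE_NORMAL is used for allocations regardless of their purpose.
On the x86 architecture, by contrast, ISA devices cannot perform DMA into the full 32-bit address space (which would be 4GB), because ISA devices (which can drive only 24 address bits) can access only the first 16MB of physical memory. Therefore, on x86 ZONE_DMA consists of all memory in the range 0~16MB.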
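The 16MB and 4GB figures are just powers of two. A tiny sketch of the arithmetic, where addressable_bytes is a helper name I made up:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes reachable with the given number of address bits. */
static uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);            /* a shift by 64 or more would be undefined */
    return (uint64_t)1 << bits;
}
```

addressable_bytes(24) is 16MB, the reach of an ISA device, and addressable_bytes(32) is 4GB, the full 32-bit address space.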

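ZONE_HIGHMEM works the same way. What an architecture can and cannot directly map varies. On 32-bit x86, ZONE_HIGHMEM is all physical memory at or above the 896MB mark. On some architectures all memory is directly mapped, so ZONE_HIGHMEM is empty.
The memory contained in ZONE_HIGHMEM is called high memory, and the rest of the system's memory is called low memory.
ZONE_NORMAL is usually whatever is left after the previous two zones claim their share.
For example, on x86 ZONE_NORMAL is all physical memory from 16MB to 896MB.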

Zone	Description	Physical memory (x86)
ZONE_DMA	DMA-able pages	< 16MB
ZONE_NORMAL	Normally addressable pages	16MB ~ 896MB
ZONE_HIGHMEM	Dynamically mapped pages	> 896MB
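The table can be read as a simple classification rule. Here is a user-space sketch of that rule using the x86 boundaries from the table. The enum and function names (layout_zone, layout_zone_of, layout_zone_name) are hypothetical and are not the kernel's own. I treat the address exactly at 896MB as high memory.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ull * 1024ull)

/* Toy zone identifiers for this sketch only. */
enum layout_zone { LAYOUT_DMA, LAYOUT_NORMAL, LAYOUT_HIGHMEM };

/* Classify a physical address using the x86 boundaries from the table. */
static enum layout_zone layout_zone_of(uint64_t phys)
{
    if (phys < 16 * MB)
        return LAYOUT_DMA;
    if (phys < 896 * MB)
        return LAYOUT_NORMAL;
    return LAYOUT_HIGHMEM;
}

/* Human-readable zone name for printing. */
static const char *layout_zone_name(enum layout_zone z)
{
    switch (z) {
    case LAYOUT_DMA:     return "DMA";
    case LAYOUT_NORMAL:  return "Normal";
    case LAYOUT_HIGHMEM: return "HighMem";
    }
    assert(!"unknown zone");
    return "?";
}
```

For example, layout_zone_of(1 * MB) gives LAYOUT_DMA, layout_zone_of(100 * MB) gives LAYOUT_NORMAL, and layout_zone_of(1024 * MB) gives LAYOUT_HIGHMEM.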

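Linux pools the pages that belong to each zone so that they can be handed out when needed. For example, having a ZONE_DMA pool lets the kernel satisfy the allocations needed for DMA (much like a thread pool). When such memory is needed, the kernel simply pulls the required number of pages from ZONE_DMA. Note that the zones have no physical significance. They are only a logical grouping that the kernel uses to keep track of pages.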

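Some allocations require pages from a particular zone, but not all do. An allocation of DMA-able memory must come from ZONE_DMA, while a normal allocation can be satisfied from either ZONE_DMA or ZONE_NORMAL. Of course, the kernel prefers to satisfy normal allocations from the normal zone, to save the scarce pages in ZONE_DMA. But when there is no other choice, the kernel will use any zone that is available and suitable for the request.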
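The preference order described here can be sketched as a toy user-space model. This is not the kernel's page allocator. The pool structure, the counters, and names such as pool_alloc_page are all made up for illustration, and high memory is left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pool_zone { POOL_DMA, POOL_NORMAL, POOL_NR_ZONES };

/* One free-page counter per zone stands in for the real per-zone pools. */
struct zone_pools {
    unsigned long free_pages[POOL_NR_ZONES];
};

/* Take one page from a zone's pool if it has any left. */
static bool take_page(struct zone_pools *p, enum pool_zone z)
{
    if (p->free_pages[z] == 0)
        return false;
    p->free_pages[z]--;
    return true;
}

/*
 * Returns the zone the page came from, or -1 if nothing suitable is free.
 * A DMA request may only use POOL_DMA. A normal request prefers
 * POOL_NORMAL and falls back to POOL_DMA only when it has to.
 */
static int pool_alloc_page(struct zone_pools *p, bool need_dma)
{
    assert(p != NULL);
    if (!need_dma && take_page(p, POOL_NORMAL))
        return POOL_NORMAL;
    if (take_page(p, POOL_DMA))
        return POOL_DMA;
    return -1;
}
```

With one free page in each pool, the first normal request is served from POOL_NORMAL, the second falls back to POOL_DMA, and a DMA request after that fails even if normal pages were to be freed later, because it has nowhere else to go.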

Each zone is represented by struct zone, which is defined in <linux/mmzone.h>.

struct zone {
        /* Fields commonly accessed by the page allocator */

        /* zone watermarks, access with *_wmark_pages(zone) macros */
        unsigned long watermark[NR_WMARK];

        /*
         * When free pages are below this point, additional steps are taken
         * when reading the number of free pages to avoid per-cpu counter
         * drift allowing watermarks to be breached
         */
        unsigned long percpu_drift_mark;

        /*
         * We don't know if the memory that we're going to allocate will be freeable
         * or/and it will be released eventually, so to avoid totally wasting several
         * GB of ram we must reserve some of the lower zone memory (otherwise we risk
         * to run OOM on the lower zones despite there's tons of freeable ram
         * on the higher zones). This array is recalculated at runtime if the
         * sysctl_lowmem_reserve_ratio sysctl changes.
         */
        unsigned long           lowmem_reserve[MAX_NR_ZONES];

#ifdef CONFIG_NUMA
        int node;
        /*
         * zone reclaim becomes active if more unmapped pages exist.
         */
        unsigned long           min_unmapped_pages;
        unsigned long           min_slab_pages;
#endif
        struct per_cpu_pageset __percpu *pageset;
        /*
         * free areas of different sizes
         */
        spinlock_t              lock;
        int                     all_unreclaimable; /* All pages pinned */
#ifdef CONFIG_MEMORY_HOTPLUG
        /* see spanned/present_pages for more description */
        seqlock_t               span_seqlock;
#endif
        struct free_area        free_area[MAX_ORDER];

#ifndef CONFIG_SPARSEMEM
        /*
         * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
         * In SPARSEMEM, this map is stored in struct mem_section
         */
        unsigned long           *pageblock_flags;
#endif /* CONFIG_SPARSEMEM */

#ifdef CONFIG_COMPACTION
        /*
         * On compaction failure, 1<<compact_defer_shift compactions
         * are skipped before trying again. The number attempted since
         * last failure is tracked with compact_considered.
         */
        unsigned int            compact_considered;
        unsigned int            compact_defer_shift;
#endif

        ZONE_PADDING(_pad1_)

        /* Fields commonly accessed by the page reclaim scanner */
        spinlock_t              lru_lock;
        struct zone_lru {
                struct list_head list;
        } lru[NR_LRU_LISTS];

        struct zone_reclaim_stat reclaim_stat;

        unsigned long           pages_scanned;     /* since last reclaim */
        unsigned long           flags;             /* zone flags, see below */

        /* Zone statistics */
        atomic_long_t           vm_stat[NR_VM_ZONE_STAT_ITEMS];

        /*
         * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
         * this zone's LRU.  Maintained by the pageout code.
         */
        unsigned int inactive_ratio;


        ZONE_PADDING(_pad2_)
        /* Rarely used or read-mostly fields */

        /*
         * wait_table           -- the array holding the hash table
         * wait_table_hash_nr_entries   -- the size of the hash table array
         * wait_table_bits      -- wait_table_size == (1 << wait_table_bits)
         *
         * The purpose of all these is to keep track of the people
         * waiting for a page to become available and make them
         * runnable again when possible. The trouble is that this
         * consumes a lot of space, especially when so few things
         * wait on pages at a given time. So instead of using
         * per-page waitqueues, we use a waitqueue hash table.
         *
         * The bucket discipline is to sleep on the same queue when
         * colliding and wake all in that wait queue when removing.
         * When something wakes, it must check to be sure its page is
         * truly available, a la thundering herd. The cost of a
         * collision is great, but given the expected load of the
         * table, they should be so rare as to be outweighed by the
         * benefits from the saved space.
         *
         * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
         * primary users of these fields, and in mm/page_alloc.c
         * free_area_init_core() performs the initialization of them.
         */
        wait_queue_head_t       * wait_table;
        unsigned long           wait_table_hash_nr_entries;
        unsigned long           wait_table_bits;

        /*
         * Discontig memory support fields.
         */
        struct pglist_data      *zone_pgdat;
        /* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
        unsigned long           zone_start_pfn;

        /*
         * zone_start_pfn, spanned_pages and present_pages are all
         * protected by span_seqlock.  It is a seqlock because it has
         * to be read outside of zone->lock, and it is done in the main
         * allocator path.  But, it is written quite infrequently.
         *
         * The lock is declared along with zone->lock because it is
         * frequently read in proximity to zone->lock.  It's good to
         * give them a chance of being in the same cacheline.
         */
        unsigned long           spanned_pages;  /* total size, including holes */
        unsigned long           present_pages;  /* amount of memory (excluding holes) */

        /*
         * rarely used fields:
         */
        const char              *name;
} ____cacheline_internodealigned_in_smp;

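This structure is quite big, but there are only three zones in the system and therefore only three of these structures.
Let's look at the more important fields.
The lock field is a spin lock that protects the structure from concurrent access. Note that it protects only the structure, not all the pages that reside in the zone.
The free_pages field is the number of free pages in the zone. The kernel tries to keep at least pages_min pages free (by swapping, if necessary). <-- Neither field appears in the struct listed above. In the 2.6.39 source, the free page count is kept in vm_stat[] (the NR_FREE_PAGES entry) and the minimum is kept in watermark[] (the WMARK_MIN entry).
The name field is a NULL-terminated string that names the zone. The kernel initializes this value during boot in mm/page_alloc.c, and the three zones are given the names "DMA", "Normal", and "HighMem".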

Posted by НooпeУ
