导航:[首页]->[cpp]->[Google TCMalloc学习笔记2-页管理]

写于 2011年11月5日

tcmalloc底层的内存分配直接使用操作系统的内存页分配接口来分配内存,这些也通过一个PageHeap来管理。整个内存流向如下图

ThreadCache
  ||
CentralFreelist
  ||
PageHeap
  ||
操作系统

PageHeap直接从操作系统获得一个个内存页,并以页面的形式保存,中央空闲链(CentralFreeList)从PageHeap获得一个个页面,然后分成若干对象(一个对象就是上文所说的区间大小),线程缓存(ThreadCache)从CentralFreeList获得若干对象并交付应用程序。

在一个32位的地址空间中,中央阵列由一个2层的基数树来表示,其中根包含了32个条目,每个叶包含了 215个条目(一个32为地址空间包含了 220个 4K 页面,所以这里树的第一层则是用25整除220个页面)。这就导致了中央阵列的初始内存使用需要128KB空间(215*4字节),看上去还是可以接受的。

1 2 3 4 ... ... 29 30 31
          ||
          ||
      Page1
      Page2
      ... ...  2的15次方
      PageN-1
      PageN

如上图,整个基数树包含两层,第一层是一个32个元素的指针数组,每一个元素指向一个结构,这个结构包含一个215个条目的指针数组,这个数组元素(存储一个指针)指向一个内存页(到后面会发现,其实是指向一个Span)。每一个内存也是4K,那么一个就包含4k x 32 x 215 = 4G,恰好包含整个虚拟内存空间,tcmalloc这样做有一个好处。假设第一次分配一个页面,内存地址是X,接下来又分配一个内存也,内存地址是X+4K,现在如果有一个业务需要6K内存,正常情况下是做不到的,但是这里可以看到其实两个内存也是相连的,完全可以将其合并,并一起使用。为了做到这一点,就必须对所有内存也的地址进行排序,上面的结构就很方便(数组寻址),除了有点费内存,整个结构初始化之后需要4M内存。另一个问题是,如果打算在内存过多时返还给操作系统一部分,这里合并的时候还是必须记录每一次申请内存时的边界,否则释放内存时将出错,不过就我所知,tcmalloc并没有打算返还内存。

每一次申请内存,tcmalloc将通过一个span来管理这块内存,这些信息包括这个页在基数树的索引,以及页的数量。Span的定义如下(有删减)

// Information kept for a span (a contiguous run of pages).
struct Span {
    //此连续页面内存块,第一个页面的索引
    PageID        start;          // Starting page number
    //连续页面长度
    Length        length;         // Number of pages in span

    //span的位置,这些位置定义如下
    unsigned int  location : 2;   // Is the span on a freelist, and if so, which?

    //分别是“正在使用”,“未使用过的空闲块”,“使用过但是已经不再使用过的空闲块”
    // What freelist the span is on: IN_USE if on none, or normal or returned
    enum { IN_USE, ON_NORMAL_FREELIST, ON_RETURNED_FREELIST };
};

每一次获得页面并准备插入到基数树之前,tcmalloc总是先检查与它相邻的前后两个页面是否也存在,并且是同样的属性(空闲/使用/…),若相同,则将其合并而组成一个更大的Span,

|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=|=
| |     | |     |       |     |     |
|a|     |b|     |    c  |     |  d  |

如上图,下面的a、b、c、d就是一个Span,Span a包含两个页面,Span c包含五个页面,对于多个页面的Span,中间页面的基数树索引并不一定指向实际的Span。比如Span c中间三个页面的索引就可能不存在。

根据源码看一下具体的实现(32位系统为例),基数树的实现如下,

template<int BITS> class MapSelector {
public:
    typedef TCMalloc_PageMap3<BITS - kPageShift> Type;
    typedef PackedCache<BITS - kPageShift, uint64_t> CacheType;
};

// A two-level map for 32-bit machines
//kPageShift = 12  (4K)
template<> class MapSelector<32> {
public:
    typedef TCMalloc_PageMap2<32 - kPageShift> Type;
    typedef PackedCache<32 - kPageShift, uint16_t> CacheType;
};

如果是32位平台,会使用下面模板特化的版本。

template <int BITS> //BITS = 20
class TCMalloc_PageMap2 {
private:
    // Put 32 entries in the root and (2^BITS)/32 entries in each leaf.
    static const int ROOT_BITS = 5;
    static const int ROOT_LENGTH = 1 << ROOT_BITS; //32

    static const int LEAF_BITS = BITS - ROOT_BITS; //15
    static const int LEAF_LENGTH = 1 << LEAF_BITS; //1024*32

    // Leaf node
    //每一个根数组元素指向的结构,这个结构包含一个数组
    struct Leaf {
        void* values[LEAF_LENGTH];
    };

    //32各根指针数组
    Leaf* root_[ROOT_LENGTH]; // Pointers to 32 child nodes
    void* (*allocator_)(size_t); // Memory allocator

public:
    typedef uintptr_t Number;

    explicit TCMalloc_PageMap2(void* (*allocator)(size_t)) {
        allocator_ = allocator;
        memset(root_, 0, sizeof(root_));
    }

    void* get(Number k) const {
        //获得根节点的索引
        const Number i1 = k >> LEAF_BITS;
        //获得特定页结构 数组的索引
        const Number i2 = k & (LEAF_LENGTH - 1);
        if ((k >> BITS) > 0 || root_[i1] == NULL) {
            return NULL;
        }
        //可见效率还是很高的
        return root_[i1]->values[i2];
    }

    void set(Number k, void* v) {
        ASSERT(k >> BITS == 0);
        const Number i1 = k >> LEAF_BITS;
        const Number i2 = k & (LEAF_LENGTH - 1);
        root_[i1]->values[i2] = v;
    }

    //...
}

使用如下结构来管理页面,

// Pick the appropriate map and cache types based on pointer size
typedef MapSelector<kAddressBits>::Type PageMap;
typedef MapSelector<kAddressBits>::CacheType PageMapCache;
//基数树
PageMap pagemap_;
//一个缓存类,缓存一些数据。没仔细看,里面用hash实现快速查询
mutable PageMapCache pagemap_cache_;

// We segregate spans of a given size into two circular linked
// lists: one for normal spans, and one for spans whose memory
// has been returned to the system.
//一个Span链表头

struct SpanList {
    Span normal;
    Span returned;
};

// List of free spans of length >= kMaxPages
//超过256个页面的连续块,使用large span来存储
SpanList large_;

//小于256个页面的连续块,对每一种页面数大小,使用一个链表,这些链表通过free_数组索引
// Array mapping from span length to a doubly linked list of free spans
SpanList free_[kMaxPages];

//统计信息
// Statistics on system, free, and unmapped bytes
Stats stats_;

//插入一个空闲页面 
void PageHeap::PrependToFreeList(Span* span) {
    ASSERT(span->location != Span::IN_USE);
    //如果大于256,则插入large_,否则插入数组free_对应的链表
    SpanList* list =
            (span->length < kMaxPages) ? &free_[span->length] : &large_;
    if (span->location == Span::ON_NORMAL_FREELIST) {
        stats_.free_bytes += (span->length << kPageShift);
        //置入链表中
        DLL_Prepend(&list->normal, span);
    } else {
        stats_.unmapped_bytes += (span->length << kPageShift);
        //置入链表中
        DLL_Prepend(&list->returned, span);
    }
}

//删除一个空闲页面
void PageHeap::RemoveFromFreeList(Span* span) {
    ASSERT(span->location != Span::IN_USE);
    if (span->location == Span::ON_NORMAL_FREELIST) {
        stats_.free_bytes -= (span->length << kPageShift);
    } else {
        stats_.unmapped_bytes -= (span->length << kPageShift);
    }
    //从链表中删除,因为是一个双向链表,所以这里并不需要知道它属于哪一个链表
    DLL_Remove(span);
}

分配N个页面

//请求分配N个页面
Span* PageHeap::New(Length n) {
    ASSERT (Check());ASSERT(n > 0);
    //看当前还有没有足够的页面
    Span* result = SearchFreeAndLargeLists(n);
    if (result != NULL)
    //有了,

    return result;

    //否则像操作系统要一些
    // Grow the heap and try again.
    if (!GrowHeap(n)) {
        ASSERT(Check());
        return NULL;
    }
    //再次分配
    return SearchFreeAndLargeLists(n);
}

//查找当前是否有足够大的空闲页面
Span* PageHeap::SearchFreeAndLargeLists(Length n) {
    ASSERT (Check());ASSERT(n > 0);
    //依次从匹配的页面链表开始查找,例如申请两个页面,先从free_[1]查找
    //若没有,则往上查找,比如free_[3]找到了,则获得对应的span,然后分成两个
    //span,给出一个页面的span返回,然后将三个页面的span交给free_[2]
    // Find first size >= n that has a non-empty list
    for (Length s = n; s < kMaxPages; s++) {
        Span* ll = &free_[s].normal;
        // If we're lucky, ll is non-empty, meaning it has a suitable span.
        if (!DLL_IsEmpty(ll)) {
            ASSERT(ll->next->location == Span::ON_NORMAL_FREELIST);
            return Carve(ll->next, n);
        }
        // Alternatively, maybe there's a usable returned span.
        ll = &free_[s].returned;
        if (!DLL_IsEmpty(ll)) {
            ASSERT(ll->next->location == Span::ON_RETURNED_FREELIST);
            return Carve(ll->next, n);
        }
    }
    //找不到,则从大页面链表查找
    // No luck in free lists, our last chance is in a larger class.
    return AllocLarge(n);// May be NULL
}

Span* PageHeap::Carve(Span* span, Length n) {
    ASSERT(n > 0);
    ASSERT(span->location != Span::IN_USE);
    //保存原来的状态,
    const int old_location = span->location;
    //从空闲列表中删除
    RemoveFromFreeList(span);
    //将他标记为使用状态
    span->location = Span::IN_USE;
    Event(span, 'A', n);

    const int extra = span->length - n;
    ASSERT(extra >= 0);
    //如果这个Span比实际的大,则将多余的部分重新分配一个span
    if (extra > 0) {
        //新建一个span
        Span* leftover = NewSpan(span->start + n, extra);
        //恢复原来的状态
        leftover->location = old_location;
        Event(leftover, 'S', extra);
        //初始化span
        RecordSpan(leftover);
        //插入空闲列表
        PrependToFreeList(leftover); // Skip coalescing - no candidates possible
        span->length = n;
        //插入基数树
        pagemap_.set(span->start + n - 1, span);
    }
    ASSERT (Check());
    return span ;
}

Span* PageHeap::AllocLarge(Length n) {
    // find the best span (closest to n in size).
    // The following loops implements address-ordered best-fit.
    Span *best = NULL;

    //依次查询normal和returned链表
    //因为超过256个页面的span都放在这里,所以这里总是找一个合适并且最小的span
    // Search through normal list
    for (Span* span = large_.normal.next; span != &large_.normal;
            span = span->next) {
        if (span->length >= n) {
            if ((best == NULL)
            //如果页数量小于当前最合适的
                    || (span->length < best->length)
                    //或者span的初始地址较低
                    || ((span->length == best->length)
                            && (span->start < best->start))) {
                best = span;
                ASSERT(best->location == Span::ON_NORMAL_FREELIST);
            }
        }
    }

    // Search through released list in case it has a better fit
    for (Span* span = large_.returned.next; span != &large_.returned; span =
            span->next) {
        if (span->length >= n) {
            if ((best == NULL) || (span->length < best->length)
                    || ((span->length == best->length)
                            && (span->start < best->start))) {
                best = span;
                ASSERT(best->location == Span::ON_RETURNED_FREELIST);
            }
        }
    }

    return best == NULL ? NULL : Carve(best, n);
}

bool PageHeap::GrowHeap(Length n) {
    ASSERT(kMaxPages >= kMinSystemAlloc);
    if (n > kMaxValidPages)
        return false;
    //tcmalloc限制了最小分配的页面数,默认是256(1M),所以即使只需要一个页面,也
    //直接给申请256个页面
    Length ask =
            (n > kMinSystemAlloc) ? n : static_cast<Length>(kMinSystemAlloc);
    size_t actual_size;
    void* ptr = TCMalloc_SystemAlloc(ask << kPageShift, &actual_size,
            kPageSize);
    if (ptr == NULL) {
        //如果失败了,则尝试申请上层需要的页面数
        if (n < ask) {
            // Try growing just "n" pages
            ask = n;
            ptr = TCMalloc_SystemAlloc(ask << kPageShift, &actual_size,
                    kPageSize);
        }
        if (ptr == NULL)
            return false;
    }
    ask = actual_size >> kPageShift;
    RecordGrowth(ask << kPageShift);

    uint64_t old_system_bytes = stats_.system_bytes;
    stats_.system_bytes += (ask << kPageShift);
    const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
    ASSERT(p > 0);

    // If we have already a lot of pages allocated, just pre allocate a bunch of
    // memory for the page map. This prevents fragmentation by pagemap metadata
    // when a program keeps allocating and freeing large blocks.
    //如果申请内存之后,跨过了kPageMapBigAllocationThreshold(128M)这个阀值,
    //就让基数树预分配更多的结构来存储span
    if (old_system_bytes < kPageMapBigAllocationThreshold
            && stats_.system_bytes >= kPageMapBigAllocationThreshold) {
        pagemap_.PreallocateMoreMemory();
    }

    // Make sure pagemap_ has entries for all of the new pages.
    // Plus ensure one before and one after so coalescing code
    // does not need bounds-checking.
    //确保基数树,申请的页面数,以及前后两个页面的数据结构存在
    if (pagemap_.Ensure(p - 1, ask + 2)) {
        // Pretend the new area is allocated and then Delete() it to cause
        // any necessary coalescing to occur.
        //创建一个新的Span来管理这些内存
        Span* span = NewSpan(p, ask);
        //插入到基数树
        RecordSpan(span);
        //进一步检查这个span
        Delete(span);
        ASSERT (Check());
        return true     ;
    } else {
        // We could not allocate memory within "pagemap_"
        // TODO: Once we can return memory to the system, return the new span
        return false;
    }
}

void PageHeap::Delete(Span* span) {
    const Length n = span->length;
    span->sizeclass = 0;
    span->sample = 0;
    span->location = Span::ON_NORMAL_FREELIST;
    Event(span, 'D', span->length);
    //检查前后的span是否存在,并且具有相同的属性
    MergeIntoFreeList(span); // Coalesces if possible
    //是否需要释放内存,但是这函数似乎从来都没有释放内存,
    IncrementalScavenge(n);
    ASSERT (Check());}

void PageHeap::MergeIntoFreeList(Span* span) {
    ASSERT(span->location != Span::IN_USE);

    // Coalesce -- we guarantee that "p" != 0, so no bounds checking
    // necessary.  We do not bother resetting the stale pagemap
    // entries for the pieces we are merging together because we only
    // care about the pagemap entries for the boundaries.
    //
    // Note that only similar spans are merged together.  For example,
    // we do not coalesce "returned" spans with "normal" spans.
    const PageID p = span->start;
    const Length n = span->length;
    Span* prev = GetDescriptor(p - 1);
    //如果前一个span不为空,并且属性一致,则合并之
    if (prev != NULL && prev->location == span->location) {
        // Merge preceding span into this span
        ASSERT(prev->start + prev->length == p);
        //保存前一个span的长度并删除对应的span
        const Length len = prev->length;
        RemoveFromFreeList(prev);
        DeleteSpan(prev);

        //将当前的span长度家常,并将初始页面往前移动
        span->start -= len;
        span->length += len;
        //插入基数树
        pagemap_.set(span->start, span);
        Event(span, 'L', len);
    }
    Span* next = GetDescriptor(p + n);
    //如果后一个span不为空,并且属性一致,则合并之
    if (next != NULL && next->location == span->location) {
        // Merge next span into this span
        ASSERT(next->start == p + n);
        //保存旧的长度,并删除span
        const Length len = next->length;
        RemoveFromFreeList(next);
        DeleteSpan(next);
        //将span长度增加
        span->length += len;
        //写入基数树
        pagemap_.set(span->start + span->length - 1, span);
        Event(span, 'R', len);
    }
    //放入空闲链表
    PrependToFreeList(span);
}

参考

  1. http://shiningray.cn/tcmalloc-thread-caching-malloc.html
  2. http://pan.baidu.com/s/1mgmiuqk