If retain token = 192 is set at the start, the number of tokens retained per layer is preset here, which gives an effective count of 191:

$$ (2\times 576+4\times300+10\times200+16\times110)/32=191 $$
```python
sparse_token_list_192 = [300, 200, 110] if not V2_0 else [300, 200, 118] # 2*576 4*300 10*200 16*110
sparse_token_list_128 = [303, 110, 36]  if not V2_0 else [238, 108, 60]
sparse_token_list_96  = [238, 48, 26]   if not V2_0 else [246, 54, 28]
sparse_token_list_64  = [66, 30, 17]    if not V2_0 else [66, 34, 20]
```
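As a stand-alone sanity check of that average (the helper name is mine; the 2/4/10/16 layer split is taken from the comment in the snippet):

```python
def effective_token_count(per_stage_tokens, layers_per_stage, total_layers=32):
    """Average the number of retained visual tokens over all layers.

    per_stage_tokens: tokens kept during each stage (576 before any pruning,
    then the values from sparse_token_list_192).
    layers_per_stage: how many layers run at each stage (2/4/10/16 here).
    """
    weighted = sum(t * n for t, n in zip(per_stage_tokens, layers_per_stage))
    return weighted / total_layers

# Preset schedule for retain token = 192: 2 layers at 576,
# then 4/10/16 layers at 300/200/110 respectively.
print(effective_token_count([576, 300, 200, 110], [2, 4, 10, 16]))  # 191.0
```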
But doesn't cluster and merge then increase the token count? It takes the top 30% of the tokens that were not selected (defined as merge_token_stage2) and clusters them into int(merge_token_stage2.shape[1] / 10) + 1 tokens:
```python
merge_token_idx_stage1 = torch.where(pred_score_vis == 0)[1]
merge_token_stage1 = relation_vis_text[0][merge_token_idx_stage1]
merge_token_num_stage1 = int(merge_token_idx_stage1.shape[0] * 0.3) + 1  # Top 30%
merge_token_stage2_idx = merge_token_stage1.topk(merge_token_num_stage1)[1]

merge_token_stage2 = total_sparse_token[:, merge_token_stage2_idx, :]
cluster_num = int(merge_token_stage2.shape[1] / 10) + 1
if cluster_num == 0:
    cluster_num = merge_token_stage2.shape[1]

merge_sparse_token = cluster_and_merge(merge_token_stage2, cluster_num)
```
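To make the counting concrete, here is a torch-free sketch of just the integer arithmetic in that snippet (the function name and the example size are mine):

```python
def merged_cluster_count(num_unselected: int) -> int:
    """Number of tokens added back by cluster-and-merge, following the
    integer arithmetic of the snippet: keep the top 30% of the unselected
    tokens, then form one cluster per ~10 of them."""
    merge_token_num_stage1 = int(num_unselected * 0.3) + 1  # top 30%
    cluster_num = int(merge_token_num_stage1 / 10) + 1
    if cluster_num == 0:  # guard kept for parity with the original code
        cluster_num = merge_token_num_stage1
    return cluster_num

# e.g. 276 unselected tokens at the first pruning layer
print(merged_cluster_count(276))
```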
The number of tokens added back at the pruning layers would then be

(576-300)/10+1 = 28, (328-200)/10+1 = 13, (213-110)/10+1 = 11
so the effective token count should be

$$ (2\times576+4\times328+10\times213+16\times121)/32\approx 204 $$
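Checking that weighted average stand-alone (the per-stage counts 328/213/121 are the preset values plus the merged tokens counted above):

```python
# Per-layer schedule once the merged tokens are added back:
# 2 layers at 576, then 4/10/16 layers at 328/213/121.
stages = [(576, 2), (328, 4), (213, 10), (121, 16)]
total_layers = sum(n for _, n in stages)
effective = sum(t * n for t, n in stages) / total_layers
print(effective)  # 204.0625, i.e. roughly 204 rather than the 192 budget
```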
Also, if the token counts are set to fixed values, what is the purpose of the attention-score rank described in Section 3.2, Sparsification Level Adaptation, of the paper?
(The code above is from SparseVLMs/llava/model/language_model/score.py, lines 11 to 14, and SparseVLMs/llava/model/language_model/modelling_sparse_llama.py, lines 280 to 290, at commit 87fe431.)