Tag

#Sparse Models

arXiv 2606.12397: Manifold Power Iteration, which aligns each MoE router row with the principal singular direction of its expert's weight matrix...

Sangmin Lee, 2026.06.12

EMO, from Ai2 and UC Berkeley, pretrains an MoE using document boundaries as weak supervision, so that tokens from the same document are routed within a shared expert pool. 1B active / 14...

Sangmin Lee, 2026.05.13

UniPool replaces the separate per-layer expert sets of an MoE with a single globally shared pool and adds pool-level balancing and NormRouter, so that the expert-parameter growth that used to scale with depth...

Sangmin Lee, 2026.05.08