#Multimodal


Foundation Models

ERNIE-4.5-VL-Thinking: multimodal reasoning with a 3B-active MoE...

Baidu's ERNIE-4.5-VL-28B-A3B-Thinking is an open VLM with 28B-class total parameters and 3B active parameters, with image and video reasoning, grounding, tool calls, 1...

Sangmin Lee, 2026.05.26

Foundation Models

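Lance uses multitask synergy to bind image and video understanding and generation into one model

ByteDance's Lance is a native unified multimodal model in the 3B-active-parameter class that handles image and video understanding, generation, and editing in a shared interleaved contex...

Sangmin Lee, 2026.05.20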

Foundation Models

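SenseNova-U1 understands and generates pixels and words on the same substrate

SenseNova-U1 is an open multimodal release built on NEO-unify that removes the vision encoder and VAE and combines pixel-space generation with an MoT architecture, bringing understanding, generation, editing, and interleaved generation into a single model family.

Sangmin Lee, 2026.05.19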

Foundation Models

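MiniCPM-V 4.6: a 1.3B multimodal model, by phone-deployment standards, again...

MiniCPM-V 4.6 combines SigLIP2-400M, Qwen3.5-0.8B, and LLaVA-UHD v4-style visual token compression, bringing image and video understanding down to 1.3B scale and a mobile deployment surface as an open VL...

Sangmin Lee, 2026.05.12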

Foundation Models

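Tuna-2 drops the vision encoder and redesigns unified multimodal modeling around pixel embeddings...

Tuna-2 removes the pretrained vision encoder and VAE and handles both understanding and generation with raw pixel patch embeddings alone, reducing the complexity of a native unified multimodal model while...

Sangmin Lee, 2026.05.11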

Foundation Models

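Dynin-Omni uses masked diffusion instead of autoregression for omnimodal modeling in one backbone...

Dynin-Omni unifies text, image, and speech understanding and generation, plus video understanding, in a single 8B masked diffusion backbone, so that omnimodal modeling is not an assembly of external generators but shared discrete token sp...

Sangmin Lee, 2026.05.06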

Foundation Models

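Qwen3.6-35B-A3B pushes agentic coding performance with 3B active parameters...

Qwen3.6-35B-A3B builds on a multimodal MoE architecture with 35B total and 3B active parameters, combining agentic coding, preserve_thinking, ultra-long context extension, and an open deployment path, at a small active cost...

Sangmin Lee, 2026.05.06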

Foundation Models

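MiMo-V2.5 binds 1M context, audio, and agents into one model

Xiaomi's MiMo-V2.5 is a release that combines a 310B Sparse MoE with vision and audio encoders and agentic post-training, packing 1M context and native omnimodal understanding into a single open model.

Sangmin Lee, 2026.05.06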