EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
The paper "**Training-free Compensation for Compressed LLM**" introduces an innovative approach to improve the performance of compressed Large Language Models without requiring additional training. The authors propose **Eigenspace Low-Rank Approximation** (*ELRA*), a novel technique that compensates for performance degradation in compressed LLMs by analyzing and adjusting their weight matrices in eigenspace. The method identifies and preserves critical model components while efficiently handling less important features through low-rank approximation. This *training-free approach* achieves significant performance recovery in compressed models, maintaining up to 95% of the original model's capabilities while reducing computational overhead. The research demonstrates ELRA's effectiveness across various compression techniques, including *quantization* and *pruning*, making it particularly valuable for deploying LLMs in resource-constrained environments. The paper provides comprehensive empirical evidence showing ELRA's superiority over traditional compensation methods in terms of both computational efficiency and performance restoration.