EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
The paper "**Training-free Compensation for Compressed LLM**" introduces an innovative approach to improve the performance of compressed Large Language Models without requiring additional training. The authors propose **Eigenspace Low-Rank Approximation** (*ELRA*), a novel technique that compensates for performance degradation in compressed LLMs by analyzing and adjusting their weight matrices in eigenspace. The method identifies and preserves critical model components while efficiently handling less important features through low-rank approximation. This *training-free approach* achieves significant performance recovery in compressed models, maintaining up to 95% of the original model's capabilities while reducing computational overhead. The research demonstrates ELRA's effectiveness across various compression techniques, including *quantization* and *pruning*, making it particularly valuable for deploying LLMs in resource-constrained environments. The paper provides comprehensive empirical evidence showing ELRA's superiority over traditional compensation methods in terms of both computational efficiency and performance restoration.