Sparsity In Deep Neural Nets
Large Language Models (LLMs) have captured the attention of the tech world with their remarkable common-sense reasoning and generalizability. However, their large size and reliance on server round-trips make them resource-intensive and slow, which is problematic for mobile and wearable devices such as smart glasses and smartwatches. Moreover, on-device computing can address privacy concerns by keeping sensitive data, such as text messages or photos, on the device itself. To tackle these challenges, we've developed a family of more compact language models, ranging from 0.5B to 1.4B parameters. These models are designed to run on-device, providing competitive performance on grounded conversational tasks while keeping latency and memory usage in check.
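To give a rough sense of why the 0.5B-1.4B range matters for on-device deployment, the raw weight-storage footprint can be estimated as parameters x bits per parameter / 8. This is an illustrative back-of-the-envelope sketch, not a figure from our evaluation; it ignores activations, KV cache, and runtime overhead:

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate raw weight storage in GB (1 GB = 1e9 bytes).

    Counts only the weights themselves; activations, KV cache,
    and framework overhead add to the real memory budget.
    """
    return num_params * bits_per_param / 8 / 1e9

# Illustrative sizes for the 0.5B-1.4B parameter range at common precisions:
for params in (0.5e9, 1.4e9):
    for bits in (32, 16, 8, 4):
        size = model_size_gb(params, bits)
        print(f"{params / 1e9:.1f}B params @ {bits:>2}-bit: {size:.2f} GB")
```

A 1.4B-parameter model at 16-bit precision needs roughly 2.8 GB just for weights, while the same model quantized to 4 bits fits in about 0.7 GB, which is why both parameter count and numeric precision drive feasibility on memory-constrained wearables.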