The Best Local Coding LLMs to Run in Your Environment


Large language models (LLMs) have transformed the way developers and data professionals approach their daily work. Local LLMs, especially those fine-tuned for coding, have become powerful tools that provide personalized assistance inside individual work environments. Running models locally is particularly attractive because it preserves data privacy and eliminates the costs associated with API usage. Below are some of the best coding LLMs that can be run locally, along with their distinctive features.
1. GLM-4-32B-0414
The open-source GLM-4-32B-0414 series, released by Zhipu AI (a Tsinghua University spin-off), includes a model with 32 billion parameters whose performance is comparable to GPT-4o and DeepSeek-V3. The model was pre-trained on 15 trillion tokens of high-quality, reasoning-heavy data and then refined with human preference alignment, rejection sampling, and reinforcement learning. As a result, it follows instructions reliably and produces well-structured outputs.
GLM-4-32B-0414 is particularly strong at generating complex code, analyzing existing code, and emitting function-call-formatted outputs. Its multi-step reasoning over code, such as tracing logic and suggesting improvements, surpasses many models of similar or larger size. It also offers a context window of up to 32,000 tokens, enough to process large blocks of code or multiple files at once, which makes it well suited to analyzing entire codebases or proposing comprehensive refactorings in a single pass.
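As a rough sketch of what running it locally looks like, the snippet below loads the model with Hugging Face transformers and asks it a code-review question. The repo id and the hardware assumption (enough GPU memory for a 32B model in bf16, roughly 64 GB) are ours, not the vendor's; check the model card before running.

```python
# Minimal sketch: chatting with GLM-4-32B-0414 via Hugging Face transformers.
# The repo id below is an assumption -- confirm it on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/GLM-4-32B-0414"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to reduce memory
    device_map="auto",           # spread layers across available GPUs
)

messages = [{
    "role": "user",
    "content": "Explain what this function does and suggest one improvement:\n"
               "def f(xs): return [x for x in xs if x % 2 == 0]",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```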
2. DeepSeek-Coder-V2
DeepSeek-Coder-V2 is a Mixture-of-Experts (MoE) coding model designed specifically for programming tasks. It is offered in two open-weight versions: a "Lite" model with 16 billion total parameters (2.4 billion active per token) and a larger model with 236 billion total parameters (21 billion active). Starting from an intermediate DeepSeek-V2 checkpoint, it was further pre-trained on 6 trillion additional tokens, expanding its programming-language coverage from 86 to 338 languages.
The model delivers top-tier performance, as evidenced by its prominent position on the Aider LLM leaderboard, where it sits alongside high-end closed models in code reasoning. The code is released under the MIT license, and the model weights are available under the DeepSeek model license, which permits commercial use. For local execution, the 16-billion-parameter Lite model is the practical choice for fast code completion and "vibe coding" sessions, while the 236-billion-parameter model targets multi-GPU servers for intensive code generation and project-scale reasoning.
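A minimal local sketch for the Lite model follows the usual transformers pattern, with trust_remote_code=True because the repository ships custom MoE modeling code. Verify the repo id and the VRAM requirement (around 32 GB in bf16 for the 16B model) on DeepSeek's Hugging Face page before running.

```python
# Minimal sketch: code generation with DeepSeek-Coder-V2-Lite-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # repo ships custom MoE modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user",
             "content": "Write a Python function that checks whether a string is a palindrome."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```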
3. Qwen3-Coder
Developed by Alibaba Cloud's Qwen team, Qwen3-Coder is a coding-focused model trained on 7.5 trillion tokens, roughly 70% of which is code. It uses a Mixture-of-Experts (MoE) transformer: the flagship version has 480 billion total parameters with 35 billion active per token, and a smaller 30-billion-parameter variant (3 billion active) is also available. Its coding performance rivals models like GPT-4 and Claude 4 Sonnet, and it offers a native context window of 256,000 tokens that can be extended up to 1,000,000 with YaRN.
This model can handle complete repositories and long files in a single session, understanding and generating code in over 350 programming languages, and it is also built for agentic coding tasks. Although the 480-billion-parameter model requires powerful hardware, such as multiple H100 GPUs or high-memory servers, its MoE design activates only a subset of parameters per token, which keeps inference efficient. For lower requirements, the 30-billion-parameter and FP8 variants can run on a single high-end GPU. The model weights are publicly available under the Apache 2.0 license, making Qwen3-Coder an accessible and powerful coding assistant.
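For a sense of local usage, here is a hedged sketch using the smaller MoE variant; the repo id reflects Qwen's naming convention but should be verified against their Hugging Face organization.

```python
# Minimal sketch: generating code with the smaller Qwen3-Coder MoE variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # assumed repo id -- verify

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Implement quicksort in Python with inline comments."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```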
4. Codestral
Codestral, created by Mistral AI, is a code generation model dedicated to software development across more than 80 programming languages. It was released in two variants: a 22-billion-parameter transformer with a 32,000-token context window, and Codestral Mamba, a 7-billion-parameter state-space model whose linear-time architecture lets it handle even longer inputs. Both are designed for low latency relative to their size, which is a real advantage during live editing.
For local coding, the 22-billion-parameter model is capable and fast enough to run in 4- or 8-bit quantized mode on a single powerful GPU while still generating the longer outputs needed for larger-scale projects. Mistral also offers hosted endpoints for Codestral, but for fully local use, the open weights combined with common inference stacks are more than sufficient.
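The 4-bit route mentioned above can be sketched with transformers and bitsandbytes, as below. Note that Codestral's weights on Hugging Face are gated (you must accept Mistral's license and authenticate first), and the repo id is an assumption to check against the model card.

```python
# Minimal sketch: loading Codestral 22B in 4-bit to fit a single GPU.
# Requires the bitsandbytes package and a prior `huggingface-cli login`
# after accepting the license on the model page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Codestral-22B-v0.1"  # assumed repo id -- verify

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # ~22B params in roughly 14 GB of VRAM
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

# Plain completion: let the model continue a function skeleton.
inputs = tokenizer("def fibonacci(n: int) -> int:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```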
5. Code Llama
Code Llama is a family of coding-specialized models derived from Llama 2 and developed by Meta, offered in multiple sizes (7B, 13B, 34B, 70B) and variants (base, Python-specialized, and Instruct). Depending on the version, the models handle tasks such as completing lines of code, infilling, or Python-specific work reliably, even with extremely long inputs (up to approximately 100,000 tokens using extended-context techniques). All are available as open weights under Meta's community license, which permits broad use in both research and commercial settings.
Code Llama has become a popular foundation for local coding agents and copilots in development environments: the 7- and 13-billion-parameter models run smoothly on laptops and desktops with a single GPU (especially when quantized), while the 34- and 70-billion-parameter models offer greater accuracy if more VRAM is available. The variants map naturally to different workflows; the Python-specialized model suits data and machine learning work, while the Instruct variant performs well in conversational interactions and "vibe coding" sessions within editors.
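Code Llama's infilling mode, one of its most useful tricks for in-editor completion, can be sketched as follows with the 7B base model; the transformers Code Llama tokenizer expands the <FILL_ME> marker into the model's prefix/suffix infilling format.

```python
# Minimal sketch: fill-in-the-middle completion with Code Llama 7B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# <FILL_ME> marks the gap; the tokenizer rewrites it into Code Llama's
# special prefix/suffix infilling tokens automatically.
prompt = (
    "def remove_non_ascii(s: str) -> str:\n"
    '    """ <FILL_ME>\n'
    "    return result\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=128)

filling = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prompt.replace("<FILL_ME>", filling))
```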
With this variety of options available for running coding models locally, developers can choose the one that best fits their needs and work environment.
To delve deeper into the fascinating world of LLMs and how they can transform your personal and professional development, feel free to keep exploring more content on this blog.