How CODEXT works
A complete technical description of what happens from the moment you drop a folder to the moment the .txt file lands on disk.
When you drop a folder or click "Bundle", CODEXT runs five sequential phases. All processing is synchronous and local. No threads are spawned beyond the Rust runtime's internal pool.
These directories and files are excluded when "Skip defaults" is enabled, which is the default setting. Each one can be re-enabled individually in settings if you need to include it.
CODEXT applies ignore rules from five sources: three .gitignore locations, its own default exclusions, and .codextignore. In order of precedence:
1. Project root .gitignore
2. Subdirectory .gitignore files (applied to their subtree)
3. Global gitignore (~/.gitignore_global), if it exists
4. CODEXT default exclusions (always applied if "Skip defaults" is on)
5. .codextignore in project root (Pro only, highest precedence)
Negation patterns (lines starting with !) are supported. Glob patterns (**, ?, character classes) are fully supported.
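As a rough illustration of layered ignore rules with negation, the sketch below evaluates pattern lists in order and lets the last matching pattern decide. It is not CODEXT's implementation: the function name `is_ignored` and the `layers` structure are hypothetical, the "later layers override earlier ones" ordering is an assumption, and Python's `fnmatch` only approximates gitignore matching (it has no special handling of `**` or directory anchoring).

```python
from fnmatch import fnmatchcase


def is_ignored(path, layers):
    """Return True if `path` ends up ignored after all layers are applied.

    `layers` is a list of pattern lists, lowest precedence first
    (an assumption made for this sketch). Within and across layers,
    the last matching pattern wins, and a leading "!" re-includes.
    """
    ignored = False
    basename = path.rsplit("/", 1)[-1]
    for patterns in layers:
        for raw in patterns:
            negate = raw.startswith("!")
            pattern = raw[1:] if negate else raw
            # Simplification: try the full relative path, then the basename.
            if fnmatchcase(path, pattern) or fnmatchcase(basename, pattern):
                ignored = not negate
    return ignored


# Hypothetical rule layers: a root .gitignore, then a higher-precedence file.
layers = [
    ["*.log", "build/*"],
    ["!keep.log"],
]
```

With these layers, `debug.log` and `build/out.bin` are ignored, while `logs/keep.log` is re-included by the negation pattern in the later layer.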
A file is classified as binary if any of the following are true after reading the first 8KB:
- Contains one or more null bytes (0x00)
- More than 30% of bytes are non-printable (outside 0x09–0x0D, 0x20–0x7E)
- File extension is in the known binary list: .png .jpg .jpeg .gif .webp .ico .svg (if binary-encoded) .exe .dll .so .dylib .bin .zip .tar .gz .bz2 .7z .rar .pdf .doc .docx .xls .xlsx .woff .woff2 .ttf .eot
Binary files appear in the project map with a [binary] label. Their file size is shown. Contents are never included.
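The three binary checks described above can be sketched in a few lines. This is a minimal illustration, not CODEXT's actual code: the name `looks_binary` is hypothetical, the order of the checks is an assumption, an empty sample is treated as text by assumption, and .svg is left out of the extension set because the list above only treats it as binary conditionally.

```python
from pathlib import Path

SAMPLE_SIZE = 8 * 1024  # only the first 8 KB is inspected

# Extensions from the list above; .svg is omitted here (assumption,
# since the source qualifies it with "if binary-encoded").
BINARY_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".ico",
    ".exe", ".dll", ".so", ".dylib", ".bin",
    ".zip", ".tar", ".gz", ".bz2", ".7z", ".rar",
    ".pdf", ".doc", ".docx", ".xls", ".xlsx",
    ".woff", ".woff2", ".ttf", ".eot",
}


def is_printable(byte):
    """Printable means 0x09-0x0D (whitespace controls) or 0x20-0x7E."""
    return 0x09 <= byte <= 0x0D or 0x20 <= byte <= 0x7E


def looks_binary(name, data):
    """Apply the extension, null-byte, and 30% non-printable checks."""
    if Path(name).suffix.lower() in BINARY_EXTENSIONS:
        return True
    sample = data[:SAMPLE_SIZE]
    if b"\x00" in sample:
        return True
    if not sample:
        return False  # assumption: an empty file counts as text
    non_printable = sum(1 for b in sample if not is_printable(b))
    return non_printable / len(sample) > 0.30
```

For example, `looks_binary("main.rs", b"fn main() {}\n")` is false, while any sample containing a null byte is classified as binary regardless of its extension.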
The output is a plain UTF-8-encoded .txt file. It has three sections separated by horizontal dividers. The format is designed to be easy to read for both humans and LLMs.
CODEXT PROTOCOL: 1.2.0
Generated: 2026-04-15T14:32:11Z
Mode: full-content
Options: gitignore=true, skip_defaults=true, max_size=500KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO
Folder name  : my-project
Full path    : /Users/dev/projects/my-project
File count   : 84
Folder count : 12
Token est.   : ~34,100 (GPT-4)
━━━ PROJECT MAP ━━━━━━━━━━━━━━━━━━━
📁 my-project/
├── 📁 src/
│   ├── 📁 components/
│   │   ├── 📄 Button.tsx
│   │   └── 📄 Modal.tsx
│   ├── 📄 index.ts
│   └── 📄 utils.ts
├── 📄 package.json
└── 📄 tsconfig.json
━━━ FILE CONTENTS ━━━━━━━━━━━━━━━━━
[FILE: src/index.ts]
// file content here
...
The file tree uses Unicode box-drawing characters (same as the tree command). File content sections use a consistent [FILE: path/to/file.ext] header that models can parse reliably.
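To show why a fixed [FILE: ...] header is easy to consume, here is a small sketch of a reader that splits the file-contents part of a bundle back into individual files. The function name `split_files` and the sample text are made up for illustration. It assumes each header occupies a full line by itself, so a source line that happens to look like a header would be misread as one.

```python
import re

# A header is a whole line of the form "[FILE: path/to/file.ext]".
FILE_HEADER = re.compile(r"^\[FILE: (.+)\]$")


def split_files(bundle_text):
    """Map each path found in a bundle to the text that follows its header."""
    files = {}
    current = None
    for line in bundle_text.splitlines():
        match = FILE_HEADER.match(line)
        if match:
            current = match.group(1)
            files[current] = []
        elif current is not None:
            # Lines before the first header (metadata, tree) are skipped.
            files[current].append(line)
    return {path: "\n".join(lines) for path, lines in files.items()}


sample = (
    "━━━ FILE CONTENTS ━━━\n"
    "[FILE: src/index.ts]\n"
    "export const a = 1;\n"
    "[FILE: src/utils.ts]\n"
    "export const b = 2;\n"
)
parsed = split_files(sample)
```

Here `parsed` maps `src/index.ts` and `src/utils.ts` to their one-line contents; everything before the first header is ignored.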
The token estimator runs before the output file is written. It approximates the number of tokens the output will consume in a model's context window. This helps you decide whether to split the bundle, add exclusions, or adjust the file size cap.
- GPT-4 / Claude 3: ~4 characters per token (English + code average)
- Estimate shown in UI before bundle completes
- Estimate included in output header for reference
- Actual token count varies ±15% depending on content type
- Code-heavy projects tokenize at ~3.5 chars/token
- Documentation-heavy projects at ~4.5 chars/token
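The character-per-token ratios above translate into simple arithmetic. The sketch below is an assumption-laden illustration, not the estimator CODEXT ships: the profile names (`default`, `code`, `docs`) and both function names are hypothetical, and the ±15% band is applied as a flat tolerance around the point estimate.

```python
# Ratios taken from the list above; the profile keys are hypothetical.
CHARS_PER_TOKEN = {
    "default": 4.0,  # English + code average
    "code": 3.5,     # code-heavy projects
    "docs": 4.5,     # documentation-heavy projects
}


def estimate_tokens(char_count, profile="default"):
    """Point estimate: characters divided by the profile's ratio."""
    return round(char_count / CHARS_PER_TOKEN[profile])


def estimate_range(char_count, profile="default", tolerance=0.15):
    """Low and high bounds using a flat +/-15% tolerance by default."""
    estimate = estimate_tokens(char_count, profile)
    low = round(estimate * (1 - tolerance))
    high = round(estimate * (1 + tolerance))
    return low, high
```

For instance, 136,400 characters at the 4 chars/token average gives `estimate_tokens(136_400) == 34100`, and `estimate_range(4000)` gives `(850, 1150)`.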