
Config reference: tokenization

  • Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.

  • Env keys when available: many fields have an env-style alias (from TriBridConfig.to_flat_dict()).

  • Tooltip-level guidance: if a matching glossary entry exists, you’ll see deeper tuning notes.


Total parameters: 7

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
|---|---|---|---|---|---|
| `tokenization.estimate_only` | | `bool` | `false` | | If true, use fast approximate token counting. |
| `tokenization.hf_tokenizer_name` | | `str` | `"gpt2"` | | HuggingFace tokenizer name (strategy='huggingface'). |
| `tokenization.lowercase` | | `bool` | `false` | | Lowercase before tokenization. |
| `tokenization.max_tokens_per_chunk_hard` | | `int` | `8192` | ≥ 256, ≤ 65536 | Absolute hard limit on tokens per chunk (safety ceiling). |
| `tokenization.normalize_unicode` | | `bool` | `true` | | Normalize Unicode (NFKC) before tokenization for stability. |
| `tokenization.strategy` | | `Literal["whitespace", "tiktoken", "huggingface"]` | `"tiktoken"` | allowed="whitespace", "tiktoken", "huggingface" | Tokenization strategy used for chunking/budgeting. |
| `tokenization.tiktoken_encoding` | | `str` | `"o200k_base"` | | tiktoken encoding name (strategy='tiktoken'). |
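To illustrate how a few of these fields interact, here is a minimal, stdlib-only sketch of the `"whitespace"` strategy with `lowercase`, `normalize_unicode`, and the `max_tokens_per_chunk_hard` ceiling applied. The function names (`count_tokens`, `within_hard_limit`) are illustrative, not part of the documented API; only the field names and defaults come from the table above.

```python
import unicodedata

def count_tokens(text: str, lowercase: bool = False,
                 normalize_unicode: bool = True) -> int:
    """Approximate the 'whitespace' strategy: one token per
    whitespace-separated word, with optional NFKC normalization
    and lowercasing applied first (mirroring the config defaults)."""
    if normalize_unicode:
        text = unicodedata.normalize("NFKC", text)
    if lowercase:
        text = text.lower()
    return len(text.split())

def within_hard_limit(text: str,
                      max_tokens_per_chunk_hard: int = 8192) -> bool:
    """Check a chunk against the absolute safety ceiling."""
    return count_tokens(text) <= max_tokens_per_chunk_hard

print(count_tokens("Hello world, this is a test"))  # 6 whitespace tokens
```

With `strategy="tiktoken"` or `strategy="huggingface"`, the count would instead come from the encoder named by `tiktoken_encoding` or `hf_tokenizer_name`, so the whitespace split above is only a rough lower-cost stand-in of the kind `estimate_only` hints at.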