Nexal Downloads Center

Access all Nexal language resources, datasets, and tools to accelerate your AI language projects


Nexal Lexicon & Corpora

Complete vocabulary and language datasets for training and development

Complete vocabulary of 500 roots in JSON format, each with its part of speech, gloss, and notes.

Format: JSON | Size: 500 entries

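As a rough illustration of working with the JSON lexicon, the sketch below indexes entries by root and counts parts of speech. It assumes the file is a JSON array of objects with the keys `root`, `pos`, `gloss`, and `notes`; those key names and the two sample entries are placeholders, not taken from the actual download, so adjust them to match the real file.

```python
import json
from collections import Counter

# Made-up placeholder entries (not real Nexal roots); the key names are assumed.
SAMPLE_LEXICON_JSON = """
[
  {"root": "ka", "pos": "noun", "gloss": "water", "notes": ""},
  {"root": "mi", "pos": "verb", "gloss": "see", "notes": "placeholder note"}
]
"""


def load_lexicon(text):
    """Parse lexicon JSON text and index the entries by root."""
    entries = json.loads(text)
    return {entry["root"]: entry for entry in entries}


def pos_counts(lexicon):
    """Count how many roots carry each part-of-speech label."""
    return Counter(entry["pos"] for entry in lexicon.values())


lexicon = load_lexicon(SAMPLE_LEXICON_JSON)
print(lexicon["ka"]["gloss"])
print(dict(pos_counts(lexicon)))
```

With the real file, the same functions could be fed the result of reading the downloaded JSON from disk instead of the inline sample.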

Starter lexicon with 500 roots in CSV format for easy integration with data tools.

Format: CSV | Size: 500 entries

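The CSV version can be read with the standard library alone. This sketch assumes a header row named after the lexicon columns (Root, Part of Speech, Gloss, Notes); the exact header spelling and the sample rows are assumptions for illustration.

```python
import csv
import io

# Placeholder rows, not real Nexal data; the header names are assumed.
SAMPLE_CSV = (
    "Root,Part of Speech,Gloss,Notes\n"
    "ka,noun,water,\n"
    "mi,verb,see,placeholder note\n"
)


def read_lexicon_csv(handle):
    """Read lexicon rows from an open text handle into plain dicts."""
    reader = csv.DictReader(handle)
    return [
        {
            "root": row["Root"],
            "pos": row["Part of Speech"],
            "gloss": row["Gloss"],
            "notes": row["Notes"],
        }
        for row in reader
    ]


rows = read_lexicon_csv(io.StringIO(SAMPLE_CSV))
verbs = [row["root"] for row in rows if row["pos"] == "verb"]
print(verbs)
```

For a file on disk, pass a handle opened with `newline=""` and an explicit encoding, as the `csv` module documentation recommends.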

Space-tokenized corpus of root tokens, gloss tokens, and POS tags for training subword models.

Format: TXT | Size: 500 lines

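The page does not spell out the exact line layout of the space-tokenized corpus. The sketch below assumes one entry per line, with the root first, the POS tag last, and the gloss tokens in between; both that ordering and the sample lines are hypothetical and should be checked against the downloaded file.

```python
def parse_corpus_line(line):
    """Split one corpus line into root, gloss tokens, and POS tag.

    Assumed layout: <root> <gloss token>... <POS tag>
    """
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError(f"expected at least 3 tokens, got {len(tokens)}: {line!r}")
    return {"root": tokens[0], "gloss": tokens[1:-1], "pos": tokens[-1]}


# Placeholder lines for illustration only.
sample_lines = ["ka water NOUN", "mi look at VERB"]
parsed = [parse_corpus_line(line) for line in sample_lines]
print(parsed[1]["gloss"])
```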

Roots-only corpus with one root per line for vocabulary-focused training.

Format: TXT | Size: 500 roots


Lexicon columns: Root, Part of Speech, Gloss, Notes

Ready-to-run scripts for building tokenizers with popular NLP frameworks

Python script to train a Hugging Face-compatible BPE tokenizer from your corpus.

Format: Python | Dependencies: tokenizers, transformers

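The downloadable script itself is not reproduced on this page. As a minimal sketch of what BPE training with the `tokenizers` library can look like, the function below trains on an iterable of corpus lines. The vocabulary size, the special tokens, and the whitespace pre-tokenizer are illustrative choices, not settings confirmed by the Nexal script.

```python
def clean_lines(lines):
    """Strip whitespace and drop empty lines before training."""
    return [line.strip() for line in lines if line.strip()]


def train_bpe(lines, vocab_size=1000):
    """Train a whitespace-pre-tokenized BPE tokenizer on an iterable of lines."""
    # Imported here so clean_lines() stays usable without the dependency.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    # Assumed settings: tune vocab_size and special tokens for your model.
    trainer = BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]", "[PAD]"])
    tokenizer.train_from_iterator(clean_lines(lines), trainer=trainer)
    return tokenizer
```

One way to use it: open the corpus file, pass the handle to `train_bpe`, and call `.save("nexal_bpe.json")` on the result (the file name is hypothetical). The saved JSON can then be loaded in `transformers` through `PreTrainedTokenizerFast(tokenizer_file=...)`.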

Python script to train SentencePiece models (BPE/unigram/word/char) for Nexal.

Format: Python | Dependencies: sentencepiece

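For orientation, here is a small sketch of SentencePiece training that covers the four model types named above. The function names, the default vocabulary size, and any file paths are assumptions for illustration and are not taken from the downloadable script.

```python
MODEL_TYPES = ("unigram", "bpe", "word", "char")


def build_train_args(corpus_path, model_prefix, vocab_size, model_type="bpe"):
    """Validate the model type and assemble SentencePiece trainer arguments."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    return {
        "input": corpus_path,
        "model_prefix": model_prefix,
        "vocab_size": vocab_size,
        "model_type": model_type,
    }


def train_sentencepiece(corpus_path, model_prefix, vocab_size=400, model_type="bpe"):
    """Train a model; SentencePiece writes <model_prefix>.model and .vocab."""
    # Imported here so build_train_args() works without the dependency.
    import sentencepiece as spm

    args = build_train_args(corpus_path, model_prefix, vocab_size, model_type)
    spm.SentencePieceTrainer.train(**args)
    return model_prefix + ".model"
```

With a corpus of only 500 lines, the vocabulary size likely needs to stay small; SentencePiece can refuse to train when the requested size is larger than the data supports.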
