Access all Nexal language resources, datasets, and tools to accelerate your AI language projects.
Complete vocabulary and language datasets for training and development
Complete vocabulary with 500 roots, parts of speech, glosses, and notes in JSON format.
Starter lexicon with 500 roots in CSV format for easy integration with data tools.
Space-tokenized corpus with root + gloss tokens + POS tags for training subword models.
| Root | Part of Speech | Gloss | Notes |
|---|---|---|---|
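As a rough illustration of working with the lexicon, the sketch below parses CSV text into records and indexes them by root. It assumes the CSV header matches the column names shown above (`Root`, `Part of Speech`, `Gloss`, `Notes`); the actual file layout may differ. The sample rows are placeholders, not real Nexal entries, and `load_lexicon_csv` and `index_by_root` are hypothetical helper names.

```python
import csv
import io
import json

# Placeholder rows only: these are NOT real Nexal entries.
# The header is an assumption based on the column names listed above.
SAMPLE_CSV = """Root,Part of Speech,Gloss,Notes
aaa,noun,placeholder-gloss-1,
bbb,verb,placeholder-gloss-2,example note
"""


def load_lexicon_csv(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by_root(entries):
    """Map each root to its entry, raising on duplicate roots."""
    index = {}
    for entry in entries:
        root = entry["Root"]
        if root in index:
            raise ValueError(f"duplicate root: {root}")
        index[root] = entry
    return index


entries = load_lexicon_csv(SAMPLE_CSV)
lexicon = index_by_root(entries)

# The same records can be serialized to JSON if that format is preferred.
as_json = json.dumps(entries, indent=2)
```

For a file on disk, the same `csv.DictReader` call can be applied to an open file handle instead of an in-memory string.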
Ready-to-run scripts for building tokenizers with popular NLP frameworks
Python script to train a Hugging Face-compatible BPE tokenizer from your corpus.
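To make clear what BPE training does with a space-tokenized corpus, here is a minimal pure-Python sketch of the merge-learning loop. It is not the script described above and does not use the Hugging Face libraries; `learn_bpe_merges` is a hypothetical name, and the corpus lines are placeholders rather than Nexal text.

```python
from collections import Counter


def learn_bpe_merges(lines, num_merges):
    """Learn up to num_merges BPE merges from space-tokenized lines.

    Returns (merges, splits): the ordered list of merged symbol pairs and
    the final segmentation of each distinct word.
    """
    # Count how often each whitespace-separated word occurs.
    word_freqs = Counter(word for line in lines for word in line.split())
    # Start with every word split into single characters.
    splits = {word: tuple(word) for word in word_freqs}
    merges = []

    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol

        # Pick the most frequent pair; ties are broken lexicographically.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)

        # Apply the chosen merge to every word's segmentation.
        for word in list(splits):
            symbols = splits[word]
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            splits[word] = tuple(merged)

    return merges, splits


# Placeholder corpus lines, not real Nexal data.
merges, splits = learn_bpe_merges(["abab abab", "ab"], num_merges=5)
```

A production tokenizer adds pieces this sketch omits, such as special tokens, a vocabulary size limit, and serialization to a tokenizer file.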