WocconWaker

Interactive learning and revitalization for a dormant Indigenous language

Role. Creator — end-to-end design and engineering: interaction and content design, the scholarship-to-database pipeline, and the conversational learning experience. Built in partnership with community language workers, who review and approve every entry before it joins the database.

Creator & Engineer · 2025–present · Live on Facebook Messenger
Reviving the Woccon language, historically spoken by the Waccamaw people.

WocconWaker control panel showing the Woccon dictionary on desktop alongside the Messenger language assistant on mobile

A language without a teacher

Woccon was historically spoken by the Waccamaw people. Today it is dormant — there are no fluent speakers to learn from, only the scholarship that survives. For someone who wants to learn it, that scholarship is the entire teacher: scattered transcriptions, comparative notes, and reconstructions written for linguists, not learners.

Learning a dormant, low-resource language purely from that material is, for most people, too confusing to attempt. The knowledge exists, but it isn’t accessible. WocconWaker was built to close that gap — to turn raw scholarship into something a person can actually ask questions of.

Why not just train an AI?

The obvious move in 2025 is to point a large language model at the texts and let it answer. For a low-resource language, that fails in the most damaging way possible: there simply aren’t enough data samples to train or ground a model reliably, so it fills the gaps by inventing words, rules, and pronunciations that were never attested. For a community working to revitalize its language, a confident hallucination isn’t a minor bug — it’s a corruption of the record.

So WocconWaker inverts the usual architecture. Instead of trusting the model to know the language, it builds a deterministic database of every known grammar and pronunciation rule alongside a complete vocabulary, and the AI agent is only ever allowed to answer from that database. The result is interactive without being speculative: every answer is traceable to a source.

The agent doesn’t guess at Woccon. It reads from a record that community language workers have verified — so interactivity never comes at the cost of accuracy.
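The answer-only-from-the-record rule can be illustrated with a minimal sketch. It is not the production code: the `VerifiedEntry` type, the `answer` function, and the placeholder record are assumptions made for illustration, and no real Woccon data appears here. The point is the shape of the behavior: a lookup either returns a verified form with its citation or refuses outright.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class VerifiedEntry:
    """Hypothetical shape of one community-approved record."""
    gloss: str      # English meaning the learner asks about
    form: str       # Woccon form (placeholder text below, not real data)
    citation: str   # source that attests the form


def answer(gloss: str, verified: Dict[str, VerifiedEntry]) -> str:
    """Answer only from verified records; refuse rather than guess."""
    entry: Optional[VerifiedEntry] = verified.get(gloss.strip().lower())
    if entry is None:
        # No fallback to the model's own knowledge: absence is reported as absence.
        return f"No attested Woccon entry for '{gloss}' is in the verified record."
    return f"{entry.gloss}: {entry.form} (source: {entry.citation})"


# Placeholder record only; real entries would come from the reviewed database.
verified = {
    "example gloss": VerifiedEntry("example gloss", "<verified form>", "<source citation>"),
}

print(answer("Example Gloss", verified))
print(answer("something unattested", verified))
```

The refusal branch is the design choice that matters: an unattested question produces a plain "not in the record" reply instead of a plausible-sounding invention.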

From scholarship to a living database

WocconWaker dictionary: base words with linked variants, modern orthography, IPA pronunciation, and citation counts

The dictionary — base words, linked variants, and citations.

Vocabulary, corrected and modernized

Claude Opus parses the source scholarship to extract vocabulary, which is then corrected for transcription errors, given a modern orthography, and paired with a pronunciation guide. Each base word carries its linked variants from other sources and the citations that attest it, so a learner sees not just the word but where it comes from and how well attested it is.
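One way to picture a dictionary entry as described above is a base word that owns its variants and citations. The sketch below is an assumption about structure, not the actual schema: the class names, field names, and the `citation_count` helper are hypothetical, and the values are placeholders rather than Woccon data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variant:
    """A spelling of the word as it appears in one source (hypothetical shape)."""
    spelling: str
    citation: str


@dataclass
class BaseWord:
    """A base word with modern orthography, IPA, citations, and linked variants."""
    gloss: str
    modern_orthography: str
    ipa: str
    citations: List[str] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)

    def citation_count(self) -> int:
        """Count distinct sources attesting the base word or any of its variants."""
        sources = set(self.citations) | {v.citation for v in self.variants}
        return len(sources)


# Placeholder values only.
word = BaseWord(
    gloss="<gloss>",
    modern_orthography="<modern spelling>",
    ipa="<ipa>",
    citations=["Source A"],
    variants=[Variant("<variant 1>", "Source B"), Variant("<variant 2>", "Source A")],
)
print(word.citation_count())  # 2 distinct sources: Source A and Source B
```

Keeping variants attached to their base word is what lets the interface show a learner both the modernized form and every spelling the sources actually record.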

WocconWaker language rules organized by grammar area, part of speech, and construction, each tied to a source citation

Grammar and pronunciation rules, organized and sourced.

Rules extracted, then reviewed by community

The same pipeline extracts grammar and pronunciation rules, classified by grammar area, part of speech, and construction. Nothing is trusted automatically: every new entry enters a pending-review queue where community language workers approve or reject it before it’s committed to the database. The control panel — dictionary, rules, pending review, commit, and audit log — is the workspace where that human verification happens.
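The review flow described above behaves like a small state machine: pending, then approved or rejected, then committed, with each step written to an audit log. The following is a rough sketch under that reading. The `Status`, `Proposal`, and `ReviewQueue` names are hypothetical, and an in-memory list stands in for the real database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    COMMITTED = "committed"


@dataclass
class Proposal:
    """An extracted entry or rule waiting for human review."""
    payload: dict
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)
    database: List[dict] = field(default_factory=list)  # stand-in for the real store

    def _log(self, reviewer: str, action: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, reviewer, action))

    def review(self, proposal: Proposal, reviewer: str, approve: bool) -> None:
        """A language worker approves or rejects a pending proposal."""
        if proposal.status is not Status.PENDING:
            raise ValueError("only pending proposals can be reviewed")
        proposal.status = Status.APPROVED if approve else Status.REJECTED
        self._log(reviewer, proposal.status.value)

    def commit(self, proposal: Proposal, reviewer: str) -> None:
        """Only approved proposals ever reach the database."""
        if proposal.status is not Status.APPROVED:
            raise ValueError("only approved proposals can be committed")
        self.database.append(proposal.payload)
        proposal.status = Status.COMMITTED
        self._log(reviewer, "committed")
```

Because `commit` rejects anything that is not approved, there is no code path by which model output reaches the database without a reviewer's decision on record.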

The most comprehensive Woccon resource available

A living Woccon dictionary already exists through the Living Dictionaries project, but it stops at raw word lists. WocconWaker goes further on every axis that matters to a learner: it corrects transcription errors, adds modern orthographies and pronunciation guides, and — uniquely — includes the grammar and pronunciation rules extracted from scholarship. Paired with its complete vocabulary, that makes it the most comprehensive Woccon resource available today.

197 base words
206 linked variants
403 total entries
Grammar & pronunciation rules
Modern orthographies
Community-reviewed

Learning by conversation

The learner-facing side lives in Facebook Messenger, where the agent answers questions and runs interactive lessons drawn entirely from the verified database.

Because the database holds vocabulary, grammar, and pronunciation, the agent can do more than look words up: it can teach. Ask how to say something and it answers with the attested form and its source. Start a lesson and it generates a vocabulary or grammar quiz on the fly, tracking your score as you go.
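A vocabulary quiz built on the fly from verified entries might look roughly like this. It is a sketch, not the actual lesson engine: `make_question` and `QuizSession` are hypothetical names, the multiple-choice format is an assumption, and the entries are placeholders rather than Woccon words.

```python
import random
from typing import Dict, List, Tuple


def make_question(
    entries: Dict[str, str], rng: random.Random, n_choices: int = 3
) -> Tuple[str, List[str], str]:
    """Build one multiple-choice question from verified gloss -> form pairs."""
    gloss = rng.choice(sorted(entries))
    correct = entries[gloss]
    # Distractors are other verified forms, so every choice shown is attested.
    distractors = sorted({form for form in entries.values() if form != correct})
    picked = rng.sample(distractors, k=min(n_choices - 1, len(distractors)))
    choices = picked + [correct]
    rng.shuffle(choices)
    return gloss, choices, correct


class QuizSession:
    """Tracks a learner's running score across questions."""

    def __init__(self) -> None:
        self.correct = 0
        self.asked = 0

    def grade(self, given: str, expected: str) -> bool:
        self.asked += 1
        is_right = given == expected
        self.correct += int(is_right)
        return is_right

    def score(self) -> str:
        return f"{self.correct}/{self.asked}"


# Placeholder entries only.
entries = {"gloss a": "<form a>", "gloss b": "<form b>", "gloss c": "<form c>"}
gloss, choices, correct = make_question(entries, random.Random(0))
session = QuizSession()
session.grade(correct, correct)
print(gloss, choices, session.score())
```

Drawing both the answer and the wrong choices from the verified set means a quiz never shows a learner an invented form, even as a distractor.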

A hub for community language workers

WocconWaker is as much an instrument for the people doing the revitalization work as it is a tool for learners. It gives community language workers a single place to centralize and organize their research, record ongoing reconstruction work, and build a unified database of their language’s rules — rather than scattering that knowledge across papers, notes, and spreadsheets.

That unified record is what makes everything else possible. Because the grammar, pronunciation, and vocabulary all live in one structured, verified database, interactive lessons can be generated directly from it, as the vocabulary and grammar quizzes already demonstrate. The work of organizing the research and the work of teaching it become the same effort.

WocconWaker is developed in consultation with the Waccamaw Indian People of South Carolina and an associated South Carolina Waccamaw working group — of which I am a member — collaborating with Siouan linguist Corey Roberts to revitalize the language.

Built with

The Woccon agent runs Llama 3-8B with retrieval-augmented generation (RAG) over the verified database, while Claude Opus parses the scholarship. The full system was designed and built in-house. It runs on Azure today, with a move to local hosting underway on a Lenovo P8 (48 GB VRAM) secured by UIC.
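The retrieval step of a RAG setup like the one above can be sketched in a few lines. This is an illustration under loud assumptions: simple word overlap stands in for whatever retrieval method the real system uses, `retrieve` and `build_prompt` are hypothetical names, the records are placeholders, and no model is actually called.

```python
import re
from typing import List, Set, Tuple

Record = Tuple[str, str]  # (text of a verified entry or rule, citation)


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, records: List[Record], k: int = 2) -> List[Record]:
    """Rank records by word overlap with the question; drop non-matches."""
    q = tokens(question)
    scored = [(len(q & tokens(text)), (text, cite)) for text, cite in records]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [record for _, record in ranked[:k]]


def build_prompt(question: str, context: List[Record]) -> str:
    """Assemble a prompt that confines the model to the retrieved records."""
    lines = [f"[{i}] {text} (source: {cite})" for i, (text, cite) in enumerate(context, 1)]
    return (
        "Answer using ONLY the numbered records below. "
        "If they do not contain the answer, say so.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


# Placeholder records only.
records = [
    ("Placeholder text for a rule about plural nouns", "Source A"),
    ("Placeholder text for a vocabulary entry", "Source B"),
]
hits = retrieve("How are plural nouns formed?", records)
print(build_prompt("How are plural nouns formed?", hits))
```

If `retrieve` returns nothing, a caller can skip the model entirely and reply that the record has no answer, which keeps the no-guessing rule intact at the retrieval layer.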

Llama 3-8B + RAG
Claude Opus
Tailwind CSS
Messenger Platform
Flask + Python
Postgres
Azure

What’s next

With a verified Woccon database in place, the next step reaches backward in time. WocconWaker will extend its knowledge into Catawba and Proto-Siouan — the related and ancestral languages that scholarship draws on to reconstruct Woccon. Holding those rule sets in the same deterministic, community-reviewed structure turns WocconWaker from a learning resource into an active instrument for language reconstruction itself.