Show HN: Local note engine uses LLM to organize notes into a knowledge graph

What is NoteCast

NoteCast is a local engine that enriches raw notes and organizes them into an evolving knowledge graph.

Each note is automatically summarized, keyword-extracted, and embedded before it reaches the organization stage. From there, a three-stage LLM pipeline assigns it to themes, splits those themes into subtopics as they grow, and rewires connections across the graph as patterns emerge.

Why I'm Building It

I built it because I like to take notes, randomly and for different purposes, but I dislike keeping track of them or doing anything beyond writing the note itself, such as categorizing or organizing it. Right now the code is usable but not stable: it may have a lot of bugs and code smells, and I am working on them. Many features are already planned that will bring this repo closer to the NoteCast I have on paper.

I appreciate feedback, and feel free to use it any way you'd like.

Getting Started

1. Install dependencies

bun link registers the notecast command globally. After this, notecast <command> works from any directory.

The installed dependencies include keyword-extractor, the TypeScript fallback for topic extraction. Python/YAKE is attempted first, but the TS fallback kicks in automatically, so no manual setup is required.

2. Configure an LLM provider

Save credentials first, then explicitly set the default provider — the system never auto-picks.

# Save credentials
notecast login openai        # or: anthropic, gemini, deepseek
notecast codex-login         # OAuth via ChatGPT Pro (writes token to ~/.codex/auth.json)

# Set which provider to use (required)
notecast config set defaultProvider openai   # or: anthropic, gemini, deepseek, codex, ollama

Codex credentials are machine-scoped, not project-scoped. codex-login writes a token to ~/.codex/auth.json on the local machine. Any project on the same machine picks it up automatically — no per-project setup needed. On CI or remote servers, you need to run codex login on that machine first.

Ollama needs no credentials — just set it as default directly:

notecast config set defaultProvider ollama

Note: OpenAI is currently the most mature provider — best tested and most reliable across all pipeline stages.

Restart required: activeProvider is read once at startup. Changing defaultProvider via notecast config set takes effect only after restarting the server or CLI process.

defaultProvider applies to all pipeline steps. Use llmConfig to assign different providers per step — see Per-stage LLM config below.

3. Set base themes

notecast config set baseThemes '[{"name": "Tech"}, {"name": "Personal"}, {"name": "Work"}]'

At least one base theme must exist before you add notes. A good initial set of base themes produces better results.
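Since Classify can only assign to existing themes, it helps to sanity-check the baseThemes value before adding notes. The sketch below is a hypothetical validator (parseBaseThemes is not part of NoteCast); it only assumes the `{ name, description? }[]` shape listed in the Configuration Reference.

```typescript
// Hypothetical validator for the baseThemes config value.
// Shape taken from the Configuration Reference: { name, description? }[].
interface BaseTheme {
  name: string;
  description?: string;
}

function parseBaseThemes(raw: string): BaseTheme[] {
  const value: unknown = JSON.parse(raw);
  if (!Array.isArray(value)) {
    throw new Error("baseThemes must be a JSON array");
  }
  const themes: BaseTheme[] = value.map((item, i) => {
    if (typeof item !== "object" || item === null || typeof (item as BaseTheme).name !== "string") {
      throw new Error(`baseThemes[${i}] needs a string "name"`);
    }
    const { name, description } = item as BaseTheme;
    if (description !== undefined && typeof description !== "string") {
      throw new Error(`baseThemes[${i}].description must be a string`);
    }
    return { name: name.trim(), description };
  });
  // Classify is assign-only, so an empty list leaves it nothing to assign to.
  if (themes.length === 0) {
    throw new Error("define at least one base theme before adding notes");
  }
  return themes;
}

const themes = parseBaseThemes('[{"name": "Tech"}, {"name": "Personal"}, {"name": "Work"}]');
```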

Server mode: notecast start exposes a REST API on port 3000 (configurable via PORT) for remote or programmatic access. The CLI talks directly to the database by default — no server needed for normal use.

Pipeline

Notes move through four statuses. Each transition is handled by a different part of the system.

POST /notes
    │
  pending ──[Stage 1]──► processed ──[Classify]──► scanned ──[Organize]──► organized
                                                                                │
                                                              [Consolidate] ◄──┘
                                                                    │
                                                               organized (refined)
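The status flow in the diagram can be written down as a small transition table. This is an illustrative sketch, not NoteCast's code: the status names come from this README, while the failed → pending step (via retry-failed) and the edit rule (PUT /notes/:id regresses a note to pending) are read from the CLI and API sections and modeled here as assumptions.

```typescript
// Sketch of the note lifecycle. Status names come from the README;
// the transition table itself is an illustration.
type NoteStatus = "pending" | "processed" | "scanned" | "organized" | "failed";

const transitions: Record<NoteStatus, NoteStatus[]> = {
  pending: ["processed", "failed"], // Stage 1 succeeds or fails
  processed: ["scanned"], // Classify commit
  scanned: ["organized"], // Organize commit
  organized: ["organized"], // Consolidate refines in place
  failed: ["pending"], // assumed effect of `notecast retry-failed`
};

function canTransition(from: NoteStatus, to: NoteStatus, viaEdit = false): boolean {
  // Editing a note regresses it to pending from any status (PUT /notes/:id).
  if (viaEdit) return to === "pending";
  return transitions[from].includes(to);
}
```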

Stage 1 — NoteProcessor

Runs immediately and asynchronously when a note is created. No scan needs to be triggered; this step runs in the background for every note.

For each note, it produces:

summary — 2-3 sentence distillation of the content (via LLM)
topics — keywords extracted via YAKE or the TS fallback
contentVector — embedding of title + content + topics, used by the scan pipeline
summaryVector — embedding of title + summary + topics, used for the related-notes graph

If Stage 1 fails (e.g. no LLM provider configured), the note lands in failed. Run notecast retry-failed after fixing the provider.
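To make the difference between the two vectors concrete, here is a hypothetical helper that builds the text each one embeds. The README only names the inputs; the field order and the blank-line separator are assumptions.

```typescript
// Hypothetical helper showing which fields feed each Stage 1 vector.
// Separator and field order are assumptions; the README only names the inputs.
interface ProcessedNote {
  title: string;
  content: string;
  summary: string;
  topics: string[];
}

function embeddingInputs(note: ProcessedNote): { content: string; summary: string } {
  const topics = note.topics.join(", ");
  return {
    // contentVector: title + content + topics (used by the scan pipeline)
    content: [note.title, note.content, topics].join("\n\n"),
    // summaryVector: title + summary + topics (used for the related-notes graph)
    summary: [note.title, note.summary, topics].join("\n\n"),
  };
}
```

Because the summary is short and the topics are shared, summaryVector tends to compare notes by gist, while contentVector keeps the full detail for the scans.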

Classify

Triggered automatically after every N processed notes (default: 10, configurable via classifyEvery).

Groups processed notes by vector similarity before calling the LLM
Assign-only: never creates new themes — only assigns notes to existing ones (base themes + subtopics created by splits)
Each note can be assigned to 1–3 themes when it crosses domains
Advances notes from processed → scanned

This is why base themes must exist before notes arrive — without them, Classify has nothing to assign to.
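"Groups processed notes by vector similarity" can be pictured with a simple greedy clustering pass. This is a stand-in technique, not NoteCast's actual grouping: the cosine threshold (0.8) and the seed-vector strategy are assumptions chosen for illustration. Each resulting group would then go to the LLM in one call.

```typescript
// Illustrative stand-in: greedy single-pass clustering on cosine similarity.
// Threshold and strategy are assumptions, not NoteCast's algorithm.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function groupBySimilarity(vectors: Map<string, number[]>, threshold = 0.8): string[][] {
  const groups: { seed: number[]; ids: string[] }[] = [];
  for (const [id, vec] of vectors) {
    // Join the first group whose seed vector is similar enough, else start a new one.
    const match = groups.find((g) => cosine(g.seed, vec) >= threshold);
    if (match) match.ids.push(id);
    else groups.push({ seed: vec, ids: [id] });
  }
  return groups.map((g) => g.ids);
}
```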

Organize

Triggered after every N Classify commits (default: 2, configurable via organizeAfterClassifies).

Detects sub-clusters inside large themes using vector similarity
Splits overloaded themes into subtopics (the LLM names them)
The split threshold adapts to depth: root themes split more easily than deep ones
After splits, notes that cross domain boundaries are assigned to multiple themes
Advances notes from scanned → organized
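One way to read "the split threshold adapts to depth" is a threshold that grows geometrically with depth. The formula below is invented for illustration (base size 12, growth factor 1.5); the README does not state NoteCast's actual rule.

```typescript
// Hypothetical depth-adaptive split rule. baseSize and growth are invented
// for illustration; they are not NoteCast's real parameters.
function splitThreshold(depth: number, baseSize = 12, growth = 1.5): number {
  // depth 0 = base (root) theme; each level down needs more notes to split.
  return Math.ceil(baseSize * Math.pow(growth, depth));
}

function shouldSplit(noteCount: number, depth: number): boolean {
  return noteCount > splitThreshold(depth);
}

// A 20-note theme splits at the root but not two levels down.
shouldSplit(20, 0); // true  (threshold 12)
shouldSplit(20, 2); // false (threshold 27)
```

Growing the threshold with depth keeps the tree from fragmenting into tiny leaf themes while still letting broad root themes break up early.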

Consolidate

Triggered after every N Organize commits (default: 3, configurable via consolidateAfterOrganizes).

Macro-level restructuring of the entire theme tree
Detects co-occurrence patterns: if ≥50% of a theme's notes also appear in another theme, a parent link is added
Removes empty themes (base themes are protected)
The result is a multi-parent DAG — a theme can have more than one parent when it genuinely belongs to multiple domains
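The ≥50% co-occurrence rule can be sketched as follows. Two points are assumptions: the direction of the link (when most of theme A's notes also sit in theme B, B becomes a parent of A) and the cycle check, added here so the result stays a DAG. The data shapes and function names are hypothetical.

```typescript
// Sketch of the co-occurrence rule. Link direction and cycle check are assumptions.
type ThemeNotes = Map<string, Set<string>>; // themeId -> noteIds
type Parents = Map<string, Set<string>>; // themeId -> parent themeIds

function reachesUp(parents: Parents, from: string, target: string): boolean {
  // Depth-first walk up the parent links from `from`, looking for `target`.
  const stack = [from];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === target) return true;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const p of parents.get(id) ?? []) stack.push(p);
  }
  return false;
}

function addCoOccurrenceParents(notes: ThemeNotes, parents: Parents, ratio = 0.5): void {
  for (const [child, childNotes] of notes) {
    if (childNotes.size === 0) continue;
    for (const [candidate, candidateNotes] of notes) {
      if (candidate === child) continue;
      let shared = 0;
      for (const n of childNotes) if (candidateNotes.has(n)) shared++;
      if (shared / childNotes.size < ratio) continue;
      // Linking child -> candidate would close a cycle if candidate already reaches child.
      if (reachesUp(parents, candidate, child)) continue;
      if (!parents.has(child)) parents.set(child, new Set());
      parents.get(child)!.add(candidate);
    }
  }
}
```

With Cryptography sharing half its notes with Math and half with Software, it ends up with both as parents, while Math and Software (whose notes mostly live elsewhere) get no new parents.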

Theme hierarchy

Themes form a directed acyclic graph (DAG), not a tree. A theme like "Cryptography" can have both "Math" and "Software" as parents. These multi-parent connections emerge organically from Consolidate rather than being assigned by hand.

The hierarchy grows by subdivision: Classify only assigns, Organize and Consolidate split. Base themes are the permanent anchors; everything else is derived from the data.
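Because a theme can have several parents, walking "up" the hierarchy means collecting a set of ancestors rather than following a single path. A minimal sketch with hypothetical sample data based on the Cryptography example:

```typescript
// Minimal sketch: all ancestors of a theme in a multi-parent DAG.
// The parent map is hypothetical sample data.
const parentsOf: Record<string, string[]> = {
  Cryptography: ["Math", "Software"],
  Software: ["Tech"],
  Math: [],
  Tech: [],
};

function ancestors(theme: string): Set<string> {
  const found = new Set<string>();
  const queue = [...(parentsOf[theme] ?? [])];
  while (queue.length > 0) {
    const t = queue.shift()!;
    if (found.has(t)) continue; // a theme can be reached via several paths
    found.add(t);
    queue.push(...(parentsOf[t] ?? []));
  }
  return found;
}
```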

CLI

The CLI is the primary interface. It talks directly to the database unless NOTES_URL is set, in which case it proxies to the HTTP server instead.

# Notes
notecast add <file>                             # Create a note from a file (any text format)
notecast add-batch <dir>                        # Create notes from all files in a directory
notecast update <query>                         # Edit a note's source file and reprocess
notecast delete <query>                         # Delete a note by title search
notecast status                                 # Show pipeline status (counts per status)
notecast retry-failed                           # Re-enqueue notes that failed Stage 1

# Scan pipeline
notecast scan propose <classify|organize|consolidate>   # Generate a proposal
notecast scan commit [classify|organize|consolidate]    # Apply proposal (auto-detects if type omitted)

# Themes
notecast theme list
notecast theme add <name> [--parent <name-or-id>] [--desc <text>]
notecast theme update <query> [--name <n>] [--parent <p>] [--desc <d>]
notecast theme merge <source> --into <target>
notecast theme remove <query>

# Manual note ↔ theme assignment
notecast note assign <note-query> --theme <theme-query>
notecast note unassign <note-query> --theme <theme-query>

# Config
notecast config get
notecast config set <key> <value>
notecast config set vaultPath <path>            # Set Obsidian vault output folder

# Providers
notecast providers                              # Show active and available LLM providers
notecast login <provider> [key]                 # Save API key (openai|anthropic|gemini|deepseek)
notecast codex-login                            # OAuth login for ChatGPT Pro (Codex)

# Other
notecast reset [--full]                         # Soft reset (requeue scans) or full reset (delete all notes)
notecast start                                  # Start HTTP server

API

The REST API is available when running notecast start. The CLI is more complete and better tested for real use — the API is provided for remote access and programmatic integrations.

Notes

Method	Endpoint	Description
`POST`	`/notes`	Create a note. Body: `{ title, content }`
`POST`	`/notes/batch`	Create notes in bulk. Body: `[{ title, content }]`
`GET`	`/notes`	List all notes
`GET`	`/notes/:id`	Get note by ID
`PUT`	`/notes/:id`	Edit note (regresses status to `pending`)
`DELETE`	`/notes/:id`	Delete note (bidirectional cleanup)
`POST`	`/notes/retry-failed`	Re-enqueue failed notes
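A minimal client sketch for POST /notes, assuming `notecast start` is running on the default port and that the body is sent as JSON. The helper names are hypothetical, and the response shape is not documented here, so it is left as `unknown`.

```typescript
// Hypothetical client for POST /notes. Body shape { title, content } is from the table above.
interface NoteInput {
  title: string;
  content: string;
}

interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildCreateNoteRequest(baseUrl: string, note: NoteInput): [string, PostInit] {
  return [
    new URL("/notes", baseUrl).toString(),
    { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(note) },
  ];
}

async function createNote(note: NoteInput, baseUrl = "http://localhost:3000"): Promise<unknown> {
  const [url, init] = buildCreateNoteRequest(baseUrl, note);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`POST /notes failed: ${res.status}`);
  return res.json(); // response shape is not documented in this README
}
```

Stage 1 runs asynchronously after creation, so the note will still be pending right after this call returns.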

Themes

Method	Endpoint	Description
`GET`	`/themes`	List all themes
`POST`	`/themes`	Create a theme
`PUT`	`/themes/:id`	Update theme (name, parent, description)
`DELETE`	`/themes/:id`	Delete theme
`POST`	`/themes/merge`	Merge one theme into another
`POST`	`/themes/:id/notes/:noteId`	Assign note to theme
`DELETE`	`/themes/:id/notes/:noteId`	Remove note from theme

Scan pipeline

Method	Endpoint	Description
`GET`	`/scan/status`	Counts per status + pending proposals
`POST`	`/scan/classify`	Generate Classify proposal
`POST`	`/scan/classify/commit`	Apply Classify proposal
`POST`	`/scan/organize`	Generate Organize proposal
`POST`	`/scan/organize/commit`	Apply Organize proposal
`POST`	`/scan/consolidate`	Generate Consolidate proposal
`POST`	`/scan/consolidate/commit`	Apply Consolidate proposal
`POST`	`/scan/graph`	Rebuild similarity graph (`relatedNoteIds`)
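Each scan is a two-step propose-then-commit flow, mirroring `notecast scan propose` / `notecast scan commit`. The sketch below uses only the endpoints listed above; the helper names are hypothetical, and since request and proposal bodies are not documented here, it sends empty POSTs.

```typescript
// Hypothetical propose-then-commit helper over HTTP.
type ScanType = "classify" | "organize" | "consolidate";

function scanEndpoints(type: ScanType): { propose: string; commit: string } {
  return { propose: `/scan/${type}`, commit: `/scan/${type}/commit` };
}

async function runScan(type: ScanType, baseUrl = "http://localhost:3000"): Promise<unknown> {
  const { propose, commit } = scanEndpoints(type);
  const proposal = await fetch(new URL(propose, baseUrl), { method: "POST" });
  if (!proposal.ok) throw new Error(`${propose} failed: ${proposal.status}`);
  // In practice you would review the proposal here before committing it.
  const applied = await fetch(new URL(commit, baseUrl), { method: "POST" });
  if (!applied.ok) throw new Error(`${commit} failed: ${applied.status}`);
  return applied.json();
}
```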

Config & system

Method	Endpoint	Description
`GET`	`/config`	Read user config
`PUT`	`/config`	Update config
`POST`	`/reset`	Soft or full reset
`GET`	`/providers`	Active and available LLM providers
`GET`	`/calibration`	Pipeline calibration metrics
`GET`	`/health`	Health check

Per-stage LLM config

By default, defaultProvider applies to all pipeline steps. You can override provider, model, temperature, and token limit independently per step via llmConfig.

notecast config set llmConfig '{
  "summary":     { "provider": "ollama", "model": "llama3.2:3b" },
  "classify":    { "provider": "openai",  "model": "gpt-4o-mini", "temperature": 0.2 },
  "organize":    { "provider": "openai",  "model": "gpt-4o" },
  "consolidate": { "provider": "openai",  "model": "gpt-4o" },
  "embedding":   { "provider": "ollama",  "model": "nomic-embed-text" }
}'

Each key is optional — omit a step to inherit defaultProvider. The five configurable steps are:

Step	What it does
`summary`	Generates the 2-3 sentence note summary (Stage 1)
`classify`	Assigns notes to themes
`organize`	Splits themes into subtopics
`consolidate`	Macro restructuring of the theme tree
`embedding`	Produces vectors for all notes (Stage 1)

embedding only accepts provider and model (no temperature or token limit).
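How a step's settings fall back to defaultProvider can be sketched like this. The field names come from this README; the merge logic is an assumption, and the token-limit field is left out because its key name is not documented here.

```typescript
// Hypothetical per-step resolution against defaultProvider.
type Provider = "openai" | "anthropic" | "gemini" | "deepseek" | "codex" | "ollama";
type Step = "summary" | "classify" | "organize" | "consolidate" | "embedding";

interface StepConfig {
  provider?: Provider;
  model?: string;
  temperature?: number; // ignored for embedding
}

interface Config {
  defaultProvider?: Provider;
  llmConfig?: Partial<Record<Step, StepConfig>>;
}

function resolveStep(config: Config, step: Step): StepConfig & { provider: Provider } {
  const override: StepConfig = config.llmConfig?.[step] ?? {};
  const provider = override.provider ?? config.defaultProvider;
  // The system never auto-picks a provider: fail loudly if none is set.
  if (!provider) throw new Error(`no provider configured for step "${step}"`);
  if (step === "embedding") {
    // embedding accepts only provider and model
    return { provider, model: override.model };
  }
  return { ...override, provider };
}
```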

Override individual steps (e.g. local embeddings + cloud for scans):

notecast config set llmConfig '{"embedding":{"provider":"ollama","model":"nomic-embed-text"}}'

Optional Dependencies

Python + YAKE (topic extraction)

The service extracts topics from each note during Stage 1. It tries Python/YAKE first and silently falls back to the bundled keyword-extractor (TypeScript) if Python is unavailable. No configuration needed — the fallback is automatic.

YAKE tends to produce cleaner, more linguistically aware keywords. The TS fallback works fine for most use cases.

Ollama (local LLM inference)

Ollama lets you run pipeline stages locally. Install from ollama.com, then configure it as a provider via notecast config set defaultProvider ollama or per-step via llmConfig.

Vault Sync

When vaultPath is set, the service syncs the full state of the database to a folder on the filesystem after every scan commit, note creation, or edit. The output is Obsidian-compatible but works as plain markdown in any editor.

Setup

notecast config set vaultPath ~/my-vault
# or an absolute path
notecast config set vaultPath /Users/you/Documents/my-vault

Use ~ or an absolute path. Relative paths (e.g. ./vault) are resolved against the working directory at sync time, which can vary — avoid them.
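The reason relative paths are fragile: the same string resolves to different folders depending on where the process runs. The helper below is hypothetical (NoteCast's own path handling is not shown here, and in a shell an unquoted `~` is usually expanded before the command even runs); it assumes POSIX paths.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join, resolve } from "node:path";

// Hypothetical helper mirroring the vaultPath rules above (POSIX paths assumed).
function resolveVaultPath(input: string, cwd: string = process.cwd()): string {
  if (input === "~") return homedir();
  if (input.startsWith("~/")) return join(homedir(), input.slice(2));
  if (isAbsolute(input)) return input;
  // Relative paths depend on whichever directory the process runs from.
  return resolve(cwd, input);
}
```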

What gets written

<vaultPath>/
├── Themes/
│   ├── Tech.md          # one file per theme
│   ├── Software.md
│   └── ...
├── Source/
│   └── ...              # source files (added via `notecast add <file>`)
├── Proposals/
│   └── classify-proposal.json   # saved here when vaultPath is set; otherwise saved as ./proposal-<type>.json in the working directory
└── _Dashboard.md        # pipeline status + theme graph metrics

Theme files contain parent/child wikilinks, a list of assigned notes, and frontmatter tags:

Tag	Meaning
`theme`	all themes
`root`	no parents (top of the DAG)
`leaf`	no children
`multiparent`	belongs to 2+ parent themes
`large`	more than 10 notes
`empty`	no notes assigned
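The tags follow directly from a theme's position in the graph and its note count. A sketch using the thresholds from the table; the ThemeNode shape is hypothetical.

```typescript
// Sketch: deriving frontmatter tags. Thresholds from the table; ThemeNode is hypothetical.
interface ThemeNode {
  parentIds: string[];
  childIds: string[];
  noteCount: number;
}

function themeTags(t: ThemeNode): string[] {
  const tags = ["theme"];
  if (t.parentIds.length === 0) tags.push("root");
  if (t.childIds.length === 0) tags.push("leaf");
  if (t.parentIds.length >= 2) tags.push("multiparent");
  if (t.noteCount > 10) tags.push("large");
  if (t.noteCount === 0) tags.push("empty");
  return tags;
}
```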

Source files — files added via notecast add <file> are moved to Source/ (the original path is no longer available after the command). The file content is never modified by the pipeline. If vaultLinks is enabled, a related: YAML frontmatter block is injected with wikilinks to semantically similar notes after the graph is built — only that block is touched, the rest of the file is untouched. vaultLinks is false by default; enable it with notecast config set vaultLinks true. Non-markdown files are moved and stored correctly, but Obsidian wikilinks and graph rendering only work with .md files.
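"Only that block is touched" can be illustrated with a small frontmatter rewrite. This is a hypothetical sketch: the exact YAML layout NoteCast writes is an assumption, it expects `\n` line endings, and it replaces any existing `related:` list while keeping every other line.

```typescript
// Hypothetical sketch: inject or replace a `related:` list in YAML frontmatter.
function setRelated(markdown: string, titles: string[]): string {
  const block = ["related:", ...titles.map((t) => `  - "[[${t}]]"`)];
  if (!markdown.startsWith("---\n")) {
    // No frontmatter yet: prepend one containing only the related block.
    return ["---", ...block, "---", ""].join("\n") + markdown;
  }
  const end = markdown.indexOf("\n---\n", 4);
  if (end === -1) return markdown; // unrecognized frontmatter: leave the file untouched
  const lines = markdown.slice(4, end).split("\n");
  const kept: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith("related:")) {
      // Skip the old key and its indented list items.
      while (i + 1 < lines.length && lines[i + 1].startsWith("  - ")) i++;
      continue;
    }
    kept.push(lines[i]);
  }
  // Body after the closing `---` is appended unchanged.
  return ["---", ...kept, ...block, "---"].join("\n") + markdown.slice(end + 4);
}
```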

_Dashboard.md shows pipeline counts, theme graph depth stats, note coverage, health warnings (empty/large themes), and a delta of themes added or removed in the last sync.

Obsidian setup

The syncer writes .obsidian/graph.json with default color groups on first run (themes in blue, large themes in red, orphans in orange). It never overwrites the file if it already exists, so your customizations are preserved.

Configuration Reference

All config is stored in the database and managed via notecast config set <key> <value> or PUT /config.

Providers

Field	Type	Default	Description
`defaultProvider`	`openai | anthropic | gemini | deepseek | codex | ollama`	—	Provider used for all pipeline steps unless overridden
`llmConfig`	object	—	Per-step overrides — see Per-stage LLM config

Pipeline triggers

Field	Type	Default	Description
`pipelineConfig.classifyEvery`	number	`10`	Number of `processed` notes that trigger Classify
`pipelineConfig.organizeAfterClassifies`	number	`2`	Number of Classify commits that trigger Organize
`pipelineConfig.consolidateAfterOrganizes`	number	`3`	Number of Organize commits that trigger Consolidate
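The three counters form a cascade. With the defaults, 60 processed notes would trigger 6 Classify runs, 3 Organize runs, and 1 Consolidate run. The sketch below is an assumption about the bookkeeping (it treats every triggered run as a commit); only the field names and defaults come from the table.

```typescript
// Sketch of the trigger cascade; the counter handling is an assumption.
interface PipelineConfig {
  classifyEvery: number;
  organizeAfterClassifies: number;
  consolidateAfterOrganizes: number;
}

class TriggerCounter {
  private readonly cfg: PipelineConfig;
  private processed = 0;
  private classifies = 0;
  private organizes = 0;
  readonly runs = { classify: 0, organize: 0, consolidate: 0 };

  constructor(cfg: PipelineConfig) {
    this.cfg = cfg;
  }

  // Call once per note that finishes Stage 1.
  noteProcessed(): void {
    if (++this.processed < this.cfg.classifyEvery) return;
    this.processed = 0;
    this.runs.classify++;
    if (++this.classifies < this.cfg.organizeAfterClassifies) return;
    this.classifies = 0;
    this.runs.organize++;
    if (++this.organizes < this.cfg.consolidateAfterOrganizes) return;
    this.organizes = 0;
    this.runs.consolidate++;
  }
}

const counter = new TriggerCounter({ classifyEvery: 10, organizeAfterClassifies: 2, consolidateAfterOrganizes: 3 });
for (let i = 0; i < 60; i++) counter.noteProcessed();
```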

Themes

Field	Type	Default	Description
`baseThemes`	`{ name, description? }[]`	`[]`	Anchor themes; Classify only assigns to these (and their subtopics)
`themeStyle`	`single-word | short-phrase | descriptive | custom`	`short-phrase`	Naming style for LLM-generated theme names
`themeStyleInstruction`	string	—	Free-form instruction for theme naming, used when `themeStyle` is `custom`

Content & language

Field	Type	Default	Description
`language`	`english | portuguese`	`english`	Language for scan prompts and keyword extraction
`context`	string	—	Free-form text injected into every scan prompt — use it to describe your domain, focus areas, or note-taking style

Output

Field	Type	Default	Description
`vaultPath`	string	—	Filesystem path for Obsidian-compatible vault sync
`vaultLinks`	boolean	`false`	Inject `related:` wikilinks into source files after the graph is built

Environment Variables

Variable	Default	Description
`PORT`	`3000`	HTTP server port (`notecast start`)
`NOTES_DB_PATH`	`./notes.db`	SQLite database file path
`LANCEDB_PATH`	`<db>.lancedb`	LanceDB vector store path (defaults to same location as the SQLite file)
`NOTES_URL`	—	When set, the CLI proxies to this remote server instead of accessing the database directly
`OPENAI_API_KEY`	—	OpenAI API key (alternative to `notecast login openai`)
`ANTHROPIC_API_KEY`	—	Anthropic API key
`GEMINI_API_KEY`	—	Gemini API key
`DEEPSEEK_API_KEY`	—	DeepSeek API key
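A sketch of reading these variables with their documented defaults. One assumption to flag: `<db>.lancedb` is interpreted here as the SQLite path with its `.db` extension swapped for `.lancedb`; the real derivation may differ. The readEnv helper is hypothetical.

```typescript
// Hypothetical reader for the environment variables above.
interface RuntimeEnv {
  port: number;
  dbPath: string;
  lanceDbPath: string;
  remoteUrl?: string; // NOTES_URL: when set, the CLI proxies to this server
}

function readEnv(env: Record<string, string | undefined>): RuntimeEnv {
  const dbPath = env.NOTES_DB_PATH ?? "./notes.db";
  return {
    port: Number(env.PORT ?? 3000),
    dbPath,
    // Assumed interpretation of the `<db>.lancedb` default.
    lanceDbPath: env.LANCEDB_PATH ?? dbPath.replace(/\.db$/, "") + ".lancedb",
    remoteUrl: env.NOTES_URL || undefined, // empty string treated as unset
  };
}

const runtime = readEnv(process.env);
```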

GitHub - AlexWasHeree/NoteCast: Local note engine that uses LLM to build and evolve a knowledge graph