# Plan: All-Files Graph — Every File Is a Node

## Context
The publisher (cyber-publish) currently only builds graph nodes from markdown
files in `pages/` and `journals/`. Non-markdown files (`.nu`, `.rs`, `.py`,
`.toml`, `.zip`, etc.) are invisible to the graph. The user wants every file
in the repository to be a first-class graph node — connected, searchable,
ranked by PageRank, and rendered as a page.
## Scope of Changes

5 files modified, 0 new files.
### 1. Scanner: walk entire repo (`src/publish/src/scanner/mod.rs`)

Current behavior: scans only the `pages/`, `journals/`, and `media/` subdirectories, and filters pages by `.md`/`.markdown`/no-extension.
Changes:

- Remove the extension filter from the `pages/` scan — accept ALL files in `pages/`
- Add a fourth scan loop: walk the entire `input_dir` recursively, collecting files that are NOT inside `pages/`, `journals/`, or `media/`
- These files get a new `FileKind::File` variant
- `DiscoveredFiles` gets a new field: `files: Vec<DiscoveredFile>`
- Default exclude patterns updated: add `public/*`, `target/*`, `.DS_Store`, `*.o`, `*.rmeta`, `*.rlib`, `*.dylib`, `*.d`, `*.timestamp`, `*.bin`, `*.lock`, `*.cargo-lock` to skip build artifacts
- Media files in `media/` stay as `FileKind::Media` (still copied to output) — but ALSO get a parallel entry in `files` so they become graph nodes too
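The fourth scan loop can be sketched as below. The `FileKind` and `DiscoveredFile` shapes here are hypothetical mirrors of the scanner's types (the real definitions live in `scanner/mod.rs` and may differ), and the exclude-pattern check is elided:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical mirrors of the scanner's types; the real definitions
// live in src/publish/src/scanner/mod.rs and may differ.
#[derive(Debug)]
pub enum FileKind {
    Media,
    File,
}

#[derive(Debug)]
pub struct DiscoveredFile {
    pub path: PathBuf,
    pub kind: FileKind,
}

/// Fourth scan loop: collect every file under `input_dir` that is NOT
/// inside the top-level pages/, journals/, or media/ directories.
pub fn scan_other_files(input_dir: &Path) -> io::Result<Vec<DiscoveredFile>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(input_dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Skip only the three top-level dirs handled by the existing loops.
            let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
            if matches!(name, "pages" | "journals" | "media") {
                continue;
            }
            walk_all(&path, &mut out)?;
        } else {
            out.push(DiscoveredFile { path, kind: FileKind::File });
        }
    }
    Ok(out)
}

/// Recursively push every file in `dir` (exclude patterns elided here).
fn walk_all(dir: &Path, out: &mut Vec<DiscoveredFile>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            walk_all(&path, out)?;
        } else {
            out.push(DiscoveredFile { path, kind: FileKind::File });
        }
    }
    Ok(())
}
```

Skipping the three names only at the top level matters: a nested directory that happens to be called `pages` should still be walked.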
### 2. Scanner classify: file name helper (`src/publish/src/scanner/classify.rs`)

- Add `file_name_from_path(path, base_dir) -> String` that returns the path relative to `base_dir` WITH extension preserved (unlike `page_name_from_path`, which strips `.md`)
- For files in `pages/` that are NOT markdown: use the filename with extension (e.g. `pages/sw-v2.2.2-macos.zip` → `sw-v2.2.2-macos.zip`)
- For files outside `pages/`: the relative path from `input_dir` (e.g. `nu/analyze.nu` → `nu/analyze.nu`, `Makefile` → `Makefile`)
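A minimal sketch of the helper, using the signature from the plan (the real implementation may handle errors differently):

```rust
use std::path::Path;

/// Sketch of file_name_from_path: the path relative to `base_dir`,
/// extension preserved (page_name_from_path, by contrast, strips .md).
pub fn file_name_from_path(path: &Path, base_dir: &Path) -> String {
    path.strip_prefix(base_dir)
        .unwrap_or(path) // fall back to the full path if not under base_dir
        .to_string_lossy()
        .replace('\\', "/") // normalize Windows separators for stable names
}
```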
### 3. Parser: handle non-markdown files (`src/publish/src/parser/mod.rs`)

- Add a `PageKind::File` enum variant
- Update `parse_all` to iterate `discovered.files` and call a new `parse_non_md_file` function
- `parse_non_md_file(file: &DiscoveredFile) -> Result<ParsedPage>`:
  - Detect text vs binary: try `read_to_string`; if it fails (invalid UTF-8) → binary
  - For text files:
    - Detect the language from the extension (map: `.rs` → rust, `.nu` → nu, `.py` → python, `.toml` → toml, `.yml`/`.yaml` → yaml, `.js` → javascript, `.css` → css, `.json` → json, `.sh` → bash, etc.; no extension → plaintext)
    - `content_md` = a triple-backtick code fence with a language tag wrapping the entire file content
    - Try to extract `[[wikilinks]]` from the raw text (reuse `wikilinks::collect_wikilinks`)
  - For binary files:
    - `content_md` = a file metadata block (extension, size in human-readable format)
  - `PageMeta`:
    - `title` = filename (relative path)
    - `tags` = auto-generated from the extension (e.g. `["nushell"]` for `.nu`, `["rust"]` for `.rs`) plus a directory-based tag (e.g. `["nu"]` for files in the `nu/` dir)
    - `public` = `Some(true)` (all files are public by default in the graph)
    - `kind` = `PageKind::File`
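The language map and the text branch can be sketched as follows (an abridged, illustrative version; binary handling and `PageMeta` construction are elided, and the function names are placeholders):

````rust
use std::path::Path;

/// Extension → fence language tag, per the map above (abridged;
/// extend as needed). No extension → plaintext.
pub fn fence_lang(path: &Path) -> &'static str {
    match path.extension().and_then(|e| e.to_str()) {
        Some("rs") => "rust",
        Some("nu") => "nu",
        Some("py") => "python",
        Some("toml") => "toml",
        Some("yml") | Some("yaml") => "yaml",
        Some("js") => "javascript",
        Some("css") => "css",
        Some("json") => "json",
        Some("sh") => "bash",
        _ => "plaintext",
    }
}

/// Text branch of parse_non_md_file: read_to_string succeeded, so
/// wrap the whole file in a fenced code block with a language tag.
pub fn text_file_to_content_md(path: &Path, raw: &str) -> String {
    format!("```{}\n{}\n```\n", fence_lang(path), raw)
}
````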
### 4. Render: template for file pages (`src/publish/src/render/mod.rs`)

- Add a `PageKind::File` match arm in the template selection — use `"page.html"` (reuse the existing template; the code-fenced content renders fine through comrak)
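The match arm amounts to something like this sketch. The enum here is a hypothetical mirror, and `"journal.html"` is an assumed name for the journal template:

```rust
/// Hypothetical mirror of the parser's PageKind enum.
pub enum PageKind {
    Page,
    Journal,
    File,
}

/// Template selection with the new arm: File pages reuse page.html.
/// "journal.html" is an assumed name for the journal template.
pub fn template_for(kind: &PageKind) -> &'static str {
    match kind {
        PageKind::Page | PageKind::File => "page.html",
        PageKind::Journal => "journal.html",
    }
}
```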
### 5. Config: update default excludes (`src/publish/src/config.rs`)

- Update the `ContentSection::default()` `exclude_patterns` to include: `[".git/*", "logseq/*", "draws/*", "public/*", "target/*", "*.o", "*.rmeta", "*.rlib", "*.dylib", "*.d", "*.timestamp", "*.bin", "*.lock", "*.cargo-lock", ".DS_Store"]`
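To make the intended semantics of these patterns concrete, here is an illustrative matcher covering the three shapes used above: `dir/*` prefixes, `*.ext` suffixes, and exact basenames like `.DS_Store`. The real publisher presumably uses proper glob matching; `is_excluded` is a placeholder name:

```rust
/// Illustrative matcher for the default exclude patterns; not the
/// publisher's actual implementation.
pub fn is_excluded(rel_path: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| {
        if let Some(prefix) = p.strip_suffix("/*") {
            // "target/*" excludes everything under target/
            rel_path.starts_with(&format!("{prefix}/"))
        } else if let Some(suffix) = p.strip_prefix('*') {
            // "*.o" excludes any path ending in .o
            rel_path.ends_with(suffix)
        } else {
            // ".DS_Store" excludes that basename anywhere
            rel_path == *p || rel_path.ends_with(&format!("/{p}"))
        }
    })
}
```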
## Files to Modify

| File | Change |
|---|---|
| `src/publish/src/scanner/mod.rs` | Add `FileKind::File`, new scan loop, expand `pages/` scan |
| `src/publish/src/scanner/classify.rs` | Add `file_name_from_path` |
| `src/publish/src/parser/mod.rs` | Add `PageKind::File`, `parse_non_md_file`, language map |
| `src/publish/src/render/mod.rs` | Add `PageKind::File` template match |
| `src/publish/src/config.rs` | Update default exclude patterns |
## What Does NOT Change

- Graph module (`graph/`) — works with `ParsedPage` regardless of kind
- Output module (`output/`) — writes HTML for all rendered pages uniformly
- Templates — `page.html` renders any HTML content
- PageRank, backlinks, tag index — all work automatically on new nodes
- Media copying — still copies `media/` to output as before
## Verification

- `cd ~/git/cyber && cargo build -p cyber-publish` — compiles
- `cargo test -p cyber-publish` — existing tests pass
- `./target/debug/cyber-publish build .` — builds successfully, prints a higher page count than before (should include `.nu`, `.rs`, `.toml`, etc.)
- Check output: `ls public/nu-analyze.nu/index.html` exists
- Check graph: `cat public/graph-data.json | python3 -c "import sys,json; d=json.load(sys.stdin); print(len(d['nodes']))"` — count is higher than before
- Open `http://localhost:8080/nu-analyze.nu` — shows the code-fenced content of `analyze.nu`