Protocol5 Memory Export For WordPress

Publication Warning

This page is marked noindex and should not be treated as canonical public authority.

Public materials currently frame Protocol5 as the implementation and distribution surface for the .NET side of UAI, while UAIX describes itself as the public standards and publication site for UAI and the current UAI-...

Metadata

Field	Value
Source site	ɩ.com / JustAnIota.com
Source URL	https://justaniota.com/
Canonical AIWikis URL	https://aiwikis.org/justaniota/uai-system/files/raw-system-archives-justaniota-intake-processing-2026-05-04-protocol5-wo-6d032792/
Source reference	`raw/system-archives/justaniota/intake-processing/2026-05-04-protocol5-wordpress-memory-export/agent-file-handoff/Improvement/Protocol5 Memory Export for WordPress PArt 2.md`
File type	`md`
Content category	`memory-file`
Last fetched	`2026-05-15T00:23:56.0837262Z`
Last changed	`2026-05-04T23:19:50.1270945Z`
Content hash	`sha256:6d032792618b60285679191a435953252f54a81b8cc3d717e1ec1e14644cccf4`
Import status	`unchanged`
Raw source layer	`data/sources/justaniota/raw-system-archives-justaniota-intake-processing-2026-05-04-protocol5-wordpress-memory-export-ag-6d032792618b.md`
Normalized source layer	`data/normalized/justaniota/raw-system-archives-justaniota-intake-processing-2026-05-04-protocol5-wordpress-memory-export-ag-6d032792618b.txt`

Current File Content

Structure Preview

Protocol5 Memory Export for WordPress
Recommended architecture
Flat-file schema proposal
WordPress storage and query strategy
WordPress memory behavior and search model
Vector similarity feasibility and fallback strategies
Security, privacy, and hosting risks
Step-by-step implementation roadmap
Validation checks, tests, and pseudocode

Raw Version

This public page shows a bounded preview of a large source file. The complete source remains in the raw and normalized source layers named in metadata, with the SHA-256 hash above for verification.

Source characters: 47942
Preview characters: 11589

# Protocol5 Memory Export for WordPress

## Recommended architecture

Public materials currently frame Protocol5 as the implementation and distribution surface for the .NET side of UAI, while UAIX describes itself as the public standards and publication site for UAI and the current UAI-1 release. That means the package described here should be presented as a **Protocol5 application/export contract for WordPress**, not as a UAIX normative artifact, conformance record, certification, registry, or hosted validator.

The best practical design is a **hybrid package-and-index architecture**:

1. A **local C# exporter** reads the local SQL Server database and writes a **manifest-driven flat-file package**.
2. The canonical interchange format is **sharded JSONL** for records, with **gzip used for transport/storage efficiency**, and an **optional sidecar binary vector file** for production-size corpora.
3. A WordPress plugin imports that package into **private package files plus indexed custom MySQL/MariaDB tables** for fast filtering and lexical search.
4. Public REST endpoints return only **public-safe fields** from the custom tables.
5. Raw vectors remain **private** and are either not imported at all in the first version, or are imported only for optional small-candidate reranking.

That recommendation is stronger than any single-format answer because the real problem has two different needs: **interchange/portability** and **runtime queryability**. JSONL is excellent for interchange because each line is a self-contained JSON object that can be processed one record at a time. Gzip is excellent for transfer size, but the gzip format explicitly does **not** provide random access, so a giant `.jsonl.gz` should not be your live query surface. WordPress, meanwhile, officially assumes PHP plus MariaDB or MySQL, and its plugin APIs support custom database tables, which makes a hybrid import architecture the most portable option for ordinary hosting.
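The streaming property described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the exporter in this design is C# and the importer is PHP), and the function names `write_shard`, `iter_shard`, and `demo` are hypothetical. It shows one record per line being written into a gzip shard and read back sequentially, which is the only access pattern gzip supports well.

```python
import gzip
import json
import os
import tempfile


def write_shard(path, records):
    """Write one JSON object per line into a gzip-compressed shard."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as fh:
        for rec in records:
            # ensure_ascii=False keeps characters such as "π" readable in the file.
            fh.write(json.dumps(rec, ensure_ascii=False, separators=(",", ":")) + "\n")


def iter_shard(path):
    """Yield records one at a time; the shard is decompressed front to back."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def demo():
    """Round-trip two hypothetical records through a temporary shard."""
    records = [
        {"record_id": "1", "descriptor_text": "Greek small letter pi"},
        {"record_id": "2", "descriptor_text": "Greek small letter alpha"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "records-0001.jsonl.gz")
        write_shard(shard, records)
        return [rec["record_id"] for rec in iter_shard(shard)]
```

Reaching the last record of a shard this way requires decompressing everything before it, which is why the design imports shards into indexed tables instead of querying them live.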

My format verdicts are:

- **Best canonical export format:** **JSONL/NDJSON shards plus a JSON manifest**. This is the best balance of readability, streamability, debuggability, and language interoperability.
- **Best transport wrapper:** **gzip-compressed shards** or a zip/tar wrapper around the package. Use gzip for size reduction, but not as your only runtime query structure.
- **Best optional runtime sidecar:** **SQLite**, but only as an optional accelerator when the host cleanly exposes SQLite support. PHP documents both `SQLite3` and `PDO_SQLITE`, but WordPress hosting requirements do not assume SQLite, so it should not be mandatory.
- **Weak fit as a primary format:** **CSV plus sidecar metadata**. CSV is a rectangular, tabular format; it becomes awkward once rows carry nested metadata, per-record vector encodings, and visibility rules. It can still be useful for audits or exports to BI tools, but not as the canonical memory package.
- **Weak fit for anything beyond tiny datasets:** **one large JSON document**. PHP can decode JSON, but whole-document JSON pushes you toward large reads, larger peak memory use, and harder partial refreshes.

## Flat-file schema proposal

The export contract should be **manifest-driven**. In practice that means a package directory or archive with a small `manifest.json` plus one or more data shards. The manifest exists so WordPress can validate the package before activation, know which files to read, and reject incompatible schema versions or broken uploads. PHP exposes `hash_file()` with `sha256`, and the Secure Hash Standard from the National Institute of Standards and Technology (NIST) defines the role of hash digests in detecting content changes, so SHA-256 is a good default for both per-record `text_hash` values and per-file package checksums.

For cross-runtime safety, the package should be **UTF-8 throughout**, because PHP’s `json_decode()` and `json_validate()` operate on UTF-8 JSON strings. I also recommend serializing `record_id` and `source_key` as **strings**, even if the SQL Server source columns are numeric, because PHP’s JSON support explicitly has a `JSON_BIGINT_AS_STRING` mode for large integers and string IDs avoid silent precision edge cases across runtimes.

For the fields you named, I recommend this logical record contract:

- `record_id`: a stable string identifier. Use the original `Category.IotaEmbeddingRecords` primary key if that is stable, but serialize it as a string.
- `source_table`: the originating logical table name, such as `Categories`, `Words`, or `ISO10646`.
- `source_key`: a stable source-row identifier serialized as a string. If the source row uses a composite key, canonicalize it into a string such as `ISO10646:U+03C0` or `Words:12345`.
- `descriptor_text`: the exact normalized text that was embedded.
- `embedding_version`: your exporter/pipeline version, not just the model label. This should change when chunking, normalization, or field composition changes.
- `embedding_model`: the LM Studio model identifier or your stable local alias.
- `embedding_dimensions`: integer dimension count.
- `text_hash`: object with `algo` and `value`, typically SHA-256 over the normalized descriptor text used for embedding.
- `updated_at`: source-row update time in UTC string form.
- `vector_payload`: either inline for the minimal version, or a file reference for the production version.
- `public_meta`: an allow-listed JSON object containing only fields safe to expose on a public site.
- `visibility`: explicit label such as `public`, `private`, or `internal`.
- `package_id` or `partition`: optional convenience field so imported tables can coexist during staging and refreshes.
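A minimal sketch of how a subset of this contract could be assembled follows. It is in Python for illustration only; the helper names (`canonical_source_key`, `text_hash`, `build_record`) and the choice of `:` as the composite-key separator are assumptions, not part of the source contract. The embedding and vector fields are omitted to keep the sketch short.

```python
import hashlib
import json

# Visibility labels named in the contract above.
ALLOWED_VISIBILITY = {"public", "private", "internal"}


def canonical_source_key(source_table, key_parts):
    """Join a (possibly composite) key into one string such as 'Words:12345'.

    The ':' separator is an assumption made for this sketch.
    """
    return source_table + ":" + ":".join(str(part) for part in key_parts)


def text_hash(descriptor_text):
    """SHA-256 over the UTF-8 bytes of the normalized descriptor text."""
    digest = hashlib.sha256(descriptor_text.encode("utf-8")).hexdigest()
    return {"algo": "sha256", "value": digest}


def build_record(record_id, source_table, key_parts, descriptor_text,
                 visibility, public_meta, updated_at):
    """Build one JSONL-ready record, with identifiers serialized as strings."""
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unknown visibility: {visibility!r}")
    return {
        "record_id": str(record_id),  # string even if the source column is numeric
        "source_table": source_table,
        "source_key": canonical_source_key(source_table, key_parts),
        "descriptor_text": descriptor_text,
        "text_hash": text_hash(descriptor_text),
        "updated_at": updated_at,
        "visibility": visibility,
        "public_meta": public_meta,
    }


if __name__ == "__main__":
    rec = build_record(8241, "ISO10646", ["U+03C0"], "Greek small letter pi",
                       "public", {"label": "π"}, "2026-05-04T17:41:12Z")
    print(json.dumps(rec, ensure_ascii=False))
```

Rejecting unknown visibility labels at export time means the importer never has to guess whether an unlabeled record is safe to publish.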

For vector representation, I recommend two modes:

- **Minimal mode:** inline JSON floats inside each JSONL row. This is simplest for a proof of concept.
- **Production mode:** store vectors in a separate binary shard and place a `vector_ref` in the JSONL row with `file`, `offset`, `dims`, and `encoding`. This keeps record files readable while preventing vector bloat from dominating every JSON line.

A good production package manifest (`manifest.json`), which also records the package's file layout, looks like this:

```json
{
  "schema_version": "p5-memory-package-1",
  "package_id": "2026-05-04T18-10-00Z-local",
  "exported_at": "2026-05-04T18:10:00Z",
  "generator": {
    "name": "Protocol5.MemoryExporter",
    "version": "0.1.0"
  },
  "source": {
    "system": "Protocol5 local SQL Server",
    "authority_note": "UAIX remains normative for public UAI standards claims; this package is Protocol5 application export data."
  },
  "record_format": {
    "type": "jsonl",
    "compression": "gzip",
    "shards": [
      {
        "path": "records/records-0001.jsonl.gz",
        "count": 10000,
        "sha256": "3ff3f3f4...ab12"
      }
    ]
  },
  "vector_format": {
    "mode": "external-f32le",
    "shards": [
      {
        "path": "vectors/vectors-0001.f32",
        "count": 10000,
        "dims": 768,
        "sha256": "9a2c4f...77de"
      }
    ]
  },
  "public_export": true,
  "partitions": [
    {
      "embedding_model": "lmstudio/nomic-embed-text-v1.5",
      "embedding_version": "2026.05.local.1",
      "embedding_dimensions": 768
    }
  ]
}
```
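The pre-activation check that the manifest enables can be sketched roughly as follows. This is an illustrative Python version (the real importer would be PHP and could use `hash_file()`); the function names, the `SUPPORTED_SCHEMAS` set, and the wording of the problem messages are assumptions. It rejects unknown schema versions, shard paths that escape the package directory, missing files, and checksum mismatches.

```python
import hashlib
import json
import os

# Schema versions this hypothetical importer accepts.
SUPPORTED_SCHEMAS = {"p5-memory-package-1"}


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large shards are never fully loaded in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_package(package_dir):
    """Return a list of problems; an empty list means the package looks usable."""
    root = os.path.realpath(package_dir)
    with open(os.path.join(root, "manifest.json"), encoding="utf-8") as fh:
        manifest = json.load(fh)
    if manifest.get("schema_version") not in SUPPORTED_SCHEMAS:
        return ["unsupported schema_version"]
    problems = []
    shards = (manifest.get("record_format", {}).get("shards", [])
              + manifest.get("vector_format", {}).get("shards", []))
    for shard in shards:
        path = os.path.realpath(os.path.join(root, shard["path"]))
        # Refuse shard paths such as "../../wp-config.php".
        if os.path.commonpath([root, path]) != root:
            problems.append(f"path escapes package: {shard['path']}")
        elif not os.path.isfile(path):
            problems.append(f"missing file: {shard['path']}")
        elif sha256_file(path) != shard["sha256"]:
            problems.append(f"checksum mismatch: {shard['path']}")
    return problems
```

An importer built along these lines would only mark a package active when `verify_package()` returns an empty list.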

A representative JSONL row looks like this:

```json
{
  "record_id": "8241",
  "source_table": "ISO10646",
  "source_key": "U+03C0",
  "descriptor_text": "Greek small letter pi",
  "embedding_version": "2026.05.local.1",
  "embedding_model": "lmstudio/nomic-embed-text-v1.5",
  "embedding_dimensions": 768,
  "text_hash": {
    "algo": "sha256",
    "value": "16f3486d0d9a0d8f5f0d9f9cc4a63f4f5fe2f0f6d9b2e0b3de8d7fbd7a1cc999"
  },
  "updated_at": "2026-05-04T17:41:12Z",
  "visibility": "public",
  "public_meta": {
    "label": "π",
    "slug": "greek-small-letter-pi",
    "category": "Greek letters",
    "codepoint": "U+03C0",
    "source_label": "ISO10646"
  },
  "vector_ref": {
    "file": "vectors/vectors-0001.f32",
    "offset": 25294848,
    "dims": 768,
    "encoding": "f32le"
  }
}
```
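Resolving a `vector_ref` could work roughly as in the sketch below. It assumes `offset` is a byte offset into the shard, which the source does not state explicitly. Under that assumption the example value 25294848 equals 8234 × 768 × 4, so it would address the vector at zero-based index 8234 in a shard of 768-dimensional float32 vectors. The function names are hypothetical and Python is used only for illustration.

```python
import struct


def pack_vectors(vectors):
    """Concatenate vectors as little-endian float32 values (the 'f32le' encoding)."""
    return b"".join(struct.pack(f"<{len(v)}f", *v) for v in vectors)


def read_vector(fh, vector_ref):
    """Read one vector from an open binary shard using a vector_ref-style dict.

    Assumption: vector_ref["offset"] is a byte offset from the start of the file.
    """
    if vector_ref["encoding"] != "f32le":
        raise ValueError(f"unsupported encoding: {vector_ref['encoding']!r}")
    dims = vector_ref["dims"]
    size = dims * 4  # 4 bytes per float32 component
    fh.seek(vector_ref["offset"])
    data = fh.read(size)
    if len(data) != size:
        raise ValueError("vector_ref points past the end of the shard")
    return list(struct.unpack(f"<{dims}f", data))
```

Because every vector has the same byte length, a reader can seek directly to one record's vector without touching the rest of the shard, which the inline JSON form cannot offer.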

For a first proof of concept, you can replace `vector_ref` with:

```json
"vector_payload": {
  "encoding": "json-f32",
  "dims": 4,
  "data": [0.125, -0.220, 0.491, 0.001]
}
```

That is intentionally easy to debug, but I would not keep it as the default once the corpus grows.

## WordPress storage and query strategy

Inside WordPress, the strongest storage pattern is **hybrid flat-file plus indexed custom tables**. WordPress provides `wp_upload_dir()` to tell you the current uploads path and URL, and `wp_handle_upload()` to move incoming files into the uploads area. That makes uploads the most natural writable storage area on ordinary hosting. The catch is that `wp_upload_dir()` returns a **URL** as well as a path, which means anything placed there is potentially web-addressable unless you add protection. WordPress’s own hardening guidance also warns that writable files and folders are sensitive, especially on shared hosting.

So the storage comparison is:

- **Files under uploads or a protected plugin-owned data directory:** good for the canonical package files, especially on ordinary hosting, because WordPress has native path and upload helpers. The risk is accidental public exposure if you store raw shards under a web-visible URL without access controls.
- **Custom database tables:** best for runtime search, filters, package state, and safe REST responses. WordPress’s plugin docs explicitly support creating/updating custom tables via `dbDelta()`, and WordPress hosting already assumes MySQL or MariaDB.
- **WordPress options:** fine for a few small settings such as active package ID, schema version, or import status. Not suitable for a corpus. Options are stored in `wp_options`.
- **Transients:** correct for cache entries such as “top queries,” parsed manifest cache, or filtered result caches. Not correct as the primary store. WordPress documents transients as temporary cached data, often backed by the options table when no persistent object cache is enabled, and non-expiring transients are autoloaded.
- **Static JSON endpoints or directly served JSON files:** acceptable only for intentionally public, already-sanitized browse surfaces. They are the wrong place for raw vectors, private metadata, or anything that needs per-request authorization.

I would therefore implement these runtime tables:

- `wp_p5_memory_packages`: package metadata, status, counts, active flag, checksums, paths.
- `wp_p5_memory_records`: one row per public-safe record, holding searchable fields and references to the package/shard.
- `wp_p5_memory_terms`: lexical token index when you want portable lexical search without depending on DB-vendor-specific full-text behavior.
- `wp_p5_memory_neighbors`: optional precomputed nearest-neighbor or related-record edges.
- `wp_p5_memory_runtime`: optional small table for refresh locks, current importer state, and migration state.
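How the records and terms tables could cooperate is sketched below. The sketch uses Python with an in-memory SQLite database only so that it is runnable on its own; the plugin described here would instead create MySQL/MariaDB tables through `dbDelta()` with the site's table prefix. The column names, the tokenizer, and the function names are all assumptions for illustration. The query returns only `record_id` and `public_meta`, and only for rows labeled `public`.

```python
import json
import re
import sqlite3

# Hypothetical columns; the real tables would carry the WordPress table prefix.
SCHEMA = """
CREATE TABLE p5_memory_records (
    record_id       TEXT PRIMARY KEY,
    package_id      TEXT NOT NULL,
    source_table    TEXT NOT NULL,
    descriptor_text TEXT NOT NULL,
    visibility      TEXT NOT NULL,
    public_meta     TEXT NOT NULL
);
CREATE TABLE p5_memory_terms (
    term      TEXT NOT NULL,
    record_id TEXT NOT NULL,
    PRIMARY KEY (term, record_id)
);
"""


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple assumed tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def import_record(db, package_id, rec):
    """Insert one record and its lexical terms."""
    db.execute(
        "INSERT INTO p5_memory_records VALUES (?, ?, ?, ?, ?, ?)",
        (rec["record_id"], package_id, rec["source_table"], rec["descriptor_text"],
         rec["visibility"], json.dumps(rec["public_meta"], ensure_ascii=False)),
    )
    db.executemany(
        "INSERT INTO p5_memory_terms VALUES (?, ?)",
        [(term, rec["record_id"]) for term in tokenize(rec["descriptor_text"])],
    )


def search_public(db, query):
    """Return public-safe fields for public records matching every query term."""
    terms = sorted(tokenize(query))
    if not terms:
        return []
    marks = ",".join("?" * len(terms))
    rows = db.execute(
        f"""SELECT r.record_id, r.public_meta
            FROM p5_memory_terms t
            JOIN p5_memory_records r ON r.record_id = t.record_id
            WHERE t.term IN ({marks}) AND r.visibility = 'public'
            GROUP BY r.record_id, r.public_meta
            HAVING COUNT(DISTINCT t.term) = ?
            ORDER BY r.record_id""",
        (*terms, len(terms)),
    ).fetchall()
    return [{"record_id": rid, "public_meta": json.loads(meta)} for rid, meta in rows]
```

Keeping the visibility filter inside the query, not in the response formatter, means a private row cannot leak through a code path that forgets to filter.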

Why This File Exists

This is a memory-system evidence file from ɩ.com / JustAnIota.com. It is shown here because AIWikis.org is demonstrating the real source files that make the UAIX / LLM Wiki memory system work, not only summarizing those systems after the fact.

Role

This file is memory-system evidence. It records source history, archive transfer, intake disposition, or another piece of provenance that should be retrievable without becoming an unsupported public claim.

Structure

The file is structured around these visible headings: Protocol5 Memory Export for WordPress; Recommended architecture; Flat-file schema proposal; WordPress storage and query strategy; WordPress memory behavior and search model; Vector similarity feasibility and fallback strategies; Security, privacy, and hosting risks; Step-by-step implementation roadmap. Those headings are retrieval anchors: a crawler or LLM can decide whether the file is relevant before reading every line.

Prompt-Size And Retrieval Benefit

Keeping this material in a separate file reduces prompt pressure because an agent can load this exact unit only when its role, source site, category, or hash is relevant. The surrounding index pages point to it, while this page preserves the full content for audit and exact recall.

How To Use It

Humans should read the metadata first, then inspect the raw content when they need exact wording or provenance.
LLMs and agents should use the source site, category, hash, headings, and related files to decide whether this file belongs in the active prompt.
Crawlers should treat the AIWikis page as transparent evidence and follow the source URL/source reference for authority boundaries.
Future maintainers should regenerate this page whenever the source hash changes, then review the explanation if the role or structure changed.

Update Requirements

When this source file changes, update the raw source layer, normalized source layer, hash history, this rendered page, generated explanation, source-file inventory, changed-files report, and any source-section index that links to it.

Provenance And History

Current observation: 2026-05-15T00:23:56.0837262Z
Source origin: current-source-workspace
Retrieval method: local-source-workspace
Duplicate group: sfg-326 (primary)
Historical hash records are stored in data/hashes/source-file-history.jsonl.

Machine-Readable Metadata

{
    "title":  "Protocol5 Memory Export For WordPress",
    "source_site":  "ɩ.com / JustAnIota.com",
    "source_url":  "https://justaniota.com/",
    "canonical_url":  "https://aiwikis.org/justaniota/uai-system/files/raw-system-archives-justaniota-intake-processing-2026-05-04-protocol5-wo-6d032792/",
    "source_reference":  "raw/system-archives/justaniota/intake-processing/2026-05-04-protocol5-wordpress-memory-export/agent-file-handoff/Improvement/Protocol5 Memory Export for WordPress PArt 2.md",
    "file_type":  "md",
    "content_category":  "memory-file",
    "content_hash":  "sha256:6d032792618b60285679191a435953252f54a81b8cc3d717e1ec1e14644cccf4",
    "last_fetched":  "2026-05-15T00:23:56.0837262Z",
    "last_changed":  "2026-05-04T23:19:50.1270945Z",
    "import_status":  "unchanged",
    "duplicate_group_id":  "sfg-326",
    "duplicate_role":  "primary",
    "related_files":  [

                      ],
    "generated_explanation":  true,
    "explanation_last_generated":  "2026-05-15T00:23:56.0837262Z"
}
