Prototype: LLM as universal semantic interpreter (temperature=0)

t-401 · WorkTask
Created 1 month ago · Updated 4 weeks ago

Description


Explore using an LLM at temperature=0 as a deterministic interpreter for code/pseudocode across languages.

Core Hypothesis

At temperature=0 with structured I/O, an LLM is a deterministic function: eval : Code × Context → Value

Unlike traditional interpreters, which parse syntax, the LLM performs semantic inference: it understands what code *means* and produces what it *should* produce.

Key Properties

1. Deterministic: temperature=0 → same input, same output
2. Language-agnostic: Python, Haskell, Swift, pseudocode → same semantic space
3. Specification = Implementation: a precise description is executable

Prototype Ideas

1. Polyglot REPL

> [1,2,3].map(x => x + 1)     // JavaScript
[2, 3, 4]
> map (+1) [1,2,3]            -- Haskell  
[2, 3, 4]
> [x+1 for x in [1,2,3]]      # Python
[2, 3, 4]

Same semantic operation, any syntax.
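The polyglot check above could be driven by a small table of equivalent snippets. This is a hypothetical sketch of that test-suite shape; the names (`POLYGLOT_CASES`, `check_polyglot`) and the `evaluate` callback (the temperature=0 LLM call) are assumptions, not an existing API:

```python
# Hypothetical data shape for the polyglot suite: each case pairs one
# expected value with equivalent snippets in several syntaxes.
POLYGLOT_CASES = [
    {
        "expected": [2, 3, 4],
        "snippets": {
            "javascript": "[1,2,3].map(x => x + 1)",
            "haskell": "map (+1) [1,2,3]",
            "python": "[x+1 for x in [1,2,3]]",
        },
    },
]

def check_polyglot(case, evaluate):
    """True iff every snippet evaluates to the case's expected value.

    `evaluate` is the (assumed) semantic-interpreter call: snippet -> value.
    """
    return all(evaluate(src) == case["expected"] for src in case["snippets"].values())
```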

2. Pseudocode Interpreter

> sort [3,1,4,1,5] using quicksort
[1, 1, 3, 4, 5]
> find shortest path from A to B in graph G
[A, C, B]

Natural language algorithms, executable.

3. Cross-Language Transform

> parse '{"a": 1}' as JSON, extract 'a', format as XML
<a>1</a>

Chain operations across language semantics.
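For this particular chain, a plain-Python reference implementation is easy to write, which makes it a good ground-truth case for the comparison harness below. A minimal sketch (the function name is illustrative):

```python
import json

def json_to_xml_ground_truth(payload: str, key: str) -> str:
    """Reference implementation of the chained transform, used to
    check the LLM's answer to the same request."""
    value = json.loads(payload)[key]
    return f"<{key}>{value}</{key}>"

json_to_xml_ground_truth('{"a": 1}', "a")  # → '<a>1</a>'
```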

4. Semantic Diff/Equivalence

> equivalent? 'fold (+) 0 xs' and 'sum(xs)'
true : both compute the sum of xs

Implementation

1. Simple prompt structure:

  • System: 'You are a semantic interpreter. Execute the code/pseudocode exactly. Output only the result value, no explanation.'
  • User: the code
  • Output: the value (structured)
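A minimal sketch of assembling that request as a chat payload. The model name is a placeholder and the actual API call (e.g. via an LLM client library) is deliberately left out; only the message structure is shown:

```python
SYSTEM_PROMPT = (
    "You are a semantic interpreter. Execute the code/pseudocode exactly. "
    "Output only the result value, no explanation."
)

def build_request(code: str, model: str = "MODEL_NAME"):
    # Assembles the chat payload; sending it and parsing the structured
    # output are left to the caller.
    return {
        "model": model,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": code},
        ],
    }
```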

2. Test harness:

  • Run same semantic operation in multiple syntaxes
  • Verify outputs match
  • Measure determinism (run N times, check identical)
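The determinism measurement could be as simple as the following sketch, where `evaluate` is again the assumed LLM call:

```python
def determinism_rate(evaluate, code: str, n: int = 100) -> float:
    """Run the same input n times; return the fraction of outputs
    that match the first run (1.0 = fully deterministic)."""
    outputs = [evaluate(code) for _ in range(n)]
    return outputs.count(outputs[0]) / n
```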

3. Compare to ground truth:

  • Run actual Python/Haskell/etc
  • Compare LLM output to real interpreter
  • Track accuracy
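For Python expressions, the host interpreter itself can serve as ground truth. A sketch, assuming `llm_eval` returns the parsed value (e.g. via `json.loads` on the model's output); Python's builtin `eval` is used only on trusted test inputs:

```python
def accuracy_vs_python(llm_eval, cases):
    """Fraction of Python expression strings where the LLM's value
    matches what the real interpreter computes."""
    hits = sum(llm_eval(src) == eval(src) for src in cases)  # trusted inputs only
    return hits / len(cases)
```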

Research Questions

1. Determinism: Is temperature=0 actually deterministic across calls? Across sessions?

2. Complexity: What can it 'compute'?

  • Sorting: yes (seen many examples)
  • Novel algorithms: ?
  • Undecidable problems: should fail/loop?

3. State: Can it maintain a REPL environment?

  • x = 5 then x + 1 → 6?
  • Context window as environment?
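The "context window as environment" idea could be prototyped by replaying the transcript on every call, so earlier bindings stay visible. A sketch; `SemanticREPL` is a hypothetical name and `evaluate` is the assumed temperature=0 call from the Implementation section:

```python
class SemanticREPL:
    """Keeps the transcript so prior bindings (x = 5) remain in
    context for later expressions (x + 1)."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # assumed LLM call: prompt -> value string
        self.history = []

    def run(self, line: str) -> str:
        # Replay all prior lines plus the new one as a single prompt.
        prompt = "\n".join(self.history + [line])
        result = self.evaluate(prompt)
        self.history.append(line)
        return result
```

Replaying the whole history is O(n) tokens per call; a real prototype would need to decide when to summarize or truncate the environment.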

4. Limits: Where does it break?

  • Large data structures
  • Deep recursion
  • Numerical precision

Success Criteria

  • [ ] Polyglot: same output for equivalent code in 3+ languages
  • [ ] Deterministic: 100 runs → 100 identical outputs
  • [ ] Accurate: matches real interpreter on test suite
  • [ ] Pseudocode: executes English algorithm descriptions

Notes

This reframes the LLM from 'text generator' to 'semantic compute engine'. If it works reliably, it is a new kind of interpreter: one that operates on meaning rather than syntax.
