Prototype: LLM as universal semantic interpreter (temperature=0)

t-401 · WorkTask
Created 1 month ago · Updated 4 weeks ago

Description


Explore using an LLM at temperature=0 as a deterministic interpreter for code/pseudocode across languages.

Core Hypothesis

At temperature=0 with structured I/O, an LLM is a deterministic function: eval : Code × Context → Value

Unlike traditional interpreters, which parse syntax, the LLM performs semantic inference: it understands what code *means* and produces what it *should* produce.

Key Properties

1. Deterministic: temperature=0 → same input, same output
2. Language-agnostic: Python, Haskell, Swift, pseudocode → same semantic space
3. Specification = Implementation: a precise description is executable

Prototype Ideas

1. Polyglot REPL

> [1,2,3].map(x => x + 1)     // JavaScript
[2, 3, 4]
> map (+1) [1,2,3]            -- Haskell  
[2, 3, 4]
> [x+1 for x in [1,2,3]]      # Python
[2, 3, 4]

Same semantic operation, any syntax.
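The polyglot check above could be driven by a small table of equivalent snippets. This is a hypothetical sketch of that test-suite shape; the names (`POLYGLOT_CASES`, `check_polyglot`) and the `evaluate` callback (the temperature=0 LLM call) are assumptions, not an existing API:

```python
# Hypothetical data shape for the polyglot suite: each case pairs one
# expected value with equivalent snippets in several syntaxes.
POLYGLOT_CASES = [
    {
        "expected": [2, 3, 4],
        "snippets": {
            "javascript": "[1,2,3].map(x => x + 1)",
            "haskell": "map (+1) [1,2,3]",
            "python": "[x+1 for x in [1,2,3]]",
        },
    },
]

def check_polyglot(case, evaluate):
    """True iff every snippet evaluates to the case's expected value.

    `evaluate` is the (assumed) semantic-interpreter call: snippet -> value.
    """
    return all(evaluate(src) == case["expected"] for src in case["snippets"].values())
```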

2. Pseudocode Interpreter

> sort [3,1,4,1,5] using quicksort
[1, 1, 3, 4, 5]
> find shortest path from A to B in graph G
[A, C, B]

Natural language algorithms, executable.

3. Cross-Language Transform

> parse '{"a": 1}' as JSON, extract 'a', format as XML
<a>1</a>

Chain operations across language semantics.
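For this particular chain, a plain-Python reference implementation is easy to write, which makes it a good ground-truth case for the comparison harness below. A minimal sketch (the function name is illustrative):

```python
import json

def json_to_xml_ground_truth(payload: str, key: str) -> str:
    """Reference implementation of the chained transform, used to
    check the LLM's answer to the same request."""
    value = json.loads(payload)[key]
    return f"<{key}>{value}</{key}>"

json_to_xml_ground_truth('{"a": 1}', "a")  # → '<a>1</a>'
```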

4. Semantic Diff/Equivalence

> equivalent? 'fold (+) 0 xs' and 'sum(xs)'
true : both compute the sum of xs

Implementation

1. Simple prompt structure:

  • System: 'You are a semantic interpreter. Execute the code/pseudocode exactly. Output only the result value, no explanation.'
  • User: the code
  • Output: the value (structured)
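A minimal sketch of assembling that request as a chat payload. The model name is a placeholder and the actual API call (e.g. via an LLM client library) is deliberately left out; only the message structure is shown:

```python
SYSTEM_PROMPT = (
    "You are a semantic interpreter. Execute the code/pseudocode exactly. "
    "Output only the result value, no explanation."
)

def build_request(code: str, model: str = "MODEL_NAME"):
    # Assembles the chat payload; sending it and parsing the structured
    # output are left to the caller.
    return {
        "model": model,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": code},
        ],
    }
```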

2. Test harness:

  • Run same semantic operation in multiple syntaxes
  • Verify outputs match
  • Measure determinism (run N times, check identical)
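The determinism measurement could be as simple as the following sketch, where `evaluate` is again the assumed LLM call:

```python
def determinism_rate(evaluate, code: str, n: int = 100) -> float:
    """Run the same input n times; return the fraction of outputs
    that match the first run (1.0 = fully deterministic)."""
    outputs = [evaluate(code) for _ in range(n)]
    return outputs.count(outputs[0]) / n
```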

3. Compare to ground truth:

  • Run actual Python/Haskell/etc
  • Compare LLM output to real interpreter
  • Track accuracy
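For Python expressions, the host interpreter itself can serve as ground truth. A sketch, assuming `llm_eval` returns the parsed value (e.g. via `json.loads` on the model's output); Python's builtin `eval` is used only on trusted test inputs:

```python
def accuracy_vs_python(llm_eval, cases):
    """Fraction of Python expression strings where the LLM's value
    matches what the real interpreter computes."""
    hits = sum(llm_eval(src) == eval(src) for src in cases)  # trusted inputs only
    return hits / len(cases)
```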

Research Questions

1. Determinism: Is temperature=0 actually deterministic across calls? Across sessions?

2. Complexity: What can it 'compute'?

  • Sorting: yes (seen many examples)
  • Novel algorithms: ?
  • Undecidable problems: should fail/loop?

3. State: Can it maintain a REPL environment?

  • x = 5 then x + 1 → 6?
  • Context window as environment?
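The "context window as environment" idea could be prototyped by replaying the transcript on every call, so earlier bindings stay visible. A sketch; `SemanticREPL` is a hypothetical name and `evaluate` is the assumed temperature=0 call from the Implementation section:

```python
class SemanticREPL:
    """Keeps the transcript so prior bindings (x = 5) remain in
    context for later expressions (x + 1)."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # assumed LLM call: prompt -> value string
        self.history = []

    def run(self, line: str) -> str:
        # Replay all prior lines plus the new one as a single prompt.
        prompt = "\n".join(self.history + [line])
        result = self.evaluate(prompt)
        self.history.append(line)
        return result
```

Replaying the whole history is O(n) tokens per call; a real prototype would need to decide when to summarize or truncate the environment.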

4. Limits: Where does it break?

  • Large data structures
  • Deep recursion
  • Numerical precision

Success Criteria

  • [ ] Polyglot: same output for equivalent code in 3+ languages
  • [ ] Deterministic: 100 runs → 100 identical outputs
  • [ ] Accurate: matches real interpreter on test suite
  • [ ] Pseudocode: executes English algorithm descriptions

Notes

This reframes the LLM from 'text generator' to 'semantic compute engine'. If it works reliably, it is a new kind of interpreter: one that operates on meaning rather than syntax.
