feat(minga-core): per-language alpha-hashing for Python, TS, JS, Go

Closes the last substantiated pending item from the CHANGELOG. Every
language supported by minga now has its own alpha-equivalence profile:
refactorings such as "rename variable" no longer inflate the repo's
storage in any dialect.

Refactor of alpha.rs (639 LOC) into the alpha/ module:
- alpha/common.rs: shared primitives (TAG_*, write_kind_and_field,
  emit_*, push_identifier_name). Guarantees a bit-equivalent wire format.
- alpha/rust.rs: the Rust logic, moved with no functional changes.
- alpha/python.rs, alpha/ecmascript.rs, alpha/go.rs: new.
- alpha/mod.rs: re-exports hash_node_alpha (legacy Rust entry point) and
  exposes hash_alpha_with(dialect, node), which dispatches to the right
  profile.

Per-language coverage:

Python: function_definition, lambda, for_statement, list/set/dict
comprehensions, generator_expression (with incremental scope: the
binders of a for_in_clause are visible in the following clauses and
in the body), with_statement (recursing into as_pattern_target).

ECMAScript (TS+JS): function_declaration, function_expression,
method_definition, generator_function_*, arrow_function (parenthesized
and shorthand), statement_block (with lexical_declaration and
variable_declaration introducing binders for the rest of the block),
for_in_statement (covers for-of/for-in), for_statement (C-style
initializer), catch_clause, TS typed/optional parameters.

Go: function_declaration, method_declaration, func_literal (closure),
parameter_declaration with grouped multi-name parameters, block (with
short_var_declaration), for_statement with range_clause and for_clause,
if_statement with initializer.

Tests: 26 new tests in alpha_polyglot.rs covering rename invariants and
negative sanity checks (function name matters, type matters, operation
matters) for each language, plus a cross-language sanity check (the
same source in different languages yields different hashes).

141 tests passing in minga-core (115 before; +26 polyglot). The 36 Rust
alpha tests are untouched (no regression).

Remaining Minga work: minga-vfs (FUSE, independent project).
Additional per-language coverage (Python class, JS destructuring,
Go type_switch) remains a nice-to-have.
Sergio
2026-05-09 19:06:48 +00:00
parent d1888e0901
commit 6be50c5b73
8 changed files with 1585 additions and 82 deletions
@@ -6,6 +6,96 @@ ratio/diff ver `git show <sha>`.
## 2026-05-09
### feat(minga-core): per-language α-hashing for Python, TypeScript, JavaScript, Go
Closes the last substantiated pending item from the CHANGELOG. Every
language supported by `minga` now has its own α-equivalence profile:
two versions of the same program that differ only in the names of
bound variables produce the same hash, whatever the language.
Refactorings such as "rename variable" no longer inflate the repo's
storage in any dialect.
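
The invariance described above can be illustrated with a minimal sketch. It is not minga's implementation: the `Term` type, the helper constructors, the tag value used for `App`, and the FNV-1a stand-in for BLAKE3 are all assumptions made for this example. Bound variables are encoded by their de Bruijn index, so a binder's name never reaches the hash input, while free variables keep their literal name.

```rust
// Hypothetical toy term language; minga hashes `SemanticNode` trees instead.
pub enum Term {
    Var(String),
    Lam(String, Box<Term>),
    App(Box<Term>, Box<Term>),
}

pub fn var(n: &str) -> Term { Term::Var(n.to_string()) }
pub fn lam(p: &str, b: Term) -> Term { Term::Lam(p.to_string(), Box::new(b)) }
pub fn app(f: Term, a: Term) -> Term { Term::App(Box::new(f), Box::new(a)) }

// FNV-1a, used here only as a dependency-free stand-in for BLAKE3.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

fn encode(t: &Term, scope: &mut Vec<String>, out: &mut Vec<u8>) {
    match t {
        Term::Var(name) => match scope.iter().rposition(|n| n == name) {
            // Bound: emit the distance from the top of the scope stack.
            Some(i) => {
                out.push(3);
                out.extend_from_slice(&((scope.len() - 1 - i) as u64).to_le_bytes());
            }
            // Free: the literal name is part of the hash input.
            None => {
                out.push(4);
                out.extend_from_slice(&(name.len() as u64).to_le_bytes());
                out.extend_from_slice(name.as_bytes());
            }
        },
        Term::Lam(param, body) => {
            // Anonymous binder: the parameter name is never written.
            out.push(2);
            scope.push(param.clone());
            encode(body, scope, out);
            scope.pop();
        }
        Term::App(f, a) => {
            out.push(5); // tag chosen for this sketch only
            encode(f, scope, out);
            encode(a, scope, out);
        }
    }
}

pub fn alpha_bytes(t: &Term) -> Vec<u8> {
    let mut out = Vec::new();
    encode(t, &mut Vec::new(), &mut out);
    out
}

pub fn alpha_hash(t: &Term) -> u64 {
    fnv1a(&alpha_bytes(t))
}
```

With this encoding, `lam("x", app(var("x"), var("f")))` and `lam("y", app(var("y"), var("f")))` yield the same bytes, whereas replacing the free variable `f` with `g` changes them.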
Refactor of `alpha.rs` (639 LOC) into the `alpha/` module:
- **`alpha/common.rs`**: shared primitives (TAG_*, write_kind_and_field,
emit_leaf_marker, emit_binder_body, emit_identifier_ref, push_identifier_name).
Guarantees that the hash wire format is bit-equivalent across
all profiles.
- **`alpha/rust.rs`**: the Rust logic (moved from alpha.rs with no
functional changes).
- **`alpha/python.rs`**: new.
- **`alpha/ecmascript.rs`**: new (covers TypeScript + JavaScript;
they share most node kinds).
- **`alpha/go.rs`**: new.
- **`alpha/mod.rs`**: re-exports `hash_node_alpha` (legacy Rust entry point) and
exposes `hash_alpha_with(dialect, node)`, which dispatches to the
corresponding profile.
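
One reason the shared wire format matters is that every string is written with a length prefix. The sketch below mirrors that idea (a u64 little-endian length followed by the bytes) but writes into a `Vec<u8>` rather than a BLAKE3 hasher; `encode_pair` and `encode_pair_naive` are hypothetical helpers that exist only for this example.

```rust
// Length-prefixed write: u64 little-endian length, then the raw bytes.
pub fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u64).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

// Two consecutive strings, each with its own length prefix.
pub fn encode_pair(a: &str, b: &str) -> Vec<u8> {
    let mut out = Vec::new();
    write_str(&mut out, a);
    write_str(&mut out, b);
    out
}

// Naive concatenation without prefixes, shown only for contrast.
pub fn encode_pair_naive(a: &str, b: &str) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(a.as_bytes());
    out.extend_from_slice(b.as_bytes());
    out
}
```

Without the prefix, the pairs `("ab", "c")` and `("a", "bc")` collapse to the same byte stream and would hash identically; with it, the boundary between kind and field name stays unambiguous.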
Per-language coverage:
**Python** (`def`, `lambda`, `for`, comprehensions, `with`):
- `function_definition` and `lambda`: parameters (including
typed_parameter, default_parameter, *args, **kwargs) introduce
binders for the body. The function name is NOT α-anonymous.
- `for_statement`: the `left` (identifier or tuple) introduces
binder(s) for the body.
- `list_comprehension`, `set_comprehension`, `dictionary_comprehension`,
`generator_expression`: each `for_in_clause` adds binders that are
visible in the body and in the following clauses (Python's incremental
scope semantics).
- `with_statement`: `as` introduces a binder for the body (recursing into
`as_pattern_target` to reach the identifier).
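
The incremental scope of comprehensions can be sketched as follows. This is a simplified model, not the code in `alpha/python.rs`: `ForInClause`, `Ref` and `resolve_comprehension` are hypothetical names, each clause is reduced to a target plus the identifiers its iterable mentions, and references are resolved in scope order rather than in tree order.

```rust
#[derive(Debug, PartialEq)]
pub enum Ref {
    Bound(usize),  // de Bruijn index, counted from the top of the scope
    Free(String),  // not in scope: the literal name is kept
}

pub fn resolve(name: &str, scope: &[String]) -> Ref {
    match scope.iter().rposition(|n| n == name) {
        Some(i) => Ref::Bound(scope.len() - 1 - i),
        None => Ref::Free(name.to_string()),
    }
}

// One `for target in iterable` clause, reduced to the names it mentions.
pub struct ForInClause<'a> {
    pub target: &'a str,
    pub iterable: &'a [&'a str],
}

pub fn resolve_comprehension(clauses: &[ForInClause<'_>], body: &[&str]) -> Vec<Ref> {
    let mut scope: Vec<String> = Vec::new();
    let mut out = Vec::new();
    for c in clauses {
        // The iterable only sees binders from earlier clauses.
        for name in c.iterable {
            out.push(resolve(name, &scope));
        }
        // The target becomes visible to later clauses and to the body.
        scope.push(c.target.to_string());
    }
    for name in body {
        out.push(resolve(name, &scope));
    }
    out
}
```

For `[y for x in xs for y in x]`, the second clause's iterable `x` resolves to the binder of the first clause, so renaming `x` and `y` consistently leaves the resolved sequence unchanged.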
**ECMAScript** (TS + JS):
- `function_declaration`, `function_expression`, `method_definition`,
`generator_function_*`: parameters → body. Supports the TS kinds
`required_parameter` and `optional_parameter` (`x: number`,
`x?: number`).
- `arrow_function`: both `(x, y) => body` and the shorthand `x => body`.
- `statement_block`: `lexical_declaration` (let/const) and
`variable_declaration` (var) introduce binders for the rest of the block.
- `for_in_statement` (covers `for-of` and `for-in`): `left` → body.
- `for_statement` (C-style): the initializer (lexical decl) introduces
binders for the condition, the increment and the body.
- `catch_clause`: parameter → body.
**Go**:
- `function_declaration`, `method_declaration`, `func_literal` (closure):
`parameter_list` → body. A `parameter_declaration` with several names
groups several binders under a single type (`a, b int`).
- `block`: `short_var_declaration` (`x := ...`) introduces binders
for the rest of the block.
- `for_statement` with `range_clause` (`for k, v := range m`): the
identifiers in `left` are binders for the body.
- `for_statement` with `for_clause` (C-style): initializer → body.
- `if_statement` with `initializer` (`if x := init(); x > 0`):
the binders are visible in the condition, the consequence and the alternative.
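
The block rule above (a declaration binds for the rest of the block, and the scope is restored when the block ends) can be sketched with a toy statement type. `Stmt`, `Ref` and `walk_block` are hypothetical names for this example; the real profiles walk `SemanticNode` trees and feed a hasher instead of collecting `Ref` values.

```rust
#[derive(Debug, PartialEq)]
pub enum Ref {
    Bound(usize),  // de Bruijn index, counted from the top of the scope
    Free(String),  // not in scope: the literal name is kept
}

pub enum Stmt {
    // `name := <expr>`, where `value_refs` lists identifiers used in <expr>.
    Decl { name: String, value_refs: Vec<String> },
    Use(String),
    Block(Vec<Stmt>),
}

pub fn resolve(name: &str, scope: &[String]) -> Ref {
    match scope.iter().rposition(|n| n == name) {
        Some(i) => Ref::Bound(scope.len() - 1 - i),
        None => Ref::Free(name.to_string()),
    }
}

pub fn walk_block(stmts: &[Stmt], scope: &mut Vec<String>, out: &mut Vec<Ref>) {
    let scope_before = scope.len();
    for s in stmts {
        match s {
            Stmt::Decl { name, value_refs } => {
                // The value is resolved before the new binder exists.
                for r in value_refs {
                    out.push(resolve(r, scope));
                }
                scope.push(name.clone());
            }
            Stmt::Use(name) => out.push(resolve(name, scope)),
            Stmt::Block(inner) => walk_block(inner, scope, out),
        }
    }
    // Binders introduced in this block do not leak to the parent.
    scope.truncate(scope_before);
}
```

Saving the scope length on entry and truncating on exit keeps the scope a plain stack, which is the same pattern the profiles use around `body` children.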
API:
- `hash_alpha_with(Dialect, &SemanticNode) -> ContentHash`:
per-dialect dispatch.
- `hash_node_alpha(&SemanticNode) -> ContentHash`: historical alias
that assumes Rust (back-compat).
Tests: 26 new tests in `tests/alpha_polyglot.rs`:
- Python (9): def rename, lambda rename, for-loop rename, list comp,
nested comp, with rename, function name matters, iterable name
matters, negative sanity check (different operation → different hash).
- JS/TS (10): function rename, function name matters, arrow rename,
arrow shorthand rename, let/const rename, for-of rename, classic
for rename, catch rename, TS typed param rename, TS type matters.
- Go (6): function rename, function name matters, short var decl
rename, range_clause rename, if-init rename, func_literal closure
rename.
- Cross-language (1): the same shapes in different languages
produce different hashes (sanity check to avoid collisions).
141 tests passing in minga-core (115 before; +26 polyglot). The refactor
introduces no regression: the 36 α-Rust tests still pass.
Work remaining in Minga (in priority order):
- `minga-vfs` FUSE (independent project, large scope).
- Additional per-language coverage: Python class, JS destructuring,
Go type_switch, etc. Each one is small and not urgent.
### feat(minga-core): Rust α-hashing completed (if let, while let, let-else, or-pattern, let-chains)
Closes the 5 pending items documented in `alpha.rs`. The α-equivalent
hash is now stable under renaming of ALL binders
@@ -0,0 +1,105 @@
//! Primitives shared by all α-hashing profiles.
//!
//! Each per-language profile (rust, python, ecmascript, go) has its
//! own logic for "which nodes introduce binders" and "how to tell
//! binders from constructors". The hash wire format (TAG_LEAF,
//! TAG_BINDER, de Bruijn index) is universal, though: we emit it
//! from here to guarantee that two languages with the same semantic
//! structure produce hashes that are comparable at the bit level.
use crate::ast::SemanticNode;
use blake3::Hasher;
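// Wire tags. Each one is written to the hasher as a single byte.
pub const TAG_NO_LEAF: u8 = 0; // node carries no leaf text
pub const TAG_LEAF: u8 = 1; // node carries leaf text (length-prefixed bytes follow)
pub const TAG_BINDER: u8 = 2; // anonymous binder
pub const TAG_REF_BOUND: u8 = 3; // bound reference (de Bruijn index follows)
pub const TAG_REF_FREE: u8 = 4; // free reference (literal name follows)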
/// Emits the node kind plus the presence or absence of field_name.
pub fn write_kind_and_field(h: &mut Hasher, node: &SemanticNode) {
write_str(h, &node.kind);
match &node.field_name {
Some(f) => {
h.update(&[1]);
write_str(h, f);
}
None => {
h.update(&[0]);
}
}
}
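/// Writes `s` with a u64 little-endian length prefix, so consecutive
/// strings cannot run together in the hash input.
pub fn write_str(h: &mut Hasher, s: &str) {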
h.update(&(s.len() as u64).to_le_bytes());
h.update(s.as_bytes());
}
/// Emits the leaf marker: TAG_LEAF plus the leaf bytes if there is a
/// leaf, TAG_NO_LEAF otherwise.
pub fn emit_leaf_marker(h: &mut Hasher, node: &SemanticNode) {
match &node.leaf_text {
Some(t) => {
h.update(&[TAG_LEAF]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
None => {
h.update(&[TAG_NO_LEAF]);
}
}
}
/// Emits an anonymous binder: its textual content does NOT affect the
/// hash. This is the α-equivalence primitive: two terms that differ
/// only in the names of bound variables hash identically.
pub fn emit_binder_body(h: &mut Hasher) {
h.update(&[TAG_NO_LEAF]);
h.update(&[TAG_BINDER]);
h.update(&[0u8; 8]);
}
/// Emits the node kind plus the binder body. Shortcut for nodes whose
/// only role is to be a binder (e.g. an identifier in pattern position).
pub fn emit_binder_node(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
emit_binder_body(h);
}
/// Emits an identifier reference: if it is in scope, a de Bruijn
/// index (offset from the top of the scope); otherwise, the literal
/// name (free variable).
pub fn emit_identifier_ref(h: &mut Hasher, node: &SemanticNode, scope: &[String]) {
h.update(&[TAG_NO_LEAF]);
if let Some(t) = &node.leaf_text {
if let Ok(name) = std::str::from_utf8(t) {
if let Some(i) = scope.iter().rposition(|n| n == name) {
let de_bruijn = (scope.len() - 1 - i) as u64;
h.update(&[TAG_REF_BOUND]);
h.update(&de_bruijn.to_le_bytes());
} else {
h.update(&[TAG_REF_FREE]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
} else {
h.update(&[TAG_REF_FREE]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
} else {
h.update(&[TAG_REF_FREE]);
h.update(&[0u8; 8]);
}
h.update(&[0u8; 8]);
}
/// Pushes the identifier's name onto the binder vector if it has
/// valid UTF-8 leaf_text. Common helper for every `collect_binders`.
pub fn push_identifier_name(node: &SemanticNode, out: &mut Vec<String>) {
if let Some(t) = &node.leaf_text {
if let Ok(s) = std::str::from_utf8(t) {
out.push(s.to_string());
}
}
}
@@ -0,0 +1,365 @@
//! Per-language α-hashing for JavaScript / TypeScript.
//!
//! The two grammars share most node kinds (TypeScript is JS plus
//! type annotations), so a single profile covers both. The caller
//! (`hash_alpha_with`) dispatches both `Dialect::JavaScript` and
//! `Dialect::TypeScript` here.
//!
//! Coverage:
//! - **`function_declaration`**, **`function_expression`**,
//! **`method_definition`**, **`generator_function_declaration`**:
//! parameters introduce binders for the body.
//! - **`arrow_function`**: parameters (formal_parameters OR a bare
//! identifier for the shorthand `x => ...`) introduce binder(s) for the body.
//! - **`statement_block`**: any `lexical_declaration` (let/const)
//! or `variable_declaration` (var) inside the block introduces binders
//! for the rest of the block.
//! - **`for_in_statement`** (covers both `for (x in obj)` and
//! `for (x of arr)` in tree-sitter-javascript): the `left` is a
//! binder for the `body`.
//! - **`for_statement`**: the `initializer` (lexical_declaration)
//! introduces binder(s) for the `condition`, `increment` and `body`.
//! - **`catch_clause`**: the `parameter` introduces a binder for the `body`.
//!
//! TypeScript-specific: `type` annotations (`x: number`) travel as
//! children with field=type and are fed through the normal path, so
//! the type affects the hash (changing `number` to `string` breaks
//! α-equivalence, intentionally).
//!
//! Pending (bounded scope):
//! - Destructuring (`const {a, b} = obj`, `const [x, y] = arr`).
//! - Class fields and constructors with `this.x = ...`.
//! - Hoisting of `var` to function scope (currently treated as block-scoped).
use crate::alpha::common::{
emit_binder_body, emit_identifier_ref, emit_leaf_marker, push_identifier_name,
write_kind_and_field, TAG_NO_LEAF,
};
use crate::ast::SemanticNode;
use crate::cas::ContentHash;
use blake3::Hasher;
pub fn hash_node_alpha_ecmascript(node: &SemanticNode) -> ContentHash {
let mut h = Hasher::new();
let mut scope: Vec<String> = Vec::new();
feed(&mut h, node, &mut scope);
ContentHash(*h.finalize().as_bytes())
}
fn feed(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
match node.kind.as_str() {
"function_declaration"
| "function_expression"
| "generator_function_declaration"
| "generator_function"
| "method_definition" => feed_callable(h, node, scope),
"arrow_function" => feed_arrow(h, node, scope),
"statement_block" => feed_block(h, node, scope),
"for_in_statement" => feed_for_in(h, node, scope),
"for_statement" => feed_for(h, node, scope),
"catch_clause" => feed_catch(h, node, scope),
// Lexical declarations are also dispatched from the general
// feed, not only from feed_block. This is needed for
// for_statement (initializer) and other contexts where a
// declaration appears without being a direct child of a block.
"lexical_declaration" | "variable_declaration" => feed_var_decl(h, node, scope),
"identifier" => emit_identifier_ref(h, node, scope),
_ => feed_default(h, node, scope),
}
}
fn feed_default(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
emit_leaf_marker(h, node);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
feed(h, c, scope);
}
}
/// Standard callable: parameters → body.
fn feed_callable(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("parameters") {
collect_formal_param_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameters") => feed_formal_params(h, c, scope),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
/// Arrow function: two forms. `x => body` (single identifier) or
/// `(x, y) => body` (formal_parameters). We detect which one applies.
fn feed_arrow(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
match c.field_name.as_deref() {
Some("parameter") => {
// `x => ...`: the identifier on its own.
if c.kind == "identifier" {
push_identifier_name(c, &mut binders);
}
}
Some("parameters") => {
collect_formal_param_binders(c, &mut binders);
}
_ => {}
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameter") => emit_arrow_single_binder(h, c),
Some("parameters") => feed_formal_params(h, c, scope),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
fn emit_arrow_single_binder(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
if node.kind == "identifier" {
emit_binder_body(h);
} else {
// Any other form (rare): fall back to the normal feed without a binder.
emit_leaf_marker(h, node);
h.update(&(node.children.len() as u64).to_le_bytes());
}
}
/// Statement block: `let`/`const`/`var` declarations introduce
/// binders for the rest of the block (lexical scope).
fn feed_block(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let scope_before = scope.len();
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.kind.as_str() {
"lexical_declaration" | "variable_declaration" => {
feed_var_decl(h, c, scope);
collect_var_decl_binders(c, scope);
}
_ => feed(h, c, scope),
}
}
scope.truncate(scope_before);
}
/// Processes a let/const/var declaration: the `value` is evaluated in
/// the previous scope (the binders do not yet exist for themselves);
/// the `name` is emitted as an anonymous binder.
fn feed_var_decl(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "variable_declarator" {
feed_declarator(h, c, scope);
} else {
feed(h, c, scope);
}
}
}
fn feed_declarator(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("name") if c.kind == "identifier" => emit_named_binder(h, c),
_ => feed(h, c, scope),
}
}
}
fn collect_var_decl_binders(node: &SemanticNode, out: &mut Vec<String>) {
for c in &node.children {
if c.kind == "variable_declarator" {
for cc in &c.children {
if cc.field_name.as_deref() == Some("name") && cc.kind == "identifier" {
push_identifier_name(cc, out);
}
}
}
}
}
/// `for (x of arr)` or `for (x in obj)`. left = identifier (possibly
/// with a kind=const/let prefix for a lexical declaration),
/// right = expr, body = block.
fn feed_for_in(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("left") && c.kind == "identifier" {
push_identifier_name(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("left") if c.kind == "identifier" => emit_named_binder(h, c),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
/// `for (let i = 0; i < n; i++) { body }`. The initializer (lexical
/// decl) introduces binders visible in condition, increment and body.
fn feed_for(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("initializer")
&& (c.kind == "lexical_declaration" || c.kind == "variable_declaration")
{
collect_var_decl_binders(c, &mut binders);
}
}
let scope_before = scope.len();
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("initializer") => {
feed(h, c, scope);
// After processing the initializer we extend the scope so
// that condition/increment/body can see it.
scope.extend(binders.iter().cloned());
}
_ => feed(h, c, scope),
}
}
scope.truncate(scope_before);
}
/// `catch (e) { body }`. The parameter is an identifier → binder for the body.
fn feed_catch(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("parameter") && c.kind == "identifier" {
push_identifier_name(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameter") if c.kind == "identifier" => emit_named_binder(h, c),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
/// formal_parameters of function declarations. Supports:
/// - `identifier` (plain parameter).
/// - `required_parameter` (TypeScript: `x: number`).
/// - `optional_parameter` (TypeScript: `x?: number`).
/// - `rest_pattern` / `rest_parameter` (`...rest`).
fn feed_formal_params(h: &mut Hasher, params: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, params);
h.update(&[TAG_NO_LEAF]);
h.update(&(params.children.len() as u64).to_le_bytes());
for c in &params.children {
match c.kind.as_str() {
"identifier" => emit_named_binder(h, c),
"required_parameter" | "optional_parameter" => {
feed_typed_param(h, c, scope);
}
"rest_pattern" | "rest_parameter" => {
feed_rest_param(h, c, scope);
}
_ => feed(h, c, scope),
}
}
}
fn feed_typed_param(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
let mut named_binder = false;
for c in &node.children {
if !named_binder && c.kind == "identifier" {
emit_named_binder(h, c);
named_binder = true;
} else {
feed(h, c, scope);
}
}
}
fn feed_rest_param(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "identifier" {
emit_named_binder(h, c);
} else {
feed(h, c, scope);
}
}
}
fn collect_formal_param_binders(params: &SemanticNode, out: &mut Vec<String>) {
for c in &params.children {
match c.kind.as_str() {
"identifier" => push_identifier_name(c, out),
"required_parameter" | "optional_parameter" | "rest_pattern" | "rest_parameter" => {
if let Some(ident) = c.children.iter().find(|cc| cc.kind == "identifier") {
push_identifier_name(ident, out);
}
}
_ => {}
}
}
}
fn emit_named_binder(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
emit_binder_body(h);
}
@@ -0,0 +1,283 @@
//! Per-language α-hashing for Go.
//!
//! Coverage:
//! - **`function_declaration`**, **`method_declaration`**,
//! **`func_literal`** (closure): `parameter_list` introduces
//! binder(s) for the `body`.
//! - **`parameter_declaration`**: may group several names under one
//! type (`a, b int`). Each `name` is a binder; `type` travels as a
//! reference.
//! - **`block`**: `short_var_declaration` (`x := ...`) introduces
//! binders for the rest of the block.
//! - **`for_statement`** with **`range_clause`** (`for k, v := range m`):
//! the identifiers in `left` are binders for the `body`.
//! - **`for_statement`** with **`for_clause`** (C-style `for i := 0; i < n; i++`):
//! the `initializer` (short_var_declaration) introduces binders for
//! condition, update and body.
//! - **`if_statement`** with **`initializer`**: the binders of the
//! short_var_declaration are visible in condition, consequence and alternative.
//!
//! Pending (bounded scope):
//! - `var_declaration` (`var x = ...`) is treated as a literal for
//! now; it introduces a binder into the enclosing scope just like
//! short_var_declaration, but under a different kind.
//! - `type_switch_statement` with assertion binding.
//! - `select` statements with send/receive binding.
use crate::alpha::common::{
emit_binder_body, emit_identifier_ref, emit_leaf_marker, push_identifier_name,
write_kind_and_field, TAG_NO_LEAF,
};
use crate::ast::SemanticNode;
use crate::cas::ContentHash;
use blake3::Hasher;
pub fn hash_node_alpha_go(node: &SemanticNode) -> ContentHash {
let mut h = Hasher::new();
let mut scope: Vec<String> = Vec::new();
feed(&mut h, node, &mut scope);
ContentHash(*h.finalize().as_bytes())
}
fn feed(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
match node.kind.as_str() {
"function_declaration" | "method_declaration" | "func_literal" => {
feed_callable(h, node, scope)
}
"block" => feed_block(h, node, scope),
"for_statement" => feed_for_statement(h, node, scope),
"if_statement" => feed_if_statement(h, node, scope),
// Also dispatched outside block/for/if so that their
// identifiers are emitted as binders when they appear in
// contexts such as range_clause or the initializer of if/for.
"short_var_declaration" => feed_short_var_decl(h, node, scope),
"range_clause" => feed_range_clause(h, node, scope),
"identifier" => emit_identifier_ref(h, node, scope),
_ => feed_default(h, node, scope),
}
}
/// `for k, v := range m`: the `left` (expression_list) holds
/// identifiers that are binders. The `right` is evaluated as a normal
/// reference (it is the iteration source).
fn feed_range_clause(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.field_name.as_deref() == Some("left") {
feed_short_var_left(h, c);
} else {
feed(h, c, scope);
}
}
}
fn feed_default(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
emit_leaf_marker(h, node);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
feed(h, c, scope);
}
}
fn feed_callable(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("parameters") {
collect_parameter_list_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameters") => feed_parameter_list(h, c, scope),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
fn feed_parameter_list(h: &mut Hasher, params: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, params);
h.update(&[TAG_NO_LEAF]);
h.update(&(params.children.len() as u64).to_le_bytes());
for c in &params.children {
if c.kind == "parameter_declaration" {
feed_parameter_declaration(h, c, scope);
} else {
feed(h, c, scope);
}
}
}
/// `a, b int`: every `name=identifier` is a binder; `type` travels
/// as a normal reference (it may mention imported types).
fn feed_parameter_declaration(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.field_name.as_deref() == Some("name") && c.kind == "identifier" {
emit_named_binder(h, c);
} else {
feed(h, c, scope);
}
}
}
fn collect_parameter_list_binders(params: &SemanticNode, out: &mut Vec<String>) {
for c in &params.children {
if c.kind == "parameter_declaration" {
for cc in &c.children {
if cc.field_name.as_deref() == Some("name") && cc.kind == "identifier" {
push_identifier_name(cc, out);
}
}
}
}
}
/// Block: `short_var_declaration` introduces binders for the rest of the block.
fn feed_block(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let scope_before = scope.len();
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "short_var_declaration" {
feed_short_var_decl(h, c, scope);
collect_short_var_binders(c, scope);
} else {
feed(h, c, scope);
}
}
scope.truncate(scope_before);
}
fn feed_short_var_decl(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.field_name.as_deref() == Some("left") {
feed_short_var_left(h, c);
} else {
feed(h, c, scope);
}
}
}
fn feed_short_var_left(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "identifier" {
emit_named_binder(h, c);
} else {
// ',' separators and other tokens: emitted literally.
emit_leaf_marker(h, c);
h.update(&(c.children.len() as u64).to_le_bytes());
}
}
}
fn collect_short_var_binders(node: &SemanticNode, out: &mut Vec<String>) {
for c in &node.children {
if c.field_name.as_deref() == Some("left") {
for cc in &c.children {
if cc.kind == "identifier" {
push_identifier_name(cc, out);
}
}
}
}
}
/// `for k, v := range m { body }` or `for i := 0; i < n; i++ { body }`.
fn feed_for_statement(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
match c.kind.as_str() {
"range_clause" => {
for cc in &c.children {
if cc.field_name.as_deref() == Some("left") {
for ccc in &cc.children {
if ccc.kind == "identifier" {
push_identifier_name(ccc, &mut binders);
}
}
}
}
}
"for_clause" => {
for cc in &c.children {
if cc.field_name.as_deref() == Some("initializer")
&& cc.kind == "short_var_declaration"
{
collect_short_var_binders(cc, &mut binders);
}
}
}
_ => {}
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
/// `if x := init(); cond { ... } else { ... }`. The initializer
/// introduces binders that are visible in condition, consequence and
/// alternative.
fn feed_if_statement(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("initializer")
&& c.kind == "short_var_declaration"
{
collect_short_var_binders(c, &mut binders);
}
}
let scope_before = scope.len();
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("initializer") => {
feed(h, c, scope);
scope.extend(binders.iter().cloned());
}
_ => feed(h, c, scope),
}
}
scope.truncate(scope_before);
}
fn emit_named_binder(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
emit_binder_body(h);
}
@@ -0,0 +1,43 @@
//! Per-language α-equivalent hash.
//!
//! Each dialect supported by [`crate::parse`] has its own profile in
//! this module. All of them share the wire primitives in [`common`]
//! to guarantee bit-for-bit comparability of the hash across
//! languages with the same semantic structure.
//!
//! ## API
//!
//! - [`hash_node_alpha`]: historical alias. Assumes Rust. Kept for
//! compatibility with older callers (`alpha::hash_node_alpha` still
//! points to Rust).
//! - [`hash_alpha_with`]: takes a [`crate::parse::Dialect`] and delegates
//! to the corresponding profile.
pub mod common;
pub mod ecmascript;
pub mod go;
pub mod python;
pub mod rust;
pub use rust::hash_node_alpha;
use crate::ast::SemanticNode;
use crate::cas::ContentHash;
use crate::parse::Dialect;
/// Computes the α-equivalent hash of `node` using the profile for
/// `dialect`. Each profile understands the binders of its own
/// language (def/lambda/comprehensions in Python, function/arrow in
/// JS/TS, func/range in Go, etc.).
///
/// For callers that already know they are in Rust, [`hash_node_alpha`]
/// is an equivalent shortcut.
pub fn hash_alpha_with(dialect: Dialect, node: &SemanticNode) -> ContentHash {
match dialect {
Dialect::Rust => rust::hash_node_alpha(node),
Dialect::Python => python::hash_node_alpha_python(node),
Dialect::TypeScript => ecmascript::hash_node_alpha_ecmascript(node),
Dialect::JavaScript => ecmascript::hash_node_alpha_ecmascript(node),
Dialect::Go => go::hash_node_alpha_go(node),
}
}
@@ -0,0 +1,387 @@
//! Per-language α-hashing for Python.
//!
//! Coverage:
//! - **`function_definition`** and **`lambda`**: parameters introduce
//! binders for the body. Supports defaults (`def f(x=1)`) and type
//! hints (`def f(x: int)`): the binder is the identifier; the default
//! and the type travel as expressions resolved against the previous scope.
//! - **`for_statement`**: the `left` (identifier or tuple_pattern)
//! introduces binder(s) for the `body`.
//! - **Comprehensions**: `list_comprehension`, `set_comprehension`,
//! `dictionary_comprehension`, `generator_expression`. Each
//! `for_in_clause` introduces binder(s) that are visible in the `body`,
//! in `if_clause`s and in the following `for_in_clause`s (Python's
//! incremental scope semantics).
//! - **`with_statement`**: `with X() as y:` introduces `y` for the body.
//!
//! Python does NOT distinguish binders by capitalization (unlike
//! Rust with `Some` vs `x`). In parameter or for-target position,
//! every identifier is a binder.
//!
//! Pending (not covered today, bounded scope):
//! - `class_definition` and methods (`self` is not an explicit binder
//! in the signature; the first parameter can have any name).
//! - `assignment` as a scope introducer (Python has no explicit
//! `let`; an `x = 1` adds x to the global scope or to the local
//! scope of the enclosing block, and handling that properly requires
//! scope analysis that goes beyond traditional α-hashing).
//! - Nested defaults, walrus operator (`:=`), starred patterns.
use crate::alpha::common::{
emit_binder_body, emit_identifier_ref, emit_leaf_marker, push_identifier_name,
write_kind_and_field, TAG_NO_LEAF,
};
use crate::ast::SemanticNode;
use crate::cas::ContentHash;
use blake3::Hasher;
pub fn hash_node_alpha_python(node: &SemanticNode) -> ContentHash {
let mut h = Hasher::new();
let mut scope: Vec<String> = Vec::new();
feed(&mut h, node, &mut scope);
ContentHash(*h.finalize().as_bytes())
}
fn feed(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
match node.kind.as_str() {
"function_definition" => feed_function_definition(h, node, scope),
"lambda" => feed_lambda(h, node, scope),
"for_statement" => feed_for_statement(h, node, scope),
"list_comprehension"
| "set_comprehension"
| "dictionary_comprehension"
| "generator_expression" => feed_comprehension(h, node, scope),
"with_statement" => feed_with_statement(h, node, scope),
// When an as_pattern_target appears (typically inside a
// with_clause), its identifiers are binders. The scope is
// extended in feed_with_statement before the body is reached,
// but the target itself still has to emit anonymous binders so
// that the hash does not vary with the name.
"as_pattern_target" => feed_target_as_binders(h, node),
"identifier" => emit_identifier_ref(h, node, scope),
_ => feed_default(h, node, scope),
}
}
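// Sketch (not part of minga-core): how a scope stack turns bound names into
// de Bruijn indices, which is the mechanism `emit_identifier_ref` relies on
// for rename invariance. The `Term` type, the `TAG_APP` value, the helper
// constructors and the `Vec<u8>` sink (used here instead of a blake3
// `Hasher`) are assumptions made only to keep the example self-contained.

```rust
// Toy term type standing in for SemanticNode (hypothetical).
enum Term {
    Var(String),
    Lam(String, Box<Term>),
    App(Box<Term>, Box<Term>),
}

const TAG_BINDER: u8 = 2;
const TAG_REF_BOUND: u8 = 3;
const TAG_REF_FREE: u8 = 4;
const TAG_APP: u8 = 5; // hypothetical, exists only in this sketch

fn encode(t: &Term, scope: &mut Vec<String>, out: &mut Vec<u8>) {
    match t {
        Term::Var(name) => match scope.iter().rposition(|n| n == name) {
            // Bound: emit the distance to the innermost binder, never the name.
            Some(i) => {
                let de_bruijn = (scope.len() - 1 - i) as u64;
                out.push(TAG_REF_BOUND);
                out.extend_from_slice(&de_bruijn.to_le_bytes());
            }
            // Free: the name itself is part of the encoding.
            None => {
                out.push(TAG_REF_FREE);
                out.extend_from_slice(&(name.len() as u64).to_le_bytes());
                out.extend_from_slice(name.as_bytes());
            }
        },
        Term::Lam(param, body) => {
            out.push(TAG_BINDER); // anonymous binder: the name is dropped
            scope.push(param.clone());
            encode(body, scope, out);
            scope.pop();
        }
        Term::App(f, a) => {
            out.push(TAG_APP);
            encode(f, scope, out);
            encode(a, scope, out);
        }
    }
}

fn encode_closed(t: &Term) -> Vec<u8> {
    let mut out = Vec::new();
    encode(t, &mut Vec::new(), &mut out);
    out
}

fn var(n: &str) -> Term { Term::Var(n.to_string()) }
fn lam(p: &str, b: Term) -> Term { Term::Lam(p.to_string(), Box::new(b)) }
fn app(f: Term, a: Term) -> Term { Term::App(Box::new(f), Box::new(a)) }
```

// Renaming the parameter of `lam` leaves the bytes unchanged, while renaming
// a free variable changes them.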
fn feed_default(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
emit_leaf_marker(h, node);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
feed(h, c, scope);
}
}
/// `def f(x, y: int, z=1): body` → params are binders for the body.
/// The `name` (the function's identifier) is treated as a literal: it
/// is not a local binder (it is published to the enclosing scope,
/// which is not handled here).
fn feed_function_definition(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("parameters") {
collect_param_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameters") => feed_params(h, c, scope),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
Some("name") => {
// Function name: hashed as a literal (it affects the
// hash; it is not α-anonymous). Same treatment as
// `function_item.name` in Rust.
feed_as_literal(h, c);
}
_ => feed(h, c, scope),
}
}
}
/// `lambda x, y: body`: params are binders for the body.
fn feed_lambda(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("parameters") {
collect_param_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("parameters") => feed_params(h, c, scope),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
/// `for x in iterable: body`: x is a binder for the body.
fn feed_for_statement(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.field_name.as_deref() == Some("left") {
collect_target_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("left") => feed_target_as_binders(h, c),
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
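// Sketch of one possible way to factor out the
// `scope_before` / `extend` / `truncate` sequence repeated in the functions
// above. `with_binders` is a hypothetical helper; it is not part of
// alpha/common.rs, and it is not panic-safe (the scope is not restored if
// the closure panics).

```rust
/// Runs `f` with `binders` pushed onto `scope`, then restores the scope to
/// its previous length.
fn with_binders<R>(
    scope: &mut Vec<String>,
    binders: &[String],
    f: impl FnOnce(&mut Vec<String>) -> R,
) -> R {
    let scope_before = scope.len();
    scope.extend(binders.iter().cloned());
    let result = f(scope);
    scope.truncate(scope_before);
    result
}
```

// Under that assumption, a `Some("body")` arm would reduce to something like
// `with_binders(scope, &binders, |s| feed(h, c, s))`.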
/// `[expr for x in xs if cond]`: the `for_in_clause`s and `if_clause`s
/// are hashed in order. The binders of every `for_in_clause` are
/// collected up front, so the `body` (the final expression) sees ALL
/// of them.
fn feed_comprehension(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
// Collect ALL binders from ALL for_in_clauses. Python evaluates
// the comprehension left to right, but the body sees everything;
// α-hashing collapses that to "all visible in the body".
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.kind == "for_in_clause" {
for cc in &c.children {
if cc.field_name.as_deref() == Some("left") {
collect_target_binders(cc, &mut binders);
}
}
}
}
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "for_in_clause" {
feed_for_in_clause(h, c, scope);
} else {
feed(h, c, scope);
}
}
scope.truncate(scope_before);
}
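/// `for x in xs` inside a comprehension. The `left` is an (anonymous)
/// binder. In Python the `right` is evaluated in the previous scope
/// (without x), but `feed_comprehension` has already extended the
/// scope with every binder before calling us. That makes x visible to
/// the `right` of a later `for X in expr`, as Python requires, and also
/// to its own `right`, which is an approximation: in `[x for x in x]`
/// the iterable `x` is hashed as bound although Python treats it as free.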
fn feed_for_in_clause(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.field_name.as_deref() == Some("left") {
feed_target_as_binders(h, c);
} else {
feed(h, c, scope);
}
}
}
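// Sketch of the clause-by-clause scoping that Python itself applies to
// comprehensions, for comparison with the "collect everything up front"
// approach used by `feed_comprehension`. The `Clause` type and both
// functions are hypothetical; names are resolved to `Some(de Bruijn index)`
// when bound and `None` when free.

```rust
// Simplified comprehension clause: only the names it binds or references.
enum Clause {
    For { targets: Vec<&'static str>, iterable: Vec<&'static str> },
    If(Vec<&'static str>),
}

fn lookup(scope: &[&str], name: &str) -> Option<usize> {
    scope.iter().rposition(|n| *n == name).map(|i| scope.len() - 1 - i)
}

/// Resolves every referenced name, in clause order and then the body.
fn resolve_comprehension(clauses: &[Clause], body: &[&'static str]) -> Vec<Option<usize>> {
    let mut scope: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for clause in clauses {
        match clause {
            Clause::For { targets, iterable } => {
                // The iterable is resolved BEFORE its own targets are bound.
                for &name in iterable {
                    out.push(lookup(&scope, name));
                }
                scope.extend(targets.iter().copied());
            }
            Clause::If(names) => {
                for &name in names {
                    out.push(lookup(&scope, name));
                }
            }
        }
    }
    // The body sees every binder accumulated so far.
    for &name in body {
        out.push(lookup(&scope, name));
    }
    out
}
```

// With this ordering, the iterable in `[x for x in x]` resolves as free
// while the body `x` resolves as bound.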
/// `with X() as y, Z() as w: body`: the `as` targets introduce binders into the body.
fn feed_with_statement(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
let mut binders: Vec<String> = Vec::new();
for c in &node.children {
if c.kind == "with_clause" {
collect_with_clause_binders(c, &mut binders);
}
}
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
match c.field_name.as_deref() {
Some("body") => {
let scope_before = scope.len();
scope.extend(binders.iter().cloned());
feed(h, c, scope);
scope.truncate(scope_before);
}
_ => feed(h, c, scope),
}
}
}
fn collect_with_clause_binders(node: &SemanticNode, out: &mut Vec<String>) {
// In tree-sitter-python, with_item.value can be an as_pattern
// that carries its own alias. Recurse to find any
// as_pattern_target in the subtree.
for c in &node.children {
if c.kind == "with_item" {
collect_as_pattern_targets(c, out);
}
}
}
fn collect_as_pattern_targets(node: &SemanticNode, out: &mut Vec<String>) {
if node.kind == "as_pattern_target" {
collect_target_binders(node, out);
return;
}
for c in &node.children {
collect_as_pattern_targets(c, out);
}
}
/// The parameters of def/lambda are processed by emitting each
/// identifier as an anonymous binder. Defaults, type hints and the
/// `*args` / `**kwargs` splat structure are preserved (they affect the hash).
fn feed_params(h: &mut Hasher, params: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, params);
h.update(&[TAG_NO_LEAF]);
h.update(&(params.children.len() as u64).to_le_bytes());
for c in &params.children {
match c.kind.as_str() {
"identifier" => emit_param_binder(h, c),
"typed_parameter" | "default_parameter" | "typed_default_parameter" => {
feed_complex_param(h, c, scope);
}
"list_splat_pattern" | "dictionary_splat_pattern" => {
// *args, **kwargs: the binder is the inner identifier.
feed_splat_param(h, c);
}
_ => feed(h, c, scope),
}
}
}
fn emit_param_binder(h: &mut Hasher, ident: &SemanticNode) {
write_kind_and_field(h, ident);
emit_binder_body(h);
}
/// `x: int`, `x = 1`, `x: int = 1`: the first identifier is the binder;
/// the rest (type, default) are hashed as references.
fn feed_complex_param(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
let mut named_binder = false;
for c in &node.children {
if !named_binder && c.kind == "identifier" {
emit_param_binder(h, c);
named_binder = true;
} else {
feed(h, c, scope);
}
}
}
fn feed_splat_param(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
h.update(&[TAG_NO_LEAF]);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
if c.kind == "identifier" {
emit_param_binder(h, c);
} else {
feed_as_literal(h, c);
}
}
}
fn collect_param_binders(params: &SemanticNode, out: &mut Vec<String>) {
for c in &params.children {
match c.kind.as_str() {
"identifier" => push_identifier_name(c, out),
"typed_parameter" | "default_parameter" | "typed_default_parameter" => {
if let Some(ident) = c.children.iter().find(|cc| cc.kind == "identifier") {
push_identifier_name(ident, out);
}
}
"list_splat_pattern" | "dictionary_splat_pattern" => {
if let Some(ident) = c.children.iter().find(|cc| cc.kind == "identifier") {
push_identifier_name(ident, out);
}
}
_ => {}
}
}
}
/// The `left` of `for x in xs:` or the target of `with X as y:` can be
/// a single identifier or a destructured tuple (`for k, v in ...`).
fn collect_target_binders(target: &SemanticNode, out: &mut Vec<String>) {
match target.kind.as_str() {
"identifier" => push_identifier_name(target, out),
"tuple_pattern" | "pattern_list" | "list_pattern" => {
for c in &target.children {
collect_target_binders(c, out);
}
}
_ => {
// Recurse in case there are relevant subnodes (e.g. parens).
for c in &target.children {
collect_target_binders(c, out);
}
}
}
}
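// Sketch showing why the traversal order of the collector matters: binders
// are pushed in source order, and that order is what later determines each
// name's de Bruijn index. The `Node` type and its constructors are
// hypothetical stand-ins for `SemanticNode`.

```rust
// Minimal node: a kind, optional leaf text, and children.
struct Node {
    kind: &'static str,
    text: Option<&'static str>,
    children: Vec<Node>,
}

fn ident(name: &'static str) -> Node {
    Node { kind: "identifier", text: Some(name), children: vec![] }
}

fn tuple(children: Vec<Node>) -> Node {
    Node { kind: "tuple_pattern", text: None, children }
}

/// Depth-first, left-to-right collection of identifier names.
fn collect_binders(target: &Node, out: &mut Vec<String>) {
    if target.kind == "identifier" {
        if let Some(t) = target.text {
            out.push(t.to_string());
        }
        return;
    }
    for c in &target.children {
        collect_binders(c, out);
    }
}
```

// For a target shaped like `k, (v, w)` the collected binders are `k`, `v`,
// `w`, in that order.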
/// Emits the target as anonymous binders. Same traversal as collect.
fn feed_target_as_binders(h: &mut Hasher, target: &SemanticNode) {
write_kind_and_field(h, target);
match target.kind.as_str() {
"identifier" => emit_binder_body(h),
"tuple_pattern" | "pattern_list" | "list_pattern" => {
h.update(&[TAG_NO_LEAF]);
h.update(&(target.children.len() as u64).to_le_bytes());
for c in &target.children {
feed_target_as_binders(h, c);
}
}
_ => {
// Fallback: literal (preserves the textual structure).
emit_leaf_marker(h, target);
h.update(&(target.children.len() as u64).to_le_bytes());
for c in &target.children {
feed_target_as_binders(h, c);
}
}
}
}
fn feed_as_literal(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
emit_leaf_marker(h, node);
h.update(&(node.children.len() as u64).to_le_bytes());
for c in &node.children {
feed_as_literal(h, c);
}
}
@@ -42,16 +42,14 @@
//! enforcement); we collect only from the first alternative to
//! avoid duplicates, and emit feed_pattern for each one.
use crate::alpha::common::{
emit_binder_body, emit_binder_node, emit_identifier_ref, emit_leaf_marker,
push_identifier_name, write_kind_and_field, TAG_NO_LEAF,
};
use crate::ast::SemanticNode;
use crate::cas::ContentHash;
use blake3::Hasher;
const TAG_NO_LEAF: u8 = 0;
const TAG_LEAF: u8 = 1;
const TAG_BINDER: u8 = 2;
const TAG_REF_BOUND: u8 = 3;
const TAG_REF_FREE: u8 = 4;
pub fn hash_node_alpha(node: &SemanticNode) -> ContentHash {
let mut h = Hasher::new();
let mut scope: Vec<String> = Vec::new();
@@ -171,55 +169,6 @@ fn feed_default(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
}
}
fn emit_identifier_ref(h: &mut Hasher, node: &SemanticNode, scope: &Vec<String>) {
h.update(&[TAG_NO_LEAF]);
if let Some(t) = &node.leaf_text {
if let Ok(name) = std::str::from_utf8(t) {
if let Some(i) = scope.iter().rposition(|n| n == name) {
let de_bruijn = (scope.len() - 1 - i) as u64;
h.update(&[TAG_REF_BOUND]);
h.update(&de_bruijn.to_le_bytes());
} else {
h.update(&[TAG_REF_FREE]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
} else {
h.update(&[TAG_REF_FREE]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
} else {
h.update(&[TAG_REF_FREE]);
h.update(&[0u8; 8]);
}
h.update(&[0u8; 8]);
}
fn emit_binder_body(h: &mut Hasher) {
h.update(&[TAG_NO_LEAF]);
h.update(&[TAG_BINDER]);
h.update(&[0u8; 8]);
}
fn emit_binder_node(h: &mut Hasher, node: &SemanticNode) {
write_kind_and_field(h, node);
emit_binder_body(h);
}
fn emit_leaf_marker(h: &mut Hasher, node: &SemanticNode) {
match &node.leaf_text {
Some(t) => {
h.update(&[TAG_LEAF]);
h.update(&(t.len() as u64).to_le_bytes());
h.update(t);
}
None => {
h.update(&[TAG_NO_LEAF]);
}
}
}
fn feed_callable(h: &mut Hasher, node: &SemanticNode, scope: &mut Vec<String>) {
h.update(&[TAG_NO_LEAF]);
@@ -585,16 +534,8 @@ fn collect_field_pattern_binders(fp: &SemanticNode, out: &mut Vec<String>) {
}
}
fn push_identifier_name(node: &SemanticNode, out: &mut Vec<String>) {
if let Some(t) = &node.leaf_text {
if let Ok(s) = std::str::from_utf8(t) {
out.push(s.to_string());
}
}
}
/// Determines whether an `identifier` in pattern position is interpreted as a
/// binder. Rules (Rust-specific):
/// - If it has `field_name == "pattern"` (parameters, lets), it is always a binder.
/// - If its name starts with a lowercase letter, it is a binder.
/// - If it starts with `_` followed by a letter/digit, it is a binder (convention
@@ -619,21 +560,3 @@ fn is_binder_name(s: &str) -> bool {
None => false,
}
}
fn write_kind_and_field(h: &mut Hasher, node: &SemanticNode) {
write_str(h, &node.kind);
match &node.field_name {
Some(f) => {
h.update(&[1]);
write_str(h, f);
}
None => {
h.update(&[0]);
}
}
}
fn write_str(h: &mut Hasher, s: &str) {
h.update(&(s.len() as u64).to_le_bytes());
h.update(s.as_bytes());
}
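// Sketch of why `write_str` prefixes each string with its length before
// feeding it to the hasher: without the prefix, different sequences of
// strings can produce the same byte stream. The `Vec<u8>` sink and the
// `encode_*` helpers are assumptions for illustration; the real code feeds a
// blake3 `Hasher`.

```rust
/// Same layout as `write_str`: u64 little-endian length, then the bytes.
fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u64).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Length-prefixed encoding of a sequence of strings.
fn encode_prefixed(parts: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for p in parts {
        write_str(&mut out, p);
    }
    out
}

/// Naive encoding: plain concatenation, boundaries are lost.
fn encode_concat(parts: &[&str]) -> Vec<u8> {
    parts.concat().into_bytes()
}
```

// `["ab", "c"]` and `["a", "bc"]` collide under plain concatenation but stay
// distinct once each part carries its length.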
@@ -0,0 +1,307 @@
//! α-equivalence for Python, TypeScript, JavaScript, Go.
//!
//! Same properties as `alpha_invariants.rs` for Rust:
//! - Renaming bound variables → same hash.
//! - Changing structure / free names → different hash.
use minga_core::alpha::hash_alpha_with;
use minga_core::parse::Dialect;
fn h(d: Dialect, src: &str) -> minga_core::cas::ContentHash {
let n = d.parse(src).expect("parse OK");
hash_alpha_with(d, &n)
}
// ============================================================================
// Python
// ============================================================================
#[test]
fn python_def_param_rename_invariant() {
let a = h(Dialect::Python, "def f(x):\n return x + 1\n");
let b = h(Dialect::Python, "def f(y):\n return y + 1\n");
assert_eq!(a, b);
}
#[test]
fn python_def_function_name_matters() {
let a = h(Dialect::Python, "def f(x):\n return x\n");
let b = h(Dialect::Python, "def g(x):\n return x\n");
assert_ne!(a, b, "the function name is NOT α-anonymous");
}
#[test]
fn python_lambda_rename_invariant() {
let a = h(Dialect::Python, "f = lambda x: x + 1\n");
let b = h(Dialect::Python, "f = lambda y: y + 1\n");
assert_eq!(a, b);
}
#[test]
fn python_for_loop_rename_invariant() {
let a = h(
Dialect::Python,
"for x in xs:\n print(x)\n",
);
let b = h(
Dialect::Python,
"for y in xs:\n print(y)\n",
);
assert_eq!(a, b);
}
#[test]
fn python_for_iterable_name_matters() {
let a = h(
Dialect::Python,
"for x in xs:\n print(x)\n",
);
let b = h(
Dialect::Python,
"for x in ys:\n print(x)\n",
);
assert_ne!(a, b, "the iterable is a free variable, so its name matters");
}
#[test]
fn python_list_comprehension_rename_invariant() {
let a = h(Dialect::Python, "result = [x*2 for x in xs]\n");
let b = h(Dialect::Python, "result = [y*2 for y in xs]\n");
assert_eq!(a, b);
}
#[test]
fn python_nested_comprehension_rename_invariant() {
// Two for_in_clauses: x and y are binders.
let a = h(
Dialect::Python,
"result = [(x, y) for x in xs for y in ys]\n",
);
let b = h(
Dialect::Python,
"result = [(a, b) for a in xs for b in ys]\n",
);
assert_eq!(a, b);
}
#[test]
fn python_with_statement_rename_invariant() {
let a = h(
Dialect::Python,
"with open(p) as f:\n f.read()\n",
);
let b = h(
Dialect::Python,
"with open(p) as g:\n g.read()\n",
);
assert_eq!(a, b);
}
#[test]
fn python_lambda_does_not_collide_with_unrelated() {
let plus = h(Dialect::Python, "f = lambda x: x + 1\n");
let minus = h(Dialect::Python, "f = lambda x: x - 1\n");
assert_ne!(plus, minus, "a different operation must produce a different hash");
}
// ============================================================================
// JavaScript / TypeScript (same profile)
// ============================================================================
#[test]
fn js_function_rename_invariant() {
let a = h(Dialect::JavaScript, "function f(x) { return x + 1; }");
let b = h(Dialect::JavaScript, "function f(y) { return y + 1; }");
assert_eq!(a, b);
}
#[test]
fn js_function_name_matters() {
let a = h(Dialect::JavaScript, "function f(x) { return x; }");
let b = h(Dialect::JavaScript, "function g(x) { return x; }");
assert_ne!(a, b);
}
#[test]
fn js_arrow_function_rename_invariant() {
let a = h(Dialect::JavaScript, "const f = (x) => x + 1;");
let b = h(Dialect::JavaScript, "const f = (y) => y + 1;");
assert_eq!(a, b);
}
#[test]
fn js_arrow_shorthand_rename_invariant() {
// `x => ...` (no parentheses): single identifier.
let a = h(Dialect::JavaScript, "const f = x => x + 1;");
let b = h(Dialect::JavaScript, "const f = y => y + 1;");
assert_eq!(a, b);
}
#[test]
fn js_let_const_rename_invariant() {
let a = h(Dialect::JavaScript, "function f() { const x = 1; return x + 2; }");
let b = h(Dialect::JavaScript, "function f() { const y = 1; return y + 2; }");
assert_eq!(a, b);
}
#[test]
fn js_for_of_rename_invariant() {
let a = h(
Dialect::JavaScript,
"function f() { for (const x of xs) { use(x); } }",
);
let b = h(
Dialect::JavaScript,
"function f() { for (const y of xs) { use(y); } }",
);
assert_eq!(a, b);
}
#[test]
fn js_for_classic_rename_invariant() {
let a = h(
Dialect::JavaScript,
"function f() { for (let i = 0; i < n; i++) { use(i); } }",
);
let b = h(
Dialect::JavaScript,
"function f() { for (let j = 0; j < n; j++) { use(j); } }",
);
assert_eq!(a, b);
}
#[test]
fn js_catch_rename_invariant() {
let a = h(
Dialect::JavaScript,
"function f() { try { x(); } catch (e) { log(e); } }",
);
let b = h(
Dialect::JavaScript,
"function f() { try { x(); } catch (err) { log(err); } }",
);
assert_eq!(a, b);
}
#[test]
fn ts_typed_param_rename_invariant() {
// The TYPE affects the hash, but the parameter name does not.
let a = h(
Dialect::TypeScript,
"function f(x: number): number { return x + 1; }",
);
let b = h(
Dialect::TypeScript,
"function f(y: number): number { return y + 1; }",
);
assert_eq!(a, b);
}
#[test]
fn ts_typed_param_type_matters() {
let int_v = h(
Dialect::TypeScript,
"function f(x: number): number { return x; }",
);
let str_v = h(
Dialect::TypeScript,
"function f(x: string): string { return x; }",
);
assert_ne!(int_v, str_v, "the type affects semantics");
}
// ============================================================================
// Go
// ============================================================================
#[test]
fn go_function_rename_invariant() {
let a = h(
Dialect::Go,
"package main\nfunc add(a, b int) int { return a + b }\n",
);
let b = h(
Dialect::Go,
"package main\nfunc add(x, y int) int { return x + y }\n",
);
assert_eq!(a, b);
}
#[test]
fn go_function_name_matters() {
let a = h(
Dialect::Go,
"package main\nfunc add(a, b int) int { return a + b }\n",
);
let b = h(
Dialect::Go,
"package main\nfunc sub(a, b int) int { return a + b }\n",
);
assert_ne!(a, b);
}
#[test]
fn go_short_var_decl_rename_invariant() {
let a = h(
Dialect::Go,
"package main\nfunc main() { x := compute(); use(x) }\n",
);
let b = h(
Dialect::Go,
"package main\nfunc main() { y := compute(); use(y) }\n",
);
assert_eq!(a, b);
}
#[test]
fn go_range_clause_rename_invariant() {
let a = h(
Dialect::Go,
"package main\nfunc main() { for k, v := range m { use(k, v) } }\n",
);
let b = h(
Dialect::Go,
"package main\nfunc main() { for x, y := range m { use(x, y) } }\n",
);
assert_eq!(a, b);
}
#[test]
fn go_if_init_rename_invariant() {
let a = h(
Dialect::Go,
"package main\nfunc main() { if x := lookup(); x > 0 { use(x) } }\n",
);
let b = h(
Dialect::Go,
"package main\nfunc main() { if y := lookup(); y > 0 { use(y) } }\n",
);
assert_eq!(a, b);
}
#[test]
fn go_func_literal_closure_rename_invariant() {
let a = h(
Dialect::Go,
"package main\nvar f = func(x int) int { return x + 1 }\n",
);
let b = h(
Dialect::Go,
"package main\nvar f = func(y int) int { return y + 1 }\n",
);
assert_eq!(a, b);
}
// ============================================================================
// Cross-language sanity
// ============================================================================
#[test]
fn structurally_similar_programs_in_different_languages_have_distinct_hashes() {
// `def f(x): return x+1` in Python vs `function f(x){return x+1}` in JS.
// Same "shape" conceptually, but different grammars → different kinds →
// different hashes. Important to avoid cross-language collisions.
let py = h(Dialect::Python, "def f(x):\n return x + 1\n");
let js = h(Dialect::JavaScript, "function f(x) { return x + 1; }");
assert_ne!(py, js);
}