Introducing GenSIE 2026
An invitation for researchers and developers interested in Agentic AI
If you have built an “Agentic” workflow recently, you know the pain. You prompt a model to perform a complex action—maybe scraping a website, analyzing a contract, or controlling a robot—and you ask it to return the result in a nice, clean JSON format so your Python code can parse it.
And it works. Mostly.
Until it doesn’t. Until the model decides to wrap the JSON in markdown backticks. Or adds a “Here is your data” preamble. Or hallucinates a field that doesn’t exist in your Pydantic model. Or, worse, it just makes up a fact that sounds plausible but isn’t there.
We are entering the era of agentic AI, where models talk to machines, not just humans. And machines speak in protocols. If we want AI that is reliable, robust, and affordable, we cannot rely on massive 100B+ parameter models for every single function call. We need Small Language Models (SLMs) that can speak general-purpose structured data fluently.
That is why, together with my colleagues at the University of Havana and the University of Alicante, we are launching GenSIE: General-purpose Schema-guided Information Extraction at IberLEF 2026.
The Challenge: Zero-Shot Structure
Most information extraction tasks are “fixed.” You train a model to find PERSON, ORG, and DATE. You show it thousands of examples. It learns. GenSIE is different: it is a Zero-Shot Schema task.
At inference time, your system receives a text and a schema it has never seen before. It might be a legal verdict today, a recipe tomorrow, and a chemical compound specification next week. Your system must read the schema definition (provided as a JSON Schema), understand the semantic constraints (like “extract the verdict, but map it to POSITIVE or NEGATIVE”), and generate valid, grounded JSON.
There is also a built-in hallucination trap: sometimes we’ll sneak in a field that the model could answer from its original training data, but that is not explicitly answered in the input text, so your system has to output null for it.
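To make this concrete, here is a minimal sketch of what a task instance could look like. The schema, text, and the tiny `is_grounded` check are all invented for illustration; they are not taken from the actual GenSIE dataset or toolkit.

```python
import json

# Hypothetical task instance (invented for illustration, not from the dataset).
schema = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": ["POSITIVE", "NEGATIVE"]},
        "date": {"type": ["string", "null"]},
    },
    "required": ["verdict", "date"],
}

text = "The court ruled in favor of the plaintiff."

# A grounded answer: the verdict is mapped onto the enum, and "date" is null
# because the text never states it -- even if the model "knows" a plausible one.
expected = {"verdict": "POSITIVE", "date": None}


def is_grounded(answer: dict, schema: dict) -> bool:
    """Tiny structural check (a stand-in for a full JSON Schema validator)."""
    if set(answer) != set(schema["required"]):
        return False
    enum = schema["properties"]["verdict"].get("enum", [])
    return answer["verdict"] in enum


print(is_grounded(expected, schema))
print(json.dumps(expected))
```

In a real system you would use a proper JSON Schema validator instead of the toy check above; the point is that both the structure (types, enums) and the grounding (null for unstated fields) are part of what gets scored.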
Oh, and there is a catch: No Fine-Tuning allowed.
Why This Matters (and Why It’s Hard)
We explicitly designed this task to close an existing innovation gap. Huge models like GPT-5 or Gemini 3 Pro can often brute-force this problem through sheer scale. But running a 1TB model just to parse a date is neither economically nor ecologically sustainable.
GenSIE challenges you to use Small, Open-Weights Models (like Llama 3 8B, Qwen 14B, or Salamandra 2B). To make these smaller models perform at a high level, you can’t just throw compute at the problem. You need Inference-Time Engineering:
Constrained Decoding: Forcing the token sampling to obey a grammar.
Chain-of-Thought: Letting the model reason about the schema before outputting the JSON.
Self-Correction: Catching validation errors and asking the model to fix them in a loop.
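The self-correction idea in particular is easy to sketch. Everything below is a hypothetical stand-in: `fake_model` plays the role of your SLM, and `validate` stands in for a real JSON Schema validator.

```python
import json


def self_correct(generate_json, validate, prompt, max_retries=3):
    """Generate -> validate -> re-prompt loop.

    `generate_json` is a hypothetical callable wrapping your SLM;
    `validate` returns an error message, or None if the output is valid.
    """
    feedback = ""
    for _ in range(max_retries):
        raw = generate_json(prompt + feedback)
        error = validate(raw)
        if error is None:
            return raw
        # Feed the validation error back so the model can repair its output.
        feedback = f"\nYour previous output was invalid: {error}. Fix it."
    raise ValueError("model failed to produce valid JSON")


# Toy demo: a fake "model" that fails once, then succeeds.
attempts = iter(['{"verdict": "POSITIVE",', '{"verdict": "POSITIVE"}'])


def fake_model(prompt):
    return next(attempts)


def validate(raw):
    try:
        json.loads(raw)
        return None
    except json.JSONDecodeError as e:
        return str(e)


result = self_correct(fake_model, validate, "Extract the verdict as JSON.")
print(result)
```

The same loop structure accommodates the other techniques: constrained decoding replaces `generate_json` with a grammar-aware sampler, and chain-of-thought simply changes what the prompt asks the model to do before it emits the JSON.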
This is the real challenge. You cannot simply fall back on an expensive API call and hope the problem goes away. You’ll have to work within these hard technical constraints to achieve something that works as well as Gemini or ChatGPT but runs on commodity hardware.
Join Us
We are building a dataset of 1,000 human-curated examples—rigorously checked to punish hallucinations and reward precision.
The timeline is tight, but the barrier to entry is low. On March 1st we’ll release a first batch of annotated examples and a starter kit with Docker templates and a baseline implementation. Then you have until May 8th to submit a working system. All participants (either solo or in teams) have the option to submit a system paper that will be peer-reviewed and indexed in Scopus.
If you are interested in the future of reliable AI agents, structured generation, or just want to test your engineering skills against a hard benchmark, I invite you to participate. This is particularly interesting for master’s or PhD students in search of interesting, open research problems.
Check out the official site, and leave me a comment if you want to know more.