Hybrid LLM Agent for Compliance-Constrained Multi-Source Reporting: A Startup Matchmaking Case Study
Artificial intelligence is transforming society. AI Sweden is the national center for applied artificial intelligence and our mission is to accelerate the use of AI for the benefit of our society, our competitiveness, and for everyone living in Sweden. We drive impactful initiatives in areas such as healthcare, energy, and the public sector while pushing the boundaries of AI research and innovation in fields such as natural language processing and edge learning. Join us in harnessing the untapped value of AI to drive innovation and create sustainable value for Sweden.
Introduction
MobilityXlab accelerates collaboration between innovative startups and leading corporates and investors in mobility, AI, and adjacent tech sectors. Effective matchmaking and partnership-building in this space depend on collecting and summarizing evidence from diverse and often sensitive data sources: structured applications, reviewer scores, interview transcripts, and external enrichment feeds (e.g., LinkedIn, X/Twitter).
Generative AI excels at incorporating and synthesizing data from various sources, but struggles to meet strict compliance requirements (e.g., GDPR) that control which facts may be shared and from which sources. This master's thesis addresses the challenge of creating an internal analysis and generating an outward-facing report that reveals only non-sensitive, approved facts, enabling trusted, compliance-ready matchmaking for fundraising and collaboration.
Research questions
How can an agentic workflow for multi-step analysis and planning be constructed in a RAG framework, and what are the common integration patterns and existing limitations for multi-source data synthesis?
Which existing approaches support fact provenance tracking, tracing, and conditional information disclosure enforcement in RAG systems, particularly techniques relevant to compliance and security constraints?
What specific challenges arise from the heterogeneity of multi-source startup data (e.g., public filings, news articles, proprietary databases), and how must the framework address them for effective retrieval and synthesis?
Proposed Methodology: The Two-Stage Agentic Workflow
The core methodology involves a novel two-stage LLM agent architecture that separates the Analysis phase from the Disclosure phase, enforced by a custom-built RAG layer.
Phase 1: Data pre-processing and extraction
- Data Ingestion and Tagging: All four database types (structured forms, enrichment data, investor or industry judgments, transcribed interviews) will be processed and indexed. Crucially, the RAG index will be designed to ensure every text chunk is retrieved with a mandatory citation tag identifying its originating database (e.g., [DB1], [DB4-Sensitive]).
- Database Classification: Databases will be categorized into Constrained Sources (e.g., interviews, sensitive reviewer comments) and Approved Sources (e.g., sanitized form data, public enrichment).
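The ingestion step above could be sketched as follows. This is an illustrative Python snippet, not a fixed design: the names (`Chunk`, `SOURCE_POLICY`) and the four database labels are assumptions standing in for whatever indexing stack the thesis ultimately uses. The key idea it demonstrates is that every chunk carries a mandatory citation tag that also encodes its approved/constrained classification.

```python
from dataclasses import dataclass

# Hypothetical source-to-policy mapping; the real database names and
# classifications would come from MobilityXlab's data governance rules.
SOURCE_POLICY = {
    "DB1-Forms": "approved",         # sanitized structured applications
    "DB2-Enrichment": "approved",    # public enrichment feeds
    "DB3-Reviews": "constrained",    # sensitive reviewer comments
    "DB4-Interviews": "constrained", # transcribed interviews
}

@dataclass
class Chunk:
    text: str
    source: str  # mandatory originating-database tag, e.g. "DB4-Interviews"

    @property
    def citation(self) -> str:
        # Render the tag the retriever must attach to every retrieved chunk,
        # e.g. "[DB4-Interviews:constrained]".
        return f"[{self.source}:{SOURCE_POLICY[self.source]}]"

chunk = Chunk("Founder mentioned a pending lawsuit.", "DB4-Interviews")
print(chunk.citation)  # [DB4-Interviews:constrained]
```

Embedding the classification directly in the citation tag means the downstream disclosure step can enforce the policy without re-querying any metadata store.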
Phase 2: Agent planning and execution
- The LLM agent will utilize a ReAct (Reasoning and Acting) style planning chain, executed in two steps:
Step 1: Internal Synthesis (Private)
The agent queries ALL four databases. It uses the full context to perform deep reasoning, identify key risks/strengths, and generate a comprehensive "Draft Analysis Report."
Goal: Maximize analytical depth and identify "bottlenecks" using all available (sensitive) data.
Step 2: Constraint Enforcement (Public)
The agent performs a controlled rewrite action. The System Instruction mandates that the final "Disclosure Report" can only include facts where the accompanying RAG citation tag matches an Approved Source. Any analytical conclusion drawn from a Constrained Source must be omitted or re-stated using only Approved Source evidence.
Goal: Ensure 100% compliance with disclosure rules while maintaining maximal utility.
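In the workflow above, the rewrite itself is performed by the LLM under a system instruction; a rule-based post-filter of the kind sketched below could serve as a deterministic safety net (or as the secondary checker mentioned under the evaluation metrics). The tag format, source names, and `disclosable` helper are all illustrative assumptions.

```python
import re

# Sources whose facts may appear in the outward-facing Disclosure Report.
APPROVED = {"DB1-Forms", "DB2-Enrichment"}

# Extract the source identifier from citation tags like "[DB1-Forms:approved]".
TAG = re.compile(r"\[(DB[^\]:]+)")

def disclosable(sentences):
    """Keep only draft-report sentences whose every citation tag is approved.

    Sentences with no citation tag are dropped: under the mandatory-tagging
    rule, an untagged claim has no verifiable provenance.
    """
    kept = []
    for s in sentences:
        tags = TAG.findall(s)
        if tags and all(t in APPROVED for t in tags):
            kept.append(s)
    return kept

draft = [
    "Strong traction in Nordic OEM pilots [DB1-Forms:approved].",
    "Reviewer flagged weak unit economics [DB3-Reviews:constrained].",
]
print(disclosable(draft))
# ['Strong traction in Nordic OEM pilots [DB1-Forms:approved].']
```

A filter like this can only omit constrained facts; re-stating a conclusion using Approved Source evidence, as Step 2 requires, still needs the LLM's controlled rewrite.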
Evaluation Metrics and Key Contributions
The thesis will contribute a verified architecture for constrained LLM generation. Evaluation will be conducted across three crucial dimensions:
1. Compliance Efficacy Score (CES)
Objective: Achieve zero leakage. CES is measured as the percentage of facts or phrases in the final output that a secondary verification model (an LLM-based or rule-based checker) flags as sourced from constrained databases; the target is 0%.
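One possible formalization of the leakage measurement is sketched below. The thesis leaves the checker open (LLM-based or rule-based), so treat this as an assumed rule-based variant: the function name and the exact-match notion of a "leaked fact" are illustrative choices.

```python
def leakage_rate(output_facts, constrained_facts):
    """Percentage of final-report facts that originate from constrained
    sources. 0.0 means full compliance (zero leakage).

    output_facts: facts extracted from the final Disclosure Report.
    constrained_facts: set of facts known only from Constrained Sources.
    """
    if not output_facts:
        return 0.0
    leaks = [f for f in output_facts if f in constrained_facts]
    return 100.0 * len(leaks) / len(output_facts)

report = ["raised a seed round", "pending lawsuit"]
secret = {"pending lawsuit"}
print(leakage_rate(report, secret))  # 50.0
```

In practice, exact string matching is too brittle for paraphrased leaks, which is why the proposal also allows an LLM-based checker; a real evaluation would likely combine both.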
2. Investor/Industry Match Score (IMS)
Objective: Validate the report's effectiveness for new business use. A panel of domain experts (new investors or industrial partners) rates the final constrained report on its relevance, clarity, and confidence in matching the startup to external "partner need statements" (like those in the provided data). This tests whether a compliant report can still deliver high business value.
3. Retrospective Journey Insight (RJI)
Objective: Determine if the compliant report provides sufficient insight into the past. Analysts are asked to identify the single most critical insight (e.g., a process bottleneck, a key pivot point) from the final constrained report. This insight is then compared for Accuracy and Depth against the ground truth derived from the unconstrained Draft Analysis Report.
Who Should Apply?
This project is ideal for master-level students with interests in NLP, LLMs, Retrieval-Augmented Generation (RAG), agentic workflows, AI ethics, and data science. You will address a direct, industry-validated need for responsible AI in multi-source decision support, working closely with domain experts and contributing to a highly collaborative innovation environment.
This project offers the opportunity to deliver actionable, real-world impact in the emerging field of trusted, compliant AI for partnership and investment decision-making, helping MobilityXlab’s ecosystem set the standard for privacy-aware, high-signal innovation scouting. The details of the project focus will be discussed and adapted to the candidate's experience and expertise.
Location: Göteborg, Sweden - MobilityXlab / AI Sweden in partnership with SVEA projector
Project lead contacts
Adam Ek, adam.ek@ai.se
Ahmed Ouaddani, ahmed.ouaddani@mobilityxlab.com
References
Yao, Shunyu, et al. "ReAct: Synergizing Reasoning and Acting in Language Models." The Eleventh International Conference on Learning Representations (ICLR), 2023.
Zhao, Bingxi, et al. "Llm-based agentic reasoning frameworks: A survey from methods to scenarios." arXiv preprint arXiv:2508.17692 (2025).
Miculicich, Lesly, et al. "VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation." arXiv preprint arXiv:2510.05156 (2025).
Why do your thesis together with AI Sweden?
To us, artificial intelligence is not only about tech, it’s a force for positive societal change. You'll be working alongside leading AI experts, scientists, journalists, linguists, policy professionals, entrepreneurs, change leaders, and many more. To work here, you don’t need to know “everything” about AI, but you need to believe in its potential to help shape our society for the better.
As an organization, we’re uniquely positioned at the sweet spot of governmental influence and startup agility: small enough to stay adaptive and have fun, yet backed by and in close contact with government, academia, and both the private and public sectors.
Join us to make a real-world impact by contributing to initiatives that benefit society and tackle critical challenges. Be at the forefront of AI innovation, working with cutting-edge technologies and playing a key role in shaping the future of AI in Sweden.
And, within our mission, we can most certainly be a platform empowering you to realize your ideas. AI Sweden’s ability to empower partners and individual team members to do exceedingly well in their profession is a key success factor for driving positive and significant impact.
In short, we like to believe we offer our team members a place to grow, an environment for personal development.
An equal and fair working environment
We strongly believe in diversity and inclusion and are acutely aware of the skewed gender balance in our industry. We actively strive to put together a diverse team in terms of age, gender and background.
At AI Sweden, we are committed to building diverse and inclusive teams. Some positions may be subject to export control regulations, which means that specific requirements may apply depending on the role. If relevant, we will inform you clearly during the recruitment process.
AI Sweden does not accept unsolicited support and kindly asks not to be contacted by advertising agents, recruitment agencies, or staffing companies.
- Organization: AI Labs
- Location: Göteborg
About AI Sweden
As Sweden's national center for applied AI, we're on a mission to accelerate the use of AI to benefit our society, our competitiveness, and everyone living in Sweden. We drive impactful initiatives in areas such as healthcare, energy, and public services while pushing the boundaries of AI research in fields such as natural language processing and machine learning. Join us in harnessing the untapped value of AI to drive innovation and create sustainable value for Sweden.