Internship: Automatic Generation of SOFA Documentation and Interactive Assistance from Source Code and Scientific Literature Using AI

M2 Internship Proposal

Automatic Generation of SOFA Documentation and Interactive Assistance from Source Code and Scientific Literature Using AI

Context and Motivation

The SOFA (Simulation Open Framework Architecture) framework is a powerful open-source platform for real-time physics-based simulation, used in fields such as soft robotics, biomechanics, and medical simulation. Its flexibility and modular architecture allow advanced modeling, but this richness also introduces a steep learning curve for new users and developers.

To facilitate its use, SOFA developers provide documentation, tutorials, and scientific publications. However, these resources are:

  • Distributed across code, documentation websites, and research papers.
  • Difficult to keep synchronized with rapid code evolution.
  • Often too technical or fragmented for newcomers.

As SOFA continues to grow, there is a strong need for automated, up-to-date, and intelligible documentation, as well as interactive tools that help users understand how code, theory, and simulation concepts relate to each other.

Recent advances in Artificial Intelligence, particularly Large Language Models (LLMs) and code understanding systems, open new opportunities to automatically extract knowledge from source code and scientific literature, and to present it in human-readable and interactive forms.

Objectives

The main objective of this internship is to design and prototype an AI-based tool capable of generating structured SOFA documentation from existing sources, including:

  • SOFA C++ source code and scene descriptions (in the SOFA framework, a scene is a structured description of a simulation setup, including the simulated objects, their physical properties, and the numerical methods used to compute their behavior)
  • Existing SOFA documentation
  • Scientific articles and publications related to SOFA

A secondary but closely related objective is to explore the feasibility of an interactive chatbot that can answer user questions about SOFA by leveraging the same knowledge base.

Scientific and Technical Goals

The internship will focus on the following goals:

  • Automatic Documentation Generation
    • Extract high-level concepts from SOFA source code (components, solvers, etc.)
    • Generate human-readable documentation describing:
      • Component purpose and usage
      • Relationships between modules
      • Links between theoretical models and implementation details
  • Knowledge Integration
    • Align information extracted from code with:
      • Official SOFA documentation
      • Scientific publications explaining underlying physical and numerical models
    • Build a coherent, structured knowledge representation
  • AI-Assisted Question Answering
    • Prototype a chatbot capable of answering questions such as:
      • “What is the role of this SOFA component?”
      • “Which solver is appropriate for this type of simulation?”
      • “How does this implementation relate to the theory described in a given paper?”
  • Exploration of Automatic Simulation Scene Generation
    • Investigate the feasibility of automatically generating simple SOFA simulation scenes from existing components, in particular using prefabs, in order to illustrate documented concepts and facilitate experimentation
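
To make the first goal above more concrete, here is a minimal sketch of what extracting structural information from C++ code could look like. It rests on loud assumptions: the header text and every class name in it are hypothetical and are not taken from the SOFA code base, and a regular expression stands in for a real C++ parser, which an actual pipeline would probably need in order to handle templates, namespaces, and macros.

```python
import re

# Hypothetical header text, written for illustration only; not copied from SOFA.
SAMPLE_HEADER = """
/// Applies a linear spring force between pairs of points.
class ExampleSpringForceField : public ExampleBaseForceField
{
};

/// Integrates the system with a fixed time step.
class ExampleSolver : public ExampleBaseSolver, public ExampleLinearSolverUser
{
};
"""

CLASS_RE = re.compile(
    r"((?:^[ \t]*///[^\n]*\n)*)"   # optional run of '///' comment lines
    r"^[ \t]*class\s+(\w+)\s*"     # class name
    r"(?::\s*([^{]+?))?\s*\{",     # optional base-class list, up to the opening brace
    re.MULTILINE,
)


def extract_classes(source):
    """Return one record per class: name, base classes, and comment summary."""
    records = []
    for match in CLASS_RE.finditer(source):
        comment_block, name, bases = match.groups()
        # Strip the leading slashes from each comment line and join them.
        summary = " ".join(
            line.strip().lstrip("/").strip()
            for line in comment_block.splitlines()
        )
        base_names = []
        if bases:
            for part in bases.split(","):
                # Keep the last token, dropping access specifiers such as 'public'.
                base_names.append(part.split()[-1])
        records.append({"name": name, "bases": base_names, "summary": summary})
    return records


if __name__ == "__main__":
    for record in extract_classes(SAMPLE_HEADER):
        print(record["name"], "<-", record["bases"], "|", record["summary"])
```

Records of this kind (name, inheritance, short summary) are one possible raw input for the documentation generation and knowledge integration goals; the simple comma split would not survive template arguments, which is one reason to treat this only as an illustration.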

Methodology

The internship will be organized into several phases:

  • State of the Art and Familiarization
    • Study the SOFA framework architecture (scene graph, components, solvers)
    • Review existing SOFA documentation and tutorials
    • Analyze relevant scientific literature related to SOFA and real-time soft-body simulation
    • Survey AI techniques for:
      • Code analysis and summarization
      • Documentation generation
      • Retrieval-augmented generation (RAG)
  • Knowledge Extraction
    • Develop scripts or pipelines to extract:
      • Structural information from SOFA source code (classes, inheritance, templates, dependencies)
      • Semantic information from comments, naming conventions, and documentation
    • Identify links between code elements and scientific concepts
  • AI-Based Documentation Generation
    • Use or fine-tune language models to:
      • Generate structured documentation (markdown, web-ready formats)
      • Produce explanations adapted to different levels (beginner / advanced)
    • Evaluate documentation quality in terms of accuracy, clarity, and usefulness
  • Chatbot Prototype
    • Build a chatbot interface backed by:
      • The generated documentation
      • Code-level knowledge
      • Relevant publications
    • Implement query handling and context retrieval to answer SOFA-related questions
  • Validation and Case Studies
    • Test the tool on selected SOFA modules or plugins
    • Compare AI-generated documentation with existing human-written documentation
    • Gather feedback from SOFA users or developers if possible
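
The context retrieval step of the chatbot phase can be sketched as follows. This is only an illustration under stated assumptions: the snippets and their keys are hypothetical stand-ins for generated documentation, a plain bag-of-words cosine similarity replaces the learned embeddings a real retrieval-augmented generation system would likely use, and the call to a language model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for generated documentation (illustrative wording only).
SNIPPETS = {
    "solver-notes": "An implicit time integration solver stays stable with large time steps for stiff objects.",
    "forcefield-notes": "A force field computes internal forces such as springs acting on a deformable object.",
    "scene-notes": "A scene describes the simulated objects, their physical properties and the numerical methods.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, snippets, top_k=1):
    """Return the keys of the top_k snippets most similar to the question."""
    query = Counter(tokenize(question))
    scored = sorted(
        ((cosine(query, Counter(tokenize(text))), key) for key, text in snippets.items()),
        reverse=True,
    )
    return [key for score, key in scored[:top_k] if score > 0]


def build_prompt(question, snippets, top_k=1):
    """Assemble the retrieved context and the question into one prompt string."""
    keys = retrieve(question, snippets, top_k)
    context = "\n".join(f"[{key}] {snippets[key]}" for key in keys)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Which solver handles stiff objects with large time steps?", SNIPPETS))
```

The prompt built this way would then be passed to a language model. Keeping the snippet key in the context makes it possible to cite which documentation entry, code element, or publication an answer relied on.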

Expected Outcomes

By the end of the internship, the following outcomes are expected:

  • A working prototype capable of generating documentation from SOFA code and related texts
  • A proof-of-concept chatbot for assisting SOFA users
  • A structured knowledge base linking: (1) Code, (2) Documentation, and (3) Scientific theory
  • A technical report describing: (1) Methodology, (2) Design choices, and (3) Limitations and future improvements

Candidate Profile

  • Programming: C++, Python
  • AI & NLP: Large language models, embeddings, retrieval-augmented generation
  • Scientific Computing: Numerical simulation, physics-based modeling (desired)

Starting Date and Allowance

The internship start date is flexible and may range from mid-February to early April.

The internship will be compensated according to the standard remuneration in force for M2 internships in French public research institutions (about 600€).

Application

If interested, please send your resume to zeinab.awada@inria.fr.

Find more about SOFA and the DEFROST team at:
