GSoC 2026 | Shailesh Kumar | XML Error Context (#751)

Hi everyone,

I am Shailesh Kumar, a third-year CSE student from India.

I am interested in working on the GSoC 2026 project:
“Improving XML error messages with location context” (#751).

I have carefully read the full issue discussion and explored the existing codebase, especially ConfigParser.cpp.

My current understanding:

  • Location should be extracted using xmlSAX2GetLineNumber() and xmlSAX2GetColumnNumber()
  • parserContext is only valid during xmlParseChunk()
  • The main challenge is propagating location information to configuration handlers

Based on the mentor discussion, I am focusing on making this easy to use in configuration code, for example:

  • tag.location()
  • tag.attributeLocation(ATTR_NAME)
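To make the target API concrete, here is a minimal sketch of a tag that carries its own source location. It assumes the location is copied into the tag by value during parsing. `XMLLocation`, `setLocation`, `setAttributeLocation`, and the fallback inside `attributeLocation` are hypothetical names for illustration, not existing preCICE code, and the class is a heavily simplified stand-in for preCICE's real `XMLTag`.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical location record (not existing preCICE code).
struct XMLLocation {
  int         line   = 0; // 1-based; 0 means "unknown"
  int         column = 0; // 1-based; 0 means "unknown"
  std::string file;
};

// Hypothetical, heavily simplified stand-in for preCICE's XMLTag.
class XMLTag {
public:
  explicit XMLTag(std::string name) : _name(std::move(name)) {}

  const std::string &getName() const { return _name; }

  // Filled in by the parser while the parser context is still valid.
  void setLocation(XMLLocation loc) { _location = std::move(loc); }

  void setAttributeLocation(const std::string &attr, XMLLocation loc)
  {
    _attributeLocations[attr] = std::move(loc);
  }

  // Location of the start tag itself.
  const XMLLocation &location() const { return _location; }

  // Location of one attribute; falls back to the tag location when no
  // attribute-level position was recorded.
  const XMLLocation &attributeLocation(const std::string &attr) const
  {
    auto it = _attributeLocations.find(attr);
    return it != _attributeLocations.end() ? it->second : _location;
  }

private:
  std::string                        _name;
  XMLLocation                        _location;
  std::map<std::string, XMLLocation> _attributeLocations;
};
```

Falling back to the tag location keeps `attributeLocation()` usable even before per-attribute positions exist; whether that fallback is desirable is a design question for the maintainers.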

I will start with this approach and refine it based on maintainers’ feedback.

I have also started engaging with the issue and exploring possible design approaches.

I would really appreciate any feedback on my understanding and approach.

Thanks!
Shailesh Kumar


Update on my technical exploration:

After diving deeper into the codebase, I want to share my refined understanding.

What I found in ConfigParser.cpp:

The xmlParserCtxtPtr (_context) is stored as a member of ConfigParser. During SAX callbacks like startElement(), calling xmlGetLineNumber(_context) returns a valid line number. The issue is that this context pointer never reaches the XMLTag handlers — so by the time an error is thrown in configuration code, the location is already gone.
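The core of the problem is a lifetime mismatch: the parser context only describes the current parse position while a callback is running. The sketch below models that with a hypothetical `FakeParserContext` (no libxml2 involved) and shows the fix of copying line and column into a plain value at the moment the start-element callback fires. All type and function names here are illustrative assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parser context: its position keeps moving
// while parsing, and it no longer exists once parsing has finished.
struct FakeParserContext {
  int line   = 1;
  int column = 1;
};

// Plain value type: safe to keep and read after parsing has finished.
struct XMLLocation {
  int         line   = 0;
  int         column = 0;
  std::string file;
};

struct ParsedTag {
  std::string name;
  XMLLocation location;
};

// Simulated input: each start tag with the position the "parser" is at.
struct FakeStartTag {
  std::string name;
  int         line;
  int         column;
};

std::vector<ParsedTag> parse(const std::vector<FakeStartTag> &input,
                             const std::string &file)
{
  std::vector<ParsedTag> tags;
  FakeParserContext ctx; // lives only for the duration of parse()

  auto onStartElement = [&](const std::string &name) {
    // Snapshot the position now, while ctx still describes this tag.
    tags.push_back({name, {ctx.line, ctx.column, file}});
  };

  for (const auto &t : input) {
    ctx.line   = t.line; // the "parser" advances
    ctx.column = t.column;
    onStartElement(t.name);
  }
  return tags; // ctx is destroyed here; the copied locations survive
}
```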

My proposed fix — capture location at parse time:

```cpp
#include <string>

struct XMLLocation {
  int line   = 0;
  int column = 0;
  std::string file;
};
```

Inside ConfigParser::startElement(), right when the tag is identified, store the location into the tag before passing it to handlers:

```cpp
tag.setLocation(
  xmlGetLineNumber(_context),
  xmlGetColumnNo(_context),
  _filename
);
```

Then error messages become:

```cpp
auto loc = tag.location();
PRECICE_ERROR("Unknown tag <{}> at {}:{}",
              tag.getName(), loc.line, loc.column);
```

**Open questions I want to verify with maintainers:**

- Is `xmlGetColumnNo` reliable across all libxml2 versions preCICE supports?
- Should location be stored per-attribute separately, or is tag-level location enough?

Would love feedback before I start prototyping.

Update — Initial Implementation Pushed to Fork

I have pushed an initial implementation to my fork: https://github.com/SKM2227229725/precice/tree/feature/xml-error-context

Changes so far:

  • Added m_Line, m_Column, m_File fields to CTag struct

  • Stored _parserContext as a member of ConfigParser

  • Captured location in OnStartElement() while context is valid

  • Saved file content as _fileLines for context display
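Caching the file content as `_fileLines` could be as simple as splitting it once after reading. The helpers below are a hypothetical sketch (`splitLines` and `lineAt` are not preCICE functions). They assume 1-based line numbers, which is how libxml2 counts lines, and strip a trailing `\r` so files with Windows line endings render cleanly.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the raw configuration text into lines once,
// so an error reporter can later fetch line N directly.
std::vector<std::string> splitLines(const std::string &content)
{
  std::vector<std::string> lines;
  std::istringstream       in(content);
  std::string              line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back(); // tolerate Windows line endings
    }
    lines.push_back(line);
  }
  return lines;
}

// Returns the requested 1-based line, or an empty string if out of range.
std::string lineAt(const std::vector<std::string> &lines, int lineNumber)
{
  if (lineNumber < 1 || lineNumber > static_cast<int>(lines.size())) {
    return {};
  }
  return lines[lineNumber - 1];
}
```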

Next step: propagate location from CTag to XMLTag to expose clean APIs:

```cpp
tag.location()
tag.attributeLocation(ATTR_NAME)
```

So that error messages look like:
```
ERROR: Data was defined with an empty name.
2:   <data:scalar name="" />
                  ^^^^^^^
```

Update — XMLTag Propagation In Progress

Following up on my last post: I’ve started propagating location data from CTag to XMLTag, so the clean API is now taking shape.

Current status:

  • CTag now carries m_Line, m_Column, and m_File fields, populated inside OnStartElement() while the parser context is still valid
  • Working on exposing tag.location() and tag.attributeLocation(ATTR_NAME) on the XMLTag side
  • File lines are cached in _fileLines so we can render the context snippet at error time

Target error output:

```
ERROR: Data was defined with an empty name.
2:   <data:scalar name="" />
                  ^^^^^^^
```

Still open — would appreciate maintainer input:

  1. Is xmlGetColumnNo reliable across all libxml2 versions preCICE currently supports?
  2. Is tag-level location granularity sufficient, or should individual attributes track their own location?

I’ll keep the fork updated as I push further changes. Happy to discuss design decisions before anything gets too deep — feedback welcome at any stage.

Fork: https://github.com/SKM2227229725/precice/tree/feature/xml-error-context

Update — tag.location() API working on fork

Quick progress update:

CTag → XMLTag propagation is now complete on my fork: tag.location() and tag.attributeLocation(ATTR_NAME) are now accessible in configuration handlers.

One correction to my earlier post: I mistakenly wrote xmlGetLineNumber(). The correct API used in the fork is xmlSAX2GetLineNumber(_parserCtxt), as specified in issue #751.

Fork: https://github.com/SKM2227229725/precice/tree/feature/xml-error-context

Would love feedback from maintainers before going deeper.

Update — Attribute-Level Location Now Wired, Snippet Rendering In Progress

Following up on Post 4 — I’ve moved forward on both open questions while waiting for maintainer input.

What’s done on the fork:

  • tag.attributeLocation(ATTR_NAME) is now wired end-to-end: calling it inside DataConfiguration.cpp at line 92 (the mentor’s exact example) returns the correct file/line/column
  • xmlSAX2GetColumnNumber(_parserCtxt) is being used (not xmlGetColumnNo) — so the libxml2 version concern from my last post is addressed
  • _fileLines cache is populated at parse time — snippet extraction is next

Current target output (working toward):

```
ERROR: Data was defined with an empty name.
2:   <data:scalar name="" />
                  ^^^^^^^
```
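Snippet extraction of this kind can be built from the cached source line plus a column and a highlight length. This is a minimal sketch under stated assumptions: `renderSnippet` is a hypothetical helper, the column is 1-based and counts characters, and the source line contains no tabs (a tab would shift the caret).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds "ERROR: <message>", then "N: <source line>",
// then a caret line underlining `length` characters starting at the
// 1-based `column` of that source line.
std::string renderSnippet(const std::string &message, int lineNumber,
                          const std::string &lineText, int column, int length)
{
  if (column < 1) {
    column = 1; // unknown column: underline from the start of the line
  }
  if (length < 1) {
    length = 1; // always show at least one caret
  }
  const std::string prefix = std::to_string(lineNumber) + ": ";

  std::string out = "ERROR: " + message + "\n";
  out += prefix + lineText + "\n";
  // Indent by the "N: " prefix width plus the offset into the line.
  out += std::string(prefix.size() + static_cast<std::size_t>(column - 1), ' ');
  out += std::string(static_cast<std::size_t>(length), '^');
  return out;
}
```

With the source line `  <data:scalar name="" />`, column 16, and length 7 (the width of `name=""`), this reproduces the target output exactly.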

Still open — would appreciate maintainer input before I go further:

  • Should PRECICE_CONFIG_CHECK live in assertion.hpp alongside PRECICE_CHECK, or in a separate config-utilities header?
  • For the caret (^) rendering: is a column offset from the tag start acceptable, or do we need the exact attribute start position from libxml2?
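On the caret question, one option that needs nothing extra from libxml2 is to search the cached source line for the attribute name. This is a heuristic, not a libxml2 feature: `findAttributeColumn` is hypothetical, it only inspects a single line (an attribute on a continuation line of a multi-line tag returns 0), and attribute values that themselves contain text like ` name=` could fool it. Where the search should start depends on which position libxml2 actually reports inside the start-element callback, which is worth verifying before relying on this.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical heuristic (not a libxml2 feature): find where attribute
// `attr` starts on one source line, beginning the search at the 1-based
// column `searchFrom`. Returns a 1-based column, or 0 if not found.
int findAttributeColumn(const std::string &lineText, int searchFrom,
                        const std::string &attr)
{
  if (attr.empty()) {
    return 0;
  }
  std::size_t pos = searchFrom > 1 ? static_cast<std::size_t>(searchFrom - 1) : 0;
  while ((pos = lineText.find(attr, pos)) != std::string::npos) {
    // The name must start a word, so "name" does not match "other-name".
    const bool startsWord =
        pos > 0 && std::isspace(static_cast<unsigned char>(lineText[pos - 1]));
    // Skip optional whitespace between the name and '=' (name = "...").
    std::size_t after = pos + attr.size();
    while (after < lineText.size() &&
           std::isspace(static_cast<unsigned char>(lineText[after]))) {
      ++after;
    }
    if (startsWord && after < lineText.size() && lineText[after] == '=') {
      return static_cast<int>(pos) + 1;
    }
    ++pos; // false match: keep searching further along the line
  }
  return 0;
}
```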

Fork is up to date: https://github.com/SKM2227229725/precice/tree/feature/xml-error-context

@Frédéric_Simonis — would really value your input on the macro placement and caret approach before I wire up the remaining 15 error sites. Happy to adjust based on your feedback.

Hi, small refinement to my earlier exploration.
After digging deeper, my understanding is that the important part is to capture location information while the SAX parser context is still valid, and then propagate that information into objects used by configuration handlers.

So my current direction is to make location accessible from configuration code through APIs such as tag.location() and tag.attributeLocation(...), instead of requiring handlers to interact with parser internals directly.

This seems closer to the core usability goal of the project: making XML-related errors easy to report exactly where they occur in user configuration files.

I am currently prototyping how this can be wired through ConfigParser and XMLTag cleanly.

Hi everyone,

Final update from my side before the proposal deadline.

I spent some more time reviewing the codebase to validate the feasibility of the approach.

Based on my exploration:

  • Location capture in SAX callbacks (startElement) is the correct point, since the parser context is only valid during parsing.
  • The main missing piece is structured propagation of this location into configuration-level code.
  • I verified that error handling in configuration code currently lacks access to the parser context, so attaching location to XMLTag is a clean integration point.

Refined direction:

  • Store location (line, column, file) in XMLTag during parsing
  • Expose simple APIs like:
    • tag.location()
    • tag.attributeLocation(name)
  • Ensure this integrates naturally with existing error macros (e.g., PRECICE_ERROR)
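As an illustration of how a location could plug into a check macro, here is a self-contained sketch. `withLocation` and `CONFIG_CHECK_AT` are hypothetical (the macro stands in for the PRECICE_CONFIG_CHECK idea raised earlier in the thread). It throws `std::runtime_error` only to keep the example runnable and does not model how preCICE's own PRECICE_ERROR or PRECICE_CHECK report errors.

```cpp
#include <stdexcept>
#include <string>

struct XMLLocation {
  int         line   = 0; // 0 means "unknown"
  int         column = 0;
  std::string file;
};

// Hypothetical: prefix a message with "file:line:column: " in the usual
// compiler style; returns the bare message if the location is unknown.
inline std::string withLocation(const XMLLocation &loc, const std::string &msg)
{
  if (loc.line <= 0) {
    return msg;
  }
  return loc.file + ":" + std::to_string(loc.line) + ":" +
         std::to_string(loc.column) + ": " + msg;
}

// Hypothetical location-aware check. The do/while(false) wrapper makes the
// macro behave like a single statement, e.g. inside an unbraced if/else.
#define CONFIG_CHECK_AT(loc, condition, message)                \
  do {                                                          \
    if (!(condition)) {                                         \
      throw std::runtime_error(withLocation((loc), (message))); \
    }                                                           \
  } while (false)
```

A handler would then write something like `CONFIG_CHECK_AT(tag.attributeLocation(ATTR_NAME), !name.empty(), "Data was defined with an empty name.");`, assuming an `attributeLocation()` accessor that returns this location type.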

I also reviewed potential edge cases such as:

  • attribute-level errors vs tag-level errors
  • ensuring minimal overhead and no impact on normal parsing flow

At this point, I feel confident about the design and its integration with the existing code structure.

Thanks again for the discussions so far — looking forward to feedback!

Best,
Shailesh

Hi everyone,

I have now submitted my GSoC 2026 proposal for Error Messages with Configuration Context (#751).

Thanks to the mentors and community for the earlier discussions — they helped me refine the design, especially around configuration-side usability and XMLTag-level integration.

I’ll stay available on Matrix/Discourse and continue contributing where possible.

Looking forward to feedback.

Best,
Shailesh Kumar