Re: XML Encoding and character set errors

rfscholte
While working on the build/consumer POM I hit this issue as well when transforming the XML.
I was punished by one of our integration tests: MavenITmng2254PomEncodingTest
This would imply that this is not an issue for current POMs and, judging by the MNG number, has not been one for a long time.

Robert
On 25-2-2020 16:53:35, Elliotte Rusty Harold <[hidden email]> wrote:
I'm investigating some non-Apache code that claims to work around
encoding bugs in many pom.xml files on Maven Central by rewriting the
POMs to use Latin-1. The claim is that there are many pom.xml files on
Maven Central that are Latin-1 (and not UTF-8) but are not properly
identified as such in the XML declaration.

How likely is this? Is anyone aware of such malformed pom.xml files?
Are there any checks in place that would prevent such a pom.xml file
from being published?

I need to figure out whether I should focus my attention on looking
for bad pom.xml files or looking for bugs in the XML parsing code. :-)
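
One way to tell the two cases apart is to check whether a file's bytes actually decode in the encoding its XML declaration claims. The following is only a rough sketch, not anything Maven or Maven Central runs: the helper names `declared_encoding` and `decodes_as_declared` are made up for illustration, and it only handles ASCII-compatible encodings (no UTF-16 detection). A Latin-1 file that is declared as UTF-8, or that has no declaration and so defaults to UTF-8, fails the check as soon as it contains an accented character.

```python
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the file.
_DECL = re.compile(rb'''^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']''')


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration, or UTF-8 if none is given."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def decodes_as_declared(data: bytes) -> bool:
    """True if the bytes decode without error in the declared (or default) encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):  # bad bytes, or unknown encoding name
        return False
    return True
```

Running `decodes_as_declared(open("pom.xml", "rb").read())` over a sample of downloaded POMs would show whether mislabeled files exist in practice. Note that the reverse case cannot be caught this way: any byte sequence decodes as Latin-1, so a UTF-8 file mislabeled as ISO-8859-1 passes the check.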

--
Elliotte Rusty Harold
[hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Hervé BOUTEMY
If any tool writes a pom.xml, it is that tool's own responsibility to write valid XML:
Maven itself cannot do anything about programs writing invalid XML, even when the
generated file is named pom.xml :)
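
As an illustration of what "writing valid XML" means for encodings, here is a minimal sketch of a writer that lets the XML library produce both the declaration and the bytes in one call, so the two cannot disagree. It uses Python's standard `xml.etree.ElementTree` (the `xml_declaration` argument of `tostring` needs Python 3.8 or later); the helper name `serialize_pom` and the sample element content are made up for this example.

```python
import xml.etree.ElementTree as ET


def serialize_pom(root: ET.Element, encoding: str = "UTF-8") -> bytes:
    """Serialize the tree with an XML declaration naming the encoding actually used."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


# Hypothetical sample document with an accented character ("\u00e9" is e-acute).
project = ET.Element("project")
ET.SubElement(project, "name").text = "Herv\u00e9"

utf8_bytes = serialize_pom(project)                  # declaration says UTF-8, bytes are UTF-8
latin1_bytes = serialize_pom(project, "ISO-8859-1")  # declaration says ISO-8859-1, bytes are Latin-1
```

The typical bug is the opposite pattern: building the XML as a string with a hard-coded declaration and then writing it out with whatever the platform default encoding happens to be.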

For Maven plugins managed by the Maven team that work on pom.xml, I worked on such
encoding issues a long time ago, see [1]. FYI, this was one of my first
contributions, because my French surname contains an accent that made me see
many issues when using Maven :)
AFAIK, such issues have been entirely a thing of the past for more than 10
years.

If there are new issues now, they are probably caused by a new tool that generates
invalid XML: please provide examples, and we can dig into how such an invalid XML
file was generated and help fix it (and keep people from improperly blaming
Maven...)

Regards,

Hervé

[1] https://cwiki.apache.org/confluence/display/MAVEN/XML+Encoding

On Tuesday, 25 February 2020, 16:52:50 CET, Elliotte Rusty Harold wrote:

> I'm investigating some non-Apache code that claims to work around
> encoding bugs in many pom.xml files on Maven Central by rewriting the
> POMs to use Latin-1. The claim is that there are many pom.xml files on
> Maven Central that are Latin-1 (and not UTF-8) but are not properly
> identified as such in the XML declaration.
>
> How likely is this? Is anyone aware of such malformed pom.xml files?
> Are there any checks in place that would prevent such a pom.xml file
> from being published?
>
> I need to figure out whether I should focus my attention on looking
> for bad pom.xml files or looking for bugs in the XML parsing code. :-)




