Re: Allowed characters in GAV and how/where to sanitize?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Allowed characters in GAV and how/where to sanitize?

Andreas Sewe-2
Hi Stephen,

> In an url path segment space is mapped to +

not quite. This holds only for the query part of an URI which is
typically encoding according to the application/x-www-form-urlencoded
scheme. Elsewhere in a URI, e.g., the path component, a space is simply
percent-encoded a %20.

> The repo manager should be blocking those... likely not doing it’s job.

I agree. IMHO the repo manager should block (if only as a last resort
for people using something to deploy that doesn't do the check earlier).

That being said, the situation on Maven Central is not that dire; there
are very few versions in the wild that I consider broken:

Additional quotes:

- "1.0.0
- '1.0'

CLI trouble:

- mvn+release:perform
- version=1.6.2.1

Commas instead of dots as separator

- 1,0

Expressions or expression-like constructs:

- ${env.VERSION}
- ${parent.version}
- @metro.version@
- $%7Bcucumber-jvm.version%7D

If you are interested, I have a more complete list (about 30 entries
overall), together with a histogram of characters used in versions.
Interestingly, no non-ASCII characters are used, not even in qualifiers.

> We probably should also barf on : in a version. There is validation on
> artifactId and groupId when last I checked

Different validators barf on different things. The
org.apache.maven.*project*.validation.DefaultModelValidator used by
deploy:deploy-file is happy with *any* non-empty version, whereas
org.apache.maven.*model*.validation.DefaultModelValidator does a bit
more; in particular, it checks for certain filesystem-unsafe characters,
including the colon: \ / : " < > | ? *

I don't really know why deploy:deploy-file prefers one ModelValidator
over the other, though. Is this a bug in the the maven-deploy-plugin?

Best wishes,

Andreas


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allowed characters in GAV and how/where to sanitize?

Fred Cooke
Re versions, I know the background on it, but it annoys me that maven can't
handle 4 part versions, 1.2.3.4 as sometimes it's handy to do a patch level
that deep. Lots of messed up software in the world :-)

Format should be N[.N as many times as needed][optional hyphen and
qualifier of some sort] or something like that. Not hard limited to 1 2 or
3 parts.

On 10 January 2018 at 22:21, Andreas Sewe <
[hidden email]> wrote:

> Hervé BOUTEMY wrote:
> > notice that Central contains artifacts produced by Maven but also by
> other
> > tools: I did some analysis myself and found strange things also that are
> > clearly not produced by Maven. Scala for example produces some artifacts
> that
> > I doubt could be referenced by Maven.
>
> Yes, Maven is not the only tool that can deploy artifacts to Central --
> and that's a good thing.
>
> > Then: what do we call "broken"?
> > Something that seems "clearly" related to a typo?
> > Something that can't be consumed by Maven?
> > Something that people who produced the release (with any tooling) won't
> > consume for syntactic reasons on the result? Something that they won't
> consume
> > for other reasons? (like for example because it's continuous deployment
> and
> > it's the 4th version of the day)
>
> I wouldn't go so far to treat version=1.6.2.1 as an illegal version
> (after all, I can image someone using legitimately using a qualifier
> scheme like 1.2.3-os=linux), but there are IMHO two cases which I always
> consider broken:
>
> - Spaces in any of the components of a GAV
> - A colon in any of the components of a GAV
>
> Spaces are just likely to cause trouble for some tool further down the
> line.
>
> And for colons we know that they will cause trouble, being the default
> separator for GAVs when written as a single string.
>
> Aside from those characters, I would probably just ban a few characters
> (non-printable control characters). A bit similar to what XML did with a
> its NCName (non-colon name) production [1].
>
> However, for groudId and artifactId we already have much stricter rules
> (A-Z, a-z, 0-9, ., -, _), so the argument can be made that
> versions/classifiers/extensions should also be made up of a more limited
> character set as well.
>
> In particular, care should to be taken that the path component can still
> be parsed unambiguously, so allowing '.' in a classifier is probably
> asking for trouble.
>
> > And what can we do?
> > On the past artifacts, removing anything is not really an option: IMHO,
> the
> > issue does not deserve the effort and to break our base rule about
> > inalterability.
> > On the future, perhaps we can do something:
> > - at Maven level, sure we can and we should improve controls as much as
> > possible
>
> Yes, if only that at this level we can provide the best error messages,
> as the error is recognized closest to the user.
>
> > - on other build tools: perhaps we should try not only to implement
> checks in
> > Maven but also document rules for other tools to implement same rules
>
> The Maven Resolver is a great place to enforce some rules in
> DefaultArtifact (or whatever replaces it). Granted, not everyone deploys
> using the Maven Resolver, but its *the* place that knows about all the
> intricacies of the repository layout already.
>
> > - on repo managers used by the publishers: same rules documentation
> > prerequisite, but other tools target
>
> Well, Nexus already has some checks in place, to avoid versions like
> "1/../../other-artifact/2". However, groupIds like "org...example" are
> still accepted (deployed under org/example).
>
> > - on sync to central: this is the only location where some rules can be
> > checked for absolutely any new artifact then really interesting at a
> first
> > glance. But making rules evolve at this level is really hard since there
> is no
> > real feedback process I know of when base Central publication rules are
> not
> > met. Base Central publication rules were defined from the beginning
> (signature,
> > ...), then are implemented by publishers' repo managers. I suppose failed
> > controls done by sync to central (then sync blocked) are rare: I'm not
> sure
> > there is a strong process/tooling. And adding it would cost some
> management:
> > not easy. IMHO, we should start by first detecting if there are really
> issues
> > on new artifacts these days before trying to take actions at this level.
>
> That being said, I think the first step is to document the syntax for
> GAVs somewhere (e.g., at [2] or [3]).
>
> Best wishes,
>
> Andreas
>
> [1] <https://www.w3.org/TR/REC-xml-names/#NT-NCName>
> [2] <http://maven.apache.org/pom.html#Maven_Coordinates>
> [3] <http://maven.apache.org/ref/3.5.2/maven-model/maven.html>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Allowed characters in GAV and how/where to sanitize?

Fred Cooke
In reply to this post by Andreas Sewe-2
Looks like I'm just plain wrong (and happy about it). I'm not sure where
that memory came from. Perhaps maven 2 some time ago, though it felt fresh
in my mind. Apologies for the noise! :-(

On 11 January 2018 at 11:22, Hervé BOUTEMY <[hidden email]> wrote:

> Le mercredi 10 janvier 2018, 10:21:35 CET Andreas Sewe a écrit :
> > Hervé BOUTEMY wrote:
> > > notice that Central contains artifacts produced by Maven but also by
> other
> > > tools: I did some analysis myself and found strange things also that
> are
> > > clearly not produced by Maven. Scala for example produces some
> artifacts
> > > that I doubt could be referenced by Maven.
> >
> > Yes, Maven is not the only tool that can deploy artifacts to Central --
> > and that's a good thing.
> +1
>
> >
> > > Then: what do we call "broken"?
> > > Something that seems "clearly" related to a typo?
> > > Something that can't be consumed by Maven?
> > > Something that people who produced the release (with any tooling) won't
> > > consume for syntactic reasons on the result? Something that they won't
> > > consume for other reasons? (like for example because it's continuous
> > > deployment and it's the 4th version of the day)
> >
> > I wouldn't go so far to treat version=1.6.2.1 as an illegal version
> I was not talking about a version with 4 parts: Maven 3 supports an
> arbitrary
> count of parts.
> I was talking about an artifact that is released 4 times per day, because
> it's
> continuous delivery (I suppose): a vast majority of releases are IMHO never
> used
>
> > (after all, I can image someone using legitimately using a qualifier
> > scheme like 1.2.3-os=linux), but there are IMHO two cases which I always
> > consider broken:
> >
> > - Spaces in any of the components of a GAV
> > - A colon in any of the components of a GAV
> +1
>
> >
> > Spaces are just likely to cause trouble for some tool further down the
> line.
> >
> > And for colons we know that they will cause trouble, being the default
> > separator for GAVs when written as a single string.
> >
> > Aside from those characters, I would probably just ban a few characters
> > (non-printable control characters). A bit similar to what XML did with a
> > its NCName (non-colon name) production [1].
> +1
>
> >
> > However, for groudId and artifactId we already have much stricter rules
> > (A-Z, a-z, 0-9, ., -, _), so the argument can be made that
> > versions/classifiers/extensions should also be made up of a more limited
> > character set as well.
> +1
>
> >
> > In particular, care should to be taken that the path component can still
> > be parsed unambiguously, so allowing '.' in a classifier is probably
> > asking for trouble.
> +1 again
>
> >
> > > And what can we do?
> > > On the past artifacts, removing anything is not really an option: IMHO,
> > > the
> > > issue does not deserve the effort and to break our base rule about
> > > inalterability.
> > > On the future, perhaps we can do something:
> > > - at Maven level, sure we can and we should improve controls as much as
> > > possible
> >
> > Yes, if only that at this level we can provide the best error messages,
> > as the error is recognized closest to the user.
> >
> > > - on other build tools: perhaps we should try not only to implement
> checks
> > > in Maven but also document rules for other tools to implement same
> rules
> > The Maven Resolver is a great place to enforce some rules in
> > DefaultArtifact (or whatever replaces it). Granted, not everyone deploys
> > using the Maven Resolver, but its *the* place that knows about all the
> > intricacies of the repository layout already.
> >
> > > - on repo managers used by the publishers: same rules documentation
> > > prerequisite, but other tools target
> >
> > Well, Nexus already has some checks in place, to avoid versions like
> > "1/../../other-artifact/2". However, groupIds like "org...example" are
> > still accepted (deployed under org/example).
> probably ".." should be forbidden also
>
> >
> > > - on sync to central: this is the only location where some rules can be
> > > checked for absolutely any new artifact then really interesting at a
> first
> > > glance. But making rules evolve at this level is really hard since
> there
> > > is no real feedback process I know of when base Central publication
> rules
> > > are not met. Base Central publication rules were defined from the
> > > beginning (signature, ...), then are implemented by publishers' repo
> > > managers. I suppose failed controls done by sync to central (then sync
> > > blocked) are rare: I'm not sure there is a strong process/tooling. And
> > > adding it would cost some management: not easy. IMHO, we should start
> by
> > > first detecting if there are really issues on new artifacts these days
> > > before trying to take actions at this level.
> > That being said, I think the first step is to document the syntax for
> > GAVs somewhere (e.g., at [2] or [3]).
> +1
> there is an edit button near the title to find the source and propose a PR
> :)
>
> Regards,
>
> Hervé
>
> >
> > Best wishes,
> >
> > Andreas
> >
> > [1] <https://www.w3.org/TR/REC-xml-names/#NT-NCName>
> > [2] <http://maven.apache.org/pom.html#Maven_Coordinates>
> > [3] <http://maven.apache.org/ref/3.5.2/maven-model/maven.html>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Allowed characters in GAV and how/where to sanitize?

Hervé BOUTEMY
ok, I suppose this is a reminiscence of Maven 2: yes, in Maven 2, there was
this limitation that caused multiple issues linked in MNG-3010
https://issues.apache.org/jira/browse/MNG-3010

As Robert Scholte noticed, perhaps there are still a few glitches with some
plugins here and there, that require precise description to be fixed

It's good to have memory: now, I hope the fix for MNG-3010 is part of our
common memory :)

Regards,

Hervé

Le mercredi 10 janvier 2018, 23:47:13 CET Fred Cooke a écrit :

> Looks like I'm just plain wrong (and happy about it). I'm not sure where
> that memory came from. Perhaps maven 2 some time ago, though it felt fresh
> in my mind. Apologies for the noise! :-(
>
> On 11 January 2018 at 11:22, Hervé BOUTEMY <[hidden email]> wrote:
> > Le mercredi 10 janvier 2018, 10:21:35 CET Andreas Sewe a écrit :
> > > Hervé BOUTEMY wrote:
> > > > notice that Central contains artifacts produced by Maven but also by
> >
> > other
> >
> > > > tools: I did some analysis myself and found strange things also that
> >
> > are
> >
> > > > clearly not produced by Maven. Scala for example produces some
> >
> > artifacts
> >
> > > > that I doubt could be referenced by Maven.
> > >
> > > Yes, Maven is not the only tool that can deploy artifacts to Central --
> > > and that's a good thing.
> >
> > +1
> >
> > > > Then: what do we call "broken"?
> > > > Something that seems "clearly" related to a typo?
> > > > Something that can't be consumed by Maven?
> > > > Something that people who produced the release (with any tooling)
> > > > won't
> > > > consume for syntactic reasons on the result? Something that they won't
> > > > consume for other reasons? (like for example because it's continuous
> > > > deployment and it's the 4th version of the day)
> > >
> > > I wouldn't go so far to treat version=1.6.2.1 as an illegal version
> >
> > I was not talking about a version with 4 parts: Maven 3 supports an
> > arbitrary
> > count of parts.
> > I was talking about an artifact that is released 4 times per day, because
> > it's
> > continuous delivery (I suppose): a vast majority of releases are IMHO
> > never
> > used
> >
> > > (after all, I can image someone using legitimately using a qualifier
> > > scheme like 1.2.3-os=linux), but there are IMHO two cases which I always
> > > consider broken:
> > >
> > > - Spaces in any of the components of a GAV
> > > - A colon in any of the components of a GAV
> >
> > +1
> >
> > > Spaces are just likely to cause trouble for some tool further down the
> >
> > line.
> >
> > > And for colons we know that they will cause trouble, being the default
> > > separator for GAVs when written as a single string.
> > >
> > > Aside from those characters, I would probably just ban a few characters
> > > (non-printable control characters). A bit similar to what XML did with a
> > > its NCName (non-colon name) production [1].
> >
> > +1
> >
> > > However, for groudId and artifactId we already have much stricter rules
> > > (A-Z, a-z, 0-9, ., -, _), so the argument can be made that
> > > versions/classifiers/extensions should also be made up of a more limited
> > > character set as well.
> >
> > +1
> >
> > > In particular, care should to be taken that the path component can still
> > > be parsed unambiguously, so allowing '.' in a classifier is probably
> > > asking for trouble.
> >
> > +1 again
> >
> > > > And what can we do?
> > > > On the past artifacts, removing anything is not really an option:
> > > > IMHO,
> > > > the
> > > > issue does not deserve the effort and to break our base rule about
> > > > inalterability.
> > > > On the future, perhaps we can do something:
> > > > - at Maven level, sure we can and we should improve controls as much
> > > > as
> > > > possible
> > >
> > > Yes, if only that at this level we can provide the best error messages,
> > > as the error is recognized closest to the user.
> > >
> > > > - on other build tools: perhaps we should try not only to implement
> >
> > checks
> >
> > > > in Maven but also document rules for other tools to implement same
> >
> > rules
> >
> > > The Maven Resolver is a great place to enforce some rules in
> > > DefaultArtifact (or whatever replaces it). Granted, not everyone deploys
> > > using the Maven Resolver, but its *the* place that knows about all the
> > > intricacies of the repository layout already.
> > >
> > > > - on repo managers used by the publishers: same rules documentation
> > > > prerequisite, but other tools target
> > >
> > > Well, Nexus already has some checks in place, to avoid versions like
> > > "1/../../other-artifact/2". However, groupIds like "org...example" are
> > > still accepted (deployed under org/example).
> >
> > probably ".." should be forbidden also
> >
> > > > - on sync to central: this is the only location where some rules can
> > > > be
> > > > checked for absolutely any new artifact then really interesting at a
> >
> > first
> >
> > > > glance. But making rules evolve at this level is really hard since
> >
> > there
> >
> > > > is no real feedback process I know of when base Central publication
> >
> > rules
> >
> > > > are not met. Base Central publication rules were defined from the
> > > > beginning (signature, ...), then are implemented by publishers' repo
> > > > managers. I suppose failed controls done by sync to central (then sync
> > > > blocked) are rare: I'm not sure there is a strong process/tooling. And
> > > > adding it would cost some management: not easy. IMHO, we should start
> >
> > by
> >
> > > > first detecting if there are really issues on new artifacts these days
> > > > before trying to take actions at this level.
> > >
> > > That being said, I think the first step is to document the syntax for
> > > GAVs somewhere (e.g., at [2] or [3]).
> >
> > +1
> > there is an edit button near the title to find the source and propose a PR
> >
> > :)
> >
> > Regards,
> >
> > Hervé
> >
> > > Best wishes,
> > >
> > > Andreas
> > >
> > > [1] <https://www.w3.org/TR/REC-xml-names/#NT-NCName>
> > > [2] <http://maven.apache.org/pom.html#Maven_Coordinates>
> > > [3] <http://maven.apache.org/ref/3.5.2/maven-model/maven.html>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]