Re: Maven Filtering Plugin should avoid overwriting even when filtering

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Robert Oxspring

> On 25 Apr 2020, at 15:37, Romain Manni-Bucau <[hidden email]> wrote:
>
> Hi Robert, two thoughts maybe
>
> 1. all that work (thinking also of assembly one) should probably end up by
> enriching
> https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
> model
> and impl.

Great! It’s been a couple of years since I’ve worked on a maven project so pointers towards infrastructure pieces I’ve missed or forgotten is v helpful! I’ll take a look at the incremental build helper.

> I tend to think it does not need any hash but just a lastUpdated
> track (the most recent file invalidating previous cache, it is enough and
> faster than any hash computation)

I’m not totally sure of this. Filtering, for example, is particularly dependant on properties and configuration which aren’t directly linked with a source file change. Allowing the feature to consider wider inputs should make the feature more reliable, and might avoid the need to make it optional. Definitely up for running the numbers and seeing the impact before deciding though!

> 2. by default nothing should be skipped - I guess a -Dxxx.incremental=true
> should be added, maybe even to maven core (see next point), all this theory
> is based on fully defining inputs/outputs fully. This is what gradle
> implemented and I almost never saw a single complex gradle build correctly
> configured to not bypass modules it shouldn't bypass (or worse, skip tests
> it shouldn't) so I think we should be conservative here + several plugins
> refetch data when executed so even if the result of 1 is "no change" you
> can need to reexecute the plugin (including the filtering one if the
> filtered data depends on the time or so - I don't want to discuss it is a
> good or bad practise but it is used a lot in CI/CD pipelines).

Interesting. To me the the default should be that nothing is repeated. Our experiences of Gradle are entirely opposed too - I’ve never seen it’s incremental build support doing the wrong thing. I’m well aware of the necessity to support timestamps when filtering though.

> 3. Not being per plugin but global on maven sounds better than starting to
> be specific in all plugins. I really see it as a dev feature you can
> activate while working and not enable on the CI for example. Added to
> maven-core it can enable to define in mojo the preconditions which would be
> checked with 1 and potentially the conditions triggering inconditionally a
> rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
> "${project.artifacts}", "method:getState()") or so.

Yeah, that’s all looking much closer to what I’ve used in Gradle more recently. It would make it a bigger piece of work, but I’m happy to explore that if it’s the preferred option!

Thanks for the input!

Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Robert Oxspring

> On 25 Apr 2020, at 16:31, Romain Manni-Bucau <[hidden email]> wrote:
>
> Le sam. 25 avr. 2020 à 17:12, Robert Oxspring <[hidden email]> a
> écrit :
>
>>
>>> On 25 Apr 2020, at 15:37, Romain Manni-Bucau <[hidden email]>
>> wrote:
>>>
>>> Hi Robert, two thoughts maybe
>>>
>>> 1. all that work (thinking also of assembly one) should probably end up
>> by
>>> enriching
>>>
>> https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
>>> model
>>> and impl.
>>
>> Great! It’s been a couple of years since I’ve worked on a maven project so
>> pointers towards infrastructure pieces I’ve missed or forgotten is v
>> helpful! I’ll take a look at the incremental build helper.
>>
>>> I tend to think it does not need any hash but just a lastUpdated
>>> track (the most recent file invalidating previous cache, it is enough and
>>> faster than any hash computation)
>>
>> I’m not totally sure of this. Filtering, for example, is particularly
>> dependant on properties and configuration which aren’t directly linked with
>> a source file change. Allowing the feature to consider wider inputs should
>> make the feature more reliable, and might avoid the need to make it
>> optional. Definitely up for running the numbers and seeing the impact
>> before deciding though!
>>
>
> Hmm, more you'll put there, more you'll defeat your goal (didn't check on
> maven but did it for some "big" downloads where the size can be compared on
> maven project and hashing was a disaster if done globally).
> Globally if you depend on properties then you have to use more advanced
> hashing but also just rebuild generally so timestamp still work, you just
> force the rebuild in such a case.
> There is also the issue of plugin adding properties and downstream plugins
> using them, here you would need a state manager you reinject in
> maven...which will highly likely be wrong at the rebuild time.
> So maybe let's start simple?

This does take us somewhat full circle: The diff I linked to at the start of this thread was a far simpler place to start. No state storage. Files only modified if the content changes. Tbh, with this change in place I suspect I’d be happy with lastModified driven incremental builds.

>>> 2. by default nothing should be skipped - I guess a
>> -Dxxx.incremental=true
>>> should be added, maybe even to maven core (see next point), all this
>> theory
>>> is based on fully defining inputs/outputs fully. This is what gradle
>>> implemented and I almost never saw a single complex gradle build
>> correctly
>>> configured to not bypass modules it shouldn't bypass (or worse, skip
>> tests
>>> it shouldn't) so I think we should be conservative here + several plugins
>>> refetch data when executed so even if the result of 1 is "no change" you
>>> can need to reexecute the plugin (including the filtering one if the
>>> filtered data depends on the time or so - I don't want to discuss it is a
>>> good or bad practise but it is used a lot in CI/CD pipelines).
>>
>> Interesting. To me the the default should be that nothing is repeated. Our
>> experiences of Gradle are entirely opposed too - I’ve never seen it’s
>> incremental build support doing the wrong thing. I’m well aware of the
>> necessity to support timestamps when filtering though.
>>
>
> At beginning i was wondering how Gradle was so fast...then i realized it
> was not doing its job. I didn't review enough projects probably (some
> dozen) but not a single one was well configured to ensure the build was
> deterministic until you kill the daemon and clean the project before the
> rebuild. It is very nasty and I hope maven does not get that kind of
> behavior *by default* (to explicit that '*': it is fine to me to get it in
> dev since on some projects it can help in dev iterations).
>
>
>>
>>> 3. Not being per plugin but global on maven sounds better than starting
>> to
>>> be specific in all plugins. I really see it as a dev feature you can
>>> activate while working and not enable on the CI for example. Added to
>>> maven-core it can enable to define in mojo the preconditions which would
>> be
>>> checked with 1 and potentially the conditions triggering inconditionally
>> a
>>> rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
>>> "${project.artifacts}", "method:getState()") or so.
>>
>> Yeah, that’s all looking much closer to what I’ve used in Gradle more
>> recently. It would make it a bigger piece of work, but I’m happy to explore
>> that if it’s the preferred option!
>>
>
> BTW, we should have started by that, but do you have figures about your
> project?
> Something like "each mojo taks X ms", i modify a class, estimated rebuild
> time is "Y ms" (maybe call the mojos manually, like "mvn compiler:compile
> jar:jar" etc), actual rebuild time with "mvn install" is "Z ms" (+
> drilldown per mojo).

Does seem a sensible to start. Is there a reasonable way to extract such timing information? Was hoping that grepping A debut log would be sufficient but no timings are obvious beyond the overall summary. Figuring out the relevant mojos and manually timing for 30 modules Is pretty boring.

>
> Don't know if Hervé read this already but can't it be close to the
> reproducible track which has a kind of state - to make more generic
> probably but it sounds like a "standard"/existing option to investigate
> maybe?

Didn’t follow any of that but seems it was mostly for Hervé.

>
>
>>
>> Thanks for the input!
>>
>> Rob
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Robert Oxspring
In reply to this post by Robert Oxspring


> On 29 Apr 2020, at 14:43, Tobias <[hidden email]> wrote:
>
> Hi,
>
> I just came across this issue the other day, too. A mvn install after a
> mvn clean install, i.e. nothing has to be compiled, results in a jar
> being created for one module which takes one minute. This is due to
> filtered resources being overwriten though this is not necessary. This
> happens for multiples modules in our build. I too think that the speed
> of the build process could be greatly improved if resources wouldn't be
> overwriten if nothing has changed.
>
>> I think we’re doomed to having to recreate the file each time, because we have no way of knowing whether the file has changed until you generate it.
>
> I'm new to Maven and just discovering it, but my sketch of a naive and
> simple, not optimized idea was the following:
>
> - Identify all locations where information can be stored which is used
> to generate filtered resources (pom.xml, .properties files, ...?)
> - For all these locations, get the date it was last written to
> - Take the most recent date from all these dates
> - For each filtered resource which is already in the target folder from
> the last build, take the date of this resource file and check if this
> date is newer than the date identified in the previouis step. If it is,
> don't copy the filtered resource to the target folder
>
> Is this not possible?

If we need to look at all potential sources of filterable properties then that will include environment variables which aren’t necessarily associated with a file, and potentially properties that include a timestamp and will necessarily be different every run, so using the lastModified date of some set of input files is not practical.

I’ve opened a pull request using a temporary file to write filtered content to, and compare with the previous result. If the contents are equal then the temporary file is deleted and the target file is left unmodified, otherwise we rename the temporary file over the target, updating its lastModified date implicitly. This approach requires no knowledge of what filtering was performed, or where those properties came from.

Admittedly it means that the Maven Filtering Plugin is still doing a little more IO work each run, and it temporarily consumes 2x the storage in the target volume, but filtered resources tend to be small and disk space is relatively cheap so I think this is acceptable. This approach does recreate _a_ file each time, but avoids recreating _the_ file each time, and can avoid triggering downstream work based on the lastModified date.

I think this is the simplest and least risk solution, and doesn’t preclude trying something fancier later.

https://github.com/apache/maven-filtering/pull/5

Feedback welcome!

Thanks,

Rob



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]