Add an option to execute the plugin with a custom encoding setting #70

eyalroth · 2018-04-04T14:02:06Z

The data scoverage is covering is often encoded with custom charsets, which may lead to problems with reading / writing the coverage data, since scalac plugin relies on the default encoding configured on the JVM and doesn't fully support configuring custom encoding with other means.

This could be seen in multiple locations such as:

Serializer#serialize() and Serializer#deserialize
ScoveragePlugin#newPhase() - using serialize without a custom encoding.
ScoverageHtmlWriter#filePage() - this was a problem in the version currently used by the gradle plugin (in the current master the scalac plugin actually allows for custom encoding).

This means that to overcome such problems one must be able to set the default JVM encoding when using the scalac plugin. This PR adds the ability to do so with the gradle plugin.

It makes sure to execute all the launched processes / JVMs -- including both the report and aggregate apps and the scalac compilation with the compiler plugin -- with the custom encoding as the default JVM encoding (using the -Dfile.encoding=encoding argument).

This PR fixes #68.

eyalroth · 2018-04-04T14:39:45Z

src/main/groovy/org/scoverage/ScoverageExtension.groovy

@@ -146,10 +150,23 @@ class ScoverageExtension {
                if (extension.highlighting) {
                    parameters.add('-Yrangepos')
                }
+                if (extension.encoding) {


This part was not actually necessary in my build, but I suspect that this is only because I was using the Gradle daemon, where as in the case of not using it, it may be that the other settings (in the fork options) will be irrelevant and this one will actually matter.

maiflai

Thanks very much for this - it's great to find fellow-minded folks.

It seems that there is already a property on ScalaCompileOptions, which can be set globally with:

tasks.withType(ScalaCompile) {
 scalaCompileOptions.encoding = 'UTF-8' 
}

I think I prefer that this plugin respect the compiler configuration rather than introducing a new place to configure things. The report task reports on something that has to be compiled in.

I think this makes the PR much simpler, I'll see if I can get a fix out later today unless you're keen to take a closer look?

Cheers,
Stu.

eyalroth · 2018-04-08T10:06:00Z

That was one of the things I tried at first actually, as I already mentioned in the issue (#68):

gradle.projectsEvaluated {
    tasks.withType(AbstractScalaCompile) {
        options.encoding = "UTF-8"
        scalaCompileOptions.setEncoding("UTF-8")
        List<String> parameters = ['-encoding', 'UTF-8']
        List<String> existingParameters = scalaCompileOptions.additionalParameters
        if (existingParameters) {
            parameters.addAll(existingParameters)
        }
        scalaCompileOptions.additionalParameters = parameters
    }
}

AbstractScalaCompile is much like using ScalaCompile, and setEncoding is equivalent to .encoding = . That didn't work for me.

The only thing that actually worked was adding the JVM encoding argument to the fork options. I'm guessing this has something to do with how the Gradle daemon executes the tasks. I added the JVM argument to the non-fork options as well only to try cover cases when executing gradle without the daemon (but haven't actually tested this).

Other than this, I added the JVM argument to both of the applications' (report and aggregate) execution since those make use of "third party" code that doesn't support custom encoding other than this method.

maiflai · 2018-04-08T19:51:07Z

Sorry for the confusion.

I've created #71 - the intention was to actually respect the values that are being configured on the ScalaCompile tasks, rather than to add additional configuration options for this plugin.

I've added this to the check task as it too uses the default encoding when parsing XML.

Would it be possible to see if the PR works for you? I still need to figure out how to test this automatically.

eyalroth · 2018-04-09T11:30:12Z

Again, the problem is not just with the gradle plugin code, but with the scalac plugin as well. The scalac plugin is generating a coverage XML with corrupted data.

I've managed to find the problematic source code that causes this problem. The source code has a literal string with the character \u201c, which is e2809c in hex. For some reason, the scalac plugin converts this to a mere 93 hex in the XML.

To reproduce / automatically test this, you could create a project with the following source code:

object Foo {
  def foo = "this is unicode - “ - this is unicode"
}

And reportScoverage will fail (at least it does on my environment).

I don't know exactly why this is happening. I tried to reproduce this with my own code writing this character to a file and it wrote it just fine. Perhaps there's something going on when the plugin / compiler reads the source code. Again, this could only be solved if the fork JVM arguments are set to the right encoding.

I don't mind relying on the encoding configured for the ScalaCompile task instead of an additional parameter for the plugin, though I still think this case should be documented in the readme.md in a "troubleshooting" section.

maiflai · 2018-04-09T20:21:39Z

Ah, ok. It looks as thought the current version of scalac plugin should produce UTF-8 regardless of any specified encoding.

It may be that SBT adds the fork option to set encoding as well as passing the command-line argument to the compiler (as you have done above).

I'll see if I can take a closer look tomorrow; this may also involve an upstream PR or two :-/.

Stu.

gslowikowski · 2018-04-10T20:17:29Z

For me a test with this code:

object Foo {
  def foo = "this is unicode - “ - this is unicode"
}

works. The files scoverage.coverage.xml and Foo.scala.html in html report contain properly encoded left upper quotation mark character.
I have encoding set to UTF-8 by default.

eyalroth · 2018-04-10T20:21:38Z

Well, I'm running on Windows 10 and my default Scala codec is windows-1252:

scala> def foo()(implicit codec: scala.io.Codec) = codec
scala> foo()
res0: scala.io.Codec = windows-1252

Perhaps that has something to do with that...

gslowikowski · 2018-04-10T20:24:32Z

@eyalroth can you confirm that with encoding set to UTF-8 it works for you?

eyalroth · 2018-04-10T20:28:02Z

Well, yes, this is what this is about - forcing UTF-8 as the default encoding on all of the JVM processes involved in the build. When the default encoding is UTF-8 then everything works.

gslowikowski · 2018-04-10T20:37:52Z

So Gradle plugin must provide a way to set desired encoding (like in #70 or #71) and pass it to forked JVM.

gslowikowski · 2018-04-10T20:40:37Z

Don't force UTF-8. Any user-defined encoding should work (provided source files are properly encoded).

eyalroth · 2018-04-10T20:51:42Z

Yep, obviously we don't want to force UTF-8 specifically, but only what the user has defined (via scalaCompileOptions.setEncoding(???), I guess).

It would have been better if there was a way to invoke the compiler / scalac plugin with custom encoding, but this method should work for the Gradle plugin as well.

gslowikowski · 2018-04-10T21:54:39Z

I've changed my system encoding to Windows-1250 and:

generated html reports are not encoded properly on "Code Grid" tab, but encoded properly on "Statement List" tab, scoverage.coverage.xml report is not encoded properly
after adding scalacOptions ++= Seq("-encoding", "UTF-8") in SBT or <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> in Maven html reports are encoded properly on both tabs, scoverage.coverage.xml report is not encoded properly

Conclusions:

scoverage.coverage.xml file should always be written and read in UTF-8 IMO (Serializer#serialize and Serializer#deserialize). This is internal XML file so I see no reason in using other, system default so unpredictable, encoding. This needs separate PR in scalac-scoverage-plugin`.
properly passing encoding to scalac via scalacOptions (in Maven project.build.sourceEncoding is converted to scalacOptions) is the only requirement to properly render HTML reports.

maiflai · 2018-04-27T19:57:55Z

Sorry, I'm struggling to test this on my local machine.

I've set the fork option as you suggest, please could you see if #71 helps?

Thanks,
Stu

maiflai · 2018-05-06T21:31:11Z

Hi,

I've managed to source a build environment with Windows 10, and I can't seem to reproduce the issue with a simple Gradle project.

I did have to explicitly state the encoding. If I don't, I see

IO error while decoding ...\example\src\main\scala\hello\World.scala with UTF-8
Please try specifying another one using the -encoding option

If I do specify the encoding

tasks.withType(ScalaCompile) {
    scalaCompileOptions.encoding='windows-1252'
}

Then reportScoverage faithfully reproduces the magic quotes.

println("this is unicode - “ - this is unicode")

This is with gradle-scoverage-2.1.0, scalac-scoverage-1.3.1, and scalac 2.12.5

Sorry to be a pain, but please could you share an example repository that fails so that I can take a closer look?

Thanks,
Stu.

maiflai · 2018-05-07T20:33:23Z

I've uploaded my test repository to https://github.com/maiflai/gradle-scoverage-encoding - please could you let me know if it builds?

Thanks,
Stu

eyalroth · 2018-05-07T22:35:49Z

@maiflai Your system's encoding has to be changed to Windows-1250. This cannot be done with Gradle, but you'd have to configure your Windows machine as so.

maiflai · 2018-05-08T20:48:45Z

Sorry, I'm not sure I follow.

My Windows machine is configured for the United States and reports that it is using Cp-1250.

How do I make my system encoding match yours?

maiflai/gradle-scoverage-encoding@91658cc

Task :showEncoding
Cp1252

eyalroth · 2018-05-08T20:59:19Z

I believe cp-1250 is equivalent to windows-1250 ("cp" stands for code page). Mine is actually windows-1252, but looks like the problem exists with both encodings, which are pretty standard on windows machines.

Can you please check the encoding with a scala REPL using the code I used in one of my previous comments (the one with the implicit codec variable)?

maiflai · 2018-05-08T21:23:57Z

$ bin/scala
Welcome to Scala 2.12.6 (OpenJDK Client VM, Java 1.8.0_40).
Type in expressions for evaluation. Or try :help.

scala> def foo()(implicit codec: scala.io.Codec) = codec
foo: ()(implicit codec: scala.io.Codec)scala.io.Codec

scala> foo()
res0: scala.io.Codec = windows-1252

maiflai · 2018-05-08T21:24:22Z

does my test repository build?

eyalroth · 2018-05-13T11:09:43Z

@maiflai well, not exactly.

If you remove this part from the build.grade:

tasks.withType(ScalaCompile) {
    scalaCompileOptions.encoding='windows-1252'
}

Then the compilation fails with:

IO error while decoding World.scala with UTF-8
Please try specifying another one using the -encoding option

Which is strange and shouldn't normally happen. I realized that the problem is with the World.scala file itself, which is not encoded correctly. As an example, take a look at how IntelliJ reads this file:

Creating a file with proper UTF-8 encoding makes the compilation work. Re-adding the scalaCompileOptions.encoding part to the gradle actually fails the compilation again with a similar exception, though this time it is windows-1252:

IO error while decoding World.scala with windows-1252

Anyhow, after encoding the file properly and removing that gradle part, reportScoverage actually seems to work with \u201C (“), but doesn't work with \u201D (”).

eyalroth · 2019-09-16T18:48:45Z

See #117.

Add an option to execute the plugin with a custom encoding setting

865633a

eyalroth commented Apr 4, 2018

View reviewed changes

maiflai reviewed Apr 8, 2018

View reviewed changes

eyalroth mentioned this pull request Apr 9, 2018

UnmappableCharacterException / MalformedInputException (problems with encoding) #68

Closed

eyalroth closed this Sep 16, 2019

helfper mentioned this pull request Oct 15, 2020

Add support for Scala 2.13 #145

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an option to execute the plugin with a custom encoding setting #70

Add an option to execute the plugin with a custom encoding setting #70

eyalroth commented Apr 4, 2018 •

edited

Loading

eyalroth Apr 4, 2018

maiflai left a comment

eyalroth commented Apr 8, 2018

maiflai commented Apr 8, 2018

eyalroth commented Apr 9, 2018 •

edited

Loading

maiflai commented Apr 9, 2018

gslowikowski commented Apr 10, 2018

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018 •

edited

Loading

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

maiflai commented Apr 27, 2018

maiflai commented May 6, 2018

maiflai commented May 7, 2018

eyalroth commented May 7, 2018

maiflai commented May 8, 2018

eyalroth commented May 8, 2018 •

edited

Loading

maiflai commented May 8, 2018

maiflai commented May 8, 2018

eyalroth commented May 13, 2018

eyalroth commented Sep 16, 2019

Add an option to execute the plugin with a custom encoding setting #70

Add an option to execute the plugin with a custom encoding setting #70

Conversation

eyalroth commented Apr 4, 2018 • edited Loading

eyalroth Apr 4, 2018

Choose a reason for hiding this comment

maiflai left a comment

Choose a reason for hiding this comment

eyalroth commented Apr 8, 2018

maiflai commented Apr 8, 2018

eyalroth commented Apr 9, 2018 • edited Loading

maiflai commented Apr 9, 2018

gslowikowski commented Apr 10, 2018

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018 • edited Loading

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

eyalroth commented Apr 10, 2018

gslowikowski commented Apr 10, 2018

maiflai commented Apr 27, 2018

maiflai commented May 6, 2018

maiflai commented May 7, 2018

eyalroth commented May 7, 2018

maiflai commented May 8, 2018

eyalroth commented May 8, 2018 • edited Loading

maiflai commented May 8, 2018

maiflai commented May 8, 2018

eyalroth commented May 13, 2018

eyalroth commented Sep 16, 2019

eyalroth commented Apr 4, 2018 •

edited

Loading

eyalroth commented Apr 9, 2018 •

edited

Loading

gslowikowski commented Apr 10, 2018 •

edited

Loading

eyalroth commented May 8, 2018 •

edited

Loading