UnmappableCharacterException / MalformedInputException (problems with encoding) #68

eyalroth · 2018-04-03T18:28:11Z

I have a hybrid Scala/Java Gradle project with multiple sub-projects, and I'm trying to get scoverage working for it.

reportScoverage fails to execute on one of my sub-projects with the following exception:

Exception in thread "main" java.nio.charset.UnmappableCharacterException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:282)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.read1(BufferedReader.java:210)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        at java.io.Reader.read(Reader.java:140)
        at scala.io.BufferedSource.mkString(BufferedSource.scala:96)
        at scoverage.report.CodeGrid.source(CodeGrid.scala:63)
        at scoverage.report.CodeGrid.<init>(CodeGrid.scala:17)
        at scoverage.report.ScoverageHtmlWriter.filePage(ScoverageHtmlWriter.scala:78)
        at scoverage.report.ScoverageHtmlWriter.scoverage$report$ScoverageHtmlWriter$$writeFile(ScoverageHtmlWriter.scala:44)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$scoverage$report$ScoverageHtmlWriter$$writePackage$1.apply(ScoverageHtmlWriter.scala:37)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$scoverage$report$ScoverageHtmlWriter$$writePackage$1.apply(ScoverageHtmlWriter.scala:37)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scoverage.report.ScoverageHtmlWriter.scoverage$report$ScoverageHtmlWriter$$writePackage(ScoverageHtmlWriter.scala:37)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$write$1.apply(ScoverageHtmlWriter.scala:27)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$write$1.apply(ScoverageHtmlWriter.scala:27)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scoverage.report.ScoverageHtmlWriter.write(ScoverageHtmlWriter.scala:27)
        at org.scoverage.ScoverageWriter.write(ScoverageWriter.java:65)
        at org.scoverage.SingleReportApp.main(SingleReportApp.java:41)

I found an issue on the SBT plugin issue tracker (sbt-scoverage#204) with the same exception. It suggested setting the proper encoding when invoking the Scala compiler, so I added this to my project:

gradle.projectsEvaluated {
    tasks.withType(AbstractScalaCompile) {
        options.encoding = "UTF-8"
        scalaCompileOptions.setEncoding("UTF-8")
        List<String> parameters = ['-encoding', 'UTF-8']
        List<String> existingParameters = scalaCompileOptions.additionalParameters
        if (existingParameters) {
            parameters.addAll(existingParameters)
        }
        scalaCompileOptions.additionalParameters = parameters
    }
}

This is overkill, just to make sure Scala compiles with UTF-8, but it didn't change anything.

I looked at the source code of CodeGrid.source(), where the exception is thrown, and saw that it uses a custom encoding passed down by ScoverageHtmlWriter. The problem is that ScoverageHtmlWriter is never initialized with a custom encoding, neither in the original SBT plugin code nor in this project's ScoverageWriter.

This should obviously be fixed, but in the meantime I thought, "let's just try overriding the default encoding with a simple -Dfile.encoding=utf-8". That caused an exception at an even earlier stage:

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.read1(BufferedReader.java:210)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        at java.io.Reader.read(Reader.java:140)
        at scala.io.BufferedSource.mkString(BufferedSource.scala:96)
        at scoverage.Serializer$.deserialize(Serializer.scala:130)
        at scoverage.Serializer.deserialize(Serializer.scala)
        at org.scoverage.SingleReportApp.main(SingleReportApp.java:36)

Again, an encoding problem. I assume this is an issue in the original SBT plugin rather than in the Gradle plugin, but I'm filing it here since I'm using the Gradle plugin (or trying to; it seems it needs this fix before it is usable for me).


maiflai · 2018-04-03T19:38:03Z

Hi,

How are you passing the file.encoding system property please?

You may find that exporting an environment variable before launching Gradle makes this work transparently.

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8

Thanks,
Stu

eyalroth · 2018-04-03T20:36:01Z

It worked, but IMHO this is only a workaround, and not a very good one. Having to make sure this variable is set in every environment (developer machines and CI) is bound to cause build inconsistencies. I couldn't find a way to define such variables per project other than putting the export in the wrapper scripts, and I prefer not to keep those scripts in my VCS but to rely on the Wrapper task instead.

The scoverage plugins should allow overriding the default JVM encoding. Until that is implemented (if ever), is there another way to inject the default encoding into the actual Gradle process executing the report?

Note that the following configuration didn't work either:

configure(reportScoverage) {
    systemProperty "file.encoding", "utf-8"
}

That makes sense, since SingleReportApp runs in its own JVM. Maybe add support for passing custom JVM arguments? Or even just a default-encoding setting (which ScoverageReport.groovy would translate into a JVM argument)?
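A forked JVM only receives the arguments it is launched with, which is consistent with the systemProperty setting above having no effect on the report. The following is a minimal Java sketch of passing the encoding explicitly when building a child JVM's command line. `ReportJvmCommand.build` is a hypothetical helper; only the main class name `org.scoverage.SingleReportApp` comes from the stack trace, and this does not reproduce how ScoverageReport.groovy actually launches it.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ReportJvmCommand {
    // Hypothetical helper: builds the command line for a separate report JVM.
    // The child JVM does not inherit -D system properties from the parent,
    // so the desired default encoding must be passed as an explicit argument.
    static List<String> build(String javaHome, String classpath, String mainClass,
                              String encoding, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        // On Windows, process creation resolves "java" to java.exe.
        cmd.add(Paths.get(javaHome, "bin", "java").toString());
        if (encoding != null && !encoding.isEmpty()) {
            cmd.add("-Dfile.encoding=" + encoding); // JVM options go before the main class
        }
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.addAll(appArgs); // application arguments go after the main class
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(System.getProperty("java.home"),
                System.getProperty("java.class.path"),
                "org.scoverage.SingleReportApp", "UTF-8", new ArrayList<>());
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The point of the sketch is only the ordering: `-Dfile.encoding=...` must appear among the JVM options, before the main class name, or it is treated as an application argument.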

maiflai · 2018-04-03T20:42:43Z

Yes, this was just a way to test that the default JVM encoding was to blame.

I seem to recall Gradle washing their hands of this a long time ago; Maven does support a project level encoding.

I suspect the reporting task is not respecting the configured system properties; I will take a look tomorrow.

maiflai · 2018-04-03T20:46:25Z

That said - are you not using the gradle wrapper scripts everywhere?

This gives you a convenient location to configure the build environment consistently.

eyalroth · 2018-04-03T20:58:17Z

"I suspect the reporting task is not respecting the configured system properties; I will take a look tomorrow."

It seems so. Perhaps this only happens when using the Gradle daemon (child JVMs spawned by the daemon do not "inherit" its arguments). A relatively easy workaround would be to adjust ScoverageReport.groovy as I mentioned earlier.

"That said - are you not using the gradle wrapper scripts everywhere?"

Well, I make sure to use gradlew, and I configure my IDE to use the Gradle wrapper as well, but I honestly have nothing else configured there. I try to keep my code and builds cross-platform and not rely on environment variables :)

eyalroth · 2018-04-04T14:40:29Z

I opened a PR that allows configuring the plugin with a custom encoding, which prevents these exceptions.

gslowikowski · 2018-04-09T21:15:45Z

Hi guys.

The Gradle plugin should call the ScoverageWriter constructor that takes a sourceEncoding parameter, as the SBT and Maven plugins do.

eyalroth · 2018-04-09T21:32:09Z

@gslowikowski Thanks for joining in. That is true, but it would hardly solve the problem. Take a look at the PR I created (#70) for further details on this.

kknd22 · 2018-06-13T13:44:15Z

We have seen this behavior as well; however, it happens very inconsistently: it fails on some CI workers but works on others. Any idea why?
Thanks
-cl

eyalroth · 2019-09-16T17:57:27Z

I've gotten back to investigating this issue and I believe I have the full picture in mind.

There are basically three steps to Scoverage:

1. Instrumenting the source files via the compiler. This includes generating the scoverage.coverage.xml file, which is a sort of "map" of the source files.
2. Running the tests, which generate the measurement files scoverage.measurements.X.
3. Generating a report based on the "mapping" file from step 1 and the measurement files from step 2.

First, the scalac plugin has an encoding problem when creating the "mapping" file in step 1: it relies on the JVM default encoding instead of the -encoding compiler option. This is discussed in my (outdated) PR #70, and it ultimately results in a badly encoded "mapping" file. I believe this should be fixed in the original scalac plugin rather than handled by the Gradle plugin.
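This mismatch also explains the MalformedInputException seen earlier with -Dfile.encoding=utf-8. A minimal Java sketch, assuming (as in the original report) a Windows-1252 JVM default on the writing side and a UTF-8 reader: ” (U+201D) is written as the single byte 0x94, which is not a valid UTF-8 lead byte. `MappingFileRoundTrip` and `strictDecode` are illustrative names, not scoverage code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

class MappingFileRoundTrip {
    // Strict decode: report invalid input instead of replacing it with U+FFFD,
    // which is also what a Reader built from a fresh CharsetDecoder does.
    static String strictDecode(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "val s = \"\u201Dquoted\u201D\"";
        // Writer side: encoded with Windows-1252 (assumed JVM default of the compiler process).
        byte[] written = text.getBytes(Charset.forName("windows-1252"));
        // Reader side: decoded as UTF-8 (e.g. a report JVM started with -Dfile.encoding=UTF-8).
        try {
            strictDecode(written, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (MalformedInputException e) {
            System.out.println(e); // java.nio.charset.MalformedInputException: Input length = 1
        }
    }
}
```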

The problem the Gradle plugin is responsible for is the failure to generate the HTML report. ScoverageHtmlWriter actually reads the original source files when generating a report, so it must read them with the right encoding. Right now it reads them with the default JVM encoding, even though ScoverageHtmlWriter also accepts an encoding parameter. ScoverageXmlWriter avoids this problem by not reading the source files at all (presumably because its report doesn't contain source code).
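Reading a source file with an explicit charset, rather than the JVM default, could look like the following minimal Java sketch. `SourceReader.readSource` is a hypothetical helper, not ScoverageHtmlWriter's code; it uses `Files.newBufferedReader`, whose decoder reports invalid or unmappable bytes as exceptions instead of silently substituting characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class SourceReader {
    // Hypothetical helper: reads a whole source file with the given charset.
    // Decoding errors surface as MalformedInputException or
    // UnmappableCharacterException (both subclasses of IOException).
    static String readSource(Path file, Charset sourceEncoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = Files.newBufferedReader(file, sourceEncoding)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Called as `SourceReader.readSource(path, StandardCharsets.UTF_8)`, a UTF-8 file decodes correctly; the same file read with `Charset.forName("windows-1252")` can fail with the UnmappableCharacterException from the original stack trace.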

So what happened in my original case, and how come some Unicode characters work and some don't? My default OS encoding (and therefore my default JVM encoding) is Windows-1252, while my source files are encoded in UTF-8. One of my files contained the ” (U+201D) character, which is encoded in UTF-8 as the bytes 0xE2 0x80 0x9D, and that's how it was written to the file system. The thing is, 0x9D is not mapped in Windows-1252, so reading that file with this encoding throws an exception. Characters whose UTF-8 bytes are all mapped in Windows-1252 are decoded without an error, just into the wrong characters (for example, é becomes Ã©), which is why only some characters make the read fail. How is the Scala compiler able to read this file? Its default encoding is UTF-8 and is not based on the default JVM encoding.
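The byte-level reasoning above can be checked directly. This Java sketch prints the UTF-8 bytes of ” and tests each one against a strict Windows-1252 decoder; `Cp1252Check` and its methods are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Cp1252Check {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns the UTF-8 bytes of a string as unsigned hex, e.g. "E2 80 9D".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if a single byte has a mapping in windows-1252 (strict decoder, errors reported).
    static boolean mappedInCp1252(int unsignedByte) {
        try {
            CP1252.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(new byte[] {(byte) unsignedByte}));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String quote = "\u201D"; // RIGHT DOUBLE QUOTATION MARK
        System.out.println(utf8Hex(quote)); // E2 80 9D
        for (byte b : quote.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            // Prints true for 0xE2 and 0x80, false for 0x9D
            System.out.printf("0x%02X mapped in windows-1252: %b%n", u, mappedInCp1252(u));
        }
    }
}
```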

What the Gradle plugin should do, then, is invoke ScoverageHtmlWriter with the source files' encoding, which is either configured via the Gradle Scala plugin's scalaCompileOptions or defaults to UTF-8, just like the Scala compiler.
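A minimal Java sketch of that resolution rule, assuming a hypothetical `SourceEncodingResolver.resolve` that receives the value of `scalaCompileOptions.encoding` and the `additionalParameters` list (the latter check is an assumption, mirroring the `-encoding` parameter used in the configuration earlier in this thread). The key detail is the fallback to UTF-8 rather than `Charset.defaultCharset()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class SourceEncodingResolver {
    // Hypothetical resolution order for the charset used to read sources for the HTML report:
    // 1. the explicit scalaCompileOptions encoding, if set;
    // 2. an "-encoding <name>" pair in the additional compiler parameters, if present;
    // 3. otherwise UTF-8, matching scalac's default (NOT the JVM default charset).
    static Charset resolve(String configuredEncoding, List<String> additionalParameters) {
        if (configuredEncoding != null && !configuredEncoding.isEmpty()) {
            return Charset.forName(configuredEncoding);
        }
        if (additionalParameters != null) {
            int i = additionalParameters.indexOf("-encoding");
            if (i >= 0 && i + 1 < additionalParameters.size()) {
                return Charset.forName(additionalParameters.get(i + 1));
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

The resolved charset would then be what the plugin hands to ScoverageHtmlWriter's encoding parameter.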


eyalroth mentioned this issue Apr 4, 2018

Add an option to execute the plugin with a custom encoding setting #70

Closed

eyalroth added a commit to eyalroth/gradle-scoverage that referenced this issue Sep 16, 2019

Fix incorrect reading of source files when generating an HTML report (scoverage#68)

d507880

eyalroth mentioned this issue Sep 16, 2019

Fix incorrect reading of source files when generating an HTML report #117

Merged

maiflai closed this as completed in #117 Sep 17, 2019

eyalroth mentioned this issue Oct 15, 2020

Add support for Scala 2.13 #145

Closed
