
Extra line read as a path when reading a PDF with the ClipPaths option set to true #1242

@MarcoSegapeli

Description


Hi PdfPig team,

I ran into an issue when reading the attached PDF file, Halle12.pdf, in my application, which uses PdfPig (PdfPig version 0.1.13, PdfPig.Filters.Dct.JpegLibrary version 0.1.12.1).

My application simply reads all the paths stored in Page.ExperimentalAccess.Paths, converts them to lines and renders them into the design.
When ClipPaths is set to true in the ParsingOptions, an extra vertical line appears in the rendered PDF.
Here is a sample image produced by my application when reading the PDF:

[Image: rendering produced by my application with ClipPaths = true, showing the extra vertical line]

As you can see, the rendering contains an extra vertical line. We tried several PDF readers (Adobe Acrobat, MuPDF), and none of them shows the extra vertical line. Here is the rendering of this PDF in MuPDF:

[Image: rendering of Halle12.pdf in MuPDF, without the extra vertical line]

The extra line comes directly from PdfPig's path parsing.
I created a simple test application (attached) with minimal code to reproduce the problem, using the following code:

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Filters;
using UglyToad.PdfPig.Filters.Dct.JpegLibrary; // assumed namespace of the PdfPig.Filters.Dct.JpegLibrary package
using UglyToad.PdfPig.Tokens;

// The members below are declared inside the WinForms Form class of the test application.

private class MyFilterProviderTest : BaseFilterProvider
{
    /// <summary>
    /// The single instance of this provider.
    /// </summary>
    public static readonly IFilterProvider Instance = new MyFilterProviderTest();

    private MyFilterProviderTest() : base(GetDictionary())
    {
    }

    /// <summary>
    /// Same filters as the default provider, except that DCT (JPEG) streams
    /// are decoded by the JpegLibrary-based filter.
    /// </summary>
    private static Dictionary<string, IFilter> GetDictionary()
    {
        var ascii85 = new Ascii85Filter();
        var asciiHex = new AsciiHexDecodeFilter();
        var ccitt = new CcittFaxDecodeFilter();
        var dct = new JpegLibraryDctDecodeFilter(); // new filter
        var flate = new FlateFilter();
        var jbig2 = new Jbig2DecodeFilter();
        var jpx = new JpxDecodeFilter();
        var runLength = new RunLengthFilter();
        var lzw = new LzwFilter();

        return new Dictionary<string, IFilter>
        {
            { NameToken.Ascii85Decode.Data, ascii85 },
            { NameToken.Ascii85DecodeAbbreviation.Data, ascii85 },
            { NameToken.AsciiHexDecode.Data, asciiHex },
            { NameToken.AsciiHexDecodeAbbreviation.Data, asciiHex },
            { NameToken.CcittfaxDecode.Data, ccitt },
            { NameToken.CcittfaxDecodeAbbreviation.Data, ccitt },
            { NameToken.DctDecode.Data, dct }, // new filter
            { NameToken.DctDecodeAbbreviation.Data, dct }, // new filter
            { NameToken.FlateDecode.Data, flate },
            { NameToken.FlateDecodeAbbreviation.Data, flate },
            { NameToken.Jbig2Decode.Data, jbig2 },
            { NameToken.JpxDecode.Data, jpx },
            { NameToken.RunLengthDecode.Data, runLength },
            { NameToken.RunLengthDecodeAbbreviation.Data, runLength },
            { NameToken.LzwDecode.Data, lzw },
            { NameToken.LzwDecodeAbbreviation.Data, lzw }
        };
    }
}

protected override void OnLoad(EventArgs e)
{
    base.OnLoad(e);

    Page[] documentPages;

    // With ClipPaths = true the extra vertical line appears; with false it does not.
    var parsingOptions = new ParsingOptions
    {
        ClipPaths = true,
        FilterProvider = MyFilterProviderTest.Instance
    };

    using (PdfDocument document = PdfDocument.Open(@"Halle12.pdf", parsingOptions))
    {
        documentPages = new Page[document.NumberOfPages];

        for (int i = 0; i < document.NumberOfPages; i++)
        {
            // Page numbers are 1-based. After GetPage(1) returns,
            // documentPages[0].ExperimentalAccess.Paths[174][2] contains the path
            // associated with the extra vertical line.
            documentPages[i] = document.GetPage(i + 1);
        }
    }
}
```

Here is a debugger view of the path associated with the extra vertical line, which runs from (x: 732.36, y: 378.36) to (x: 732.36, y: 0):

[Image: debugger view of the path that draws the extra vertical line]
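To locate segments like this one without stepping through the debugger, the application's own path-to-line conversion could be followed by a simple filter. This is a minimal sketch under assumptions: it works on plain (X1, Y1, X2, Y2) tuples produced by that conversion, and `FindVerticalSegmentsToBottom` is a hypothetical helper, not a PdfPig API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of PdfPig): returns the indices of segments that are
// vertical (same X within a tolerance) and have one endpoint on y = 0, like the
// extra line from (732.36, 378.36) to (732.36, 0).
static List<int> FindVerticalSegmentsToBottom(
    IReadOnlyList<(double X1, double Y1, double X2, double Y2)> segments,
    double tolerance = 1e-6)
{
    var hits = new List<int>();
    for (int i = 0; i < segments.Count; i++)
    {
        var s = segments[i];
        bool vertical = Math.Abs(s.X1 - s.X2) <= tolerance;
        bool touchesBottom = Math.Abs(s.Y1) <= tolerance || Math.Abs(s.Y2) <= tolerance;
        if (vertical && touchesBottom)
        {
            hits.Add(i);
        }
    }
    return hits;
}

// Example data: the second segment has the coordinates reported above.
var segments = new List<(double X1, double Y1, double X2, double Y2)>
{
    (100.0, 200.0, 300.0, 200.0),   // horizontal
    (732.36, 378.36, 732.36, 0.0),  // suspicious vertical line
    (50.0, 10.0, 50.0, 80.0),       // vertical, but does not reach y = 0
};

var suspicious = FindVerticalSegmentsToBottom(segments);
Console.WriteLine(string.Join(", ", suspicious)); // prints "1"
```

Running the same filter once with ClipPaths = true and once with ClipPaths = false would show which segments exist only in the clipped output.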

I noticed that with ClipPaths = false, the path is no longer produced and the PDF is displayed as expected:

[Image: rendering produced by my application with ClipPaths = false, without the extra vertical line]

However, the documentation of this option states that "Bezier curves will be transformed into polylines if clipping is set to true". Based on this description, I expected both settings to produce essentially the same rendering, so perhaps some path is converted incorrectly when ClipPaths = true.
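For reference, flattening a cubic Bezier curve into a polyline usually means evaluating the curve at several parameter values and joining the resulting points. The sketch below uses uniform sampling of the Bernstein form. It is a generic technique, not necessarily PdfPig's implementation, and `FlattenCubicBezier` and its `segmentCount` parameter are hypothetical. A correct flattening starts at P0, ends at P3 and keeps every vertex inside the convex hull of the four control points, so on its own it should not produce a line that leaves that hull.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: generic uniform-sampling flattening of a cubic Bezier curve
// (a sketch; not necessarily the algorithm PdfPig uses internally).
static List<(double X, double Y)> FlattenCubicBezier(
    (double X, double Y) p0, (double X, double Y) p1,
    (double X, double Y) p2, (double X, double Y) p3,
    int segmentCount)
{
    if (segmentCount < 1) throw new ArgumentOutOfRangeException(nameof(segmentCount));

    var points = new List<(double X, double Y)>(segmentCount + 1);
    for (int i = 0; i <= segmentCount; i++)
    {
        double t = (double)i / segmentCount;
        double u = 1 - t;

        // Bernstein form: B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        double b0 = u * u * u;
        double b1 = 3 * u * u * t;
        double b2 = 3 * u * t * t;
        double b3 = t * t * t;

        points.Add((
            b0 * p0.X + b1 * p1.X + b2 * p2.X + b3 * p3.X,
            b0 * p0.Y + b1 * p1.Y + b2 * p2.Y + b3 * p3.Y));
    }
    return points;
}

// Example: an arc-like curve from (0, 0) to (10, 10), flattened into 4 segments.
var polyline = FlattenCubicBezier((0, 0), (5.52, 0), (10, 4.48), (10, 10), 4);
foreach (var p in polyline)
{
    Console.WriteLine($"({p.X:F2}, {p.Y:F2})");
}
```

With `segmentCount` = 4, the curve is evaluated at t = 0, 0.25, 0.5, 0.75 and 1, giving five polyline vertices. Adaptive schemes that subdivide until the curve is flat enough are a common alternative to fixed uniform sampling.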

My questions are:

Is the extra line with ClipPaths = true caused by a bug in PdfPig's parsing? Is a Bezier curve being transformed incorrectly into polylines? Could you please investigate this issue?

I have attached a simple WinForms solution that reproduces the problem, in the compressed folder HalleTest.zip.

Thank you in advance for your support.

Halle12.pdf

HalleTest.zip
