Excel export issue for converted documents: extra spaces between letters

[Deleted User]
[Deleted User] Posts: 160
I converted a .pdf file to .docx using a PDF Standard tool (since JAMA does not allow importing .pdf). I made cosmetic changes to headings, tables, and pictures so it would upload smoothly, and it uploaded successfully without any issue (as usual).

Later I tried to export to Excel (using the Excel Roundtrip mechanism) to make the required changes, and I noticed abnormal text in the "Description" field. For example, I imported the following text under Description: " The password should be encrypted for safety". After exporting to Excel, it came out as " T he e p a s w o r d s h o u l d b e e n c r y p t e d f o r s a f e t y ".

To narrow down the issue, I tried exporting the same requirement to a Word file and saw no issues. What I realized is that this problem appears only for the rich text (.rtf) field, i.e. "Description". No issues were noticed for the "NAME" field, since it is not a rich text field. The reason I mention this: when I exported requirements that were originally written in Word, I did not see any problem after the Excel export.

As you know, the majority of clients release their specifications in .pdf, so we have to convert them to a JAMA-supported format (preferably .docx). I do not know what the problem is with the Excel export mechanism, since it works fine with the Word format.

I need to resolve this issue, since we have to work in Excel rather than Word. Can anyone suggest how to overcome this problem?

Thanks in advance.

-Karnala

Comments

  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    Hi Karnala, that's an interesting issue you present. I'm curious to see what the source of the item looks like. Can you toggle to the Source of the Description for one of these items and share a screenshot of that?
  • [Deleted User]
    [Deleted User] Posts: 160
    edited June 2016
    Kristina,

    I'm sorry to say that I'm not allowed to share any client-confidential documents or screenshots on public forums. I explained what is going on with an example. I did not understand "source of the item". Do you mean a screenshot from the Word specification? If so, it looks like normal Word formatting. If not, please let me know.

    -Karnala
  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    By "source," I mean source in Jama.
    image
    It is normal for Roundtrip exports to display HTML (to maintain formatting when the info is imported back in), so I was surprised in your example to see actual spaces manifesting. To me that signals a problem with the original source of the item (PDF>Word) and how Jama is interpreting it in the export. For this reason we need to see what Jama is storing. If you cannot share that here (I understand that), you'll have to submit a support ticket to keep it confidential. 
  • [Deleted User]
    [Deleted User] Posts: 160
    edited July 2016
    Hi,

    I imported a general .pdf file (not a customer's); see below. This should help you understand the issue better. Everything I uploaded is from public documents, not client-specific ones. If the uploaded images are not clear, please let me know. I've marked what is what in red text.

    imageimageimage
    image 
  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    Karnala, thanks. Are you able to get a screenshot of the Source (HTML) from the item shown in Jama? I need to see if there are any style or span tags (or something else unexpected) to try to reproduce this issue. There are no known bugs that explain this odd behavior.
  • [Deleted User]
    [Deleted User] Posts: 160
    edited June 2016
    image
  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    Perfect, thank you. The letter-spacing span tags look to be our culprit, so we have to figure out how they're getting there. We use a third-party library within Jama, Aspose, for processing the export of documents. The export for Excel and the export for Word are separate functions, so I have a feeling they parse better for Word (because it's a higher-fidelity tool for text). That would explain why this happens when you export to Excel but not Word. So a couple more questions for you...
    1. Does this issue happen when you Export to Excel without using Roundtrip? 
    2. What tool are you using to convert the PDF files to Word files?
    3. What version of Jama are you using?

    I am going to attempt to reproduce this and see if we can find a workaround and/or file a bug.
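A minimal sketch of one way to check an item's HTML source for the letter-spacing spans described above, assuming the source has been copied out of Jama as a plain string. The names `LetterSpacingFinder` and `find_letter_spacing` and the sample markup are hypothetical and are not part of Jama or Aspose.

```python
from html.parser import HTMLParser


class LetterSpacingFinder(HTMLParser):
    """Collects (tag, style) for every start tag whose style sets letter-spacing."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        style = dict(attrs).get("style") or ""
        if "letter-spacing" in style.lower():
            self.hits.append((tag, style))


def find_letter_spacing(html):
    """Return a list of (tag, style) pairs that carry a letter-spacing rule."""
    finder = LetterSpacingFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


# Hypothetical sample, not taken from a real Jama item.
sample = (
    '<p><span style="letter-spacing:-0.05pt">T</span>'
    '<span style="letter-spacing:0.1pt">he</span> password</p>'
)
hits = find_letter_spacing(sample)
print(len(hits), "tag(s) with letter-spacing found")
```

An empty result for an item that exports cleanly and a non-empty result for one that exports with stray spaces would support the diagnosis in this post.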
  • [Deleted User]
    [Deleted User] Posts: 160
    edited June 2016
    1. Does this issue happen when you Export to Excel without using Roundtrip? 
    Karnala: It is seen in both exports, i.e. the general export and the Roundtrip Excel export.

    2. What tool are you using to convert the PDF files to Word files?
    Karnala: As I mentioned earlier, Adobe Acrobat Standard.

    3. What version of Jama are you using?
    Karnala: Version: 2015.1 Build date: 2015/04/09 .

    I think I've answered your queries, and I hope this helps you narrow down the issue.

    -Karnala
  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    Thank you, Karnala, for the explanation.

    I was able to reproduce the problem by converting a PDF to Word. Importing isn't even required: you can just copy and paste the text from Word into Jama to produce the conditions for the bad export to occur. That rules out the possibility that this has to do with how Jama parses imports.

    I then looked at the HTML source of the Word document and, sure enough, there were letter-spacing span tags all over the place:
    image

    This is good because it means the problem is identified: converting PDFs to Word introduces a lot of unnecessary HTML.
    However, this also means the problem is not within Jama's scope, because it starts before the information is even put into Jama. You'll have to find a way to "un-format" your Word document before importing it into Jama. Hopefully understanding where the issue begins will help you avoid it.
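As a rough illustration of what "un-formatting" could look like at the HTML level, the sketch below drops `letter-spacing` declarations from double-quoted `style` attributes and unwraps spans that are left with no attributes and contain only plain text. It assumes the HTML is available as a string; `strip_letter_spacing` is a hypothetical helper, not a Jama, Word, or Aspose function, and a regex pass like this is a simplification rather than a full HTML cleaner.

```python
import re

# Matches a double-quoted style attribute together with its leading whitespace.
_STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)
# Matches an attribute-free span that contains only text (no nested tags).
_BARE_SPAN = re.compile(r"<span>([^<]*)</span>", re.IGNORECASE)


def _clean_style(match):
    """Rebuild a style attribute without any letter-spacing declaration."""
    declarations = [d.strip() for d in match.group(1).split(";") if d.strip()]
    kept = [d for d in declarations if not d.lower().startswith("letter-spacing")]
    # Drop the whole attribute when nothing is left in it.
    return ' style="{}"'.format("; ".join(kept)) if kept else ""


def strip_letter_spacing(html):
    """Remove letter-spacing rules, then unwrap spans left without attributes."""
    cleaned = _STYLE_ATTR.sub(_clean_style, html)
    previous = None
    while previous != cleaned:  # repeat so newly exposed bare spans are unwrapped too
        previous = cleaned
        cleaned = _BARE_SPAN.sub(r"\1", cleaned)
    return cleaned


# Hypothetical sample, not taken from a real converted document.
sample = (
    '<p><span style="letter-spacing:-0.05pt">T</span>'
    '<span style="letter-spacing:0.1pt; color:#e74c3c">he</span> password</p>'
)
print(strip_letter_spacing(sample))
```

Other styling, such as the color in the sample, is deliberately left in place so that only the spacing rules are touched.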
  • [Deleted User]
    [Deleted User] Posts: 160
    edited June 2016
    Thanks Kristina. I suspected the same but was not sure about the parsing and other processing in JAMA. Do you have any input or suggestions on how I can avoid the extra spacing?

    Thanks,
    Karnala
  • [Deleted User]
    [Deleted User] Posts: 160
    edited June 2016
    Kristina,

    I found where the issue is coming from (I think you may already know). The conversion tool sets the content under any heading to the "Body Text" style, which is quite natural. When I selected the "Normal" style instead of "Body Text", I did not see any extra span tags. When we export to Word, JAMA sets the "Body Text" style, so we did not see extra spaces, since Word has that style built in. Excel, on the other hand, does not have text styles; it only supports the normal style. This is what I have seen after several trials.

    I do not know whether it would work, but my suggestion is this: if you change any other text style to the Normal style while parsing, users will not see the problem, regardless of what the converter tool is doing.

    Does that make sense?

    -Karnala
  • [Deleted User]
    [Deleted User] Posts: 911
    edited June 2016
    That does make sense, but the problem is there is no bug on Jama's side that we can fix: even if you copy and paste the text from the Word document into Jama, it will contain the tags. So we'd have to do additional HTML scrubbing on export to Excel. But that isn't really a possibility either, given that Roundtrip needs to maintain all the HTML that is in an item.

    We can consider this as an Idea and something we need to change in Jama, but I do not think there is an easy solution to make this work. The idea I would really like to see come to fruition is a way to import PDFs natively.
  • [Deleted User]
    [Deleted User] Posts: 160
    edited July 2016
    I'm stuck with the tool. Do you have any recommended tool that avoids the extra spaces when converting from PDF to Word? I've tried a few tools, but they did not meet my needs. Please advise.
  • [Deleted User]
    [Deleted User] Posts: 911
    edited July 2016
    Karnala, I am sorry that I do not have a suggestion for converting PDFs to Word in a manner that will import readily.
  • BertrandLechevalier
    BertrandLechevalier Member, Jama Connect Interchange™ (JCI) Posts: 4
    edited March 2023

    Hi

    I am reopening this thread because the problem is on the JAMA side.
    In any rich text field, you can apply a font color or a background color, for example.

    In the description, type "This is an example of JAMA export problem"

    Select the M of JAMA and change the color of the letter to red. Then export to Excel.

    You will get "This is an example of JA M A export problem".

    Here is the reason: When you put a color on the letter, JAMA adds an HTML tag <span style="color:#e74c3c">M</span> 

    Then the Excel export behaves as if the <span...> and the </span> were spaces.
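The reported symptom can be reproduced in a few lines, purely as an illustration: the sketch below is not Jama's or Aspose's export code. It only shows that replacing each tag with a space yields the text described in this post, while removing tags outright keeps the word intact. The function names are hypothetical.

```python
import re

# Matches any single HTML tag, opening or closing.
TAG = re.compile(r"<[^>]+>")


def tags_to_spaces(html):
    """Convert HTML to text by replacing every tag with a space."""
    return TAG.sub(" ", html)


def tags_removed(html):
    """Convert HTML to text by deleting every tag."""
    return TAG.sub("", html)


source = 'This is an example of JA<span style="color:#e74c3c">M</span>A export problem'

print(tags_to_spaces(source))  # the word JAMA is split around the colored letter
print(tags_removed(source))    # the word JAMA stays intact
```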

    That's really annoying.

    ------------------------------
    Bertrand Lechevalier
    ------------------------------
  • [Deleted User]
    [Deleted User] Posts: 152
    edited March 2023

    Hi Bertrand, 

    It looks like you're already in touch with our Support team, and they're in the best position to work with you on this -- thank you for reaching out to them, and we'll be continuing the conversation through the filed ticket. 

    As a side note, since this thread is a few years old, I'm going to close it in the meantime. 

    ------------------------------
    Carly Rossi // she/her
    Community Manager // Jama Software
    Portland, OR
    ------------------------------
This discussion has been closed.