How do I select text in a PDF?

Brendan Burgess

I have looked for an idiot's guide to PDFs and I have been unable to locate one.

I have no need for writing PDFs.

But from time to time, I get documents in PDF and I sometimes find them very difficult to deal with. For example, I have a long report which I want to comment on. It seems to be an image rather than a text file. I hadn't been aware that there was this difference. Am I right in saying that I can select bits of a PDF text file but I can't select bits of an image PDF even if it looks like text?

Is there any advantage to me in getting Adobe Pro or Nitro PDF or any other package? If it's a text PDF, I can select it and copy it into Word. If it's an image, I can't do anything with it.

I could use OCR software for the image of text, but how good is it? Is there a particular OCR package suitable for text in PDF images?
 
If you can't select text from it, it may be locked, in which case Adobe Pro won't help you. However, if it really is just image-based (likely in an archival scenario), then the OCR element of Adobe Pro is excellent.

The image type is not really an issue; an image is an image.
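A rough way to tell which kind of PDF you have before buying anything: a text-based PDF normally carries font resources (`/Font`), while a straight scan is usually just image objects (`/Subtype /Image`) with no fonts at all. The Python sketch below is a heuristic, not a real PDF parser. It searches the raw file, plus any Flate-compressed streams it can inflate with `zlib`, for those two keys. The names `searchable_bytes` and `classify_pdf` are made up for illustration, and it will miss inline images and streams that use other filters.

```python
import re
import zlib

# Body of each stream object, between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def searchable_bytes(pdf: bytes) -> bytes:
    """Return the raw file plus every stream that zlib can inflate.

    Newer PDFs often compress their dictionaries inside Flate-encoded
    object streams, so the keys we look for may not be visible in the
    raw bytes. Streams using other filters (e.g. JPEG data) are skipped.
    """
    chunks = [pdf]
    for match in STREAM_RE.finditer(pdf):
        try:
            # decompressobj ignores the end-of-line bytes left before "endstream".
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded (or damaged): ignore it
    return b"\n".join(chunks)


def classify_pdf(pdf: bytes) -> str:
    """Guess whether a PDF is "text", "image" (scanned) or "unknown"."""
    data = searchable_bytes(pdf)
    if b"/Font" in data:
        return "text"  # has fonts, so at least some text should be selectable
    if re.search(rb"/Subtype\s*/Image", data):
        return "image"  # pictures only: OCR needed to get the words out
    return "unknown"
```

Usage would be along the lines of `classify_pdf(open("report.pdf", "rb").read())`. Note that a scan which has already been through OCR will usually come back as "text", because OCR tools add a hidden text layer that uses fonts.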
 
If it's locked, then can I scan in the letter and use OCR software to read it?

Can I tell from a document whether it is locked or not?

Brendan
 
I've had to do this at various times.

Google "unlock pdf" and you'll get free downloads that will allow you to unlock any PDF. You can then cut and paste.

Generally that will work. If the PDF was created from a scanned photocopy rather than from a Word document, then it won't work.
 
If it's locked, then can I scan in the letter and use OCR software to read it?

Can I tell from a document whether it is locked or not?

Brendan

That ^ is what I've done before. A quicker but less satisfactory approach is to use the "Print Screen" button: paste the screen image into MS Paint, then cut out the part you need and paste it into a Word doc. This is useful for diagrams/tables etc. but not so much for text.
 

Hi Brendan,

Go to File > Properties and click the Security tab. That will show you what is allowed. On purchased PDFs you often can't print or save to other formats, for copyright reasons.

Jim.
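The Security tab is reading settings stored in the file itself: a locked PDF has an `/Encrypt` entry in its trailer, and the standard security handler keeps the permissions as a signed integer `/P`, where each bit allows one action (bit 3 printing, bit 5 copying text and graphics, bit 10 extraction for accessibility). A minimal Python sketch, assuming the encryption dictionary is referenced indirectly (`/Encrypt 12 0 R`) and stored uncompressed; `pdf_permissions` is a hypothetical helper, and it only reports the flags, it doesn't change them.

```python
from __future__ import annotations

import re

# Permission bits in /P, numbered from 1 as in the PDF spec,
# so bit n corresponds to the mask 1 << (n - 1).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_text_and_graphics": 5,
    "add_comments": 6,
    "extract_for_accessibility": 10,
}


def pdf_permissions(pdf: bytes) -> dict[str, bool] | None:
    """Return None for an unencrypted PDF, else which actions /P allows."""
    if b"/Encrypt" not in pdf:
        return None  # no encryption dictionary: nothing is locked
    ref = re.search(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R", pdf)
    if ref is None:
        raise ValueError("encryption dictionary is not an indirect reference")
    num, gen = int(ref.group(1)), int(ref.group(2))
    # Find "num gen obj ... endobj" (the lookbehind stops "15 0 obj" matching 5).
    obj = re.search(rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (num, gen), pdf, re.DOTALL)
    if obj is None:
        raise ValueError("encryption dictionary not found")
    p = re.search(rb"/P\s*(-?\d+)", obj.group(1))
    if p is None:
        raise ValueError("no /P permissions entry")
    flags = int(p.group(1))  # signed 32-bit value; Python's & handles negatives
    return {name: bool(flags & (1 << (bit - 1))) for name, bit in PERMISSION_BITS.items()}
```

If `copy_text_and_graphics` comes back `False`, selecting and copying is blocked even though the text itself is real, which matches the "locked" case described above.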
 
I was forwarded an email by a user of Nitro Pro with a special offer on the following:

Nitro PDF Professional v.6 with OCR, English

Seems exactly what I need and it has cost me €85 compared to around $700 for the Adobe version.

I will let you know how I get on.

Brendan
 
I had a similar problem, and I just emailed the writer of the document and asked for the original in Excel or Word.
 
Nitro PDF worked excellently.

I have just converted a PDF in image format into a Word document. I have noticed only one mistake.

The layout has been retained.
A table looks a bit odd, but that is fine.

Brendan
 
I have just converted the Central Bank's 96-page Consultation Paper on the Consumer Protection Code into a Word file. It only took a minute or so.

Now I get to work on a Word document rather than a PDF which is so much more convenient. OK, I don't get to see two pages of pictures of the Central Bank Building and Logos, but I can live without them.

Why didn't I do this years ago? I have wasted a good part of my life being frustrated by PDFs.

Brendan
 

I have to say you made an excellent choice in Nitro PDF. I have used it a few times to edit PDFs for other people and for myself; yes, sometimes it changes layouts, but it is a great product.
 

Do you really need another product to do this? Couldn't you just cut and paste the text from the PDF, like this: [AAM would only allow me to post 40,000 characters, but I was able to select the whole lot from the PDF]

 
I used to cut and paste like that, but the formatting is all over the place. Page numbers appear in the middle of sentences, especially with bills and acts.

Nitro retained the formatting, which makes it a lot easier to read. Askaboutmoney doesn't, but see below for an idea.


Code:
   [B][COLOR=black][FONT=Calibri]BACKGROUND[/FONT][/COLOR][/B]
  
  [COLOR=black][FONT=Calibri]1.1       The  Central  Bank  of  Ireland  is  committed  to  the  provision  of  a  comprehensive consumer protection framework which sets out requirements for regulated firms when dealing  with  consumers.            On  28  October  2010,  the  Central  Bank  of  Ireland  published Consultation Paper CP 47 [I]Review of Consumer Protection Code [/I](CP47) setting out proposed new and amended requirements in order to strengthen the existing Consumer Protection Code (the Code) which was introduced in July 2006, with an Addendum issued in May 2008.[/FONT][/COLOR]
  
  
  [COLOR=black][FONT=Calibri]1.2       Our  overriding  objective  continues  to  be  the  strengthening  of  the  consumer protection  framework  and  the  introduction  of  revised  measures  which  will  benefit consumers in their dealings with regulated firms.                                                             To this end, the revised Code, once implemented, will provide a number of additional protections for consumers of financial services along with enhancements to existing provisions.[/FONT][/COLOR]
  
  
  [B][COLOR=black][FONT=Calibri]A[/FONT][/COLOR][/B][B][COLOR=black][FONT=Calibri]NALYSIS TO DATE[/FONT][/COLOR][/B]
  
  
  [COLOR=black][FONT=Calibri]1.3       51 submissions were received in response to CP47, all of which are available on our website at  [/FONT][/COLOR][URL="http://www.centralbank.ie/"][COLOR=blue][FONT=Calibri]www[/FONT][/COLOR][COLOR=blue][FONT=Calibri].centralbank.ie[/FONT][/COLOR][COLOR=black][FONT=Calibri].[/FONT][/COLOR][/URL][COLOR=black][FONT=Calibri]   The submissions received represented a cross section of interest from consumer representative bodies, financial institutions, trade bodies, representatives from the voluntary and community sector, regulatory bodies, individuals and academics.  We would like to thank all those who made submissions to us on this very important topic.[/FONT][/COLOR]
  
  
  [COLOR=black][FONT=Calibri]1.4       We have undertaken a robust analysis of comments received during the consultation process in addition to undertaking some further research and analysis of the issues under consideration.  We believe that, the approach now proposed in this consultation paper on many of the topics and issues raised in CP47 strikes the right balance between listening to[/FONT][/COLOR]
  
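If you do end up pasting plain text out of a PDF, some of the mess described above (page numbers mid-sentence, broken lines, double spaces from justified text) can be tidied with a short script. This Python sketch uses a hypothetical helper, `tidy_pasted_text`, and assumes page numbers sit on lines of their own and that a blank line after an unfinished sentence is a page break rather than a real paragraph break. It won't recover tables or bold headings the way a proper conversion does.

```python
import re

# A line that is nothing but a page number, e.g. "12", "- 12 -" or "Page 3 of 96".
PAGE_NUMBER = re.compile(r"[-\s]*(page\s+)?\d+(\s+of\s+\d+)?[-\s]*", re.IGNORECASE)


def tidy_pasted_text(raw: str) -> str:
    """Rejoin text pasted from a PDF into clean paragraphs (rough heuristic)."""
    paragraphs = []
    current = []

    def flush():
        if current:
            # Join wrapped lines, then collapse the runs of spaces
            # that justified text leaves behind.
            paragraphs.append(re.sub(r"\s{2,}", " ", " ".join(current)))
            current.clear()

    for line in raw.splitlines():
        text = line.strip()
        if PAGE_NUMBER.fullmatch(text):
            continue  # drop stand-alone page numbers
        if not text:
            # A blank line only ends a paragraph if the previous line finished
            # a sentence or is an all-caps heading; otherwise assume it is a
            # page break in the middle of a sentence and keep going.
            if current and (current[-1].endswith((".", "!", "?", ":")) or current[-1].isupper()):
                flush()
            continue
        current.append(text)
    flush()
    return "\n\n".join(paragraphs)
```

For example, a heading, a sentence split across a page break with a "7" between the halves, and a following paragraph come back as three clean paragraphs with the "7" gone.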
 
OK, I get it now. If anyone else is having similar problems with PDF documents produced by public bodies, you could remind them of their obligations under Section 28 of the Disability Act 2005:

"Where a public body communicates in electronic form with one or more persons, the head of the body shall ensure, that as far as practicable, the contents of the communication are accessible to persons with a visual impairment to whom adaptive technology is available."

PDFs are not generally accessible to adaptive technology used by people with vision loss, unless they are specifically tagged and structured as accessible PDFs, which is quite unlikely. Some public bodies get round this by publishing in HTML, along with PDFs. The HTML version is usually easier to cut and paste.
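For a quick check of whether a PDF at least declares itself tagged: the PDF specification expects a tagged PDF's catalog to have a structure tree (`/StructTreeRoot`) and a `/MarkInfo` dictionary with `/Marked true`. A rough Python sketch that looks for both, searching the raw file plus any Flate streams it can inflate (`inflate_all` and `looks_tagged` are hypothetical names). Passing this check only means tags exist; it says nothing about whether they are any good for a screen reader.

```python
import re
import zlib

# Body of each stream object, between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_all(pdf: bytes) -> bytes:
    """Raw file plus every stream zlib can inflate (the catalog may be compressed)."""
    chunks = [pdf]
    for match in STREAM_RE.finditer(pdf):
        try:
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # some other filter, e.g. JPEG image data
    return b"\n".join(chunks)


def looks_tagged(pdf: bytes) -> bool:
    """True if the PDF has a structure tree and /MarkInfo << /Marked true >>."""
    data = inflate_all(pdf)
    has_structure_tree = b"/StructTreeRoot" in data
    marked = re.search(rb"/Marked\s+true", data) is not None
    return has_structure_tree and marked
```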