From: Trastabuga
Subject: Encoding problem with copying from pdf and pasting to emacs
Date: 
Message-ID: <1166720965.037152.287720@79g2000cws.googlegroups.com>
I am rather new to this kind of problem. When I copy text from a PDF
file like this: "some text" and paste it into emacs, I get the following:
\223sometext\224. The same goes for some other characters like "-".
What can I do in emacs to get the same text as what I am copying? Why does
this happen?

Thank you,
Andrew

From: Pascal Bourguignon
Subject: Re: Encoding problem with copying from pdf and pasting to emacs
Date: 
Message-ID: <87psad2m2u.fsf@thalassa.informatimago.com>
"Trastabuga" <·········@gmail.com> writes:

> I am rather new to this kind of problem. When I copy a text from a pdf
> file like this: "some text" and copy it to emacs I have following
> \223sometext\224. Same goes for some other characters like "-".
> What can I do in emacs to have same text as what I am copying? Why does
> it happen?

It would probably be better to ask that in news:comp.emacs or
news:gnu.emacs.help

These bytes encode the curly double quotes in the Windows-1252 encoding.

(ext:convert-string-from-bytes #(#o223 #o224) charset:windows-1252)
--> "“”"

It's probable that the PDF document is not correctly encoded (that it
specifies ISO-8859-1 instead of Windows-1252), or that the PDF reader
you're using doesn't publish the data to the pasteboard with the
correct encoding (perhaps it says it's iso-8859-1 instead of
Windows-1252).  Or perhaps emacs doesn't take any encoding hint from
the pasteboard.  Perhaps there's a way to configure all these programs
to use utf-8 in the pasteboard?
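
To make the mislabelling hypothesis above concrete, here is a minimal
sketch in Python (chosen only because it runs almost anywhere; it is not
what emacs or the PDF reader does internally). It decodes the two bytes
both ways and shows how text that was wrongly read as ISO-8859-1 could be
repaired afterwards. The helper name `repair` is hypothetical, and the
sketch assumes the pasted data really was Windows-1252.

```python
# Octal \223 and \224 are the bytes 0x93 and 0x94.
raw = bytes([0o223, 0o224])

# Windows-1252 maps these bytes to the curly double quotes U+201C and U+201D.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 maps the same bytes to the C1 control characters U+0093 and
# U+0094, which an editor may well display as octal escapes.
as_latin1 = raw.decode("latin-1")


def repair(text: str) -> str:
    """Hypothetical helper: undo a Latin-1 mislabelling of Windows-1252 data.

    Re-encoding as Latin-1 recovers the original bytes, which are then
    decoded with the encoding assumed to be the right one.
    """
    return text.encode("latin-1").decode("cp1252")


# ascii() keeps the output printable regardless of the terminal encoding.
print(ascii(as_cp1252))
print(ascii(as_latin1))
print(ascii(repair("\x93some text\x94")))
```

Note that `repair` raises an error if the text contains characters outside
Latin-1, or one of the few byte values that Windows-1252 leaves undefined,
so it is only suitable for text known to have been mis-decoded this way.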

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.

From: Trastabuga
Subject: Re: Encoding problem with copying from pdf and pasting to emacs
Date: 
Message-ID: <1166726413.353186.8280@79g2000cws.googlegroups.com>
Pascal Bourguignon wrote:
> "Trastabuga" <·········@gmail.com> writes:
>
> > I am rather new to this kind of problem. When I copy a text from a pdf
> > file like this: "some text" and copy it to emacs I have following
> > \223sometext\224. Same goes for some other characters like "-".
> > What can I do in emacs to have same text as what I am copying? Why does
> > it happen?
>
> It would probably be better to ask that in news:comp.emacs or
> news:gnu.emacs.help
>
> These bytes encode the curly double quotes in the Windows-1252 encoding.
>
> (ext:convert-string-from-bytes #(#o223 #o224) charset:windows-1252)
> --> "“”"
>
> It's probable that the PDF document is not correctly encoded (that it
> specifies ISO-8859-1 instead of Windows-1252), or that the PDF reader
> you're using doesn't publish the data to the pasteboard with the
> correct encoding (perhaps it says it's iso-8859-1 instead of
> Windows-1252).  Or perhaps emacs doesn't take any encoding hint from
> the pasteboard.  Perhaps there's a way to configure all these programs
> to use utf-8 in the pasteboard?
>
> --
> __Pascal Bourguignon__                     http://www.informatimago.com/
>
> Nobody can fix the economy.  Nobody can be trusted with their finger
> on the button.  Nobody's perfect.  VOTE FOR NOBODY.

Thanks, Pascal. I wanted to post it to the emacs group but accidentally put
it here. I'll do it now.

Andrew