Strange characters in text file

However, if I try to see the contents of s by entering "s", here is what I see:

"\x00V\x00e\x00r\x00s\x00i\x00o\x00n\x00P\x00e\x00r\x00s\x00i\x00s\x00t\x00:\x00 \x001\x00 \x00 "

What is going on here? I want to manipulate the actual string without having to deal with these weird \x00 characters. I could just replace all the \x00's with empty strings, but other lines contain other \x** characters, such as \xff, and I would prefer a durable solution.
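A minimal Python 3 sketch that reproduces this symptom, assuming the file is stored as UTF-16 little-endian and its raw bytes were split on the newline byte. The "Header" line and the sample content are made up for the demonstration.

```python
# Hypothetical file content, encoded the way the file is assumed to be stored.
data = "Header\nVersionPersist: 1\n".encode("utf-16-le")

# Splitting the raw bytes on the newline byte leaves each line with a stray
# \x00 in front (the second half of the previous "\n", which is 0A 00) and
# a \x00 after every character.
raw_lines = data.split(b"\n")
print(raw_lines[1])

# Decoding first and splitting afterwards gives clean text.
text_lines = data.decode("utf-16-le").splitlines()
print(text_lines)  # ['Header', 'VersionPersist: 1']
```

Stripping \x00 by hand only treats the symptom; decoding with the right encoding removes the padding for every character, including ones that show up as other \x** bytes.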

Any help would be greatly appreciated. Thanks!


gschizas, 8 years ago
It's obvious (well, to me anyway) that the file is Unicode. In other words, you're not opening it correctly.
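One quick way to check a diagnosis like this from the raw bytes, sketched here as an assumption rather than the poster's method, is to look for a byte-order mark (BOM) at the start of the file. A UTF-16 little-endian file with a BOM begins with the bytes FF FE, which might also explain a stray \xff in the output. The function name guess_bom_encoding is hypothetical; the BOM constants come from Python's codecs module.

```python
import codecs


def guess_bom_encoding(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    # Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
    # UTF-16-LE BOM (FF FE), so the longer marks are checked first.
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, codec_name in candidates:
        if raw.startswith(bom):
            # "utf-16", "utf-32" and "utf-8-sig" all consume the BOM
            # when decoding, so it does not end up in the text.
            return codec_name
    return None


sample = "VersionPersist: 1\n".encode("utf-16")  # BOM + UTF-16 text
encoding = guess_bom_encoding(sample)
print(encoding)                 # utf-16
print(sample.decode(encoding))  # VersionPersist: 1
```

A None result only means no marker was found: a file can be UTF-16 without a BOM (for example, one written with the "utf-16-le" codec), so this is a hint, not proof.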

I'm really sleepy right now, so here's the short version: a string is not always exactly the same as its representation in bytes. In fact, the two only match for a subset of languages, mostly English (and German and a few more).
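To make the "same only for some languages" point concrete, here is a small Python 3 check. It uses Latin-1 as a stand-in for a one-byte Western encoding (an assumption; the actual default depends on the platform and locale), and one_byte_per_char is a hypothetical helper. English and German text map to exactly one byte per character, Greek cannot be encoded at all, and the same German text takes more bytes in UTF-8.

```python
def one_byte_per_char(text, encoding="latin-1"):
    """Return True if every character of text maps to exactly one byte."""
    try:
        return len(text.encode(encoding)) == len(text)
    except UnicodeEncodeError:
        # The encoding has no byte value for at least one character.
        return False


print(one_byte_per_char("Hello"))    # True
print(one_byte_per_char("Grüße"))    # True: ü and ß exist in Latin-1
print(one_byte_per_char("Ωμέγα"))    # False: no Greek letters in Latin-1
print(len("Grüße".encode("utf-8")))  # 7: ü and ß take two bytes each in UTF-8
```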

I'm not making much sense, but here it is: the most common representation of text (a string) assigns one letter to exactly one byte. So "A" is 65 (0x41), "B" is 66 (0x42), and so on. That only works for 256 characters (and in reality, a few of those are reserved for control characters). It covers English and some of the Western European languages, but leaves no room for non-Latin alphabets such as Greek, Cyrillic, Hebrew, or even Japanese and Chinese. So instead of using one byte per character, you can use two. "A" is again assigned 65, but it takes 2 bytes (0x0041, or, as they are written to memory/disk: 41 00).

To cut a long story short, when you open a file there is an implied encoding, and it's the "English" one. This particular file was written in Unicode (UTF-16), so you have to decode the bytes you are actually reading into a real string. To do that in Python 2 (which you seem to be using), you need the codecs module (Python 3 is much, much clearer about this).
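A Python 3 sketch of both halves of this explanation: first the byte values for "A", then decoding a UTF-16 file while reading it. The answer points to the codecs module for Python 2, which provides codecs.open(filename, mode, encoding); this sketch uses io.open instead, which accepts the same encoding argument and exists in both Python 2.7 and Python 3 (where it is the built-in open). The file name and its content are made up for the demonstration.

```python
import io
import os
import tempfile

# "A" is 65 (0x41): a single byte in a one-byte encoding such as Latin-1...
assert "A".encode("latin-1") == b"\x41"
# ...but the two bytes 41 00 in UTF-16 little-endian.
assert "A".encode("utf-16-le") == b"\x41\x00"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical sample file
    with open(path, "wb") as f:
        # The "utf-16" codec writes a byte-order mark, then the text.
        f.write("VersionPersist: 1\r\n".encode("utf-16"))

    # Reading raw bytes shows the \x00 padding from the question.
    with open(path, "rb") as f:
        raw = f.read()

    # Decoding while reading: "utf-16" uses the BOM to choose the byte
    # order and strips it, so the lines come back as ordinary text.
    with io.open(path, "r", encoding="utf-16") as f:
        lines = [line.rstrip("\r\n") for line in f]

print(raw[:6])  # a 2-byte BOM, then "V" and "e" padded with \x00 bytes
print(lines)    # ['VersionPersist: 1']
```

Once the file is decoded on the way in, the lines are ordinary strings, so there are no \x00 (or \xff) bytes left to strip by hand.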

I'm probably not making much sense (blame the sleep deprivation), but if you watch or read the Pragmatic Unicode presentation (I hope that's the correct link), it should become clear.

As for the problem at hand: