Merge Telegram JSON export into Signal backup #153
Hi! This should indeed be possible, as I mentioned in our email exchange, at least to some useful degree. I have already started working on this (nothing usable yet, just pushed some early code this morning, though locally I have more already), at first by installing Telegram to get some data. I will try to ask around to see if anyone I know also has Telegram so I can actually get some real data (I currently only have messages in an empty group I made with just myself). But that shouldn't be a big problem. I had found the same links about the JSON format and now found the actual current export to be slightly different. I agree the format is pretty simple, the problem is I can only deal with what I know. For example, I see in my (very small) export the key Two things I already noticed in my current JSON export:
Anyway, this is just to let you know work has started already, but please do not expect it to be done quickly (it will probably be weeks). I will probably have some questions about Telegram in the process. And, especially in the beginning, expect a ton of errors when trying out new code (I will let you know when there is something to test).
That is very generous of you, thank you so much. |
Hi @bepaald, thanks a lot for starting the work on this already! This all sounds very promising. It's unfortunate that Telegram doesn't provide any documentation, but let's see if they get back to me with that. In the meantime, from your initial exploration it sounds like reverse engineering could work. The bugs and limitations you mention are nothing serious. The problem with multiple attachments might be a bit annoying for messages that contain many photos, but that's not important and in any case it can probably be fixed easily by merging messages that were sent within e.g. 1 sec. Since Signal does not support 'polls', they can probably be disregarded. 'channels' are essentially 1-way chats -- it might make sense to allow import anyway, but I'd say it's not very important. I'm a bit surprised that we have 'secret chats'. Those are e2ee chats, but afaik they are stored only on-device and I did not expect the extract to contain any data relating to them. Anyway, the vast majority of communications on Telegram happens through standard non-e2ee chats (which is the main issue with Telegram), so again this is not a big problem. If you need someone with whom to exchange messages on Telegram I'm happy to help, just let me know via email :) |
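The idea of merging messages sent within about one second could be sketched as follows. This is only an illustration and not code from signalbackup-tools: the keys `from_id`, `date_unixtime`, `text`, `photo` and `file` are assumed to be what the export provides, `text` is assumed to be a plain string, and the helper names are hypothetical.

```python
from typing import Any


def attachments_of(msg: dict[str, Any]) -> list[str]:
    # Assumption: "photo" and "file" hold relative attachment paths.
    return [msg[key] for key in ("photo", "file") if key in msg]


def merge_bursts(messages: list[dict[str, Any]], window: int = 1) -> list[dict[str, Any]]:
    """Merge consecutive messages from one sender sent within `window` seconds."""
    merged: list[dict[str, Any]] = []
    for msg in messages:
        ts = int(msg["date_unixtime"])
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["from_id"] == msg.get("from_id")
            and ts - prev["last_ts"] <= window
        ):
            # Same sender, close in time: fold into the previous entry.
            prev["attachments"] += attachments_of(msg)
            prev["last_ts"] = ts
            if msg.get("text"):
                prev["text"] = (prev["text"] + "\n" + msg["text"]).strip()
        else:
            merged.append(
                {
                    "from_id": msg.get("from_id"),
                    "first_ts": ts,
                    "last_ts": ts,
                    "text": msg.get("text", ""),
                    "attachments": attachments_of(msg),
                }
            )
    return merged
```

Note that this merges any two consecutive messages from the same sender inside the window, not only photo albums, so a real implementation may want a stricter condition.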
I think I could do with a little feedback on this now. If at all possible, maybe you could give it a try? No need to import the result into Signal, just run the tool and look at the output (or paste it here). If there are no major errors, and the tool actually finishes successfully, you could add the Instructions: The tool will need to know which recipient in the backup is yourself. It can usually determine this automatically, but if it fails it will tell you and you will need to add the The tool will first scan the json file for all recipients and try to match them by name. If it fails to do this for any contacts, you get an error stating so, and you should add the
I might need that, I'll let you know. Currently my JSON file looks something like this:
(Obviously, it has more than just two messages). Depending on the export settings there are other top level data entries (like, 'stories', 'personal_information', 'contacts', etc.), but they are all skipped. I'm sort of assuming the only difference between this group chat and a personal chat is the line Some current limitations (some I've mentioned before):
Anyway, I expect some bugs/problems in a first test like this, but things are mostly working for me currently, so some feedback might come in handy. No hurry, whenever you have time. Thanks! |
@bepaald Sorry I haven't yet found the time to test this out! I'll do it asap (hopefully this week). In the meantime I have some good news: Telegram got back to me about the JSON documentation and apparently they have one! Very weird that it seems to be poorly indexed by Google... This and this are also interesting. It's the opposite of our use case: importing chats from other apps into Telegram. Obviously it's not directly useful here, but maybe it's an interesting source of ideas. |
No worries, there is certainly no hurry. That's a good resource, I've not studied it fully but it looks like I'm on the right track I think. There is a lot more I didn't know existed, but most of those things will not be possible to convert to Signal (either because they don't exist, like HTML5_Game (?), or because they will not be compatible, like payments and gifts). Very busy at work this period, but I will look closer at it when I have time. Thanks! |
Does it make more sense to define a simpler JSON format (+attachments) which is easy to incorporate into signalbackup-tools, leaving the conversion for external scripts? I have a similar need to import chat history from element, and I imagine that implementing an import feature for 1 or 2 apps might lead to a lot of brittle and difficult-to-test code, as well as more feature requests for importing from new apps. |
Well, it is indeed the intention to convert any other JSON formats to the one supported (either by this tool, or by the user) and let the existing import functionality do the actual work. Just not sure whether to use this Telegram format or try to define my own 'simpler' format. I think the Telegram format is already not too complicated. When supporting the huge number of features most modern messengers do (groups, attachments, quotes, reactions, etc.), some complexity is bound to appear. Defining a new universal JSON format for this seems like a big undertaking. Currently in favor of using the Telegram format is:
Just last weekend, I used this function to import plain old SMS messages (as exported by Do you find the format overly complex? Do you need help converting the 'element' chat history? What does that format look like? Note, while the Telegram JSON format has many features, most are not required by this function. At a minimum, only the |
Sorry, I didn't realize that this feature was mostly complete already. I didn't see any information about it in the readme or help output (using the nix package). The Telegram format is probably fine, actually. I don't need any help converting to it at this point, but I may be back with questions later. Thanks for the quick response! |
Ok I finally had time to test this! Sorry for the (huge) delay So far I've only tried to export to HTML and quite a few things already seem to work great! This is quite amazing, great job :) What worked:
Here are some issues I faced:
Overall, I think it would be great if you could add support for single-chat exports and/or for selecting the chats to import from multi-chat exports. It is possible to download single chats from Telegram Desktop by entering the chat and clicking on the three-dot menu > Export chat history. With this, I could try to do more tests with my own chats (which include years of messages) without (hopefully) running into the previous issue. Thanks a lot again! |
Thanks so much for your thorough feedback. Those are all clear and actionable issues, I can definitely work with that (in fact, Not sure when I'll have time, but the first thing I'd want to try to deal with is Other priorities are supporting single-chat exports and selecting conversations from multi-chat ones (somehow). But the other issues will all be addressed as well I think. Hopefully I'll have some time tomorrow, but I'll let you know when things have changed enough for testing. Thanks again! |
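Handling both a full export and a single-chat export, and selecting only some conversations, might look roughly like the sketch below. It assumes that a full export keeps its conversations under `chats` and `list`, and that a single-chat export is one chat object with a top-level `messages` array; these key names and all function names here are assumptions for illustration, not the tool's actual code or options.

```python
import json
from pathlib import Path
from typing import Any


def chats_from_export(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Return a list of chat objects for either export shape."""
    if "chats" in data:
        # Full export: other top-level entries (contacts, stories, ...) are ignored.
        return list(data["chats"].get("list", []))
    if "messages" in data:
        # Single-chat export: the file itself is the chat.
        return [data]
    return []


def select_chats(chats: list[dict[str, Any]], wanted_ids: set[int]) -> list[dict[str, Any]]:
    """Keep only the chats whose id was requested."""
    return [chat for chat in chats if chat.get("id") in wanted_ids]


def load_export(json_path: str) -> list[dict[str, Any]]:
    """Read an export file from disk and normalize it to a list of chats."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return chats_from_export(data)
```

Normalizing both shapes to one list early means the rest of an importer only has to deal with a single structure.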
Awesome, thanks! And yes, the 15GB include media.
|
Just a quick update, I've been very busy the last few days. With this issue, among other things. While solving some of the issues you mentioned above I've also done a lot of refactoring, and not a lot of testing, so I hope I didn't break anything that was working before. Hopefully:
I have not yet dealt with That leaves |
I've tried the new commands both with a single-chat exports and the 15GB full exports, they seem to work flawlessly! Amazing :) I've found one bug though. While trying to export a group, I got the error message I'm not sure what's the best way to address this. Maybe automatically add a "Deleted Account" contact in the Signal backup? But then would that mean that Deleted Account would appear as a member of the imported group? That's not very clean. Perhaps an alternative solution would be to use the |
Excellent, thanks for the feedback. Did you just try the import function, or also the HTML export to look at the results?
Ok, I have been thinking about this, but not sure yet what to do. In the import functions (this, and
That's also a possibility, but then the messages would appear as outgoing messages (colored bubble on the right hand side)... that's also not very clean I think. I'll think about it some more, maybe try some options out next weekend. If you have a strong preference or other ideas, let me know. Thanks! |
All the tests I've run so far are with import + export to HTML, and I've manually checked the HTML to confirm it looks good. I might've missed some issues, but the ones I reported here are all those that I spotted :) I haven't yet tried to actually restore the backup on the phone, but that should be straightforward, right? Regarding the "Deleted Account", everything you say makes sense. I think that the possible solutions from best to worst are:
|
Ok, good.
Well, if the htmlexport would show problems, the backup almost certainly has problems, so that's a good sign. But that doesn't necessarily work the other way around. I think it should be good, but the html export is certainly less picky than Signal itself.
I agree with this. I'm almost certain option (1) will work, and I don't foresee any problems, but whether or not it's future proof will always be a gamble. The type of contact I would insert does not exist in a natural database as far as I can tell, I've been trying to get one in there. Any recipient will always have at least one of: phone number, aci, pni, group_id. The only two exceptions to this are distribution lists (for sharing stories), which have a specific A simple workaround would be to give the fake contact a fake phone number, this can naturally occur in the database, and so should be future proof, but seems like an ugly solution (but would be equivalent to option (2), where the user would also have to come up with a fake number: otherwise the contact will not show up in Signal). I'm postponing the decision a little bit, because I think I have to rewrite the contactmapping first. I've been using the name of the contacts to do the mapping, but I should really be using the I think I'll have time to do the contact mapping tomorrow. Thanks! |
Yeah it's not super clean, but I think it can be fine. I can simply create a contact with number 0000 or something like that. Having something automated would be a bit nicer, but if there's a risk that Signal might complain about the database one day then I feel it's not worth the risk.
Ok sounds great! I can confirm the |
Just a quick message/question since I'm logged in anyway. I've been redoing the contactmapping, using id's instead of names, but it was a bit more complicated so it's not yet finished. I'm also trying to be more clever about automatically matching Telegram contacts to Signal contacts, but notice I make assumptions occasionally that I don't know are true in Telegram. From your earlier message:
I understand a chat name can be different from the contact name. I'd like the program to automatically link these contacts so that only one of them needs to be mapped. I think I can do this if:
Since personal chats only contain (max) two But this is only true if — like I was blindly assuming (and as is the case in Signal) — you can only have 1 |
I'm quite sure you can have only one
As I wrote in a previous comment, I believe this happens when the first message of the conversation was written by Bob and Bob is not in my address book, in which case Telegram automatically names the conversation using only "Bob" (where "Bob" is the first name that Bob has set for himself in his account) rather than "Bob Smith" or just "unknown contact" or Bob's phone number (Telegram doesn't even need a phone number to sign up). To clarify: if I then save Bob's contact as "Bob Smith", then the Telegram app names the chat as "Bob Smith". But apparently this is not reflected in the json export (I guess for good reasons). Overall, I think your assumptions hold :) |
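The linking discussed above, where a personal chat's name is tied to the one sender in it who is not yourself, could be sketched like this. It is a simplified illustration under assumptions: the sender key is taken to be `from_id`, the chat type string is taken to be `personal_chat`, and both function names are hypothetical.

```python
from typing import Any, Optional


def chat_partner_id(chat: dict[str, Any], self_id: str) -> Optional[str]:
    """Return the single non-self sender id in a chat, or None if ambiguous."""
    senders = {m["from_id"] for m in chat.get("messages", []) if m.get("from_id")}
    others = senders - {self_id}
    if len(others) == 1:
        return next(iter(others))
    return None  # no partner messages, or more than one other sender


def link_chat_names(chats: list[dict[str, Any]], self_id: str) -> dict[str, str]:
    """Map each personal chat's name to the partner id found in its messages."""
    links: dict[str, str] = {}
    for chat in chats:
        if chat.get("type") != "personal_chat":
            continue  # groups have many senders, so the name is not a contact
        partner = chat_partner_id(chat, self_id)
        if partner is not None and chat.get("name"):
            links[chat["name"]] = partner
    return links
```

With such a link in place, mapping either the chat name ("Bob") or the sender ("Bob Smith") to a Signal contact would be enough, provided "self" is known. That matches the remark that a correctly mapped "self" helps this process.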
Thanks! I just pushed the update to use contact id's. It's pretty messy code, not too proud of it, but I hope it works. There are still many, many (rare-ish) corner cases in which things can go wrong (like when two different contacts have the exact same name, or a contact is called "user122325" while that is also an existing id in the json file). Also the user supplied map option may not work if one has contacts with So, the main thing that has changed should be that The program now also attempts to determine when different contacts are really the same, so it should hopefully ask for fewer contacts to be mapped manually. This process is helped by a correctly mapped "self", or when importing from a full export (not single chat). There is also a new option Lastly there is the option The change was bigger and more complicated than I thought, I hope I didn't break things that were working before... I also noticed during testing I'm not setting any delivery/read-receipts on the newly imported messages. The Telegram export has no data on this. For me personally, I think I'd like the messages to appear as delivered (but not read) by default. Just because I think most contacts have delivery receipts turned on, but not read receipts? But maybe you (or others) would feel differently? Maybe it should be another command line option? Also, for group messages, Signal also has detailed delivery reports (per group member and with timestamps). These will probably be too difficult to implement correctly as it is not easy (maybe impossible?) to know exactly which contacts were members of the group at the time any message was sent. So even though these message would appear as being delivered/read in the main view, when long tapping the message to look at the details there would be nothing there. I don't consider this too big of an issue, I think it's pretty rare for people to look at that information anyway. |
Ok I've tried to import two chats (one group and one personal) and it seems to be working great! I've spotted only one bug: I think the
Also, is |
Good catch, should be fixed now, thanks!
Yes, to everything. The auto-mapping is definitely not solid enough (and often impossible) for the function to not support manual mapping. Thanks! |
I've tried to import some chats into my actual Signal chats on my phone and it seems to work great! This is really amazing, awesome job! I think there's still some issues with apostrophes, probably when it's in the "forwarded" field (I'm using
I also have about 40 lines with this:
I can't spot any issue in the imported chats though. The warning is a bit weird since (I think) I'm using Thanks a lot! |
Ah, stupid of me, I fixed it for the message body before, but it didn't occur to me the same could happen in the contact name. Should be fixed now (poorly tested, I'm late for work :))
At first glance I do not think it is a problem (I believe in this part it is only adjusting things to make a cleaner prompt for the user to present unknown contacts (if any)), but it is unexpected that the process would ever reach that code. I'll investigate when I'm back from work. Thanks for the feedback! |
I had a go at removing the |
Hi! I've finally tried to do the big import of a lot of my messages into my actual Signal, specifying manually the chats to import and the mapping for json contacts. It mostly seems to work well, and that's a 12GB import! 🚀 However, I've encountered a nasty bug with the Is there any way I can help you debug this? Btw, I haven't used Also, small issue: there's a typo when using Thanks a lot! |
Ok, that sounds promising
Hm, I was hoping that wouldn't happen. As I mentioned before I haven't so far created contacts in backups before (because I expect it to not work), but with a contact already in the database I was hoping it was ok. Apparently not. I would guess the program needs something in the database filled in which it is currently not for this recipient. The question is what (and then, can we do it)? My first thought is the contact may need an 'aci' (used to be called 'uuid'), which would usually be assigned server side upon registration. I think you should be able to get a crash report by, after letting the app crash, starting it back up and generating a debuglog (settings->help->debuglog). I think the crash (uncaught exception in Signal) should be in there even after restarting. The other option would be to turn on USB debugging on your phone and let the program crash while attached to your computer with I might try this myself at some point, though currently all my testing phones have expired SIM cards, so I can't receive the registration SMS, so I can't currently restore backups on them. So that will be a while, but I will eventually get around to it if you don't beat me to it.
Yes, specifying all contacts should be enough. If you run the tool with If you specify all contacts, I think there should be no output between
Fixed! Thanks!
Thank you once again for your feedback! |
Sorry for the delay! Here's the most relevant line I could find in Signal's debug log:
where XXX is the Signal recipient ID that I'm trying to use (overzealously redacted) . I can send you more lines via email if necessary! In the meantime, I've just bought a cheap SIM card and used that one to create a Signal account for this purpose. I confirm that everything seems to work with that! I'll report back if I spot any issue :) And I'm happy to help you debug the fake user issue and also to give feedback on the documentation for this feature if you plan to write it. I also want to sincerely thank you for working on this in the past few months! It's pretty great that everyone now has a tool to bring their Telegram conversations to Signal. I've just sent you a donation, it's just a small gift to show my gratitude for your work and friendliness :) |
Hey thanks for reporting back. No worries about the delay. That error really does seem like an ACI is needed for recipients with messages in the database. To be sure I think I'd need to see a bit more of the error. Usually the initial exception (in this case
Assuming the error log will confirm creating a Signal contact (real or even fake) out of nothing, I think that indeed was the only option (besides just skipping those messages altogether). Very happy to hear everything else seems to be working so far. I hope it stays that way, but indeed let me know if anything comes up. I'll leave this issue open while I muster up the courage to try and write some documentation for this :) Not my favorite thing to do, especially since I think this function is a bit more complicated for the end-user than most. I'll report back when I have something, so you can take a look at it (if you have time of course), it may be a while though. Initially I'll just quickly link to this issue in the readme.
Thank you for your continuous testing and feedback. Functions like these are practically impossible to implement without valuable feedback. You have been very helpful. And thank you for another generous donation, though not necessary it is certainly very much appreciated. |
Hello, first of all thank you for your amazing work on this tool! With it, I was able to decrypt my database, assign a new id to a broken recipient, and afterwards encrypt and import the database again (worked perfectly). Now, I'm trying to write some translation-scripts to import my WhatsApp chat history into Signal. I skimmed this thread a few days ago, I believe you said you wanted to only accept the Telegram json format, correct? Here's an example WhatsApp json I have right now (one file per chat, slightly modified export of WhatsApp-Chat-Exporter):
|
Hi! Thanks, glad the tool has been useful to you.
Well, it's not that it has to be Telegram's json format, but I did not want to support many different ones (if we could just convert between them). The Telegram format just so happened to be the first one to be completed (and requested), so for now I'm going with it. If there is some other very prevalent json format, I could always write an option to also accept that and do the conversion internally, instead of having the user do it.
I'll try. I believe the program expects a json array signalbackup-tools/jsondatabase/jsondatabase.cc Lines 127 to 143 in c1dcba0
Some of those are obviously not required (if the message contains no attachment, there is no A short example:
For single-chat json, you could just leave out the top level
If you need any more description of any of the fields used, let me know and I'll try to explain as best I can. Also, if the WhatsApp json contains any message-types/attributes that are not available in the Telegram json format, but could be supported by Signal, there should be no problem in adding more fields to the json format this tool supports (they will just be ignored if they don't exist (like in a Telegram json)). It's a lot to explain, I hope that is somewhat clear, but let me know if it's not. Good luck! |
Thank you for your detailed answer. I think that should be all the info I need, I'll look into it in the coming weeks. |
Hi, been trying to import my Saved messages over from Telegram(into Note to Self) and I'm not sure if the listed command will work for that kind of chat? Also how would it handle Messages I forwarded from other contacts into the Saved Messages that aren't on Signal? Thanks for the help! |
I just ran across this myself, mapping the chat IDs, but it doesn't work (yet). |
I did not know "Saved messages" existed in Telegram (nobody told me). Now that I know, this should be a 1 line fix. Is the 'saved messages' conversation completely equivalent to Signal's 'note to self'? That is, all messages in there are always from and to oneself (in the json file, is the Forwarded messages in Signal are not normally annotated in any way (there is no way to tell they are forwarded). Only if adding the option I noticed just now, when forwarding to Saved Messages from a group, it had both Thanks! |
I've just enabled initial support for importing 'saved_messages'-type chats. Let me know what does and does not work. Thanks! |
Hi, thanks for responding. While yes it does seem to work now instead of spitting out errors. I'm now having an issue with my attachments? "[Warning]: Failed to open image for reading: \photos/photo_162@22-08-2024_19-40-48.jpg I keep getting errors like this which well seems to mean it cannot read/import the appropriate file. But I never changed the Export folder structure and a search reveals these files are still in the appropriate folder. Not quite sure what the issue is here. |
Thanks for reporting back. I think I found the problem. I can only reproduce this by running the tool from within the directory where A work-around would be to actually specify a path for the json file, even if it's the current path: Thanks for reporting! |
Should be fixed, it was a tiny change. Let me know if it works now. Thanks! |
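The warning above shows the attachment path starting with a separator (`\photos/...`), which would be consistent with an empty base directory being joined to the relative path when the json file is given without any directory part. The sketch below shows one way to resolve attachment paths relative to the json file so that a bare filename still works. It is a guess at the class of problem, not the actual fix made in the tool, and `attachment_path` is a hypothetical helper.

```python
from pathlib import Path


def attachment_path(json_file: str, relative: str) -> Path:
    """Resolve an attachment path relative to the directory of the json file."""
    # For a bare filename such as "result.json", .parent is Path("."),
    # so the result stays relative instead of gaining a leading separator.
    base = Path(json_file).parent
    return base / relative
```

For example, `attachment_path("result.json", "photos/a.jpg")` yields a path inside the current directory, while `attachment_path("export/result.json", "photos/a.jpg")` yields one inside `export`.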
Yea that's fixed now. About 95% works. But some files I had with Japanese in the name(somewhat of an edge case I'm sure lol) are failing still, Oh yea according to the terminal what I get for attachment only messages where the file was not found(Which I excluded from my initial export cause size and all) it says it just "inserted empty message". I wanted to ask if you could instead have it insert the filename instead with a [DELETED] tag maybe so that it'd be possible to go over it after importing to Signal and grab whatever isn't there manually? |
This should now also be fixed hopefully. (The cause was Windows not using UTF8 like every other system)
There was actually a little bug in the "inserted empty message" warning, which caused it to show up too often (sometimes in cases where there was no empty message). That should be fixed now. Default behavior currently is to delete messages that have no content (as seen in your earlier warning I could certainly make an option to insert some custom message when an attachment could not be found, but I'd much rather that all attachments are inserted correctly. The goal is that the only time you see If still needed I'll try to add an option for marking messages with failed attachment sometime soon (maybe tomorrow), I think it'll require a little bit more time than these quick little fixes I've been doing today. Thanks! |
Okay tried it again and while this time very few errors popped up, those being styling issues and this But after you patched the Japanese text issue the file size dropped down to 30MB (same export). I thought that was weird but didn't think much of it, so now after trying to use my real, bigger export of about 400MB, I again got a file of 30MB. I was confused now but decided to try it anyway, and as I probably should have expected it didn't work. Signal started loading through the messages, got to 300 or so and then just cut off. No errors or anything. So I tried the 70MB file and while that is incomplete as you'd expect with all the attachments missing, it still loaded up. So yea I'm not sure what went wrong this time sadly. (Also not really relevant but it did also throw an error/fail to import a Contact I sent, non VCF. Some proprietary Telegram thing https://i.imgur.com/SJ6YVUf.png) |
I had a guess to where this came from and tried to fix it (though it was a harmless warning).
That is very strange. I did find one more problem just now, simply another place where I needed to deal with Windows filepath encoding, but I would have expected the program to print a clear error message in that case. Still I hope that was it, maybe you could try again? You could consider adding a By the way, you could also use this program on the generated backup with the
Well, it's pretty much expected something like that is not supported, I just hope it is cleanly skipped and doesn't cause any errors/bad data. It would be interesting to know what this message looks like in the json. Sorry the import's not going so smoothly at the moment. But we'll get there eventually! Thanks! -- It's getting late here by the way, if anything's not right this time it'll probably be tomorrow before I can do anything about it. |
Really appreciate you taking out the time for this by the way, the output is full sized now and imported just fine. Though imported files are just left as "Unnamed file" https://i.imgur.com/u2YrcaP.png |
Good to hear it's mostly working!
Should be fixed now.
The log also looks good, I'm just curious about that one Thanks! |
Oh the Empty Message was just the Contact I mentioned before. But yes. Everything seems to work Perfectlyyyy now. I'm very grateful for all this. Cheers! |
Thanks for your valuable feedback, it uncovered and helped solve a few issues. I'll look into the shared contact message later (I just sent myself one for testing), though it'll probably just turn out to be the program more cleanly skipping the message, instead of inserting an empty one. Thanks again! |
I originally kept this open because I had to update the README, but I have since done that (a while ago I think). Now with the most recent issues also solved, I think I'll close this. If any bugs show themselves during use, new issues can naturally be opened, but the original feature request is done. Thanks everybody! |
I've been wanting to switch all my groups from Telegram to Signal for a long time, but what's holding me (and my friends) back is that I don't want to lose the chat history. I was wondering if it's possible to use signalbackup-tools to merge my Telegram history into my Signal backup?
I see that #19 already mentions this use case, but it's unclear to me whether the system is reliable and usable end-to-end. Also, the specific case of Telegram might be easier to address since Telegram allows users to export the full chat history in JSON.
Unfortunately, it looks like Telegram's JSON export format is not documented. I've reached out to Telegram to ask about this. In the meantime, this thread and this blog post look helpful. Overall I feel the format should be fairly easy to understand even without documentation, but I haven't properly looked into it.
I've just made a small donation for encouragement and to thank @bepaald for all the work that he has already put into the project. Happy to make a bigger donation if he or anybody can implement this :)