Valentin Jacquemin

I migrated all my tweets, here's how

On how I reclaimed my tweets and imported them into my personal website

About a year ago, I closed my Twitter account. Before pulling the trigger, though, I exported my personal content: I knew I did not want to lose that whole history of my online presence.

Funnily enough, the export feature still carries the Twitter branding; the folks at X did not bother updating it.

My personal Twitter archive homepage
My Twitter archive as a static website. X did not bother rebranding this feature.

What I got in my mailbox was a fully functioning static website where I could browse my own feed as if Twitter still existed. You can search, filter and sort the feed as you please, all locally. The web as I like it.

A year ago I did not know what I would do with that archive; I just wanted to close my account without losing my content. When I decided to resume my personal website, I knew my tweets would get a new home there.

Here’s how I migrated that content.

When you browse the archive you quickly find your way around. And since it's a static website anyway, you can figure out how the whole thing works by looking at the sources.

twitter-archive> tree -P 'tweets.js|*.html' -L 2
.
├── assets
│   ├── fonts
│   ├── images
│   └── js
├── data
│   ├── community_tweet_media
│   ├── deleted_tweets_media
│   ├── direct_messages_group_media
│   ├── direct_messages_media
│   ├── moments_media
│   ├── moments_tweets_media
│   ├── profile_media
│   ├── tweets_media
│   └── tweets.js
└── Your archive.html

The entire collection of tweets lives in data/tweets.js. From there to my own website, it's a matter of using a few tools to filter and re-shape that array of microblog entries.

First, I kept only the JSON-compatible part of data/tweets.js, i.e. I dropped the JavaScript variable declaration, and saved the result as preprocess.1.json.
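
If you want to script that step, a one-liner along these lines does the job, assuming the declaration is the usual window.YTD.tweets.part0 = [...] prefix on the first line:

# strip the JavaScript assignment, keeping only the JSON array
sed '1s/^window\.YTD\.tweets\.part0 = //' data/tweets.js > preprocess.1.json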

Second, I decided to filter out all replies, as I only wanted to keep my own posts. Bringing replies back into their context would have been too much work, and I'm not even sure it would be feasible at all.

jq '[.[] | .tweet.id as $id
  | ( .tweet.entities.media | length > 0 ) as $hasMedia
  # keep only tweets that are not replies to someone
  | select(IN(paths; ["tweet", "in_reply_to_user_id"]) == false)
  # drop all the fields my website will not need
  | del(
    .tweet.extended_entities,
    .tweet.favorite_count,
    .tweet.id_str,
    .tweet.truncated,
    .tweet.retweet_count,
    .tweet.id,
    .tweet.possibly_sensitive,
    .tweet.favorited,
    .tweet.entities.user_mentions,
    .tweet.entities.symbols,
    .tweet.entities.hashtags,
    .tweet.retweeted,
    .tweet.edit_info,
    .tweet.source,
    .tweet.display_text_range)
  # when the tweet has media, keep just the id: media files are prefixed with it
  | .tweet.entities.media
  |= if $hasMedia then $id end]' preprocess.1.json > preprocess.2.json

That filtered out about a third of my tweets.
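
If you want to check the ratio yourself, jq can count both arrays:

jq length preprocess.1.json
jq length preprocess.2.json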

Then I re-shaped the tweets. For example, the JSON payload gives the expanded URL of any shortened link present in a tweet:

{
  "tweet" : {
    "entities" : {
      "urls" : [
        {
          "url" : "https://t.co/FhmBWq39rL",
          "expanded_url" : "https://activitypub.rocks/",
          "indices" : [
            "178",
            "201"
          ]
        }
      ]
    },
    "created_at" : "Fri Nov 01 11:54:07 +0000 2024",
    "full_text" : "\"Don't you miss the days when the web really was the world's greatest  decentralized network?  Before everything got locked down into a handful  of walled gardens?  So do we.\" – https://t.co/FhmBWq39rL",
    "lang" : "en"
  }
}

I definitely don't want to rely on the Twitter URL shortener anymore. Thanks to that thorough payload, translating the short links back was easy.

The following jq script converts the first shortened URL of each tweet into its expanded version and pops it from the list; a few passes of it (hence the numbered preprocess files) take care of tweets with several links.

jq '[.[]
  | .tweet.entities.urls[0]?.url as $url
  | .tweet.entities.urls[0]?.expanded_url as $expanded_url
  # inline the first short link of the tweet as an anchor to its real target
  | (if $url then .tweet.full_text
     | sub($url; "<a href=\"" + $expanded_url + "\">" + $expanded_url + "</a>")
     else .tweet.full_text end) as $text
  # rebuild the entry, popping the URL that was just processed
  | { tweet: {
      full_text: $text,
      lang: .tweet.lang,
      created_at: .tweet.created_at,
      entities: {
        urls: .tweet.entities.urls[1:],
        media: .tweet.entities.media
      }
    }}]' preprocess.2.json > preprocess.3.json
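
Rather than eyeballing when to stop, a small loop can repeat that pass until no tweet has a pending short link left. This is just a sketch: it assumes the filter above is saved in a file named expand-urls.jq, and the working file names are hypothetical:

cp preprocess.2.json pass.json
# keep going while at least one tweet still has an unconverted URL
while [ "$(jq '[.[] | .tweet.entities.urls | length] | add // 0' pass.json)" -gt 0 ]
do
    jq -f expand-urls.jq pass.json > next.json
    mv next.json pass.json
done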

Once the content was reshaped, I created one file per tweet under the notes section of my Hugo website, using a shell script. (Between preprocess.3.json and preprocess.5.json a couple of extra passes flattened the shape further: the date became a top-level epoch timestamp and the media id moved to .tweet.media, which is what the script below reads.)

#!/usr/bin/env bash

file='preprocess.5.json'
length=$(jq 'length' "$file")
i_tweet=1

for (( i=0; i<length; i++ ))
do
    created_at=$( jq -r ".[$i] | .date | todate" "$file")
    output_folder="output/tweets/$(( i_tweet++ ))"
    out="$output_folder/index.md"
    hasMedia=$( jq ".[$i] | if .tweet.media then true else false end" "$file" )
    lang=$( jq -r ".[$i] | .tweet.lang" "$file")
    content=$( jq -r ".[$i] | .tweet.full_text" "$file")
    template='tweet.pre'
    mediaHtml=''

    if [ "$hasMedia" = "true" ]; then
        template='tweet-with-media.pre'
        # media files in the archive are prefixed with the tweet id
        filename=$( jq -r ".[$i] | .tweet.media" "$file")
        media=$(basename "$(ls tweets_media/"$filename"*)")
        if [[ "$media" == *.mp4 ]]; then
            mediaHtml="<video autoplay loop src='/notes/$media' class='u-video'></video>"
        else
            mediaHtml="<img src='/notes/$media' class='u-photo' />"
        fi
    fi
    data=$( m4 -D__CREATED_AT__="${created_at}" \
            -D__LANG__="$lang" \
            -D__CONTENT__="$content" \
            -D__MEDIA__="$mediaHtml" \
            "$template")
    mkdir -p "$output_folder"
    echo "$data" > "$out"
done
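
One thing to keep in mind: the script reads the media from a tweets_media folder next to it, and the generated HTML points at /notes/<file>, so the media files also have to land wherever that path resolves, e.g. Hugo's static directory (the destination path below is just an example):

# make the /notes/<file> URLs resolve by shipping the media to the site
cp tweets_media/* ../my-site/static/notes/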

The script relies on m4 simply to keep the templates outside of it; separate files are a bit easier to manage.

Here is, for example, how I prepared tweet.pre. The script uses either this template or tweet-with-media.pre, depending on whether the tweet embeds media. Note the dnl after changequote below: without it, m4 would emit a stray blank line before the front matter delimiter and confuse Hugo.

changequote(<!, !>)dnl
---
date: __CREATED_AT__
lang: __LANG__
params:
  kind: tweet
---
<p>__CONTENT__</p>
<p class="meta original-link">
First published on <a href="https://twitter.com">twitter</a> 
(<a href="/articles/1/">?</a>)
</p>
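
For the ActivityPub tweet shown earlier, the generated index.md would come out along these lines:

---
date: 2024-11-01T11:54:07Z
lang: en
params:
  kind: tweet
---
<p>"Don't you miss the days when the web really was the world's greatest  decentralized network?  Before everything got locked down into a handful  of walled gardens?  So do we." – <a href="https://activitypub.rocks/">https://activitypub.rocks/</a></p>
<p class="meta original-link">
First published on <a href="https://twitter.com">twitter</a>
(<a href="/articles/1/">?</a>)
</p>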

That’s it! 3143 tweets kept intact, now persisted on my own website.