Content provided by Christoph Neumann and Nate Jones. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Christoph Neumann and Nate Jones or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Ep 022: Evidence of Attempted Posting

33:52

Christoph questions his attempts to post to Twitter.

  • This week, we continue digging into the "Twitter problem": we want to post to Twitter on a schedule.
  • "Writing code to help out with laziness."
  • Start with the data to keep track of: inside (our data) and outside (Twitter's data).
  • "Data from a foreign land."
  • We need to determine our "working view" of Twitter's data.
  • What is in our data? For each "scheduled tweet":
    • Text to post: the "status"
    • Timestamp of when to post
  • Timestamps are nice
    • Milliseconds since the epoch
    • Universal instant
    • Allows the client to localize
  • How do we know a scheduled tweet has been posted? A "posted?" Boolean?
  • Boolean says, "Yes! It has been posted somewhere on the Internet."
  • Correlating identifiers are more useful than a Boolean.
  • The tweet ID is a correlating identifier. We can use it to look up all of Twitter's data about the tweet.
  • "We don't need to store all of Twitter in our database."
  • What is the story you need to tell about what happened?
    • A record of all the attempts allows us to tell a story about what happened.
    • Useful to have the timestamp of when our application posted it.
  • Make a separate log for attempts.
    • Attempting to post is a separate concern from what to post.
    • Don't complicate the scheduled tweet information by embedding the log.
  • "Once you have all the data, it allows you to ask new questions you didn't originally think of."
  • Clojure makes it easy to work with a large tree of data that came from an external source. We don't have to care about the structure of that data. We can just write it down.
  • Simply attempt to post the next scheduled tweet that does not have a Twitter ID recorded.
  • If it fails, just record the attempt and go back to sleep.
  • "Handle the brick in front of you, and if you keep doing that, you'll eventually build the wall."
  • What if we don't hear the success response from Twitter, but it did get posted?
  • Idea: Try to detect if a tweet has already been posted.
  • If we can uniquely identify something by its content, we can know two things are the same without having a common ID.
  • Problem: Twitter can alter the contents (for example, by wrapping links in t.co short URLs).
  • Idea: fuzzy "measure of similarity" between our recent tweets and the next scheduled tweet.
  • We can record the fuzzy match in our attempt log too!
  • If we can correlate by contents, we could even detect when we have manually posted a scheduled tweet ahead of time.
  • As soon as you can determine equality by the substance of the thing itself, you can have more than one writer.
  • How "recent" is "recent"? The last 100 tweets? 200? 500?
  • Even better, fetch all the tweets since the last ID we recorded.
    • We know we're seeing all of the tweets.
    • We can scan each of them for a match (in case of a manual post).
    • We know where the tweet stream ends, so we can tell that a post is still needed.
  • The worker will get there eventually, so it can just give up on an error. No complex retry and recovery logic is needed.
  • With more than one writer, we can still have a race condition. Ultimately, Twitter has to handle deduplication to prevent a double post within a short interval.
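
A minimal Python sketch of the data and the worker step described above (the show itself works in Clojure). The field names, the `post` callback, and the choice to pick the earliest due tweet are assumptions for illustration, not the hosts' code. Each scheduled tweet holds its status and a millisecond timestamp, the Twitter ID serves as the correlating identifier, and attempts go in a separate log rather than being embedded in the scheduled tweet.

```python
from dataclasses import dataclass
from datetime import datetime, timezone, tzinfo
from typing import Callable, List, Optional


@dataclass
class ScheduledTweet:
    local_id: int
    status: str                        # the text to post
    post_at_ms: int                    # when to post: milliseconds since the epoch (a universal instant)
    twitter_id: Optional[str] = None   # correlating identifier, recorded once Twitter accepts the post


@dataclass
class Attempt:
    local_id: int                      # which scheduled tweet we tried to post
    attempted_at_ms: int               # when our application made the attempt
    twitter_id: Optional[str] = None   # set on success
    error: Optional[str] = None        # set on failure


def localize(ms: int, tz: tzinfo = timezone.utc) -> datetime:
    """Turn a stored instant into a date/time in the given zone; only the client needs this."""
    return datetime.fromtimestamp(ms / 1000, tz)


def next_unposted(schedule: List[ScheduledTweet], now_ms: int) -> Optional[ScheduledTweet]:
    """The earliest due scheduled tweet that has no Twitter ID recorded yet."""
    due = [t for t in schedule if t.twitter_id is None and t.post_at_ms <= now_ms]
    return min(due, key=lambda t: t.post_at_ms, default=None)


def work_once(schedule: List[ScheduledTweet], attempts: List[Attempt],
              post: Callable[[str], str], now_ms: int) -> None:
    """One wake-up of the worker: try the next tweet, log the attempt, go back to sleep.

    `post` is a hypothetical stand-in for a Twitter client: it returns the new
    tweet's ID or raises. There is no retry logic; a failure is only logged,
    and the next wake-up tries again.
    """
    tweet = next_unposted(schedule, now_ms)
    if tweet is None:
        return
    try:
        tweet_id = post(tweet.status)
    except Exception as exc:
        attempts.append(Attempt(tweet.local_id, now_ms, error=repr(exc)))
        return
    tweet.twitter_id = tweet_id
    attempts.append(Attempt(tweet.local_id, now_ms, twitter_id=tweet_id))
```

A failed call only adds an entry to the attempt log. The tweet still has no Twitter ID, so the next wake-up picks it up again.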

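The fuzzy-matching idea can be sketched with Python's standard `difflib.SequenceMatcher`, swapped in here as one possible similarity measure; the show does not name one. The 0.9 threshold and the `(twitter_id, text)` input shape are hypothetical, and because Twitter can alter the text, a real threshold would need tuning against actual posts.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; 1.0 means the two texts are identical."""
    return SequenceMatcher(None, a, b).ratio()


def find_posted(status: str,
                tweets_since_last_id: List[Tuple[str, str]],
                threshold: float = 0.9) -> Optional[Tuple[str, float]]:
    """Look for `status` among every tweet posted since the last ID we recorded.

    `tweets_since_last_id` holds (twitter_id, text) pairs. Returns the best
    (twitter_id, score) at or above the hypothetical `threshold`, or None,
    meaning the scheduled tweet still needs to be posted.
    """
    best: Optional[Tuple[str, float]] = None
    for tweet_id, text in tweets_since_last_id:
        score = similarity(status, text)
        if score >= threshold and (best is None or score > best[1]):
            best = (tweet_id, score)
    return best
```

Scanning every tweet since the last recorded ID, rather than a fixed window of 100 or 500, means a `None` result indicates that no matching post has been seen yet. The returned score could also be written to the attempt log.
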
Message Queue discussion:

  • Namespacing in a map is really useful
  • A flat, namespaced map is easier to traverse than a nested map.
  • One use for namespaces: indicate the origin of the data
  • E.g., :twitter/id and :twitter/status vs. :local/id and :local/text
  • You see the namespace in your code, so it makes the data origin very visible.
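
Python has no namespaced keywords, so in this rough sketch strings like "twitter/id" stand in for Clojure's :twitter/id. It shows one flat map combining local and Twitter data, with the namespace recording where each value came from. The helper `select_ns` and the sample values are hypothetical.

```python
def select_ns(m: dict, ns: str) -> dict:
    """Return only the entries whose key is in namespace `ns` (keys look like "ns/name")."""
    prefix = ns + "/"
    return {k: v for k, v in m.items() if k.startswith(prefix)}


# Hypothetical sample data: our record and Twitter's view of the same tweet.
local = {"local/id": 42, "local/text": "New episode is up!"}
remote = {"twitter/id": "tw-123", "twitter/status": "New episode is up!"}

# One flat map: no nesting, and the namespaces keep the two "id" keys from colliding.
combined = {**local, **remote}

# Pull out just the data that came from Twitter.
twitter_part = select_ns(combined, "twitter")
```

Reading `combined["twitter/id"]` in code makes the origin of the value obvious at the point of use, without walking a nested structure.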

Related episodes:

Related projects:

Clojure in this episode:

  • pr-str
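
pr-str turns Clojure data into readable text, which fits the idea of "just writing down" data from an external source. A loose Python analogue for JSON-shaped data is `json.dumps`; the sketch below stores a response in a hypothetical attempt log as-is, without modeling its structure. The log entry keys and the sample response are assumptions.

```python
import json
import time


def log_raw_response(log: list, local_id: int, response: dict) -> None:
    """Append the whole external response to the log without caring about its shape."""
    log.append({
        "attempt/local-id": local_id,
        "attempt/at-ms": int(time.time() * 1000),             # when we made the attempt
        "attempt/raw": json.dumps(response, sort_keys=True),   # readable text, roughly like pr-str
    })


# Hypothetical response tree; any nested JSON-compatible data works.
log = []
log_raw_response(log, 1, {"id": "tw-1", "text": "hi", "user": {"name": "example"}})
```

Because the raw text can be parsed back later, new questions about the response can be answered from the log without having anticipated them.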