The problem reads:
“Design a simplified version of Twitter where users can post tweets, follow/unfollow another user and is able to see the 10 most recent tweets in the user’s news feed. Your design should support the following methods:
- postTweet(userId, tweetId): Compose a new tweet.
- getNewsFeed(userId): Retrieve the 10 most recent tweet ids in the user’s news feed. Each item in the news feed must be posted by users who the user followed or by the user herself. Tweets must be ordered from most recent to least recent.
- follow(followerId, followeeId): Follower follows a followee.
- unfollow(followerId, followeeId): Follower unfollows a followee.”
Understanding the data
Each userId is related to a tweetId, in the form of a “twitter post”.
Each userId is related to multiple userIds, in the form of a “follower” or “followee”.
I need to store and organize the data in such a way that makes it easy to relate a userId to a tweetId; to perform find operations.
The data also needs to be organized in such a way that I can find a userId and based on this userId find all the followees, to then extract all the “tweetIds” from the followees.
We could create all these relationships in a Twitter class; using lists, maps and sets. The problem is the code then becomes convoluted and there is no clear separation of logic. The previous leads me to believe I need to use classes.
Class User -id:int -following:Set<Integer> -tweets:List<Tweet> +postTweet():void +follow(userId):void +unfollow(userId):void +getTweets():List<Tweet> +getUsersFollowed():Set<Integer> +setId(id):void
Class Tweet -id:int -datePosted:Date +getId():int +getDatePosted():Date
Class Twitter -userMap:Map<Integer,User> +postTweet(userId,tweetId):void +getNewsFeed(userId):List<Integer> +follow(followerId,followeeId):void +unfollow(followerId,followeeId):void +addUser(userId):void +getUser(userId):User +sortTweetsBasedOnDate(List<Tweet>):void +getAllTweetsFromUser(userId):List<Tweet> +getRangeOfTweetIDs(List<Tweet>,start,end):List<Integer>
Not terribly sophisticated but gets the job done.
The Twitter class ended up being the most complicated one because it is the one that orchestrates the other 2 classes and gives them the functionality.
I chose to use List for non randomized access to data, HashMap for 1:1 relations and Set whenever I need a unique-random-access data.
- Organized data and easy to identify each user and tweetId.
- Users and tweets are also encapsulated.
- Relating data outside its data structure requires some processing – retrieving the news feed of a user requires retrieving tweets of several other users. Perhaps the relationship can be improved?
As part of my learning experience I thought about using the Java Streaming API somewhere in this implementation, but I did not find a good use for it. The stream appears to work better when all the data is in a single point. I thought about using the stream in the news feed method but I have not really found a good use for it when the data is scattered through different users – again, maybe the design needs to be improved.
The design does not consider any scenarios of an actual implementation: networking, databases, distributed systems. These 3 classes get much more complicated if tried to implement in a real scenario.
I also added some tests. To be honest the tests need more work as I am only testing for the ideal case.
A friend of mine suggested to think of how it would work in a database and that the list of tweets would not be held by the user table or class. By moving the tweet list to the Twitter class I was able to remove a lot of the aggregated complexity. In general, the design became much simpler and… I was able to use the Java streaming API!
Currently the design does not need date comparison since its single threaded, in a distributed system the software would more than likely need date-based sorting.