rhythmdb/DESIGN

   1
   2 * Goals for a new Rhythmbox tree database backend
   3
   4 ** Not complicated
   5
   6 We want to present a simple API to the rest of Rhythmbox.  A
   7 full-blown SQL is obviously unnecessary.  The only thing that
   8 Rhythmbox cares about are songs.  So conceptually, our database
   9 will only support getting lists of songs.
  10
  11 ** Permit optimization
  12
  13 In keeping with a simple API, we should also attempt as much as
  14 possible to provide opportunities for optimization.  Specifically
  15 the old RBNode database was highly optimized for genre/artist/album
  16 filtering, and we want to keep that capability.
  17
  18 ** Multiple implementations
  19
  20 The interface should also permit multiple implementations.  For
  21 example, there are the crazy people out there who have all their
  22 music metadata in a SQL database already, so having our own
  23 database doesn't help them much.
  24
  25 It also may turn out that using something like SQLite could
  26 speed up Rhythmbox for large numbers of files (say > 6k or so).
  27 This is an area where the current Rhythmbox kind of falls over.
  28 So permitting multiple implementations will allow us more
  29 flexibility for experimentation.
  30
  31 ** What is a song?
  32
  33 Conceptually, it's just a collection of attributes.  The most
  34 important is the type, which says whether it's a local file,
  35 an internet radio station, etc.  Other attributes include
  36 location, track number, rating, last play time, play count,
  37 quality, etc.
  38
  39 *** What are attributes?
  40
  41 We want the ability to have an arbitrary set of attributes of
  42 various data types, including UTF-8 strings, integers, and random
  43 binary data (e.g. for sort keys).
  44
  45 **** Saved/unsaved attributes
  46
  47 The database should support "temporary" keys, which are not actually
  48 saved to the database on exit.  This is useful for sorting keys,
  49 for example.
  50
  51 ** Multithread-capable
  52
  53 We need to ensure that the UI doesn't block.  Generally this
  54 means using threads.  This introduces a lot of locking issues
  55 that need to be addressed.
  56
  57 The database should be also optimized for common operations,
  58 and this definitely means reading.  Writing should be lower
  59 priority, although we don't want to completely starve writers.
  60
  61 Because we need to support multiple backends, we can only really
  62 support coarse-grained locking (i.e. one big global RW lock).
  63
  64 *** Multithreaded entry attribute changes
  65
  66 This issue is not as simple as one might think.  We definitely
  67 want to be able to use threads for this - for example, after
  68 the user has changed a bunch of songs on disk, we need
  69 to update the UI to reflect those changes, incrementally.
  70 That involves emitting a gtk_tree_model_row_changed signal, for
  71 which we need to hold the GDK lock.
  72
  73 *** Deadlock
  74
  75 We have to consider deadlock situations here, where one thread is
  76 holding the GDK lock, and wants the database lock, and then another
  77 thread is holding the db lock and wants the GDK lock.  In the old
  78 design, the any other thread could grab the GDK lock to update the UI,
  79 which is wrong.
  80
  81 *** RBEntryModel
  82
  83 In general, we don't want the various processing threads in
  84 Rhythmbox to be aware that there is this huge slow UI to
  85 keep updated.  Setting an attribute or delete an entry
  86 shouldn't involve doing anything else besides calling the
  87 function.
  88
  89 However, we do need to update the UI somehow.  The UI
  90 is represented by the GtkTreeModel implementations
  91 returned by a query.  Thus, one plan is to have an update
  92 queue.  Whenever a thread other than the main one modifies
  93 RhythmDB, we push an update notifier onto the queue.  Then,
  94 we have another thread (besides the main one) dedicated
  95 to processing this queue, and emitting the appropriate
  96 row_changed signals.
  97
  98 **** GtkTreeModel mapping efficiency
  99
 100 In order to emit the row_changed signal, we will need
 101 to efficiently look up a GtkTreeIter, given an entry.
 102 Perhaps we have to write our own GtkTreeModel implementation
 103 that combines a GHashTable and a GList?  Or just subclass
 104 GtkListStore, and add a GHashTable index?
 105
 106 *** Deletions
 107
 108 Handling deletions well is important.  Currently
 109 doing a "Select All" followed by "Delete" can take
 110 some time.  Now, this works two ways.  First, the user
 111 can request a deletion.  To handle this, we simply
 112 queue deletion requests from the UI, and then have
 113 a thread which processes those.
 114
 115 However, we also need to handle deletions from other
 116 threads, such as the database processing thread.  This
 117 brings up an issue - we cannot treat deletions like
 118 other operations.  If an entry is deleted from the
 119 database, but the UI isn't updated - the user may
 120 try to change the properties of an entry that's
 121 already been deleted.  How to solve this?
 122
 123 *** Additions
 124
 125 Additions fairly straightforward; we don't have
 126 as many database consistency issues as deletions.  Basically
 127 we just have a thread add something to the database, and then
 128 queue a signal emission for the main UI, much like for property
 129 changes.
 130
 131 ** RhythmDBEntry  (basically equivalent of RBNode)
 132
 133 This is an opaque handle.  You can't access it directly at all.
 134
 135 ** Queries
 136
 137 This is crucial.  Rhythmbox's genre/artist/album
 138 browser is just one example of querying the database.  It's
 139 arguably Rhythmbox's most useful feature at the moment.
 140
 141 First of all, we want to support at least the genre/artist/album
 142 queries.  Equally important though, we need to be flexible enough
 143 to query on any attribute.  This will be an important basis for
 144 implementing iTunes-like smart playlists.
 145
 146 The main idea of the new API is that it simply returns a set of
 147 nodes which match the query.  Then a new object which implements
 148 the GtkTreeModel interface is wrapped around this set.  This
 149 has the potential to be a lot more efficient than the previous approach.
 150 In the previous approach, the whole database was always presented in a
 151 GtkTreeModel interface, and then EggTreeModelFilter was used to filter it.
 152 The downside of this approach is that the filtering is happening a lot
 153 more; for example, finding the next/previous node caused each node to
 154 be filtered again.  For large databases, this is pretty inefficient; if
 155 we have 10000 songs, and we're only viewing say one album (10 songs),
 156 then we have to skip over approximately 5000 songs on average to find
 157 out whether we're at the end or not.  This gets very expensive, very fast.
 158
 159 So here's how it will work in the new implementation.  When the user clicks
 160 on an album, this only requires one query.  So, we will just call
 161 rhythmdb_do_entry_query like this:
 162
 163 album = compute_album_id ();
 164 ...
 165 GPtrArray *set = rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "album", album, RHYTHMDB_QUERY_END);
 166 GtkTreeModel *model = rb_db_model_new (set);
 167
 168 Then, we will make the main song view use this model:
 169
 170 gtk_tree_view_set_model (mainview, model);
 171
 172 However, suppose that the user clicks on a genre.  This will require
 173 several queries, because we need to filter the artist and album browsers
 174 by artists/albums which are in those genres.  As before, we filter the main
 175 view by the query:
 176
 177 genre = get_genre_id ();
 178 ...
 179 GPtrArray *set = rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "genre", genre, RHYTHMDB_QUERY_END);
 180 GtkTreeModel *model = rb_db_model_new (set);
 181 gtk_tree_view_set_model (mainview, model);
 182
 183 However, we also need to gather the lists of artists/albums.  We use the
 184 specialized query rhythmdb_do_property_query, which returns just a flat
 185 GtkTreeModel.  This call looks like this:
 186
 187 set = rhythmdb_do_property_query (db, "artist", RHYTHMDB_QUERY_PROP_EQUALS,
 188                                   "genre", "Rock", RHYTHMDB_QUERY_END);
 189 model = rb_db_model_new (set);
 190 gtk_tree_view_set_model (artistview, model);
 191 set = rhythmdb_do_property_query (db, "album", RHYTHMDB_QUERY_PROP_EQUALS,
 192                                  "genre", "Rock", RHYTHMDB_QUERY_END);
 193 model = rb_db_model_new (set);
 194 gtk_tree_view_set_model (artistview, model);
 195
 196 Conceptually, the property_query call just gathers all the song entries
 197 which match the query, and returns the unique list of their values for the
 198 specified property.  However, it is not actually implemented this way,
 199 at least not for commonly queried properties such as artist and album.
 200
 201 ** The first implementation - transitioning RBNode to this API
 202
 203 The main idea of RBNode is the tree structure.  If you think about it,
 204 the tree structure is really just a fancy optimization for querying
 205 the artist/album/genre properties.  We will keep this idea, but hide it
 206 behind the flat structure API.  We will transparently optimize certain
 207 queries, such as the artist/album ones.  This will require handling those
 208 properties specially; for example, when we set the artist of a song,
 209 this will require restructuring the tree database to match.
 210
 211 ** SQLite ideas
 212
 213 As we mentioned before, the nice thing about this API is that it could
 214 also be fairly easily implemented by a real database such as SQLite.
 215 For example, this call:
 216
 217 /* Here, 19 is a unique ID referring to a genre such as "Rock */
 218 rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "genre", 19, RHYTHMDB_QUERY_END);
 219
 220 could be rewritten in SQL:
 221
 222 select * from songs where genreid=19;
 223
 224 And this call:
 225
 226 /* Here, 37 is a unique ID referring to a genre such as "Rock */
 227 rhythmdb_do_property_query (db, "album", RHYTHMDB_QUERY_PROP_EQUALS, "genre", 37, RHYTHMDB_QUERY_END);
 228
 229 could be rewritten as:
 230
 231 select distinct album from songs where genreid=37;
 232
 233 ** Static playlists
 234
 235 We need to support user-orderable playlists.  Now, they should present
 236 the same GtkTreeModel-type interface as the queries.  This will
 237 allow a lot of code reuse for playback, etc.  However - a playlist also
 238 has to be mutable.  The user needs to be able to drag and drop entries to
 239 reorder them, delete files, add files, etc.  To implement this, we can
 240 make playlists simply be GtkListStores.  This should be relatively
 241 efficient since playlists tend not to be large.
 242
 243 This playlist will need to hook into the entry_destroyed signal to know
 244 to remove entries from playlists that have been removed from the library.
 245
 246 ** Smart playlists (do we want a better name)?
 247
 248 This will be harder.  A "smart" playlist will need to watch for changes
 249 in the database and refresh if appropriate.  Well, it wouldn't be hard
 250 to watch for ANY change in the DB and refresh, but this would obviously
 251 be very expensive.  We'll want to intelligently refresh only if the
 252 smart playlist query requries it.
 253
 254 * A plan for refactoring Rhythmbox to integrate RhythmDB
 255
 256 ** Shell will create/destroy the database object
 257
 258 ** Library takes database object, creates new thread to refresh metadata
 259
 260 ** Morph RBIRadioBackend into the trival class RBIRadioLoader
 261
 262 ** Turn RBNodeView into RBEntryView
 263 *** RBTreeModelNode goees away
 264 Instead, just add a bunch of columns into an RBTreeView, but set
 265 the column_func to get the correct value
 266
 267
 268 Local Variables:
 269 mode: outline
 270 End:
 271
 272 arch-tag: RhythmDB design document