I'm working on a project that manages the workflow, searchability, navigation and presentation of interactive modules. These modules will have all sorts of content such as text, audio, video and images. The architecture of the system isn't what I'm curious
about. I'd like to have a discussion on the best practice for managing these media assets. By the time this application is in full swing, there could be thousands upon thousands of videos, audio files and images. Each of the assets could be shared and reused
among any number of these modules.
So I will likely have a table in the database that has a reference to the asset itself and some other tagging schema that I will use for searchability and such. And for every module activity that uses the asset, there will be a row in an association table
between the activity and the asset tables. This way, there is only one instance of the file, but it can still be used by multiple activities. The question is where do I physically store these assets? My concern is that if I have an "audio", "video" and
"image" directory under one shared application-level "moduleAssets" directory, each of those subdirectories could get huge, and browsing them for maintenance or other purposes, could become a performance issue.
If my description is too abstract, think of Facebook. I have photos, but I tag the photo with my friends. Now, when you browse the friend's photos, it shows up under their list. I imagine the Facebook system doesn't create a copy of this photo for every
person, so I am doubting that each user has their own directory of photos/videos, in the same way that my modules wouldn't have their own photos/audio/videos directories. So how would Facebook store their massive amount of media in a logical way? Now, I've
read some articles on Facebook's image servers and the like. Obviously, I'm not dealing with near the load they are, nor do we have near the same amount of hardware, so I suppose this analogy is more of how they did it before they became big. How do I physically
store a lot of media assets in a logical way that reduces redundancy and optimizes performance using only a webserver and a database?
I welcome all discussion on this topic as I feel there is no "right" or "wrong" answer here. It will likely be a list of pros and cons for various implementations.
"Dream as if you'll live forever, live as if you'll die today." --James Dean
I welcome all discussion on this topic as I feel there is no "right" or "wrong" answer here. It will likely be a list of pros and cons for various implementations.
This is why I love this forum. If you asked this question on SO it would be closed in 2 seconds for being objective. Anyway...
The 1st step is to manage the organization and the "How" for storing and accessing. I strongly recommend using SQL Server's FILESTREAM Storage capabilities to organize your blobbed media on the file system. If you have not heard about it have a look at the
link below:
That only takes care of the "How to physically access and manage" portion of your question. For the actual schema I will also defer to others to chime in to this open-forum question in addition to my experiences below.
Actually I have done exactly what you are doing but maybe not quite to the scale you are mentioning. I have an app that manages thousands of pictures and videos just as you described. In my case everything was organized by Year/Month and that was
really all I needed to do for my requirements. Space is cheap, so I was able to get 100's of GB with no resistance at all that can easily be upped if I needed it. My performance is not impacted by the file system schema I used. You can also share the files
via a IIS site so you could access the files directly via HTTP if desired or needed.
The way I dealt with redundancy was by eliminating it via the associated content meta data. Each media file has associated meta data that could be filtered to prevent duplication of content, but then again your scenario might be different. In the end my
files are rapidly accessible and can be uploaded or downloaded via WCF as well which makes for a really flexible solution in my case.
I really appreciate the link to the white paper. This is sounding like an interesting option. I like how the physical files and database data is automatically syncronized so that there are not files without database rows and vice-versa. I'm still interested
in how HTTP access is controlled. You mentioned that I could share the files via IIS site, I'd like to get a little more clarification on that as it is possible that the client web page might be making requests to "http://mysite.com/assets/audio/whatever.mp3"
and running it on some browser-based media player. Are you suggesting that I simply make a virtual directory within the application directory that actually points to a share that contains the actual physical filepath that is being used by FILESTREAM? If
that is the case, it sounds simple enough. Have you come across any drawbacks with this implementation?
"Dream as if you'll live forever, live as if you'll die today." --James Dean
Are you suggesting that I simply make a virtual directory within the application directory that actually points to a share that contains the actual physical filepath that is being used by FILESTREAM?
Yes exactly in understanding, but there is a potential catch. Now in my case it happened to be another VD outside the actual application, and in my case I did
not use FILESTREAM to store the files, but rather manually saved them using System.IO (project was done prior to SQL 2008).
The caveat to using the FILESTREAM is the blobbed data is stored in such a manner that it is
not directly assessable as "whatever.mp3". The UNC path leads to the
blobbed data. In order to view the file you need to use a SqlFileStream Type (not System.IO) so I don't think the VD pointing to the blobbed data would work in that situation. You might be able to do something like create a custom HttpHandler (on the
site for accessing the files) that when the file was requested could realtime use the SqlFileStream object to pull the file unbenounced to the client. I have not tried accessing the blobbed data directly so someone else might have more details or a better
solution when using a FILESTREAM. Here is some documentation from MSDN on this:
Storage Namespace In FILESTREAM, the Database Engine controls the BLOB physical file system namespace. A new intrinsic function, PathName, provides the logical UNC path of the BLOB that corresponds to each FILESTREAM cell in the table. The application uses this logical path
to obtain the Win32 handle and operate on the BLOB data by using regular Win32 file system interfaces. The function returns NULL if the value of the FILESTREAM column is NULL.
One of the really nice things I like about using FILESTREAM is when it comes to deleting files. In my case with the example I provided, I had to keep reference to both the file URL for read only access, and
also the UNC path for file deletion (can't use URL to delete file). With FILESTREAM, deleting a row deletes the underlying blobbed data which is a plus.
AceCorban
Star
12358 Points
2274 Posts
Mass Media Storage Architecture
Apr 02, 2012 06:08 PM|LINK
I'm working on a project that manages the workflow, searchability, navigation and presentation of interactive modules. These modules will have all sorts of content such as text, audio, video and images. The architecture of the system isn't what I'm curious about. I'd like to have a discussion on the best practice for managing these media assets. By the time this application is in full swing, there could be thousands upon thousands of videos, audio files and images. Each of the assets could be shared and reused among any number of these modules.
So I will likely have a table in the database that has a reference to the asset itself and some other tagging schema that I will use for searchability and such. And for every module activity that uses the asset, there will be a row in an association table between the activity and the asset tables. This way, there is only one instance of the file, but it can still be used by multiple activities. The question is where do I physically store these assets? My concern is that if I have an "audio", "video" and "image" directory under one shared application-level "moduleAssets" directory, each of those subdirectories could get huge, and browsing them for maintenance or other purposes, could become a performance issue.
If my description is too abstract, think of Facebook. I have photos, but I tag the photo with my friends. Now, when you browse the friend's photos, it shows up under their list. I imagine the Facebook system doesn't create a copy of this photo for every person, so I am doubting that each user has their own directory of photos/videos, in the same way that my modules wouldn't have their own photos/audio/videos directories. So how would Facebook store their massive amount of media in a logical way? Now, I've read some articles on Facebook's image servers and the like. Obviously, I'm not dealing with near the load they are, nor do we have near the same amount of hardware, so I suppose this analogy is more of how they did it before they became big. How do I physically store a lot of media assets in a logical way that reduces redundancy and optimizes performance using only a webserver and a database?
I welcome all discussion on this topic as I feel there is no "right" or "wrong" answer here. It will likely be a list of pros and cons for various implementations.
atconway
All-Star
16846 Points
2756 Posts
Re: Mass Media Storage Architecture
Apr 02, 2012 07:57 PM|LINK
This is why I love this forum. If you asked this question on SO it would be closed in 2 seconds for being objective. Anyway...
The 1st step is to manage the organization and the "How" for storing and accessing. I strongly recommend using SQL Server's FILESTREAM Storage capabilities to organize your blobbed media on the file system. If you have not heard about it have a look at the link below:
FILESTREAM Storage in SQL Server 2008:
http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx
That only takes care of the "How to physically access and manage" portion of your question. For the actual schema I will also defer to others to chime in to this open-forum question in addition to my experiences below.
Actually I have done exactly what you are doing but maybe not quite to the scale you are mentioning. I have an app that manages thousands of pictures and videos just as you described. In my case everything was organized by Year/Month and that was really all I needed to do for my requirements. Space is cheap, so I was able to get 100's of GB with no resistance at all that can easily be upped if I needed it. My performance is not impacted by the file system schema I used. You can also share the files via a IIS site so you could access the files directly via HTTP if desired or needed.
The way I dealt with redundancy was by eliminating it via the associated content meta data. Each media file has associated meta data that could be filtered to prevent duplication of content, but then again your scenario might be different. In the end my files are rapidly accessible and can be uploaded or downloaded via WCF as well which makes for a really flexible solution in my case.
AceCorban
Star
12358 Points
2274 Posts
Re: Mass Media Storage Architecture
Apr 02, 2012 10:29 PM|LINK
I really appreciate the link to the white paper. This is sounding like an interesting option. I like how the physical files and database data is automatically syncronized so that there are not files without database rows and vice-versa. I'm still interested in how HTTP access is controlled. You mentioned that I could share the files via IIS site, I'd like to get a little more clarification on that as it is possible that the client web page might be making requests to "http://mysite.com/assets/audio/whatever.mp3" and running it on some browser-based media player. Are you suggesting that I simply make a virtual directory within the application directory that actually points to a share that contains the actual physical filepath that is being used by FILESTREAM? If that is the case, it sounds simple enough. Have you come across any drawbacks with this implementation?
atconway
All-Star
16846 Points
2756 Posts
Re: Mass Media Storage Architecture
Apr 03, 2012 02:23 PM|LINK
Yes exactly in understanding, but there is a potential catch. Now in my case it happened to be another VD outside the actual application, and in my case I did not use FILESTREAM to store the files, but rather manually saved them using System.IO (project was done prior to SQL 2008).
The caveat to using the FILESTREAM is the blobbed data is stored in such a manner that it is not directly assessable as "whatever.mp3". The UNC path leads to the blobbed data. In order to view the file you need to use a SqlFileStream Type (not System.IO) so I don't think the VD pointing to the blobbed data would work in that situation. You might be able to do something like create a custom HttpHandler (on the site for accessing the files) that when the file was requested could realtime use the SqlFileStream object to pull the file unbenounced to the client. I have not tried accessing the blobbed data directly so someone else might have more details or a better solution when using a FILESTREAM. Here is some documentation from MSDN on this:
Storage Namespace
In FILESTREAM, the Database Engine controls the BLOB physical file system namespace. A new intrinsic function, PathName, provides the logical UNC path of the BLOB that corresponds to each FILESTREAM cell in the table. The application uses this logical path to obtain the Win32 handle and operate on the BLOB data by using regular Win32 file system interfaces. The function returns NULL if the value of the FILESTREAM column is NULL.
One of the really nice things I like about using FILESTREAM is when it comes to deleting files. In my case with the example I provided, I had to keep reference to both the file URL for read only access, and also the UNC path for file deletion (can't use URL to delete file). With FILESTREAM, deleting a row deletes the underlying blobbed data which is a plus.