
Django: Managing user-uploaded files on App Engine Standard

Notes on managing user-uploaded (media) files in a Django app deployed on App Engine Standard using the django-storages library.
By Przemek, July 2020

In this note we discuss how to configure a Django application running on App Engine Standard so that media (user-uploaded) files are stored in Google Cloud Storage. (If you’re just getting started with Django on App Engine, see also the general introduction here.)

Disclosure: as I write these notes, I work as a Software Engineer at Google. In this post (like everywhere on this site) I share learnings in a personal capacity and speak only for myself.

Django files and App Engine Standard

There are two types of files that any Django app needs to reason about: those known ahead of time (static files) and those not known before deployment (user-uploaded files, sometimes called “media files” because in Django they are configured using the settings MEDIA_ROOT and MEDIA_URL). Here we look specifically at how to handle the user-uploaded files.

On App Engine, the application doesn’t have access to the local filesystem. Static files are served directly from the application image, while user-uploaded files need to be handled… elsewhere. Within GCP, Google Cloud Storage is the canonical solution for durable file storage, and below we explain how to use it as a backend for the user-uploaded files in our application.

Google Cloud Storage and bucket creation

Google Cloud Storage stores files in buckets. We can have one or more buckets per project – so in particular, we can create a dedicated bucket for user-uploaded files (regardless of whether we already have any buckets for other purposes or not).

To create the bucket, we go to Storage in the Cloud Console and click “Create bucket”.

During the bucket creation flow, we get to pick:

  1. the name for the bucket – needs to be globally unique across all GCP projects
  2. the physical location in which the bucket content is going to be stored. Locations can be regional (associated with a particular geographic location) or multi-regional (spanning a large geographic area such as Europe or the United States). To be clear, whatever we choose, the files are going to be available to users globally – but putting them close to where most of the users are will improve serving latency (more information in the official docs)
  3. the storage class, which determines the latency of retrieving the files and the storage price
  4. the access policy: either fine-grained, meaning that each file has a separate permission set associated with it, or uniform, meaning that all files in the bucket are governed by the bucket-level permission set

In principle, the uniform access policy seems appealing. However, I found that with django-storages it’s easier to set up credentials if the fine-grained policy is used – see the section on permissions below for why that is.

[optional] Import existing files

If we’re setting this up for an existing app, we will likely want to migrate the existing files to the new bucket. From a directory storing those files, we can upload them using gsutil:

gsutil -m cp -a public-read -r <directory> gs://<bucket name>

Note that -a public-read makes the files immediately world-readable to anyone who has the right URL.

django-storages and application settings

django-storages implements Django integration for multiple cloud storage backends, including Google Cloud Storage, and we can use it to enable storing user-uploaded files in the bucket we just created.

pip install django-storages
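
Per the django-storages documentation, the library also needs to be registered in INSTALLED_APPS so Django picks up its storage backends – a minimal settings.py fragment:

```python
INSTALLED_APPS = [
    # ... the app's existing entries ...
    'storages',
]
```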

After adding the dependency, we can edit settings.py:

import os  # at the top of settings.py

if os.getenv('GAE_APPLICATION'):
    # Running on App Engine: store media files in the Cloud Storage bucket.
    DEFAULT_FILE_STORAGE = 'storages.backends.gcloud.GoogleCloudStorage'
    GS_BUCKET_NAME = '<bucket name>'
    GS_DEFAULT_ACL = 'publicRead'
    MEDIA_URL = 'https://storage.googleapis.com/<bucket name>/'
else:
    # Local development: keep media files on the filesystem under site_media/.
    PROJECT_PATH = os.path.join(os.path.abspath(os.path.split(__file__)[0]), '..')
    MEDIA_ROOT = os.path.join(PROJECT_PATH, 'site_media')
    MEDIA_URL = '/site_media/'

In the snippet above, we configure a different setup when running locally for development purposes, so that the development server manages the user-uploaded files locally under site_media.
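
The check relies on the GAE_APPLICATION environment variable, which the App Engine Standard runtime sets automatically and which is absent locally. A minimal self-contained sketch of the same branching logic (the helper name is mine, for illustration):

```python
import os

def default_file_storage():
    # App Engine Standard sets GAE_APPLICATION in the runtime environment;
    # locally the variable is absent, so we fall back to Django's
    # filesystem-backed storage for development.
    if os.getenv('GAE_APPLICATION'):
        return 'storages.backends.gcloud.GoogleCloudStorage'
    return 'django.core.files.storage.FileSystemStorage'
```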

[optional] bucket prefix

If, rather than storing the files directly in the top-level folder of the bucket, we want all of them under <optional prefix> (without leading or trailing slashes), we just need to make two changes:

  • set MEDIA_URL to https://storage.googleapis.com/<bucket name>/<optional prefix>/
  • add GS_LOCATION = <optional prefix>
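
To see how the two settings fit together, here is a small sketch of the public URL that results for an uploaded file (the bucket name and prefix are made-up placeholders):

```python
# Hypothetical values standing in for <bucket name> and <optional prefix>.
BUCKET_NAME = 'my-bucket'
GS_LOCATION = 'media'  # no leading or trailing slashes
MEDIA_URL = f'https://storage.googleapis.com/{BUCKET_NAME}/{GS_LOCATION}/'

def public_url(name):
    # With GS_LOCATION set, django-storages stores the object under
    # '<prefix>/<name>', so the public URL is MEDIA_URL plus the file name.
    return MEDIA_URL + name

print(public_url('photos/cat.png'))
# https://storage.googleapis.com/my-bucket/media/photos/cat.png
```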

Bucket permissions and access credentials

In the case of my app, the user-uploaded files are meant to be publicly visible, so I wanted all files writable only by the app and readable by anyone on the internet.

One of the benefits of hosting our app on App Engine and using Google Cloud Storage for user-uploaded files is that, when running on App Engine, the app already has Google Cloud credentials set up that allow it to authenticate requests to Google Cloud Storage. So we should not need to follow the authentication steps in the django-storages documentation that describe creating a service account and providing the corresponding key via GOOGLE_APPLICATION_CREDENTIALS.

In practice, I found that this is indeed the case – but only if the bucket access policy is set to fine-grained and we set GS_DEFAULT_ACL = 'publicRead'. Here’s what’s going on.

If we set the bucket permissions to uniform and GS_DEFAULT_ACL = 'publicRead', the django-storages request to add a file fails with a 4xx error, indicating that the bucket has uniform permissions so we can’t set an ACL on it. One might then expect that leaving GS_DEFAULT_ACL = None would work, but it fails for a different reason:

File "/env/lib/python3.7/site-packages/google/cloud/storage/_signing.py", line 55, in ensure_signed_credentials
    "details.".format(type(credentials), SERVICE_ACCOUNT_URL)
AttributeError: you need a private key to sign credentials. the credentials you are currently using <class 'google.auth.compute_engine.credentials.Credentials'>

Without the ACL being set to public, django-storages assumes the file is private and attempts to use the private key to sign the request, which fails because we don’t have such credentials configured.

I reported this issue as django-storages #909.

Fortunately, the workaround is easy enough: set GS_DEFAULT_ACL = 'publicRead' and make sure the bucket has the access control policy set to “fine-grained”.

Conclusion

And that’s it, user-uploaded files should now “just work”!
