Table of Contents#
- Prerequisites
- What is multipart/form-data?
- Why urllib2 Needs MultipartPostHandler
- Installing MultipartPostHandler
- Basic Usage: POST Form-Data with urllib2
- Handling Unicode File Uploads: Common Issues
- Fixing Unicode File Upload Problems
- Advanced: Sending Additional Form Fields
- Troubleshooting Tips
- Conclusion
- References
Prerequisites#
Before diving in, ensure you have:
- Python 2.7: urllib2 is part of Python 2’s standard library (Python 3 uses urllib.request instead, but this guide focuses on urllib2).
- Basic knowledge of HTTP POST requests and urllib2.
- A target API endpoint that accepts multipart/form-data (e.g., a file upload endpoint).
What is multipart/form-data?#
multipart/form-data is a MIME type used to send binary data (like files) and form fields in HTTP requests. Unlike application/x-www-form-urlencoded (which encodes data as key-value pairs for text), multipart/form-data splits data into separate "parts," each with its own headers. This matters for files because URL-encoding arbitrary binary data is wasteful: every byte outside a small set of safe ASCII characters becomes a three-character %XX escape, whereas a multipart part carries the bytes unchanged.
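URL-encoding binary data is possible, but it inflates the payload. As a rough illustration, the sketch below percent-encodes 128 non-ASCII bytes the way a URL-encoded form would and compares sizes. The sample bytes are arbitrary, and the import is written so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# 128 arbitrary non-ASCII bytes, standing in for binary file content
raw = bytes(bytearray(range(128, 256)))

# Percent-encoding turns each of these bytes into a three-character escape
encoded = quote(raw)

size_ratio = len(encoded) / float(len(raw))
print(size_ratio)
```

Real files contain some bytes that need no escaping, so the overhead in practice is lower than in this worst case.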
A typical multipart/form-data request looks like this:
POST /upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="username"

john_doe
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="file"; filename="café.txt"
Content-Type: text/plain

[File content here]
------WebKitFormBoundary7MA4YWxkTrZu0gW--

The boundary is a random string that separates parts, ensuring the server can parse each field/file correctly.
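To make the wire format concrete, here is a minimal sketch that assembles such a body by hand. The function name build_multipart and its argument layout are assumptions made for this illustration, not part of any library, and all field names and filenames are assumed to be plain ASCII.

```python
import uuid


def build_multipart(fields, files):
    """Return (content_type, body) for text fields and in-memory files.

    fields: dict of name -> bytes value
    files:  dict of name -> (filename, content_type, bytes content)
    """
    boundary = uuid.uuid4().hex
    delimiter = ('--' + boundary).encode('ascii')
    lines = []
    for name, value in sorted(fields.items()):
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"'
                      % name).encode('ascii'))
        lines.append(b'')   # blank line ends the part headers
        lines.append(value)
    for name, (filename, ctype, content) in sorted(files.items()):
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (name, filename)).encode('ascii'))
        lines.append(('Content-Type: %s' % ctype).encode('ascii'))
        lines.append(b'')
        lines.append(content)
    lines.append(delimiter + b'--')   # closing delimiter ends the body
    lines.append(b'')
    body = b'\r\n'.join(lines)
    return 'multipart/form-data; boundary=%s' % boundary, body


content_type, body = build_multipart(
    {'username': b'john_doe'},
    {'file': ('example.txt', 'text/plain', b'hello')},
)
```

Every delimiter line is the boundary prefixed with two hyphens, and the final delimiter also ends with two hyphens.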
Why urllib2 Needs MultipartPostHandler#
Python’s urllib2 can send POST requests using urllib.urlencode for application/x-www-form-urlencoded data, but it provides no built-in support for multipart/form-data. Manually constructing the multipart/form-data body is error-prone: you’d need to generate boundaries, encode parts, and handle headers—all while avoiding syntax mistakes (e.g., missing boundary delimiters).
MultipartPostHandler is a third-party library that extends urllib2 to handle multipart/form-data automatically. It intercepts POST requests, detects file-like data, and encodes the request body with proper boundaries and headers.
Installing MultipartPostHandler#
MultipartPostHandler is not in Python’s standard library, so you’ll need to install it manually. Here’s how:
Option 1: Download from Source#
The original MultipartPostHandler is available via legacy sources (it’s no longer actively maintained but still functional). Save the source code (or a Gist) as MultipartPostHandler.py in your project directory.
Option 2: Use pip (if available)#
Some forks are hosted on PyPI. Try:
pip install MultipartPostHandler
If this fails, use Option 1.
Basic Usage: POST Form-Data with urllib2#
Let’s walk through a simple example: uploading a file with urllib2 and MultipartPostHandler.
Step 1: Import Dependencies#
import urllib2
from MultipartPostHandler import MultipartPostHandler # Import the handler
Step 2: Create an Opener with MultipartPostHandler#
urllib2 uses "openers" to process requests. We’ll create an opener that includes MultipartPostHandler to handle multipart/form-data:
# Create an opener with MultipartPostHandler
opener = urllib2.build_opener(MultipartPostHandler)
urllib2.install_opener(opener) # Make this the default opener for urllib2.urlopen
Step 3: Prepare Data to Send#
Define your form data, including files. The examples in this guide pass each file as a (filename, file_object) tuple under its field name (e.g., 'file'), which assumes a copy of the handler that accepts tuples. The original MultipartPostHandler module instead expects the open file object itself as the value and derives the filename from the file’s name, so check which form your copy supports:
# Open the file in binary mode (critical for non-text files)
with open('example.txt', 'rb') as f:
    # Data dictionary: regular fields as strings, files as (filename, fileobj)
    data = {
        'username': 'john_doe', # Regular form field
        'file': ('example.txt', f) # File field: (filename, file object)
    }
Step 4: Send the POST Request#
Use urllib2.urlopen to send the request while the file is still open, that is, inside the with block from Step 3. The opener will automatically encode data as multipart/form-data:
    url = 'https://api.example.com/upload'
    response = urllib2.urlopen(url, data=data)
    # Print the server response
    print(response.read())
How It Works#
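MultipartPostHandler inspects each value in the data dictionary before the request is sent. Values it recognizes as files (tuples like ('example.txt', f) in the variant used here; open file objects in the original module) become file parts, and the request gets a multipart/form-data Content-Type header with a generated boundary. Regular strings (like 'john_doe') are treated as form fields.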
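The interception mechanism can be sketched with a small handler of our own. The class below is hypothetical and is not the MultipartPostHandler source: it assumes field names are ASCII, values are either bytes or binary file-like objects, and it uses a fixed boundary for readability. The hook is called directly on a placeholder URL, so nothing is sent over the network.

```python
import io

try:
    import urllib2 as urlreq            # Python 2
except ImportError:
    import urllib.request as urlreq     # Python 3


class SketchMultipartHandler(urlreq.BaseHandler):
    """Hypothetical handler, NOT the real MultipartPostHandler source."""

    # Lower order so this hook runs before the stock HTTP handler's hook
    handler_order = urlreq.HTTPHandler.handler_order - 10
    boundary = 'sketch-boundary'  # fixed for readability; real code randomizes

    def http_request(self, request):
        data = request.data
        if not isinstance(data, dict):
            return request  # plain strings/bytes pass through untouched
        delimiter = b'--' + self.boundary.encode('ascii')
        chunks = []
        for name, value in sorted(data.items()):
            header = 'Content-Disposition: form-data; name="%s"' % name
            if hasattr(value, 'read'):  # file-like object -> file part
                header += '; filename="%s"' % getattr(value, 'name', 'upload.bin')
                value = value.read()
            chunks.append(delimiter + b'\r\n' + header.encode('ascii')
                          + b'\r\n\r\n' + value + b'\r\n')
        chunks.append(delimiter + b'--\r\n')
        # Swap the dict for the encoded body and advertise the boundary
        request.data = b''.join(chunks)
        request.add_unredirected_header(
            'Content-Type', 'multipart/form-data; boundary=' + self.boundary)
        return request

    https_request = http_request


# Call the hook directly on a placeholder request (no network traffic)
request = urlreq.Request('http://example.invalid/upload',
                         data={'username': b'john_doe',
                               'file': io.BytesIO(b'hello')})
request = SketchMultipartHandler().http_request(request)
```

In normal use the class would be passed to build_opener, and the opener would call http_request itself before handing the request to the HTTP handler.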
Handling Unicode File Uploads: Common Issues#
The above example works for filenames with ASCII characters (e.g., example.txt), but it breaks with Unicode filenames (e.g., café.txt, документ.docx). Why?
The Root Cause#
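By default, MultipartPostHandler constructs the Content-Disposition header for files using raw filenames:
# Problematic header construction (simplified)
disposition = 'form-data; name="%s"; filename="%s"' % (name, filename)
If filename is a unicode string containing non-ASCII characters (e.g., café), Python 2 silently promotes the whole header to unicode at this line, and the failure surfaces later: as a UnicodeDecodeError when the header is joined with the raw file bytes, or as a UnicodeEncodeError when the body is implicitly encoded as ASCII for sending. If filename is instead a UTF-8 byte string, nothing raises, but the raw bytes go into the header with no declared charset, and a server that decodes them as Latin-1 shows garbled text (e.g., cafÃ©.txt).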
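Both failure modes can be reproduced without any network code. The snippet below is a small demonstration using only string methods; the filename is written with an escape sequence so the source stays pure ASCII and runs on Python 2 and Python 3 alike.

```python
filename = u'caf\xe9.txt'   # "cafe.txt" with an acute accent on the e

# 1) Forcing the name into ASCII, as implicit Python 2 conversions do, fails
try:
    filename.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# 2) UTF-8 bytes misread as Latin-1 turn one accented letter into two characters
utf8_bytes = filename.encode('utf-8')
misread = utf8_bytes.decode('latin-1')

print(ascii_ok)
print(len(filename))
print(len(misread))
```

The second case is the classic mojibake pattern: the two UTF-8 bytes of the accented letter are each displayed as a separate Latin-1 character.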
Fixing Unicode File Upload Problems#
To support Unicode filenames, we can encode the filename in the Content-Disposition header using the filename* parameter, which follows the extended parameter syntax defined in RFC 2231 (and adapted for HTTP header fields in RFC 5987). This syntax declares the charset (e.g., UTF-8) of a percent-encoded, non-ASCII filename.
Step 1: Patch MultipartPostHandler#
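Modify MultipartPostHandler.py to encode filenames with filename*=UTF-8''<encoded_filename>. The helper method shown here is illustrative: copies of the handler differ, and the original module builds this header inline in its multipart_encode function, so apply the same change wherever your copy writes filename="...". Here’s the critical fix: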
Original Code (Broken for Unicode):
def __get_content_disposition(self, name, filename=None):
    disposition = 'form-data; name="%s"' % name
    if filename:
        disposition += '; filename="%s"' % filename # ❌ Unencoded filename
    return disposition
Patched Code (Unicode-Friendly):
import urllib # Add this import for urllib.quote
def __get_content_disposition(self, name, filename=None):
    disposition = 'form-data; name="%s"' % name
    if filename:
        # Encode unicode filenames as UTF-8 (byte strings are assumed to be UTF-8 already)
        if isinstance(filename, unicode):
            filename = filename.encode('utf-8')
        # Percent-encode the UTF-8 bytes so the header stays pure ASCII
        encoded_filename = urllib.quote(filename)
        disposition += '; filename*=UTF-8\'\'%s' % encoded_filename # ✅ RFC 2231 encoding
    return disposition
Step 2: Test with Unicode Filenames#
Now, use a Unicode filename in your code (a Python 2 source file containing non-ASCII literals needs a coding declaration such as # -*- coding: utf-8 -*- on its first line):
with open(u'café.txt', 'rb') as f: # Note the 'u' prefix for Unicode string
    data = {
        'file': (u'café.txt', f) # Unicode filename
    }
    response = urllib2.urlopen(url, data=data)
    print(response.read()) # Should now upload with the correct filename!
Why This Works#
The filename*=UTF-8''%s syntax tells the server:
- UTF-8: The filename is encoded with UTF-8.
- The empty string '': No language tag (optional).
- encoded_filename: The URL-encoded UTF-8 bytes of the filename.
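Many server-side multipart parsers, including those commonly used with Node.js, Django, and Flask, decode filename* and preserve the original Unicode filename. Support is not universal, however: RFC 7578, the current multipart/form-data specification, states that filename* must not be used in form uploads, so treat this patch as a compatibility workaround and test it against your target server.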
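As a standalone illustration of the resulting header value, the sketch below builds a Content-Disposition string for a Unicode filename. The function name content_disposition is hypothetical, and the extra ASCII-only filename fallback is an addition for parsers that ignore filename*, not part of the patch above. It assumes an ASCII field name and a filename without double quotes.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def content_disposition(field_name, filename):
    """Build a form-data Content-Disposition value for a unicode filename."""
    utf8_name = filename.encode('utf-8')
    # ASCII fallback: non-ASCII characters (and any literal '?') become '_'
    fallback = str(filename.encode('ascii', 'replace').decode('ascii'))
    fallback = fallback.replace('?', '_')
    # filename* carries the percent-encoded UTF-8 bytes with a charset label
    return 'form-data; name="%s"; filename="%s"; filename*=UTF-8\'\'%s' % (
        field_name, fallback, quote(utf8_name))


header_value = content_disposition('file', u'caf\xe9.txt')
print(header_value)
```

Which of the two parameters a server prefers varies by parser, so it is worth checking the stored filename after a test upload.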
Advanced: Sending Additional Form Fields#
You can mix files and regular form fields (text) in the data dictionary. For example, to send a user ID, file, and description:
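with open(u'résumé.pdf', 'rb') as f:
    data = {
        'user_id': '123', # Text field
        'description': u'My résumé (French accent)', # Unicode text field
        'file': (u'résumé.pdf', f) # Unicode filename
    }
    response = urllib2.urlopen(url, data=data)
Whether Unicode text fields work without extra effort depends on your copy of MultipartPostHandler: the original module writes field values into the body unchanged. If Unicode text (like u'My résumé') triggers an encoding error, encode the value to UTF-8 bytes yourself, for example with u'My résumé'.encode('utf-8'), before adding it to data.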
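If your copy of the handler does not encode text fields itself, one option is to normalize the dictionary before sending it. The helper below is a hypothetical sketch (encode_text_fields is not a library function): it encodes unicode values to UTF-8 bytes and leaves everything else, including file tuples, untouched.

```python
text_type = type(u'')   # `unicode` on Python 2, `str` on Python 3


def encode_text_fields(data, encoding='utf-8'):
    """Return a copy of `data` with unicode text values encoded to bytes.

    Tuples (file entries in this guide's convention) and values that are
    already bytes are passed through unchanged.
    """
    encoded = {}
    for key, value in data.items():
        if isinstance(value, text_type):
            value = value.encode(encoding)
        encoded[key] = value
    return encoded


fields = encode_text_fields({
    'user_id': b'123',
    'description': u'My r\xe9sum\xe9',
})
```

The returned dictionary can then be passed as data in place of the original one.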
Troubleshooting Tips#
- UnicodeEncodeError: Ensure filenames are Unicode strings (prefix with u'' in Python 2) and that MultipartPostHandler is patched to encode filename*.
- File Not Uploaded: Verify the file is opened in binary mode ('rb'). Text mode ('r') can corrupt binary files (e.g., images).
- Server Rejects Request: Check the Content-Type header. MultipartPostHandler should set it to multipart/form-data; boundary=.... Use a tool like Wireshark or curl -v to compare requests.
- Outdated MultipartPostHandler: If you get syntax errors, ensure you’re using a Python 2-compatible version of the handler.
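When a server rejects a request, a quick structural check of a captured request can narrow things down. The helper below is a hypothetical sketch that assumes you already have the Content-Type value and the raw body as bytes (for example, from a local test server). It only checks the outer framing, not the individual parts.

```python
def check_multipart(content_type, body):
    """Return a list of framing problems found (empty if none)."""
    problems = []
    if not content_type.lower().startswith('multipart/form-data'):
        problems.append('Content-Type is not multipart/form-data')
    if 'boundary=' not in content_type:
        problems.append('Content-Type has no boundary parameter')
        return problems
    # Extract the boundary value, dropping optional quotes and later parameters
    boundary = content_type.split('boundary=', 1)[1].split(';', 1)[0]
    boundary = boundary.strip().strip('"').encode('ascii')
    if not body.startswith(b'--' + boundary):
        problems.append('body does not start with the boundary delimiter')
    if not body.rstrip(b'\r\n').endswith(b'--' + boundary + b'--'):
        problems.append('body does not end with the closing delimiter')
    return problems


good = check_multipart(
    'multipart/form-data; boundary=abc',
    b'--abc\r\nContent-Disposition: form-data; name="a"\r\n\r\n1\r\n--abc--\r\n')
truncated = check_multipart(
    'multipart/form-data; boundary=abc',
    b'--abc\r\nContent-Disposition: form-data; name="a"\r\n\r\n1\r\n')
```

An empty list means the framing looks plausible; it does not guarantee that the server accepts the individual part headers.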
Conclusion#
MultipartPostHandler simplifies sending multipart/form-data with urllib2, eliminating the need to manually construct request bodies. By patching the handler to support RFC 2231-encoded filenames, you can reliably upload files with Unicode names.
While Python 3’s urllib.request and libraries like requests (which natively supports multipart/form-data) are better for modern projects, MultipartPostHandler remains a lifesaver for legacy Python 2 codebases.