Secure file upload validation in Spring Boot

TL;DR

  1. File extension and browser-provided MIME type can both be faked: never rely on them alone.
  2. Use Apache Tika (or magic bytes) to detect the real file type from content.
  3. Always combine: extension check + content-based detection + size limit.
  4. Rename uploaded files with a UUID, store them outside the webroot, and disable execute permissions.

The Day I Thought I Was Safe

A few months into my first production project, I added file upload. Users could submit a PDF report. My validation looked like this:

FileUploadController.java (naïve version)

java
if (!file.getOriginalFilename().endsWith(".pdf")) {
    throw new IllegalArgumentException("Only PDFs allowed!");
}

Looked fine to me. We shipped it. A few weeks later, a senior dev on the team pinged me: “Bhai, rename any .php file to .pdf and upload it. See what happens.”

I tried it. It uploaded. Successfully. That file could have been executed if the server were misconfigured. That was my introduction to why file upload security is not just a checkbox.

This guide is what I wish I had back then.

The Two Mistakes Almost Everyone Makes

Mistake 1: Trusting the file extension

A file extension is just part of a filename string. The OS uses it as a hint — nothing more. Any user can rename malware.php to report.pdf before uploading. Your check passes. Their file is in.

Mistake 2: Trusting getContentType()

When a browser uploads a file, it sends a Content-Type header like application/pdf. Spring exposes this via MultipartFile.getContentType(). But here’s the thing:

The browser usually derives the content type from the file extension. It doesn’t open the file and inspect it. So a renamed PHP file will typically be reported as application/pdf. And with tools like Postman or curl, an attacker can set any content type they want.

Both of these checks are easy to pass without actually having the right file type.

Think of it this way: checking the extension is like judging a person’s job by what their name tag says. Trusting the browser’s content type is like trusting a second name tag that they also wrote themselves. Neither tells you if they actually work there.

What Spoofing Actually Looks Like

A spoofed upload is simpler than it sounds. Here’s all it takes from an attacker’s side:

Terminal — curl-based spoof upload
bash
# Attacker uploads a PHP file disguised as a PDF
curl -X POST https://yourapp.com/upload \
  -F "file=@shell.php;filename=invoice.pdf;type=application/pdf"

The filename says .pdf. The content type says application/pdf. But the actual content is PHP code. If your server only checks those two things, this gets through.

The Production-Ready Approach

Here’s a layered strategy. Think of it like security checkpoints — the more layers, the harder it is to sneak through.

  1. Extension check: a quick allowlist filter that rejects obvious nonsense early
  2. Size limit: blocks resource abuse and DoS attempts
  3. Content detection: reads the actual file bytes; this is the real security check
  4. Rename + store safely: UUID name, outside webroot, no execute permission
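
The first three layers can be sketched as one short-circuiting check. This is a minimal sketch, not the article's actual service: the class name UploadChecks, the Result enum, and the 10 MB limit are assumptions made for illustration, and the detected MIME type is passed in as a parameter so that any content detector (Tika or manual magic bytes) can supply it.

```java
import java.util.Locale;
import java.util.Set;

public final class UploadChecks {

    /** Outcome of the checks; the first failing layer wins. */
    public enum Result { OK, BAD_EXTENSION, BAD_SIZE, BAD_CONTENT }

    // Hypothetical policy values for this sketch
    private static final Set<String> ALLOWED_EXTENSIONS = Set.of("pdf", "jpg", "jpeg", "png");
    private static final Set<String> ALLOWED_MIME_TYPES =
            Set.of("application/pdf", "image/jpeg", "image/png");
    private static final long MAX_BYTES = 10L * 1024 * 1024; // 10 MB

    private UploadChecks() {}

    public static Result check(String filename, long sizeInBytes, String detectedMime) {
        // Layer 1: cheap allowlist on the (untrusted) extension
        if (!ALLOWED_EXTENSIONS.contains(extensionOf(filename))) {
            return Result.BAD_EXTENSION;
        }
        // Layer 2: reject empty and oversized uploads
        if (sizeInBytes <= 0 || sizeInBytes > MAX_BYTES) {
            return Result.BAD_SIZE;
        }
        // Layer 3: the type detected from the file's bytes must be allowlisted too
        if (detectedMime == null || !ALLOWED_MIME_TYPES.contains(detectedMime)) {
            return Result.BAD_CONTENT;
        }
        return Result.OK;
    }

    static String extensionOf(String filename) {
        if (filename == null) {
            return "";
        }
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Layer 4 (rename and store) should only run when the result is OK. With this sketch, the spoofed invoice.pdf from the curl example passes layer 1 and layer 2 but fails layer 3, because its detected type is not on the allowlist.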

First: Configure size limits in Spring

application.yml

yaml
spring:
  servlet:
    multipart:
      max-file-size: 10MB
      max-request-size: 12MB

The main validator service

FileValidationService.java

java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.Set;

import org.apache.tika.Tika;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

// ValidationException is assumed to be an unchecked exception with a
// String constructor (your own class, or jakarta.validation.ValidationException).
@Service
public class FileValidationService {

    // Step 1: Allowlist of permitted extensions
    private static final Set<String> ALLOWED_EXTENSIONS =
            Set.of("pdf", "jpg", "jpeg", "png");

    // Step 2: Allowlist of permitted detected MIME types
    private static final Set<String> ALLOWED_MIME_TYPES =
            Set.of("application/pdf", "image/jpeg", "image/png");

    private final Tika tika = new Tika();

    public void validate(MultipartFile file) {
        if (file.isEmpty()) {
            throw new ValidationException("File is empty");
        }

        // Extract and check extension (Locale.ROOT keeps lower-casing locale-independent)
        String originalName = file.getOriginalFilename();
        String ext = getExtension(originalName).toLowerCase(Locale.ROOT);

        if (!ALLOWED_EXTENSIONS.contains(ext)) {
            throw new ValidationException("File type not allowed: " + ext);
        }

        // Content-based detection via Tika (reads actual bytes);
        // try-with-resources closes the stream even when detection fails
        try (InputStream in = file.getInputStream()) {
            String detectedMime = tika.detect(in);

            if (!ALLOWED_MIME_TYPES.contains(detectedMime)) {
                throw new ValidationException(
                        "File content mismatch. Detected: " + detectedMime);
            }
        } catch (IOException e) {
            throw new ValidationException("Could not read file content");
        }
    }

    // Returns the text after the last dot, or "" if there is none
    private String getExtension(String filename) {
        if (filename == null || !filename.contains(".")) {
            return "";
        }
        return filename.substring(filename.lastIndexOf(".") + 1);
    }
}

Note: tika.detect(InputStream) inspects the leading bytes of the file (its “magic bytes”) to figure out its real type, and never looks at the filename or the browser’s header. This is what the browser doesn’t do, but Tika does. This is the key check.

The Maven dependency for Tika

pom.xml

xml
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.9.1</version>
</dependency>

Note: Use tika-core, not tika-parsers. The core library is lightweight (a few hundred KB) and handles detection well. The full parsers bundle is far larger (~100MB), which is way too heavy for this use case.

Safe file storage service

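FileStorageService.java

java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

// StorageException is assumed to be your own unchecked exception type.
@Service
public class FileStorageService {

    // Store outside the webroot, not under src/main/resources/static
    private final Path storageDir = Paths.get("/var/app/uploads");

    // Call this only after FileValidationService.validate(...) has accepted the file
    public String store(MultipartFile file, String originalName) {
        // Generate a UUID-based filename; never trust user-provided names
        String ext = getExtension(originalName);
        String storedName = UUID.randomUUID() + "." + ext;
        Path destination = storageDir.resolve(storedName);

        try (InputStream inputStream = file.getInputStream()) {
            // Stream copy: avoids loading the entire file into memory
            Files.copy(inputStream, destination, StandardCopyOption.REPLACE_EXISTING);

            // Disable execute permissions (Linux/Mac)
            destination.toFile().setExecutable(false, false);
        } catch (IOException e) {
            throw new StorageException("Failed to store file", e);
        }

        return storedName; // Save this to your DB, not the original name
    }

    // Same helper as in FileValidationService, lower-cased for consistent names
    private String getExtension(String filename) {
        if (filename == null || !filename.contains(".")) {
            return "";
        }
        return filename.substring(filename.lastIndexOf(".") + 1).toLowerCase(Locale.ROOT);
    }
}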

Do You Actually Need Apache Tika?

Tika is the most reliable option for production, but let’s be honest about tradeoffs.

| Situation | Recommendation |
| --- | --- |
| Financial, healthcare, or compliance-heavy app | Use Tika |
| Internal tool with trusted users | Extension check may be enough |
| High-traffic API, lots of uploads per second | Tika is fine; detection is fast |
| Need to detect 100+ file types precisely | Use Tika |
| Tiny microservice, need minimal deps | Magic bytes manually (see below) |

Lightweight alternative: magic bytes manually

Every file format has a known “signature” in its first few bytes. For example, a real PDF starts with the bytes 25 50 44 46 (which is %PDF in ASCII). You can check this without any library:

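MagicBytesUtil.java

java
import java.io.IOException;
import java.io.InputStream;

public final class MagicBytesUtil {

    // Returns true if the stream starts with the PDF signature "%PDF".
    // Note: this consumes the first 4 bytes of the stream.
    public static boolean isPdf(InputStream input) throws IOException {
        byte[] header = input.readNBytes(4);
        return header.length == 4
                && header[0] == 0x25  // %
                && header[1] == 0x50  // P
                && header[2] == 0x44  // D
                && header[3] == 0x46; // F
    }
}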

Caution: Magic bytes work well for common formats (PDF, PNG, JPEG, ZIP). But if you support many file types, maintaining your own magic byte lookup gets messy fast. Tika covers all of this for you.
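
To give a feel for what maintaining your own lookup involves, here is a minimal sketch that extends the idea to the three types this article allows. The class name SignatureSniffer and its sniff method are made up for this example; the signatures themselves are the well-known ones for PDF (%PDF), PNG (89 50 4E 47 0D 0A 1A 0A) and JPEG (FF D8 FF).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public final class SignatureSniffer {

    // MIME type -> leading bytes that identify it
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", new byte[] {0x25, 0x50, 0x44, 0x46}); // %PDF
        SIGNATURES.put("image/png",
                new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
    }

    // Longest signature above (PNG) is 8 bytes
    private static final int MAX_SIGNATURE_LENGTH = 8;

    private SignatureSniffer() {}

    /** Reads up to 8 bytes from the stream and returns the matching MIME type, if any. */
    public static Optional<String> sniff(InputStream input) throws IOException {
        byte[] header = input.readNBytes(MAX_SIGNATURE_LENGTH);
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(header, entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because sniff consumes the start of the stream, open a fresh stream (for example a second file.getInputStream() call) when you go on to store the file. Every new format means another entry and another round of edge cases, which is the maintenance cost the caution above is about.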

Performance-wise: Tika detection reads only a small prefix of the file, not the whole upload, so it usually adds only a few milliseconds per file. Unless you’re processing thousands per second, it won’t be your bottleneck.

Best Practices from Real Systems

1. Always stream the file — never read it all into memory

Use file.getInputStream(), not file.getBytes(). A user uploading a 500MB file with getBytes() will put 500MB in your JVM heap. Do this with 10 concurrent users and you’re looking at an OOM crash. Streams avoid this entirely.
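
Streaming also makes it easy to enforce a size cap in your own code, as a second line of defense behind the Spring multipart limits. The following is a minimal sketch using only the JDK; BoundedCopy and its copy method are names invented for this example, not part of Spring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class BoundedCopy {

    private BoundedCopy() {}

    /**
     * Copies in to out through a small fixed buffer and aborts as soon as
     * more than maxBytes have been read. Returns the number of bytes copied.
     */
    public static long copy(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192]; // only this buffer lives on the heap
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
            if (total > maxBytes) {
                throw new IOException("Upload exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, read);
        }
        return total;
    }
}
```

Memory use stays at the 8 KB buffer no matter how large the upload is. If the limit is hit, the caller is responsible for deleting the partially written destination file.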

2. Rename every file to a UUID

Never use the original filename from the user as the name on disk. Filenames can contain path traversal sequences like ../../etc/passwd. A UUID filename is safe and unique, and it takes the user-controlled name out of the file path entirely. Keep the original filename only in your database, for display purposes.
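
Here is one way the renaming step could look with an extra containment check on the final path. It is a sketch under assumptions: StoredNames, newStoredName and resolveInside are hypothetical names, and the extension allowlist simply mirrors the one used by the validator earlier.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;
import java.util.UUID;

public final class StoredNames {

    private static final Set<String> ALLOWED_EXTENSIONS = Set.of("pdf", "jpg", "jpeg", "png");

    private StoredNames() {}

    /** Builds "<uuid>.<ext>"; only the allowlisted extension survives from the user's name. */
    public static String newStoredName(String originalName) {
        String ext = extensionOf(originalName);
        if (!ALLOWED_EXTENSIONS.contains(ext)) {
            throw new IllegalArgumentException("Extension not allowed");
        }
        return UUID.randomUUID() + "." + ext;
    }

    /** Resolves name under storageDir and rejects anything that would land outside it. */
    public static Path resolveInside(Path storageDir, String name) {
        Path base = storageDir.toAbsolutePath().normalize();
        Path target = base.resolve(name).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes storage directory");
        }
        return target;
    }

    private static String extensionOf(String filename) {
        if (filename == null) {
            return "";
        }
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

With a UUID name the containment check should never fire, but it is cheap insurance if some other code path ever passes a user-influenced name to resolveInside.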

3. Store metadata in the database

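FileMetadata.java (JPA entity sketch)

java
import jakarta.persistence.Entity; // javax.persistence on Spring Boot 2.x
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import java.time.LocalDateTime;

@Entity
public class FileMetadata {
    @Id
    @GeneratedValue
    private Long id;                 // Primary key, required by JPA
    private String storedName;       // UUID-based, used on disk
    private String originalName;     // From user, stored but never used as path
    private String detectedMimeType; // From Tika, not from browser
    private Long sizeInBytes;
    private Long uploadedBy;         // User ID
    private LocalDateTime uploadedAt;
}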

4. Store files outside the webroot

If you store uploads inside your static resources folder, someone who finds the filename can access the file directly via URL. Store them at a path like /var/app/uploads/ and serve them through a controller that checks authentication first.
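
Before such a controller touches the disk, it can run two cheap checks: that the requested name has the shape of a stored name, and that the requester is allowed to see the file. The sketch below is plain Java with invented names (DownloadGuard, isStoredName, mayServe), and "only the uploader may download" is an assumed policy for illustration; your real rule will probably come from your security layer.

```java
import java.util.regex.Pattern;

public final class DownloadGuard {

    // Shape of the names produced at upload time: UUID.toString() + "." + allowlisted extension
    private static final Pattern STORED_NAME = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\.(pdf|jpg|jpeg|png)");

    private DownloadGuard() {}

    /** True only for names that look exactly like ones this application generated. */
    public static boolean isStoredName(String requested) {
        return requested != null && STORED_NAME.matcher(requested).matches();
    }

    /** Assumed policy for this sketch: a file is served only to the user who uploaded it. */
    public static boolean mayServe(String requested, long requesterId, long ownerId) {
        return isStoredName(requested) && requesterId == ownerId;
    }
}
```

The ownerId would come from the uploadedBy column of the metadata row, looked up by stored name. Anything that fails either check gets a 404 or 403 without a single filesystem call.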

5. Disable execute permissions on Linux

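Call destination.toFile().setExecutable(false, false) after saving. This ensures that even if a malicious file sneaks through somehow, the OS won’t run it directly as an executable. It does not stop an interpreter such as PHP from reading the file, which is one more reason to keep uploads outside the webroot.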
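
On POSIX file systems you can go one step further and set the whole permission set explicitly instead of only clearing the execute bit. This is a sketch using java.nio.file; the class name UploadPermissions and the choice of rw------- (owner read/write only) are assumptions, so adjust them if another process needs to read the files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public final class UploadPermissions {

    private UploadPermissions() {}

    /**
     * Sets the file to owner read/write only (rw-------).
     * Returns false, and changes nothing, on file systems without POSIX permissions.
     */
    public static boolean restrict(Path file) throws IOException {
        if (!file.getFileSystem().supportedFileAttributeViews().contains("posix")) {
            return false; // e.g. default Windows file systems
        }
        Set<PosixFilePermission> ownerReadWrite = PosixFilePermissions.fromString("rw-------");
        Files.setPosixFilePermissions(file, ownerReadWrite);
        return true;
    }
}
```

Setting the full permission set also removes group and world read access, which setExecutable(false, false) leaves untouched.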

6. Optional: virus scanning for high-risk applications

If your users upload files that others will download (document sharing, job portals, etc.), consider integrating ClamAV via the clamav4j library. It’s free, open source, and can scan files before they’re stored. Overkill for most internal tools, but worth it for platforms with public-facing file sharing.
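
If you would rather not add a client library, the clamd daemon also accepts files over a plain socket using its INSTREAM command. The sketch below does not use clamav4j; it only shows the framing as I understand it from the clamd documentation: the command zINSTREAM followed by a NUL byte, then chunks prefixed with a 4-byte big-endian length, then a zero-length chunk to finish. The class and method names are made up, and opening the socket and reading the reply are left to the caller.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public final class ClamdInstream {

    private ClamdInstream() {}

    /** Writes data to socketOut using clamd's INSTREAM framing. chunkSize must be positive. */
    public static void send(InputStream data, OutputStream socketOut, int chunkSize)
            throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        // "z" prefix: command and reply are NUL-terminated
        out.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));

        byte[] buffer = new byte[chunkSize];
        int read;
        while ((read = data.read(buffer)) != -1) {
            if (read == 0) {
                continue; // a zero-length chunk would end the stream early
            }
            out.writeInt(read);          // 4-byte length, network byte order
            out.write(buffer, 0, read);  // chunk payload
        }
        out.writeInt(0);                 // zero-length chunk terminates the stream
        out.flush();
    }

    /** clamd answers "stream: OK" for clean data and "stream: <name> FOUND" for a detection. */
    public static boolean isClean(String reply) {
        return reply != null && reply.trim().equals("stream: OK");
    }
}
```

Treat anything other than a clean reply (including errors and timeouts) as a failed scan and reject the upload. clamd also enforces its own maximum stream size, so very large files may need a higher limit on the daemon side.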

| Check | Can be faked? | Use it? | Why |
| --- | --- | --- | --- |
| getOriginalFilename() extension | Yes, easily | As pre-filter | Rejects obviously wrong types fast, before hitting Tika |
| getContentType() from browser | Yes, trivially | Don’t rely on it | Attacker-controlled. Log it, don’t trust it |
| Tika / magic bytes detection | Very hard | Yes, required | Reads actual file content: the source of truth |
| Extension matches detected type | No | Yes, add this check | Cross-validate: extension + Tika result must agree |
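
The last row is the one check the validator shown earlier does not perform: it tests the extension and the detected type against separate allowlists, so a PNG named report.pdf would still pass. A minimal sketch of the cross-check follows; the class name TypeAgreement and the mapping are assumptions that mirror this article's allowlists.

```java
import java.util.Locale;
import java.util.Map;

public final class TypeAgreement {

    // Extension -> the MIME type content detection is expected to report for it
    private static final Map<String, String> EXPECTED_MIME = Map.of(
            "pdf", "application/pdf",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "png", "image/png");

    private TypeAgreement() {}

    /** True only if the extension is known and the detected type is the one it implies. */
    public static boolean agrees(String extension, String detectedMime) {
        if (extension == null || detectedMime == null) {
            return false;
        }
        String expected = EXPECTED_MIME.get(extension.toLowerCase(Locale.ROOT));
        return expected != null && expected.equals(detectedMime);
    }
}
```

In the validator this would run right after detection, with ext and detectedMime as arguments, and a false result would be rejected like any other mismatch.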

Bonus:

Here is the open discussion we had about this on Reddit: https://www.reddit.com/r/SpringBoot/comments/1sahvs1/comment/oe17vbj/
