What Is Multimodal Search?
Multimodal search lets users search with different input formats: text, images, voice, or a combination of these. Google Lens processes 15 billion search queries per month. Voice search is used regularly by over 40% of Swiss users. Circle to Search on Android lets users circle an object on the screen to search for it. These channels are growing and offer new SEO opportunities.
Image SEO: The Underrated Channel
With the rise of Google Lens and visual search, image SEO is becoming a standalone traffic channel. The most important measures:
Alt texts: Descriptive and contextually relevant. Not a bare file name such as 'image1.jpg'. Instead: 'Modern web design for a Zurich law firm, minimalist layout with a dark colour scheme'.
File names: Use descriptive keywords, for example 'webdesign-zurich-law-firm.webp' instead of 'IMG_20260315.jpg'.
Image quality: High-resolution, well-lit, in WebP or AVIF format for optimal file size.
Structured data: Schema.org Product, Recipe or ImageObject markup helps Google understand your images in the right context.
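The first three measures above can be checked automatically. The following is a minimal sketch in Python, using only the standard library, that scans a page's HTML and reports images with missing alt text, generic file names or older formats. The regular expression for "generic" names and the function name audit_images are assumptions made for illustration, not official Google rules.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse
import re

# Assumed heuristics for this sketch, not official Google criteria.
MODERN_FORMATS = {".webp", ".avif"}
GENERIC_NAME = re.compile(r"^(img|image|dsc|photo|screenshot)[-_]?\d*$", re.IGNORECASE)


class ImageAudit(HTMLParser):
    """Collects one finding per <img> tag for each measure it misses."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        path = PurePosixPath(urlparse(src).path)

        if not alt:
            self.findings.append((src, "missing alt text"))
        if GENERIC_NAME.match(path.stem):
            self.findings.append((src, "generic file name"))
        if path.suffix.lower() not in MODERN_FORMATS:
            self.findings.append((src, "not WebP or AVIF"))


def audit_images(html):
    """Returns a list of (src, problem) tuples for the given HTML string."""
    parser = ImageAudit()
    parser.feed(html)
    return parser.findings


page = (
    '<img src="/media/IMG_20260315.jpg">'
    '<img src="/media/webdesign-zurich-law-firm.webp" '
    'alt="Modern web design for a Zurich law firm">'
)
for src, problem in audit_images(page):
    print(f"{src}: {problem}")
```

Purely decorative images may legitimately carry an empty alt attribute, so treat the findings as a review list rather than a list of errors.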
Voice Search Optimisation
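Voice search differs fundamentally from text search: voice queries are longer than typed ones (which typically run 3–4 words), sound more natural and are usually phrased as full questions. The often-quoted figure of 29 words appears to describe the average length of a spoken answer rather than of a query, which is one more reason to keep answers short. To optimise: add FAQ sections with natural questions and direct answers. Target long-tail keywords ('How much does a website cost in Zurich?' instead of 'website costs Zurich'). Implement Speakable schema. Keep your Google Business Profile up to date, since 50%+ of voice searches have local intent.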
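Speakable markup is usually added as JSON-LD that points at the page sections suited to being read aloud. The sketch below generates such a block with Python's standard json module. The page title, URL and CSS selectors are hypothetical placeholders, and Google has so far documented speakable as a beta feature with limited availability, so treat this as an illustration rather than a guaranteed ranking lever.

```python
import json


def speakable_jsonld(page_name, page_url, css_selectors):
    """Builds WebPage markup whose `speakable` property lists CSS selectors."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": page_name,
        "url": page_url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical page and selectors, for illustration only.
speakable_markup = speakable_jsonld(
    "How much does a website cost in Zurich?",
    "https://example.com/website-costs-zurich",
    [".faq-question", ".faq-answer"],
)
print(f'<script type="application/ld+json">\n{speakable_markup}\n</script>')
```

The selectors should match the short question and answer elements of the FAQ section, so that only the concise answer is marked as suitable for speech.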
Circle to Search and Visual Search
Circle to Search on Android devices lets users circle an object on the screen and immediately receive information about it. This is particularly relevant for e-commerce businesses: users can search for products from screenshots, social media posts or photos. To optimise: use product photos with a clear background on the website, keep product data in Google Merchant Center up to date and add structured product data (Schema.org Product).
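Structured product data of the kind mentioned above is typically embedded as JSON-LD. The following Python sketch builds a basic Schema.org Product block with a single offer. The product, prices and URLs are invented for illustration, and the set of fields is an assumption about a sensible minimum, not a complete list of what Google accepts or requires.

```python
import json


def product_jsonld(name, image_urls, description, price, currency, in_stock=True):
    """Builds a Schema.org Product with one Offer as a JSON-LD string."""
    availability = (
        "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    )
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "image": list(image_urls),
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # formatted with two decimal places
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical product, for illustration only.
product_markup = product_jsonld(
    name="Leather weekender bag",
    image_urls=["https://example.com/media/leather-weekender-bag-brown.webp"],
    description="Brown leather weekender bag photographed on a plain background.",
    price=249.0,
    currency="CHF",
)
print(f'<script type="application/ld+json">\n{product_markup}\n</script>')
```

Keeping the price and availability in this markup in sync with the Merchant Center feed avoids contradictory product data.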
The Multimodal Search Checklist
Implement immediately:
1) All images have descriptive alt texts.
2) Images in WebP/AVIF with optimised file names.
3) FAQ sections with natural-language questions.
4) Google Business Profile complete (for local voice and visual searches).
Medium-term:
5) Implement Speakable schema.
6) Video content with transcripts and subtitles.
7) Product data in Google Merchant Center.
Long-term:
8) Diversify content formats (text + image + video) for maximum search surface.
DLM Digital helps with implementing a comprehensive multimodal search strategy.



