Agentic Object Detection and Document Extraction with Landing.ai
This week, I dive into agentic object detection and document extraction using tools from Landing.ai, one of Andrew Ng’s innovative startups! Inspired by Andrew Ng’s recent post on X about their blazing-fast text extraction upgrades, I put their tech to the test. Here’s what I found:
Agentic Object Detection
Forget training models with tons of coffee cup images! Just describe the object, and the model nails it. Simple, smart, and efficient. For example, I noticed one of the coffee cups has a design made with milk that looks like a tree leaf, so I asked it to detect the ‘coffee in a cup with plant design’, and it successfully identified those cups. This differs from typical computer vision tasks (e.g., object detection or instance segmentation) where models are trained on specific object classes like cars or license plates.
In a screen recording, I asked it to detect ‘windows with room lights on’ in a picture of a building, and it highlighted every lit window in that image. Similarly, using the singular ‘building’ (as instructed by the app) on a skyline image picked out every building. Besides drawing bounding boxes in the UI, it also provides JSON output with coordinates for API use.
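To give a feel for what working with that JSON output might look like, here is a minimal Python sketch. The response shape is my own assumption for illustration (a `detections` list with `label`, `score`, and `box` as `[x1, y1, x2, y2]`), and `parse_detections` is a hypothetical helper, so check the actual Landing.ai output before relying on any of these field names.

```python
import json

# Hypothetical response shape, made up for illustration.
# The real Landing.ai JSON may use different field names.
SAMPLE_RESPONSE = """
{
  "detections": [
    {"label": "window with room lights on", "score": 0.91, "box": [120, 40, 180, 110]},
    {"label": "window with room lights on", "score": 0.47, "box": [300, 42, 360, 112]}
  ]
}
"""


def parse_detections(raw, min_score=0.5):
    """Keep detections at or above min_score and add size and center info."""
    data = json.loads(raw)
    results = []
    for det in data.get("detections", []):
        if det["score"] < min_score:
            continue  # skip low-confidence boxes
        x1, y1, x2, y2 = det["box"]  # assumed corner format: left, top, right, bottom
        results.append({
            "label": det["label"],
            "score": det["score"],
            "width": x2 - x1,
            "height": y2 - y1,
            "center": ((x1 + x2) / 2, (y1 + y2) / 2),
        })
    return results


if __name__ == "__main__":
    for detection in parse_detections(SAMPLE_RESPONSE):
        print(detection)
```

Filtering on a score threshold is a design choice of this sketch, not something the app requires. It is a common way to drop uncertain boxes before using the coordinates elsewhere.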
Agentic Document Extraction
Prompted by Andrew Ng’s tweet, I tested document extraction. It handled an invoice, outputting the details in markdown or JSON, and it had no trouble with a lab report that mixed images with two-column and single-column layouts. It even described the logo and formatted the results consistently. Here is a screenshot from the video showing the extracted text.
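If you want to post-process the JSON form of an extraction, something like the following Python sketch could be a starting point. The structure is assumed for illustration only (a `chunks` list where each chunk has a `type`, a `page`, and `text`), the sample values are made up, and `group_chunks` and `to_markdown` are hypothetical helpers rather than part of any Landing.ai library.

```python
import json
from collections import defaultdict

# Hypothetical extraction output with made-up values.
# The real schema may differ, so treat the field names as assumptions.
SAMPLE_EXTRACTION = """
{
  "chunks": [
    {"type": "figure", "page": 0, "text": "Logo: stylized letter L"},
    {"type": "text", "page": 0, "text": "Invoice total: 120.00"},
    {"type": "text", "page": 1, "text": "Payment due in 30 days."}
  ]
}
"""


def group_chunks(raw):
    """Group chunk texts by their type (for example 'text' or 'figure')."""
    grouped = defaultdict(list)
    for chunk in json.loads(raw).get("chunks", []):
        grouped[chunk["type"]].append(chunk["text"])
    return dict(grouped)


def to_markdown(raw):
    """Join all chunk texts in page order, separated by blank lines."""
    chunks = json.loads(raw).get("chunks", [])
    # sorted() is stable, so chunks on the same page keep their original order.
    ordered = sorted(chunks, key=lambda c: c["page"])
    return "\n\n".join(c["text"] for c in ordered)


if __name__ == "__main__":
    print(group_chunks(SAMPLE_EXTRACTION))
    print(to_markdown(SAMPLE_EXTRACTION))
```

Grouping by type makes it easy to handle logo or figure descriptions separately from body text, which matches how the tool described the logo in my test.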
Andrew Ng’s X Post
Embedded below is Andrew Ng’s tweet:
Agentic Document Extraction just got much faster! From previous 135sec median processing time down to 8sec. Extracts not just text but diagrams, charts, and form fields from PDFs to give LLM-ready output. Please see the video for details and some application ideas. pic.twitter.com/29lOKf6UGO
— Andrew Ng (@AndrewYNg) May 27, 2025
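To put the numbers in the post in perspective, here is a quick back-of-the-envelope calculation in Python. The 135 and 8 second medians come from the post. The documents-per-hour figure is my own rough estimate and assumes documents are processed one at a time at the median speed, which real workloads may not match.

```python
def speedup(before_s, after_s):
    """How many times faster the new median is than the old one."""
    return before_s / after_s


def docs_per_hour(median_s):
    """Rough sequential throughput, assuming every document takes the median time."""
    return 3600 / median_s


BEFORE_S = 135  # previous median processing time, from the post
AFTER_S = 8     # new median processing time, from the post

if __name__ == "__main__":
    print(f"Speedup: {speedup(BEFORE_S, AFTER_S):.1f}x")
    print(f"Before: about {docs_per_hour(BEFORE_S):.0f} documents per hour")
    print(f"After: about {docs_per_hour(AFTER_S):.0f} documents per hour")
```

That works out to roughly a 17x improvement in median processing time.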
Importance and Potential Use Cases
Why This Matters
The advancements in agentic object detection and document extraction from Landing.ai are significant because they simplify tasks that were previously time-consuming or required specialized knowledge. For non-specialists, this means tools that can understand and process visual and textual information in ways that mimic human intuition, but faster. This technology can transform how we interact with digital content, making it more accessible and useful in everyday life.
Potential Use Cases
Here are some ways this technology can benefit you, regardless of your technical background:
- Personal Organization and Productivity
  - Document Management: Imagine sorting through a pile of receipts, invoices, or personal documents. Landing.ai’s document extraction can automatically organize these into readable formats, saving you hours of manual work. For instance, it can extract key details from your utility bills or tax documents, making it easier to track expenses or prepare for tax season.
  - Photo and Image Sorting: If you have a collection of photos, this technology can help identify and categorize them based on objects or scenes, like finding all pictures of a specific landmark or event without manually tagging each one.
- Education and Learning
  - Research Assistance: Students and lifelong learners can use agentic document extraction to quickly summarize research papers, extract key data from scientific articles, or even transcribe handwritten notes into digital text. This speeds up the learning process and helps in creating study materials.
  - Visual Learning Aids: Teachers can use object detection to create interactive learning materials, such as identifying objects in images for educational games or lessons. For example, a history teacher could use it to help students identify artifacts, turning a dry lesson into an engaging game.
- Home and Lifestyle
  - Smart Home Integration: Imagine a smart home system that can detect objects in your living space, like identifying which lights are on or off, or recognizing specific items in your kitchen. This can enhance automation and energy efficiency.
  - Personal Projects: Hobbyists or DIY enthusiasts can use this technology to analyze blueprints, extract measurements from images, or even identify parts in a hardware store catalog, making project planning more efficient.
- Business and Entrepreneurship
  - Small Business Operations: Small business owners can leverage document extraction to handle invoices, client contracts, or inventory lists without needing expensive software or extensive training. It can also help in analyzing customer feedback forms or survey results.
  - Market Research: Entrepreneurs can quickly gather and analyze data from various sources, including visual content, to identify trends or customer preferences, aiding in decision-making.
- Accessibility and Inclusion
  - Assistive Technology: For individuals with visual impairments, agentic object detection can describe images or detect objects in real time, enhancing independence. Document extraction can convert printed materials into accessible formats, making information more reachable.
  - Language Learning: Learners can use extraction tools to translate and understand documents in different languages, breaking down barriers to information.
- Everyday Simplification
  - Travel and International Documents: Travelers can use document extraction to quickly understand foreign invoices or translate important papers, making international trips smoother. For example, extracting key details from a hotel receipt in another language can save time and reduce stress.
  - Family Photo Albums: A parent might use object detection to organize years of family photos by identifying specific events or objects, like all pictures from a beach vacation or those featuring a particular family member.
These use cases demonstrate how Landing.ai’s technology can simplify tasks, save time, and open up new possibilities for personal and professional growth. Whether you’re organizing your home, learning something new, or running a small business, these tools can make your life easier and more efficient. Imagine the time you’d save if this technology could handle your document clutter or help you find that perfect photo from years ago—it’s not just for tech experts; it’s for anyone looking to make their day-to-day easier.
Landing.ai Video Demo
Check out this video by Richard demonstrating the technology (note: detection is sped up, so actual performance may be slower):
Playground
If you want to try this yourself, visit Landing.ai and use their playground.
Side Note: Landing.ai Support
I reported a non-working ‘Start for free’ button on their site. I received this email:
Hi Richard,
I hope all is well, and thank you for reaching out. I sincerely appreciate you letting us know the “Start for free” button isn’t working! The team is working on fixing it as we speak.
Best,
***
Then Adrian from Landing.ai confirmed via LinkedIn that the team is addressing the issue.
It seems fixed now—glad to help, and impressed by their proactive response!