We search the dark side of the web

WHAT IS MEMEX?


Today's web searches use a centralized, one-size-fits-all approach that searches the Internet with the same set of tools for all queries. While that model has been wildly successful commercially, it does not work well for many government use cases. To help overcome these challenges, DARPA launched the Memex program in September 2014.

Memex seeks to develop software that advances online search capabilities far beyond the current state of the art. The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests. Creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g., multimedia) content.

DARPA is funding 17 teams to collaboratively develop software to solve this challenge. NASA JPL, Kitware, and Continuum are working in collaboration to develop and improve the Memex search technology. Currently, the team is using the software to address complex search problems including human trafficking, court documents, and research papers.

WHAT TO LOOK FOR IN MEMEX

Memex Explorer is a pluggable framework for domain-specific crawls and search, and a unified interface for Memex Tools. It includes the capability to add links to other web-based apps.

ImageCat analyses images and extracts their EXIF metadata and any text contained in the image via OCR. It can handle tens of millions of images.

LegisGATE is an application for running General Architecture Text Engineering over legislative resources.

FacetSpace allows the investigation of large data sets based on the extraction and manipulation of relevant facets.

ImageSpace provides the ability to analyze and search through large numbers of images based on associated metadata and OCR text or by uploading an image.
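
As a rough illustration of the facet-based exploration and metadata/OCR text search described above, the following minimal Python sketch counts facet values, filters by selected facets, and matches text. It is not taken from FacetSpace or ImageSpace: the record fields (`camera`, `year`, `ocr_text`), the sample values, and the function names are all hypothetical, chosen only to show the general idea on a tiny in-memory data set.

```python
from collections import Counter

# Hypothetical image records; field names and values are invented for illustration.
RECORDS = [
    {"camera": "Canon", "year": 2015, "ocr_text": "call now"},
    {"camera": "Nikon", "year": 2015, "ocr_text": "new in town"},
    {"camera": "Canon", "year": 2016, "ocr_text": "call today"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(r[field] for r in records if field in r)


def filter_by_facets(records, **selected):
    """Keep only records whose fields match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def search_text(records, term):
    """Case-insensitive substring match against the (assumed) OCR text field."""
    term = term.lower()
    return [r for r in records if term in r.get("ocr_text", "").lower()]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "camera"))
    print(filter_by_facets(RECORDS, camera="Canon", year=2016))
    print(search_text(RECORDS, "call"))
```

A real deployment handling millions of images would presumably rely on a search index rather than in-memory lists; the sketch only shows how facets narrow a result set and how extracted text can be queried alongside metadata.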

HOW IT HELPS


These are the domains that we are actively working in with Memex.

GeoInformatics in Human Trafficking

Uses geospatial informatics to collect data and information about victims of human trafficking

Facial Recognition

Manages photos of potential terrorists and finds other places where they appear on the web

Material Research

Collects and analyzes data from research papers to create shared knowledge around an issue or topic

Court Citations

Crawls the web for court documents to help identify human traffickers

IN THE NEWS


PUBLICATIONS



ROACH: Online Apprentice Critic Focused Crawling via CSS Cues and Reinforcement

In Proceedings of the 14th International Workshop on Mining and Learning with Graphs held in conjunction with the ACM Conference on Knowledge Discovery and Data Mining (KDD 2018) | Read this article

Authors: Asitang Mishra, Chris A. Mattmann, Paul M. Ramirez, and Wayne M. Burke


Deep Mars: CNN Classification of Mars Imagery for the PDS Imaging Atlas

Thirtieth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), New Orleans, LA, February 2-7, 2018 | Read this article

Authors: Kiri L. Wagstaff, You Lu, Alice Stanboli, Kevin Grimes, Thamme Gowda and Jordan Padams


Always Lurking: Understanding and Mitigating Bias in Online Human Trafficking Detection

In Proceedings of AAAI/ACM Conference on AI, Ethics, and Society 2018 | Read this article

Authors: Kyle Hundman, Thamme Gowda, Mayank Kejriwal, Benedikt Boecking


Cyber Persona Identification via Indirect Feature Analysis

In Proceedings of the GTA3 2018: Workshop on Graph Techniques for Adversarial Activity Analytics held in conjunction with the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) | Read this article

Authors: Suzanne Stathatos, Asitang Mishra, Chris A. Mattmann


Ensemble Sentiment Analysis to Identify Human Trafficking in Web Data

In Proceedings of the GTA3 2018: Workshop on Graph Techniques for Adversarial Activity Analytics held in conjunction with the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) | Read this article

Authors: Anastasia Menshikova, Chris A. Mattmann


Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Nova Scotia, Canada, August 2017 | Read this article

Authors: Kyle Hundman, Chris A. Mattmann


Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

In Proceedings of the IEEE International Conference on Information Reuse and Integration, San Diego, CA, USA, August 4-6, 2017 | Read this article

Authors: Joey Hong, Chris A. Mattmann, Paul M. Ramirez


An Approach for Automatic and Large Scale Image Forensics

In Proceedings of the Multimedia Forensics and Security Workshop colocated with the ACM International Conference on Multimedia Retrieval (ICMR), Bucharest, Romania, June 2017. | Read this article

Authors: Thamme Gowda, Kyle Hundman, Chris A. Mattmann


Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web

In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), Bucharest, Romania, June 6-9, 2017 | Read this article

Authors: Chris A. Mattmann, Madhav Sharan


Improving accuracy of Tesseract in extraction of serial numbers from images of Counterfeit Electronics

In Proceedings of the Grace Hopper Celebration India (GHCI) 2016 Conference, Bangalore, India, December 7-9, 2016 | Read this article

Authors: Z. Parekh, C. Mattmann and K. Singh


Tattoo detection and localization using region-based deep learning

In Proceedings of the 23rd IEEE International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 Dec. 2016. | Read this article

Authors: H. Zhaohui Sun, Jeff Baumes, Paul Tunison, Matt Turek, and Anthony Hoogs


Clustering Web Pages Based on Structure and Style Similarity

In Proceedings of the IEEE International Conference on Information Reuse and Integration, Pittsburgh, Pennsylvania, USA, July 28-30, 2016 | Read this article

Authors: Thamme Gowda, Chris A. Mattmann


An Automatic Approach for Discovering and Geocoding Locations in Domain-Specific Web Data

In Proceedings of the IEEE International Conference on Information Reuse and Integration, Pittsburgh, Pennsylvania, USA, July 28-30, 2016 | Read this article

Authors: Chris A. Mattmann, Madhav Sharan


Multimedia Metadata-based Forensics in Human Trafficking Web Data

In Proceedings of the Workshop on Search and Exploration of X-Rated Information (SEXI) - 9th ACM International Conference on Web Search and Data Mining, San Francisco, California, USA. February 22-25, 2016 | Read this article

Authors: Chris A. Mattmann, Grace Hui Yang, Harshavardhan Manjunatha, Thamme Gowda N, Andrew Jie Zhou, Jiyun Luo, Lewis John McGibbney


Creating a Mars Target Encyclopedia by Extracting Information from the Planetary Science Literature

In Proceedings of the AAAI Workshop on Knowledge Extraction from Text, Phoenix, AZ, February 12-13, 2016 | Read this article

Authors: Kiri L. Wagstaff, Ellen Riloff, Nina L. Lanza, Chris A. Mattmann, Paul M. Ramirez


TREC Dynamic Domain: Polar Science

In Proceedings of the Text Retrieval Conference (TREC), National Institute of Standards and Technology, Gaithersburg, Maryland, USA, November 17-20, 2015 | Read this article

Authors: Annie Bryant Burgess, Chris A. Mattmann, Giuseppe Totaro, Lewis John McGibbney, Paul M. Ramirez


THE TEAM


NASA JPL and Kitware are working in collaboration to develop and improve the Memex search technology. Currently, the team is using the software to address complex search problems including human trafficking, court documents, and research papers.

The Data Science Team at JPL coordinates research, development and operations of data-intensive and data-driven science systems, methodologies and technologies across JPL Engineering, Science and Programs.


DR. CHRIS MATTMANN

Principal Investigator

PAUL RAMIREZ

Co-Investigator

WAYNE BURKE

Data Scientist

DR. LEWIS MCGIBBNEY

Engineering Applications Software Engineer

ASITANG MISHRA

Software Engineer

SUJEN SHAH

Software Engineer

KYLE HUNDMAN

Data Scientist

LAUREN WONG

UX Designer

ROB TAPELLA

UX Designer

KARANJEET SINGH

Data Scientist Intern (USC)

HARSHAVARDHAN MANJUNATHA

Data Scientist Intern (USC)

MADHAV SHARAN

Data Scientist Intern (USC)

THAMME GOWDA N.

Data Scientist Intern (USC)

Kitware, Inc. is a technology company that specializes in the research and development of open-source software in the fields of computer vision, medical imaging, visualization, 3D data publishing and technical software development.


DR. JEFFREY BAUMES

Site Principal Investigator

DR. MATTHEW TUREK

Assistant Director of Computer Vision

PAUL TUNISON

Computer Vision R&D Engineer

AASHISH CHAUDHARY

Technical Lead

DR. ZHAOHUI SUN

Computer Vision R&D Engineer

DAN LAMANNA

Scientific Computing R&D Engineer

FORMER TEAM MEMBERS


The Data Science Team at JPL coordinates research, development and operations of data-intensive and data-driven science systems, methodologies and technologies across JPL Engineering, Science and Programs.

BRIAN WILSON

Principal Engineer, Architect

MICHAEL JOYCE

Scientific Applications Software Engineer

DR. KIM WHITEHALL

Scientific Application Software Engineer

MAZIYAR BOUSTANI

Scientific Applications Software Engineer

NIPURN DOSHI

UX Design Intern (Indiana University)

SHIVIKA THAPAR

UX Design Intern (Indiana University)

Continuum helps people discover, analyze, and collaborate by connecting their curiosity and experience with any data.


DR. KATRINA RIEHL

Co-Principal Investigator

BRITTAIN HARD

Software Developer

DR. MICHAEL SARAHAN

Data Scientist

REED YOUNGBLOOD

Data Scientist

DR. GUILHERME DE FREITAS

Data Scientist

DR. ARON AHMADIA

Computational Scientist

COLLABORATORS