What is Apache Tika?
Apache Tika is a toolkit/library used to detect metadata and extract content from different types of files (such as .txt, .docx, .pdf, .ppt, etc.).
Tika parses all of these file types through a single interface, which makes it useful for search engine indexing, content analysis, and much more.
Configuration
To set up a Tika server on your local machine, go to the Tika download page, which looks like the image below.
On the download page, use the third link, "Mirrors for tika-server-1.25.jar", to download the server JAR.
Once the download completes, open a command prompt in the folder containing the downloaded file and run the command below to start the Tika server.
java -jar tika-server-1.25.jar
Your Tika server is now running locally, and it provides a URL for accessing it.
http://localhost:9998/
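Before writing the Go program, it can help to confirm that the server is actually reachable. The following is a minimal sketch using only the Go standard library, assuming the server runs on the default port shown above and answers GET requests on its /version endpoint. The function names versionURL and checkServer are made up for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// versionURL builds the URL of the server's version endpoint.
// It trims any trailing slash so the result has no double slash.
func versionURL(base string) string {
	return strings.TrimRight(base, "/") + "/version"
}

// checkServer asks the Tika server for its version string.
func checkServer(base string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(versionURL(base))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	version, err := checkServer("http://localhost:9998/")
	if err != nil {
		fmt.Println("Tika server not reachable:", err)
		return
	}
	fmt.Println("Tika server is up:", version)
}
```

If the server is not running, the program prints the connection error instead of a version string, which is a quick way to catch a wrong port or a server that was never started.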
After setting up the Tika server, go to your GOPATH directory and create a file with a .go extension, using any name you like. I'm using "tikaExample.go".
Run the command below to get the tika package.
go get -u github.com/google/go-tika/tika
tikaExample.go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/google/go-tika/tika"
)

func main() {
	fmt.Println("Main function started")

	// Open the file to parse.
	file, err := os.Open("<file_path>/<file_name>.txt")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Close the file when main returns.
	defer file.Close()

	// Create a client for the local Tika server.
	client := tika.NewClient(nil, "http://localhost:9998/")

	// Send the file to the server and read back the extracted content.
	body, err := client.Parse(context.Background(), file)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Extracted content = ", body)
}
Explanation
1. First we import the required packages, then in the main function we open the file with Open() from the "os" package.
2. Next, we connect to the Tika server using its URL.
3. Finally, we call the Parse method of the tika package with a context and the file. It returns the content of the provided file, which is printed with Println().